The following article takes a closer look at the relationship between HOPE part A VOC data and statistics.

Therefore, a lack of statistical significance does not mean that Voxelotor "failed" to reduce VOC.

Theoretically, Voxelotor could have reduced VOC by 25%, but this would not have been considered "significant" if the sample size were too small.

To be a keen biotech investor, one must have a basic understanding of statistical concepts.

Preface

Global Blood Therapeutics (GBT) released HOPE part A data in June, and the results were met with mixed feelings. This was partly because of a lack of insight into a key secondary endpoint: vaso-occlusive crises (VOC).

I have argued before that "numerical reductions" in VOC were the best-case scenario for part A, because "statistical significance" would have been an unreasonable goal with such a small sample size. In fact, HOPE part A was never designed to show whether secondary endpoints reached statistical significance.

This article examines the relationship between data sets of this size and statistical significance.

What Is Significance?

Researchers use statistical tests to estimate how likely it is that an observed difference arose by chance alone. In medicine, it is very important to assess whether a drug is actually making a difference. For example, let's say drug X was studied against placebo for the treatment of blindness, and a 30% improvement in eyesight was defined as a treatment success. Twenty patients received drug X and 20 patients received placebo.

Results:

12/20 patients receiving drug X experienced a treatment success

9/20 patients receiving placebo experienced a treatment success

Drug X produced a better response, but was the difference actually significant? In other words, if we performed the exact same study several times, would we likely end up with similar results? Can we safely assume that these results did not simply occur by chance?

In medicine, a p value < 0.05 is often used as the gold standard for significance. What does this mean?

The P value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true – the definition of ‘extreme’ depends on how the hypothesis is being tested. P is also described in terms of rejecting H0 when it is actually true, however, it is not a direct probability of this state. The null hypothesis is usually an hypothesis of "no difference" e.g. no difference between blood pressures in group A and group B. Define a null hypothesis for each study question clearly before the start of your study.
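To make this concrete, here is the drug X example above (12/20 vs. 9/20 successes) run through a significance test. The article does not say which test it would use for this example. This is a minimal sketch that assumes a pooled two-proportion z-test. A chi-square or Fisher's exact test would be equally reasonable choices. The function name `two_proportion_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Two-sided p value for H0: both groups share the same success rate (hypothetical helper)."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Under H0 both groups share one success rate, estimated from the pooled data.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / standard_error
    # Probability of a difference at least this extreme, in either direction, if H0 is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Drug X: 12/20 treatment successes; placebo: 9/20 treatment successes.
p = two_proportion_p_value(12, 20, 9, 20)
print(f"p = {p:.2f}")  # p = 0.34
```

Drug X's success rate is 15 percentage points higher, yet p is about 0.34, far above 0.05. With only 20 patients per arm the normal approximation is rough. An exact test is more conservative, though, so it would not change the conclusion.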

Let's apply this to our situation.

Our null hypothesis would be that Voxelotor has no effect on the number of vaso-occlusive crises.

To reject our null hypothesis, we need a p value < 0.05.

T-Test For 2 Independent Means

There are many statistical tests, and the right one depends on the type of data, the number of groups, and so on. For our example, a t-test is used to determine whether the difference between the two arms is statistically significant.

The t-test assesses whether the means of two groups are statistically different from each other.
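For reference, the standard Student's t statistic for two independent means with pooled variance is shown below. We assume, without confirmation, that this is the version the calculator in the figures uses. Here, \bar{x}_i, s_i^2 and n_i are each group's mean, sample variance and number of patients.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
\text{df} = n_1 + n_2 - 2
```

Because n_1 and n_2 appear in the denominator, larger samples shrink the standard error. The same difference in means then produces a larger |t| and a smaller p value.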

We first need to look at the data provided by Global Blood Therapeutics. Here's what we know:

A total of 154 patients were randomized to receive 1500mg Voxelotor, 900mg Voxelotor, or placebo. Assuming roughly equal allocation across the three arms, we can theorize that 51 patients were assigned to 1500mg Voxelotor and another 51 patients were assigned to placebo.

In this study, the average patient has ~2.8 VOC/year. Because HOPE part A data is limited to 12 weeks, we can theorize that the average patient has ~0.65 VOC every 12 weeks (2.8 × 12/52).

Therefore, we can theorize that the average cohort would have a total of 33 VOC (0.65 multiplied by 51).
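The back-of-the-envelope arithmetic above can be checked in a few lines. This sketch assumes equal allocation across the three arms and a 52-week year. The variable names are illustrative only.

```python
randomized = 154
arms = 3  # 1500 mg Voxelotor, 900 mg Voxelotor, placebo
patients_per_arm = randomized // arms  # assumes equal allocation across arms

vocs_per_year = 2.8
weeks_observed = 12  # HOPE part A window
vocs_per_12_weeks = vocs_per_year * weeks_observed / 52  # per patient, assuming a 52-week year

expected_cohort_vocs = vocs_per_12_weeks * patients_per_arm

print(patients_per_arm, round(vocs_per_12_weeks, 2), round(expected_cohort_vocs))
# 51 0.65 33
```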

We will assume the placebo cohort has 33 VOC (no effect on VOC):

Placebo: 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 2, 1, 2, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 2, 0, 0, 1, 0, 0, 1, 1

The data above lists the number of VOC experienced by each patient (n=51) within 12 weeks. The values total 33.

Let's assume the Voxelotor 1500mg cohort had a "numerical reduction" in VOC (as management stated was the case), with the Voxelotor arm experiencing only 24 VOC events:

Voxelotor 1500 mg: 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 2, 1, 2, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1

In total, the Voxelotor arm experienced 27% fewer VOC events than placebo (24 vs. 33). Pretty impressive, right? Endari (recently approved to prevent VOC), for example, reduced VOC events by ~25% in a phase 3 trial (from ~4 VOC/year to ~3 VOC/year).
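As a quick check, the two hypothetical cohorts listed above can be totalled directly. The lists copy the article's illustrative data and are not actual HOPE trial data.

```python
# Illustrative per-patient VOC counts over 12 weeks, copied from the lists above.
placebo = [0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 2, 1, 2, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0,
           1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 2, 0, 0, 1, 0, 0, 1, 1]
voxelotor = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 2, 1, 2, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0,
             0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1]

assert len(placebo) == len(voxelotor) == 51
placebo_total = sum(placebo)
voxelotor_total = sum(voxelotor)
reduction = (placebo_total - voxelotor_total) / placebo_total
print(f"{placebo_total} vs {voxelotor_total} VOC: {reduction:.0%} reduction")
# 33 vs 24 VOC: 27% reduction

# Endari comparison quoted above: ~4 VOC/year down to ~3 VOC/year.
endari_reduction = (4 - 3) / 4
print(f"Endari: {endari_reduction:.0%} reduction")  # Endari: 25% reduction
```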

But is the 27% reduction in this example considered "significant"? When you plug in the numbers, the answer is NO.

Let's take a closer look:

Figure 1: Treatment 1 is the Voxelotor arm while Treatment 2 is the placebo arm. For both arms, every patient is listed, represented by X, according to the number of VOCs experienced within 12 weeks (0, 1, or 2 VOCs within 12 weeks) (Source: Social Science Statistics)

Figure 2: Both arms included 51 patients, represented by N1 and N2. M(2), the mean number of VOC episodes in the placebo cohort, is 0.65, while M(1) in the Voxelotor cohort is 0.47 (Source: Social Science Statistics)

Figure 3: Although the reduction looks meaningful, it is not statistically significant, as the p value is > 0.05 (Source: Social Science Statistics)
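The calculator's result can also be reproduced offline. This is a sketch of a standard Student's t-test for two independent means with pooled variance. We assume, without confirmation, that this is what the calculator runs.

The article does not say whether a one- or two-tailed test was used, so both p values are reported. The helper names (`regularized_incomplete_beta`, `students_t_test`) are hypothetical. The p value uses the standard identity linking the t-distribution to the regularized incomplete beta function, evaluated with a continued fraction.

The t-test depends only on how many patients had 0, 1 or 2 VOCs, not on their order. The data from the lists above is therefore written compactly as counts: 21/27/3 for placebo and 29/20/2 for Voxelotor.

```python
import math

# Guard value that keeps the continued fraction from dividing by zero.
TINY = 1e-300


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # Odd-numbered term.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Evaluate the continued fraction where it converges quickly;
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def students_t_test(group_1, group_2):
    """Student's t-test with pooled variance. Returns (t, df, two_tailed_p, one_tailed_p)."""
    n1, n2 = len(group_1), len(group_2)
    mean1, mean2 = sum(group_1) / n1, sum(group_2) / n2
    ss1 = sum((x - mean1) ** 2 for x in group_1)  # sums of squared deviations
    ss2 = sum((x - mean2) ** 2 for x in group_2)
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / standard_error
    # Two-tailed p value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    two_tailed = regularized_incomplete_beta(df / 2, 0.5, df / (df + t * t))
    # One-tailed p value, testing in the direction of the observed difference.
    one_tailed = two_tailed / 2
    return t, df, two_tailed, one_tailed


# Patients with 0, 1 and 2 VOCs in 12 weeks, counted from the article's lists.
placebo = [0] * 21 + [1] * 27 + [2] * 3     # 51 patients, 33 VOCs
voxelotor = [0] * 29 + [1] * 20 + [2] * 2   # 51 patients, 24 VOCs

t, df, p_two, p_one = students_t_test(placebo, voxelotor)
print(f"t = {t:.2f}, df = {df}, two-tailed p = {p_two:.2f}, one-tailed p = {p_one:.2f}")
# t = 1.52, df = 100, two-tailed p = 0.13, one-tailed p = 0.07
```

Whichever tail convention the calculator used, the p value stays above 0.05.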

Based on this statistical test, even though Voxelotor reduced VOC by over 25%, we cannot reject the null hypothesis that Voxelotor makes no difference in VOC.

Why is this the case despite a seemingly robust reduction (25%+)?

It's simple. Sample size influences p values:

Sample size strongly influences the P-value of a test. An effect that fails to be significant at a specified level alpha in a small sample can be significant in a larger sample.

Taking this into account, let's add, for example, 40 patients to each arm and redo the statistical test. The placebo arm will report 20 additional VOC events in 12 weeks (0.5 events per added patient). The Voxelotor arm will report 16 additional VOC events in 12 weeks (20% fewer than placebo). When these extra patients and events are plugged in, the results suddenly become "significant" with a p value of 0.04. The only thing we changed was the number of patients and events. The overall reduction stays at roughly 25% (40 vs. 53 VOC events).
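The effect of adding patients can be sketched in code. The article does not say how the extra events were spread across the 40 added patients per arm. This sketch therefore assumes, hypothetically, that each added patient had either zero or one VOC: 20 of the 40 added placebo patients and 16 of the 40 added Voxelotor patients had one event.

To keep the block short, the p value uses a normal approximation to the t-distribution. At 100 to 180 degrees of freedom this is close to the exact result, though it slightly understates the p value. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_t_statistic(group_1, group_2):
    """Student's t statistic with pooled variance, plus its degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    mean1, mean2 = sum(group_1) / n1, sum(group_2) / n2
    ss1 = sum((x - mean1) ** 2 for x in group_1)
    ss2 = sum((x - mean2) ** 2 for x in group_2)
    df = n1 + n2 - 2
    standard_error = math.sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def approx_p_values(t):
    """Normal approximation to the t-distribution: (two-tailed, one-tailed) p values."""
    one_tailed = 1 - NormalDist().cdf(abs(t))
    return 2 * one_tailed, one_tailed


# Original hypothetical cohorts (51 patients each), as counts of 0/1/2 VOCs.
placebo = [0] * 21 + [1] * 27 + [2] * 3     # 33 VOCs
voxelotor = [0] * 29 + [1] * 20 + [2] * 2   # 24 VOCs

# Hypothetical extra 40 patients per arm: 20 vs 16 additional VOCs.
bigger_placebo = placebo + [1] * 20 + [0] * 20       # 91 patients, 53 VOCs
bigger_voxelotor = voxelotor + [1] * 16 + [0] * 24   # 91 patients, 40 VOCs

for label, pbo, vox in [("51 per arm", placebo, voxelotor),
                        ("91 per arm", bigger_placebo, bigger_voxelotor)]:
    t, df = pooled_t_statistic(pbo, vox)
    p_two, p_one = approx_p_values(t)
    reduction = 1 - sum(vox) / sum(pbo)
    print(f"{label}: reduction {reduction:.0%}, t = {t:.2f}, "
          f"two-tailed p = {p_two:.2f}, one-tailed p = {p_one:.2f}")
# 51 per arm: reduction 27%, t = 1.52, two-tailed p = 0.13, one-tailed p = 0.06
# 91 per arm: reduction 25%, t = 1.75, two-tailed p = 0.08, one-tailed p = 0.04
```

The relative reduction barely moves, yet t grows with the sample size and the p value falls. Under these assumptions, the one-tailed p value lands near the 0.04 quoted above, while the two-tailed p value is about 0.08. The article does not say which version it used.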

Summary

It is possible that Global Blood Therapeutics had the exact same data as the data I presented above (a 25%+ reduction in VOC). However, as we saw, even a 25% reduction was not statistically significant in our test because of the small sample size. Instead, we can only state that the Voxelotor arm had numerically fewer VOCs than placebo. More VOC events are required to claim "significance" for a total VOC reduction of 25%.

When interpreting data, it is critically important for biotech investors to have a basic understanding of statistics. An investor without that understanding may have read the HOPE part A data as negative because it didn't reach "statistical significance". However, this was quite possibly the result of a small sample size. Also recall that HOPE part A was never designed to show statistical significance on secondary endpoints like VOC. That is the role of part B.

For now, GBT shares appear to be trading at a discount that is likely the result of misunderstandings. If Voxelotor can significantly reduce hemolytic anemia while providing numerical reductions in VOC (even if they are not "statistically significant"), it is more than likely to provide clinical benefits to sickle cell disease patients. Despite the weakness in share price, GBT remains a conviction buy with a price target of $75 to be reached within 12-18 months.

