Enobosarm fights cachexia in cancer patients.

A brief summary of Ostarine (enobosarm) and cachexia can be found here.

Muscle wasting is a common cancer related symptom which can begin early in the course of a patient's malignancy resulting in decline in physical function and other detrimental clinical consequences. Muscle wasting and muscle weakness are also side effects of many chemotherapy drugs. There are no drugs approved for the prevention and treatment of muscle wasting in patients with cancer.... Muscle weakness and functional limitations are highly prevalent, with 88% of NSCLC patients reporting difficulty climbing stairs, lifting and carrying 10 lbs, walking a quarter mile, or stooping, crouching or kneeling. Muscle loss is a predictor of performance status, tolerability to cancer treatment, progression free survival and overall survival. Limitations in physical function predict the ability of a NSCLC patient to tolerate chemotherapy, and patients with functional limitations are less likely to be offered treatment. Functional status is also a predictor of survival. -- GTx, Inc. (GTXI)

Enobosarm has had an extensive history of successful trials. To list a few:

  • "The selective androgen receptor modulator GTx-024 (enobosarm) improves lean body mass and physical function in healthy elderly men and postmenopausal women: results of a double-blind, placebo-controlled phase II trial". The poster is available here.
  • "A 12-week Pharmacokinetic and Pharmacodynamic Study of MK-3984 and MK-2866 in Postmenopausal Subjects"
  • The initial PR for the results of the Phase II Ostarine (MK-2866) Cancer Cachexia Clinical Trial can be seen here. Further data can be found in the prestigious Lancet Oncology, with the lead author being none other than Adrian Dobs, M.D. M.P.H., Professor of Medicine and Oncology, Division of Endocrinology and Metabolism and a Vice Chair at The Johns Hopkins University School of Medicine.

Considering that athletes are already abusing Enobosarm, there is little doubt that Enobosarm is an active, oral, non-steroidal selective androgen receptor modulator.

The details of one of the on-going P3 trials can be found here. The primary endpoints are statistically significant increases in the number of responders who have increased lean body mass, and a 10% or more improvement in physical function. These 2 separate endpoints are being compared against a placebo.

Market potential

In many of the recent conference calls, management has guided for at least $700M in peak sales in the USA alone, at a pricing of >$1500/month for 5 months.

The company estimates that the prevalence of severe muscle loss at diagnosis in NSCLC patients is approximately 50%, and that 70% will lose muscle before death. Again, these estimates can be seen here on the drug profile page.

With roughly 200k stage III/IV NSCLC patients in the US, we can expect that 50% will have use for Enobosarm upon their diagnosis. Since Enobosarm would potentially be the first and only approved option to tackle cachexia in NSCLC, ~50% market capture is a fair estimate. That leaves 50k peak patients in the US alone, and at expected fair pricing, that corresponds to ~$750M in US peak sales. Note that the 50% prevalence estimate is derived from all NSCLC patients, and not just the sicker stage III/IV patients.
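The funnel above can be checked with a few lines of arithmetic. This is only a sketch using the article's round numbers; the function name is illustrative and nothing here comes from company models. It shows that the quoted floor of $1,500/month for 5 months yields about $375M on 50k patients, so the ~$750M figure implies roughly $15,000 of revenue per patient (for example a higher monthly price or a longer course of treatment).

```python
# Back-of-the-envelope US market sizing using the article's round numbers.
# The function name and variable names are illustrative assumptions.

def peak_patients(population, share_with_muscle_loss, market_capture):
    """Patients treated at peak: population x prevalence x capture."""
    return population * share_with_muscle_loss * market_capture

patients = peak_patients(200_000, 0.50, 0.50)

# Revenue at the guided pricing floor: $1,500 per month for 5 months.
floor_revenue = patients * 1_500 * 5

# Revenue per patient implied by the ~$750M peak sales estimate.
implied_revenue_per_patient = 750_000_000 / patients

print(f"Peak patients: {patients:,.0f}")
print(f"Revenue at the $1,500 x 5 month floor: ${floor_revenue / 1e6:,.0f}M")
print(f"Per-patient revenue implied by $750M: ${implied_revenue_per_patient:,.0f}")
```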

During the Q1 2013 quarterly conference call, the CEO mentioned that the EU is generally more interested in increasing quality of life in these cancer patients than in gaining some extra survival time. An EU partnership is a very realistic possibility and would substantially increase the market potential for Enobosarm. Further details can be found on the Seeking Alpha transcript here.

Finally, cachexia does not only affect NSCLC patients, but all cancer patients. Successful P3 results this quarter will justify expansion and off-label use towards a patient population >10x larger than the currently sought-after indication.

Price targets

As stated by Cora Schlesinger, in their Seeking Alpha article covering GTx posted here:

...following TheStreet article, GTx analysts reiterated their price targets. WedBush says Outperform, PT $9, Buy on weakness; Jefferies rates GTx as a Buy, PT $8, and $13-$14 on a successful P3. SeekingAlpha analyst Natty Greene has a PT of $11.

At an expected 30% royalty rate, our price targets for each scenario (12-month PT first, then post-catalyst PT) are:

  • Worst-case: Barely non-inferior OS + Primary endpoints are barely met + Safety data is contentious → $6-7 and $5-6
  • Most likely case: Positive OS trend + Positive primary endpoint results → $12-14 and $9-11
  • Magical case: Statistically significant OS benefit + Positive primary endpoint results → $28 and $20

We are not likely to see the third scenario during the first look at survival. However, if initial OS results contain a low p-value for the hazard ratio, then it may suggest to the market that the possibility exists in future releases, which would make the second scenario's PT estimate too conservative.

Note that these PTs do not factor in off-label use or future expanded indications.

What to expect for survival results in P3

Given the historical survival data in the stage III/IV NSCLC population being treated with chemo, we can estimate how many patients will be alive at any point during the trial based on the enrollment projections and NDA guidance.

We can expect at least 40% of the patient population to have died by the time the results are reported in the immediate future. That should be enough death events that the OS curves of the treatment arms may well diverge. However, median OS may not be reached for another 5 months given the potential survival rates, and it may be too soon to see a statistically significant OS benefit in the Enobosarm arm.

Any positive OS trend is a very good result, as it will be unexpected and greatly enhance the marketability of the product. If there is in fact a statistically significant OS benefit, well, let's not go too overboard... Something to keep in mind is that the FDA requested the OS endpoint just to be sure that Enobosarm does not cause inferior OS. As long as there is no negative OS trend seen in the upcoming results, the OS endpoint will have been met.

The primary function of Enobosarm, when it comes to getting it approved, is to increase the quality of life for patients, which will also reduce costs. Someone who is able to move around comfortably is less likely to be injured, hospitalized, bedridden, etc... expensive and dreadful prospects for patients already dealing with a lot on their plate.

Changes to the trial design

During P2b, 30% of the patients either died or dropped out before the end of the trial. Since the trial only enrolled 159 patients, a lot of statistical power was lost. To counter this issue, GTx has designed more refined inclusion/exclusion criteria and enrolled 650 patients.

Some of the key differences in the trial design are as follows:

  • The P2b enrolled multiple cancer types, from stage II to IV. This has been narrowed to just stage III/IV NSCLC, which will raise the death rate, accelerate death events, and mean the NDA can be filed sooner.
  • Patients in P3 will be chemo-naive and younger, reducing the drop-out rate and allowing more room for improvement on the relevant primary endpoints.
  • Patients must have no recent history of cancer and must meet a baseline minimum for stairclimb power, further reducing the drop-out rate and weeding out patients unlikely to perform.
  • Patients must be non-obese, which will presumably assist in achieving the LBM endpoint because obese patients aren't likely to be LBM responders. Overweight patients who are losing muscle mass are also less likely to complete trial requirements.
  • A reduction in the number of steps in the test, from 12 to 8. As explained by Jake Schaible, in his Seeking Alpha article here, "[by] reducing tread count, the Ph3s will be a more pure evaluation of the expected result - increased strength and acceleration. Whereas a test with 50% more treads is relatively more about endurance, momentum, and lung capacity (VO2max) - three parameters SARMs are less likely to impact. As patients fatigue, it obscures strength results. And as one would expect, lung cancer patients - many of whom were smokers - don't have great endurance."

It was stated this year at ASCO that the drop-out rate has been drastically reduced, meaning that it is far less than 30%. For the responder endpoints in P3, any benefit is more likely to be statistically significant with fewer drop-outs.

P3 data is expected to be released in the third quarter.

Management reported second-quarter results weeks before their usual date, which is generally in early August, and did not host a conference call.

They have subsequently cancelled at least 1 presentation, at the Wedbush 2013 Life Sciences Conference, during this self-imposed quiet mode in anticipation of results.

As of August 14th, the data is not late, contrary to what some opinion makers claim, because management has been guiding for the third quarter for quite some time, as stated here. However, we can still expect results to be imminent, considering that nearly a month will have passed since the quiet period began.

Recent chart action

Starting near the end of April, market awareness of the pending Enobosarm results began to escalate, breaking through a resistance level, and thus encouraging the start of a classic run-up to the catalyst. Buying interest fizzled above $7, and then a series of myths hit the airwaves that shook out a substantial number of run-up traders and weak investors. These myths will be dissected later in this article.

During the period that the stock dropped, short interest nearly doubled from 2.3M to 4.5M. Now, about a month after the sharp correction, the stock bounced off support at the historical $4 level.

The collapse in share price cannot be entirely attributed to the newly released public analysis of Enobosarm's P2 results as reported by TheStreet. It happened to be a timely publication that coincided with historical resistance levels at around $7. The result was a massive dump by technical traders and weak investors, combined with both the wipe-out of stop loss orders and the doubling of the short interest.

Debunking the bear thesis

Myth #1: Merck stopped their collaboration with GTx due to Enobosarm's disappointing phase 2b results.

In Brett Chase's article "GTx Stock Will Move On Cancer Drug", he states that:

[the] companies were collaborating to advance the experimental drug Ostarine to treat muscle deterioration in cancer patients when a disagreement ultimately ended the partnership. Merck wanted to focus the drug on another indication, muscle wasting related to aging, according to Steiner's account.

While reporting 2009 year-end financial results, the CEO, Dr. Steiner, is quoted saying:

[it] was a difficult decision to dissolve our SARM collaboration with Merck. GTx's near term objective is to generate revenue so that we can transition to a self-sustaining company. Reacquiring Ostarine moves us toward this objective by allowing us to advance our lead SARM into Phase III clinical studies in cancer cachexia which is a large commercial opportunity for our company and a critical unmet medical need for cancer patients.

The CEO's comments align with what unfolded during the previous 12 months.

  1. "Merck & Co., Inc. and GTx are evaluating … Ostarine™ (designated by Merck as MK-2866) and MK-0773 for ... sarcopenia and cancer cachexia." - First Quarter 2009 Results
  2. "GTx Presents Phase II Ostarine..." results, shown here.
  3. "The GTx and Merck SARM collaboration is planning to advance in the following indications in 2009 and 2010: chronic sarcopenia and muscle loss in patients with chronic obstructive pulmonary disease..." - Second Quarter 2009 Results
  4. "GTx Receives Complete Response Letter from FDA for Toremifene..." Posted here.
  5. "Collaboration revenue for the first quarter of 2010 included the recognition of approximately $54.9 million as a result of the termination of our license and collaboration agreement with Merck in March 2010." - First Quarter 2010 Results

With the failure of Toremifene, GTx's financial stability was threatened and so they slashed their staff to reduce operational expenses. Further, Merck wanted to explore more indications with Enobosarm. This meant more phase 2 trials, which would delay the phase 3 trials. GTx did not have the cash flow to delay the introduction of Enobosarm into the market for another couple of years. When GTx ended the collaboration, they automatically received $55M from Merck.

In conclusion, Gtx made the choice to terminate the agreement and pursue the cachexia indication on their own.

Myth #2: "Weaker" and "sicker" patients were disproportionately randomized into the placebo arm.

If we look at figure 1, we will see most of the relevant baseline stats and characteristics of the 159 patients enrolled into the trial.

Figure 1, from The Lancet

"Weaker" patients would imply a weaker baseline stairclimb power. Listed in the table are the range and median baseline values, and we can see no evidence that the Enobosarm arms were stronger.

If we want to compare who had the "sicker" patients, we would look at the cancer types of all those enrolled, seen under the Safety cohort. The placebo arm has about 18-38% more stage IV patients in the NSCLC and colorectal subsets than the Enobosarm arms. If we compare the proportion of III/IV vs total, they are practically the same. When we speak about III/IV patients, we're talking about both NSCLC and colorectal patients unless otherwise specified.

However, for the efficacy cohort, which contains the data that were used in the calculations for change in lean body mass and stairclimb power, each arm had about the same proportion of IV patients, while the Enobosarm arms both had >9% more III/IV patients. Thus, the trial efficacy endpoints listed all over the internet were not biased in favor of Enobosarm in terms of who had the sicker or weaker patients. On the contrary, Enobosarm arms may have had weaker patients at baseline.

In terms of "strength", we find that the data from this table suggest a rather even playing field. Sure, the median and range suggest the Enobosarm arms were slightly "weaker" in general, but these types of data are not reliable indicators without more statistical information. We will revisit the issue later in this article.

In summary, when it came to calculating the efficacy endpoints, all the arms started off with approximately the same level of strength, and the Enobosarm arms were slightly sicker. In addition, the baseline LBM values show no significant bias in any of the arms.

The results of the phase 2b trial can be seen here, and note that these are mean results.

  • "Specifically, the change from baseline in LBM for the placebo, 1 mg and 3 mg treatment groups was 0.1 kg (p=0.874 compared to baseline), 1.5 kg (p=0.001) and 1.3 kg (p=0.045), respectively, at the end of the 16-week trial."

  • "The change from baseline in stair climb power in the placebo, 1 mg and 3 mg treatment groups was 0.23 watts (p=0.66 compared to baseline), 8.4 watts (p=0.002) and 10.1 watts (p=0.001), respectively. Statistically significant decreases from baseline in stair climb time were also observed."

For a trial that was designed for the LBM endpoint, the stairclimb results are impressive. However, it is statistically misleading to attempt interpretation of raw mean results with just median baseline data. So while the trial was a success, it was merely an exploratory trial. Later we will go into more detail as to why interpretation of just median and raw mean data-points can be misleading.

Only recently has GTx released additional data with regard to the NSCLC subset. The placebo arm had 60% more stage IV patients, as a share of the entire arm, than the Enobosarm arms. On the flip side, all the treatment arms were relatively similar when we combined III/IV patients. Keep in mind that cancer type is not a fool-proof way to gauge the strength of a patient. The gender, weight, and fitness of a patient may have more of an impact on strength. Since we are without any baseline characteristics for NSCLC patients, we cannot confirm whether or not the "sicker" arm actually has "weaker" patients. Given the small trial, and the lack of mean baseline data for the NSCLC patients, it remains just a hypothesis. However, recall that when we looked at all the patients in the trial under the safety cohort, there was no observable trend showing that the sicker arm was weaker when using the unreliable median baseline values.

Keep in mind that the trial's exclusion criteria prevent those with <6 month expected survival from enrolling, which eliminates the typical stage IV patient who may have a median survival closer to 4 months. See here and here. This analysis does not control for alkaline phosphatase, ALT/SGPT or AST/SGOT, and so we would expect median survival to significantly increase from 6 months. No matter how one analyzes the survivability of the stage IV patients with respect to their potential "strength" during the trial, GTx enrolled only the "healthiest" stage IV patients.

Gtx would have had all this data in hand when they designed and initiated the P3 trials. We could presume that they thought the P2b data was good enough for P3 in the NSCLC indication, otherwise they would not have terminated their agreement with Merck.

Myth #3: There is no explanation for why NSCLC was picked for P3 over colorectal cancer.

From page 344 of the Lancet article: "We noted a greater increase in lean body mass in patients with colorectal cancer whereas we recorded the greatest increase in stair climb power in patients with NSCLC."

If we look at the primary endpoints of the P3 trial, the 1st endpoint is no loss in lean body mass, while the 2nd endpoint is >10% change in stairclimb power. As long as there is a statistically significant increase in the percentage of responders in the Enobosarm arm, the P3 trials will succeed. If we presume that Gtx looked at the data and saw that NSCLC had the best performance for increasing stairclimb power, choosing NSCLC would be obvious.

Myth #4: The colorectal cancer subset failed in the P2b trial.

As far as we know, the colorectal subset did not fail. NSCLC offered the best chance of successful P3 results and GTx went with it.

Myth #5: One can determine the baseline wattage by dividing the mean raw stairpower change by the mean percentage stairpower change.

See figure 2, and from the Lancet report: "Absolute changes in stair climb power represented a mean of 18.0% (SD 31.1) improvement compared with baseline for Enobosarm 1 mg and 21.7% (65.7) for Enobosarm 3 mg (vs placebo, 4.8% [SD 23.2])".

Figure 2, from The Lancet

If we look at the placebo arm and divide the 2.21 watt improvement in stairpower by 4.8%, we get a 46 watt baseline stairpower. This mistake can be refuted in many ways: a mathematical theorem, calling up management, or perhaps a 34-row table. Brace yourself for the wall of numbers.

Table: one hypothetical 34-row scenario for the placebo arm

First things first, this is one of an incalculable number of scenarios. Keep in mind that this is not controlled to be a normal distribution, but that should not defeat the purpose of this exercise.

Here is a scenario where we got reasonably close to the mean raw stairpower improvement of 2.21 watts with our 1.52 watts, and to the 4.8% improvement with our 4.86%. We got the expected increase in the median values and didn't even have to control the baseline values. All raw numbers are within the ranges listed in figures 1 and 2. With enough computational power and controls, such as SDs, we could come up with the most likely scenario seen in the trial, but this computer can barely run a light-bulb.

Now, let's divide 1.52 watts by 4.86% to get the mean placebo baseline value of 275.8 watts. Wrong, it comes to 31.3 watts. The reason one cannot recover the mean baseline value this way is that each patient's pair of raw and percentage stairpower improvements is scaled by that patient's own baseline, and the baselines span a broad range. A small baseline value with a huge percentage improvement can produce a smaller raw improvement than a large baseline value with a small percentage improvement. This relationship is also why the interpretation of the P2b mean and median results is largely a waste of time given the small patient population used in the efficacy cohort.

We could have saved a lot of time had we simply pointed out that in figure 1, the lowest baseline was 72.8 watts for the placebo efficacy cohort, thus 31.3 watts is simply impossible and would raise some red flags. Likewise, the 46 watt figure making the rounds is flawed for the exact same reasons.

As it turns out, if you ask GTx management they will tell you that the mean baseline stairclimb power for the patients in the efficacy arms was 166.2 watts for placebo, 136.9 watts for 1mg, and 151.4 watts for 3mg.
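The point of Myth #5 can be reproduced with a toy data set. The four patients below are invented purely for illustration (they are not trial data), and the variable names are assumptions. The sketch shows that dividing the mean raw change by the mean percentage change does not recover the mean baseline when baselines vary widely.

```python
from statistics import mean

# Hypothetical patients, NOT trial data: baseline stairclimb power in watts
# and the raw change in watts for each patient.
baseline_watts = [80.0, 150.0, 300.0, 400.0]
change_watts = [16.0, -15.0, 6.0, 4.0]

# Each patient's percentage change is scaled by that patient's own baseline.
pct_change = [c / b for c, b in zip(change_watts, baseline_watts)]

mean_change = mean(change_watts)   # mean raw improvement
mean_pct = mean(pct_change)        # mean percentage improvement

# The flawed shortcut: mean raw change divided by mean percentage change.
naive_baseline = mean_change / mean_pct

# The actual mean baseline of the same patients.
true_baseline = mean(baseline_watts)

print(f"Mean raw change:      {mean_change:.2f} W")
print(f"Mean percent change:  {mean_pct:.2%}")
print(f"Naive 'baseline':     {naive_baseline:.1f} W")
print(f"True mean baseline:   {true_baseline:.1f} W")
```

In this made-up example the shortcut lands far below the true mean baseline, the same kind of gap as between the 46 watt figure and the baselines management reports.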

Myth #6: You can compare the percentage improvements in different treatment arms by dividing the median stairpower improvement by the median baseline stairpower.

Table: a second hypothetical scenario

Again, the controls are loose here, accounting only for the ranges and for relative similarity in the mean/median baselines and the mean percentage improvement.

Take 17.42, and divide by 155... It's not 18%, but rather 11%. When dealing with so few patients that have such a wide range in both baseline values and raw stairpower change, medians are absurdly unreliable.
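The same effect is easy to show for medians. The five patients below are invented for illustration only (they are not trial data); the sketch compares the median change divided by the median baseline against the mean of the per-patient percentage changes.

```python
from statistics import mean, median

# Hypothetical patients, NOT trial data: baseline power and raw change in watts.
baseline_watts = [75.0, 120.0, 155.0, 210.0, 320.0]
change_watts = [45.0, 30.0, 15.5, 10.5, 3.2]

# Shortcut from Myth #6: median raw change over median baseline.
ratio_of_medians = median(change_watts) / median(baseline_watts)

# What a reported "mean percentage improvement" actually averages.
mean_pct = mean(c / b for c, b in zip(change_watts, baseline_watts))

print(f"Median change / median baseline: {ratio_of_medians:.1%}")
print(f"Mean per-patient percent change: {mean_pct:.1%}")
```

With these invented numbers the shortcut gives about half of the mean percentage improvement, even though both are computed from the same patients.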

Myth #7: GTx should be using medians, not means, when assessing the P2b efficacy endpoints.

Means are just as common in clinical trials as medians; either is accepted depending on the scenario and conditions. Medians are the norm for survival endpoints in cancer trials because of the survival curve. Consider the impracticality of having mean survival endpoints in a cancer trial when there will still be survivors after 5 years. On the other hand, in a trial such as GTx's P2b, it didn't really matter which one they picked because it was an exploratory trial. Do you think they were going to design a P3 that enrolled the same patient population as seen in the P2b? Not likely. GTx and Merck were looking for populations responsive to Enobosarm and, as it turns out, when one applies the P3 endpoint of >10% improvement in stairpower to the efficacy arms in the P2b trial, one gets the following results, as shown on page 343 of the Lancet article:

The percentage of participants with substantial clinically meaningful increases in physical function (≥10% increase in stair climb power) was 39% (14 of 36) with placebo, 61% (19 of 31) with Enobosarm 1 mg, and 61% (17 of 28) with Enobosarm 3 mg.
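The responder counts quoted above can be checked directly. The sketch below recomputes the percentages and, as a rough illustration only, runs a pooled two-proportion z-test of the combined Enobosarm arms against placebo. Pooling the two dose arms and the choice of test are assumptions made here; this is not the trial's prespecified analysis.

```python
from math import erf, sqrt

# Responder counts (responders, patients) as quoted from the Lancet article.
responders = {
    "placebo": (14, 36),
    "enobosarm 1 mg": (19, 31),
    "enobosarm 3 mg": (17, 28),
}

rates = {arm: r / n for arm, (r, n) in responders.items()}

def two_proportion_z(r1, n1, r2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = r1 / n1, r2 / n2
    pooled = (r1 + r2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Assumption: both dose arms are pooled into one treated group.
z, p_value = two_proportion_z(19 + 17, 31 + 28, 14, 36)

for arm, rate in rates.items():
    print(f"{arm}: {rate:.0%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```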

Legitimate concerns

The cash situation of GTx is not desirable, and management should not have let it get to this point. However, we only know what management has disclosed to us. As of June 30th, they have about $32M in cash, and are burning $12.8M per quarter. The burn rate is likely to be cut substantially with the conclusion of the expensive 650-patient trials. GTx should have enough cash until the end of Q1 2014 at least, or until after they submit their NDA.
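The runway arithmetic behind that estimate is simple enough to sketch. The figures are the ones stated above; the variable names and the three-quarter horizon (Q3 2013, Q4 2013, Q1 2014) are assumptions made for illustration.

```python
# Cash runway from the June 30th figures, in millions of USD.
cash_musd = 32.0
burn_per_quarter_musd = 12.8

# Quarters of cash left if the burn rate stays unchanged.
runway_quarters = cash_musd / burn_per_quarter_musd

# Quarters from June 30th to the end of Q1 2014: Q3 2013, Q4 2013, Q1 2014.
quarters_to_end_q1_2014 = 3

# Average quarterly burn needed for the cash to last through Q1 2014.
required_avg_burn_musd = cash_musd / quarters_to_end_q1_2014

print(f"Runway at current burn: {runway_quarters:.1f} quarters")
print(f"Average burn needed to reach end of Q1 2014: "
      f"${required_avg_burn_musd:.1f}M per quarter")
```

At the current burn the cash lasts about two and a half quarters, so reaching the end of Q1 2014 depends on the expected cut in trial spending.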

GTx could avoid a dilutive fund-raising altogether with either lucrative partnership deals or non-dilutive financing. The extensive and successful business history of Chairman Hyde suggests that management is capable of accomplishing either task, but they have perhaps decided it would be best to negotiate from a position of strength with good P3 data in hand.

Uncharacteristically for a small-cap bio that has faced regulatory hardships, GTx remains without financial overhang in the form of warrants, convertibles, and/or ATMs, a testament to management's ability to raise money for the company while avoiding needless damage to existing shareholders. Combine this with a lack of insider sales, and GTx management is a confident bunch, but it could be their undoing if the trial fails. It should also be stated that the 3 largest individual shareholders are insiders who own a combined 28% of the outstanding shares, and that hasn't changed in years.

Upon any P3 failure, the downside is quite substantial. However, if we consider that Capesaris is another drug in their pipeline currently undergoing P2, the possibility of statistically significant post-hoc subgroups, and, more importantly, the very sizable short position, then we can identify the strong possibility of a mega bounce. Refer to recent trial failures in the oncology sector, and you'll see >50% bounces. Therefore, positioning your bets leading into the catalyst so that you can cover your losses on the bounce would be a very prudent strategy.

Special thanks to Jake Schaible, Natty Greene, Cora Schlesinger, HSChoi, and Biopeon. I'd also like to thank the Seeking Alpha editors for their assistance.

