The following proves Mr. Feuerstein's assessment of GTx, Inc. (NASDAQ:GTXI) enobosarm Ph2b data, reported by Dobs et al. in Lancet Oncology, is invalid, by examining in detail his faulty math and logic.
It also provides the Reader with the ACTUAL mean values which Mr. Feuerstein failed to estimate.
"When you come at the king, you'd best not miss"
- Omar Little, The Wire.
Proof of these errors may prove embarrassing to Mr. Feuerstein. But my aim is not to embarrass. Several have shared much of the following with him already, and I repeat here several key parts that were shared with him in private, where the primary request was to make corrections. While he has confirmed receipt of these messages, he has refused to correct even the simplest of his typos. Evidence can now be shown that Mr. Feuerstein was made aware of key flaws in the thesis as early as July 11th. However, instead of undertaking his duty to set the record true, he has engaged in feeble defensive maneuvers to censor exhibits of divergent facts, smear reputations, and now construct a biased poll that, if anything, is only further evidence of the degree to which his failure to correct the record has damaged the market's opinion of GTx as a company and of its existing data on enobosarm. In spite of market perceptions, the fact remains that Mr. Feuerstein failed to undertake even the simplest efforts to independently assess the veracity of his short seller source's thesis prior to posting his piece. That error alone could have been chalked up to sloppy journalism. What is far more serious now is the mounting evidence that he is attempting to cover up what he knew was an error and when he knew it, thus exhibiting a knowing and reckless disregard for the truth, and callousness toward the ongoing damage his effort has done to the shareholders of GTx.
Mr. Feuerstein's tactic is now clear. He seeks to delay. His last hope is to be bailed out by poor Ph3 enobosarm results. However, res ipsa loquitur, such cannot save him.
"If the stories are controversial, it's all the better, because those are usually the most fun." - Adam Feuerstein
And so, to Mr. William Inman, Editor in Chief of TheStreet.com (TST), I respectfully ask you to personally investigate the "fun" your Mr. Feuerstein is having at the expense of so many, including your own company's reputation. If you find his story based on error, I request you immediately retract his defamatory effort.
Backgrounder:
In the July 10th opinion piece entitled, "The GTx Cancer Muscle-Wasting Drug Studies Will Fail. Here's Why", TheStreet.com journalist Adam Feuerstein (hereinafter "AF") ambushed Memphis-based biopharma GTx, Inc., breathlessly claiming the firm "manipulated" their Ph IIB results with enobosarm (also known as GTx-024 and Ostarine®).^{1} His claim caused the stock price to crater 25.4% that day alone, from $7.00/sh where it traded just prior to the release of his piece to a close of $5.20/sh, with most losses occurring within mere seconds of his post hitting the wires. Price action at technical support levels, combined with option transactions in the sessions prior and following, has all the telltale tracks of a coordinated bear raid. And this makes perfect sense, as AF admits he got his thesis from one of his "favorite fund managers" who is short, and via Twitter alerted followers of his piece before he posted it.
With no material news since, the stock has bounced around and drifted still lower. Two days after AF's August 7th poll of his cherry-picked panel of anonymous investment professionals, the price per share hit a new swing low of $4.01. This is a 42.7% reduction, or $189 million, in GTx's market cap since AF published his thesis, on no other news, and is almost 3 times the entire market capitalization of TheStreet.com.
For those not yet following this Grisham-esque melodrama, enobosarm is a Selective Androgen Receptor Modulator (SARM), a new drug class in late-stage clinical development. SARMs are oral, nonsteroidal testosterone mimetics with true anabolic effects on both muscle and bone, but are not substrates for the steroidogenic enzymes that convert testosterone into androgenizing DHT or feminizing estrogen, and furthermore are also thought to have differential nuclear receptor cofactor recruitment leading to enhanced tissue selectivity.^{2}
SARMs promise to protect muscle in the frail, yet (in exaggerated layman's terms) not turn Grandma into Grandpa, or Grandpa's prostate into a softball.
As you've no doubt heard by now, enobosarm is being evaluated in two placebo (PBO)-controlled, randomized, double-blind Ph3 studies, known as POWER 1&2, for the prevention and treatment of muscle wasting (cachexia) associated with advanced non-small cell lung cancer (NSCLC). Cachexia, and the age-related version of frailty known as sarcopenia, are extremely common, growing in prevalence, and substantially unmet medical needs. As such, enobosarm has been awarded Fast Track status by the FDA, and has first-to-market potential, with top-line results expected this quarter (3Q13).
The study AF claims "GTx manipulated" is the "Effects of enobosarm on muscle wasting and physical function in patients with cancer: a double-blind randomised controlled phase 2 trial." by Dobs et al., published in the April 2013 issue of the peer-reviewed Lancet Oncology, which reports the successful achievement of the primary endpoint; a quantitative measure of skeletal muscle known as Lean Body Mass by Dual-Energy X-Ray Absorptiometry (LBM by DXA). A close read of AF's piece shows he's not just calling into question the integrity of this small company, but equally all "the authors of the Lancet Oncology paper", thus including lead author Adrian Dobs, M.D. M.P.H., Professor of Medicine and Oncology, Division of Endocrinology and Metabolism and a Vice Chair at The Johns Hopkins University School of Medicine.
His work is actually a thing of beauty, as it is quite a skillful tour de force of Aristotle's classic fallacies of logic. And since it appeared, we've seen a parade of brave contenders aiming to bring it down. But cataloging errors makes for dreadful reading.
So, like el toro en la plaza, AF's thesis stands. Bloodied con banderillas. But defiant.
It's time for a new approach.
We have watched the lancers harass, and from his reaction have spied the weakness he's long sought to hide. As the bugle sounds, I dedicate this effort to those who have come before, and propose to you, Dear Reader, that with three passes with the cape, and then two hard stabs of logic and math to its very heart, AF's thesis will be proven invalid.
But who am I to challenge the Feuericane?
Fair question. No one really. AF himself predicted all would ignore me. Perhaps he's right. I don't have his horde of Twitter minions. Nor do I work for a company co-founded by Mr. Jim Cramer of CNBC's Mad Money fame. (Big fan, Mr. Cramer. Please keep fighting for us little guys.) What I have, if it counts for anything anymore, is over a decade working on SARM R&D ... longer than AF has been a blogger covering all biopharma. To be clear, I've never worked for GTx or any of their vendors. In fact, I'm an odd defender of GTx, as I was their principal competition. At Ligand Pharmaceuticals (NASDAQ:LGND) I was an architect and alliance manager of their 2001-6 SARM R&D collaboration with TAP Pharmaceuticals. Then I was the President of Astrenia Therapeutics, a private stealth-mode spinout co. focused exclusively on the clinical development of next-gen SARMs competitive to enobosarm. And now I make my living performing biopharma business and corporate development and new venture consulting, as well as managing my family's equity investments.
But really, I'm just one "little guy" who sees the Emperor has no clothes.
Who am I? It really should not matter, as you will see next.
OK, my stretchy pants are on... time for the show.
Tandas #1: The Fallacy.
Recall upfront AF states his source for his bear thesis is the "same fund manager who was most recently short Ziopharm and Celsion in anticipation of negative clinical trial data. He was correct both times." Logicians call this trick the "Fallacy of Genesis" (the genetic fallacy): the inappropriate use of a source to defend or refute logic. It's entirely irrelevant.
Yet for those lazy enough to use this device, understand you do have a choice of which source to believe: a Vice Dean of the Johns Hopkins School of Medicine who has no potential for gain either way, or an anonymous short who may already be using options to collar ill-gotten gains.
Tandas #2: The Misdirection.
AF's thesis has a challenge from the start. Dobs finds positive results and the study has only one primary endpoint.^{3} But success doesn't fit his short's preordained narrative.
No bother. With a bit of misdirection our skilled weaver of words woos your eyes off success, and devotes the vast majority of his missive to discussing minutiae of something even Dr. Dobs didn't know was in her study. A co-primary endpoint! Voilà!
It stretches credulity beyond the breaking point when a seasoned "columnist" plays fast with the truth and says not once, but twice, that a study with just one primary endpoint actually has two. "Let's take a closer look at the data on the co-primary endpoints of the enobosarm Ph2b study." And elsewhere, "You should see now why GTx chose to analyze the Ph2b co-primary endpoints using mean values."
You don't need the Dobs article to fact check this one. Check this fact right from the public abstract. Still think AF is correct? Perhaps GTx changed endpoints during the study, you say? Then look at the archive listing on clinicaltrials.gov at the start of dosing:
Primary outcome measure: To assess the efficacy of GTx-024 on total body lean mass.
Secondary outcome measure: To assess the effect of GTx-024 on body weight, muscle function and total body fat mass.
No way around it. AF's mistaken.
And so he is basing his contrary opinion on results from an underpowered yet statistically significant result in an exploratory endpoint, while ignoring the powered and statistically significant primary endpoint result.
Sure, assessing muscle function via a stair climb test is a co-primary endpoint in the Ph3s, and there it is powered to do so. I've long been on the record to say the enobosarm Ph3 trials are not a sure thing. Still, one of my largest concerns was their ability to get less than the 30% ITT dropout rate, which is a key assumption to reach the sample size required for the target confidence level. I thought they might rather see the 35-40% dropout as seen in Dobs. Luckily for GTx, as reported at ASCO this year, the blinded dropout rate was in fact lower than 30% ITT in each of the two studies.
Further optimism can be drawn from the design of the new stair climb endpoint in the Ph3s, which is significantly improved over the Phase 2/2b design. Dobs et al. used a 12 step stair climb test, but the Ph3s wisely swapped to an 8 step version. A minor change? Actually, no. By reducing the tread count, the Ph3s will be a purer evaluation of the expected result - increased strength and acceleration. Whereas a test with 50% more treads is relatively more about endurance, momentum, and lung capacity (VO2max) - three parameters SARMs are less likely to impact. As patients fatigue, it obscures strength results. And as one would expect, lung cancer patients - many of whom were smokers - don't have great endurance.
Tandas #3: The Innuendo.
Pssst...Did you hear the one about Merck (NYSE:MRK) "abandoning" GTx?
I could write a dozen pages on this one alone. For the sake of brevity, I'll include a quote from Merck's SARM team, direct from their poster on enobosarm (MK-2866) presented at ENDO in June of 2010, several months after the collaboration ended.
"MK-2866 demonstrated acceptable safety and efficacy for continued development."^{4}
Merck was always focused outside of cancer cachexia, but in this collaboration the call was not Merck's alone. Corporate terminated the project, rather than be on the hook for the cost of the Ph3s, a victim to R&D rationalization after the Schering-Plough merger. So Merck went back to the well with a weak SARM (MK-0778) in female sarcopenia, and in a recently published Ph2a showed 50mg BID for 6 months failed to provide any functional benefit over PBO, using the non FDA-favored test, the 1-RM BLP.
However, while not a side by side test, Fig 2 of the Merck ENDO10 poster (below) shows enobosarm 3mg QD has efficacy in this same functional test in that same population at 3 months (n=22), on par with 125mg QD of MK-3984, a prior Merck SARM terminated for tox.
SOURCE: Marcantonio, et al. ENDO 2010.
Three passes of the cape.
Three redirects made.
And not one fatal blow to AF's thesis. Yet.
The Main Event. El Tercio de Muerte:
"This is where it gets interesting..."-AF
Feuerstein's Thesis on GTx's Enobosarm Phase 2b Data is Invalid.
Here's Why.
Even AF admits his ESTIMATES of mean stair climb power (SCP) at baseline (BL), a value not reported in Lancet Oncology, are a logically necessary condition on which rests the entire foundation of his simplistic linear-structured bear thesis.
- But for his equation to deduce mean SCP at BL, our bear has no estimates of mean SCP at BL.
- But for his estimates of mean SCP at BL, and more so a material difference between them which favors the active arm, our bear has no rational basis from which to make his claim of an imbalanced SCP BL in Dobs.
- But for his claim of an imbalanced SCP BL, our bear has no rational basis from which to make his claim that the positive results reported by Dobs et al. should be rejected in favor of his own null hypothesis that the study is flawed by a false positive/type I error.
- But for his claim of a false positive, our bear has no rational basis from which to make his (shrill and otherwise utterly unsupported^{5}) claim that "enobosarm is basically a placebo".
- But for his claim that "enobosarm is basically a placebo" our bear has no rational basis from which to make his leap to his conclusion and predict the null hypothesis will be the outcome of the pending Phase III results.
Therefore, logic dictates that IF;
A. AF's equation used to ESTIMATE his mean SCP at BL can be shown to be erroneous, or
B. the ACTUAL mean SCP at BL of the relevant arms can be shown not to have a statistically significant difference,
Then, AF's bear thesis must be rejected for the finding of logical invalidity. This work will accomplish both. In doing so, by attacking its foundation, his Jenga-like thesis will collapse.
Stab #1, La Estocada:
A. Challenges to AF's Estimate.
In my first read of AF's piece, the most glaring of his many misstatements was his estimate of 46w for the mean baseline power for the PBO-e group. In a blink, I knew something was amiss.
AF's Estimate of 46w mean is suspect.
46w is very light, very low, very slow or all three.
To those familiar with the test, 46w is a notably low result by itself - and even more so as a mean. Consider the power equation:
SCP [watts] = (9.8 [m/s^{2}] × stair height [m] × weight [kg]) / stair climb time [s].
For the sake of argument, if one assumes a typical 12 step stair height of about 2.4 m, as well as a "rule of thumb" weight of approx. 70 kg, this yields about 1650 joules. For this setup, a power of 46 watts equates to about 36 sec task time ... about 3 sec per step. Now go to a flight of stairs and walk this slow. Yes, these are sick cancer cachexia patients, but even so this rate is considerably slower than one would expect for the mean.
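The back-of-envelope figures above can be reproduced in a few lines. This is only a sketch under the assumptions already stated (a 2.4 m flight, a 70 kg patient, 12 steps); the constant and function names are hypothetical and chosen for illustration, not taken from Dobs et al.

```python
# Sanity check of the stair climb arithmetic, using the illustrative values above.
G = 9.8          # gravitational acceleration, m/s^2
RISE_M = 2.4     # assumed total height of a 12 step flight, m
MASS_KG = 70.0   # assumed "rule of thumb" patient weight, kg
STEPS = 12       # steps in the Phase 2b stair climb test


def climb_work_joules(rise_m: float, mass_kg: float) -> float:
    """Work done lifting the body up the flight: g * height * mass."""
    return G * rise_m * mass_kg


def climb_time_seconds(power_w: float, rise_m: float, mass_kg: float) -> float:
    """Invert SCP = work / time to get the task time implied by a given power."""
    return climb_work_joules(rise_m, mass_kg) / power_w


work_j = climb_work_joules(RISE_M, MASS_KG)
time_s = climb_time_seconds(46.0, RISE_M, MASS_KG)

print(f"work per climb: {work_j:.0f} J")
print(f"time at 46 W:   {time_s:.1f} s ({time_s / STEPS:.1f} s per step)")
```

Changing the assumed height or weight moves the numbers, but not by enough to make 46 watts look like a plausible group mean.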
AF's est. 46w is incongruent with other data in Dobs.
Digging deeper, one would think in a treatise attempting to prove a baseline imbalance, the author making such a heretical claim would provide his readers the data of the actual demographics and clinical characteristics at baseline. But oddly, AF doesn't. So I will.
Source: Table 1. Dobs et al., Lancet Oncology
If he had, it would be all too easy to see more red flags with his 46 w mean estimate, as it is incongruent with the range of raw values at baseline reported by Dobs. See Table 1 (above), specifically the second column, down to the stair climb power grouping for Stairs 1-12 (watts). The median baseline is reported as 150.2w and, in the brackets, the range of raw values is reported as 72.8 to 442.3 w. Ask yourself: is it reasonable to have an estimate of an average (mean) for a dataset that is lower than the lowest value in that dataset?^{6} AF's estimates, though not yet proven in error, must be seen as highly suspect.
In the absence of the actual means, to reject the AF estimate requires one to either, a) confirm the invalidity of his equation, or b) confirm a material arithmetic error with at least one of his two BL estimates.
1. The equation AF used to arrive at 46w is erroneous.
Unfortunately, AF did not provide his equations in his piece. In it, he only said "these numbers can be calculated using the data available in the Lancet Oncology paper" and more generally "I don't want to get too bogged down into mathematical concepts".
As I'm one of those types who is often willing to get "bogged down into mathematical concepts", on July 11th I began requesting AF make his equations public. At first he ignored my requests, so I and several others kept at it. Remember what I said about watching the lancers harass? One of the most effective of the lancers, and the real hero of this story, is Twitter user @biopeon. He first appeared in our Corrida the day after AF dropped his piece and was the first to report the mean discrepancy to AF.
On July 11th, @biopeon tweeted:
"$gtxi AF "calculated" the placebo group's MEAN BL as 46 watts. Uh.. the study states range is 72.8 w - 442 watts. 46w = #MATHFAIL"
After about a half dozen such tweets, AF eventually trotted out a weak defense:
On July 12th, @adamfeuerstein replied to @biopeon:
"Can someone please double check this math: 2.21/.048 = 46 @biopeon calculator appears to be broken. Thanks."
Seconds later Patrick Crutcher, Chief Research Analyst at Chimera Research Group (who is working on his Ph.D. in Statistics at UCLA and otherwise is someone to watch, even if he harbors Feuerstein fanboy tendencies) confirmed the obvious. "@biopeon that's right" Yep. 2.21/0.048 = 46.
After which @adamfeuerstein arrogantly tweeted:
"@biopeon You can go away now. You have no legit rebuttal, just WAH WAH WAH!"
And a bit later:
"@biopeon For placebo patients: mean change in stair climb power 2.21 divided by 4.8% mean improvement equals mean baseline of 46. $GTXI"
...however, before @biopeon could reply, AF blocked him on Twitter, censoring him from that point on from responding to this or any other of AF's tweets.
Obviously, @biopeon was not questioning AF's division. Just the veracity of the result and the algebra behind it.
But with these tweets, AF let slip his equation.
AF's est. of mean SCP at BL for PBO= 2.21/0.048 = 46w
To get his estimate for the PBO - efficacy group, AF took 2.21 [the mean of the absolute changes in power (Table 3, shown below)] and divided that by 4.8% [the mean of the percent changes for each patient from BL (reported in the text^{7} on pg. 340)].
Source: Table 3. Dobs et al., Lancet Oncology
If one glances quickly, AF's simple equation is seductive. He thinks he's dividing an absolute mean increase from baseline by the percentage that equates to that absolute increase over its baseline. And if that were the case, it would yield him the baseline.
While he thinks this is but ninth grade algebra, he fails to realize he's comparing apples and oranges. What his equation is actually doing is taking the mean of the absolute changes from baseline for the population, and dividing it by the mean of the percentage changes for each individual (not the percent change of the mean for the population). Anonymous Twitter user @biolong has written an elegant hypothetical example^{8} using AF's equation, which shows why the algebra behind it does not compute.
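To make the apples-and-oranges point concrete, here is a minimal sketch with three made-up patients. The values are invented purely for illustration (they are not trial data, and this is not @biolong's example); the variable names are likewise hypothetical. It contrasts the true mean baseline with the figure produced by dividing the mean absolute change by the mean of the individual percent changes.

```python
from statistics import mean

# Hypothetical stair climb power values in watts. Invented for illustration only.
baselines = [80.0, 150.0, 400.0]
finals = [112.0, 153.0, 388.0]

# Per-patient absolute change, and per-patient change as a fraction of that
# patient's own baseline.
abs_changes = [f - b for b, f in zip(baselines, finals)]
pct_changes = [(f - b) / b for b, f in zip(baselines, finals)]

true_mean_baseline = mean(baselines)

# The shortcut under discussion: mean absolute change divided by the mean of
# the individual percent changes.
shortcut_estimate = mean(abs_changes) / mean(pct_changes)

print(f"true mean baseline: {true_mean_baseline:.1f} W")
print(f"shortcut estimate:  {shortcut_estimate:.1f} W")
print(f"lowest baseline:    {min(baselines):.1f} W")
```

With these invented numbers the shortcut lands near 59 W, against a true mean of 210 W and a lowest individual baseline of 80 W. The shortcut only recovers the mean baseline in special cases, such as every patient improving by the same percentage.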
In general, AF's original failure is one of reading comprehension, compounded by the arrogance not to check his work with the authors. But it gets worse for AF. To make his baseline imbalance case, his thesis requires an accurate estimate for BOTH the mean of the PBO-e group and that of the 3mg-e group. We've already shown he uses a faulty equation to get his means, so with this result alone, his thesis must be considered suspect. But let's now examine his estimate for the 3mg-e group.
2. AF has an arithmetic error in one of his BL estimates (3mg).
With his above Tweet, AF disclosed the (now discredited) equation he used to estimate the PBO-e mean for SCP at BL. And while he has now refused to confirm it is the same equation he used for the 3mg-e group, I can find no logic that would suggest he used a different equation, just different values for its variables.^{9}
If so, it would be:
AF's est. of mean SCP for 3mg at BL = 16.81 / 21.7%,
again from Table 3 and the Dobs text, and if so... the math yields about 77.5 watts! Not AF's 80 watts.
And yet both are wrong, given the equation is wrong.
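For readers who want to rerun the division themselves, the sketch below applies the equation AF disclosed on Twitter to the figures quoted above from Table 3 and the Dobs text. Treat it as a check of the arithmetic only: the dictionary layout and names are hypothetical, and applying the same equation to the 3mg arm is my assumption, as noted.

```python
# (mean absolute change in watts, mean of individual percent changes as a fraction),
# as quoted in this article from Dobs et al.
reported = {
    "PBO": (2.21, 0.048),
    "3mg": (16.81, 0.217),
}

# Baseline estimates AF published in his piece, in watts.
af_published = {"PBO": 46.0, "3mg": 80.0}

# Lowest raw baseline value this article cites from Table 1, in watts.
LOWEST_REPORTED_BASELINE = 72.8

# AF's disclosed equation: mean absolute change / mean percent change.
estimates = {arm: change / pct for arm, (change, pct) in reported.items()}

for arm, value in estimates.items():
    print(f"{arm}: equation gives {value:.2f} W, AF published {af_published[arm]:.0f} W")

print("PBO estimate below lowest reported baseline:",
      estimates["PBO"] < LOWEST_REPORTED_BASELINE)
```

The placebo figure matches AF's 46 watts, while the 3mg figure comes out roughly 2.5 watts short of his 80.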
The above proves that both AF's equation for stair climb power at baseline and the estimates on which his claimed imbalance rests are erroneous. And thus his thesis can now be declared invalid.
Without accurate estimates of each baseline mean, he cannot make his claim that they are different and thus imbalanced. Without accurate estimates of each baseline mean, he cannot make his claim that there is a skew. Without claims of baseline imbalance or skew he cannot claim a type 1 error. Without claims of a type 1 error, he cannot claim the statistically significant results are in fact a false positive. And finally, without his false positive claim, he has no rational basis from which to make any predictions on the Ph3s.
quod erat demonstrandum
Oh, but it gets better still!
Stab #2, El Descabello:
B. The ACTUAL Means for SCP at BL.
Even with AF's logic and math now bleeding out before your eyes, in the absence of the actual means from Dobs it was still theoretically possible there was a mean imbalance. Yet any statistician worth his "piled higher and deeper" should care little about the mean descriptives anyway, given that the reported, balanced median is actually the appropriate measure of the central tendency of absolute power improvement for a non-normal distribution of values. Such values are typically not normally (Gaussian) distributed (two-tailed bell curve), but right-skewed, more like a Poisson (one tail to the right). For this same reason, the appropriate measure of statistical significance is the exact paired Wilcoxon - the method used by Dobs.
This is where I thought our tale would end (mercifully). At least until the Ph3 results were released.
But then, in a post which briefly appeared in the comment section under AF's original article - but was later mysteriously deleted - @biopeon posted what he claimed were the ACTUAL means for SCP at BL from the company and suggested AF call to confirm for himself. While that post was removed, days later @biopeon essentially repeated the same in the comment section of a different article which openly questions AF's ethics in this matter. An article which AF has also tried to get this third party to remove.
But on July 23rd, unable to stop himself, our social media addicted @adamfeuerstein replied:
"@Biopeon I see my recent favorite Internet troll stalker has found his way to this web site. Biopeon has a hard time accepting the truth so he spreads his own lies around the interwebs. He's ignored everywhere. Have a nice day!"
Now I'm not one to trust the comments of some anonymous Tweeter, short or long. But I'm also not one to ignore the promise of data that could double check my work and tell me once and for all if my proof on the invalidity of AF's thesis is right or wrong.
And so I did as @biopeon suggested. I called GTx. And I called Dr. Dobs too.
A few minutes later I got a return call from GTx's Marc Hanover, Pres. & COO, with Mayzie Johnston, Pharm.D., VP, Medical Affairs. Dr. Johnston patiently answered all my questions and confirmed Dobs Table 1 above is correct. She then confirmed, as I'd already concluded, that one could not accurately estimate the means using the results in Dobs, but she was happy to give the actual means to me and stated the authors have long given them to all who've asked. She also stated GTx has no record that AF ever called to fact check his story.
So here are the ACTUALS from GTx:
Means for Stair Climb Power at BL in Efficacy Subgroups.
PBO: 166.2 watts (not AF's 46 w)
1mg: 136.9 watts
3mg: 151.4 watts (not AF's 80 w)
No statistical difference noted between the means for SCP at BL between efficacy subgroups, except between the 1mg and the PBO group (p=0.029).
GTx confirmed values from the Phase 2b study
Source: Personal Communication.
This effort has now shown, using both methods described, that AF's thesis is invalid and his mean estimates are utterly erroneous. Thus @biopeon is correct. Still don't believe? Call GTx or Dr. Dobs for yourself.
And yet - look at that - there is a statistically significant baseline imbalance at the mean, between the PBO and the 1mg! The PBO has a "head start" over the 1mg. Oops. This is the exact opposite of what AF's flawed "head start" thesis requires to have merit. Of course, it's only with the 1mg dose, which the Ph1 showed to be an inferior dosage^{10}, and is not relevant to the Ph3s.
What's more, Dr. Johnston confirmed that, by combining these means with the absolute mean improvement values reported in Dobs Table 3, one could calculate the final mean SCP results. By my math:
Means for Stair Climb Power at Final in Efficacy Subgroups (with the change as a percentage of the BL mean).
PBO: 168.41 watts (166.2w + 2.21w), +1.3%
1mg: 151.16 watts (136.9w + 14.26w), +10.4%
3mg: 168.21 watts (151.4w + 16.81w), +11.1%
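The arithmetic behind these final means is simple enough to check in a few lines. This is a minimal sketch using only the baseline means GTx provided and the mean absolute changes quoted above from Table 3; the variable names are hypothetical.

```python
# Baseline means (watts) as provided by GTx, and mean absolute changes (watts)
# as quoted above from Dobs Table 3.
baseline_mean = {"PBO": 166.2, "1mg": 136.9, "3mg": 151.4}
mean_change = {"PBO": 2.21, "1mg": 14.26, "3mg": 16.81}

# Final mean = baseline mean + mean absolute change.
final_mean = {arm: baseline_mean[arm] + mean_change[arm] for arm in baseline_mean}

# Percent change OF THE MEAN, which is not the same quantity as the mean of the
# individual percent changes reported in Dobs.
pct_of_mean = {arm: 100.0 * mean_change[arm] / baseline_mean[arm] for arm in baseline_mean}

for arm in baseline_mean:
    print(f"{arm}: final {final_mean[arm]:.2f} W, change of the mean {pct_of_mean[arm]:+.1f}%")
```

Note how far these percentages sit from the 4.8%, 18.0% and 21.7% means of the individual percent changes reported in Dobs, which is the very distinction AF's equation ignores.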
As I see it, it's unlikely there is any statistically significant difference at the final means. To which I say, who cares. Again, mean improvement of the population does not tell this story, even if there is a material difference. To me the mean of the percentage improvements for individuals is more interesting, and already reported in Dobs as 18.0% with 1mg and 21.7% with 3mg and just 4.8% with PBO. And again, this was an exploratory endpoint in a widely varied mixed population, using the inferior 12 step stair climb test, and not powered to show significant results. And yet it did.
One might ask why Dr. Dobs did not report means at BL. Or the modes, for that matter. When I asked, she stated that given the distribution of the values, the median is the most appropriate descriptor.
Given the wide range of the raw values at both baseline and final, a strong case can be made that ALL measures of central tendency (mean, median, mode, or any of the 10 or so others) are woefully inadequate to tell the story. The real possibility of responders and non-responders in each arm introduces the plausibility of a bimodal distribution. Yet, even without an outlier analysis, the actual results in Dobs proved to be significant using the appropriately applied exact paired Wilcoxon method. This works as the Wilcoxon doesn't depend on any single measure of central tendency. It draws its conclusion from the ranked individual results rather than from a summary statistic. Which is why it is used here, and why the FDA requested the Responder Analysis method for the Ph3s.
As for Dr. Dobs, though AF's accusations were an obvious annoyance, she seemed to take it all in stride and quipped something to the effect, "This poor fellow must not have any experience with statistics or clinical trials."
Quite insightful as she hadn't even seen AF's bio.
In Conclusion, the above proves that AF's opinion piece is based on blatant errors and faulty logic. While he is entitled to his opinion, he is not entitled to his own facts or math. Therefore, its continued existence, without even the smallest correction, does ongoing damage not only to the reputation of GTx and enobosarm, but also to that of AF's employer, TheStreet.com.
Caution: I am long on GTXI as I believe the probability of ultimate FDA approval with enobosarm is now far better than the equity market currently believes. To me, this has all the makings of a worthy speculative market dislocation situation. But success in POWER 1&2 is by no means a sure thing. AF's prediction, even though his thesis could not be more worthless had he based it on a coin flip, still could come to pass. For the most part, I believe the stock analysts who cover GTx have a far better grasp of the issues surrounding the enobosarm trials. Several have since reiterated their recommendations and bullish predictions of the Ph3 studies following AF's piece. I'll just add that while AF thought he saw a PBO effect that he felt was too weak (thus the type I error), to my eyes and those of cachexia experts I spoke with when this data was first presented at the 2009 ENDO meeting, Dobs reports a PBO effect that is surprisingly strong. Better than many other studies in cancer cachexia, which often show 1 to 2% loss of LBM per month. In Dobs, LBM did not materially change, and average power actually increased, though not significantly, which is not in keeping with many other studies. As such, part of my bet is based on my expectation of reversion to the mean for a lower PBO response rate.
Thank you for reading. All comments welcome.
-----
El Paseíllo. (Further reading and acknowledgements.)
Banderillero #1 was clearly Wedbush's analyst David Nierengarten who on July 11th was the first analyst to come to GTx's defense in a short note^{11}, reiterating his pre-data GTXI $9.00 price target, and the first analyst to publicly state that AF's assertions were "unfounded". While not addressing the math or logic errors, his most effective blow was to lay bare that AF had little grasp of the Ph3 design on which he was opining. After Nierengarten's work, the price of GTXI recovered somewhat. At least until AF replied in his July 12th "Mailbag" declaring Nierengarten's defense "weak" and "not persuasive" (yet failing to discuss even one of Nierengarten's points). Instead AF devoted the rest of his retort to impugning the FDA requested Responder Analysis statistical method, calling it "one of the tricks companies use to rig the analysis of a clinical trial", though admitting its use was news to him, despite it being public for several months.
By the way, anyone with experience with the responder analysis method already knows how ridiculous it is for AF to lambaste it as he does. It is the FDA - not the sponsors - who are encouraging use of this method. And yet, specifically in cachexia, it does hold promise to generate the composite endpoint the agency seeks, showing both a muscle and a functional benefit, linked to clinically validated cutoffs of effect. But more to the point, industry has actually made an extensive effort to resist use of Responder Analysis. As evidence, I point to the official PhRMA position paper which aims to stem the tide of FDA requests to use it more. PhRMA is cool on the method, in part, as it requires taking a statistical hit to get similar levels of confidence for an endpoint, requiring larger, more expensive clinical trials.
Banderillero #2 is a newcomer to the ring: Dr. H. S. Choi, a practicing physician who, like me, is writing his first article for Seeking Alpha. While his instablog was not widely picked up at the time, he laid out some of the qualitative reasons for rejecting AF's baseline imbalance theory. If reading his piece, I suggest one starts with the comment section first, as he corrects a small but key error (he swapped median to mean) which requires him to withdraw his point #6, and while the text of his #7 is impacted by the error, this effort shows he is factually correct. Dr. Choi, I look forward to your next article.
Banderillero #3 is a tough one to award to one person, so it goes to the rest, as in the last few days there has been a rush of articles on this topic. Cora Schlesinger's piece is a great read, particularly as she makes mention of Kenneth Fearon's review which accompanied the Dobs paper in the Lancet Oncology. For those who do not know, Dr. Fearon is considered by those in the frailty field to be the principal thought leader on cancer cachexia world-wide. For example, Dr. Fearon is the lead author of the 2011 consensus effort on the definition and classification of cancer cachexia. And in my opinion there is no better expert to give a review of the Dobs work than the guy who has spent much of his career trying to untangle the Gordian knot that is the optimal way to treat cancer cachexia. Here, I'll also mention Jefferies analyst Biren Amin's brief July 17th note, where he reiterates his $8 pre-data price target, as well as the fine effort of Natty Greene's July 26th article, focused on a Merck collaboration rehash and various predictions of the Ph3 results. I've also already mentioned the note of @biolong, and the July 20th and 21st articles by Kevin McKenzie, which raise questions on AF's ethics. (Not my thing. I'm more about laying bare the facts and allowing each to judge for themselves. I merely note others are looking into this.)
But to me, top honors go to @biopeon, which is actually a very apropos handle for our Corrida analogy. For it is a special peon called the Puntillero (not the Matador) who skillfully employs a small dagger to administer the coup de grâce.
So, while AF has correctly suggested the following element is not central to his now invalidated thesis, I close by giving the honor to @biopeon, and quote him as he fillets the last of AF's math errors:
"Adam claims: "The median improvement in stair climb power for placebo patients: 7.5 percent. The median improvement in stair climb power for 3mg enobosarm patients: 8.3 percent." Again-- these numbers are completely faulty.
To arrive at this faulty improvement percentage Adam divided the avg. median change data in Table 3, by the RAW baseline data in Table 1. For example: 12.84/154.5=8.3%. In a vacuum, this simple calculation makes sense. But if you have any knowledge of statistics for a clinical trial, then it's naive at best.
Adam's problem is that his data from Table 3 (avg. median) can't be directly computed with the RAW data from Table 1. In the data from Table 3-- the average (median) percentage change is observed among individuals. Then each individual's percentage change from that subject's baseline is computed, then within an arm, the average (median) of these individual percentage changes are computed.
This average (median) change cannot be divided by the average baseline (raw wattage) of the entire group (in Table 1) to ascertain a raw wattage improvement that corresponds to this percentage change.
Adam is not a mathematician/statistician and he should not be masquerading as such."
¡Ole!
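To make @biopeon's point concrete, here is a small Python sketch using made-up numbers (five hypothetical patients, not data from Dobs). It computes the median of each patient's own percentage change, which is the kind of figure described for Table 3, and compares it with the shortcut of dividing the median raw change by the group's median baseline.

```python
from statistics import median

# Hypothetical stair climb power in watts; NOT data from the Dobs paper.
baseline = [50.0, 100.0, 150.0, 200.0, 400.0]
followup = [75.0, 110.0, 150.0, 210.0, 380.0]

# Per-patient view: each subject's change relative to that subject's own
# baseline, then the median of those individual percentages.
pct_changes = [100.0 * (f - b) / b for b, f in zip(baseline, followup)]
median_of_pct_changes = median(pct_changes)

# Shortcut view: median raw change divided by the group's median baseline.
raw_changes = [f - b for b, f in zip(baseline, followup)]
shortcut_pct = 100.0 * median(raw_changes) / median(baseline)

print(f"median of individual % changes: {median_of_pct_changes:.2f}%")
print(f"median change / median baseline: {shortcut_pct:.2f}%")
```

The two figures disagree even in this tiny example, because a median (or mean) of ratios is not the ratio of medians (or means). The same holds in reverse: multiplying a group-level percentage by a group-level baseline does not recover the raw wattage change.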
It remains a mystery why AF appended this little side adventure at the end of his thesis. Dobs already reports the relevant metric for improvements in this secondary endpoint. While AF again commits a similar error of algebra, it doesn't fit the structure of his logic, other than perhaps as disembodied, anecdotal (yet erroneous) support. Again, measures of central tendency are not the goal with the Ph3s.
=====
1 To quote: "Remember above when I said GTx manipulated the phase IIb data using mean values (instead of medians) to make enobosarm look more effective than it really is?"
2 Expert Opinion on Drug Discovery, February 2013, Vol. 8, No. 2 : Pages 191-218
3 "The evaluable efficacy population included 100 participants (placebo, n=34; enobosarm 1 mg, n=32; enobosarm 3 mg, n=34). Compared with baseline, significant increases in total lean body mass by day 113 or end of study were noted in both enobosarm groups (enobosarm 1 mg median 1·5 kg, range -2·1 to 12·6, p=0·0012; enobosarm 3 mg 1·0 kg, -4·8 to 11·5, p=0·046). Change in total lean body mass within the placebo group (median 0·02 kg, range -5·8 to 6·7) was not significant (p=0·88)."
4 ENDO 2010: "A 12-week Pharmacokinetic and Pharmacodynamic Study of MK-3984 and MK-2866 in Postmenopausal Subjects", by Marcantonio, et al. Available on request.
5 It is obvious that AF has not reviewed the other GTx studies with enobosarm, or even the YouTube videos and blog reports of bodybuilders taking black market Ostarine. While not the most potent compared to some banned steroids, it is certainly no placebo.
6 The footnote reports "not all patients had a baseline value", which makes sense as the group is defined by the primary endpoint. So it is a reasonable hypothesis that null baselines might be the reason for AF's low mean. Backtesting for the worst case (1 at 442.3, the rest at the min val of 72.8 or 0, for n=36) shows one would need zeros for half (n=18 with a null BL) to obtain his estimate. However, such a distribution can still be rejected, as it fails to comply with the reported 150.2 w median w/ or w/o the null values. Missing values are not the issue.
7 "Absolute changes in stair climb power represented a mean of 18.0% (SD 31.1) improvement compared with baseline for enobosarm 1mg and 21.7% (SD 65.7) for enobosarm 3mg (vs placebo, 4.8% [SD 23.2])".
8 http://i.imgur.com/8h4sC8s.jpg; Note however Dobs does not support his statement "it is true the placebo arm in the IIb trail was sicker than the enobosarm arm". If there is interest I can add more on this later.
9 I have made multiple requests to AF to describe how he obtained his 80w est and to make his 3mg equation public. On Twitter, he replied in his usual manner but ignored the question. On July 22nd, he blocked me on Twitter, thereafter preventing me from repeating my questions. I followed up with an email to the address he had provided just the week prior. To my surprise, he replied, confirming receipt, but his only comment was to accuse me of stalking him and to demand that I no longer try to contact him.
10 GTx Enobosarm ASCO 2007 http://i.imgur.com/MHkpoqn.jpg
11 More detail on the Wedbush note is posted on my Twitter homepage, @JJSchaible.
Disclosure: I am long GTXI, LGND. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it. I have no business relationship with any company whose stock is mentioned in this article.
Additional disclosure: Note LGND may make an interesting, lower risk way to play the enobosarm event, given their next gen SARM LGD-4033 has shown promise in a phase 2a test.