If you read enough about algorithms and backtesting, you come across story after story of false positives, or type I errors. A backtest shows that such-and-such a factor or strategy will produce outsized returns, and then, when people actually invest real money in it, the returns are dismal. People have become so scared of curve-fitting that they radically simplify their systems to avoid it, or they discount the value of backtesting, or they refuse to rely on backtests at all.
But what they rarely seem to fear is the absence of a factor. In fact, that absence can do just as much harm, or more. It’s the opposite problem: a type II error, a false negative. Missing something that actually works can be more harmful than using something that doesn’t.
Let’s say you’re a doctor diagnosing a complaining patient. A false positive is when you tell the patient that he has pneumonia, but there’s nothing really wrong with him that a few days in bed won’t cure. A false negative is when you tell the patient that there’s nothing really wrong with him that a few days in bed won’t cure, but he really has pneumonia. Both are pretty bad, but only one will have potentially fatal consequences.
It’s the same in trading and investing. If you rely on a factor that doesn’t actually work very well, you may not beat the market. But if you ignore a factor whose absence can have serious consequences, you might end up losing your shirt.
Type I and type II errors are statistical terms. A type I error is the mistake of accepting a hypothesis when it’s not true, that is, when the results the hypothesis explains can be attributed to chance or explained by something else. (In formal terms, it means rejecting a null hypothesis that is actually true.) A contemporary example is the idea that stocks with a low ratio of price to book value will outperform. Another is the idea that because a factor worked in the past, it will definitely work in the future.
A type II error occurs when the “null hypothesis” is false and we fail to reject it, often for lack of a better hypothesis. In the world of factors and algorithmic trading systems, the “null hypothesis” is the random-walk theory: that true, long-lasting proficiency in trading or stock-picking is impossible, and that so-called “factors” don’t exist.
One of the clearest explanations of investing errors—"Pascal's Wager and Type I and Type II Errors"—was written by Nick Ryder of Kathmere Capital Management, who is a proponent of investing in indexes, and who thinks that type I errors are much worse than type II errors. If you don’t accept my explanation or analogies, I suggest you read his article.
But a far more academic and evidence-based approach was taken in a recent paper—"What is the Optimal Significance Level for Investment Strategies?"—by Marcos López de Prado and Michael J. Lewis, who reach the opposite conclusion. They point out that “a particularly low false positive rate (Type I error) can only be achieved at the expense of missing a large proportion of the investment opportunities (Type II error).” In other words, by rejecting a lot of factors or tactics because you can’t prove that they work, you end up diminishing your returns.
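The trade-off in that quotation can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a one-sided z-test in which a genuinely profitable strategy produces a test statistic that is normally distributed with unit variance around a hypothetical `effect`, measured in standard errors. The function name `miss_rate` and both of its parameters are illustrative choices.

```python
from statistics import NormalDist


def miss_rate(alpha: float, effect: float) -> float:
    """Type II error rate of a one-sided z-test (illustrative assumption).

    alpha  -- tolerated false positive rate (type I error rate)
    effect -- assumed true mean of the test statistic, in standard errors
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # A strategy is accepted only if its statistic exceeds this cutoff.
    critical = std_normal.inv_cdf(1.0 - alpha)
    # Under the alternative the statistic is N(effect, 1), so the chance
    # of falling below the cutoff (a missed opportunity) is:
    return std_normal.cdf(critical - effect)


if __name__ == "__main__":
    for alpha in (0.20, 0.05, 0.01, 0.001):
        print(f"alpha={alpha:<6} miss rate={miss_rate(alpha, 2.0):.3f}")
```

Under these assumptions, tightening `alpha` always raises the miss rate for a fixed `effect`, which is the sense in which a very low type I rate is paid for with type II errors.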
There is no way to completely avoid type I and type II errors. We don’t have the knowledge or foresight to include every factor that might possibly exist, nor can we know with any degree of certainty that a factor that has worked in the past will work in the future. Our work, as traders and investors, is always going to be error-filled.
The best way to avoid type I errors is to use logic, mathematics, common sense, and robust backtesting techniques. If a factor or tactic looks promising, question it from all angles before adopting it. Make sure it works not just in general, but for the particular assets that you’re interested in buying—and not only for the particular assets you’re interested in buying, but in general. Make sure there’s not a better way of calculating things—for example, using fully diluted shares rather than current shares outstanding when calculating market cap, or comparing the value to other companies in the same industry rather than to all stocks or to an ideal. Make sure that it makes logical sense—some “technical” factors turn out to have fatal flaws because they’re unexplainable.
The best way to avoid type II errors is to work very hard to evaluate stocks and/or investing/trading tactics on as many solid factors as possible. Only leave a factor or tactic out if you conclude that it’s redundant, flawed, or doesn’t apply to you. For example, asset turnover (sales to assets) can be a powerful factor, but if you are already using factors that reward high sales (e.g. price to sales), you may not need to include it. But there are other factors that can have serious negative consequences if you ignore them. Companies can thrive even if their sales are steadily diminishing or if their short-term debt is a lot larger than their gross profit, but the chances are low. You don’t have to treat everything as a ranking factor—simply setting some logical limits through screening rules might be good enough. But type II errors are the ones that can really get you.
I’m going to perform an experiment to illustrate this. I have 120 different factors. Some seem to have worked in the past, some certainly didn’t, some have worked sometimes and not other times. Some I use in my own systems, others I wouldn’t dream of using, and a few don’t really make much sense. In other words, there are plenty of opportunities for type I errors here.
I’m going to backtest 75 different ranking systems based on those factors, using Portfolio123. The first five will have one factor, then I’ll add another factor to each system for another five with two factors, and so on until I get to fifteen factors. All the factors in these systems will be chosen completely randomly out of my set of 120 factors, so that the final fifteen-factor systems will be quite different from each other.
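The construction of the 75 systems can be sketched roughly as follows. This is only an illustration of the design described above, not the Portfolio123 setup itself: factors are stood in for by integer indices, each of the five chains is assumed to draw its fifteen factors without repetition, and the names `build_systems`, `chain`, and `size` are hypothetical.

```python
import random


def build_systems(n_factors=120, n_chains=5, max_size=15, seed=0):
    """Return 75 nested ranking systems as lists of factor indices.

    Each chain grows one randomly chosen factor at a time, so the
    k-factor system in a chain is the (k+1)-factor system minus its
    last factor. Drawing without repetition is an assumption.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    systems = []
    for chain in range(n_chains):
        # Random order in which this chain's factors are added.
        order = rng.sample(range(n_factors), max_size)
        for size in range(1, max_size + 1):
            systems.append(
                {"chain": chain, "size": size, "factors": order[:size]}
            )
    return systems


if __name__ == "__main__":
    systems = build_systems()
    print(len(systems), "systems")
    print(systems[0])
    print(systems[14])
```

Because the chains are drawn independently, two fifteen-factor systems will usually share only a few factors, which matches the intent that the final systems be quite different from each other.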
I’ll use as my universe stocks with a minimum market cap of $100 million, a minimum median (over the last three months) daily dollar volume of $100,000, and a minimum price of $1.00, and I’ll exclude REITs. The portfolio will consist of the forty stocks ranked highest on the equally weighted factors used, and will be rebalanced to equal weight monthly, with transaction costs of 0.25%. I’ll show the results for four different five-year periods as graphs with annualized returns.
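For readers who want to see the mechanics, here is a minimal sketch of the screening and selection step. It makes several assumptions that are not in the description above: stocks are plain dictionaries with hypothetical keys (`market_cap`, `median_dollar_volume`, `price`, `is_reit`), higher factor values are treated as better, and each factor is converted to a 0-to-1 rank before averaging. Monthly rebalancing and transaction costs are not modeled.

```python
def in_universe(stock):
    """Apply the universe rules: size, liquidity, price, no REITs."""
    return (
        stock["market_cap"] >= 100e6
        and stock["median_dollar_volume"] >= 100_000
        and stock["price"] >= 1.00
        and not stock["is_reit"]
    )


def top_ranked(stocks, factor_names, n=40):
    """Return tickers of the n stocks with the best equal-weighted rank.

    Assumes higher factor values are better; a real system would flip
    the direction for factors where lower is better.
    """
    universe = [s for s in stocks if in_universe(s)]
    scores = {s["ticker"]: 0.0 for s in universe}
    for name in factor_names:
        ordered = sorted(universe, key=lambda s: s[name])
        denom = max(len(ordered) - 1, 1)
        for position, s in enumerate(ordered):
            # Rank scaled to 0..1; every factor gets the same weight.
            scores[s["ticker"]] += position / denom
    ranked = sorted(universe, key=lambda s: scores[s["ticker"]], reverse=True)
    return [s["ticker"] for s in ranked[:n]]
```

Summing equally scaled ranks gives the same ordering as averaging them, so no explicit division by the number of factors is needed.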
If type I errors are worse than type II errors, you should see pretty much the same (or worse) results no matter how many factors are introduced, for the introduction of a new factor will present more or as many problems as leaving a factor out. If type II errors are worse than type I errors, the results should improve as we add more factors, even if a lot of them don’t actually work.
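One simple way to read such results against these two predictions is to average the annualized returns by factor count and look at the slope. The sketch below is a hypothetical helper, not part of the experiment: `return_trend` and its input format (pairs of factor count and annualized return) are assumptions, and an ordinary least-squares slope is only one of several reasonable summaries.

```python
from collections import defaultdict
from statistics import mean


def return_trend(results):
    """Summarize backtest results by number of factors.

    results -- iterable of (n_factors, annualized_return) pairs
    Returns (average return per factor count, least-squares slope of
    those averages against the factor count).
    """
    by_count = defaultdict(list)
    for n_factors, annual_return in results:
        by_count[n_factors].append(annual_return)

    counts = sorted(by_count)
    averages = [mean(by_count[c]) for c in counts]

    x_bar, y_bar = mean(counts), mean(averages)
    spread = sum((x - x_bar) ** 2 for x in counts)
    if spread == 0:
        raise ValueError("need at least two distinct factor counts")
    slope = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(counts, averages)
    ) / spread
    return dict(zip(counts, averages)), slope
```

A slope near zero or below would fit the first prediction (type I errors at least as costly), while a clearly positive slope would fit the second.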
Below are the results of my experiment.
As you can see, in all four five-year time periods, fourteen- or fifteen-factor ranking systems tended to work better than one- or two-factor ranking systems. If you read the charts from left to right, you can see how type I errors are often introduced when we put new factors in (and that what might be a type I error in one period may not be in another). And if you read the charts from right to left, you can see how type II errors are introduced as we reduce factors. If this experiment has validity, type II errors—false negatives—are quite a bit more dangerous than type I errors.
Single-factor investing takes type II errors to an extreme. Traders who put all their money into stocks with a low “acquirer’s multiple” (enterprise value to EBITDA), for example, without looking at anything else about the company, may well lose their shirts because of type II errors. Ditto for those who invest using Joel Greenblatt’s “Magic Formula,” in which the only factors they consider are that multiple and return on capital; or those who look for stocks selling below their net current asset values without looking at other aspects of those stocks. Ditto for those who invest in single-factor ETFs.
I don’t know how to design an experiment that would generalize this from stock-picking factors to investment/trading decisions in general. But I am 100% sure that the same principle applies. It’s better to overthink things than to underthink them. When it comes to investing, simplicity is your enemy.
When investing or trading, it’s best to use as much information as possible, and to look at your options from every angle. If you were playing poker, you would want to consider not only what cards you held, but what cards other players might be holding; you’d want to consider their “tells” (their body language, the way they hold their cards, how confidently they’re betting, the likelihood they’re bluffing); you’d want to consider how much is in the pot and how much you have left to bet; you’d want to consider the probability of each card left in the deck helping or hurting your hand and those of your opponents; you’d want to consider how likely a strong bet is to force them to fold and how likely a weak bet is to leave them in the game; you’d want to consider the cost of seeing the next card after all bets are placed. Even if you’re completely wrong about some of these factors (a type I error), it’s better to consider them all than to ignore one or more of them (a type II error). Don’t be swayed by all the “curve-fitting” arguments out there, or the studies that purport to show that actively managed portfolios underperform indexes (they fail to take into account the multiple disadvantages those portfolios face). Instead, decrease your risk and increase your returns by examining every trading or investment decision you make from as many angles as possible.
Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it. I have no business relationship with any company whose stock is mentioned in this article.