An Example Of How Data-Mining Bias Plagues Fund Management

About: SPDR S&P (SPY)

Summary

The connection between market forecasts and patterns in historical data is weak.

The main reason for the weakness of patterns in forecasting market moves is data-mining bias.

Data-mining bias increases when filters are used to polish the performance of historical patterns.

Data-mining bias and how it is manifested in trading and investment management decisions is a complex subject. Here we present an example from a recent article in Yahoo Finance of how data-mining bias still drives Wall Street decisions.

The fund manager in the article argues that "the market must be bought aggressively." In principle, there's nothing wrong with the call itself. I know a few professionals who share the same idea. What is important is how they justify it.

In the quoted article at the bottom, the manager justifies his opinion as follows:

"Since 1940, to gauge what stocks do between 9/15 and YE is simply look at YTD performance. When stocks are up 5% or better, they rally into YE 87% of the time (90% when between 5% and 20%). When stocks are down YTD (through September), they historically show no further advance until YE."

Let me show you why the above strategy is a result of data-mining bias.

The Yahoo article mentions only the win rate and says nothing about a 23.5% loss in 1987. The win rate rises from 87% to 90% once this outlier is removed by placing a 20% cap on the YTD return to mid-September. Some of you will remember that the stock market was up more than 30% by mid-September 1987. Our fund manager wants to remove that outlier because it does not look good for his strategy, so he arbitrarily filters out the 1987 return by applying a 20% cap. This is part of what causes data-mining bias. In this case the bias manifests itself as selection bias.
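To make the effect of the cap concrete, here is a minimal Python sketch. The numbers in `TOY_YEARS` are made up for illustration and are not historical S&P 500 returns; only the outlier row is shaped like the 1987 case described above. The function name `win_rate` and its parameters are my own choices, not something taken from the Yahoo article.

```python
# Each record: (YTD return to mid-September, return from mid-September to year end).
# ASSUMPTION: all values are invented for illustration, NOT historical data.
TOY_YEARS = [
    (0.08, 0.04),
    (0.12, 0.03),
    (0.06, -0.02),
    (0.15, 0.05),
    (0.32, -0.235),  # outlier shaped like the 1987 case in the text
    (0.10, 0.02),
    (-0.04, 0.01),
    (0.02, -0.01),
]

def win_rate(years, floor=0.05, cap=None):
    """Return (win rate, number of years selected) for the rule
    'hold to year end if the YTD return is at least `floor`',
    optionally skipping years whose YTD return exceeds `cap`."""
    selected = [
        rest
        for ytd, rest in years
        if ytd >= floor and (cap is None or ytd <= cap)
    ]
    if not selected:
        return float("nan"), 0
    wins = sum(1 for rest in selected if rest > 0)
    return wins / len(selected), len(selected)

uncapped_rate, uncapped_n = win_rate(TOY_YEARS)
capped_rate, capped_n = win_rate(TOY_YEARS, cap=0.20)

print(f"floor only  : {uncapped_rate:.0%} wins over {uncapped_n} years")
print(f"floor + cap : {capped_rate:.0%} wins over {capped_n} years")
```

In this toy sample the cap does nothing except drop the one year with the large loss, so the win rate goes up and the worst loss disappears from the record. The cap was chosen after seeing where the bad year fell, which is the selection bias at issue.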

Below is the performance of the proposed strategy since 1950 without dividend reinvestment:

Performance of strategy of holding stocks until the end of the year if by mid September the S&P 500 is up 5% or more

In the above chart of S&P 500 (NYSEARCA:SPY) since 01/1950, the middle pane shows buy and hold equity performance (no dividends are included) and the bottom pane shows the performance of the proposed strategy.

The 1987 outlier is clearly visible in the bottom pane, which shows the strategy equity. This left tail risk disappears from the backtest after applying the filter of a maximum 20% gain by mid-September, as shown in the chart below.

Hold stocks until the end of the year if by mid September the S&P 500 is up between 5% and 20%

Now the equity curve in the bottom pane looks fine, but the risk of a large loss still lurks because the 20% cap is arbitrary and irrelevant to future returns. Any gain to mid-September can be followed by a large loss by year end. This is because the equity curve in the above chart corresponds to only one possible path out of a very large number of possible paths in the time domain, each with different statistical properties and left tail risk in the distribution of returns.
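The "one path out of many" argument can be illustrated with a small simulation. This is a sketch under loud assumptions, not the analysis behind the charts above: the returns are synthetic Gaussian draws with parameters I picked arbitrarily, and the YTD return is generated independently of the rest-of-year return, so no filter on it has any real predictive power. The names `simulate_years`, `win_rate`, `best_filter`, `FLOORS` and `CAPS` are hypothetical.

```python
import random

def simulate_years(n_years, rng):
    """Draw (YTD to mid-September, rest of year) pairs independently.
    ASSUMPTION: Gaussian returns with arbitrary parameters, not market data."""
    return [(rng.gauss(0.05, 0.12), rng.gauss(0.02, 0.08)) for _ in range(n_years)]

def win_rate(years, floor, cap):
    """Win rate and count for years with floor <= YTD return <= cap."""
    rest = [r for ytd, r in years if floor <= ytd <= cap]
    if not rest:
        return None, 0
    return sum(r > 0 for r in rest) / len(rest), len(rest)

def best_filter(years, floors, caps, min_years=10):
    """Scan every floor/cap pair and keep the one with the highest win rate."""
    best = None
    for floor in floors:
        for cap in caps:
            if cap <= floor:
                continue
            rate, n = win_rate(years, floor, cap)
            if rate is None or n < min_years:
                continue
            if best is None or rate > best[0]:
                best = (rate, floor, cap, n)
    return best

# The grid includes the unfiltered case (-inf, +inf) as one candidate.
FLOORS = [float("-inf"), 0.0, 0.05, 0.10]
CAPS = [0.15, 0.20, 0.25, 0.30, float("inf")]

rng = random.Random(0)                 # fixed seed so the run is repeatable
in_sample = simulate_years(75, rng)    # the one path we happened to observe
out_sample = simulate_years(75, rng)   # another path from the same process

baseline_rate, _ = win_rate(in_sample, float("-inf"), float("inf"))
best_rate, best_floor, best_cap, best_n = best_filter(in_sample, FLOORS, CAPS)
fresh_rate, fresh_n = win_rate(out_sample, best_floor, best_cap)

print("unfiltered win rate, observed path:", baseline_rate)
print("best filter found:", best_floor, "to", best_cap, "on", best_n, "years")
print("best filter win rate, observed path:", best_rate)
print("same filter on a fresh path:", fresh_rate, "on", fresh_n, "years")
```

Because the unfiltered rule is one of the candidates, the selected filter can never look worse than it on the observed path. Whatever improvement it shows there comes from the search itself, and there is no reason to expect it to carry over to a different path.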

Note that nothing in the above analysis says that one should not buy stocks aggressively now. The point is that such decisions cannot be justified by statistics that suffer from data-mining bias. In reality, the important point is how an investor is protected against left tail risk by proper diversification. Looking at the odds of success, especially when they are plagued by data-mining bias, is the wrong way of dealing with the markets.

Original article

Charting and backtesting program: Amibroker

Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it. I have no business relationship with any company whose stock is mentioned in this article.