The Perils Of Backtesting Technical Strategies

Includes: RSP, SPY
by: Jeff Gonion

Backtesting is the process of evaluating a strategy using historical data as input, usually by measuring the performance of the strategy over time. It is a popular technique among both individual and professional investors, but it should be used with caution.

For example, growing weary of your net worth evaporating during recessions, you might decide to backtest owning the S&P 500 only when it is above its 200-day moving average, thus avoiding the big recessions. You could test this theory on historical data for SPY to see how it performs. Backtesting this strategy reveals that it avoids recessions as planned, but also locks in many smaller losses while missing the rebounds that typically follow them. Backtesting helps you realize that perhaps this isn't such a good idea after all.

When backtesting a timing strategy on a single equity or index, care must be taken to ensure the backtest does not inadvertently use future or coincident data that would not have been available when each decision was made. Otherwise, backtesting simple timing strategies is fairly straightforward.
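To make the look-ahead problem concrete, here is a minimal Python sketch of the 200-day moving-average rule. The function names, the plain list of daily closes, and the choice to act on the previous day's close are assumptions made for illustration, not a description of any particular backtesting tool. The point is that the decision for each day is computed only from prices that were already known before that day.

```python
def moving_average_signal(closes, window=200):
    """Return a list of booleans: hold[i] is True if we hold the index on day i.

    The decision for day i uses only closes up to day i-1, so no
    same-day (coincident) or future price leaks into the rule.
    """
    hold = [False] * len(closes)
    for i in range(window, len(closes)):
        past = closes[i - window:i]          # days i-window .. i-1 only
        average = sum(past) / window
        hold[i] = closes[i - 1] > average    # yesterday's close vs. its average
    return hold


def strategy_growth(closes, hold):
    """Growth of $1: earn the day's return only on days we hold."""
    value = 1.0
    for i in range(1, len(closes)):
        if hold[i]:
            value *= closes[i] / closes[i - 1]
    return value
```

A simple check for leakage is to change a later price and confirm that no earlier decision changes. The sketch ignores trading costs and dividends, and it assumes you can trade at the previous close, which is itself an approximation.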

Backtesting stock-picking strategies is entirely different, fraught with subtle complexities that can easily lead to incorrect conclusions. In order to correctly backtest stock-picking strategies, a complete historical database of the entire stock market is needed, including companies that have either gone bankrupt or merged with other companies.

A common technique among individual investors is to backtest strategies using historical data for stocks that currently exist, which ignores companies that have merged or gone bankrupt. This causes "survivor bias" in the backtest, and results in backtests that are overly optimistic.

The further back in time the test goes, the greater the number of bankrupt companies that will be ignored, and the greater the survivor bias, especially if the results are compared to real-world stock indexes, which do not have survivor bias. Market cap matters as well: small-cap stocks declare bankruptcy more frequently than large-cap stocks, so backtests of small-cap stocks will have more survivor bias than backtests of large-cap stocks.

The returns of backtested strategies will be overly optimistic and shouldn't be compared with real-world returns, but the impact of survivor bias on stock-picking strategies can be minimized.

Given that survivor bias changes with the time period under test and the market cap of the stocks tested, it can be minimized by comparing two strategies on the same market-cap range and the same time period. Since survivor bias affects both backtests, much of it will cancel out, leaving only the bias that applies unequally between the two strategies. The hope is that this will be minimal, but we really have no way of knowing.
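The cancellation argument can be illustrated with a few lines of arithmetic. Every number below is hypothetical, and the sketch assumes that survivor bias simply adds to each backtested return, which is a simplification. In practice the size of each bias term is exactly what we cannot observe.

```python
# Hypothetical numbers, in percent per year, for illustration only.
true_a, true_b = 9.0, 8.0      # what each strategy would really have earned
shared_bias = 2.0              # survivor bias common to both backtests
extra_bias_a = 0.3             # bias that affects strategy A only

# What the two backtests would report under the additive assumption.
backtest_a = true_a + shared_bias + extra_bias_a
backtest_b = true_b + shared_bias

measured_gap = backtest_a - backtest_b   # what the backtests show
true_gap = true_a - true_b               # what we actually care about
residual = measured_gap - true_gap       # only the unequal bias remains

print(f"measured gap {measured_gap:.1f}, true gap {true_gap:.1f}, residual {residual:.1f}")
```

The shared bias drops out of the comparison entirely, and the error in the measured gap equals the bias that applies to only one of the two strategies.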

Backtesting, an example:

Large-cap stocks are the top third (by rank) of all stocks by market cap. Since the actual market cap of these stocks varies over time, the rank percentile (67-100%) is used to classify whether a stock falls into the large-cap bucket.

The strategy we will test is holding the stocks that trade the most volume of shares (on average). It will hold the top 50% of large-cap stocks, as sorted by 3-month average trading volume. The time period runs from the summer of 2003 onward. The strategy equally weights the stocks that it selects.
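The selection rule can be sketched in Python as follows. The tuple layout and the function name are hypothetical, and the sketch shows a single selection date only. A real backtest would repeat the selection at each rebalance date using only data available on that date, and the rebalance schedule is not specified here.

```python
def select_high_volume_large_caps(stocks):
    """Pick the higher-volume half of the large-cap bucket, equally weighted.

    stocks: list of (ticker, market_cap, avg_volume_3m) tuples (hypothetical layout).
    Returns a dict mapping each selected ticker to its portfolio weight.
    """
    # Rank by market cap and keep the top third (rank percentile 67-100%).
    by_cap = sorted(stocks, key=lambda s: s[1])
    n_large = len(by_cap) // 3
    large_caps = by_cap[len(by_cap) - n_large:]

    # Within large caps, keep the top half by 3-month average volume.
    by_volume = sorted(large_caps, key=lambda s: s[2], reverse=True)
    selected = by_volume[:len(by_volume) // 2]

    # Equal weighting: every selected stock gets the same share.
    if not selected:
        return {}
    weight = 1.0 / len(selected)
    return {ticker: weight for ticker, _, _ in selected}
```

Using rank rather than a fixed dollar threshold keeps the bucket the same relative size as market caps drift over the years.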

The chart shows that holding only the upper half of large-cap stocks by volume returned (on average) 11.2% per year. In contrast, holding the lower half returned 9.8% per year. Just holding all large-cap stocks returned 10.5% per year.

The important thing to remember is that all of those figures include survivor bias. You would not actually have gotten 11.2% returns if you had invested in this strategy over the time period. However, there is a good chance that you would have outperformed large-cap stocks by a small margin (0.7% = 11.2% - 10.5%).

It is probably safe to say that the strategy outperformed large-cap stocks by 0.7% over the period. If RSP (an equal-weight large-cap fund which actually returned 8.3% during the period) is representative of large-cap stocks in general, then it would be reasonable to assume that the strategy would have actually returned 9.0% during the period. If RSP is not representative, there is no way to know what the total return would be in absolute terms.
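The arithmetic behind that estimate is short enough to write out. The sketch below uses only the figures quoted in this article and treats all of them as annualized percentages, which is an assumption for the RSP figure. The variable names are made up for illustration.

```python
# Figures quoted in the article, in percent; treated here as annualized.
strategy_backtest = 11.2   # upper half of large caps by volume
large_cap_backtest = 10.5  # all large caps, same backtest
rsp_actual = 8.3           # RSP's real-world return over the period

# Both backtest figures carry survivor bias, so only their gap is used.
excess = strategy_backtest - large_cap_backtest

# Assumption: RSP stands in for what large caps really returned.
estimated_actual = rsp_actual + excess

print(f"excess: {excess:.1f}%, estimated actual: {estimated_actual:.1f}%")
```

The estimate is only as good as the assumption that RSP represents the large-cap bucket used in the backtest.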

While it may look as though this strategy could give better returns than large caps in general, it is important to note the characteristics of a strategy in both bull and bear markets. For example, during the Great Recession, large-cap stocks returned -58.1%, and this strategy returned -62.2%. This tells us that the strategy has simply selected stocks with higher beta than large caps as a whole. Stocks with a higher beta will rise and fall more than their benchmark. This means that the strategy gets higher returns by taking higher risks.
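For readers who want to measure beta directly rather than infer it from a single drawdown, a minimal sketch follows. It uses the standard definition (covariance with the benchmark divided by the benchmark's variance) and assumes you already have two aligned lists of periodic returns. The function name and inputs are illustrative, and the backtest above was not necessarily analyzed this way.

```python
def beta(strategy_returns, benchmark_returns):
    """Beta = covariance(strategy, benchmark) / variance(benchmark)."""
    n = len(benchmark_returns)
    if n != len(strategy_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_s = sum(strategy_returns) / n
    mean_b = sum(benchmark_returns) / n
    # The 1/(n-1) factors in covariance and variance cancel in the ratio.
    cov = sum((s - mean_s) * (b - mean_b)
              for s, b in zip(strategy_returns, benchmark_returns))
    var = sum((b - mean_b) ** 2 for b in benchmark_returns)
    return cov / var
```

A result above 1 means the strategy has tended to move more than the benchmark in both directions, which is the pattern the drawdown figures suggest.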

It is reasonably easy to get higher returns in the market by taking more risk, so there is nothing particularly interesting about our high-volume strategy. Ideally a strategy would give higher upside returns but have the same or better downside risk than the market.

While backtesting strategies is a useful tool for technical traders, it is important to consider how a backtest could differ from actual market conditions.

Another example: If a strategy purposely (or inadvertently) selects stocks with low trading volume, it may not be possible to actually get the stock prices used in the backtest. You can easily be forced to buy at higher prices and sell at lower prices than the backtest.
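One rough way to see how much this matters is to penalize every trade in the backtest by a fixed percentage. The sketch below is an illustration only: the function name and the flat `slippage` parameter are assumptions, and real execution costs on thinly traded stocks vary with order size and market conditions.

```python
def round_trip_return(buy_price, sell_price, slippage=0.0):
    """Return on one buy-then-sell trade with a fractional slippage per side.

    slippage=0.01 means buying 1% above and selling 1% below the backtest price.
    """
    paid = buy_price * (1.0 + slippage)
    received = sell_price * (1.0 - slippage)
    return received / paid - 1.0
```

Comparing `round_trip_return(100, 110)` with `round_trip_return(100, 110, slippage=0.01)` shows how even a small penalty on each side of the trade eats into the backtested gain, and a strategy that trades often pays that penalty many times.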

Can backtesting actually be used to develop a viable strategy that works in the real market?

That remains to be seen, but we are going to give it a try in a live experiment here on Seeking Alpha. Further details are available here.

Disclosure: I am long RSP.