Backtesting 101, Part 1: The Misunderstood Investing Tool

Includes: AGX, APOL, AR, ARLP, ATW, BPT, CALM, CENX, CYD, CYOU, ENTA, EPEG, ESV, EXTN, GBX, GME, HPQ, MOH, NSU, OUTR, PDLI, PZE, RPXC, SAFM, SSL, TA, UEPS, VLO, WLKP, ZVO
by: Ryan Telford
Summary

While backtesting can be a valuable tool in an investor’s toolkit, it is often misunderstood.

Investors can be tempted to use backtested data to pick individual stocks, when in fact backtesting is a portfolio tool.

The reliability of backtesting results depends on many factors, but total time tested is of great importance.


One of the most important tools in a quantitative value investor's toolbox is backtesting. Once we find an investing strategy that makes intuitive, logical sense, we test how it has performed in the past to identify key information that can help us gauge how it may perform in the future.

That said, one consistent theme I have encountered in my travels is that backtesting is generally misunderstood. Some investors are quick to validate an investment thesis through a superficial backtest, while others immediately dismiss backtested data entirely.

There are many backtesting tools available to retail investors (I use Portfolio123 for my backtesting). As with any tool, the results will reflect how well you understand it and its limitations. At the risk of repeating an over-quoted expression: "garbage in, garbage out".

An analogy for car enthusiasts: you can have the fastest car, but if you don't know how to drive the car and/or you do not know the track you are driving on, then you are not using the car to its potential, or worse, you may crash and burn.

The same goes for backtesting. In this case, the backtesting tool is the car, and the data you are testing is the track.

Basic backtesting will give you some general, overall statistics about a strategy, but that is only part of the picture. It is the details you want to understand. Put another way, in quantitative investing, "there is backtesting, and then there is backtesting".

Drawing on my own research and experience with backtesting in my quantitative investing, I set out to explain backtesting in an effort to promote better understanding. When I started writing, I realized that this discussion was best suited to a series of articles. I have broken the subject down into three parts:

Part 1 - How Backtesting is intended to be used (subject of this article)

Part 2 - Analyzing backtesting data

Part 3 - Universes, Data Sources and other considerations

My objective with these articles is not to hard-sell readers on the merits of backtesting, but instead to provide enough information on the subject that they can make an informed decision on when to use this technique in their investing, if at all. Backtesting and quantitative investing are vast subjects, and some items won't be discussed here. Depending on the feedback, there could be further discussion at a later time.

Basic Definition of Backtesting

To frame this discussion, here is my own definition of backtesting:

Backtesting is the use of historical data to assess how a rational "buy and hold" portfolio investment thesis has performed in the past over several extended periods. After a prudent assessment, the strategy may be considered to perform similarly in the future over an extended period.

The key phrases in this statement are:

  • "rational 'buy and hold' portfolio investment thesis",
  • "several extended periods" and
  • "prudent assessment"

In my opinion, without these elements backtesting is simply an exercise in data mining.

Let's take a look at this in more detail.

Key Phrase #1: "Rational 'Buy and hold' Portfolio Investment Thesis"

There are three key concepts in this statement. First off, backtesting is intended for testing "portfolio" strategy investment theses with a basket of stocks (say 20 to 30), not individual stock picks. For example, backtesting may show that investing in a basket of low EV/EBIT stocks has performed well in the past.

As an example, the following table shows the 30 stocks with the lowest EV/EBIT in the US Mid and Large Cap universe that I screen, as of 01 Jan 2016.

Ticker  Name                              EV/EBIT (as of 01 Jan 2016)
APOL    Apollo Education Group Inc        0.3
PDLI    PDL BioPharma Inc                 1.53
MOH     Molina Healthcare Inc.            1.8
NSU     Nevsun Resources Ltd              1.94
BPI     Bridgepoint Education Inc         2.58
AGX     Argan Inc                         2.73
CYD     China Yuchai International Ltd    2.79
TA      TravelCenters of America LLC      3.01
EXTN    Exterran Corp                     3.26
HPQ     HP Inc                            3.43
GBX     Greenbrier Companies Inc. (The)   3.49
RPXC    RPX Corp                          3.49
BPT     BP Prudhoe Bay Royalty Trust      3.6
CYOU    Changyou.com Ltd                  3.61
ENTA    Enanta Pharmaceuticals Inc        3.69
ATW     Atwood Oceanics Inc.              3.76
CALM    Cal Maine Foods Inc               3.83
ARLP    Alliance Resource Partners LP     4.03
OUTR    Outerwall Inc                     4.1
WLKP    Westlake Chemical Partners LP     4.11
EPE     EP Energy Corp                    4.51
PZE     Petrobras Argentina SA            4.54
UEPS    Net 1 Ueps Technologies Inc       4.56
CENX    Century Aluminum Co               4.74
GME     GameStop Corp.                    4.77
SSL     Sasol Ltd                         4.78
AR      Antero Resources Corp             4.86
SAFM    Sanderson Farms Inc               4.9
ESV     ENSCO Plc                         4.91
VLO     Valero Energy Corp                4.98

(Data Source: Portfolio123)
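To make the screen concrete, here is a minimal Python sketch of how a list like the one above could be produced: compute EV/EBIT for each stock, drop companies where the ratio is not meaningful, and keep the cheapest 30. It assumes a simplified enterprise value (market cap plus debt minus cash); Portfolio123's own definitions may differ, and the tickers and figures in the example are hypothetical, not real company data.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    ticker: str
    market_cap: float  # equity market value, $M
    total_debt: float  # $M
    cash: float        # cash and equivalents, $M
    ebit: float        # trailing EBIT, $M

    @property
    def enterprise_value(self) -> float:
        # Simplified EV (assumption): market cap + debt - cash.
        # Data providers may also add preferred stock, minority interest, etc.
        return self.market_cap + self.total_debt - self.cash


def lowest_ev_ebit(universe: list[Stock], n: int = 30) -> list[tuple[str, float]]:
    """Return (ticker, EV/EBIT) for the n cheapest stocks, lowest ratio first."""
    # EV/EBIT is not meaningful when EBIT or EV is zero or negative, so skip those
    candidates = [
        (s.ticker, s.enterprise_value / s.ebit)
        for s in universe
        if s.ebit > 0 and s.enterprise_value > 0
    ]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:n]


# Hypothetical universe, $M (not real company data)
universe = [
    Stock("AAA", 100, 50, 30, 40),  # EV 120, EV/EBIT 3.0
    Stock("BBB", 200, 0, 20, 30),   # EV 180, EV/EBIT 6.0
    Stock("CCC", 50, 10, 5, -5),    # negative EBIT: excluded
    Stock("DDD", 80, 20, 60, 20),   # EV 40, EV/EBIT 2.0
]
print(lowest_ev_ebit(universe, n=2))  # [('DDD', 2.0), ('AAA', 3.0)]
```

Excluding negative EBIT matters: a money-losing company would otherwise show a negative EV/EBIT and sort to the top of a "cheapest" list.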

An investor may be tempted to cherry-pick one or more of these stocks based on the low EV/EBIT alone, assuming that the backtested performance extends to individual low EV/EBIT stocks. It does not: the backtested performance applies only to buying and holding the entire basket of stocks, not to individual issues. That is not to say that an investor cannot use the screen as a starting point for further research into individual stocks; however, the overall backtested results apply only to the portfolio.

This is one of the key facets of portfolio investing; you are investing in multiple stocks to achieve a result. Some stocks may appreciate, some may crash and burn, but it is the overall result you are after.
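The arithmetic behind "it is the overall result you are after" is simple when the basket is equal-weighted and held untouched: the portfolio's one-year return is the average of the individual returns. The sketch below assumes equal starting weights and uses hypothetical returns; the function name is illustrative only.

```python
from statistics import mean


def basket_return(one_year_returns: dict[str, float]) -> float:
    """One-year return of an equal-weighted basket bought at the start and held.

    If each position starts with the same dollar amount and nothing is traded
    during the year, the portfolio return is the simple average of the
    individual stock returns.
    """
    return mean(one_year_returns.values())


# Hypothetical one-year returns for a five-stock basket (not real data)
returns = {"A": 0.60, "B": 0.25, "C": 0.10, "D": -0.05, "E": -0.70}
print(f"basket: {basket_return(returns):+.1%}")  # basket: +4.0%
print(f"best: {max(returns.values()):+.0%}, worst: {min(returns.values()):+.0%}")
```

Here the basket gains about 4% even though one holding lost 70%, which is why a backtested basket result says little about any single stock in it.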

The second concept here is that backtesting is used for "buy and hold" investment strategies. A common buy and hold strategy is to buy 20 or 30 positions based on given criteria, and hold them for one year. The "hold" is important, as the backtesting results assume that the stocks selected by the criteria are held for the duration. Regardless of what you may think about "for profit" education stocks (APOL or BPI), energy stocks (VLO), or the idea that natural gas may make coal obsolete (ARLP), backtesting assumes the stocks are held for the entire holding period.

The third key concept is that the investment thesis should be "rational". The thesis being tested should be rooted in some fundamental (and/or technical) rationale. Backtesting software will let you test any parameter, but it will not tell you what you should and should not test.

James P. O'Shaughnessy, a pioneer in quantitative investing, said it best:

If you torture the data long enough, they will confess to anything. If no sound theoretical, economic, or intuitive common sense reason exists for the relationship, it's most likely a chance occurrence. Thus, if you see strategies that require you buy stocks only on a Wednesday and hold them for 16 1/2 months, you're looking at the results of data-mining.

(Source: "What Works on Wall Street", 3rd Edition, p 36)

As we shall see, strategies should be tested in as many time periods as possible to avoid the dangers of data mining.

Key Phrase #2: "Several Extended Periods"

"Extended Periods"

The longer the study period in any backtest, the better. Statistics covering decades are far more comprehensive than those from any single 1-, 3- or 5-year period. A strategy may have done incredibly well over one of these shorter periods, but chances are that this performance will break down over longer periods.

Not only are longer periods preferred, but the test should also span as many different types of markets as possible: bull, bear, severe bear, bubbles, recoveries, etc. These types of markets recur over time, so you want to be aware of how a strategy has performed through them. History often repeats itself in the stock market, and you will most likely experience each of these types of markets at some point in your investing career.

Strategies can behave very differently in different environments, so choosing test periods that capture these environments is critical. As an example, take a look at the S&P 500 index since 1995:

(Chart: S&P 500 (SPY) since 1995. SPY data by YCharts)

Using the S&P 500 chart as a broad indicator of the US market, we can see the following market types:

  • 1999 Tech Bubble
  • 1999/2000 Tech Bust
  • 2002-2003 US Recession
  • 2003-2007, Growth
  • 2008 Severe bear
  • 2008/2009 US Recession
  • 2010-2015 Recovery, growth
  • Future 2016, Trump(!)

I cannot stress enough how important it is to test as far back as possible. If a strategy boasts high returns, be sure to check what time period those returns refer to. As we will see, all strategies have periods of ups and downs, so a stretch of excess returns may be due for a correction downwards, and vice versa.

"Several Extended Periods"

While the total length of time tested is important, it also matters which periods are tested. For example, if you are looking at the 10-year performance of a strategy, realize that there are many 10-year periods within a long data history: 2000 to 2010, 2001 to 2011, 2002 to 2012, and so on. Within each of these you can also vary the starting month: Jan 2000 to Dec 2009, Feb 2000 to Jan 2010, etc. These overlapping subperiods are referred to as "rolling periods", and they provide a much more accurate picture of the strategy than calendar-year periods alone.
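The number of rolling periods adds up quickly. As a rough illustration, this Python sketch enumerates every 10-year window with a monthly start date, assuming data from Jan 2000 through Dec 2015 (an arbitrary range chosen for the example); the helper names `add_months` and `rolling_windows` are hypothetical.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def rolling_windows(first_start: date, last_month: date, years: int) -> list[tuple[date, date]]:
    """Every `years`-long holding window that starts on the 1st of a month.

    Windows are (first month held, last month held); consecutive windows
    start one month apart, and none extends past `last_month`.
    """
    length = years * 12
    windows = []
    start = first_start
    while (end := add_months(start, length - 1)) <= last_month:
        windows.append((start, end))
        start = add_months(start, 1)
    return windows


windows = rolling_windows(date(2000, 1, 1), date(2015, 12, 1), years=10)
print(len(windows))  # 73
for start, end in windows[:2]:
    print(start.strftime("%b %Y"), "-", end.strftime("%b %Y"))
# Jan 2000 - Dec 2009
# Feb 2000 - Jan 2010
```

With 16 years of data there are 73 distinct 10-year windows, compared with only 7 calendar-year ones (2000-2009 through 2006-2015).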

Rolling periods are such an important component of backtesting results that I believe any performance claims should be rejected if rolling periods are not presented. Let's understand why.

Capturing as many different market types as possible

A 5-year period from 2003-2007 is very different from 1999-2003 (see S&P chart above). To truly see how a strategy has performed over time, you want to see how it has done over every 5-year period. That way you see the strategy through many more types of markets and business cycles.
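One way to apply this is to compute the annualized return for every 5-year window, stepping forward one month at a time, and then look at the best and worst windows. This sketch assumes a list of monthly strategy returns (as decimals) is available; the series in the example is made up purely to show how much the result can depend on the window chosen.

```python
import math


def rolling_annualized_returns(monthly_returns: list[float], years: int) -> list[float]:
    """Annualized (geometric) return of every window of `years` consecutive
    years, stepping forward one month at a time."""
    window = years * 12
    results = []
    for start in range(len(monthly_returns) - window + 1):
        # Compound the monthly returns in this window into total growth
        growth = math.prod(1.0 + r for r in monthly_returns[start:start + window])
        results.append(growth ** (1.0 / years) - 1.0)
    return results


# Hypothetical 10-year strategy: 5 years of +1% per month, then 5 years of -1%
monthly = [0.01] * 60 + [-0.01] * 60
rolling = rolling_annualized_returns(monthly, years=5)
print(len(rolling))              # 61
print(f"{max(rolling):+.1%}")    # +12.7%
print(f"{min(rolling):+.1%}")    # -11.4%
```

A single "5-year return" for this made-up strategy could be anything from about +12.7% to -11.4% per year, depending only on which window was reported.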

Seasonal Effects

In many backtesting strategies and performance results, the assumption is that the investor invests on 31 Dec or 01 Jan, holds for a year, and re-balances.

Realistically, do you make all of your transactions on one day of the year? I don't.

But then again, you may be asking, "Does it make a difference what time of year I invest in a quantitative strategy?"

Let the table below answer this question. The low EV/EBIT strategy was tested over twelve different 16-year periods, each with a different purchase and re-balance month.

Annual Return, Arithmetic Mean (%)

16-Year Period         Low EV/EBIT*   R3000TR
Jan 1999 - Dec 2014    26.2           5.77
Feb 1999 - Jan 2015    24.64          5.46
Mar 1999 - Feb 2015    22.48          6
Apr 1999 - Mar 2015    18.83          5.62
May 1999 - Apr 2015    15.16          5.31
Jun 1999 - May 2015    15.21          5.67
Jul 1999 - Jun 2015    13.91          5.14
Aug 1999 - Jul 2015    15.95          5.5
Sep 1999 - Aug 2015    12.21          5.1
Oct 1999 - Sep 2015    12.58          5.14
Nov 1999 - Oct 2015    16.56          5.35
Dec 1999 - Nov 2015    16.6           5.27
Average                17.53          5.44
Standard Deviation     4.41           0.27

(Source: Portfolio123 data & author calculations)

(*) Mid and Large Cap Stocks Universe, US Market, 30 stock holdings with the lowest EV/EBIT in the universe. Standard deviation is across the twelve start months, in percentage points.

As you can see, the "buy" month has made a significant difference in annual returns with this strategy. Buying and holding from July has resulted in a return nearly 50% lower than buying in January (13.91% vs. 26.2%)! Note, by contrast, the fairly steady returns of the benchmark, the Russell 3000 Total Return index.
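The Average and Standard Deviation rows can be checked directly from the twelve monthly results. This Python snippet copies the figures from the table and assumes the standard deviation is the population standard deviation across the twelve start months, expressed in percentage points.

```python
from statistics import mean, pstdev

# Annual returns (%) from the table, one per purchase month (Jan ... Dec)
low_ev_ebit = [26.2, 24.64, 22.48, 18.83, 15.16, 15.21, 13.91, 15.95, 12.21, 12.58, 16.56, 16.6]
r3000tr = [5.77, 5.46, 6.0, 5.62, 5.31, 5.67, 5.14, 5.5, 5.1, 5.14, 5.35, 5.27]

for name, returns in (("Low EV/EBIT", low_ev_ebit), ("R3000TR", r3000tr)):
    # pstdev = population standard deviation, in percentage points
    print(f"{name}: mean {mean(returns):.2f}, std dev {pstdev(returns):.2f}, "
          f"range {min(returns)} to {max(returns)}")
# Low EV/EBIT: mean 17.53, std dev 4.41, range 12.21 to 26.2
# R3000TR: mean 5.44, std dev 0.27, range 5.1 to 6.0
```

The spread between the best and worst start months for the strategy (12.21% to 26.2%) is larger than the benchmark's entire average return, which is why testing every start month matters.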

Can these variations be explained? The stock market is a very dynamic system, and in addition to broad macro movements, there are often seasonal changes. You may be aware of the "January Effect", for example, which can be described as follows. As the year approaches its end, investors are aware that the tax man will be coming after them in the new year for any capital gains realized over the year. To help offset these gains, many investors rush to sell off their losing positions just before year end. These investors are suddenly sitting on a lot of cash, which they put to work after January 1st on new positions, creating a surge of buying in the market and lifting many security prices.

There is also the traditionally more volatile period of May to October. "Sell in May and Go Away" is a maxim describing investors selling their stocks in May, only to re-enter the market in November. There are many theories as to why this happens; Investopedia has a good summary. The point for us is that trading patterns in the first quarter of the year aren't necessarily the same as those in the second and third quarters.

Given that investors buy at different times of the year, and given these seasonal effects in the market, it is important that backtesting results take these effects into account. In the backtesting that I do, I look at all monthly start periods, i.e. Jan-Dec, Feb-Jan, Mar-Feb, Apr-Mar and so on, as the table above shows. Testing this way gives you a picture that is more representative of the whole year.
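Testing every start month amounts to running the same annual buy-and-hold backtest twelve times, once per rebalance month. The skeleton below is a hedged Python sketch of that loop; `select_basket` and `holding_return` are hypothetical hooks standing in for whatever screen and price data a backtesting platform provides, not Portfolio123's API.

```python
from datetime import date
from statistics import mean
from typing import Callable

# Hypothetical hooks (not a real library API):
#   select_basket(as_of)          -> tickers chosen by the screen on that date
#   holding_return(t, buy, sell)  -> total return of ticker t between the dates
SelectBasket = Callable[[date], list[str]]
HoldingReturn = Callable[[str, date, date], float]


def returns_by_start_month(select_basket: SelectBasket, holding_return: HoldingReturn,
                           first_year: int, years: int) -> dict[int, float]:
    """Arithmetic-mean annual return of an annual buy-and-hold strategy for
    each possible rebalance month (1 = January ... 12 = December)."""
    results = {}
    for month in range(1, 13):
        annual = []
        for y in range(first_year, first_year + years):
            buy, sell = date(y, month, 1), date(y + 1, month, 1)
            basket = select_basket(buy)  # assumes the screen returns at least one stock
            # Equal weights, held untouched until the next rebalance date
            annual.append(mean(holding_return(t, buy, sell) for t in basket))
        results[month] = mean(annual)
    return results


# Toy stand-ins: a one-stock "screen" that does better when bought in January
demo = returns_by_start_month(
    select_basket=lambda as_of: ["XYZ"],
    holding_return=lambda t, buy, sell: 0.20 if buy.month == 1 else 0.08,
    first_year=1999, years=16,
)
print(demo[1], demo[7])  # 0.2 0.08
```

Averaging the twelve results, as in the table's Average row, gives a figure that does not hinge on one arbitrary rebalance date.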

This can make quite a difference for a strategy. In earlier editions of his book "What Works on Wall Street", O'Shaughnessy found that a low Price to Sales (P/S) ratio was the best performing single value factor over the period he tested. His approach at the time assumed an investor bought the basket of low P/S stocks on January 1st, held it for one year, and re-balanced. In the 4th edition of his book, O'Shaughnessy improved his backtesting methodology to include rolling periods.

The outcome? After considering rolling periods, he found that low Price to Sales was not the best performing single value factor; the low EV/EBITDA strategy was. This is a clear example of how neglecting seasonal effects can make backtest results misleading.

The takeaway here is that if you see strategies boasting high returns, be sure to find out when in the year the re-balance date falls, and whether the results cover multiple start months within the year.

Concluding Remarks

In Part 1 of this series, we have looked at some of the minimum requirements for appropriate backtesting. In summary, the investment thesis being tested should be rational, and backtesting is intended for portfolio investing only. We also discussed the importance of testing as far back as possible, and over as many different rolling periods as possible.

In Part 2 we will look at how to interpret backtested data. Stay tuned!

Follow me on Seeking Alpha for more articles on Quantitative Value Investing!

Disclosure: I am/we are long AGX, VLO, BPI, HPQ, SSL, TA, ARLP, NSU, RPXC. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.

Additional disclosure: I am a user of Portfolio123.com and have included affiliate links in the article above.