What You Need To Know About Backtesting A Strategy

by: Fred Piard


The author lists the pitfalls to avoid in a backtest.

Then he lists the most useful factors for interpreting one.

Finally, he gives a scientific view of what "luck" means in investing.

More and more investors are using simulations on past data (or backtests) to make investment decisions for the future. Unfortunately, they often make two mistakes: they assume that a simulation has predictive power, and they focus on the return, whereas the main interest of a simulation is in evaluating the risk of a strategy. Here are some non-exhaustive ideas on how to avoid the pitfalls and how to interpret a backtest.

A. The 7 Deadly Sins of Backtesting

1. Forgetting time factors

A simulation should use only data that were available at every decision point in the past. Make sure that your database has no bias by design: it must contain delisted and merged companies, index-based universes must be timestamped (for example, the S&P 500 membership changes over time), and fundamental data must be timestamped and available as of each point in time.
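As an illustration, here is a minimal sketch of what point-in-time filtering can look like. The pandas DataFrames and their columns (`membership` with `ticker`/`start`/`end`, `fundamentals` with an `available_date` for when a figure was actually published) are hypothetical; real databases will differ.

```python
import pandas as pd

def universe_at(membership: pd.DataFrame, date: pd.Timestamp) -> set:
    """Tickers that were actually in the index on `date`
    (hypothetical columns: ticker, start, end)."""
    mask = (membership["start"] <= date) & (membership["end"] >= date)
    return set(membership.loc[mask, "ticker"])

def latest_known(fundamentals: pd.DataFrame, ticker: str,
                 date: pd.Timestamp) -> float:
    """Most recent fundamental value *published* by `date` -- not merely
    reported for a fiscal period ending before `date`."""
    rows = fundamentals[(fundamentals["ticker"] == ticker)
                        & (fundamentals["available_date"] <= date)]
    return rows.sort_values("available_date")["value"].iloc[-1]
```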

2. Forgetting trading costs and slippage

Even with a low-cost broker, trading fees can be a heavy drag, especially on daily strategies. Moreover, when you first become familiar with backtesting, you may find strategies that should make you a billionaire in a few years. The problem is that they are unrealistic. Most of the time, penny stocks are included by default in the simulation universe. Their volume may not be sufficient to build a sizeable position, and even if it is, you may get a much worse price for the volume you buy than the theoretical price used in the simulation.
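To get a feel for the drag, here is a minimal sketch that subtracts an assumed round-trip cost from gross simulated returns; the fee and slippage figures are illustrative assumptions, not estimates for any particular broker or stock.

```python
import numpy as np

def net_returns(gross: np.ndarray, turnover: np.ndarray,
                fee_bps: float = 5.0, slippage_bps: float = 10.0) -> np.ndarray:
    """Per-period net returns after costs. `turnover` is the fraction
    of the portfolio traded each period; costs are in basis points
    (assumed figures for illustration only)."""
    cost = turnover * (fee_bps + slippage_bps) / 10_000.0
    return gross - cost

# Example: a daily strategy turning the whole book over every day loses
# 0.15% per day under this 15 bps assumption -- roughly 31% a year
# compounded, enough to erase most paper profits.
```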

3. Modelling short strategies for stocks

Short-selling strategies cannot be simulated in a reliable way on individual stocks, because the required data don't exist. There is no database available to retail traders that can tell you that a stock "S" was shortable by a broker "B" from a date "D1" to a date "D2", with an average borrowing rate "R".

Data availability aside, here is a short story that informed my decision to avoid selling something I don't own. In 2008, Volkswagen AG was caught in a short squeeze on the German stock market. The share skyrocketed from about 200 euros to almost 1,000 euros in two days, briefly becoming the largest capitalization in the world, then fell back to its previous level even faster. In the interval, short sellers covered their positions at any price, voluntarily or forced by their brokers. At least one of them, a 75-year-old billionaire and stock market veteran, committed suicide.

4. Complexity

Warren Buffett has written that we should invest only in something we can understand. If that is true for individual stocks, it is also true for strategies. You will be more confident keeping things as simple as possible, and simplicity is very often a clue to robustness. Avoid the proprietary features of your software: you master your strategy only if it is portable to another tool.

5. Over-fitting

The rules, parameter values, simulation interval, starting date, and number of positions may be involuntarily optimized for a particular market situation. A reasonable backtest should have a sufficient number of decision points, cover all market conditions in the study interval, and be run with different starting dates, rebalancing periods, and numbers of holdings. Ideally, a model should be tested out of sample, on data that were not used to design it.
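As a sketch of such a robustness check, the function below re-runs a strategy over a grid of settings. `run_backtest` is a hypothetical user-supplied callable, since the actual engine depends on your backtesting tool.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

def robustness_sweep(run_backtest: Callable[[str, int], float],
                     start_dates: List[str],
                     holdings: List[int]) -> Dict[Tuple[str, int], float]:
    """Re-run the same strategy over several start dates and position
    counts. Stable results across the grid suggest the settings were
    not fitted to one particular market path."""
    return {(d, n): run_backtest(d, n)
            for d, n in product(start_dates, holdings)}
```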

6. Forgetting control

Track and compare real and simulated results while you are executing a strategy. Investigate any divergence you detect, and don't hesitate to put a strategy in quarantine if you cannot find an acceptable explanation.

7. Misinterpretation

A simulation just gives you a data series, which may lead to a bad decision if not correctly interpreted. For example, a common mistake is to focus on the return, whereas the drawdown is a more important factor, from both a psychological and a financial point of view. An all-weather portfolio should contain at least three or four strategies based on different logics. There are two risks: investing in a bad strategy (statisticians call this the alpha risk) and rejecting a good one (the beta risk). The alpha risk is the more harmful: with good money management it means the loss of part or all of the allocated capital, and with bad or non-existent money management it may be the road to financial ruin. The beta risk means just a missed opportunity, which is not harmful unless the decision process makes you miss opportunities systematically. The next part shows a few factors that should be considered when interpreting a backtest and evaluating a model.

B. Evaluation factors

1. Average return

There are different ways of "averaging" a data series. Usually, the average return means the compound annual growth rate (CAGR): the constant annual return that, with gains reinvested, would have produced the same total return. If the total return over a period of Q years is T%, then:

CAGR = (1 + T/100)^(1/Q) - 1
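In Python, for instance, the formula above is a one-liner:

```python
def cagr(total_return_pct: float, years: float) -> float:
    """Compound annual growth rate implied by a total return
    of T% over Q years."""
    return (1.0 + total_return_pct / 100.0) ** (1.0 / years) - 1.0

# Example: +50% over 3 years -> (1.5)**(1/3) - 1, about 14.5% per year.
```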

2. Return by period

Returns by period help you understand how a strategy might perform in various market conditions. For example, you can run backtests by year, or by 6-month periods.
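A minimal sketch extracting calendar-year returns, assuming the equity curve is a pandas Series indexed by date:

```python
import pandas as pd

def yearly_returns(equity: pd.Series) -> pd.Series:
    """Calendar-year returns from an equity curve indexed by date."""
    year_end = equity.groupby(equity.index.year).last()
    return year_end.pct_change().dropna()
```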

3. Drawdown Maximum Depth

The current drawdown is the percentage loss between the highest portfolio value reached so far and the current portfolio value. The maximum drawdown over a period of time is the maximum of the drawdowns for all dates in the period. For leveraged and very volatile trading strategies, it is better to use intraday lows and highs. Sometimes drawdowns are calculated using weekly or monthly closing prices, which may be quite inaccurate in representing what really happens to a portfolio (see the sketch after item 4, which computes both the depth and the length).

4. Drawdown Maximum Length

The maximum time spent in a drawdown may have a profound impact on an investor's psychological state. It must absolutely be taken into account.
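Here is a minimal sketch computing both the maximum depth (item 3) and the maximum length (item 4) from an equity curve, assuming periodic close values; as noted above, intraday data would be more accurate for volatile strategies.

```python
import pandas as pd

def drawdown_stats(equity: pd.Series):
    """Maximum drawdown depth (a negative fraction) and maximum
    drawdown length (in periods) of an equity curve."""
    peak = equity.cummax()
    dd = equity / peak - 1.0          # current drawdown at each date
    max_depth = dd.min()              # most negative value
    underwater = dd < 0
    # Label each underwater run by the count of at-peak dates seen
    # before it; consecutive underwater dates share a label, so the
    # largest label count is the longest run.
    runs = (~underwater).cumsum()[underwater]
    max_length = int(runs.value_counts().max()) if underwater.any() else 0
    return max_depth, max_length
```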

5. Standard deviation

The standard deviation measures the dispersion of a data set. In finance, it is usually calculated on a time series of returns as a measure of historical volatility. In effect, it is used to quantify risk, especially in the calculation of the Sharpe and Sortino ratios (see the sketch after item 7).

6. Sharpe ratio

The Sharpe ratio is a risk-adjusted performance indicator: the higher, the better. It takes into account both the average excess return over a benchmark and the volatility. The Sharpe ratio favors strategies with good, steady returns.

7. Sortino ratio

The drawback of the Sharpe ratio is that it penalizes strategies that occasionally have exceptionally good years (because those years raise the volatility). The Sortino ratio corrects this by taking into account only the "negative", downside volatility. I like strategies with a good Sortino ratio, and I like them even more when the Sortino ratio is higher than the Sharpe ratio: it shows that the deviations from the mean are stronger upward than downward.
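Here is a minimal sketch of one common convention for items 5 to 7, assuming daily returns annualized with a sqrt(252) factor and a per-period risk-free rate as the benchmark; exact definitions vary between tools.

```python
import numpy as np

def sharpe(returns: np.ndarray, rf: float = 0.0, periods: int = 252) -> float:
    """Annualized Sharpe ratio; `rf` is the per-period risk-free rate,
    and the denominator is the standard deviation of excess returns."""
    excess = returns - rf
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

def sortino(returns: np.ndarray, rf: float = 0.0, periods: int = 252) -> float:
    """Like Sharpe, but penalizes only downside deviations."""
    excess = returns - rf
    downside = np.minimum(excess, 0.0)
    return np.sqrt(periods) * excess.mean() / np.sqrt((downside ** 2).mean())
```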

8. Kelly Criterion

Both previous ratios rely on Gaussian statistical hypotheses. Gambling theory is supposed to be more general, and the Kelly criterion is the best-known ratio in this field. Nevertheless, the Kelly criterion also relies on probabilistic hypotheses: the probability of a gain and the average-gain/average-loss ratio are supposed to be constant. Its value gives the theoretical percentage of available capital to bet on a strategy to maximize long-term performance. In reality it should be considered a limit. I use it as a probabilistic indicator of robustness. As it is invariant under leveraging, it cannot be used as a risk indicator.
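For a strategy viewed as a repeated bet with constant win probability p and constant average-gain/average-loss ratio b, the usual form is f = p - (1 - p)/b; a minimal sketch:

```python
def kelly_fraction(p_win: float, gain_loss_ratio: float) -> float:
    """Kelly criterion f = p - (1 - p) / b: the theoretical fraction
    of capital to bet to maximize long-term growth."""
    return p_win - (1.0 - p_win) / gain_loss_ratio

# With the figures quoted in part C below:
# kelly_fraction(0.616, 1.02) is about 0.24 -> at most 24% of capital.
```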

9. Correlation with benchmark

Correlation is a statistical measure of the closeness of the returns of two data series. The correlation with a benchmark is not good or bad in itself; it indicates how much a strategy is influenced by broad market moves. This information may be used when combining strategies.
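A minimal sketch, using the Pearson correlation of two return series of equal length:

```python
import numpy as np

def correlation(strategy: np.ndarray, benchmark: np.ndarray) -> float:
    """Pearson correlation between strategy and benchmark returns."""
    return float(np.corrcoef(strategy, benchmark)[0, 1])
```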

C. The Luck Factor

Even if the numbers look great, you must never forget the underlying hypotheses of the models. The Sortino ratio makes sense with a Gaussian distribution of returns, the Kelly criterion with a constant probability and gain/loss ratio, and the maximum drawdown makes sense only if the simulation period has covered bad enough market conditions. To give an example of the influence of a small variation in the model, here are the numbers for a strategy from my book: the dataset has 501 weekly returns, the average-gain/average-loss ratio is 1.02, and the measured probability of a winning week is 61.6%. I have measured that probability on a set of 501 data points, but the real probability of the game may be higher or lower. Taking the dataset size into account, a standard statistical formula tells us that the real probability has a 95% chance of being higher than 57.2%. At 57.2% it remains a winning game, but the Kelly criterion falls from 24% to 16%.
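Here is a sketch of that calculation using the normal approximation of the binomial confidence interval; the book's exact method may differ slightly, so the results are close to, but not exactly, the figures quoted above.

```python
import math

def p_lower_bound(p_hat: float, n: int, z: float = 1.96) -> float:
    """Lower bound of the ~95% normal-approximation confidence interval
    for a win probability measured as `p_hat` over `n` periods."""
    return p_hat - z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def kelly(p: float, b: float) -> float:
    return p - (1.0 - p) / b

p_low = p_lower_bound(0.616, 501)              # about 0.573
print(kelly(0.616, 1.02), kelly(p_low, 1.02))  # about 0.24 vs 0.15
```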

Here are 100 random equity curves simulated with each of the two probabilities:

[Charts: 100 simulated equity curves, one panel for p = 61.6% and one for p = 57.2%]

The horizontal bar represents the same value on both charts. On the second chart, not only is the time to reach it in the best-case scenario longer, but the uncertainty is also bigger. When for any reason a strategy is over-rated (methodology, insufficient sample of data and market conditions, etc.), the beam representing the possible futures of the portfolio value may be wider than you think. The wider the beam, the less certain the profitability. Selecting various strategies with thin beams, based on different logics, gives you a better chance of reaching your financial target in an acceptable time. This article summarizes a few paragraphs from my book Quantitative Investing, published by Harriman House in 2013.

Disclosure: The author has no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.

The author wrote this article themselves, and it expresses their own opinions. The author is not receiving compensation for it (other than from Seeking Alpha). The author has no business relationship with any company whose stock is mentioned in this article.