Break Your Strategy: How To Stress-Test Your Quantitative Models
Summary
- Suggests that quantitative investors improve the robustness of their systems by subjecting them to stress tests.
- Identifies ways to produce variations on an investment or trading strategy as the basis for further testing.
- Provides concrete examples of stress-testing and the results of applying these tests to actual multifactor systems.
Building and Breaking Models
If you’re a quantitative investor or trader, you build a model and then backtest it to see if it has worked in the past; if you’re like most people, you try to improve your model with repeated backtests. You’re operating under the assumption that there will be at least some modest resemblance between what has worked in the past and what will work in the future. (If you didn’t assume that, you wouldn’t backtest at all.)
But what few backtesters do after building their model is to try to break it by subjecting it to stress tests. A truly robust model should withstand every moderate attempt to break it. Only then should it be put into practice.
This article will outline some techniques for stress-testing quantitative models. I use Portfolio123 to build and backtest my models. If you model on a different platform, the specific tools I mention won’t carry over directly, but I’ll try to explain my techniques in language that can be adapted to other platforms.
General Guidelines for Stress Tests
Each model that one designs on Portfolio123 essentially consists of a ranking system and a universe to which it applies. One can add a lot of complexity to the model, but those are perhaps the two most important foundations. The universe consists of a root universe and then incorporates a variety of screening rules to eliminate stocks with low liquidity, high risk, low growth, high price, or whatever else you want to put in there. In order to do these tests, you must put as many of your screening and buy rules into your universe as you can. Each model then buys the top-ranked stocks according to its ranking system and sells them when their ranking falls to a certain point, buying new stocks to replace them. The number of holdings and the sell rules are some of the things we’re going to vary to see if the models will break.
Portfolio123 offers four very different ways of testing a particular ranking system and universe. We can use:
- a screen backtest, which rebalances to equal weight every rebalance period;
- a rolling screen backtest, which holds stocks for a set period and shows the returns of overlapping holding periods;
- ranking system performance, which separates the universe into quantiles according to the ranking and shows how each would have performed; and
- an actual simulation of the model, with buy and sell rules.
To try to break a model, we should try all four of these tests, varying parameters. We should vary the number of stocks held at one time; we should vary the universe rules, testing on subsets of the universe or on an altogether different root universe; we should vary the factor weights in our ranking system; and we should vary the time period tested, including, if possible, testing it on a time period that hasn’t been tested before.
A test fails if the results show a negative alpha or a negative excess return relative to a benchmark consisting of all the stocks in the root universe being tested (or if, in a rank performance test, the top bucket returns less than the middle buckets).
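For readers working outside Portfolio123, the excess-return part of this pass/fail rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, returns are assumed to arrive as equal-length lists of per-period decimals, and alpha (which needs a regression against the benchmark) is not covered.

```python
from math import prod


def annualized_return(period_returns, periods_per_year):
    """Compound per-period returns (0.02 means 2%) into an annual rate."""
    if not period_returns:
        raise ValueError("need at least one period of returns")
    growth = prod(1.0 + r for r in period_returns)
    years = len(period_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def fails_excess_return_test(strategy_returns, benchmark_returns,
                             periods_per_year=52):
    """Return True when annualized excess return over the benchmark is negative.

    Assumption: both series cover the same periods, and the benchmark
    series represents all the stocks in the root universe being tested.
    """
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("series must cover the same periods")
    excess = (annualized_return(strategy_returns, periods_per_year)
              - annualized_return(benchmark_returns, periods_per_year))
    return excess < 0.0
```

The strategy series should already be net of slippage, so that a test is not passed on transaction costs that were never charged.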
So let’s take a closer look at these stress tests.
Four Sets of Variations
- First, vary the number of stocks being held. Test it on the top ten, top twenty, top fifty, and top one hundred.
- Second, vary the root universe. Test it on the S&P 500, the Russell 1000, the S&P 1500, the Russell 3000, Canadian stocks, international stocks only, and universes consisting of only certain sectors. Be sure to adjust the slippage accordingly: testing on the Russell 3000 will require higher transaction costs than testing on the S&P 500. Change your screening rules by varying your hard limits by 10% or 15%.
- Third, vary the factor weights in your ranking system to a moderate degree. Add, say, 3% to five factors and subtract 3% from another five factors, or add 3% to five factors and renormalize all the weights so that they sum to 100%. Then go back to your original weights and do the same with a different set of factors.
- Fourth, vary the time period. Test it on the last ten years, fifteen years, and three years; go back and test it on other discrete time periods as well.
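The weight variation in the third point is easy to automate if your ranking system's weights can be exported. The sketch below is one possible way to do it, not a Portfolio123 feature: `perturb_weights` is a hypothetical helper, the weights are assumed to be percentages in a plain dictionary, and the factors to bump are chosen at random.

```python
import random


def perturb_weights(weights, bump=3.0, n_up=5, n_down=5, seed=None):
    """Add `bump` points to n_up factors, subtract it from n_down others,
    then rescale so the weights again sum to 100.

    `weights` maps factor name -> weight in percentage points.
    A weight is never pushed below zero.
    """
    if n_up + n_down > len(weights):
        raise ValueError("not enough factors to perturb")
    rng = random.Random(seed)
    chosen = rng.sample(list(weights), n_up + n_down)
    raised, lowered = chosen[:n_up], chosen[n_up:]

    new_weights = dict(weights)  # leave the original weights untouched
    for name in raised:
        new_weights[name] += bump
    for name in lowered:
        new_weights[name] = max(0.0, new_weights[name] - bump)

    total = sum(new_weights.values())
    return {name: 100.0 * w / total for name, w in new_weights.items()}


# Example: twenty equally weighted (hypothetical) factors.
original = {f"factor_{i}": 5.0 for i in range(20)}
variant_a = perturb_weights(original, seed=1)            # +3 / -3
variant_b = perturb_weights(original, n_down=0, seed=2)  # +3, then normalize
```

Because each call starts from the original weights, every variant is a moderate departure from the system you actually built, never a departure from a previous variant.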
If you do all this using a screen backtest, a rolling screen backtest, a ranking system performance test, and a simulation, you’ll be doing thousands of stress tests. This is obviously impractical, so it's best to design a subset of a dozen or so stress tests that will help you try to break your system.
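To see how quickly the combinations multiply, the variations can be enumerated. The lists below are illustrative labels drawn from the options above, not a complete inventory, and the random draw is only a stand-in for deliberately choosing the dozen tests most likely to break your system.

```python
import itertools
import random

# Illustrative option lists (labels only; nothing is run against a platform).
TEST_TYPES = ["screen", "rolling_screen", "rank_performance", "simulation"]
HOLDINGS = [10, 20, 50, 100]
UNIVERSES = ["S&P 500", "Russell 1000", "S&P 1500", "Russell 3000",
             "Canada", "International"]
LIMIT_SHIFTS = ["base", "looser_10pct", "tighter_10pct"]
WEIGHT_SETS = ["original", "variant_a", "variant_b"]
PERIODS = ["last_3y", "last_10y", "last_15y"]


def build_grid():
    """Return every combination of the option lists as a list of tuples."""
    return list(itertools.product(TEST_TYPES, HOLDINGS, UNIVERSES,
                                  LIMIT_SHIFTS, WEIGHT_SETS, PERIODS))


def pick_subset(grid, k=12, seed=0):
    """Draw k distinct combinations; the seed makes the draw repeatable."""
    return random.Random(seed).sample(grid, k)


grid = build_grid()
subset = pick_subset(grid)
print(len(grid), "possible tests;", len(subset), "selected")
```

With these lists the grid already holds 2,592 combinations, before adding sector universes or more time periods. Not every combination is meaningful (the number of holdings, for instance, does not apply to a rank performance test), which is one more reason to pick the subset by hand.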
Practical Stress Tests
For the purposes of this article, I designed five very different systems that backtested quite well, and then put them through a dozen stress tests. I won't describe all these in detail, as it would be tedious in the extreme. Suffice it to say that three out of the five systems I tested failed a test. The two toughest tests were as follows:
- I ran a 10-bucket rank performance test on one variation of my ranking systems over the last fifteen years with a rebalance period that matched my actual average holding period and my root universe being the S&P 1500. I then compared the top bucket to the middle buckets. In one case, the top bucket was lower than the average of the two middle buckets.
- I ran a rolling screen backtest of the top 100 stocks, using the Russell 3000 as my base universe, over the last four years only, using another variation of my ranking systems, with a holding period that matched my actual average. In two cases, the backtest's return failed to exceed the benchmark's, in part because of realistic slippage costs and in part because the last four years have been pretty terrible for small-cap value stocks.
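The bucket comparison in the first of these tests can be reproduced on any platform that exports each stock's rank and subsequent return. What follows is a simplified single-period sketch with hypothetical function names. A real rank performance test averages over many rebalance periods, and the middle-bucket logic assumes an even number of buckets.

```python
def bucket_mean_returns(ranked_returns, n_buckets=10):
    """Split stocks into equal-sized buckets by rank and average each bucket.

    `ranked_returns` is a list of (rank_score, return) pairs, where a
    higher score means a better rank. The result lists the mean return
    per bucket, lowest-ranked bucket first.
    """
    ordered = sorted(ranked_returns, key=lambda pair: pair[0])
    n = len(ordered)
    if n < n_buckets:
        raise ValueError("need at least one stock per bucket")
    means = []
    for b in range(n_buckets):
        lo = b * n // n_buckets
        hi = (b + 1) * n // n_buckets
        chunk = [ret for _, ret in ordered[lo:hi]]
        means.append(sum(chunk) / len(chunk))
    return means


def top_bucket_fails(means):
    """True when the top bucket is below the average of the two middle buckets."""
    mid = len(means) // 2
    middle_avg = (means[mid - 1] + means[mid]) / 2.0
    return means[-1] < middle_avg
```

With ten buckets, the two middle buckets are the fifth and sixth, which matches the comparison described above.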
Conclusion
The natural impulse for a quantitative investor or trader is to try to create a system by tweaking various inputs so that when backtested, it shows excellent returns. But unless the system is subjected to stress tests such as the ones I've discussed, it has a higher chance of breaking down when actually implemented with real money. Backtesting for failure may be just as important as backtesting for success.
This article was written by
Yuval Taylor is an author and analyst with 8 years of experience using multifactor ranking systems to buy and sell stocks. He focuses on microcaps and emphasizes evaluating every stock from as many angles as possible via algorithm. He is the leader of the investing group The Stock Evaluator.
Features of the service include: disclosure of Yuval’s personal positions, 2 unique portfolios, a spreadsheet of nearly 10,000 stocks rated from 0 to 100 weekly, and live chat for questions.
Analyst’s Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it. I have no business relationship with any company whose stock is mentioned in this article.