This brief article discusses the perils of portfolio allocation, strategic or tactical, when multiple trials are used to determine the portfolio's composition and parameters. It is by no means a complete treatment, only a short introduction to the issues involved.

Some books and articles propose passive or active portfolio allocation schemes that claim superior risk-adjusted returns relative to the S&P 500 total return benchmark (NYSEARCA:SPY). In most cases, the allocations are based on hindsight. Even worse, when the proposed portfolios use ETFs, their statistical significance is limited by the available data history. Investors should be educated about the perils of hindsight bias.
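A rough way to see how a short history limits significance, assuming i.i.d. returns, with SR the annualized Sharpe ratio of the portfolio's excess return (over cash or a benchmark) and T the backtest length in years: the t-statistic of the mean excess return is approximately

$$
t \approx \mathrm{SR}\,\sqrt{T},
\qquad
\mathrm{SR}_{\min} \approx \frac{1.96}{\sqrt{T}} .
$$

So clearing the two-sided 5% hurdle of t = 1.96 requires an annualized Sharpe ratio of about 0.88 with 5 years of history, but only about 0.44 with 20 years.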

Due to the proliferation of online services that offer portfolio-level backtesting of strategic or even tactical allocation schemes, it has become easier to try many combinations of assets and portfolio compositions in search of a mix that has outperformed the market.
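Mechanically, such a service compounds the weighted returns of the chosen assets. Below is a minimal sketch of a fixed-weight (constant-mix) backtest, assuming rebalancing every period and no costs, taxes or slippage; the function name and the returns are hypothetical and not taken from any particular service.

```python
from typing import Sequence


def constant_mix_equity(
    weights: Sequence[float],
    period_returns: Sequence[Sequence[float]],
    start_value: float = 1.0,
) -> list[float]:
    """Equity curve of a portfolio rebalanced back to fixed weights every period.

    weights: target weight per asset (must sum to 1).
    period_returns: one row per period, one simple return per asset.
    Costs, taxes and slippage are ignored (simplifying assumption).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    equity = [start_value]
    for row in period_returns:
        if len(row) != len(weights):
            raise ValueError("each row needs one return per asset")
        # Rebalancing every period makes the portfolio return the weighted sum.
        port_ret = sum(w * r for w, r in zip(weights, row))
        equity.append(equity[-1] * (1.0 + port_ret))
    return equity


# Hypothetical 80/20 two-asset example with made-up returns.
curve = constant_mix_equity([0.8, 0.2], [[0.10, 0.00], [0.00, 0.10]])
print(curve)
```

Each new choice of assets or weights run through a function like this counts as one more trial.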

This practice is dangerous, especially when there is not enough data to perform an out-of-sample test. Even out-of-sample tests do not rescue it: as soon as the first such test is performed and its result is used to reject one candidate and try another, subsequent tests can no longer be considered out-of-sample because of data snooping.
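A small simulation makes the point concrete. The sketch below assumes normally distributed daily returns with zero mean, so no candidate has any real edge, and uses a hypothetical number of trials. It picks the best of many random "portfolios" by in-sample t-statistic and then re-tests that same portfolio on fresh data.

```python
import math
import random
import statistics


def t_stat(returns: list[float]) -> float:
    """t-statistic of the mean return against zero."""
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(len(returns)))


def best_of_noise(n_trials: int = 1000, n_periods: int = 100, seed: int = 1) -> tuple[float, float]:
    """Select the best of n_trials zero-skill portfolios in sample, then retest it.

    Every return comes from the same zero-mean normal distribution, so any
    in-sample winner is pure luck (an assumption of this illustration).
    """
    rng = random.Random(seed)
    best_in, best_out = -math.inf, 0.0
    for _ in range(n_trials):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        t_in = t_stat(in_sample)
        if t_in > best_in:
            # Keep the in-sample winner and record how it does on unseen data.
            best_in, best_out = t_in, t_stat(out_sample)
    return best_in, best_out


t_in, t_out = best_of_noise()
print(f"best in-sample t = {t_in:.2f}, same portfolio out of sample t = {t_out:.2f}")
```

With enough trials the in-sample winner will typically clear conventional significance thresholds, while its out-of-sample t-statistic is just another draw from the null. If that out-of-sample result is then used to reject and retry, it too becomes part of the search.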

Data snooping is essentially what ruins naive portfolio analysis. Usually, one starts with an idea for a portfolio, PORT1, which is significant at the 5% level with probability P1. If PORT1 is not acceptable and another candidate, PORT2, with probability P2, is considered, then the following does not hold:

PORT1 is significant with P1

PORT2 is significant with P2

But rather, this holds:

PORT1 is significant with P1

PORT2 is significant with P2/k

where k ≥ 1 is a correction factor that, in general, depends on the number of trials and the parameters involved in the portfolios.
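In the simplest case, a Bonferroni correction, k equals the number of trials m. Assuming m trials evaluated at a nominal level α, each candidate is held to the threshold α/m instead of α or, equivalently, its p-value is multiplied by m:

$$
p_i \le \frac{\alpha}{m}
\quad\Longleftrightarrow\quad
p_i^{\text{adj}} = \min\left(1,\; m\,p_i\right) \le \alpha .
$$

For example, with α = 0.05 and m = 10 trials, a portfolio must reach p ≤ 0.005 on its own to be called significant at the 5% level.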

The correction of the probability P2 is essentially a Bonferroni correction for multiple comparisons. There are other, more advanced correction methods, but the main idea is the same: as the number of trials used to find a portfolio that outperforms some benchmark increases, the statistical significance of any single result decreases, and appropriate corrections must be made to account for that. For an academic treatment of this important subject, see the paper by Campbell R. Harvey and Yan Liu, *Evaluating Trading Strategies*.
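Holm's step-down method is one of the more advanced alternatives: it controls the same familywise error rate as Bonferroni but never rejects fewer hypotheses. A minimal sketch of both adjustments, with hypothetical p-values standing in for the results of three portfolio trials:

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: multiply each by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from three portfolio trials.
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(holm(raw))
```

Under either method, the two trials with raw p-values of 0.03 and 0.04, which would pass at the 5% level in isolation, no longer do once all three trials are accounted for.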

The important thing to know about the correction factor k is that it grows with the number of trials. For example, in a strategic allocation scheme, depending on the number of assets considered, k may be as large as 100 after 10 trials. In tactical allocation schemes, depending on the number of parameters that also determine timing, k can reach 1,000 or more. This means that as the number of trials increases, each candidate portfolio must demonstrate ever higher statistical significance. It further means that most portfolios judged at unadjusted significance levels (k = 1) may be artifacts of multiple comparisons and data-mining bias. Using a portfolio of unknown significance can be a costly move. Sadly, because of the multiple-comparison issues discussed above, the statistical significance of 99% of the portfolios proposed in books and articles is not known. Possibly the only significant schemes are the few conceived from fundamental considerations about the markets, such as the 60/40 portfolio.
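Why the correction must grow with the number of trials can be seen from the familywise error rate, the probability that at least one zero-skill trial looks significant by chance. A minimal sketch, assuming independent trials; correlated trials change the numbers, but the rate still cannot decrease as trials are added:

```python
def familywise_error_rate(n_trials: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_trials independent, zero-skill trials
    looks significant at level alpha purely by chance."""
    return 1.0 - (1.0 - alpha) ** n_trials


for n in (1, 10, 100):
    print(n, round(familywise_error_rate(n), 3))
```

At the 5% level, the chance of at least one false discovery is about 40% after 10 trials and over 99% after 100.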

Below, I provide an example of how data-mining bias and multiple comparisons can mislead people into believing they have found something when, in fact, they have nothing. The example comes from a recent Seeking Alpha article proposing an 80/20 allocation to SPLV (low-volatility S&P 500) and TMF (3x leveraged TLT). Unsurprisingly, after the recent rally in bonds, TMF has done very well. Below is the portfolio's performance since the inception of SPLV (blue line):

The chart shows that the portfolio underperforms SP500TR (red line) on a risk-adjusted basis. Although its YTD performance exceeds 10%, it has been volatile since 2012, with an 11% maximum drawdown versus 8.3% for SP500TR.
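Maximum drawdown is the largest peak-to-trough fall of the equity curve, measured from the running peak. A minimal sketch of the usual calculation, run on a made-up equity curve rather than the data behind the chart:

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the running peak.

    Assumes a non-empty curve of positive values.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # highest value seen so far
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical equity curve (not the portfolio in the chart).
print(max_drawdown([100, 120, 90, 110, 80, 130]))
```

Here the worst decline is from 120 to 80, that is, one third of the peak.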

This is just one example of what is served to the public by users of online backtesting software after trying many combinations of assets and allocation schemes. I hope no robo-advisor uses the same practice to determine allocations.

Quants and investors should understand the perils of trying to identify a good portfolio through backtesting. Unless there are sound fundamental reasons why a portfolio should outperform buy-and-hold, chances are high that it will fail in the future.

More information about the perils of backtesting and data mining can be found in my book *Fooled by Technical Analysis: The perils of charting, backtesting and data-mining.*

**Disclosure:** I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it. I have no business relationship with any company whose stock is mentioned in this article.