How do we know we have enough samples of whatever it is we are testing? How many trades is enough? When can we be confident we have enough trades to rely on the results of our back test?

Jacob Bernoulli, the famous Swiss mathematician, is best known for his introduction of the theorem known as the law of large numbers. Wikipedia defines the law of large numbers as “a theorem in probability that describes the long-term stability of the mean of a random variable. Given a random variable with a finite expected value, if its values are repeatedly sampled, as the number of these observations increases, the sample mean will tend to approach and stay close to the expected value (the average for the population).”

In other words, since the probability of flipping a fair coin and seeing a head (or tail) is 50%, the more we flip the coin, the closer the observed frequency of heads (or tails) will tend to get to 50%. Put differently, the observed frequency becomes a more reliable estimate of the expected value (50%) the more we flip.
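The coin-flip intuition can be checked with a small simulation. The sketch below is only illustrative: the function name `running_head_frequency`, the checkpoints, and the fixed seed are assumptions made for this example, and the coin is assumed to be fair with independent flips.

```python
import random


def running_head_frequency(n_flips, seed=42):
    """Simulate fair coin flips and record the observed frequency of heads
    at a few checkpoints along the way.

    The seed is fixed only so the example is repeatable (an assumption for
    illustration, not part of the original argument).
    """
    rng = random.Random(seed)
    checkpoints = (10, 100, 1_000, 10_000, 100_000)
    heads = 0
    observed = {}
    for i in range(1, n_flips + 1):
        # rng.random() < 0.5 is True (counted as 1) about half the time
        heads += rng.random() < 0.5
        if i in checkpoints:
            observed[i] = heads / i
    return observed


if __name__ == "__main__":
    for n, freq in running_head_frequency(100_000).items():
        print(f"{n:>7} flips: frequency of heads = {freq:.4f}")
```

Early checkpoints can sit well away from 50%, while later ones tend to settle close to it. That is the behavior the law of large numbers describes.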

In his book “Against the Gods”, Peter L. Bernstein describes how Bernoulli, reasoning about draws from a jar of marbles, concluded that a sample of 25,550 trials was necessary to achieve what he called “moral certainty.” Moral certainty he defined as odds of 1,000 in 1,001 that the observed ratio would fall within 2% of the true one.

Bernoulli understood that the law of large numbers is very important because it stabilizes the long-term outcomes of random events. In Las Vegas a player may beat the table on a given day, but the more the game is played, the more whatever edge is built into the game will dominate.

Is this true of trading systems though? I don’t think so.

First, testing a trading concept is far less reliable than testing coin tosses. In the *“real world”* there are few longer-term trading systems that can even produce 25,000+ trades as a sample size. Even if one could, Bernoulli had the advantage of working with a closed sample (5,000 marbles). Traders take in new data every day as new trades surface. This is the equivalent of throwing more marbles in the jar, perhaps even different colored marbles. Although a back test with 5,000 trades is likely to be more reliable than one with 200, its results still have limitations and should be viewed with skepticism.
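To put rough numbers on the 200 versus 5,000 trade comparison, here is a minimal sketch of the textbook margin of error for an observed win rate. It assumes trades are independent and drawn from one unchanging distribution, which is exactly the "closed jar" condition that real markets do not satisfy, so the figures are a best case. The function name `win_rate_margin`, the 50% win rate, and the 95% confidence level are all assumptions chosen for illustration.

```python
import math


def win_rate_margin(win_rate, n_trades, z=1.96):
    """Approximate margin of error for an observed win rate.

    Uses the normal approximation to the binomial; z=1.96 corresponds to
    roughly 95% confidence. Assumes independent, identically distributed
    trades (a hypothetical best case, not a property of real markets).
    """
    return z * math.sqrt(win_rate * (1.0 - win_rate) / n_trades)


if __name__ == "__main__":
    for n in (200, 5_000):
        margin = win_rate_margin(0.5, n)
        print(f"{n:>5} trades: observed win rate 50% +/- {margin:.1%}")
```

Because the margin shrinks with the square root of the number of trades, 25 times as many trades only narrows it by a factor of five. And if new "marbles" keep entering the jar, even that narrower band overstates how much the back test really tells us.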

Charles Maley

www.viewpointsofacommoditytrader.com