Data mining is a classic shortcut: it leapfrogs the hypothesis and methodology stages and goes directly to data analysis. The process is to gather a large amount of data and run statistical analyses every which way, looking for some relevant relationship to fall out. Then, circle back and dream up a reason (a hypothesis) why the relationship is plausible. Run enough tests and some will look significant by chance alone. Easy access to data and statistical software makes data mining simple.
The first time I saw data mining in action was when I was consulting for a major New York bank’s economics department in the 1970s. Computer time-sharing and the newly available Chase Econometrics database made data mining doable. The bank formed a group of analysts whose job, the head person excitedly told me, was to test every possible correlation, trying to find something that looked significant.
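To see why testing every possible correlation is dangerous, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of what the bank did: the series are pure random noise, the sample size of 30 observations and the count of 1,000 candidate series are arbitrary choices, and the cutoff of 0.361 is the approximate two-sided 5% critical value for a correlation with 30 observations. The function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def count_spurious_hits(n_series=1000, n_obs=30, r_crit=0.361, seed=0):
    """Correlate one noise 'target' with many noise 'candidates'.

    Every series is independent random noise, so any correlation that
    clears the cutoff is a false discovery.
    Assumption: r_crit=0.361 is the approximate 5% two-sided critical
    value for n_obs=30; it must be changed if n_obs changes.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson(target, candidate)) > r_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 1000 pure-noise series look 'significant' at the 5% level")
```

Roughly one test in twenty should clear the bar even though no real relationship exists anywhere in the data. An analyst who reports only the winners, and then dreams up a story for each, has found nothing.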
Forgetting the null hypothesis
A null hypothesis is the proposition that there is no relationship among the data. Using yesterday’s example, the null hypothesis would be, “the Golden Globes best picture award announcement has no influence on the Academy Awards choice.” Like “innocent until proven guilty” in a trial, a hypothesis must be considered “null until proven otherwise.” That test is much more rigorous than the common check for reasonableness or plausibility.
Interpreting a correlation as cause and effect
As in yesterday’s example, the error is to treat the order of events in a time sequence as proof of influence (the Golden Globe Awards precede the Academy Awards, so any similarity of choices must be due to the former influencing the Academy’s members). A simpler example perhaps shows the problem better: the town clock strikes twelve, and then the townspeople eat lunch, every day! The Pavlov effect? That question requires more study. The correlation itself cannot provide the answer.
Presuming that data have a “normal” distribution
I believe this is the biggest problem affecting statistical analysis of investments today. Most statistical methodologies are designed for normal distributions (the bell curve). However, much in investing is not normally distributed. This means results that assume normality are truly unusable. Examples cover the waterfront: leverage, derivatives, convertibles, shorting and credit/bankruptcy risk (think junk bonds). These distributions are far from normal. The resulting bad analysis has produced many dazzling “surprises” and losses.
A companion area is “fat tails,” referring to bulges at the ends of an otherwise normally distributed set of data. The stock market has them, caused by those bursts of enthusiasm at a top and bouts of fear at a bottom. Both are emotionally based, so they accentuate the moves at the extremes. When such moves occur, we hear them called 100-year events that no one could have seen coming. Actually, they happen more often than that, and the data reveal the possibility.
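One rough way to check a return series for fat tails is to count the extreme moves it actually contains and compare that with the number a bell curve would predict. The Python sketch below does this. Everything in it is an assumption for illustration: the function name `tail_check` is hypothetical, the 4-standard-deviation cutoff is arbitrary, and the returns are simulated (mostly calm days with an occasional turbulent one), not real market data.

```python
import random
from statistics import NormalDist, mean, pstdev


def tail_check(returns, k=4.0):
    """Compare observed k-sigma moves with the normal-curve expectation.

    Returns (observed, expected): the number of returns more than k
    standard deviations from the mean, and the number a normal
    distribution would predict for a sample of the same size.
    """
    mu, sigma = mean(returns), pstdev(returns)
    observed = sum(1 for r in returns if abs(r - mu) > k * sigma)
    two_sided_tail = 2 * (1 - NormalDist().cdf(k))
    expected = len(returns) * two_sided_tail
    return observed, expected


# Hypothetical data: 95% calm days (1% volatility) mixed with
# 5% turbulent days (4% volatility). These numbers are invented
# purely to produce a fat-tailed series.
rng = random.Random(1)
returns = [
    rng.gauss(0, 0.04 if rng.random() < 0.05 else 0.01)
    for _ in range(10_000)
]

observed, expected = tail_check(returns)

if __name__ == "__main__":
    print(f"4-sigma moves observed: {observed}")
    print(f"4-sigma moves a normal curve predicts: {expected:.2f}")
```

In this simulated series the normal curve predicts fewer than one such move in 10,000 days, while the data contain many. Run against a real return history, the same comparison shows how misleading the “100-year event” label can be.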
So, look out for shortcut statistics. Their erroneous conclusions could lead your assets astray and put them at risk.
Disclosure: No positions