Predictive analytics intrudes on our lives every day. Amazon recommends products (sometimes admittedly bizarre ones -- I've received ads for headlights, guitars, and baby diapers), Stop & Shop sends targeted grocery coupons, and the same politician crafts different ads for Republicans and Democrats. Eric Siegel takes the reader into this powerful and potentially troubling world in Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (Wiley, 2013).
The glut of data is growing by "an estimated 2.5 quintillion bytes per day (that's a 1 with 18 zeroes after it). ... As data piles up, we have ourselves a genuine gold rush. But data isn't the gold. I repeat, data in its raw form is boring crud. The gold is what's discovered therein" (pp. 26-27). Enter predictive analytics, "technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions" (p. 37).
I'm going to look at a single application of predictive analytics: data-driven black-box trading. For this, Siegel turned to John Elder, now head of the largest predictive analytics services firm in North America, who claims that "Wall Street is the hardest data mining problem" (p. 74). Early in his career, while still a graduate student at the University of Virginia, Elder reverse-engineered a black-box trading system that claimed 70% accuracy in predicting whether the S&P 500 would go up or down the following day. It turned out that the so-called predictions "were based in part on a three-day average calculated across yesterday, today … and tomorrow. The scientists had probably intended to incorporate a three-day average leading up to today, but had inadvertently shifted the window by a day. Oops. … Any prediction it would generate today could not incorporate the very thing it was designed to foresee -- tomorrow's stock price" (p. 77).
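It's easy to see how seductive that off-by-one bug would look. The toy sketch below (my own illustration, not a reconstruction of the actual system; it assumes NumPy) runs the flawed centered average and the intended trailing average on a pure random walk -- data with nothing predictable in it. The leaky version still "calls" tomorrow's direction roughly three times out of four, in the same neighborhood as the 70% the system's builders claimed.

```python
import numpy as np

rng = np.random.default_rng(42)

# A pure random walk: tomorrow's move is unpredictable by construction.
n = 5000
prices = 1000 + np.cumsum(rng.normal(size=n))

# Align yesterday / today / tomorrow for every day where all three exist.
yesterday, today, tomorrow = prices[:-2], prices[1:-1], prices[2:]

# The buggy signal: a "three-day average" accidentally centered on today,
# so it quietly includes tomorrow's price.
leaky_avg = (yesterday + today + tomorrow) / 3

# Naive rule: predict "up" when the average sits above today's price.
leaky_pred = leaky_avg > today
actual_up = tomorrow > today
leaky_acc = np.mean(leaky_pred == actual_up)

# The intended signal: a trailing three-day average ending today.
trailing_avg = (prices[:-3] + prices[1:-2] + prices[2:-1]) / 3
trailing_pred = trailing_avg > prices[2:-1]
trailing_acc = np.mean(trailing_pred == (prices[3:] > prices[2:-1]))

print(f"leaky accuracy:    {leaky_acc:.2f}")    # ~0.75 on pure noise
print(f"trailing accuracy: {trailing_acc:.2f}") # ~0.50, as it should be
```

The leaky average beats today's price exactly when tomorrow's move exceeds today's, so it trivially encodes the future it claims to predict; the trailing average, built only from the past, does no better than a coin flip.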
After he finished his dissertation and became a postdoc, Elder created a trading system that detected a slight but repeatable pattern among the overwhelming market noise. He risked everything he had on his system:
'Going live with black box trading is really exciting and really scary,' says John. 'It's a roller coaster that never stops. The coaster takes on all these thrilling ups and downs, but with a very real chance it could go off the rails.'
As with baseball, he points out, slumps aren't slumps at all -- they're inevitable statistical certainties. Each one leaves you wondering, 'Is this falling feeling part of a safe ride, or is something broken?' A key component of his system was a cleverly designed means to detect real quality, a measure of system integrity that revealed whether recent success had been truly deserved or had come about just due to dumb luck.
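Siegel doesn't reveal how Elder's integrity measure actually worked, but one simple sketch of the idea -- my own illustration, under the assumption that trades can be modeled as independent win/lose outcomes -- is a binomial test: given the system's historical win rate, how likely is a losing run at least this bad if the edge is still real? A tiny probability suggests the edge itself has decayed, not just the luck.

```python
from math import comb

def slump_p_value(wins: int, trades: int, edge: float) -> float:
    """P(seeing `wins` or fewer winning trades out of `trades`)
    if the system's true win rate is still `edge` (binomial model)."""
    return sum(comb(trades, k) * edge**k * (1 - edge)**(trades - k)
               for k in range(wins + 1))

# Hypothetical system with a historical 55% win rate.
EDGE = 0.55

# A painful recent run: 40 wins in the last 100 trades.
# Very unlikely under a live 55% edge -- looks like something is broken.
print(f"{slump_p_value(40, 100, EDGE):.4f}")

# A mild dip: 50 wins in 100 trades -- well within ordinary variation,
# i.e. the "inevitable statistical certainty" kind of slump.
print(f"{slump_p_value(50, 100, EDGE):.4f}")
```

Real trades are neither independent nor identically distributed, so a production version would need to account for changing volatility and position sizes -- but the skill-versus-luck question reduces, at its core, to exactly this kind of comparison against what chance alone would produce.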
John's system was a phenomenal success, increasing his assets at a rate of 40% a year. Investors started signing up until finally the fund was managing a few hundred million dollars. But "after nearly a decade, the key measure of system integrity began to decline. John was adamant that they were running on fumes, so with little ceremony the entire fund was wound down. ... [A]ll the investors came out ahead" (p. 83).
I'm sharing this excerpt with the hope that someone can explain to me how one might go about measuring system integrity, at least in principle. You can post a comment below.