Exposure to data of all kinds is the new hazard that the information age has brought upon us, or should we call it the misinformation hazard, one that we will continue to face for the foreseeable future. The World Economic Forum lists the rising impact of misinformation as one of the critical risks of our times, citing examples such as YouTube alone receiving 48 hours of video uploads every minute. The more we adjust ourselves to this rising tide of 'loads' of information, the more we need a mental make-up for deciding what to reject and what to accept, as our brains continually process information (sometimes rationally and sometimes irrationally).
In statistics, we make two kinds of errors: rejecting a null hypothesis when it is actually true (a Type 1 error) and accepting a null hypothesis when it is actually false (a Type 2 error). Our mental faculties do this continually without statistical aid of any kind, and we have to depend on commentators and advisers who have neither the time nor the wherewithal to do it when a range of issues has to be dealt with almost hourly as the day progresses.
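To make the two kinds of error concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the two-sided z-test with a known standard deviation, the sample size of 30, the 5% significance level and the true shift of 0.5 are not taken from any real analysis. A Type 1 error is counted when a true null hypothesis is rejected, and a Type 2 error when a false one is not rejected.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test: return True if H0 (mean == mu0) is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def error_rates(true_shift, n=30, trials=2000, sigma=1.0, seed=1):
    """Estimate Type 1 and Type 2 error rates by repeated sampling."""
    rng = random.Random(seed)
    # Type 1: H0 is true (the mean really is 0) but the test rejects it.
    type1 = sum(
        rejects_null([rng.gauss(0.0, sigma) for _ in range(n)], 0.0, sigma)
        for _ in range(trials)
    ) / trials
    # Type 2: H0 is false (the mean is true_shift) but the test does not reject it.
    type2 = sum(
        not rejects_null([rng.gauss(true_shift, sigma) for _ in range(n)], 0.0, sigma)
        for _ in range(trials)
    ) / trials
    return type1, type2


type1, type2 = error_rates(true_shift=0.5)
print(f"Type 1 rate: {type1:.3f}, Type 2 rate: {type2:.3f}")
```

The Type 1 rate should hover near the chosen significance level, while the Type 2 rate depends on how large the true shift is relative to the noise. That is exactly the trade-off our unaided judgment has to make many times a day.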
I did find it amusing that when someone opens Internet Explorer early in the morning, at 7:15 a.m. Zurich time, the markets in eastern Asia are already at peak hours, and Tokyo, Shanghai and Hong Kong are already producing headline news, which is then carried forward by less potent exchanges such as the Bombay Stock Exchange. When the first European bourses open for early trading, the information becomes denser, with the Financial Times reporting analysis of the earlier Wall Street session that could affect the day's trade in Europe. Anyone who had already followed that session in New York, and had seen either a placid setting or 'topsy-turvy' volatility, would have found such analysis either biased or timidly explorative of the binding nature of stock trading, which has to deal with the impact of daily information on the minds of investors and also assess how investors are actually likely to behave given the change in information from the previous day.
Imagine having to weigh Type 1 and Type 2 errors across this multitude of analyses and their impact on investor response!
It would be interesting to understand the rising need for daily data loads and their analysis. For a CEO, the daily stock price of his own company is perhaps the single metric that embodies the cumulative appraisal (appreciation or rejection) returned by the company's investors, who have netted off all the good against the bad to arrive at a valuation of the company in terms of its expected future cash flows. This is what the wisdom of the crowds is all about: by churning through a few thousand information points (related to the company or unrelated) almost daily, investors appraise the company, and on their daily appreciation or rejection of the stock a company's valuation grows or dwindles.
Isolating a single company to do this is next to impossible, so investors have simplified the problem by looking at a group of companies. A further simplification by institutional investors has seen entire exchanges club a range of companies together to represent the do-gooders, as in the Dow Jones Industrial Average (where 30 companies are bunched together) or the S&P 500 (where 500 do-gooders are bunched). Either way, it gives investors a yardstick to compare what an individual stock did against the performance of the bunch, and vice versa. Doing this on an hourly or daily basis with any objectivity would really need large-scale use of portfolio analysis and advanced mathematics. But investors normally do not wait for such analysis when decisions have to be taken faster than the time available for doing it.
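The yardstick idea can be sketched in a few lines: compute the period-by-period return of a stock, do the same for the index, and look at the difference. The prices below are hypothetical, and the function names `simple_returns` and `excess_returns` are mine, chosen only for this illustration. Real portfolio analysis would also adjust for risk, dividends and much else.

```python
def simple_returns(prices):
    """Period-over-period simple returns from a list of closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def excess_returns(stock_prices, index_prices):
    """Stock return minus index return for each period."""
    stock = simple_returns(stock_prices)
    index = simple_returns(index_prices)
    return [rs - ri for rs, ri in zip(stock, index)]


# Hypothetical closing prices for one stock and its benchmark index.
stock = [100.0, 102.0, 101.0, 104.0]
index = [1000.0, 1010.0, 1005.0, 1015.0]

excess = excess_returns(stock, index)
for day, value in enumerate(excess, start=1):
    print(f"day {day}: excess return {value:+.4f}")
```

A positive number means the stock did better than the bunch on that day, a negative number that it did worse.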
Large-scale use of heuristics is the norm of the day, but one has to be careful, because heuristics can give erroneous results, as Taleb's famous example of the Turkey shows. The example matters whenever daily data of any statistic is analyzed. The story runs like this: if one plots the weight of the Turkey on a graph, it shows a rising trend right up to Thanksgiving Day, when the weight of the Turkey becomes meaningless. The best example of this is the stock of Lehman Brothers, which looked much the same until the day the firm went bankrupt. In statistical language, small-probability, high-impact events that are absent from the current data can lead us to wrong inferences about the current state of things (by taking their probability as zero), and when that 'risky' event eventually materializes in the long run it takes us by storm. There is also the inverse Turkey example, where small-probability, high-impact events are present in the data now, vitiating it, but may not be present later on. The best example is a chaotic economy like Greece, with all kinds of fiscal problems and a debt overhang, which forces policy makers to ignore the fact that such cathartic events would be absent in the long term, and that policy stances should take cognizance of that.
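The Turkey problem can be put in a small sketch. The numbers are hypothetical: a weight that grows by 0.01 kg a day for 1000 days, and a Thanksgiving Day on which it drops to zero. The two helper functions are illustrative names of my own, not anything from Taleb. The point is only that a probability estimated from the history, and a forecast extrapolated from it, both say nothing about an event the history has never contained.

```python
def naive_forecast(history):
    """Extrapolate the average daily change one step ahead."""
    avg_change = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + avg_change


def empirical_drop_probability(history):
    """Share of past days on which the series fell."""
    changes = [b - a for a, b in zip(history, history[1:])]
    return sum(c < 0 for c in changes) / len(changes)


# Hypothetical data: the Turkey gains 0.01 kg every day for 1000 days.
weights = [5.0 + 0.01 * day for day in range(1000)]

forecast = naive_forecast(weights)
p_drop = empirical_drop_probability(weights)
actual_next_day = 0.0  # Thanksgiving Day, assumed for illustration

print(f"estimated probability of a fall: {p_drop:.3f}")
print(f"forecast for day 1001: {forecast:.2f} kg, actual: {actual_next_day:.2f} kg")
```

The history assigns zero probability to a fall, and the forecast keeps rising, right up to the day it is wrong by the entire weight of the Turkey.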
The presence of big outliers is a very common phenomenon. In fact Taleb, in his latest book 'Antifragile', goes one step further and shows with many examples that the most uncommon (low-probability) cases make their presence felt most conspicuously: for example, less than 0.2% of authors make it to the best-seller list, or the largest number of commentaries are written by only a few people, and the list goes on. The fact remains that we are largely ignorant that what we assume to be a normal distribution is either concavely or convexly skewed.
How does this affect our daily analysis of data? To answer, let me take the example of opacity in daily data and how current analysis fails to understand the intrinsic factors. I had tracked U.S. housing starts for almost five years while studying the rise of Aluminum on the LME in the period 2001-2006. In this period U.S. housing starts kept building a stock of houses that had almost no occupiers, and while this was happening we saw a flurry of activity to keep pushing loan growth through many schemes that had the tacit connivance of a host of agencies, the regulatory ones included. In much of the economic data, housing starts featured as the single biggest contributor to U.S. economic health. There were two reasons for this: first, it created unprecedented loan growth that built a financial industry behemoth attracting thousands of skilled people, and second, it helped some of the ancillary industries related to housing. But for those like me who were tracking daily, monthly and yearly data on U.S. housing starts, it was like following a Turkey with an unknown Thanksgiving Day.
How do we remove the Turkeys from the data, the inverse Turkey included? This has to be done through a normalization process, but that can only happen if our mental set-up first accepts that the data is actually skewed, and that to transform a convex or concave distribution one has to take out the assignable causes that make it skewed before any meaningful analysis can be done. Unfortunately, the world around us is happy to have skewed data, as it helps in instant gratification, in projecting benefits that would vanish if the data were normalized. On the contrary, data is insidiously normalized with the sole purpose of projecting a better picture, never a worse one.
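As a rough illustration of what taking out assignable causes does to skew, the sketch below measures the skewness of a small series before and after removing the observations that carry a known cause. The figures and the 'one-off stimulus' label are hypothetical, and the moment-based skewness coefficient is just one common way of measuring asymmetry. Deciding what counts as an assignable cause remains a matter of judgment, which is precisely where normalization can be abused.

```python
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical observations, each tagged with an assignable cause or None.
observations = [
    (10.0, None), (11.0, None), (9.0, None), (10.5, None), (9.5, None),
    (10.0, None), (30.0, "one-off stimulus"), (11.0, None), (9.0, None),
    (28.0, "one-off stimulus"),
]

raw = [value for value, cause in observations]
adjusted = [value for value, cause in observations if cause is None]

print(f"skewness of raw data:      {skewness(raw):.2f}")
print(f"skewness of adjusted data: {skewness(adjusted):.2f}")
```

The raw series is strongly skewed by the two tagged values, while the adjusted series is symmetric. The same few lines could just as easily be used to strip out the bad news and keep the good, which is why the tagging has to be open to question.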
These fallacies lie outside the realm of statistics and have more to do with the 'results before analysis' syndrome, something that opacity helps to administer, much to the loss of any general understanding of what the intrinsic factors are and how they affect the future. This is one of the reasons that financial data has moved towards opacity as never before, since few can question its veracity.
Opacity helps the wisdom of the crowds to do wonders, in the short run.