Surprise! People Don't Trust Government Data

by: John Lounsbury

Okay, so I'm being sarcastic. Simon Briscoe discusses survey results that are similar across the U.K., Europe and the U.S. Between 85% and 91% of those surveyed, depending on the country, believe that government statistics are manipulated. About 70% in all countries believe the problem is compounded by spin from politicians and media.

Just 6% of people in the U.K. believe that government statistics are "honest". The number "surges" to 10% in France and Germany. The survey was conducted in all countries by The Financial Times.

I spend a lot of time studying data from government agencies, private industry groups and professional economic analysis entities. I am much more positive on the "honesty" factor; much of the data appears to be free from such actions as selective deletion or biased sampling. But that does not mean there are not problems. Some of these are discussed below.

Measurement Uncertainty

There are many problems with sampling error (i.e., uncertainty in how well the sample represents the entire population). In the Dept. of Labor data, the employment number has a measurement uncertainty of around +/- 300,000 every month. This error is often larger than the actual monthly change. So the month-to-month changes in employment (and unemployment) that receive so much media publicity carry a lot of uncertainty, and that is not adequately recognized. In November, for example, the official unemployment rate was 10.0%. Going to the extremes of measurement uncertainty, the unemployment rate might have been as low as 9.6% or as high as 10.4%.
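The arithmetic above can be written out as a minimal Python sketch. The function names are hypothetical, and the 100,000 monthly change is an illustrative value rather than a reported figure; only the 10.0% rate, the +/- 0.4 point range and the +/- 300,000 margin come from the text.

```python
def rate_bounds(rate_pct, margin_pct):
    """Return the (low, high) range implied by a symmetric margin, in percent."""
    return (round(rate_pct - margin_pct, 1), round(rate_pct + margin_pct, 1))


def change_is_distinguishable(monthly_change, margin=300_000):
    """True only if the reported change is larger than the measurement margin."""
    return abs(monthly_change) > margin


low, high = rate_bounds(10.0, 0.4)
print(f"Unemployment rate range: {low}% to {high}%")

# A hypothetical reported change of 100,000 sits inside the +/- 300,000 margin,
# so it cannot be told apart from measurement noise.
print(change_is_distinguishable(100_000))
```

Under these assumptions, any reported monthly change smaller than the margin should be read as "no clear signal" rather than as a gain or a loss.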

Another area with sampling error is housing. While the sampling here is much more extensive, and therefore less subject to uncertainty, this data is much more reliable if 3- and 4-month averages are followed rather than monthly fluctuations. An appreciation of the extent of sampling error can be gained by following multiple data sources, which sometimes differ significantly. Three data surveys for existing home sales are (1) the Case-Shiller Housing Index, (2) the National Association of Realtors and (3) Altos Research. The following graph shows how each has reported throughout 2009:

Are some of these data reports bogus? After all, early in the year Case-Shiller indicated prices were falling while the other two surveys showed price rises. More recently the reverse has been true. The differences should not cause us to decide that any of the three data reports is worth less than the others. Over long periods of time (years) all three data sets track each other very closely. In any short period of time, sampling processes will produce deviations among what the three indicate. The problem is less in data reliability and more in what people try to read into it.
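The smoothing suggested above for housing data, following 3- and 4-month averages rather than single monthly readings, can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical and the monthly values are made up, not taken from any of the three surveys.

```python
def trailing_average(values, window=3):
    """Average each value with the (window - 1) values before it.

    Returns one average per position from index window - 1 onward.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly price changes in percent, not data from any survey.
monthly_changes = [1.0, -2.0, 4.0, 0.0, 2.0, -1.0]
print(trailing_average(monthly_changes, window=3))
```

In this made-up series the monthly readings flip sign three times, while every 3-month average stays positive. That is the kind of short-term noise the averaging is meant to damp.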

Everybody wants certainty. Data that is reported by government and private agencies contains uncertainty that is incurred in the data collection. This lack of certainty is a major factor in individuals not trusting data. People want data to tell what is happening. When we have conflicting data, such as displayed in the housing data above, we should be concentrating on what the data is not telling us. The message here is that there is not a clear signal that housing prices have bottomed or not bottomed in 2009. If the data is not clear, distrust is a frequent reaction.

People want to know what is known. Many will consider information worthless if it tells them what is not known. In this regard, I am reminded of an old admonition: It is a wise man who realizes what he does not know. Too many do not recognize this.

Analysis Procedures

There are problems in analysis procedure as well. I have previously criticized the use of "body counts" in calculating the unemployment rate. A proposal has been made to use hours worked as the basis for calculating unemployment. With such a process, the unemployment rate would currently be over 16% (compared to the official rate of 10%). We have the sampling error uncertainty discussed in the preceding section (+/- 0.4 percentage points in the unemployment rate), but the difference due to analysis procedure here is an order of magnitude larger (6 percentage points).

The loss of over a million people from the labor force in 2009 is the result of another area of analysis procedure that should be reviewed, one I have discussed previously. This might be better characterized as a survey deficiency. It results from the requirement that an unemployed person must have actively looked for work within the past four weeks in order to be counted as an unemployed member of the labor force. If the labor force had not shrunk in 2009 (it should have increased, since the population increased), the reported 10% unemployment in November would have been close to 11%.
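A rough sketch of the arithmetic behind that last estimate follows. It assumes that everyone who dropped out of the labor force would otherwise be counted as unemployed. The labor force size (154 million) and the number restored (1.5 million) are illustrative assumptions, not figures from this article, which says only "over a million". The function name is hypothetical.

```python
def rate_with_restored_workers(rate_pct, labor_force, restored):
    """Unemployment rate if `restored` people were counted as unemployed
    members of the labor force instead of being dropped from it."""
    unemployed = labor_force * rate_pct / 100
    return 100 * (unemployed + restored) / (labor_force + restored)


# Assumed round figure for the labor force; the article does not give one.
LABOR_FORCE = 154_000_000

# "Over a million" people left the labor force; 1.5 million is a hypothetical value.
adjusted = rate_with_restored_workers(10.0, LABOR_FORCE, 1_500_000)
print(f"Adjusted unemployment rate: {adjusted:.1f}%")
```

The restored workers are added to both the numerator and the denominator, which is why the rate rises by less than the number of people restored might suggest.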

Data Adjustments

There are also problems with data adjustments. Many data sets are subjected to seasonal adjustments. This is not an attempt to hide anything; the unadjusted data is displayed alongside the seasonally adjusted numbers. However, I have found what appear to be problems with the seasonal models. Recently I found that the year-over-year change in housing prices (using NAR data) from October 2008 to October 2009 was not the same when I used seasonally adjusted numbers as when I used numbers that were not seasonally adjusted. Does that mean the seasonal adjustment in October 2008 is different from the seasonal adjustment in October 2009? This doesn't make sense to me.
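The question can be made concrete with a small Python sketch. It assumes a simple multiplicative adjustment (adjusted value = raw value divided by a seasonal factor), which may not be how NAR actually adjusts its data. The prices and factors are hypothetical, as are the function names.

```python
def yoy_change_pct(earlier, later):
    """Percent change from the earlier value to the later value."""
    return 100 * (later / earlier - 1)


def seasonally_adjust(raw, factor):
    """Multiplicative adjustment: divide the raw value by its seasonal factor."""
    return raw / factor


# Hypothetical October median prices, not NAR data.
oct_2008, oct_2009 = 200_000, 180_000

raw_change = yoy_change_pct(oct_2008, oct_2009)

# Same factor in both years: the factor cancels, so the change matches the raw one.
same = yoy_change_pct(seasonally_adjust(oct_2008, 0.98),
                      seasonally_adjust(oct_2009, 0.98))

# Different factors: the adjusted change no longer matches the raw one.
different = yoy_change_pct(seasonally_adjust(oct_2008, 0.98),
                           seasonally_adjust(oct_2009, 0.97))

print(raw_change, same, different)
```

Under this assumption, a year-over-year change that differs between the adjusted and unadjusted series implies that the factor applied to October 2008 was not the one applied to October 2009.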

Another data adjustment that appears to have weak justification is the "Birth-Death Adjustment" by the Dept. of Labor, which attempts to account for the closing of small businesses and the opening of new ones in real time, months before the state records are available to define the exact numbers. This had a good history of estimates agreeing with the subsequent data up to 2008. It appears to have gone astray in the recession, as seen in the following graph:

The data adjustment error will be corrected when unemployment numbers for 2008 into early 2009 are adjusted after the fact by about 850,000. This correction will be applied in early 2010, but does nothing to help the misleading numbers that were "on the books" in real time and remained there until now.


There are problems with data and statistics. Some of them have been discussed here. But the data has great value and I wouldn't want to be without it. The use and analysis of the data can be improved. Sampling can be improved to decrease uncertainty in how well the data represents the entire population. Adjustments to data should be constantly questioned to make sure that historical correlations still apply.

When I read the Financial Times survey result that about 90% of Europeans and Americans do not believe government statistics, I expect that more than 80% of the same population do not understand any of the numbers themselves. Those who do understand them and do have questions about accuracy can do (and many actually do) a lot of good work to produce value that is missed in the first pass.

Disclosure: No stocks mentioned.
