In previous articles, we have examined the total wealth of households in the United States and the major categories of income they receive. Comments received on those articles focused heavily on distribution questions, and sometimes expressed incredulity at the overall wealth and income levels depicted in those aggregate statistics.
I believe these issues arise because the general shape of the distribution of income in the US is poorly understood. People expect a single location statistic such as the mean or median to summarize the full distribution accurately. That is far from true for sharply peaked distributions like those always seen for wealth and income. This article describes the empirical US income distribution, and then presents a statistical model that readily generates similarly shaped distributions, helping us to see how they arise.
My information on the empirical US income distribution comes from the IRS statistics. The IRS publishes reports on the distribution of US adjusted gross income (AGI) by filing, with some lag for arrival of all returns and data preparation. The most recent report of this kind covers calendar 2013. The distribution shapes seen do not change rapidly, so the data being two years old will not make much if any difference in the conclusions we can reach from this type of data.
The IRS page where the reports are published can be found here:
The data are reported by filing, not by person or household, and that fact needs to be kept in mind throughout the following analysis. Couples can file jointly in one larger return, or separately in two smaller ones. Households may have multiple filers for other reasons - e.g. children working while living at home. A portion of income tax filers have little or no adjusted gross income, either because they worked for only part of the year in question, because they are retired and report only interest income, or for similar reasons.
A small portion of filers report negative income for the year, reflecting deductions greater than income, loss carry-forwards, and similar accounting issues. Those filings have been excluded from the analysis in this article. In addition, the AGI line item on a Federal tax return is net of certain allowed adjustments to gross income, which can be claimed on form 1040 but not 1040EZ - IRA contributions, tuition and fees, student loan interest, health savings account contributions, moving expenses, and the like. Keep those provisos in mind in what follows.
Overall positive reported AGI in 2013 totaled $9.29 trillion, spread among 145 million filings, giving an average AGI per filing of $64,000. As we will see in detail below, the median income - counting by filers - is well below this figure, somewhere around $35,000, while the median dollar of earned income - counting by dollars - is well above it, somewhere around $125,000. The $64,000 mean figure is not misleading for the middle class, but it still understates the level of income at which the average dollar is earned. The median figure is a decent approximation only to the status of the working poor. Very little of overall US income is earned at or below that level of income.
To show these relationships, I have summarized the portions of population and income in various income categories in the following table:
The first column gives an income range in thousands of dollars of annual AGI. The second column gives the portion of all filings that lie within that income range. The third column gives the portion of all income that falls within that bin. The last column gives the ratio of the third column to the second, which can be read as the multiple of its filing-count "share" that the bin receives in income. For example, the $50-75K slice accounts for about 13% of the population and also 13% of all income, with a ratio just below 1. The $100-200K slice, on the other hand, is 11% of the population but has 24% of all income, with a ratio just above 2.
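The ratio column can be reproduced with a one-line calculation. The sketch below is only an illustration: the function name share_ratio is hypothetical, and the inputs are the rounded shares quoted above for the $100-200K slice, not the underlying IRS figures.

```python
def share_ratio(income_share: float, filing_share: float) -> float:
    """Income share of a bin divided by its share of filings."""
    return income_share / filing_share

# Rounded shares quoted in the text for the $100-200K slice:
# 11% of filings and 24% of all income.
ratio_100_200k = share_ratio(income_share=0.24, filing_share=0.11)
print(f"$100-200K ratio: {ratio_100_200k:.2f}")
```

A ratio above 1 means the bin holds more of the income than its head count alone would suggest, and a ratio below 1 means it holds less.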
To find a median by population, scan down the second column, adding the figures for each row until the running total goes over 1/2. We are already there with just the first two rows - 63% of all filers are below $50K in income, and between them have only 21% of all income, about 1/3 of their numerical "share" in a return-count sense. This tells us the median AGI by return count is under $50K. Further analysis inside the figures shown puts it in the upper $30K range.
To find a median by dollars, on the other hand, scan up the third column adding income portions in the same manner. The bottom three rows sum to 54.5% of all income while counting only 15% of all returns filed, with about 3.6 times their numerical "share" in a return-count sense. This tells us that the median dollar of income is earned above the $100K per year level. Further analysis within the figures puts it in the $125K range, a level that only about 13% of filings exceed.
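The two scans can be sketched in a few lines of code. The bins below are an assumption: they are coarse groupings backed out of the rounded shares quoted in this article, not the IRS table itself, so their income shares sum to slightly less than 100%. The names BINS and median_bin are hypothetical.

```python
# Coarse bins backed out of the rounded shares quoted in this article
# (an approximation, NOT the IRS table).
# Each row: (label, share of filings, share of income).
BINS = [
    ("under $50K", 0.63, 0.21),
    ("$50-100K", 0.22, 0.24),
    ("$100-200K", 0.11, 0.24),
    ("over $200K", 0.04, 0.305),
]

def median_bin(bins, column, from_top=False):
    """Label of the bin where the running sum of `column` first exceeds 1/2."""
    running = 0.0
    for row in (reversed(bins) if from_top else bins):
        running += row[column]
        if running > 0.5:
            return row[0]
    raise ValueError("shares never exceed one half")

# Median by filings: scan the filing shares from the lowest bin upward.
median_by_filings = median_bin(BINS, column=1)
# Median by dollars: scan the income shares from the highest bin downward.
median_by_dollars = median_bin(BINS, column=2, from_top=True)

print("median filing falls in:", median_by_filings)
print("median dollar falls in:", median_by_dollars)
```

With these coarse bins the scan can only name the bin that contains each median; locating it within the bin needs the finer breakdown in the IRS data.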
The latter aspect of income distributions is generally much less well known than the former. The popular press endlessly reports statistics about the "median income", always presenting this as a meaningful average or norm, and silently implying that just about everyone is like that median. However, if you are interested in counting the dollars that might walk into your place of business, such statistics are extremely misleading. Half of all income is earned above the $125,000 a year level. That subset of households more than makes up for anything they may lack in numbers by the depth of their wallets.
Next, I want to draw attention to a different row-by-row sum, this time of the middle lines of the table from the $50K per year line to the $200K per year line. This set excludes the first two and the last two rows of the table. About 48% of all income is earned in those intervening three lines, which represent 33% of all filings. This is the true American middle class - not the median, which basically tracks the second row of the table and corresponds to the "working poor" instead. There are 48 million filings in this middle set, with $4.5 trillion in aggregate income, an average of $93,000 income per filing, and a combined ratio of 1.45 times their filing count "share".
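The arithmetic behind these figures is easy to check. The sketch below uses only the rounded totals and shares quoted in this article, so its results differ slightly from figures computed on the unrounded IRS data; the variable names are illustrative.

```python
TOTAL_AGI = 9.29e12     # total positive AGI for 2013 in dollars (from the text)
TOTAL_FILINGS = 145e6   # number of filings (from the text)

# Overall mean AGI per filing.
mean_agi = TOTAL_AGI / TOTAL_FILINGS

# Middle group ($50K to $200K): 33% of filings, 48% of income (from the text).
middle_filings = 0.33 * TOTAL_FILINGS
middle_income = 0.48 * TOTAL_AGI
middle_mean = middle_income / middle_filings
middle_ratio = 0.48 / 0.33

print(f"mean AGI per filing:   ${mean_agi:,.0f}")
print(f"middle-group filings:  {middle_filings / 1e6:.1f} million")
print(f"middle-group income:   ${middle_income / 1e12:.2f} trillion")
print(f"middle-group mean AGI: ${middle_mean:,.0f}")
print(f"middle-group ratio:    {middle_ratio:.2f}")
```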
Now I want to present a theoretical model of income that is readily able to reproduce distributions similar to those seen empirically for income, in the US and elsewhere. First imagine that everyone had an equal prior probability of success on a large number of random trials, whose combined outcome correlates with income. We would not see equal outcomes from that hypothesis, but would instead see a bell curve of outcomes, as some individuals simply got luckier in their random trials and achieved more successes than others. What modification is necessary to instead see an outcome distribution that matches what we see for AGI in the US?
All that is required is to change the assumption of an equal prior success probability, and to instead assume that prior success chances are lognormally distributed. Below I show one such distribution, a lognormal whose logarithm has mean -2.7 and standard deviation 1.15. The median success chance with this procedure is e^(-2.7) = 6.72% per trial, while one standard deviation better (in log terms) yields e^(-2.7+1.15) = 21.2% chance of success per trial. The full distribution is shown below as a histogram of "bins", each 2% wide in the success chance, with the height of each "bin" corresponding to the portion of the whole population with that success chance:
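The quoted success chances follow directly from the two parameters, as the short sketch below shows. For a lognormal, e^(-2.7) is the median of the distribution, and the arithmetic mean is higher because of the long right tail. The last figure is my own addition and not something the model description addresses: a small share of an unbounded lognormal lies above 1, which cannot be a probability, so a simulation has to cap or redraw those values.

```python
import math
from statistics import NormalDist

MU = -2.7      # mean of the log success chance (from the text)
SIGMA = 1.15   # standard deviation of the log success chance (from the text)

median_chance = math.exp(MU)                # middle of the distribution
plus_one_sd = math.exp(MU + SIGMA)          # one log standard deviation better
mean_chance = math.exp(MU + SIGMA**2 / 2)   # mean of an unbounded lognormal

# Share of an unbounded lognormal that exceeds a success chance of 1.
share_above_one = 1 - NormalDist(MU, SIGMA).cdf(math.log(1.0))

print(f"median chance:   {median_chance:.2%}")
print(f"+1 sd chance:    {plus_one_sd:.2%}")
print(f"mean chance:     {mean_chance:.2%}")
print(f"share above 1:   {share_above_one:.2%}")
```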
Next, we just let a population of 10,000 individuals with that distribution of prior skills conduct 25,000 random trials each, and count the number of successes each achieves. Sort the resulting population by its number of successes, from the most successful to the least. The highest will of course reflect those with the highest prior success probability, then a portion of those with slightly lower prior chances but better than average luck and so on down the set of composed trials.
We can then ask what portion of the total successes was achieved by what portion of the population. We can plot the result with the cumulative population portion on the horizontal axis and the cumulative portion of all successes on the vertical axis. Then, we plot the empirical cumulative distribution of US AGI as reported above on the same diagram. Here is the result:
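For readers who want to try the experiment, here is a minimal sketch of the procedure described above, under stated assumptions. It is not the code used for this article. It caps success chances at 1, which is my own choice for handling the rare draws above 1, and it runs a scaled-down population and trial count because plain Python loops are slow at 10,000 people and 25,000 trials each; a vectorized binomial sampler would be the practical choice at full scale. All function names are hypothetical.

```python
import random

def simulate_successes(n_people, n_trials, mu=-2.7, sigma=1.15, seed=0):
    """Draw a lognormal success chance per person, then count Bernoulli successes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_people):
        # Assumption: chances above 1 are capped at 1.
        p = min(rng.lognormvariate(mu, sigma), 1.0)
        counts.append(sum(rng.random() < p for _ in range(n_trials)))
    return counts

def cumulative_points(counts):
    """(cumulative population share, cumulative success share), most successful first."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    points, running = [], 0
    for i, c in enumerate(ordered, start=1):
        running += c
        points.append((i / len(ordered), running / total))
    return points

def top_share(points, population_fraction):
    """Share of all successes held by the top `population_fraction` of people."""
    for pop, share in points:
        if pop >= population_fraction:
            return share
    return 1.0

# Scaled down from 10,000 people x 25,000 trials to keep pure Python fast.
counts = simulate_successes(n_people=2000, n_trials=200)
points = cumulative_points(counts)
top15 = top_share(points, 0.15)
print(f"top 15% of the population hold {top15:.1%} of all successes")
```

The list of points can be passed to any plotting tool to draw the cumulative curve, and top_share gives a quick numerical comparison against the empirical AGI shares.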
I submit that the closeness of that fit cannot be the result of chance. Of course, I adjusted the parameters of the lognormal prior skill distribution to the values of -2.7 for the mean and 1.15 for the standard deviation of log skills to get a good fit. The sharply peaked shape of the US income distribution must reflect some similar distribution of the prior ability to earn income. Income distributions are not normal because prior ability to earn income is not equal across all individuals. All that is necessary to generate close fits to actual income, though, is the parsimonious assumption that prior income-generating ability is lognormally distributed, instead.
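A rough cross-check on the fitted standard deviation is possible without any simulation. For a pure lognormal, the share of the total held by the bottom fraction p of the population is Phi(Phi^-1(p) - sigma), where Phi is the standard normal cumulative distribution. The sketch below applies that formula with sigma = 1.15 and compares it with two empirical figures quoted earlier. It ignores the random-trial noise and any cap on success chances, so it approximates the simulation rather than reproducing it, and the function name bottom_share is hypothetical.

```python
from statistics import NormalDist

SIGMA = 1.15            # standard deviation of log skill (from the text)
_STD_NORMAL = NormalDist()

def bottom_share(p, sigma=SIGMA):
    """Share of the total held by the bottom fraction p under a pure lognormal."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(p) - sigma)

# Empirical figures quoted in the text: the bottom 63% of filings have about
# 21% of income, and the top 15% of filings have about 54.5% of income.
bottom_63 = bottom_share(0.63)
top_15 = 1 - bottom_share(0.85)

print(f"bottom 63%: model {bottom_63:.1%}, reported about 21%")
print(f"top 15%:    model {top_15:.1%}, reported about 54.5%")
```

Both model values land close to the reported shares, which is what one would expect given that the parameters were tuned to the data.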
I hope this is interesting, and as always, comments are welcome.
Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.
I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.
Additional disclosure: Long US stocks, long US corporate bonds, long US financials.