It is common practice among credit analysts to use historical default rates published by rating agencies as a proxy for a true forward-looking, firm-by-firm set of modern corporate default probabilities. This analysis shows that such approximations of true portfolio losses are grossly inaccurate. In the current environment, historical losses reported by rating agencies overstate near-term losses and seriously underestimate long-run losses. The long-term underestimate is due both to the backward-looking nature of ratings-based self-assessment and to the moral hazard of rating agency self-censorship, which has, for example, eliminated the defaults of FNMA and FHLMC from the AAA-rated category. For these reasons, ratings-based credit portfolio simulations should not be considered sufficient by any financial institutions regulator that takes its prudential responsibilities seriously.
The first use of credit ratings in the United States dates to 1860 with the founding of a predecessor to S&P Global, Inc. Despite the fact that the business model of the oldest rating agencies in the United States is largely unchanged since then, the use of credit ratings is still common in many large financial institutions and is indirectly encouraged by well-meaning bank regulators.
The problems stemming from rating agency behavior in the credit crisis of a decade ago have been well documented by the U.S. Senate Permanent Subcommittee on Investigations and in the popular press. Most of the attention has gone to ratings of securitized products. In this series of notes, we focus on a quantitative assessment of the errors that stem from the “bread and butter” of rating agency revenues and profits: the assessment of the credit quality of individual public firms.
In the first of these notes, we measure the errors in assessing the riskiness of a portfolio made up of every rated public firm in the world over time horizons from 1 year to 10 years using common practice: historical ratings-based default rates in the portfolio simulation.
Section 1 of this paper compares the business model used by traditional ratings agencies with modern big data-based quantitative default probabilities, using the default probabilities of Kamakura Risk Information Services (www.kamakuraco.com) as an example. Section 2 discusses the structure of the credit portfolio simulation that will be used to measure the differences in implied default experiences. Section 3 summarizes the results of the credit portfolio simulation, and Section 4 summarizes the conclusions.
1. The Ratings Business Model
The basic business model of the credit rating agencies has been largely unchanged during the 158 years since the founding of a predecessor to S&P Global, Inc. We can summarize the major features of a typical “rating system” with a series of bullet points. Please note that the simplifications used by the rating agencies are the result of a well-considered business strategy set in 1860 and, despite the 158 years of opportunities to refine this strategy and make it more sophisticated and accurate, the rating agencies have chosen not to do so.
Here are the key characteristics and implicit assumptions of a corporate rating system:
- Public firms are grouped into N categories (20 is typical), which are intended to vary by the degree of credit risk.
- There is no term structure of credit risk provided.
- The rating agencies specify no explicit default probability and no time horizon over which such a default probability would be relevant.
- There is no explicit recovery rate provided.
- The median time between ratings changes found in a recent study is 815 days.
The rating agencies provide historical data on default rates by ratings categories and recovery rates by ratings categories for public firms, but not for securitized assets. These self-assessments are subject to the moral hazard of rating agency self-censorship. The omission of the failures of the Federal National Mortgage Association and the Federal Home Loan Mortgage Corporation, both ISDA events of default, is just one example of the fact that an independent calculation of the historical default rates by ratings grade will be more accurate than that provided by the rating agencies themselves.
As a point of comparison, we summarize a completely different set of business strategies put in place by quantitative default probability providers like Kamakura Corporation.
- Public firms are grouped into 10,001 categories, by default probabilities that vary in 1 basis point increments from 0.00% to 100.00%.
- A full term structure of monthly default probabilities from 1 month to 120 months is provided. The KRIS website focuses on the primary maturities at 1 month, 3 months, 6 months, and 1, 2, 3, 4, 5, 7 and 10 years, but the monthly default probabilities are available for download.
- Each default probability is associated with an explicit time horizon.
- The default probabilities, when combined with traded bond prices, imply specific levels of the recovery rate and liquidity risk for each bond issuer (see Jarrow and van Deventer, 2018).
- The median time between default probability changes is one day.
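To make the 1 basis point grid and the monthly term structure concrete, here is a minimal sketch in Python. It assumes the monthly figures are conditional default probabilities (the probability of default in a month given survival to that month), which is an assumption for illustration and not a description of how KRIS quotes its data. The function names and the flat 0.10% monthly input are hypothetical.

```python
def bucket_count(step_bp=1, max_bp=10_000):
    """Number of distinct default probability levels from 0 bp up to max_bp."""
    return max_bp // step_bp + 1


def cumulative_default_probability(monthly_pds):
    """Cumulative default probability implied by monthly conditional PDs.

    Assumes each entry is the probability of default in that month,
    conditional on survival to the start of the month.
    """
    survival = 1.0
    for p in monthly_pds:
        survival *= 1.0 - p
    return 1.0 - survival


# Hypothetical flat term structure: 0.10% per month for 120 months.
flat = [0.001] * 120
one_year = cumulative_default_probability(flat[:12])
ten_year = cumulative_default_probability(flat)

print(bucket_count())          # levels on a 1 bp grid from 0.00% to 100.00%
print(round(one_year, 6), round(ten_year, 6))
```

Under this assumption, every horizon from 1 month to 120 months gets its own cumulative default probability, which is exactly what a single rating grade cannot supply.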
Note that the terms “point in time” and “through the cycle” are irrelevant to a quantitative default probability provider. Those terms exist only because of the conceptual dilemma of the ratings model, where there is no term structure (see van Deventer, 2009). For a quantitative default probability provider like Kamakura Corporation, every day’s default term structure is a point in time term structure, and the best estimate of the through the cycle default probability is the longest maturity default probability available for that issuer. For the remainder of this series of notes, we will not use the terms “point in time” and “through the cycle” because they have no economic meaning in the context of modern credit portfolio management. Instead we use default probabilities which have an explicit term structure.
Why have the rating agencies not improved the sophistication of the ratings system over the last 158 years? Frankly, that is a mystery to this author. The closest thing to an explanation was given by a Managing Director of Moody’s in a credit conference in Chicago in the fall of 2008 after the magnitude of the unfolding credit crisis was apparent. “Our clients prefer stability over accuracy,” said the Moody’s representative. As it turned out, the credit crisis shows that the users of legacy credit ratings got neither stability nor accuracy.
We now focus on accuracy in the rest of this note.
2. The Credit Portfolio Management Simulation
We use extensive conversations with regulators, clients and friends of the firm to characterize the elements of a typical “common practice” credit portfolio management simulation. We note that the emphasis is on “common practice,” not “best practice.” A common practice simulation has the following elements:
Each public firm’s risk is measured by its legacy credit rating. The user supplies the default rate attached to each rating, because the rating agencies themselves do not. We use historical long-term weighted average default rates in the public domain to make this assignment. The default rates used were the most recently available in the public domain (see S&P Global Ratings, 2017).
Note, however, that the long-term average default rates were reported only for seven broad ratings grades: AAA, AA, A, BBB, BB, B and CCC/C, not for the more granular 20 grades upon which averages could have been reported. See appendix A for the default probabilities used.
We assume that the portfolio simulation covers a portfolio with equally weighted exposures to every rated public firm in the world. We simulate default/no default consistently over time horizons of 1, 2, 3, 4, 5, 7 and 10 years. In the ratings case, since there is only one default rate per ratings grade, we could enumerate the distribution of defaults using the binomial distribution, which is the result that an infinitely large number of simulations would approach. For comparison with the “challenger model” of KRIS default probabilities, we instead simulate default/no default for 100,000 scenarios using annual time periods from 1 through 10 years.
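The ratings-based setup described above can be sketched as follows for a single one-year period. This is an illustration under stated assumptions, not the simulation code used for this note: the grade counts and default rates are hypothetical placeholders (they are not the S&P figures from Appendix A), defaults are treated as independent across firms, and the scenario count is kept small so the pure-Python loop runs quickly.

```python
import math
import random

# Hypothetical inputs for illustration only; NOT the Appendix A figures.
GRADE_COUNTS = {"A": 40, "BBB": 60, "BB": 30}
GRADE_DEFAULT_RATE = {"A": 0.001, "BBB": 0.005, "BB": 0.02}


def binomial_pmf(n, k, p):
    """Exact probability of k defaults among n firms sharing default rate p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def simulate_default_counts(counts, rates, n_scenarios, seed=0):
    """Simulate total one-year defaults per scenario.

    Every firm in a grade gets the same default rate, and defaults are
    assumed independent across firms (an assumption of this sketch).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_scenarios):
        defaults = 0
        for grade, n_firms in counts.items():
            p = rates[grade]
            defaults += sum(1 for _ in range(n_firms) if rng.random() < p)
        totals.append(defaults)
    return totals


totals = simulate_default_counts(GRADE_COUNTS, GRADE_DEFAULT_RATE, 2_000)
simulated_mean = sum(totals) / len(totals)
expected_mean = sum(GRADE_COUNTS[g] * GRADE_DEFAULT_RATE[g] for g in GRADE_COUNTS)

print("simulated mean defaults:", simulated_mean)
print("binomial expected defaults:", expected_mean)
print("range of defaults:", min(totals), "to", max(totals))
```

Because each grade's default count is binomial, `binomial_pmf` gives the exact distribution for one grade, and the simulated mean should settle near the sum of count times rate across grades as the number of scenarios grows.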
Now it is well known that reduced form default models are much more accurate than agency ratings in predicting default (see Hilscher and Wilson, 2016). This common practice simulation is often selected by analysts who, for whatever reason, feel that simplicity is more important than accuracy for the calculation at hand.
As a challenger model, we use the June 15, 2018 term structure of default probabilities provided by KRIS for 2,765 public firms with ratings. The default/no default performance for each firm is simulated for both the ratings-based model and the KRIS-based model following the assumptions of Jarrow, Lando and Yu (2005). The dispersion of KRIS default probabilities by credit rating is shown for June 15 in this graphic from Kamakura Risk Information Services:
We are most interested in whether or not the ratings-based model is “roughly the same” as a more accurate model using default probabilities for each individual firm. We turn to that assessment in the next section.
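A firm-by-firm simulation with a term structure can be sketched along the following lines. This is a simplified illustration, not the method used to produce the results below: it assumes defaults are independent across firms given their default probabilities, it draws a single uniform random number per firm per scenario so that a firm that has defaulted by one horizon stays defaulted at every later horizon, and the three cumulative default curves are hypothetical rather than KRIS data.

```python
import random

HORIZONS = (1, 2, 3, 4, 5, 7, 10)  # years, matching the horizons in the text


def cumulative_defaults(cum_pd_by_firm, n_scenarios, seed=0):
    """Count cumulative defaults at each horizon for every scenario.

    cum_pd_by_firm: list of dicts mapping horizon -> cumulative default
    probability (nondecreasing in the horizon). One uniform draw per firm
    keeps default/no default consistent across horizons.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_scenarios):
        counts = dict.fromkeys(HORIZONS, 0)
        for cum_pd in cum_pd_by_firm:
            u = rng.random()
            for h in HORIZONS:
                if u < cum_pd[h]:
                    counts[h] += 1
        results.append(counts)
    return results


# Hypothetical firms: flat annual default rates of 0.2%, 1% and 5%.
firms = [
    {h: 1.0 - (1.0 - 0.002) ** h for h in HORIZONS},
    {h: 1.0 - (1.0 - 0.01) ** h for h in HORIZONS},
    {h: 1.0 - (1.0 - 0.05) ** h for h in HORIZONS},
] * 20

scenarios = cumulative_defaults(firms, 500)
for h in HORIZONS:
    per_scenario = [s[h] for s in scenarios]
    print(h, "years:", min(per_scenario), "to", max(per_scenario), "defaults")
```

The same function serves the ratings-based model if every firm in a grade is given that grade's cumulative default rates, which makes the two models directly comparable scenario by scenario.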
3. Results of the Simulation
The results of 100,000 scenarios for the first year of the 10-year period are shown in this graphic:
Using the KRIS default probabilities, the number of defaulters among the 2,765 public firms in existence on June 15, 2018 ranged from 2 to 35 (shown in blue). The number of defaulters using the historical default rates by ratings grade (shown in red) was much higher, ranging from 13 to 63 firms. Contrary to the expectations of many, using legacy ratings produces HIGHER loss rates than using modern reduced form default probabilities like those from KRIS if the time horizon is one year.
At a 2-year horizon, the cumulative default distributions are as follows:
The losses using the KRIS challenger model range from 26 through 85 firms. Using historical ratings-based default rates implies a loss experience ranging from 38 to 101 firms, higher on average than the KRIS model would indicate.
At three years, the results are more heavily overlapping:
The range of losses simulated using the KRIS default probabilities runs from 48 to 129 firms. The ratings model shows a loss experience from 60 to 140 firms, higher but closer to the KRIS experience than for 1-year and 2-year horizons.
At 4 years, the two models show their closest result.
The range of losses is nearly identical although the mode of losses is still less for the KRIS model than it is for the ratings model. The range of losses is from 85 to 174 firms for KRIS and from 88 to 175 firms for the ratings-based model.
At 5 years, the KRIS results show that the current environment implies a higher degree of losses in the long run than the historical 1-year ratings-based loss rates imply:
The KRIS-based loss experience runs from 126 to 240 firms. The ratings-based simulation shows a range from 116 to 214 firms. There is no basis on which the ratings-based estimate is “better,” particularly since the ratings-based default rate is backward-looking for a quarter of a century. The KRIS default probabilities, by contrast, are forward looking for the next 10 years.
The difference becomes starker at 7 years:
The KRIS default probability-based simulation shows a range of losses from 240 to 391 firms. The backward-looking ratings-based default rates show a lower range from 172 firms to 287 firms, a dramatic understatement of the bottom-up analysis that the KRIS default probabilities provide.
The final view is of the cumulative losses for a 10-year horizon.
The KRIS default probability simulation provides a range of losses from 308 firms to 475 firms. The ratings-based results show much lower losses, from 243 to 373 firms, because of the backward-looking nature of the default rates used.
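The simulated ranges reported in this section can be collected in one place to show where the two models cross. The sketch below uses only the minimum and maximum default counts quoted above. The midpoint of each range is a rough summary chosen for illustration; it is not the mean or mode of the simulated distribution, which this note does not report.

```python
# (minimum, maximum) number of defaulters among 2,765 firms, by horizon in
# years, as reported in Section 3.
KRIS = {1: (2, 35), 2: (26, 85), 3: (48, 129), 4: (85, 174),
        5: (126, 240), 7: (240, 391), 10: (308, 475)}
RATINGS = {1: (13, 63), 2: (38, 101), 3: (60, 140), 4: (88, 175),
           5: (116, 214), 7: (172, 287), 10: (243, 373)}


def midpoint(bounds):
    """Midpoint of a (min, max) range; a crude summary, not a mean."""
    low, high = bounds
    return (low + high) / 2.0


def first_horizon_kris_exceeds(kris, ratings):
    """First horizon at which the KRIS range midpoint tops the ratings one."""
    for horizon in sorted(kris):
        if midpoint(kris[horizon]) > midpoint(ratings[horizon]):
            return horizon
    return None


for horizon in sorted(KRIS):
    gap = midpoint(KRIS[horizon]) - midpoint(RATINGS[horizon])
    print(f"{horizon:>2} years: KRIS minus ratings midpoint = {gap:+.1f} firms")

crossover = first_horizon_kris_exceeds(KRIS, RATINGS)
print("KRIS midpoint first exceeds ratings midpoint at", crossover, "years")
```

On this measure the ratings-based ranges sit above the KRIS ranges through 4 years, and the ordering reverses at the 5-year horizon, consistent with the discussion above.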
We summarize our conclusions in the final section.
4. Conclusions
The findings of Hilscher and Wilson make it clear that even a simple logistic regression model predicts corporate failures more accurately than legacy credit ratings. This conclusion should surprise no one. The U.S. Senate report on rating agency performance during the credit crisis and the subsequent fines for their behavior show that their accuracy problems stem not only from the primitive technology used to assess risk but also from the many conflicts of interest of an issuer-pays ratings model. The omission of FNMA and FHLMC from the self-reported loss experience by ratings grade and the complete failure to report on asset-backed securities ratings during the credit crisis show that the concerns of the U.S. Senate report are well-founded and continue to be relevant.
The simulations of the KRIS-based loss rates versus the ratings-based loss rates lead to a number of stark conclusions. First, the ratings-based simulations are particularly blind to the term structure of default, ignoring the relatively bright outlook for credit in the short term and the relatively grim long-run outlook that the KRIS default probabilities show. Second, it is not correct to argue that “simple is better because the results are about the same” as a detailed issuer-based default probability simulation would provide. In fact, the ratings-based assessment is very different in both the short run and the long run. Third, over the full 10-year horizon, the ratings-based assessment is simply too low, in part because the rating agencies have just erased their mistakes, like FNMA and FHLMC, as Jarrow and van Deventer feared in their 2009 article describing the credit crisis as a “ratings Chernobyl.” Finally, since the taxpayers ultimately end up holding the bag when the bias of the rating agencies becomes apparent, no regulator who cares about their prudential responsibility should view a ratings-based credit portfolio simulation as sufficiently accurate to pass regulatory muster.
References
Hilscher, Jens and Mungo Wilson, “Credit Risk and Credit Ratings: Is One Measure Enough?” Management Science, October 17, 2016.
Jarrow, Robert, David Lando, and Fan Yu, “Default Risk and Diversification: Theory and Applications,” Mathematical Finance, January 2005, pp. 1-26.
Jarrow, Robert and Donald R. van Deventer, “The Ratings Chernobyl,” Kamakura Corporation blog at www.kamakuraco.com, reproduced by the Global Association of Risk Professionals and www.riskcenter.com, March 9, 2009.
Jarrow, Robert and Donald R. van Deventer, “The Valuation of Corporate Bonds,” Kamakura Corporation and Cornell University memorandum, May 1, 2018.
S&P Global Ratings, “Default, Transition, and Recovery: 2016 Annual Global Corporate Default Study And Rating Transitions,” April 13, 2017.
United States Senate Permanent Subcommittee on Investigations, Committee on Homeland Security and Governmental Affairs, “Wall Street and the Financial Crisis: Anatomy of a Financial Collapse,” April 13, 2011.
van Deventer, Donald R., “‘Point in Time’ versus ‘Through the Cycle’ Credit Ratings: A Distinction without a Difference,” Kamakura Corporation blog at www.kamakuraco.com, May 9, 2009.
Appendix A: Historical Default Rates Used
We define common practice as using the historical ratings-based default rates that are available for free in the public domain. The default rates meeting this definition are shown here. We used the weighted long-term average in this report:
- S&P Global Ratings, “Default, Transition, and Recovery: