*By Samuel Lee*

*A version of this article was published in the August 2013 issue of* Morningstar ETFInvestor.

Yesterday, I briefly explained the three major risk factors that determine most assets' behavior: economic growth, inflation, and liquidity. These three ingredients are why stocks, bonds, and many alternative strategies have higher expected returns than cash.

Today, I'm going to talk about the standard linear factor model. The linear factor model attempts to answer a simple question: Can a return stream be broken down into explainable and unexplainable parts?

Most studies looking at mutual fund manager performance use linear factor models to disentangle the roles of skill, luck, and risk in producing a given manager's return stream.

Roughly speaking, a linear factor model creates the best-fitting custom benchmark for an asset. It also provides additional information, such as the composition of the benchmark, how well the benchmark fits the asset's returns, by how much the asset beat the benchmark, and how likely it is that these findings are attributable to chance.

My goal for this primer is to provide a practical understanding of factor models. I try to keep the math as simple as possible, but some familiarity with statistics helps a lot.

**The Case of Fidelity Magellan**

Exhibit 1 is a scatter plot of the monthly returns over cash of Fidelity Magellan (FMAGX) (plotted on the vertical or y-axis) and the U.S. stock market (plotted on the horizontal or x-axis) from May 2003 to March 2013.

The graph shows a strong linear relationship between the market and Magellan: When the market is up, Magellan is up about the same amount; when the market is down, the fund is down about the same amount.

The data suggest there is a strong fundamental relationship between this fund and the market. And indeed there is: Magellan is a well-diversified basket of U.S. stocks and therefore is exposed to the same broad macroeconomic risks as the U.S. stock market.

**A Brief Statistical Detour**

The factor models we're interested in use linear regression, a statistical method that fits a line through two sets of data: a dependent variable that we want to "explain," and one or more independent, or explanatory, variables (which, naturally, do the explaining). Fidelity Magellan is our dependent variable, and the stock market is our explanatory variable. We could use the fund as the explanatory variable and the market as the dependent variable, but that would be getting our causation backward: Changes to Magellan don't move the markets; markets move Magellan. So we always begin with a fundamental story before setting up the scaffolding around the numbers.

The fitted regression line through the scatter plot provides an estimate of the true relationship between the market and this fund.

Recall from high school geometry that the equation for a line follows the form

y = mx + b,

where m is the slope term (a measure of the line's steepness) and b is the y-intercept term (the spot where the line crosses the y-axis). By the conventions of financial statistics, the intercept term b is denoted α, Greek for alpha, and the slope term is denoted by β, Greek for beta, and the terms are rearranged such that

y = α + βx,

where y is the asset's excess return (asset return minus the risk-free or cash rate), and x is the market's excess return (market return minus the risk-free rate). Keep in mind these are periodic returns--daily, weekly, monthly, and so on, though monthly returns are most commonly used. In order to make y and x more explicit, they'll henceforth be renamed R-Rf and Mkt-Rf, respectively, where R is the fund's monthly return, Rf is the risk-free rate of return, and Mkt is the S&P 500's return:

R-Rf = α + β*(Mkt-Rf)

The linear regression procedure finds the values of α and β that produce the best-fitting line. (A technical note for the curious: The line is fitted to minimize the total sum of the squares of the vertical distances between the data points and the line.) In this case, the procedure estimates the following:

R-Rf = -0.35 + 1.16*(Mkt-Rf)

This equation is straightforward to interpret: For each percentage-point change in the U.S. stock market's monthly excess return, Fidelity Magellan's monthly excess return is predicted to move 1.16 percentage points in the same direction, and on top of that the fund is predicted to trail by a constant 0.35 percentage points a month. For example, if the market's excess return is 10% one month, the equation predicts a fund excess return of 10%*1.16 - 0.35% = 11.25%.

The equation tells you two things. First, Magellan's return pattern can be replicated by simply holding the S&P 500 with 1.16 times leverage. Second, Magellan underperformed that simple leveraged strategy by 0.35 percentage points a month, or 4.2 percentage points a year (0.35%*12).
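The arithmetic in the two paragraphs above can be written as a pair of small Python helpers. The function names are hypothetical and the inputs are simply the Magellan estimates reported in the text; the annualization multiplies the monthly figure by 12, the same simple convention the article uses.

```python
def predicted_excess_return(alpha, beta, market_excess):
    """Fitted value of R-Rf = alpha + beta*(Mkt-Rf); all figures in percent."""
    return alpha + beta * market_excess


def annualize_simple(monthly_alpha, periods=12):
    """Scale a monthly alpha to an annual figure by simple multiplication,
    as in the text (0.35% * 12), rather than by compounding."""
    return monthly_alpha * periods


# Magellan estimates from the text: alpha = -0.35% per month, beta = 1.16
print(round(predicted_excess_return(-0.35, 1.16, 10.0), 2))  # 11.25
print(round(annualize_simple(-0.35), 2))                     # -4.2
```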

In finance jargon, the fund's beta to the market was 1.16, and its annual alpha was negative 4.2%. Sound familiar? That's because this is what experts mean when they talk about beta and alpha.
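For readers who want to see where α and β come from, least squares with a single explanatory variable has a closed form: β is the covariance of the fund's and the market's excess returns divided by the variance of the market's excess return, and α is whatever is left over at the averages. The minimal sketch below assumes you already have aligned lists of monthly fund returns, market returns, and risk-free rates in percent; the function names are hypothetical and this is not the software used for the article.

```python
from statistics import fmean


def excess(returns, rf):
    """Subtract the risk-free rate from each periodic return."""
    return [r - f for r, f in zip(returns, rf)]


def fit_single_factor(fund, market, rf):
    """Ordinary least squares fit of R-Rf = alpha + beta*(Mkt-Rf).

    fund, market, rf: equal-length lists of periodic returns (e.g., monthly, in %).
    Returns (alpha, beta) in the same units as the inputs.
    """
    y = excess(fund, rf)    # dependent variable: fund's excess return
    x = excess(market, rf)  # explanatory variable: market's excess return
    x_bar, y_bar = fmean(x), fmean(y)
    # Slope = sum of cross-deviations / sum of squared market deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    # The least-squares line passes through the point of averages
    alpha = y_bar - beta * x_bar
    return alpha, beta
```

If you feed it made-up series that lie exactly on the line -0.35 + 1.16*(Mkt-Rf), it recovers those two numbers up to floating-point rounding; real monthly returns scatter around the line, which is where the statistical uncertainty discussed next comes from.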

Of course, Magellan could've been unlucky. Recall that the linear regression produces only an estimate of the true relationship between this fund and the market. The true relationship could actually be something like y = 0.50 + 1.20x, for example, and the period we looked at may have captured some extreme data points that skewed our estimates of alpha and beta. No one knows the true relationship; statistical methods can provide only informed guesses as to what it actually is.

The statistical uncertainty of the alpha and beta estimates is quantified by the p-value, which indicates the probability of obtaining an estimate at least as extreme as the one observed if the true alpha or beta were zero.

An abbreviated form of the regression output looks like this:

where the intercept is the monthly unexplained return, or alpha, and Mkt-Rf is the market factor. The intercept's p-value shows that there's only a 1% chance an outcome this bad or worse could've occurred due to luck, assuming the fund's true alpha is zero (that is, the fund's managers had no skill). The t-statistic, the estimate divided by its standard error, carries the same information as the p-value and is often favored by researchers, but its interpretation isn't as intuitive, so I'm glossing over it. Finally, R² is a measure of fit. An R² of 100% indicates that the model perfectly fits the data; a value of 0% indicates that the model explains none of the variation in the fund's returns. This regression's R² is very high, indicating that the model does a good job of explaining Fidelity Magellan's monthly returns.
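To make those output columns concrete, the sketch below computes them for a single-factor regression: each coefficient's standard error, its t-statistic, a two-sided p-value, and R². It is an illustration under stated assumptions, not the software behind the numbers above: the function name is hypothetical, and the p-value uses a normal approximation to the t distribution, which is reasonable for roughly a decade of monthly data but too optimistic for very short samples.

```python
import math
from statistics import fmean


def regression_stats(x, y):
    """Single-factor OLS of y on x with textbook summary statistics.

    x, y: equal-length lists of excess returns (e.g., Mkt-Rf and R-Rf).
    Returns alpha, beta, their t-statistics, approximate p-values, and R^2.
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar

    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in residuals)      # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    sigma2 = ssr / (n - 2)                    # residual variance, 2 parameters fitted

    # Standard errors of the intercept and slope
    se_alpha = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / s_xx))
    se_beta = math.sqrt(sigma2 / s_xx)
    t_alpha, t_beta = alpha / se_alpha, beta / se_beta

    def p_two_sided(t):
        # Normal approximation: P(|Z| >= |t|) = erfc(|t| / sqrt(2))
        return math.erfc(abs(t) / math.sqrt(2))

    return {
        "alpha": alpha, "t_alpha": t_alpha, "p_alpha": p_two_sided(t_alpha),
        "beta": beta, "t_beta": t_beta, "p_beta": p_two_sided(t_beta),
        "r2": 1 - ssr / sst,
    }
```

With the toy inputs x = [-2, -1, 0, 1, 2] and y = [-1, 0, 0, 2, 3], it returns α = 0.8, β = 1.0, and an R² of about 0.93. With only five points the normal approximation understates the p-values, one more reason a dedicated statistics package is the better tool for real work.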

**Multifactor Models**

The single-factor model I used to examine Fidelity Magellan's return is a version of the capital asset-pricing model, or CAPM, which predicts that the only way to obtain higher returns is to increase exposure to market beta. One way to test this claim is to sort stocks by beta and see whether higher beta is associated with higher returns. Since the 1960s, researchers have known that it's not.

In the 1970s and 1980s, researchers found that fundamentally cheap stocks beat expensive stocks (value beats growth), and that small-cap stocks beat large-cap stocks (small beats large), even when market exposure is controlled for. Running with these findings, Eugene Fama and Kenneth French augmented CAPM with two factors capturing the excess returns of value and small-cap stocks, producing the now famous Fama-French model.

They constructed their value factor by simulating the return history of a long-short strategy that every year bought low price/book (P/B), or value, stocks and short-sold high P/B (growth) stocks. Similarly, their size factor was simply the returns from buying small-cap stocks and short-selling large-cap stocks each year.
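The long-short idea can be sketched for a single period in a few lines of Python. This is deliberately simplified, and its details are assumptions rather than Fama and French's exact methodology: it sorts on one characteristic only, puts the lowest 30% of stocks in one leg and the highest 30% in the other, and equal-weights each leg, whereas the published factors use value-weighted portfolios formed from joint size and value sorts. Repeating the calculation each period, with the portfolios re-formed each year as described above, yields a factor return series in the spirit of HML or SMB. The function name is hypothetical.

```python
from statistics import fmean


def long_short_return(stocks, long_side="low", fraction=0.3):
    """One period's return of a simplified long-short factor portfolio.

    stocks: list of (characteristic, period_return) pairs, e.g. (P/B, return)
            for a value factor or (market cap, return) for a size factor.
    long_side: "low" buys the lowest-characteristic stocks (low P/B = value,
               small cap = size); "high" buys the highest.
    fraction: share of the sorted universe in each leg (0.3 is an assumption).
    Both legs are equal-weighted in this sketch.
    """
    ranked = sorted(stocks, key=lambda s: s[0])
    k = max(1, round(len(ranked) * fraction))
    bottom = fmean(r for _, r in ranked[:k])  # lowest-characteristic leg
    top = fmean(r for _, r in ranked[-k:])    # highest-characteristic leg
    return bottom - top if long_side == "low" else top - bottom
```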

They didn't add value and size arbitrarily. They found that the lower the P/B or smaller the market cap, the higher a stock's returns were, and this relationship was smooth. On the other hand, beta didn't seem connected to a stock's returns in any meaningful manner. This was, in their eyes, convincing evidence that the stock market "priced" size and value as "risk factors," meaning they were associated with higher returns because they were somehow riskier.

Around the same time Fama and French created their famous namesake model, Narasimhan Jegadeesh and Sheridan Titman found that stocks with relatively high recent returns beat stocks with relatively low returns. Soon, Mark Carhart extended the Fama-French model with a momentum factor, constructed by simulating the returns of a monthly strategy that bought the best-performing stocks by trailing 12-month returns, excluding the most recent month, and short-selling the worst-performing stocks.
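The momentum signal described above, trailing 12-month return excluding the most recent month, is easy to misread as 12 months of data when it actually compounds 11. The sketch below computes it and then splits a cross-section into winners and losers. The function names and the 30% cutoff for each leg are assumptions for illustration, not a claim about Carhart's exact construction.

```python
import math


def momentum_score(monthly_returns):
    """Trailing 12-month return, skipping the most recent month.

    monthly_returns: at least 12 monthly returns as decimals, oldest first;
    the last element is the most recent month, which is ignored.
    The remaining 11 months of the 12-month window are compounded.
    """
    if len(monthly_returns) < 12:
        raise ValueError("need at least 12 months of returns")
    window = monthly_returns[-12:-1]  # drop the most recent month
    return math.prod(1 + r for r in window) - 1


def momentum_legs(scores, fraction=0.3):
    """Split {ticker: score} into (winners to buy, losers to short-sell).

    fraction is an assumed share of the universe placed in each leg.
    """
    ranked = sorted(scores, key=scores.get)  # ascending by score
    k = max(1, round(len(ranked) * fraction))
    return ranked[-k:], ranked[:k]
```

Rerunning both steps every month and taking the winners' return minus the losers' return gives a UMD-style factor series.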

The Fama-French-Carhart model has been a mainstay of academic and practitioner research since. It looks like this:

R-Rf = α + β1*(Mkt-Rf) + β2*HML + β3*SMB + β4*UMD + ε,

where R is the return of the asset, Rf is the risk-free rate, α is the unexplained return, Mkt is the U.S. market's return, HML (high-minus-low) is the value-factor-mimicking portfolio's return, SMB (small-minus-big) is the size-factor-mimicking portfolio's return, UMD (up-minus-down) is the momentum-factor-mimicking portfolio's return, and ε is an error term capturing the part of each month's return that neither alpha nor the factors explain. Each factor has its own beta coefficient, denoted β1 through β4.

This equation simply says that an asset's return above cash can be described as a linear combination of exposures to market, value, size, and momentum factors, plus an unexplained alpha. The β or beta coefficients in front of each term show how sensitive the asset is to each factor, holding all other factors constant.

Notice that the equation looks very much like the simple line equation we encountered earlier. That's because they're nearly the same thing, except this time the model is fitted to four explanatory variables instead of one.

The multiple linear regression procedure attempts to find the alpha and beta coefficient values that best explain the asset's returns.
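Here is a dependency-free Python sketch of what that procedure computes. It is an illustration only: the function name is hypothetical, it solves the least-squares normal equations directly by Gaussian elimination, and it returns point estimates without standard errors or p-values. In practice you would use a statistics package, and you would download factor series such as Mkt-Rf, HML, SMB, and UMD rather than build them yourself (Kenneth French's online data library publishes U.S. versions).

```python
def fit_factor_model(y, factors):
    """Least squares fit of y = alpha + b1*f1 + ... + bk*fk.

    y: list of the asset's excess returns (R-Rf).
    factors: list of k factor-return lists, e.g. [mkt_rf, hml, smb, umd],
             each the same length as y.
    Returns [alpha, b1, ..., bk], one beta per factor in the order given.
    Solves the normal equations (X'X) b = X'y; library routines based on
    QR or SVD decompositions are numerically safer.
    """
    n = len(y)
    # Design matrix: a column of ones for alpha, then one column per factor
    X = [[1.0] + [f[t] for f in factors] for t in range(n)]
    p = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(p)]
         + [sum(X[t][i] * y[t] for t in range(n))] for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            scale = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= scale * A[col][c]
    # Back substitution
    coef = [0.0] * p
    for i in reversed(range(p)):
        known = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (A[i][p] - known) / A[i][i]
    return coef
```

Called as fit_factor_model(fund_excess, [mkt_rf, hml, smb, umd]), it returns alpha followed by the four betas, in the order the factors are passed.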

The table below shows the results of a Carhart regression of Fidelity Magellan's monthly returns. The interpretation is similar to the single-factor regression we looked at earlier. Take HML, the value factor. Its coefficient, or loading, of negative 0.20 means that for each percentage point value stocks beat growth stocks in a month, Magellan's monthly excess return is predicted to fall by 0.20 percentage points, all other factors held constant. Magellan has a negative loading on the value factor; that is, it has behaved like a growth fund.

Notice that the p-values of the SMB and UMD factors are 0.67 and 0.20, respectively, and the loadings are close to zero. This means there's a 67% (20%) chance of obtaining an SMB (UMD) loading at least as extreme as the one estimated if the true loading were zero. By convention, a p-value of 5% or lower is considered "statistically significant," so neither loading clears that bar.

Overall, the results suggest that Magellan had high market beta, a pronounced growth tilt, a large- to mid-cap size tilt, and little exposure to momentum stocks. Unfortunately, it underperformed its regression-based benchmark by 3.85% per year, and this outcome was statistically significant at the 2% level.

**The Use and Abuse of Factor Models**

Factor models have limitations. First, they often need a lot of historical data to detect statistically significant evidence of skill (or lack thereof). A manager who beats his regression-based benchmark over a decade may still fall short of statistical significance. Such is the limitation of data. This is why finance researchers often claim that it's hard to be certain a manager is skilled, even with many years of performance data to consider.
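A back-of-the-envelope calculation shows why. A commonly used approximation puts the t-statistic of alpha at the annualized alpha divided by the annualized tracking error (the volatility of the regression residuals), times the square root of the number of years of data. The sketch below assumes that approximation, which ignores estimation error in the betas, and uses hypothetical numbers rather than any real fund's; the function names are also hypothetical.

```python
import math


def alpha_t_stat(annual_alpha, tracking_error, years):
    """Approximate t-statistic of alpha: (alpha / tracking error) * sqrt(years).

    annual_alpha: annualized return over the regression-based benchmark.
    tracking_error: annualized standard deviation of the regression residuals.
    """
    return annual_alpha / tracking_error * math.sqrt(years)


def years_to_significance(annual_alpha, tracking_error, t_target=2.0):
    """Years of data before the approximate t-statistic reaches t_target."""
    return (t_target * tracking_error / annual_alpha) ** 2


# Hypothetical manager: 2% a year of alpha with 6% tracking error
print(round(alpha_t_stat(0.02, 0.06, 10), 2))    # 1.05
print(round(years_to_significance(0.02, 0.06)))  # 36
```

Under these made-up inputs, a manager who beats the benchmark by 2 percentage points a year for a decade has a t-statistic of only about 1.05, well short of the roughly 2 usually needed for significance at the 5% level, and would need about 36 years of data to clear it.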

Second, even if you find a strategy with economically and statistically significant alpha, it's unlikely to persist. So you can't run around with a Carhart regression, check every single mutual fund out there, buy the highest-alpha funds, and go on to enjoy outsize returns. In other words, past performance doesn't predict the future (aside from a short-lived "hot hands" effect).

Finally, the factor model can't capture all the important nuances of a strategy or asset. All models are wrong to some degree. The question is whether they say something useful. Fama and French have recently produced some new research suggesting the value factor is really just a combination of two new factors: profitability and investment intensity.

With all those caveats out of the way, factor models can be tremendously useful. They can tell you whether an asset or fund is offering something unique--alpha--rather than repackaging known factor exposures that you can obtain with low-cost index funds. Factor models can explicate a manager's process and give you more confidence that he's doing what he says he does. And perhaps most importantly, they instill humility in you--beating a factor-based benchmark is often fiendishly difficult.

**Summary**

- An asset's periodic returns can be broken down into components attributable to factors, traits that explain or predict returns.
- Finance researchers broadly agree that stock returns can largely be explained by market, value, size, and momentum factor exposures. (Bond returns can also be broken down into factor exposures, though linear factor models don't work as well with them.)
- A common way to figure out an asset's factor exposures is to perform a multiple linear regression of an asset's periodic returns (usually monthly) against the returns of long-short factor-mimicking portfolios. Doing so creates a regression-based benchmark that tries to explain the returns of the asset.
- Unexplainable excess return, or alpha, is often interpreted as evidence of skill or some kind of additional risk not captured by the factor model.
- Because linear regressions are statistical in nature, their outputs shouldn't be taken as gospel. Even the measures of statistical uncertainty that accompany them, such as the t-stat or p-value, should be taken with a grain of salt.
- If you want to run your own regressions, watch Drexel University professor Wesley Gray's YouTube tutorial.

**Disclosure:** Morningstar, Inc. licenses its indexes to institutions for a variety of reasons, including the creation of investment products and the benchmarking of existing products. When licensing indexes for the creation or benchmarking of investment products, Morningstar receives fees that are mainly based on fund assets under management. As of Sept. 30, 2012, AlphaPro Management, BlackRock Asset Management, First Asset, First Trust, Invesco, Merrill Lynch, Northern Trust, Nuveen, and Van Eck license one or more Morningstar indexes for this purpose. These investment products are not sponsored, issued, marketed, or sold by Morningstar. Morningstar does not make any representation regarding the advisability of investing in any investment product based on or benchmarked against a Morningstar index.