Factor Modeling - Let's Go Deeper

by: Maurice Chia


How the Fama French factor models are calculated.

AQR revisits the Fama French Five-factor model.

Can deep learning help solve this tautology?

In asset pricing and portfolio management, Eugene Fama and Ken French laid the empirical foundations in 1992 for the Fama-French three-factor model, which explained how size and value, in addition to the market, contribute to portfolio returns.

How did Fama and French calculate these factors?

"In June of each year t from 1963 to 1991, all NYSE stocks on CRSP are ranked on size (price times shares). The median NYSE size is then used to split NYSE, Amex, and (after 1972) NASDAQ stocks into two groups, small and big (S and B). Most Amex and NASDAQ stocks are smaller than the NYSE median, so the small group contains a disproportionate number of stocks (3,616 out of 4,797 in 1991). Despite its large number of stocks, the small group contains far less than half (about 8% in 1991) of the combined value of the two size groups."

"We also break NYSE, Amex, and NASDAQ stocks into three book-to-market equity groups based on the breakpoints for the bottom 30% (low), middle 40% (Medium), and top 30% (High) of the ranked values of BE/ME for NYSE stocks" - where BE stands for book common equity and ME stands for market equity.

Fama and French mentioned that these splits "are arbitrary, however, and we have not searched over alternatives."

Six portfolios were then constructed from the intersection of the two size and three BE/ME groups (S/L, S/M, S/H, B/L, B/M, B/H). Monthly value-weighted returns on the six portfolios are calculated from July of year t to June of year t+1, and the portfolios are reformed in June of t+1.

The SMB portfolio is the difference, each month, between the simple average of the returns on the three small-stock portfolios (S/L, S/M, S/H) and the simple average of the returns on the three big-stock portfolios (B/L, B/M, B/H).

"Thus, SMB is the difference between the returns on small- and big-stock portfolios with about the same weighted-average book-to-market equity".

The HML portfolio is the difference, each month, between the simple average of the returns on the two high BE/ME portfolios (S/H and B/H) and the average of the returns on the two low BE/ME portfolios (S/L and B/L).

The proxy for the market factor is the excess market return, Rm - Rf, where Rm is the return on the value-weighted portfolio of the stocks in the six size-BE/ME portfolios (plus the negative-BE stocks excluded from those portfolios) and Rf is the one-month Treasury bill rate.
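The SMB and HML arithmetic above reduces to simple averages of portfolio returns. The sketch below illustrates it with made-up monthly return numbers for the six size-BE/ME portfolios; the figures are purely illustrative, not actual Fama-French data.

```python
import numpy as np

# Hypothetical monthly returns for the six size/BE-ME portfolios
# (illustrative numbers only, not real data)
returns = {
    "S/L": 0.012, "S/M": 0.015, "S/H": 0.018,
    "B/L": 0.010, "B/M": 0.011, "B/H": 0.014,
}

# SMB: simple average of the three small-stock portfolios
# minus the simple average of the three big-stock portfolios
smb = np.mean([returns["S/L"], returns["S/M"], returns["S/H"]]) \
    - np.mean([returns["B/L"], returns["B/M"], returns["B/H"]])

# HML: average of the two high-BE/ME portfolios
# minus the average of the two low-BE/ME portfolios
hml = np.mean([returns["S/H"], returns["B/H"]]) \
    - np.mean([returns["S/L"], returns["B/L"]])

print(f"SMB: {smb:.4f}, HML: {hml:.4f}")
```

Because each leg averages portfolios across the other dimension, SMB is roughly neutral to book-to-market and HML is roughly neutral to size, which is the point of the 2x3 construction.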

To test the factors, Fama and French used 25 portfolios, formed on size and book-to-market equity, as dependent variables in time-series regressions. They used NYSE breakpoints for size and BE/ME to allocate NYSE, Amex, and NASDAQ stocks to five size quintiles and five book-to-market quintiles. The 25 portfolios were then constructed from the intersections of the size and BE/ME quintiles, and value-weighted monthly returns were calculated as before. They also later tested Earnings-to-Price (E/P) and Dividends-to-Price (D/P) portfolios to "check the robustness of our results on the ability of our explanatory factors to capture the cross-section of average returns".
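The key subtlety in that construction is that the quintile breakpoints come from NYSE stocks only but are then applied to the full NYSE/Amex/NASDAQ universe. A minimal sketch of that two-step sort, using randomly generated stand-in data rather than CRSP:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical cross-section of 1,000 stocks (all numbers are illustrative)
stocks = pd.DataFrame({
    "exchange": rng.choice(["NYSE", "Amex", "NASDAQ"], size=1000),
    "size": rng.lognormal(mean=6.0, sigma=1.5, size=1000),
    "be_me": rng.lognormal(mean=-0.5, sigma=0.6, size=1000),
})

# Quintile breakpoints are computed from NYSE stocks only...
nyse = stocks[stocks["exchange"] == "NYSE"]
size_bp = nyse["size"].quantile([0.2, 0.4, 0.6, 0.8]).to_numpy()
beme_bp = nyse["be_me"].quantile([0.2, 0.4, 0.6, 0.8]).to_numpy()

# ...but applied to stocks from all three exchanges, giving 5 x 5 = 25 cells
stocks["size_q"] = np.searchsorted(size_bp, stocks["size"]) + 1
stocks["beme_q"] = np.searchsorted(beme_bp, stocks["be_me"]) + 1

counts = stocks.groupby(["size_q", "beme_q"]).size()
print(counts)
```

Since most Amex and NASDAQ stocks fall below the NYSE size median, the small-size cells end up holding far more names than the big-size cells, just as in the quoted passage.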

It's a tautology

You can almost always find a factor (linear or otherwise) that explains portfolio performance by regressing portfolio performance against the performance of a categorized group within the universe of stocks that the portfolio stocks themselves belong to. An article I wrote, entitled "The parable of the jump shot or why factor modeling is a tautology", drives home the point.

Tautology notwithstanding, one of the biggest problems a model can have is factors that are dependent on each other, which obscures the real contribution of each factor in explaining portfolio performance. When the factors are dependent, any statistical inferences made will be inconclusive or incorrect.

Ongoing research on factors by AQR inadvertently illustrates this problem of a model being inconclusive. This is what they had to say in their article entitled "Our Model Goes to Six and Saves Value From Redundancy Along the Way":

"Fama and French's latest Five Factor Model is an impressive way to summarize the known playing field of factors, and brings some very good things to the table. However, for reasons we don't find convincing, it leaves out momentum. With no change necessary to the value factor it's absolutely compelling to add momentum back creating a better Six Factor Model. But, as we've argued elsewhere, the value factor should in fact be improved. If you change to more timely value then the momentum result is even stronger, as a matter of fact the strongest of all factors we test, and the value factor, rendered distressingly (for those of us who've considered ourselves value investors for many years!) redundant by the Five Factor Model, is easily resurrected. But, sadly in the process, and we readily admit other constructs, samples, and tests might save it too (we've only focused on the basic Fama-French 3x2 portfolio formation technique here over just this time period in just the USA), we have mostly lost the CMA factor. Thus we're still back at a Five Factor Model, just a better one, in our opinion, than we started with, and one with a very significant role for the stand-alone (timely) very-non-redundant value factor.

"Finally, note, this is not our final word, or presumably the final word of others, on the best multi-factor model".


In my last article, I mentioned that "machine learning must go a step further to add value in the prediction of future returns. It can do this by working on models that do not assume normal distributions or independent and identically distributed variables. What is needed is an elemental decomposition of the data and a paradigm shift in regard to portfolio management modeling (an alternative to factor modeling, perhaps)."

There is some AI research being done that proposes to combine Bayesian statistics and machine learning, deep learning, and probabilistic programming in order to improve factor reliability.

Bayesian statistics and machine learning may be combined to form Bayesian neural networks. Simply described, Bayesian neural networks are a probabilistic version of neural networks: the network's weights are treated as probability distributions rather than point estimates.

Deep learning makes use of neural networks many layers deep and offers a deterministic approximation to an unknown nonlinear function.

Combining these disciplines in a probabilistic programming language would give the researcher the ability to model the uncertainty of a non-linear predictive model.
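To make the idea of "modeling the uncertainty of a prediction" concrete, here is a minimal sketch of Bayesian regression with a conjugate Gaussian prior, using toy data. It is a deliberately linear stand-in: probabilistic programming languages generalize this same posterior-plus-predictive machinery to deep non-linear networks, where the posterior is no longer available in closed form and must be approximated.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: returns driven by one factor plus noise (illustrative only)
x = rng.normal(0.0, 1.0, 200)
y = 0.5 * x + rng.normal(0.0, 0.3, 200)

X = np.column_stack([np.ones_like(x), x])
sigma2 = 0.3 ** 2   # assumed known noise variance
tau2 = 1.0          # variance of the zero-mean Gaussian prior on the weights

# Conjugate posterior over the weights: Normal(post_mean, post_cov)
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
post_mean = post_cov @ (X.T @ y) / sigma2

# Posterior predictive at a new point: a mean AND a quantified uncertainty,
# which a plain point-estimate regression does not provide
x_new = np.array([1.0, 2.0])   # intercept term plus a factor value of 2.0
pred_mean = x_new @ post_mean
pred_var = sigma2 + x_new @ post_cov @ x_new
print(f"prediction {pred_mean:.3f} +/- {np.sqrt(pred_var):.3f}")
```

The predictive variance always exceeds the bare noise variance because it also carries the remaining uncertainty about the weights themselves; that extra term is what the Bayesian treatment adds.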

If factor dependencies can be addressed by this non-linear model and Bayesian predictions prove to be reliable (there are mixed reviews), then it would be a step closer towards the holy grail of prediction modeling. Otherwise, it will be an academic exercise that at most helps to reduce the intractability of non-linear computations.

Meanwhile, investors remain better off just investing in a plain vanilla ETF.

Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.