Low Correlation And Portfolio Choice

by: Jonathan Ross


I introduce a simple algorithm designed to choose a small subset of weakly correlated stocks from the market whose portfolio standard deviation is small relative to the market's.

Over the period 1965-2014, the mean risk premium (including dividends) of this 250-stock portfolio is not significantly different from the market's (9.47%), while its standard deviation is much smaller.

The strategy buys the portfolio at the beginning of year t and sells at the end of year t. The Sharpe Ratio of this portfolio is larger than the market's.


The following strategy is based on my current academic research. Specifically, the strategy is discussed in depth in my paper titled "The Correlation Anomaly: Return Comovement and Portfolio Choice" (3rd paper). I assume the reader is familiar with some basic mathematical notation throughout.

I summarize the strategy in this article. It is a longer-term strategy in which the investor buys an equally-weighted portfolio of 250 stocks at the beginning of the year and sells them at the end of the year. I advocate that the investor repeat this strategy over a 3-5 year period. Assuming the past 50 years of the U.S. stock market repeat themselves, they will earn the same average return (including dividends but excluding transaction costs) as if they held the whole market portfolio, while taking on substantially lower levels of risk.

This article discusses precisely how to choose these 250 stocks from the universe of U.S. publicly traded stocks each year. I conclude with a short discussion on transaction costs at the end and offer the individual investor advice on how to minimize these costs. I also report some interesting statistics on which stocks are in the portfolio and supply the tickers for the 250 stocks in the 2016 implementation of the algorithm for those investors willing/able to invest.

The Strategy

First, assume equal weighting and consider the formula for the standard deviation of portfolio return (henceforth "portfolio standard deviation"). It is given in equation (1) below and cited in many Finance textbooks. It is a well-accepted measure of risk among both Finance academics and practitioners.

$$\sigma_p = \sqrt{\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\rho_{ij}\,\sigma_i\,\sigma_j} \qquad (1)$$

In equation (1), $N$ represents the number of stocks in the portfolio and $\rho_{ij}$ the correlation in returns between stocks i and j, with $\rho_{ii} = 1$. $\sigma_i$ is the standard deviation of stock i's return. The standard deviation of portfolio return can be measured over any time period one wishes. I assume that $\rho_{ij}$ and $\sigma_i$ are rather persistent over time. For example, one can use the prior, say, 36 monthly returns to measure $\rho_{ij}$ for a pair of stocks. This will be a good proxy for the $\rho_{ij}$ that will be realized for the same pair of stocks over the next 12 months. This assumption appears warranted in the data.
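
Equation (1) translates directly into code. The sketch below is illustrative only. The function name `portfolio_std` and its plain-list inputs are my own assumptions, not code from the paper. It takes each stock's standard deviation and a correlation matrix with ones on the diagonal.

```python
import math

def portfolio_std(stds, corr):
    """Equal-weighted portfolio standard deviation, following equation (1).

    stds: per-stock standard deviations sigma_i (hypothetical input format)
    corr: N x N correlation matrix as a list of lists, 1.0 on the diagonal
    """
    n = len(stds)
    total = 0.0
    # Double sum over all (i, j) pairs, including i == j where rho_ii = 1
    for i in range(n):
        for j in range(n):
            total += corr[i][j] * stds[i] * stds[j]
    return math.sqrt(total / n**2)

# Two stocks, each with 20% volatility and a correlation of 0.5:
# the portfolio's volatility is about 0.173, below either stock's 0.20.
print(portfolio_std([0.2, 0.2], [[1.0, 0.5], [0.5, 1.0]]))
```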

Next, assuming $\rho_{ij}$ and $\sigma_i$ are independent of each other, rather straightforward calculus shows that $\sigma_p$ declines as $\rho_{ij}$ declines. This immediately implies that as the mean $\rho_{ij}$ declines, $\sigma_p$ will fall. Therefore, if we add stocks to a portfolio such that the mean level of correlation declines, portfolio standard deviation will decline as well.
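
The calculus step can be filled in as follows. This is my own short derivation, not taken from the paper. Square equation (1) and separate the diagonal terms ($\rho_{ii} = 1$) from the off-diagonal pairs. Differentiating with respect to any single pairwise correlation then shows that its effect on $\sigma_p$ is strictly positive:

$$\sigma_p^2 = \frac{1}{N^2}\sum_{i=1}^{N}\sigma_i^2 + \frac{2}{N^2}\sum_{i<j}\rho_{ij}\,\sigma_i\,\sigma_j, \qquad \frac{\partial \sigma_p}{\partial \rho_{ij}} = \frac{\sigma_i\,\sigma_j}{N^2\,\sigma_p} > 0 \quad (i<j)$$

Consider the simplifying special case where every stock has the same standard deviation $\sigma$. Let $\bar{\rho}$ denote the average of $\rho_{ij}$ over all pairs $i<j$. The expression then depends on correlation only through $\bar{\rho}$:

$$\sigma_p^2 = \frac{\sigma^2}{N} + \frac{N-1}{N}\,\bar{\rho}\,\sigma^2$$

Lowering $\bar{\rho}$ therefore lowers $\sigma_p$ directly. Lowering $N$ while holding $\bar{\rho}$ fixed raises it.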

The mean pair-wise correlation among all possible pairs of stocks in the portfolio can be represented mathematically as in equation (2) and is labeled $\bar{\rho}$.

$$\bar{\rho} = \frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\rho_{ij} \qquad (2)$$
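
Equation (2) is simply the average of the upper-triangle (i < j) entries of the correlation matrix. Here is a minimal sketch. The function name is hypothetical, and the input is assumed to be a list-of-lists matrix.

```python
def mean_pairwise_correlation(corr):
    """Mean of the off-diagonal (i < j) entries of a correlation matrix, per equation (2)."""
    n = len(corr)
    if n < 2:
        raise ValueError("need at least two stocks to form a pair")
    total = sum(corr[i][j] for i in range(n) for j in range(i + 1, n))
    # There are N(N-1)/2 distinct pairs, hence the 2 / (N(N-1)) factor
    return 2.0 * total / (n * (n - 1))

corr = [[1.0, 0.2, 0.4],
        [0.2, 1.0, 0.6],
        [0.4, 0.6, 1.0]]
print(mean_pairwise_correlation(corr))  # roughly 0.4, the average of 0.2, 0.4 and 0.6
```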

$\bar{\rho}$ is a measure of diversification. Ceteris paribus, investors care (or should care) about not holding stocks whose returns move similarly over time, and portfolios with low $\bar{\rho}$ satisfy this objective. For a given portfolio, the smaller $\bar{\rho}$ is, the less correlated (on average) the returns of its stocks are and the more diversified the portfolio is. The algorithm that picks the stocks is designed so that as the number of stocks in the portfolio declines, $\bar{\rho}$ declines as well.

Thus $\sigma_p$ declines as $N$ declines. This leads to $\sigma_p$ declining well below the level of the market even though a smaller and smaller subset of the market is chosen. The decline in $\sigma_p$ continues only up to a point. Beyond it, $\sigma_p$ begins to rise again, because the increasing effect on $\sigma_p$ of decreasing $N$ begins to outweigh the decreasing effect on $\sigma_p$ of decreasing $\bar{\rho}$. The effect of decreasing $N$ is well documented in prior finance literature and summarized in Figure 2 of the paper referenced above. I find empirically that the magic number of stocks is $N = 250$.
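
The tug-of-war between $N$ and $\bar{\rho}$ can be seen numerically under a strong simplifying assumption: every stock has the same volatility. The volatility of 0.40, the universe size of 2,500 and the correlation levels below are made-up illustrative inputs, not estimates from the paper.

```python
import math

def equal_vol_portfolio_std(sigma, n, rho_bar):
    """sigma_p for n equal-weighted stocks that all share volatility sigma
    and have mean pairwise correlation rho_bar (simplifying assumption)."""
    variance = sigma**2 / n + (n - 1) / n * rho_bar * sigma**2
    return math.sqrt(variance)

# Hypothetical inputs: with rho_bar held fixed, shrinking n raises sigma_p slightly...
big = equal_vol_portfolio_std(0.40, 2500, 0.25)
small_same_rho = equal_vol_portfolio_std(0.40, 250, 0.25)
# ...but if the smaller portfolio also has a lower rho_bar, sigma_p falls overall.
small_low_rho = equal_vol_portfolio_std(0.40, 250, 0.15)
print(big, small_same_rho, small_low_rho)
```

Pushing $N$ much lower makes the $\sigma^2/N$ term grow quickly. This is the rising part of the curve the text describes.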

I applied the algorithm at the beginning of each year over the period 1965-2014 to the CRSP (University of Chicago Center for Research in Security Prices) market. CRSP includes highly accurate stock return data for every publicly traded stock on NYSE (since 1925), NASDAQ (since 1972) and ARCA (since March 2006). Its coverage represents approximately 99% of the market by capitalization, and it is the primary source of market data used by academics worldwide.

The algorithm is discussed in detail below.


Step (1):

At the beginning of year t, exclude from the CRSP market any stock that meets any of these conditions:

- it does not have at least 36 prior months of total return (including dividends) data;
- it has a share price below $5;
- it is an ADR, closed-end fund, foreign-domiciled stock or real estate investment trust.

This step leaves a "market" of, on average, N = 2,524 firms across the 50-year sample period. The first cut is needed because the algorithm relies on historical return data to estimate the correlations. The remaining cuts avoid picking up stocks with very low liquidity, which the individual investor could not easily buy anyway.
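
The Step (1) screens can be sketched as a simple filter. The `Stock` record, its field names, the security-type labels and the tickers "AAA", "BBB" and "CCC" are all hypothetical. CRSP's actual fields and share codes differ, so this only shows the logic of the cuts.

```python
from dataclasses import dataclass

@dataclass
class Stock:
    # Hypothetical record; field names are illustrative, not CRSP's own
    ticker: str
    months_of_history: int   # prior months of total-return data available
    price: float             # share price at the start of year t
    security_type: str       # hypothetical labels, e.g. "common", "ADR", "closed-end fund", "REIT"
    foreign_domiciled: bool

EXCLUDED_TYPES = {"ADR", "closed-end fund", "REIT"}

def eligible(stock, min_months=36, min_price=5.0):
    """Step (1) screens: enough history, price of at least $5, not an excluded type."""
    return (stock.months_of_history >= min_months
            and stock.price >= min_price
            and stock.security_type not in EXCLUDED_TYPES
            and not stock.foreign_domiciled)

universe = [
    Stock("AAA", 48, 22.10, "common", False),
    Stock("BBB", 20, 15.00, "common", False),  # too little return history
    Stock("CCC", 60, 3.25, "common", False),   # share price below $5
]
market = [s.ticker for s in universe if eligible(s)]  # ["AAA"]
```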

Step (2):

At the beginning of year t, using the reduced market from Step (1), form the correlation matrix from the prior 36 months of total return data for each firm in the market. That is, calculate $\rho_{ij}$ (as it appears in equation (1)) for each pair of firms i and j in the market.
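
A dependency-free sketch of Step (2) follows. It assumes returns arrive as a dict mapping each ticker to its monthly total returns, oldest first. That input format is an assumption. In practice one would likely use `numpy.corrcoef` or pandas' `DataFrame.corr` on a returns table instead of hand-rolled loops. A stock with constant returns over the window would have zero variance and raise a division error here.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_matrix(returns, window=36):
    """Correlation matrix from each stock's last `window` monthly total returns.

    returns: dict ticker -> list of monthly total returns, oldest first (assumed format)
    """
    tickers = sorted(returns)
    series = [returns[t][-window:] for t in tickers]
    n = len(tickers)
    corr = [[1.0] * n for _ in range(n)]  # diagonal stays 1.0
    for i in range(n):
        for j in range(i + 1, n):
            corr[i][j] = corr[j][i] = pearson(series[i], series[j])
    return tickers, corr
```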

Step (3):

Let $0 < \rho^* < 1$ represent a threshold correlation value. Let $I$ be an identifier matrix with zeros on the diagonal and, for all $i \neq j$, $I_{ij} = 1$ if $\rho_{ij} \geq \rho^*$ (and $I_{ij} = 0$ otherwise). In words, the identifier matrix has zeros on the diagonal and a 1 in any off-diagonal entry where the corresponding correlation is greater than or equal to the threshold correlation value.
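
The identifier matrix of Step (3) is a one-line comprehension over the correlation matrix. The function name is hypothetical, and the threshold argument plays the role of the threshold correlation value.

```python
def identifier_matrix(corr, threshold):
    """Step (3): 1 where an off-diagonal correlation is >= threshold, else 0 (diagonal is 0)."""
    n = len(corr)
    return [[1 if i != j and corr[i][j] >= threshold else 0 for j in range(n)]
            for i in range(n)]

corr = [[1.0, 0.7, 0.3],
        [0.7, 1.0, 0.2],
        [0.3, 0.2, 1.0]]
print(identifier_matrix(corr, 0.6))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```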

Step (4):

Stocks with all zeros in their corresponding rows are labeled "singletons" since they exhibit less than the threshold correlation with every other stock in the market. These stocks comprise the low correlation "singleton" portfolio alluded to in the title.
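
Step (4) then keeps the stocks whose identifier row is all zeros. The sketch below is self-contained and builds each row on the fly. The tickers and correlations are made up for illustration.

```python
def singletons(tickers, corr, threshold):
    """Step (4): stocks whose identifier-matrix row is all zeros, i.e. whose
    correlation with every other stock is below the threshold."""
    n = len(corr)
    keep = []
    for i in range(n):
        row = [1 if j != i and corr[i][j] >= threshold else 0 for j in range(n)]
        if not any(row):
            keep.append(tickers[i])
    return keep

corr = [[1.0, 0.7, 0.3],
        [0.7, 1.0, 0.2],
        [0.3, 0.2, 1.0]]
print(singletons(["A", "B", "C"], corr, 0.6))  # ['C']: A and B are too correlated with each other
```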


By lowering $\rho^*$, one can make the singleton portfolio as small as one likes, and the mean pair-wise correlation ($\bar{\rho}$ from equation (2)) declines monotonically. This is easy to prove, but I omit the proof here for the sake of space. Thus, as $\rho^*$ falls, $\bar{\rho}$ falls, and $\sigma_p$ falls as well up to a point, for the reasons discussed earlier.

I experimented with different values of $\rho^*$ to form singleton portfolios of various sizes, from 1,000 stocks all the way down to 10 stocks. I then examined the out-of-sample performance of these historically low-correlation singleton portfolios during year t over the period 1965-2014. I find that the 250-firm singleton portfolio performs best. The average value of $\rho^*$ required to produce a singleton portfolio of 250 firms was $\rho^* = 0.615$.
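
The article does not say how the threshold correlation value was tuned each year to hit 250 stocks. One plausible approach, and purely my assumption, is bisection. Any stock that is a singleton at a given threshold is also a singleton at every higher threshold, so the singleton count never decreases as the threshold rises. Ties in the data can make the count jump past the target, so the result is the smallest threshold giving at least the target number of stocks.

```python
def singleton_count(corr, threshold):
    """Number of stocks whose correlation with every other stock is below threshold."""
    n = len(corr)
    return sum(
        all(corr[i][j] < threshold for j in range(n) if j != i)
        for i in range(n)
    )

def find_threshold(corr, target, lo=0.0, hi=1.0, iters=50):
    """Bisection (an assumed tuning method) for the smallest threshold in (lo, hi]
    yielding at least `target` singletons; relies on the count being
    non-decreasing in the threshold."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if singleton_count(corr, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

corr = [[1.0, 0.7, 0.3],
        [0.7, 1.0, 0.2],
        [0.3, 0.2, 1.0]]
t = find_threshold(corr, 1)  # just above 0.3, where only the third stock qualifies
```

On real data, `target` would be 250 and `corr` the Step (2) matrix.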

Empirical Performance of the Strategy (1965-2014)

Although the theory and algorithm are designed to drive down risk, as measured by $\sigma_p$, the amazing thing is that portfolio return is not sacrificed in doing so! This is inconsistent with long-standing Finance research, which posits a positive relationship between risk and return. It is also precisely why I label this finding the "Correlation Anomaly" in the paper cited at the beginning of this article.

Neither I nor my co-authors have an explanation for why this apparent "anomaly" exists. Our finding adds to the long literature on anomalies in Finance. Other well-known anomalies are the small-value anomaly, low beta anomaly, low volatility anomaly and the momentum anomaly. We show in our paper that the correlation anomaly is distinct from each of these other anomalies.

Figure 1 here plots the annualized singleton portfolio risk premium (total return less the risk-free rate) versus the market portfolio risk premium (both including dividends) over the period 1965-2014. I use this sample period because the CRSP "market" that survives the data cuts is rather small before 1965. The risk-free rate is the rate on a 1-month Treasury bill, compounded annually. The average risk-free rate over the sample period was approximately 5%.

Figure 2 here plots the annualized singleton portfolio standard deviation versus the annualized market portfolio standard deviation over the same time period.

Figure 3 here plots the Sharpe Ratio (risk premium scaled by standard deviation of risk premium in each year) of the singleton portfolio and market portfolio.

Table 1 below details the average performance of various-size singleton portfolios versus the market over the 1965-2014 time period.

Table 1

Table 1 uses the following abbreviations:

- COR is $\bar{\rho}$ from equation (2).
- BETA is the covariance of the portfolio return with the market return, scaled by the variance of the market return.
- STD is the portfolio standard deviation $\sigma_p$ following equation (1).
- RP is the risk premium.
- SR is the Sharpe Ratio.
- TR is the Treynor Ratio.
- PTR is a measure of portfolio turnover.
- Avg. N is the average number of stocks in the portfolio.

The stars represent statistical significance tests of the difference between each singleton portfolio summary statistic and the corresponding market portfolio summary statistic. ***(**)(*) represents significance at the 1%(5%)(10%) levels, respectively. The t-stats of these tests are reported below their corresponding summary statistics.
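
For readers who want to compute Table 1-style statistics for their own portfolio, here is a sketch using the textbook definitions of BETA, SR and TR. It assumes lists of yearly risk premia as input. The paper's exact estimation choices (for example, monthly versus annual data, or how each year's standard deviation is measured) may differ, so treat this as an approximation rather than a replication.

```python
import statistics

def summary_stats(port_rp, mkt_rp):
    """Textbook RP, STD, BETA, SR and TR from yearly risk premia (return minus risk-free rate).

    port_rp, mkt_rp: equal-length lists of annual portfolio and market risk premia (assumed input)
    """
    mean_rp = statistics.mean(port_rp)
    std = statistics.stdev(port_rp)
    mkt_mean = statistics.mean(mkt_rp)
    # Sample covariance with the market, using the same n - 1 divisor as statistics.variance
    cov = sum((p - mean_rp) * (m - mkt_mean)
              for p, m in zip(port_rp, mkt_rp)) / (len(port_rp) - 1)
    beta = cov / statistics.variance(mkt_rp)
    return {
        "RP": mean_rp,
        "STD": std,
        "BETA": beta,
        "SR": mean_rp / std,    # Sharpe Ratio: reward per unit of total risk
        "TR": mean_rp / beta,   # Treynor Ratio: reward per unit of market (beta) risk
    }
```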

Notice from Table 1 that, on average, the singleton portfolios are all more diversified than the market (lower $\bar{\rho}$) and less risky (lower $\sigma_p$). They earn the same risk premium, and they have higher Sharpe and Treynor Ratios than the market portfolio!

I advocate the S250 portfolio (Singleton 250 stock portfolio) because its Sharpe and Treynor Ratios are the highest. These are relative measures of reward per unit of risk taken on.

Interesting Statistics

There are a couple of other interesting statistics regarding the 250-firm low-correlation singleton portfolios formed each year over 1965-2014. First, turnover is reasonable. Specifically, there is a 34% (20%) likelihood that a stock in the portfolio in year t will also be in the portfolio in year t+1 (t+2). Thus, on average, a non-trivial number of stocks (about 85 = 0.34*250) do not have to be sold each year.
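
The 34% figure is a retention rate: the share of one year's holdings that reappear in a later year's portfolio. Here is a minimal sketch of that calculation. The function name and the tickers are hypothetical.

```python
def retention_rate(portfolio_t, portfolio_later):
    """Fraction of year-t holdings that are also held in a later year's portfolio."""
    held = set(portfolio_t)
    return len(held & set(portfolio_later)) / len(held)

print(retention_rate(["A", "B", "C", "D"], ["B", "D", "E", "F"]))  # 0.5
print(round(0.34 * 250))  # 85 stocks carried over, on average, in a 250-stock portfolio
```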

Second, the 250-firm low-correlation singleton portfolios represent the various economic sectors/industries fairly evenly. Specifically, Table 2 below summarizes the proportion of stocks in each industry that are included in the 250-firm portfolio.

Table 2

Notice that some industries are much larger than others (e.g. Manufacturing) so the raw number of stocks in an industry that are also in the portfolio might be much larger than for another industry. The takeaway is the "Avg. Percent" column where the percentages are fairly even across the industries. Thus, the low-correlation algorithm does not disproportionately favor choosing firms from one industry over another. In this sense, the algorithm is picking a diverse group of firms as well.
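
The "Avg. Percent" idea can be sketched as the share of each industry's eligible stocks that end up in the portfolio. The industry labels and tickers below are hypothetical. The article does not say which industry classification Table 2 uses.

```python
from collections import Counter

def industry_coverage(market_industry, portfolio):
    """Percent of each industry's eligible stocks that land in the portfolio.

    market_industry: dict ticker -> industry label for the Step (1) market (assumed format)
    portfolio: iterable of tickers in the singleton portfolio
    """
    totals = Counter(market_industry.values())
    chosen = Counter(market_industry[t] for t in portfolio)
    return {ind: 100.0 * chosen[ind] / totals[ind] for ind in totals}

market = {"A": "Manufacturing", "B": "Manufacturing", "C": "Manufacturing",
          "D": "Manufacturing", "E": "Utilities", "F": "Utilities"}
print(industry_coverage(market, ["A", "E"]))  # {'Manufacturing': 25.0, 'Utilities': 50.0}
```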

2016 Portfolio Identification

I have run the algorithm on the CRSP market beginning January 1, 2016 and the 250 firms that emerged are reported in Table 3 below.

Table 3

Transaction Costs

For individual investors, transaction costs can be a real problem. Online brokerages charge an average of $7-$8 per trade. At that rate, buying 250 stocks at the beginning of the year and selling them at the end would cost $3,750 ($7.50*2*250)! Because this cost is fixed, institutional investors with millions of dollars to invest need to generate only a tiny return to cover the transaction fees. In any case, their high trading volume means they pay much less per trade.

Individual investors, on the other hand, need to find other ways to reduce their transaction costs. One way, of course, is to buy fewer stocks, but this is undesirable because it reduces diversification, which is oh so important when investing. Fortunately, a few opportunities exist to help the individual investor out. One such opportunity is Motif Investing (www.motifinvesting.com). They offer a great deal. The catch is that you buy a 30-stock "Motif," and the transaction fee is $9.99.

Thus, the fee per stock is effectively about 33 cents ($9.99/30). If you want to buy or sell individual stocks, you pay $4.95 per trade. This setup is perfect for the strategy discussed in this article:

- Buy 8 motifs of 30 stocks each, which gives you 240 of the 250 stocks.
- Buy a 9th motif of 30 stocks: 10 new ones plus 20 you already hold.
- At the end of the year, sell all 270 positions.

The transaction costs are thus 9*2*$9.99, or about $180, which is very reasonable.
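
The cost comparison above reduces to simple arithmetic. The sketch below uses the fee levels quoted in this article ($7.50 per trade, $9.99 per 30-stock motif). Those are the article's figures, and actual broker pricing may differ.

```python
import math

def per_stock_cost(n_stocks, fee_per_trade):
    """Round-trip cost (buy at the start of the year, sell at the end), one trade per stock."""
    return 2 * n_stocks * fee_per_trade

def motif_cost(n_stocks, fee_per_motif, stocks_per_motif=30):
    """Round-trip cost when stocks are bought and sold in baskets of `stocks_per_motif`."""
    n_motifs = math.ceil(n_stocks / stocks_per_motif)  # 250 stocks -> 9 baskets
    return 2 * n_motifs * fee_per_motif

print(per_stock_cost(250, 7.50))  # 3750.0
print(motif_cost(250, 9.99))      # about 179.82, i.e. roughly $180
```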

Their platform allows trading in fractional shares and you can rebalance a motif for a $9.99 fee. I don't advocate rebalancing with the strategy discussed in this article as the results reported are without rebalancing.


This article, although somewhat technical, introduces an implementable trading strategy for the investor, which applies a simple algorithm to the correlation matrix of stocks in the market and leads to the formation of a 250-firm subset of this market, which realizes higher risk-adjusted performance than the market from which it is drawn over the time period 1965-2014.

The algorithm, as described in this article, cannot be implemented without access to a database of stock returns (or stock prices with dividend information) such as the University of Chicago's Center for Research in Security Prices (CRSP). For that reason, I have included the tickers of the 250 stocks the algorithm selects for 2016.

The strategy can, however, be applied to any "market" the investor has access to, in order to choose a more diversified and less risky subset of that market. My research paper "The Correlation Anomaly: Return Comovement and Portfolio Choice" (3rd paper) applies the algorithm to several "markets" with similar results.

Disclosure: I/we have no positions in any stocks mentioned, but may initiate a long position in THE 250 SINGLETONS AS REPORTED IN TABLE 3 over the next 72 hours.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.