Portfolio diversification is often touted as a viable strategy for mitigating investment risk. What does it mean, though, for a portfolio to be diversified? By owning multiple stocks in different GICS/S&P sectors, one version of the idea goes, risk can be reduced. I think it's important to dig a bit deeper, however, and attempt to understand and quantify the relatedness between and within these sectors.
One way to measure relatedness is through stock price comovements. Below, Comcast's (CMCSA) stock price over the last three years is plotted against The Home Depot's (HD) stock price over the same period. With an r² value of 0.96, the prices of these two stocks tended to move together over the last three years, so I think holding both as representatives of the Consumer Discretionary sector would introduce some redundancy into a portfolio.
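To make the measure concrete, here is a minimal sketch of how an r² value between two price series can be computed. It is not the script used for this article; the function name `r_squared` and the sample prices are made up for illustration, and it uses only the Python standard library.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


# Made-up closing prices for illustration only; not real CMCSA or HD data.
stock_a = [20.0, 21.0, 22.5, 23.0, 25.0]
stock_b = [60.0, 62.0, 66.0, 66.5, 71.0]
print(round(r_squared(stock_a, stock_b), 3))
```

Note that r² discards the sign of the correlation, so two stocks that move in opposite directions in near lockstep would also score close to 1.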
I wrote a Python script to perform this comparison for all possible pairs of stocks in the S&P 500 (there are 124,750 unique pairs), and I report a portion of my results in the form of network graphs below. The main goal of my analysis is to find a way to maximize diversification by avoiding possible portfolio redundancies within and between sectors. By identifying which stocks' prices have been highly correlated over the last three years, I think an investor can spread risk more effectively by increasing portfolio heterogeneity.
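The pair count follows from n(n-1)/2 = 500 × 499 / 2 = 124,750. As a rough sketch (again, not the original script), the pairs can be enumerated with `itertools.combinations`; the ticker names below are placeholders rather than real index members.

```python
from itertools import combinations
from math import comb

# Placeholder tickers standing in for the 500 index members (hypothetical names).
tickers = [f"STOCK{i:03d}" for i in range(500)]

# combinations() yields each unordered pair exactly once, so (A, B) and (B, A)
# are never both compared.
pair_count = sum(1 for _ in combinations(tickers, 2))

print(pair_count)
print(comb(len(tickers), 2))  # closed-form count of unordered pairs
```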
Below, "within sector" and "between sector" relatedness over the last three years is represented as network graphs in which nodes represent stocks and an edge between two nodes represents a significant correlation (r² >= 0.95, p < 10^-5). With this threshold for edge formation, I analyze price comovements both within and between the Consumer Discretionary and Utilities sectors.
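The edge rule can be sketched as a filter over all pairs. This is an illustration under stated assumptions, not the code behind the graphs: the names `build_edges` and `R2_THRESHOLD` and the sample prices are hypothetical, and the p-value test is left out because the sketch uses only the standard library. With SciPy available, `scipy.stats.pearsonr` returns both a correlation coefficient and a p-value that could be checked as well.

```python
from itertools import combinations

R2_THRESHOLD = 0.95  # edge threshold stated in the article


def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def build_edges(prices, threshold=R2_THRESHOLD):
    """Return (ticker_a, ticker_b, r2) for every pair at or above the threshold.

    The p < 10^-5 condition from the article is NOT checked here.
    """
    edges = []
    for a, b in combinations(sorted(prices), 2):
        r2 = r_squared(prices[a], prices[b])
        if r2 >= threshold:
            edges.append((a, b, r2))
    return edges


# Made-up prices for three hypothetical tickers.
prices = {
    "AAA": [10.0, 11.0, 12.0, 13.0, 14.0],
    "BBB": [20.0, 22.1, 23.9, 26.0, 28.1],
    "CCC": [5.0, 4.0, 6.0, 3.0, 5.5],
}
edges = build_edges(prices)
for a, b, r2 in edges:
    print(a, b, round(r2, 4))
```

An edge list in this form (one row per connected pair) is also a convenient shape for loading into a network visualization tool.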
As one might expect, Lennar Corp. (LEN) and D.R. Horton, Inc. (DHI) are connected, as both are members of the Homebuilding Sub-Industry and are likely to be sensitive to similar economic conditions. Another intuitive result is the connectedness between HD, LEN, and The Sherwin-Williams Co. (SHW). The other connections are perhaps not as intuitive, but are interesting to consider.
The Utilities network looks much different:
The salient difference here is that the Utilities sector has much higher "within sector" connectivity, with Pinnacle West Capital (PNW) and Xcel Energy Inc. (XEL) connected to ten and eight other Utilities stocks, respectively.
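The number of connections per stock is its degree in the network. A minimal sketch of counting degrees from an edge list follows; the tickers and edges are invented placeholders, not the article's actual Utilities network.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many edges touch each ticker."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Hypothetical edges between placeholder tickers.
edges = [("U1", "U2"), ("U1", "U3"), ("U1", "U4"), ("U2", "U3")]
degrees = degree_counts(edges)

# List the most connected stocks first.
for ticker, degree in degrees.most_common():
    print(ticker, degree)
```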
Another way to analyze relatedness is to look at connections between sectors. Below is a network including the Consumer Discretionary and Utilities networks along with the connections between them (blue edges).
Connectivity can also be visualized by including all stock pairs from all sectors in one graph:
In this network graph, as before, node color denotes the sector and edge color indicates whether the connection is between two sectors (blue) or within one sector (dark grey). I think one interesting result is the large number of blue (between sector) connections. One might expect most of the connections in such a graph to be "within sector" connections, but this result indicates that an appreciable portion of highly correlated stock pairs in fact span two different sectors, which underscores the importance of care when diversifying one's portfolio.
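The blue versus dark grey distinction comes down to comparing the sectors of an edge's two endpoints. Here is a small sketch of that classification, assuming a ticker-to-sector lookup; the function name `classify_edges`, the tickers, and the edges are hypothetical.

```python
def classify_edges(edges, sector_of):
    """Split edges into within-sector and between-sector lists."""
    within, between = [], []
    for a, b in edges:
        if sector_of[a] == sector_of[b]:
            within.append((a, b))
        else:
            between.append((a, b))
    return within, between


# Placeholder tickers mapped to two of the sectors discussed in the article.
sector_of = {
    "U1": "Utilities",
    "U2": "Utilities",
    "C1": "Consumer Discretionary",
    "C2": "Consumer Discretionary",
}
# Hypothetical edges, not results from the analysis.
edges = [("U1", "U2"), ("C1", "C2"), ("U1", "C1"), ("U2", "C2"), ("U2", "C1")]

within, between = classify_edges(edges, sector_of)
share_between = len(between) / len(edges)
print(len(within), len(between), share_between)
```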
Above are network graphs that represent the relatedness of stock price comovements between and within the Consumer Discretionary and Utilities sectors. In addition, using the same methodology, the most highly correlated stocks in the S&P 500 are presented as a network graph. I think this information is useful because a portfolio holding certain combinations of Utilities stocks, for example, may be more homogeneous than it appears, given the similar price comovements of some of the stocks in that sector. Conversely, a portfolio drawing on unconnected Utilities stocks would perhaps have been better diversified. Even though this analysis is retrospective, and is limited in that sense, I think the results are still interesting to consider.
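One way to act on the idea of holding unconnected stocks is a simple greedy pass over the network: visit the least-connected stocks first and keep each one only if it shares no edge with a stock already kept. This is my own illustrative heuristic, not part of the analysis above, and it does not guarantee the largest possible unconnected set. The tickers and edges are invented.

```python
def pick_unconnected(candidates, edges):
    """Greedily choose tickers so that no two chosen tickers share an edge."""
    neighbors = {ticker: set() for ticker in candidates}
    for a, b in edges:
        if a in neighbors and b in neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)

    chosen = []
    # Visit the least-connected tickers first; ties are broken alphabetically.
    for ticker in sorted(candidates, key=lambda t: (len(neighbors[t]), t)):
        if all(other not in neighbors[ticker] for other in chosen):
            chosen.append(ticker)
    return chosen


# Hypothetical candidates and edges for illustration only.
candidates = ["U1", "U2", "U3", "U4", "U5"]
edges = [("U1", "U2"), ("U1", "U3"), ("U1", "U4"), ("U2", "U3")]
picks = pick_unconnected(candidates, edges)
print(picks)
```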
1) List of S&P companies obtained from en.wikipedia.org/wiki/List_of_S%26P_500_companies
2) The Python script was written using Spyder 2.1.10
3) The matplotlib.finance module was used to fetch and parse stock price data from Yahoo Finance
4) r² values were calculated using the SciPy stats module by Gary Strangman
5) CMCSA vs HD graph created using LibreOffice
6) BWA, DLPH, TRIP, KRFT, KMI, MPC, PSX, WPX, VNO, ABBV, ADT, XYL, LYB, QEP were excluded from the analysis due to a lack of available data on Yahoo Finance
7) Network graphs were created using Cytoscape 2.8.3