Pick up the New York Times and skim over the business section. As you read, you form opinions about the character and prospects of the myriad companies featured in the daily news. Your brain arrives at a “sentiment” score based on a rubric of positive, negative, or neutral emotions stimulated by the text. In the computer science equivalent of reading the news, sentiment analysis is the systematic processing of attributes from words extracted from text mining.
What is clear from looking at a page in the newspaper is that text heavily outnumbers numerical information. Charts and graphs are outmatched by anecdotes, recollections, and quotes. Financial analysis, previously constrained to price ratios and margins, is currently undergoing a sentiment revolution.
“Sentiment Analysis in Finance” now has 661,000 search results on Google Scholar, with seminal publications released by Tetlock et al. (2008), Mitra et al. (2008), and Leinweber and Sisk (2010). As shown in the diagram to the left, text is tokenized (broken into words), filtered, stemmed, and classified. The literature has now accessed varied sources of text such as (a) forums, blogs, and wikis; (b) news and research reports; and (c) content generated by firms.
Existing academic work is chiefly focused on using sentiment to augur stock market returns. As a result, the literature has not evaluated whether textual analysis is predictive of a firm’s future income, cash flows, and leverage.
To fill this research gap, we evaluate whether the sentiment of the Management Discussion and Analysis (MD&A) section of the SEC Form 10-K is predictive of a firm’s fundamentals in the next filing period. The Management Discussion and Analysis section is reserved for management’s discussion of the firm’s current financial health and its future growth prospects.
Our findings will add value in determining whether management’s tone and characterization of a firm’s trajectory are accurate. From a practical standpoint, we (1) provide machine learning models for quantitative finance firms seeking to forecast the financial future of investments; (2) point insurance companies toward key statements in the 10-K, which aid in ascertaining the quality of a firm’s balance sheet health; and (3) increase CEO and CFO awareness of which words the market is most reacting to. Corporate management liable for accurate representations of enterprise value can now be more circumspect with their chosen verbiage.
Key Findings Summary
Previous academic literature has constrained sentiment analysis to relationships with equity returns without reference to underlying fundamentals. We demonstrate that changes in sentiment in eight key emotional categories are significantly predictive of firm net income, cash flows, and dividends. We conclude by adopting a machine learning approach to modeling financial sentiment, explaining ~40% of the variance in firm fundamentals and equity market returns.
The remaining sections are organized as follows. Part I investigates whether total word counts per sentiment category are predictive of firm fundamentals in the following filing period. Part II investigates whether changes in word counts per sentiment category are predictive of future fundamentals. Part III evaluates a machine learning approach to predicting equity market returns based on significant variables identified in the previous section.
PART I. PREDICTIVE POWER OF AGGREGATE SENTIMENT
We scrape the 10-K reports for every S&P 500 company in 2016 using edgarWebR. Subsequently, we perform an NRC sentiment analysis in Syuzhet, based on Saif Mohammad’s Emotion lexicon. According to Mohammad, “the NRC emotion lexicon is a list of words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive)”. NRC sentiment attributions were initially gathered manually through Mechanical Turk, and have since been widely adopted in computational intelligence and financial literature. We combine our NRC analysis with a firm’s net income, operating cash flows, investing cash flows, financing cash flows, dividends, and share-based compensation from 2016 to 2017 (obtained with finreportR).
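To make the counting step concrete, here is a minimal Python sketch of lexicon-based emotion counting. It is not the Syuzhet implementation used in the study. The tiny lexicon, its category assignments, and the function name `count_emotions` are hypothetical stand-ins for the real NRC lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> NRC-style categories. These entries are
# illustrative assumptions, not actual rows from the NRC emotion lexicon.
TOY_LEXICON = {
    "growth": {"positive", "anticipation", "joy"},
    "risk": {"negative", "fear", "anticipation"},
    "loss": {"negative", "sadness", "fear"},
    "confident": {"positive", "trust", "joy"},
    "litigation": {"negative", "anger", "fear"},
}

def count_emotions(text, lexicon=TOY_LEXICON):
    """Tokenize text and count how many words fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        # Words missing from the lexicon contribute nothing.
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts

sample = "We are confident in growth, although litigation risk may cause a loss."
emotion_counts = count_emotions(sample)
print(dict(emotion_counts))
```

A single word can belong to several categories at once, which is one reason counts for different emotions tend to move together.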
A) Sentiment Overview
We immediately note that firms naturally tend toward positivity, with the number of positive words double the number of negative words in the average 10-K. Behavioral economists would likely characterize the data as an ode to the overconfidence of CEOs and CFOs forecasting their futures. We also keep in mind that 2016 boasted a strong economy and a robust bull market, supporting optimistic outlooks.
Words associated with "trust" have the highest frequency of occurrence. Modern emphasis on corporate governance and ethical standards largely explains this phenomenon. Given the forward-looking nature of the MD&A section, it is also no surprise that "anticipation" holds the second highest word count. Surprisingly, sadness and fear occur relatively often as well, likely concentrated in distressed energy and retail companies throughout 2016.
FINANCIALS KEY. SCSTKC: STOCK COMPENSATION; NI: NET INCOME; IVNCF: INVESTING CASH FLOWS; OANCF: OPERATING CASH FLOWS; DLTIS: DEBT ISSUANCE; DV: DIVIDENDS; FINCF: FINANCING CASH FLOWS
B) Correlation Plots and Visualizations
We turn to the correlation plot between the various NRC emotions. Contrary to expectations, we find that there are no significant negative correlations between emotions such as anticipation and fear, joy and anger, and surprise and disgust. In fact, the greater the number of words associated with fear, the greater the number of words associated with joy (a testament to the polarity of management). Given the proclivities toward optimism mentioned above, it appears as though excessive positivity has “smoothed” over sharp differences in underlying emotive content. Differences in total lengths of 10-Ks should further be considered, as larger 10-Ks can simply contain more fear and joy words overall.
Contrary to the emotive correlation plot, the financials correlation plot is awash with interesting relationships. (1) For instance, Net Income, Investing Cash Flows, and Operating Cash Flows are all negatively correlated with Stock Compensation. Unfortunately for CEOs, lower bonuses this year translate into higher earnings metrics next year. (2) Further worth noting, Net Income and Operating Cash Flow register a correlation of -0.16. The negative correlation implies significant usage of accruals and earnings smoothing for S&P 500 companies. (3) All other logical relationships appear to hold reasonably well; higher operating cash flows imply greater ability for reinvestment, leading to a strong correlation with increasing investing cash flows.
Looking over the bivariate plots, we express initial optimism that positive and negative sentiment significantly covary with firm fundamentals. In the plots above, we note that the greater the number of positive words, the higher the firm’s subsequent change in net income. Further, the more negative the 10-K, the lower the firm’s operating cash flows. Similar relationships hold for the firm’s dividends, compensation, and investing flows. However, to confirm our initial speculations, we perform a Gaussian regression analysis.
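The document-length caveat above can be illustrated with a small Python sketch. The word counts and document lengths are made-up numbers chosen to show the effect, not data from our sample, and the helper name `pearson` is our own. Raw counts correlate positively simply because long filings contain more of everything, while rates per 1,000 words can point the other way.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical filings: total words plus raw fear and joy word counts.
lengths = [10_000, 20_000, 40_000, 80_000]
fear_counts = [50, 60, 160, 160]
joy_counts = [20, 80, 120, 400]

# Normalize each count to a rate per 1,000 words of the filing.
fear_rates = [1000 * c / n for c, n in zip(fear_counts, lengths)]
joy_rates = [1000 * c / n for c, n in zip(joy_counts, lengths)]

raw_corr = pearson(fear_counts, joy_counts)
rate_corr = pearson(fear_rates, joy_rates)
print(f"raw counts: {raw_corr:.2f}, per-1,000-word rates: {rate_corr:.2f}")
```

Whether normalizing would change the correlations in our own sample is an open question; the sketch only shows that it can.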
C) Regression Analysis on all 6 Financial Variables
After 14 hours arduously scraping 10-Ks and agglomerating company financials, we do not find a single significant relationship between the sentiment expressed in a firm’s 10-K and its financials in a subsequent filing. Neither regressions of positive and negative words against firm financials nor regressions of all eight emotions against firm financials yield a significant relationship. We thus conclude that the countless hours analysts spend extracting general sentiment from MD&A sections of 10-Ks are for naught.
PART II. PREDICTIVE POWER OF CHANGES IN SENTIMENT
Robin Hood once said “rise and rise again, until lambs become lions”. Rather than dismiss sentiment analysis of the 10-K, we rise again and re-approach the data with a different perspective. The second time around, we spend an additional 4 hours extracting the sentiment from 10-Ks filed in 2015 for all S&P 500 firms. We then calculate the change in sentiment from 2015 to 2016, and use it to predict the change in firm financials from 2016 to 2017. In other words, do changes in sentiment significantly co-vary with a firm’s future financial condition?
A) Changes in Sentiment Overview
Even though positivity reigned supreme, negativity was certainly rising in 2016. Surprisingly, we learn that 10-Ks registered a sharply larger increase in negative words than in positive words. The results attest to the importance of evaluating both changes and total counts, since looking at totals alone risks missing important underlying trends.
The 6% increase in "disgust" in 10-Ks drove the rise in negativity, offset partially by a 2.7% rise in Surprise. We note that the rise in Disgust can stem from the category being the smallest in terms of raw word count. As a result, even a handful of additional “disgusting” words in 10-Ks can result in a sizable percent change.
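The percent-change calculation and the small-base effect described above can be sketched in a few lines of Python. The counts are hypothetical and chosen only to show how the same absolute increase produces very different percent changes; `percent_change` is our own helper name.

```python
def percent_change(previous, current):
    """Percent change per category; skips categories with a zero base."""
    return {
        category: 100.0 * (current[category] - previous[category]) / previous[category]
        for category in previous
        if category in current and previous[category] != 0
    }

# Hypothetical word counts for one firm's 10-K in two consecutive years.
counts_2015 = {"trust": 1000, "disgust": 50}
counts_2016 = {"trust": 1005, "disgust": 55}

changes = percent_change(counts_2015, counts_2016)
for category, change in changes.items():
    print(f"{category}: {change:+.1f}%")
```

Both categories gain five words, yet the small "disgust" base turns that into a change twenty times larger in percentage terms.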
B) Correlation Plots and Transformations
Correlations between changes in sentiment remain positive, although they are notably lower than correlations in average total word count. The fact that percent changes in joy and disgust still have a slight positive correlation (~0.2) remains puzzling.
We also re-evaluate our calculation for changes in Net Income. As shown above, (Net Income 2017 / Net Income 2016) - 1 results in a fat-tailed distribution centered around 0%. A mass of data at 50% alludes to the S&P’s hefty profitability gains in 2017.
We recalculate change in net income as:
log(Net Income 2017 / Net Income 2016). As shown above, the logged change in Net Income “brings in” the fat tails of the distribution, normalizes the data, and improves tractability.
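As a rough illustration of the transformation, the Python sketch below computes the logged change for a few hypothetical net income pairs. The logarithm is undefined when the ratio is zero or negative (for example, when a firm swings from profit to loss). Returning `None` in that case is our assumption for the sketch, not a description of how such firms were treated in the study.

```python
import math

def log_change(previous, current):
    """log(current / previous), or None when the ratio is not positive."""
    if previous == 0 or current / previous <= 0:
        return None  # assumption: leave undefined cases for the analyst to handle
    return math.log(current / previous)

# Hypothetical net income pairs (2016, 2017), in millions.
examples = [(100.0, 150.0), (100.0, 50.0), (100.0, -20.0)]
for ni_2016, ni_2017 in examples:
    print(ni_2016, ni_2017, log_change(ni_2016, ni_2017))

# Unlike simple percent change, the logged change is symmetric:
# doubling and halving have equal magnitude and opposite sign.
doubling = log_change(100.0, 200.0)
halving = log_change(200.0, 100.0)
```

The symmetry is part of what pulls in the tails: a simple percent change is bounded below at -100% but unbounded above, while the logged change treats gains and losses on the same scale.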
C) Regression Analysis on all 6 Financial Variables
We immediately notice that a variety of emotional categories are significantly predictive of changes in firm fundamentals. To our knowledge, these findings are the first of their kind, as prior academic literature has constrained sentiment analysis to relationships with equity returns without reference to underlying fundamentals. We summarize our findings with five significant comments.
- Every financial variable has its own unique set of relationships with sentiment categories. For instance, "disgust" has a statistically significant negative relationship with logged change in Net Income. However, it proves insignificant with changes in a firm’s Operating Cash Flow. This finding is significant for analysts seeking to forecast particular components of a firm’s income statement or balance sheet; case-specific research requires a tailored approach to 10-K analysis and conversations with management.
- Changes in emotions follow logical relationships with changes in firm fundamentals. For instance, a 1% increase in joy leads to a statistically significant increase in Investing and Financing Cash Flows. CEOs exuberant about future prospects naturally increase capital expenditures, while also issuing greater debt to finance speculative plant expansions. This finding can be referenced by investment banks seeking to forge relationships with clients predicted to have greater future growth capex (i.e. through M&A).
- A few niche relationships are worth noting. With respect to dividends, an increase in "surprise" leads to higher dividend distributions while an increase in "disgust" leads to a lower dividend distribution of the same magnitude. 10-Ks displaying "surprise" typically do not also feature many "disgust" terms. With respect to Stock-Based Compensation, increases in "anticipation" lead to lower stock-based compensation. CEOs appear to be reducing existing incentive plans when expressing concern regarding the near future. CEOs also appear to be sacrificing near-term compensation when planning for long-term investment opportunities. This finding can be useful for firm employees gauging future compensation plans.
- Changes in positive and negative sentiment are not statistically significant predictors of changes in firm fundamentals. A data analyst would need to break down these sentiment buckets into the more granular emotions discussed above.
- R-Squared statistics average 3% for the regressions above. While the relationships are statistically significant, they explain little of the variance, so we now turn to a machine learning approach to financial sentiment analysis that seeks to maximize the percentage of variance explained.
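For readers who want to see what sits behind a coefficient and an R-Squared figure, here is a minimal single-predictor least-squares sketch in Python. It is a simplification: our regressions use several emotion categories at once, and the numbers below are hypothetical values chosen for readability, so the resulting fit is far tighter than the 3% we report.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return slope, intercept, R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: percent change in "disgust" words (x) against the
# logged change in net income in the following year (y).
disgust_change = [-2.0, -1.0, 0.0, 1.0, 2.0]
log_ni_change = [0.3, 0.1, 0.0, -0.2, -0.2]

slope, intercept, r_squared = fit_simple_ols(disgust_change, log_ni_change)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

A negative slope corresponds to the kind of relationship described in the first bullet; statistical significance would additionally require a standard error, which this sketch omits.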
D) A Machine Learning Approach to Financial Sentiment
Random forest is an ensemble learning method for classification and regression. The algorithm (1) uses bootstrap samples as training data, (2) grows large (low bias) trees with randomly sampled predictors used for each partitioning of the data, and (3) constructs fitted values averaged across trees from out-of-bag data. These fitted values are often summarized in a confusion table for classification or, for regression problems such as ours, in a mean squared error and a percentage of variance explained.
Random forest is distinct from algorithms such as bagging by sampling predictors. With predictors sampled at each split, random forest increases the independence of fitted values and gives weaker predictors a chance to partition the data. This unique twist can dramatically improve gains from averaging over trees and further reduce generalization error, especially if predictors are substantially correlated. We proceed by implementing random forest with firm fundamentals as the response and emotional categories as predictors.
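The three ingredients described above (bootstrap samples, predictors sampled at each split, and out-of-bag averaging) can be sketched in plain Python. This is a teaching sketch on synthetic data, not the implementation behind our results. The function names, tree count, and `n_try` setting are assumptions, and the percent-variance-explained formula (one minus out-of-bag MSE over the variance of the response) is our reading of the usual convention.

```python
import random

def mean(values):
    return sum(values) / len(values)

def best_split(rows, targets, features):
    """Find the (feature, threshold) pair with the lowest squared error."""
    best, best_sse = None, float("inf")
    for f in features:
        for threshold in sorted({row[f] for row in rows})[1:]:
            left = [t for row, t in zip(rows, targets) if row[f] < threshold]
            right = [t for row, t in zip(rows, targets) if row[f] >= threshold]
            m_left, m_right = mean(left), mean(right)
            sse = (sum((t - m_left) ** 2 for t in left)
                   + sum((t - m_right) ** 2 for t in right))
            if sse < best_sse:
                best, best_sse = (f, threshold), sse
    return best

def grow_tree(rows, targets, n_try, rng):
    """Grow an unpruned regression tree, sampling n_try predictors per split."""
    if len(set(targets)) == 1:
        return mean(targets)
    split = best_split(rows, targets, rng.sample(range(len(rows[0])), n_try))
    if split is None:  # the sampled predictors were constant in this node
        return mean(targets)
    f, threshold = split
    left = [i for i, row in enumerate(rows) if row[f] < threshold]
    right = [i for i, row in enumerate(rows) if row[f] >= threshold]
    return (f, threshold,
            grow_tree([rows[i] for i in left], [targets[i] for i in left], n_try, rng),
            grow_tree([rows[i] for i in right], [targets[i] for i in right], n_try, rng))

def predict_tree(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are floats
        f, threshold, left, right = tree
        tree = left if row[f] < threshold else right
    return tree

def random_forest_oob(rows, targets, n_trees=100, n_try=1, seed=0):
    """Return each observation's prediction averaged over trees that never saw it."""
    rng = random.Random(seed)
    n = len(rows)
    oob_predictions = [[] for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = grow_tree([rows[i] for i in bag], [targets[i] for i in bag], n_try, rng)
        for i in set(range(n)) - set(bag):  # out-of-bag observations
            oob_predictions[i].append(predict_tree(tree, rows[i]))
    return [mean(p) if p else None for p in oob_predictions]

# Synthetic data: the response depends on predictor 0; predictor 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(60)]
targets = [2 * x0 + data_rng.gauss(0, 0.1) for x0, _ in rows]

oob = random_forest_oob(rows, targets)
pairs = [(p, t) for p, t in zip(oob, targets) if p is not None]
mse_oob = mean([(p - t) ** 2 for p, t in pairs])
y_mean = mean([t for _, t in pairs])
variance = mean([(t - y_mean) ** 2 for _, t in pairs])
pct_var_explained = 100 * (1 - mse_oob / variance)
print(f"percent of variance explained (out-of-bag): {pct_var_explained:.1f}")
```

Setting `n_try` below the number of predictors is what separates this from bagging: some splits are forced onto the weaker predictor, which decorrelates the trees.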
Remarkably, the percentage of variance explained exceeds 50% for several financial variables. As Random Forest uses out-of-bag data to assess predictive accuracy, estimates of our predictive ability are unbiased (akin to using test data). In other words, using sentiment analysis alone we can explain a significant amount of variance in the logged percent change of key firm fundamentals (ranging from 27.9% of a firm’s Net Income to 62.2% of a firm’s Financing Cash Flows). Random Forest’s high percent variance explained is a significant improvement over the low R-Squared we achieved from simple regression, and is a testament to the power of machine learning.
We also assess the role of each predictor for each financial variable based on the reduction in predictive accuracy (averaged over trees) when a predictor is shuffled. For instance, consider logged percent change in Net Income. When values of the “sadness” variable are shuffled randomly, predictive accuracy declines by 23%. With the algorithm explaining 29% of the variation in Net Income, a decline of 23% is substantial.
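The shuffling idea can be sketched independently of any particular model. In the Python sketch below, the "model" is a hypothetical fixed prediction function standing in for a fitted forest, and importance is measured as the increase in mean squared error after one predictor column is shuffled. Random forest implementations typically compute this tree by tree on out-of-bag data; this simplified version shuffles once over the whole data set.

```python
import random

def mse(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, column, seed=0):
    """Increase in MSE after randomly shuffling one predictor column."""
    rng = random.Random(seed)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, targets) - mse(model, rows, targets)

# Hypothetical stand-in for a fitted model: it uses predictor 0 and ignores predictor 1.
def toy_model(row):
    return 2.0 * row[0]

data_rng = random.Random(1)
rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(30)]
targets = [2.0 * r[0] for r in rows]

importance_used = permutation_importance(toy_model, rows, targets, column=0)
importance_ignored = permutation_importance(toy_model, rows, targets, column=1)
print(importance_used, importance_ignored)
```

Shuffling a predictor the model relies on destroys its link to the response and raises the error, while shuffling an ignored predictor changes nothing. The measure says how much a predictor matters, not in which direction it moves the response.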
There are three key findings worth noting. Firstly, the variables determined “important” by the Random Forest algorithm are not the same variables found significant by regression above. Predictor importance does not indicate how the predictors are linked to the response. For instance, we do not know whether “trust” is important due to its interaction effects with other variables or whether its main effects are driving its ability to reduce impurity in trees. As a result, the discrepancy between regression significance and Random Forest importance may well be driven by interaction effects between emotions.
Secondly, there are key patterns among the variable importance plots. For example, "trust" is the most important predictor for half of the 6 financial variables (Operating Cash Flows, Investing Cash Flows, and Financing Cash Flows). Once interaction effects are incorporated within the data, emotions appear to more clearly link related parts of firm financials (i.e. the cash flows). Finally, while "joy" and "trust" are the main predictors of revenue-driven items, such as Net Income and Cash Flows, negative emotions such as "disgust" and "anger" appear to drive distributions, such as Dividends and Share Compensation. Shareholders ascertaining the magnitude of future distributions would do well to pay attention to any negative remarks expressed by management in the 10-K.
PART III. SENTIMENT-BASED PORTFOLIO CONSTRUCTION
A) Portfolio Construction
It is impossible to resist applying the principles identified in this study to the markets. Per Tobias Carlisle’s seminal book (Deep Value, 2014) on market multiples, cash flows are the most important beacon of firm profitability. We proceed to build a portfolio of firms with the most promising cash flow characteristics.
- Referencing our regression outputs above, we find that Anticipation is the most important variable for forecasting strong Operating Cash Flow growth in the following year.
- We select the 50 firms that had the highest percent increase in "anticipation" from their 2015 to 2016 10-K.
- We use the Quandl API to obtain the returns of each S&P 500 company from 2016 to 2017.
- We merge the two datasets and calculate the return, standard deviation, and Sharpe Ratio of our investments.
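The selection and evaluation steps above can be sketched as follows. The firm labels, sentiment changes, and monthly returns are all hypothetical. The monthly frequency, the square-root-of-12 annualization, and the zero risk-free rate are assumptions of this sketch rather than details of our calculation.

```python
import math
import statistics

def top_n_by_change(changes, n):
    """Return the n firms with the largest percent change, highest first."""
    return sorted(changes, key=changes.get, reverse=True)[:n]

def portfolio_stats(monthly_returns, annual_risk_free=0.0):
    """Compound return, annualized volatility, and Sharpe Ratio of a return series."""
    total_return = math.prod(1 + r for r in monthly_returns) - 1
    annual_vol = statistics.stdev(monthly_returns) * math.sqrt(12)
    annual_mean = statistics.mean(monthly_returns) * 12
    sharpe = (annual_mean - annual_risk_free) / annual_vol
    return total_return, annual_vol, sharpe

# Hypothetical percent change in "anticipation" words per firm.
anticipation_change = {"FIRM_A": 4.2, "FIRM_B": -1.0, "FIRM_C": 7.5, "FIRM_D": 0.3}
selected = top_n_by_change(anticipation_change, 2)

# Hypothetical monthly returns of the equal-weighted selected portfolio.
monthly_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04,
                   0.00, 0.02, -0.01, 0.03, 0.01, 0.02]
total_return, annual_vol, sharpe = portfolio_stats(monthly_returns)
print(selected, round(total_return, 3), round(annual_vol, 3), round(sharpe, 2))
```

Because the Sharpe Ratio divides excess return by volatility, a portfolio can beat the index on return and still trail it on a risk-adjusted basis.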
We register a remarkable 27.4% return with a Sharpe Ratio of 0.89, handsomely beating the market’s 19.4% return over the same period. However, it is important to keep in mind that the S&P 500 achieved a Sharpe Ratio of 3.04 in 2017, one of the highest Sharpe Ratios in the history of the market. Our 50-stock portfolio earned its higher return with far more volatility than the index, so the benefits of volatility reduction through diversification are crystal clear. It is important to extend the findings of the portfolio to other historical periods.
B) Using Sentiment to Predict Market Returns
We conclude with a familiar story. What the market finds important for equity returns, namely "fear" and "disgust," is not necessarily important for variables such as Cash Flow and Dividends. What regression finds important, namely "fear" and "disgust," is not necessarily found important in Random Forests. What Random Forests finds important is not necessarily found important in boosting, namely "surprise" and "disgust."
While investors often look for a silver bullet, it is extremely difficult to find a “one-size-fits-all” metric for mastering the market. In fact, the lack of a panacea is a wonderful outcome. Investors with aptitudes for reading management, for understanding cash flows, and for timing the market with short-term (high Sharpe) trades all can find a place in a world where idiosyncrasies are the generality.
That simple sentiment analysis has the capability of explaining 40% of the variance in market returns is worth raising a few eyebrows, and validates the use of alternative data for serious consideration in investing. As a previous skeptic of textual analysis myself, the results of this study really pushed forward the limits of what I previously conceived as “legitimate data”. Doubt is the origin of wisdom, but acquiescence to fact is the progenitor of good decisions.
Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.
I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it. I have no business relationship with any company whose stock is mentioned in this article.