By now everyone's heard of the predictive potential of Twitter mood states, and the "Twitter trading system" theme has been flogged to death.
So, for this article, we wanted to shift focus to another online data source…
Our question here is simple: Can subscriber trends on Seeking Alpha be predictive of short-term market movements? More specifically, can we predict the near-term direction of the market by comparing subscriber trends for the various sectors?
A few examples that we will be testing over the coming weeks:
- Should we expect a market downturn if the subscriber growth rate for large-cap defensive stocks outpaces the subscriber growth rate for speculative small-caps?
- What happens if subscriber growth rates stall after a long period of market declines? Is that a good contrarian signal?
- Do under-the-radar stocks with low subscriber growth rates outperform hyped-up stocks with rapid subscriber growth rates?
And so on… The possibilities here are endless.
It's safe to assume that Seeking Alpha can be a proverbial gold mine for data miners. Surveys show that it attracts the highest percentage of financial professionals (13.9%) of any major finance website.
And perhaps more importantly for our purpose of developing trading indicators, over 52% of Seeking Alpha readers bought stocks in the trailing 30 days, double the share at the runner-up (TheStreet.com, at just over 26%). This suggests there could be a close connection between Seeking Alpha subscriber trends and actual trading activity…
Approach For This Experiment (The First of Many…)
We wanted to see if there was a way to predict the direction of the stock market over the next 5 trading sessions, purely based on recent market trends and Seeking Alpha subscriber behavior.
Instead of focusing on the subscriber trends of individual companies, which can be very noisy due to company-specific events, we analyzed subscriber trends by sector: we added up the subscriber numbers for all companies in each sector and monitored those sector totals over time.
Here is a sample of the data, before transformation.
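To make the sector roll-up concrete, here is a minimal Python sketch. It assumes the raw data arrives as (date, ticker, sector, subscriber count) rows; that layout, the function names `sector_totals` and `growth_rates`, and the toy tickers are hypothetical, not the authors' actual pipeline.

```python
from collections import defaultdict

def sector_totals(rows):
    """Sum subscriber counts across all companies in each sector, per date.

    rows: iterable of (date, ticker, sector, subscribers) tuples (hypothetical layout).
    Returns {sector: {date: total_subscribers}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for day, _ticker, sector, subscribers in rows:
        totals[sector][day] += subscribers
    # Convert the nested defaultdicts to plain dicts for a cleaner return value.
    return {sector: dict(by_day) for sector, by_day in totals.items()}

def growth_rates(series):
    """Period-over-period growth rate of a {date: total} series, in date order.

    Dates are assumed to sort chronologically (e.g. ISO "YYYY-MM-DD" strings).
    Periods that start from a zero total are skipped to avoid division by zero.
    """
    days = sorted(series)
    return {
        curr: (series[curr] - series[prev]) / series[prev]
        for prev, curr in zip(days, days[1:])
        if series[prev] != 0
    }

# Toy data with made-up tickers, for illustration only.
rows = [
    ("2012-03-28", "AAA", "Technology", 1000),
    ("2012-03-28", "BBB", "Technology", 500),
    ("2012-03-28", "CCC", "Energy", 400),
    ("2012-03-29", "AAA", "Technology", 1100),
    ("2012-03-29", "BBB", "Technology", 550),
    ("2012-03-29", "CCC", "Energy", 380),
]
totals = sector_totals(rows)
print(growth_rates(totals["Technology"]))  # {'2012-03-29': 0.1}
```

Summing before computing growth means one company's spike is diluted by the rest of its sector, which is the noise-reduction argument made above.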
Using Machine Learning to Identify Predictive Variables
(Author Note: In case you're not familiar with machine learning technology, we've included links to Wikipedia pages that clearly explain the concepts discussed)
There are thousands of variables that can be constructed from the Seeking Alpha subscriber trends mentioned above, but how do we know which ones are predictive and which ones are just noise?
This is where machine learning saved us a lot of time. Using various attribute selection techniques to sift through the data we collected and transformed, we narrowed our focus to variables that correlate closely with market direction, while eliminating variables that are highly correlated with other variables.
We haven't listed all the variables here, but if you're interested in the input variables we selected for this experiment you can check out this public Google Docs spreadsheet.
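The article doesn't name the attribute selection techniques it used. As an illustration only, the sketch below implements one simple filter that matches the two criteria described: rank variables by absolute Pearson correlation with the target, then greedily skip any variable that is too correlated with one already kept. The thresholds (`min_relevance=0.3`, `max_redundancy=0.9`), the function names, and the toy features are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (sx * sy)

def select_features(features, target, min_relevance=0.3, max_redundancy=0.9):
    """Greedy correlation filter (hypothetical thresholds).

    features: {name: [values]}; target: [values], e.g. 1 for "Up", 0 for "Down".
    Keeps features with |corr(feature, target)| >= min_relevance, strongest first,
    skipping any whose |corr| with an already-kept feature exceeds max_redundancy.
    """
    ranked = sorted(features, key=lambda f: abs(pearson(features[f], target)), reverse=True)
    selected = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_relevance:
            break  # ranking is descending, so every remaining feature is weaker
        if all(abs(pearson(features[name], features[k])) <= max_redundancy for k in selected):
            selected.append(name)
    return selected

# Toy example: target is 1 for "Up" and 0 for "Down".
target = [1, 0, 1, 0, 1, 0]
features = {
    "f_a": [2, 1, 2, 1, 2, 1],  # tracks the target exactly
    "f_b": [4, 2, 4, 2, 4, 2],  # f_a doubled: fully redundant with f_a
    "f_c": [1, 1, 2, 2, 1, 1],  # uncorrelated with the target
    "f_d": [2, 0, 1, 1, 2, 0],  # related to the target, but not a copy of f_a
}
selected = select_features(features, target)
print(selected)  # one of f_a / f_b, plus f_d
```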
Building and Testing Our Algorithms
Artificial intelligence and machine learning sound like very complicated topics, but essentially they are about training algorithms to make predictions by showing them historical examples.
All data points identified during our attribute selection process were sorted by date into input vectors: ordered sequences of data points that reflect real-world conditions at a given point in time.
We then labeled these input vectors with actual historical outcomes: "Up" if the S&P 500 gained over the next 5 trading sessions, or "Down" if it declined over the next 5 trading sessions. (In computer science, this is called supervised learning.)
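Here is a minimal sketch of the labeling step, assuming one input vector and one S&P 500 closing price per trading session. The function name, the aligned-list layout, and the choice to label an unchanged index as "Down" are hypothetical; the article doesn't say how flat periods were handled.

```python
def label_vectors(dates, feature_rows, closes, horizon=5):
    """Pair each session's input vector with the index direction `horizon` sessions later.

    dates, feature_rows, closes: aligned lists, one entry per trading session.
    Returns (date, vector, label) triples. The last `horizon` sessions are dropped
    because their outcome is not known yet.
    """
    labeled = []
    for i in range(len(closes) - horizon):
        # Assumption: an unchanged close counts as "Down".
        label = "Up" if closes[i + horizon] > closes[i] else "Down"
        labeled.append((dates[i], feature_rows[i], label))
    return labeled

# Toy closes, for illustration only.
labeled = label_vectors(
    dates=["d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7"],
    feature_rows=[[i] for i in range(8)],
    closes=[100, 101, 102, 103, 104, 105, 99, 98],
)
print([label for _, _, label in labeled])  # ['Up', 'Down', 'Down']
```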
The machine learning algorithms then sifted through the labeled data, searching for rules and relationships that can be used to predict the short-term direction of the market.
Every algorithm was trained to signal "Up" if the market was expected to move higher over the next 5 sessions, and "Down" if the market was expected to move lower. (In computer science, this is called binary classification.)
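The article doesn't disclose which algorithms were trained. To show what binary classification means in practice, here is a nearest-centroid classifier, one of the simplest supervised learners, used purely as a stand-in: it averages the training vectors for each label and assigns a new vector to whichever average is closer. The function names and toy vectors are hypothetical.

```python
def train_nearest_centroid(vectors, labels):
    """Fit a nearest-centroid classifier: one mean vector per label ("Up" / "Down")."""
    centroids = {}
    for cls in set(labels):
        members = [v for v, lab in zip(vectors, labels) if lab == cls]
        # Average each feature (column) across the training vectors with this label.
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, vector):
    """Return the label whose centroid is nearest by squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))

# Toy, already-standardized input vectors with their labels.
vectors = [[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [-0.8, -1.2]]
labels = ["Up", "Up", "Down", "Down"]
model = train_nearest_centroid(vectors, labels)
print(predict(model, [0.9, 1.1]))    # Up
print(predict(model, [-1.0, -0.5]))  # Down
```

Because it relies on raw distances, a classifier like this assumes the features are on comparable scales; growth rates and index levels would normally be standardized first.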
Of Course, Past Performance Is No Guarantee of Future Results
Just because a model predicted market direction well in the past does not guarantee that it will do so again when presented with new data.
That's why we used in-sample vs. out-of-sample testing to identify Seeking Alpha trading indicators that remain accurate and stable when presented with new data. In-sample observations were used to optimize the algorithms, and out-of-sample observations were used to test their accuracy and stability.
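For time series, the in-sample/out-of-sample split should be chronological rather than random, so the model is never tested on a period that precedes data it was trained on. A minimal sketch, assuming samples are (date, vector, label) tuples and using the 7/15/2012 cutoff the article gives for its training window; the function name and toy samples are hypothetical.

```python
from datetime import date

def chronological_split(samples, last_train_day):
    """Split (date, vector, label) samples into in-sample and out-of-sample sets.

    Samples dated on or before last_train_day go to the training (in-sample) set;
    later samples are held out, so the model is only tested on data it never saw.
    """
    in_sample = [s for s in samples if s[0] <= last_train_day]
    out_sample = [s for s in samples if s[0] > last_train_day]
    return in_sample, out_sample

# Toy samples, for illustration only.
samples = [
    (date(2012, 7, 13), [0.2], "Up"),
    (date(2012, 7, 16), [-0.1], "Down"),
    (date(2012, 8, 28), [0.4], "Up"),
]
train, test = chronological_split(samples, last_train_day=date(2012, 7, 15))
print(len(train), len(test))  # 1 2
```

One caveat with 5-session labels: the last few in-sample labels depend on prices from the first days of the test window. Leaving a gap of 5 sessions between the two sets removes that overlap.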
After running a proprietary algorithm screener, we identified 40 solid algorithms with superior performance in the out-of-sample test. This does not guarantee that they will do well in the future, but it does suggest that they may remain stable and accurate when presented with new data in real-time trading situations.
Here is a summary of our algorithms' in-sample accuracy rates and performance statistics, based on observations between 3/28/2012 and 7/15/2012 (used for training the algorithms). Achieving a high level of in-sample accuracy is the easy part, since all algorithms are optimized to fit the data in the training set.
And here is a summary of our algorithms' out-of-sample accuracy rates, based on observations between 7/16/2012 and 8/28/2012 (used for testing the algorithms).
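Accuracy here is simply the share of sessions where the predicted direction matched the realized one. A minimal sketch follows; the naive always-"Up" baseline is an addition for illustration, not a statistic reported in the summaries, but a hit rate only means something relative to how often the market rose during the same window.

```python
def accuracy(predicted, actual):
    """Share of sessions where the predicted direction matched the realized one."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def always_up_baseline(actual):
    """Accuracy of naively predicting "Up" every time, for comparison."""
    return accuracy(["Up"] * len(actual), actual)

# Toy predictions and outcomes, for illustration only.
predicted = ["Up", "Down", "Up", "Up"]
actual = ["Up", "Down", "Down", "Up"]
print(accuracy(predicted, actual))  # 0.75
print(always_up_baseline(actual))   # 0.5
```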
Developing Real-World Trading Signals From the Collection of Algorithms
Every algorithm listed above is unique, built with its own data points and its own machine learning structure. That's why every algorithm has its own strengths and weaknesses.
To reduce the variability of signals generated by individual algorithms, we decided to build an index of their collective signals, tracking the percentage of all algorithms that signal either "Up" or "Down".
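The collective index can be sketched in a few lines. The input layout (per-algorithm signals keyed by date), the function name `percent_up`, and the toy signals are hypothetical.

```python
def percent_up(signals_by_algorithm):
    """Collective signal: percentage of algorithms calling "Up" on each date.

    signals_by_algorithm: {algorithm_name: {date: "Up" | "Down"}} (hypothetical layout).
    Only algorithms that produced a signal on a given date are counted for that date.
    """
    dates = sorted({d for signals in signals_by_algorithm.values() for d in signals})
    index = {}
    for d in dates:
        votes = [s[d] for s in signals_by_algorithm.values() if d in s]
        index[d] = 100.0 * sum(v == "Up" for v in votes) / len(votes)
    return index

# Toy signals from three hypothetical algorithms.
signals = {
    "alg_1": {"d1": "Up", "d2": "Up"},
    "alg_2": {"d1": "Down", "d2": "Up"},
    "alg_3": {"d1": "Down", "d2": "Down"},
}
index = percent_up(signals)
print({d: round(v, 1) for d, v in index.items()})  # {'d1': 33.3, 'd2': 66.7}
```

Since each algorithm signals either "Up" or "Down", the "Down" percentage is simply 100 minus this value.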
Here is a graphic representation of the aggregated signals generated from our collection of algorithms. As you can see, the signals have turned decidedly bullish during recent sessions.
And here is a backtest of the aggregated signals, dating back to the beginning of our sample.
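The article doesn't spell out the trading rule behind the backtest. Purely as an illustration, the sketch below assumes a one-session rule: go long the S&P 500 when more than 50% of algorithms signal "Up", go short when fewer than 50% do, stay flat at exactly 50%, and compound the resulting returns. The threshold, the rule, and the function name are assumptions.

```python
def backtest(index_values, next_returns, threshold=50.0):
    """Compound a simple long/short rule driven by the collective signal (hypothetical rule).

    index_values[i]: percent of algorithms calling "Up" at the close of session i.
    next_returns[i]: S&P 500 return from session i to session i + 1 (0.01 = +1%).
    Returns the equity curve, starting from 1.0.
    """
    equity = [1.0]
    for pct_up, r in zip(index_values, next_returns):
        if pct_up > threshold:
            position = 1   # majority bullish: long
        elif pct_up < threshold:
            position = -1  # majority bearish: short
        else:
            position = 0   # split vote: flat
        equity.append(equity[-1] * (1 + position * r))
    return equity

# Toy inputs, for illustration only.
curve = backtest([70.0, 30.0, 50.0], [0.01, -0.02, 0.05])
print(round(curve[-1], 4))  # 1.0302
```

This ignores transaction costs, slippage, and the 5-session horizon the algorithms were trained on, so treat it as a skeleton rather than a reproduction of the chart.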
So, What's Next?
Unfortunately, our sample size is very small, so we can't be too confident in the predictive potential of these newly-built algorithms.
That said, we have added all these models to our ecosystem of algorithms, and we will be monitoring their accuracy on a daily basis going forward. A month or two from now, we could probably narrow down the list of 40 to a list of the top 10, which we can then use for real-life trading signals.
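Narrowing the 40 algorithms down to a top 10 could be done by grading each algorithm's live signals once their 5-session outcomes are known and ranking by hit rate. A minimal sketch; the data layout, function name, and toy signals are hypothetical. With only a month or two of live data the ranking itself is noisy, so some top performers may simply have been lucky.

```python
def top_algorithms(live_signals, realized, k=10):
    """Rank algorithms by live accuracy and keep the best k.

    live_signals: {algorithm_name: {date: "Up" | "Down"}} issued after deployment.
    realized: {date: "Up" | "Down"} once each 5-session outcome is known.
    Signals whose outcome is not known yet are skipped.
    """
    scores = {}
    for name, signals in live_signals.items():
        graded = [signals[d] == realized[d] for d in signals if d in realized]
        if graded:
            scores[name] = sum(graded) / len(graded)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy live signals and outcomes, for illustration only.
live = {
    "alg_a": {"d1": "Up", "d2": "Up", "d3": "Down"},
    "alg_b": {"d1": "Down", "d2": "Up"},
    "alg_c": {"d1": "Down", "d2": "Down"},
}
realized = {"d1": "Up", "d2": "Up"}  # d3 outcome not known yet
print(top_algorithms(live, realized, k=2))  # ['alg_a', 'alg_b']
```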
This preliminary study seems to suggest there is a lot of predictive value in Seeking Alpha subscriber trends, and further investigation is definitely warranted.
What are your thoughts on the matter? Do you think that these types of signals are useful? Or are we placing too much faith in the activities of crowds? Let us know in the comment section.
We'll keep you posted on our future studies…
Disclosure: I have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.