Reddit, GameStop And Python

Summary
- There are a lot of tools to do some basic monitoring of Reddit in real time without investing more than a minute every now and then.
- Python provides powerful tools to analyze Reddit and build outputs.
- Using a few lines of code, you can scan hundreds of posts and thousands of words to summarize information.
With all the hype about Reddit and GameStop, how redditors were turning into millionaires overnight (although I could only find one story about it), how greedy hedge funds were going to disappear and how redditors were going to save the world, I thought it was probably a good idea to monitor Reddit and other social media like Twitter to find a good investment opportunity… and to save the world.
Nevertheless, monitoring social media takes time. It could take a couple of hours a day. In addition, all the monitoring must be done in real time because, when trading, timing is everything. And the worst part… it is boring. It is boring to spend hour after hour reading the same web pages trying to find the next great investment opportunity.
Sentiment analysis using Twitter
Thankfully, there are a lot of tools to do some basic monitoring in real time without investing more than a minute every now and then. In my previous post, Sentiment analysis using Twitter, I explained how some basic sentiment analysis could be done using Twitter. With some tweaks, like searching for keywords such as “stocks”, “stock market”, “stock trends”, etc., the tool could be applied to search for investment opportunities among different stocks in different regions. However, one of the drawbacks is that there are no clearly defined groups where people talk about specific topics on Twitter. You can filter by trends, by regions, by news or sports… but that’s it.
In contrast, Reddit has various groups called subreddits where people and bots talk about specific topics. One that might be useful to find investment opportunities, and where apparently the hype about GameStop started, is “wallstreetbets”. This subreddit “is a community for making money and being amused while doing it. Or, realistically, a place to come and upvote memes when your portfolio is down”, as the FAQ explains. Also, there is a subreddit called “investing” which covers “markets, economic impacts, corporate profits, shifts in the yield curve, the federal reserve, taxes, potential government spending that may impact your portfolio”. Within a few minutes navigating Reddit you will find a plethora of subreddits that could potentially contain investing opportunities; but now, the fun part…
How do you read all the posts? Scrape the text from Reddit. All the code is on my GitHub page.
First Step: getting access
Just as with Twitter, the first step is to go to https://www.reddit.com/prefs/apps, create an app (a script one), and get the access keys. Once you have the access keys, you insert them into the code where the capitalized words are, after importing the libraries and creating some blank variables:
import praw
import pandas as pd
import os
from wordcloud import WordCloud, STOPWORDS

currdir = os.path.dirname(__file__)  # directory where the word cloud image will be saved
stopwords = set(STOPWORDS)

# containers for the scraped data
title = []
comment = []

# Reddit API credentials from https://www.reddit.com/prefs/apps
reddit = praw.Reddit(client_id='HERE GOES THE CLIENT ID',
                     client_secret='HERE GOES THE SECRET KEY',
                     username='YOUR USERNAME',
                     password='YOUR PASSWORD',
                     user_agent='WHATEVER')
Second Step: defining the subreddit and the search criteria
I decided to search within “wallstreetbets”, focus on hot topics and extract the content of the posts. Alternatively, you could try different subreddits, search for the top submissions, the newest submissions, etc., and extract the title of the post, the content, the upvotes, or everything (one of these alternatives is sketched after the loop below). Because the library returns a lazy listing rather than a plain list (which seemed weird to me at first), you have to loop over it to extract the data into the comment variable:
wallstreetbets = reddit.subreddit('wallstreetbets')
hot1 = wallstreetbets.hot(limit=52)  # the 52 hottest submissions

for submission in hot1:
    if not submission.stickied:  # skip pinned posts
        comment.append(submission.selftext)
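As a hedged sketch of one of those alternatives (not part of the original code), here is how you could pull the title, body and upvote score of the week's top submissions from the “investing” subreddit instead. The subreddit choice and the field names are just illustrative; it reuses the reddit instance and the pandas import from the first step.
# Sketch: collect title, body and score from the top posts of the week.
# Uses the praw.Reddit instance (reddit) created in the first step.
investing = reddit.subreddit('investing')
rows = []
for submission in investing.top(time_filter='week', limit=52):
    if not submission.stickied:  # skip pinned posts
        rows.append({'title': submission.title,
                     'score': submission.score,
                     'body': submission.selftext})

topDF = pd.DataFrame(rows)
print(topDF.sort_values('score', ascending=False).head())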
Third Step: create the cloud
Finally, you need to transform the variable into a data frame and then into a string variable to use WordCloud:
commentDF = pd.DataFrame({'comment': comment})  # list of post bodies -> data frame
string = commentDF.to_string()                  # collapse everything into one string
wc = WordCloud(background_color="white", max_words=100, stopwords=stopwords)
wc.generate(string)
wc.to_file(os.path.join(currdir, "wc.png"))     # save the image next to the script
After you run the code, an image that looks like this (but with the latest information) will be saved into your working directory:
As you can see, common words were filtered out using the STOPWORDS set from the wordcloud library, which was loaded at the beginning of the code. Also, the image contains some keywords that could be used to explore a bit further, such as “silver” and “GME”.
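If the cloud still ends up dominated by generic or boilerplate words, one small, hedged tweak (not part of the original script) is to extend the stopword set before generating the cloud; the extra words below are just examples:
# Sketch: add a few illustrative words to the default STOPWORDS set
# so they are excluded from the cloud.
custom_stopwords = set(STOPWORDS)
custom_stopwords.update(['stock', 'market', 'https', 'amp', 'wallstreetbets'])

wc = WordCloud(background_color="white", max_words=100, stopwords=custom_stopwords)
wc.generate(string)
wc.to_file(os.path.join(currdir, "wc_filtered.png"))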
Pros
With a word cloud you can grasp insights from, potentially, millions of words in a few seconds. Using Reddit to mine data is simple, flexible and targetable. Using a few lines of code, you can scan hundreds of posts and thousands of words to summarize information.
Cons
A word cloud loses a lot of information (e.g., how many times a word appeared, the context in which it was used, etc.). Also, it is difficult to draw meaningful conclusions from it. With the previous example it can’t be determined whether, for example, silver should be bought or sold. Further analysis is required; a simple word-frequency count, sketched below, is one possible first step.
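As a first, hedged step in that direction (not in the original code), you could count how often each word actually appears instead of only visualizing it; the snippet below reuses the comment list and the stopwords set defined earlier:
from collections import Counter

# Sketch: recover the word counts that the cloud hides.
words = ' '.join(comment).lower().split()
counts = Counter(w for w in words if w.isalpha() and w not in stopwords)

# The 20 most frequent words and how often they appear
print(counts.most_common(20))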
Analyst's Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.
Seeking Alpha's Disclosure: Past performance is no guarantee of future results. No recommendation or advice is being given as to whether any investment is suitable for a particular investor. Any views or opinions expressed above may not reflect those of Seeking Alpha as a whole. Seeking Alpha is not a licensed securities dealer, broker or US investment adviser or investment bank. Our analysts are third party authors that include both professional investors and individual investors who may not be licensed or certified by any institute or regulatory body.