The most pertinent question when investing in Big Data is "what is Big Data?" For most investors, technical definitions are too in-depth, and articles written by journalists gloss over the topic without any real analysis.
While it's a bit early in the technology's life to determine the full use cases, requirements, and implications confidently enough to pick winners and losers, it's still possible to pick technologies that present new opportunities - hence growth, the favorite tune of Wall Street. We'll start with an overview of Big Data and then present five interesting companies that offer direct exposure to each of the three lateral sections of Big Data.
This is a primer on Big Data posted on my blog - I'll be updating it as time goes on, so check there for the most recent version.
With the evolution and growth of digital data, there's been a trend toward 'Big Data' - there's no official definition, but there are some industry rules of thumb:
If you have to ask, you probably aren't using it
But the non-whimsical and grounded definition is probably:
data that is too complex/large to process with standard database management systems or traditional analytics
While there aren't technical delineations amongst all big data - there are common characteristics:
Volume - the most obvious is size. Users are generating immense amounts of data with their software and hardware - from obvious things like Facebook (FB) content and Twitter (TWTR) posts, to subtle things like application settings, cloud playlists, and exercise tracking.
If you think about software written for a traditional company, it's designed for access by hundreds or thousands of employees at once - and even then, the average load is generally less than that (in the middle of the night, during non-peak hours). Google (GOOG) is an extreme example: its search engine processes billions of queries daily.
Velocity - the rate at which we are creating this data. A relevant example is electronic trading: more trades are executed each hour now than were executed in an entire day 10 years ago. Data that once fit on a single ticker tape would require several hundred tickers running simultaneously today.
If you're not familiar with Foursquare - it's a location-based social media service that lets users 'check in' to the places they're currently at from their mobile devices. The platform shares this information through other social media and allows pictures, comments, and tagging. If Foursquare were to disseminate this information to retailers, they'd have to process hundreds of thousands of requests simultaneously:
- you have people checking in
- Foursquare processing the data
- Foursquare distributing data out
- retailers pulling more (contextual) data out (user profile, has this user come to my place before?).
This would all have to be done in a timely manner - within the first few moments of the customer walking in, so the retailer can prepare appropriately - otherwise the data becomes non-veracious, a point explained later.
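A minimal sketch of the retailer's side of that pipeline helps make the timeliness point concrete. All names and the staleness threshold here are illustrative assumptions, not Foursquare's actual API:

```python
import time

STALE_AFTER_SECONDS = 120  # illustrative: data older than this has lost its value

def handle_checkin(checkin, now=None):
    """Decide whether a check-in event is still fresh enough to act on.

    checkin: dict with 'user', 'venue', and a 'ts' unix timestamp.
    Returns an action for the retailer, or None if the data went stale.
    """
    now = time.time() if now is None else now
    age = now - checkin["ts"]
    if age > STALE_AFTER_SECONDS:
        return None  # the customer already came and went; drop the event
    # Fresh: this is where the retailer would pull contextual data
    # (user profile, prior visits) and prepare for the customer.
    return {"venue": checkin["venue"], "greet": checkin["user"], "age_s": age}

fresh = handle_checkin({"user": "alice", "venue": "cafe", "ts": 1000}, now=1030)
stale = handle_checkin({"user": "bob", "venue": "cafe", "ts": 1000}, now=2000)
# fresh is an action dict; stale is None - same data, different value over time
```

The point of the sketch: the exact same check-in record is actionable at one moment and worthless two minutes later, which is why the whole chain (check-in, processing, distribution, contextual lookup) has to run in near real time.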
Variety - the nature of digital data (pre-Facebook) used to be very narrowly defined: a number, a piece of text, etc. While that hasn't changed fundamentally, the high-level abstractions have.
Consider the anatomy of a tweet: it can have a hashtag ("#"), a user profile link ("@xyz"), the tweet's text, a web link to another site, and a timestamp.
To even attempt to categorize and organize the tweet, it must be abstracted beyond a simple number or text field - hence the variety of data grows exponentially.
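That abstraction step can be sketched with simple pattern matching. Assuming a plain-text tweet, a rough Python sketch (the regexes are simplified and the function is hypothetical, not Twitter's own parser):

```python
import re

def parse_tweet(text):
    """Pull the structured pieces out of a raw tweet - a rough sketch.

    Hashtags, @-mentions, and links are extracted with simple regexes;
    whatever remains is treated as the plain text of the tweet.
    """
    hashtags = re.findall(r"#\w+", text)
    mentions = re.findall(r"@\w+", text)
    links = re.findall(r"https?://\S+", text)
    # Strip the structured pieces to leave the plain text behind
    plain = re.sub(r"(#\w+|@\w+|https?://\S+)", "", text).strip()
    return {"hashtags": hashtags, "mentions": mentions,
            "links": links, "text": plain}

tweet = "Loving the new analytics dashboard @xyz #bigdata http://example.com"
parsed = parse_tweet(tweet)
```

Even this toy version shows the shift: one "piece of text" becomes a nested record with four different kinds of fields, and real tweets add timestamps, geo data, and reply chains on top.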
Veracity - simply put, it has to do with the timeliness of the data.
If the Foursquare example above didn't explain veracity, try this:
- Imagine having a newspaper mailed to your house using snail-mail (a few days lag)
- You are trying to keep up with current events (trading on stock picks, checking the weather)
The timeliness and freshness of the data can change how useful the data is in a heartbeat.
Before any kind of Big Data development can happen, there are a few sine qua non (requirements) that must be satisfied:
Storage Capacity - it must scale cheaper and faster. Solid-state drives have probably satisfied retail storage demands for the next few years. They're still quite expensive for enterprises, which look at storage from a cost-per-unit perspective, but once production scales and solid-state drives are as cheap as standard magnetic drives, expect controllers and infrastructure for them to come into demand.
Memory - computations run orders of magnitude faster when data is pulled from memory rather than disk. There's a very interesting three-way balancing act between memory, processing speed, and software optimization: the slowest of the three largely determines the overall result. But as above, memory sizes and speeds have improved hugely and remain one of the cheaper upgrades.
Processors - a lot of people know Moore's famous doubling rule, but there's a secondary and lesser-known Moore's law. It states that the cost of semiconductor development increases exponentially with time.
We're approaching the physical limits of semiconductors, and solutions are being sought - there's a lot of research into quantum computing and biocomputing. This shouldn't be a big risk factor in the mid-term.
Networks - bandwidth and speed. Everyone has first-hand experience with internet speeds - slow YouTube videos or page loads. Because the internet is the backbone of Google's whole business model, Google will push network speeds as hard as it can.
I talked about Variety earlier, but the nature of the data deserves a bit more in-depth exploration:
From the 1970s on, the growth in digital data was structured: the data had structure, patterns, and existing schemas. End-of-day stock trading data could look something like this: price, volume, high, low, open, close. This format didn't change no matter what stock we were looking at; there wouldn't be two highs or multiple opens.
As the tweet example above shows, unstructured data can't easily be put into a structure. Yes, I did say that a tweet can only contain certain things - a hashtag, a profile link, a web link, text, a timestamp.
But the problem is that when you create a traditional database, you have to set limits and explicitly define the structure of the data being stored:
- How many #'s are there in a tweet?
- How many @'s are there in a tweet?
- How many characters are there in a tweet? (Yes 140)
Now you need to set those limits explicitly. If you size the database for the worst case (say 10 hashtags and 10 @-mentions per tweet), then as tweets populate the database you'll find that many contain no hashtags or mentions at all - maybe even 90% of the database would be wasted. This kind of structure is simply unacceptable and an ineffective way of managing the data. And I haven't even touched on how the data doesn't relate to other data in an easy-to-manage way.
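Here's a hypothetical sketch of that waste: a worst-case fixed row layout versus the handful of fields a typical tweet actually carries. The column names and counts are illustrative assumptions, not any real database's schema:

```python
# Fixed schema sized for the worst case: 10 hashtag slots, 10 mention slots.
FIXED_COLUMNS = (["text", "timestamp"]
                 + [f"hashtag_{i}" for i in range(10)]
                 + [f"mention_{i}" for i in range(10)])

def fixed_row(tweet):
    """Force a tweet into the rigid 22-column layout; unused slots become None."""
    row = {col: None for col in FIXED_COLUMNS}
    row["text"], row["timestamp"] = tweet["text"], tweet["timestamp"]
    for i, tag in enumerate(tweet.get("hashtags", [])):
        row[f"hashtag_{i}"] = tag
    for i, user in enumerate(tweet.get("mentions", [])):
        row[f"mention_{i}"] = user
    return row

# A typical tweet: text, a timestamp, one hashtag, no mentions.
tweet = {"text": "hello world", "timestamp": 1389000000, "hashtags": ["#hi"]}
row = fixed_row(tweet)
empty = sum(1 for v in row.values() if v is None)
# 19 of the 22 columns sit empty for this tweet; a document store would
# simply keep the 3 fields the tweet actually has.
```

This is the trade-off that pushed the industry toward schemaless document stores: rather than reserving worst-case slots up front, each record carries only the fields it uses.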
It's appropriate at this point to point out the dichotomy between human-generated and machine-generated data. Machine data generally grows faster than human-generated data because it doesn't require human input.
Human Generated Data
Text messages, social media - this data has to be stored and interpreted with context; imagine picking texts out of a conversation out of order. Analyzing this data is still more effectively done by a person than by a machine. While sentiment analysis of Twitter streams is a lot more effective now than it was years ago, it still takes only a cursory glance for a person to gauge the underlying mood and feelings of a text conversation.
Machine Generated Data
Here's an amazing picture from Splunk that shows how complex the environment is for a holistic solution:
You can see above the different types of data and how quickly it gets complicated. A general solution must accommodate data from different industries, and there are very few solutions that can tie into new data while bridging existing systems (it's an awkward dance between outdated protocols and new requirements).
There are classical companies likely to benefit from Big Data - Oracle, IBM, Microsoft (MSFT), SAP AG (SAP), and Amazon.com (AMZN), to name a few. I would say they stand to benefit because of their size rather than their finesse. These companies don't derive a significant percentage of their revenue from providing 'new' Big Data solutions - Oracle and IBM do provide infrastructure and solutions, but I consider that secondary exposure.
I'm going to focus on smaller sector companies that derive a large part of their revenues from providing big data solutions:
Imperva (IMPV)
A fairly cheap company - it's sitting at a sub-$1 billion valuation.
It has a two part business model:
- It provides data security solutions - protection against data theft, access control, data privacy, and auditing/regulation tools.
- It provides a firewall solution - protection against active/advanced persistent threats.
An interesting combo focusing on privacy and security, especially given the NSA allegations and concerns.
Mateo Meier, director at Artmotion, Switzerland's biggest offshore hosting company, said revenues grew 45 to 50 percent last year as companies from industries as varied as oil and gas to technology to finance look for a place to store confidential data.
I like this company because of its cheap valuation right now. Take a look at a few interesting items I pulled out:
Revenues are stagnating, but the cost of revenue is very stable. Most tech startups burn through money, whereas this company has kept its cost of revenue to a moderate sub-25% of revenue.
Compared with the tech IPOs of 2012 I've looked through, the SG&A/R&D ratio is healthy. The sizeable investments are explained by a terse line - "$41.9 in net purchases of short-term investments" - which I expect to taper down over the next few quarters. And that's on top of a very big cash balance of about $100 million, 98% of which is U.S.-based, so it won't be as inaccessible as some other companies' cash reserves.
Akamai Technologies (AKAM)
A legacy company from the 2000s, it provides holistic cloud/Big Data/analytics solutions.
Its customer list is extremely extensive and it's uniquely positioned to further extend service to its existing customer base.
Akamai and Imperva are down on recent earnings, making both a good deal to pick up and hold for the short/medium term.
AKAM has nicely and steadily increasing net income and healthy cash flow from operating activities.
Given their client list, I'd want to see that their SG&A expenses aren't merely keeping pace with revenue/income - and the numbers clearly show they're making sales without a bloated sales force, something every tech SaaS strives for.
One thing not captured in their numbers and financials is the size of their client list: they support 250+ high-profile customers. Most of these customers pay for only one service solution at the moment - as the industry moves forward and user experiences and expectations consolidate, the need for video, mobile access, and analytics will drive demand across all of these customers. And AKAM won't pay a cent for lead generation on any of its existing clientele.
NICE Systems (NICE)
One of the leading non-US tech companies providing voice and video analytics. It first provided voice and video capture technology - your 911 call center is probably running routing software from NICE Systems. It then grew and branched out to provide analytics on top of this data.
Voice and video are an interesting communication medium, especially in the enterprise. Consumers use them extensively through Skype and Google Hangouts, but the crossover to enterprise hasn't been effectively monetized.
You can check into a location using Foursquare and provide geo data, but a company can't intercept your video conversation with a friend about its products. NICE introduced a really interesting suite of analytical software at one of consumers' biggest pain points when interacting with companies: call centers. It's showing 10%-plus growth in analytics software used in call centers to better understand customers as they dial in.
This is a very interesting company: it provided the older systems the call centers run on, and it's now providing new analytics software for those same systems. Unlike other software solution providers, NICE has the expertise of having built the existing hardware and knows the call centers' requirements inside and out; it doesn't need to work through documentation and interfacing, since its own developers built the systems from the ground up.
NICE's competitive advantage is probably what AKAM's will look like in the future: it provided the hardware and basic software systems first, and it's now selling analytics software on top of that existing install base. This moat will be extremely hard to breach - anyone who's tried knows that coding against undocumented, proprietary software coupled with hardware is a huge pain. The recent draw-down is a nice chance to pick up some stock.
If one thing is certain across consumer software and services, it's cloud deployment and storage of data. LSI and PMC-Sierra both provide storage infrastructure - redundancy, backups, adapters, controllers - a non-negotiable must-have for companies that want any sort of cloud solution.
I like LSI and PMCS particularly because of their low prices; both can become a good part of your portfolio without diverting large amounts of cash - convenient, especially if you don't have a large portfolio. Because these two companies provide hardware components, their risk-and-reward profile is the lowest here. Virtually all Big Data solutions will use their components, whether companies buy their own hardware or have it hosted in the cloud, so consider these two low-alpha investments. Below you can see nice, solid appreciation in both stocks:
All five of these companies have excellent exposure to the rise of Big Data - from holistic software solutions to niche analytics to the underlying infrastructure - and all make great short- to medium-term plays.