The query is no longer "show me all persons who have committed acts of terrorism in the past." The query has changed to "show me all Internet citizens who are most likely to commit acts of terrorism based on their web surfing behavior." We used to ask "show me a list of customers who have purchased drywall." Now, we ask "show me a list of persons who are likely to be remodeling their kitchens based on credit card purchase activity."
Computing is at a crossroads for two main reasons:
- Data is exponentially more vast, but less and less structured.
- The questions we ask of collected data, and the answers we crave, have become more complex, more open-ended, and more predictive.
This article explores the trend of "Big Data," how this trend is changing, and why I believe Cray, Inc. (CRAY) may do well as a Big Data newcomer. When reading this article, understand that Cray has three main business lines:
- Super-computer Provider
- Data Storage Solutions
- Services: Big Data Solutions Provider
This article aims to create understanding around Cray's Big Data strategy.
Big Data: Trend or Annoying Buzzword?
"Big Data" is a catch-all term conveying the idea that data in the enterprise has changed and that we, as an industry, are having to change our methods to collect, store and query it. Google Trends shows that search terms like "big data," "hadoop" and "NoSQL" have exploded over the last few years, while the term "RDBMS" (for "Relational Database Management System," aka a traditional database system) has decreased in popularity.
In my own enterprise, I have witnessed a departure from traditional data processing methods on stand-alone servers to clusters of commodity servers joined together under the Hadoop architecture. Using Hadoop, a data processing problem that used to take 20 hours on one server now takes 10 minutes when spread across an 80-server cluster.
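As a quick sanity check on those figures (20 hours on one server versus 10 minutes on 80 servers), the speedup can be computed directly. The variable names below are mine, and the calculation only restates the numbers quoted above.

```python
# Figures quoted in the text: 20 hours on one server, 10 minutes on 80 servers.
single_server_minutes = 20 * 60
cluster_minutes = 10
servers = 80

# Overall speedup, and speedup relative to the number of servers added.
speedup = single_server_minutes / cluster_minutes
speedup_per_server = speedup / servers

print(f"speedup: {speedup:.0f}x, per server: {speedup_per_server:.2f}")
```

The ratio comes out higher than the server count, so the gain presumably reflects more than parallelism alone; the figures above do not say what else changed.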
Another great indicator of a trend is the passion with which human beings speak about it. I can tell you, as a software engineering professional who frequently interviews young, prospective employees, "Big Data" is all the rage at virtually every college campus on the East Coast. On top of that, most veteran software engineers I know love to bring up the topic of Big Data.
How Does Cray Fit In?
Cray is a company in the midst of transition. For years, it built extreme high-end super-computers mainly used for modeling and simulation. Its super-computers have been used to simulate and improve airplane and car safety, improve medical treatment, and execute weather prediction models. This business has traditionally made the revenue stream very lumpy and susceptible to the ups and downs of scoring large contracts at large computing centers such as Oak Ridge National Laboratory, the Swiss National Supercomputing Centre, and DOD Supercomputing Resource Centers.
Cray has recently made a few very intelligent moves which illustrate plans to leverage its vast knowledge of super-computing to enter the Big Data market. First, in April, Cray took advantage of an interconnect chip "arms race" between Intel and AMD to sell its own interconnect business to Intel for a boatload of cash: $140 million, to be exact. The interconnect technology allows nodes of computers to be connected together at high speeds and in multiple directions (picture a mesh of connected servers). When explaining why Cray ceded super-computer interconnect technology development (its bread and butter for the better part of a decade), Cray CEO Peter Ungaro made the following comment:
"From the technology trends we saw, the hardware was not going to be as important to us in the future as the software trends."
Ungaro saw Intel and AMD ramping up into the interconnect business and didn't want to be stuck in the middle between two heavy-weight fighters.
Shortly after this sale, Cray shrewdly used its newfound cash to purchase high performance computing rival Appro International for $25 million. This will add $60 million to 2013 revenue, but more importantly, it completes Cray's move away from the custom interconnect chip business and adds the ability to sell more generic high performance clusters. This makes tremendous sense if you want to service the Hadoop Big Data market with servers that are still very powerful but more affordable. In the most recent 3rd quarter conference call, Ungaro said this about the Appro acquisition:
And finally, teaming with Appro gives us new options to expand our offerings in Big Data, both alongside our uRiKA graph analytics appliance and as part of a broader portfolio of solutions in the future as the big data market is large and growing rapidly and a good portion of it is built on the model of industry-standard hardware leveraging a variety of software applications. Cray brings our unique ability to integrate HPC technologies into Big Data solutions to give customers a strong value proposition, as we've shown with our uRiKA appliance. We believe we can leverage our technology in this rapidly evolving market, and with this acquisition, we have even more opportunities to drive growth into the future.
Hadoop's Fatal Flaw: YarcData to the Rescue
Hadoop is great at carving up data processing problems into small pieces, creating output from each small piece, and then joining the output together. However, what if you have an extremely large dataset where the small pieces need to interact with one another to extract meaning? For example, in a social network setting, if you are processing tweets from suspected terrorists, those tweets might not mean much in isolation, which is how Hadoop would process them. However, when you combine the tweets as responses to tweets from other Internet citizens and other potential terrorists, those tweets take on much more meaning. So, how do we extract hidden meaning from massive datasets? The answer lies within a Graph Database structure.
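To make the "small pieces" point concrete, here is a minimal sketch of the Map/Reduce pattern in plain Python. It is not Hadoop code; the sample tweets, field layout and function names are all hypothetical. Notice that the map step sees one tweet at a time, so the reply relationship between tweets 1 and 2 never enters the computation.

```python
from collections import defaultdict

# Hypothetical sample records: (tweet_id, author, reply_to_id, text)
tweets = [
    (1, "alice", None, "meet at the club"),
    (2, "bob", 1, "bring the board"),
    (3, "carol", None, "club closed today"),
]

def map_tweet(tweet):
    """Map step: looks at one tweet in isolation and emits (word, 1) pairs."""
    _id, _author, _reply_to, text = tweet
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(grouped):
    """Reduce step: combine the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

mapped = [pair for tweet in tweets for pair in map_tweet(tweet)]
word_counts = reduce_counts(shuffle(mapped))
print(word_counts["club"])
```

Answering a question about who replied to whom would require feeding the output of one such pass into another, which is where the extra steps come from.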
As datasets grow and become more unstructured, more and more Map/Reduce steps are needed, which increases complexity and processing times. In a graph database structure, loose associations are stored between people, objects, and places. A very simple example would show that "Bob" and "Alice" know each other and are members of a chess club.
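The Bob and Alice example can be sketched as a handful of subject-relationship-object triples, which is roughly the shape of data an RDF-style graph store holds. This is only an illustration in plain Python; the relationship names and helper functions are my own assumptions, not anything taken from a particular product.

```python
# Each association is stored as a (subject, relationship, object) triple.
triples = [
    ("Bob", "knows", "Alice"),
    ("Alice", "knows", "Bob"),
    ("Bob", "member_of", "Chess Club"),
    ("Alice", "member_of", "Chess Club"),
]

def objects_of(subject, relationship):
    """Everything that `subject` points to via `relationship`."""
    return sorted(o for s, r, o in triples if s == subject and r == relationship)

def subjects_of(relationship, obj):
    """Everything that points to `obj` via `relationship`."""
    return sorted(s for s, r, o in triples if r == relationship and o == obj)

club_members = subjects_of("member_of", "Chess Club")
bobs_contacts = objects_of("Bob", "knows")
print(club_members, bobs_contacts)
```

Adding a new kind of association means appending another triple; no schema has to change, which is what makes the structure a good fit for loosely structured data.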
YarcData, which is a division of Cray, sells a unique graph analytics appliance called "uRiKA." Here is how the company describes it:
YarcData's uRiKA is a Big Data appliance for graph analytics that enables enterprises to discover unknown relationships in Big Data. uRiKA is a highly-scalable, real-time platform that supports ad hoc queries, pattern based searches, inferencing and deduction. uRiKA is a purpose-built appliance for graph analytics featuring graph-optimized hardware that provides up to 512 terabytes of global shared memory, massively-multithreaded graph processors supporting 128 threads per processor, and an RDF/SPARQL database optimized for the underlying hardware enabling applications to interact with the appliance using industry standard interfaces.
The fact that this appliance provides up to 512 terabytes of shared memory is not only amazing, it is of particular importance when dealing with huge graph databases. A graph is basically a "spidering" web of associations, attributes and objects. To query a graph database, an application must "traverse the graph," then locate and retrieve results. In Hadoop, this cannot be done easily because a large graph would need to be spread across hundreds of physical servers. In the graph world, in order to traverse the graph and obtain peak performance, the entire graph database needs to be loaded into accessible memory. The uRiKA appliance does just this. It allows massive graph databases to be loaded into memory and queried at tremendously high speeds.
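For readers who have not worked with graphs, "traversing" can be sketched as a breadth-first walk over an in-memory adjacency list. The tiny graph and the `hops_from` function below are hypothetical and say nothing about how uRiKA is implemented; the point is only that every step follows a pointer to a neighbor, which is cheap when the whole graph sits in one memory space and expensive when each hop might land on a different server.

```python
from collections import deque

# Hypothetical in-memory graph stored as an adjacency list.
graph = {
    "Bob": ["Alice", "Chess Club"],
    "Alice": ["Bob", "Chess Club", "Carol"],
    "Carol": ["Alice", "Book Club"],
    "Chess Club": ["Bob", "Alice"],
    "Book Club": ["Carol"],
}

def hops_from(start, max_hops):
    """Breadth-first traversal: map each node within max_hops to its distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

reachable = hops_from("Bob", 2)
print(reachable)
```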
Phillip Howard, Research Director for Data Management at Bloor Research, describes uRiKA's capabilities this way:
Suppose you want to identify three or more people who are connected in some way (directly or indirectly) at least one of whom has rented or bought a truck, one of whom has bought fertilizer even though he doesn't own or work on a farm, one of whom has visited a website dealing with bomb making, and one of whom has been seen visiting national monuments. Graph analytics allow you to search for this pattern in the sea of what otherwise might appear to be innocuous relationships that when identified form a plot. Then, once you have detected these persons of interest, you can graphically visualize the relationships between these people and things and search for more evidence of this possible plot. And, according to YarcData uRiKA can do this in (NEAR) real-time, compared to the days or weeks that might be required using conventional methods.
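The kind of pattern search Howard describes can be sketched, in a very reduced form, as "find connected groups of three or more people whose members collectively match a set of observations." The code below is a toy illustration in plain Python with made-up identifiers and labels; it is an assumption about the general shape of such a query, not a description of how uRiKA or SPARQL would execute it.

```python
# Hypothetical data: who is linked to whom, and what was observed for each person.
links = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
observations = {
    "p1": {"rented_truck"},
    "p2": {"bought_fertilizer"},
    "p3": {"visited_monument"},
    "p4": {"rented_truck"},
    "p5": set(),
}
required = {"rented_truck", "bought_fertilizer", "visited_monument"}

def connected_groups(links):
    """Split the people into groups that are connected directly or indirectly."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups

def matching_groups(links, observations, required, min_size=3):
    """Keep groups that are large enough and jointly cover every required label."""
    matches = []
    for group in connected_groups(links):
        observed = set().union(*(observations.get(p, set()) for p in group))
        if len(group) >= min_size and required <= observed:
            matches.append(sorted(group))
    return matches

matches = matching_groups(links, observations, required)
print(matches)
```

No single record matches the pattern on its own; the match only appears once the relationships are followed, which is the argument for a graph structure.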
Conclusion: Cray's Future
Cray is leveraging its rich history in building high performance super-computers and combining it with its YarcData unit to create the future of Big Data. As companies realize the limitations of Hadoop for answering many of today's toughest data-driven questions, graph databases will move into the void. I believe Cray and YarcData are positioned extremely well to take advantage of this Big Data sub-trend.
YarcData has also intelligently positioned the software stack which runs on top of the uRiKA hardware appliance. It is currently based on Java, Apache, and SPARQL, which are all industry standards. Therefore, there is already a large software developer base out there which is certainly hungry for this product.
From the Harvard Business Review Blog, consider these telling statistics from a recent survey of Fortune 1000 executives:
- 85% of organizations reported that they have Big Data initiatives planned or in progress.
- 15% of respondents ranked their access to data today as adequate or world-class.
In sum, I believe Big Data is changing and Cray is well suited to take on the challenge of delivering solutions to a hungry market. The uRiKA appliance debuted in February 2012 and there have already been two contract wins, though both were with existing clients that are large governmental agencies (Oak Ridge National Laboratory and the Pittsburgh Supercomputing Center). Based on the data processing void the uRiKA product fills, I will be looking for large contract wins with Fortune 1000 companies. If I see near-term wins, I will likely be a buyer.