A couple of weeks ago, Intel (INTC) investors received a jolt from the news that Google (GOOG) is looking to design its own server chip based on Arm's (ARMH) technology. Ashraf Eassa offered an effective rebuttal, though Valuable Insights spotted a report suggesting there may be some merit to the idea. While this drama is all very interesting, I believe it ignores the true revolution going on in the server world.
On Nov. 18th, Micron (MU) introduced its Automata Processor at Supercomputing 2013. Its massively parallel non-von Neumann architecture got it mentioned on a few high-performance computing sites, but it was largely ignored by the world at large, probably because it's not well understood. I'd like to thank Paul Dlugosch, Micron's Director of Automata Processor Technology Development, for answering my questions and helping me to understand it a bit better so that I might write this article.
A typical processor executes one instruction at a time, and spends quite a bit of its time waiting while it fetches more data to work on (a delay known as the "von Neumann bottleneck"). With multithreading, a small number of separate operations can be executed at once. A single Automata Processor, by contrast, has tens of thousands of simple processing elements within it, each capable of separate parallel execution, along with nonvolatile memory that stores the data to be processed. An Automata Processor is a coprocessor: it doesn't replace the CPU, but instead augments it. Eight Automata Processors can be placed on a memory module and plugged into a typical DIMM slot, but they do not replace DRAM. However, they can be scaled upward in the same way that you might add more DRAM.
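As the name suggests, those thousands of simple elements behave like finite automata: each watches the same input stream and advances its own state on every symbol, so many patterns are matched in a single pass over the data. As a rough software analogue (the function and structure here are my own toy sketch, not Micron's toolkit or API), here is what that single-pass, many-patterns-at-once matching looks like:

```python
# Toy software analogue of automata-style matching -- illustrative only.
# Each pattern is tracked by a set of "threads" (partial-match lengths),
# and every thread examines each input symbol as it streams past, the
# way the Automata Processor's state elements all see each symbol at once.

def match_all(patterns, text):
    """Return every (pattern, end_index) occurrence in one left-to-right pass."""
    active = {p: set() for p in patterns}   # live partial matches per pattern
    matches = []
    for i, ch in enumerate(text):
        for p, states in active.items():
            states.add(0)                   # a new match can start at any symbol
            nxt = set()
            for s in states:
                if p[s] == ch:              # this thread's next expected symbol
                    if s + 1 == len(p):
                        matches.append((p, i))  # full pattern matched, ending at i
                    else:
                        nxt.add(s + 1)      # advance the thread
            active[p] = nxt                 # threads that mismatched simply die
    return matches
```

On a CPU this inner loop is sequential, which is exactly the point: the Automata Processor evaluates all of those pattern states simultaneously in hardware, one step per input symbol.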
Picture one of Intel's Xeon processors as a small team of highly intelligent (and expensive) proofreaders going through a book one page at a time. Add one of Micron's Automata Processors to this picture and the proofreaders can outsource their work to 50,000 moderately intelligent, lowly paid workers, each independently proofreading a page while the Xeon bosses are given the role of passing the pages out. If the need arises, perhaps the Xeon overlords hire 16 different groups of automatons and give a book to each. The collective intelligence of the Automata Processor dwarfs that of the Xeon in this kind of application, in the way that a beehive collectively has more complex behavior than its queen.
The Automata Processor is designed to be good at massively parallel operations, where many parts can be worked on simultaneously. Micron's Automata Processor intern is expected to help with:
"Application development that exploits the Automata Processor in key target markets, including; bioinformatics, image/video analytics, big data analytics, cyber security."
That's because these are areas where many different processors can simultaneously work on different bits of data, or do different things to the same piece of data. Google's indexing of the web is a great example of a massively parallel Big Data operation. Biotech firms might use the processor to examine DNA, look for drug candidates, or fold proteins. The NSA is probably salivating over the possibility of using the Automata Processor to analyze the flood of global communications. While Micron's first target applications are in the server or supercomputing fields, later applications might include something like a mobile coprocessor that could quickly search your photos for familiar faces.
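To give a concrete flavor of the bioinformatics use case, here is a purely illustrative sketch (the motif, sequences, and function names are my own invention, not anything from Micron) of an "embarrassingly parallel" DNA motif search: every sequence can be scanned independently, so the work spreads trivially across workers. On a CPU those workers are a handful of processes; on an Automata Processor they would be thousands of on-chip automata.

```python
# Illustrative sketch of an embarrassingly parallel motif search.
# Ordinary worker processes each scan one DNA sequence independently --
# no shared state, so the work parallelizes trivially.
from concurrent.futures import ProcessPoolExecutor

MOTIF = "TATA"  # hypothetical motif to search for

def find_motif(seq):
    """Return every start index where MOTIF occurs in one sequence."""
    return [i for i in range(len(seq) - len(MOTIF) + 1)
            if seq[i:i + len(MOTIF)] == MOTIF]

def search(sequences):
    """Scan each sequence in a separate worker process."""
    with ProcessPoolExecutor() as pool:
        return dict(zip(sequences, pool.map(find_motif, sequences)))

if __name__ == "__main__":
    print(search(["GCTATAAAG", "CCGGCC", "TATATA"]))
```

The key property is the absence of coordination between workers; that is what lets this style of workload scale almost linearly with the number of processing elements.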
The increase in efficiency with the Automata Processor is shocking. Micron compared 48 Automata Processors against a $20,000 cluster of 48 Xeons running "NP-hard" problems and found that as the complexity of the problem increased, the time to solve it grew exponentially on the Xeons but only linearly on the Automata Processors. In other words, increasing the complexity would quickly make some problems unsolvable on the Xeons, while on the Automata you might merely want to take a longer lunch break. With a simpler problem, the Automata Processors solve in minutes what the Xeons solve in hours. Not only was the solution incredibly quick, the Automata Processors consumed only 245 to 315 watts, compared with the more than 2,000 watts consumed by the Xeons. Finally, at the risk of making this sound unbelievable, I will mention that the current prototype Automata Processor is fabricated on a 50nm process (which is about five years old) and is the size a 2Gb DRAM chip would be at that node (2Gb DRAM chips can be fabricated for just over a buck each) -- we can only imagine what the processor would be like on a state-of-the-art process.
Some problems at which the Automata Processor would excel are not realistically solvable today with conventional processors. Less demanding work that it might take on is currently handled by a combination of CPUs from companies such as Intel and IBM (IBM), FPGAs from companies such as Xilinx (XLNX) and Altera (ALTR), and GPUs from Nvidia (NVDA). Google, Microsoft (MSFT), and Amazon (AMZN) each operate in excess of a million servers, so this is not a small market, and even a small share could eventually mean billions in revenue for Micron. I believe this would be a relatively neutral development for Intel, because servers with Automata Processors would still need CPUs, though in some cases far fewer of them. Intel's close relationship with Micron should also help it design a CPU optimized to take advantage of the Automata Processor. For Xilinx, Altera, IBM, and Nvidia, however, I see this as a clearly negative development in the high-performance computing space.
Micron has been working on the Automata Processor for seven years, and has the first revision in its R&D facility in Boise. Samples of the Automata Processor, along with a software development kit, will be available to partners and early adopters by mid-2014; public availability has not yet been announced. Once it's publicly available, the speed of uptake will be dictated by how easily end users can program applications to exploit its unique advantages. To that end, Micron is working with academics at the University of Virginia and the University of Missouri. Programming for parallelism has traditionally been tricky, and while the Automata Processor's new architecture may initially slow uptake, the way it naturally fits problems with unstructured parallelism should keep the transition from being too difficult.
Micron's Automata Processor gives it an entry into a new market with a product that is faster, uses less power, and is cheaper than the competition. It also moves Micron further away from the commodity market, along with other initiatives like the Hybrid Memory Cube. While I don't expect significant revenue (if any) from the Automata Processor in 2014, I believe it may be a major catalyst for the company in the longer term. I expect to see Micron above $30 within six months, and significantly higher if the Automata Processor lives up to its promise.