Last month, Google (NASDAQ:GOOG) (NASDAQ:GOOGL) announced, among other things, the Tensor Processing Unit (TPU). While a number of Seeking Alpha commenters have hailed this innovation (here and here), none of them have really touched on the technical reasons behind it, beyond general comments on 'better machine learning'. This article is a brief summary of my thoughts on Google's move into custom machine learning hardware.
What is the TPU?
Let's begin by clarifying what exactly the TPU is. It's an application specific integrated circuit (ASIC). A normal CPU is a general purpose computer: it can run any conceivable program. An ASIC is purpose-built hardware whose logic is encapsulated in the physical chip layout itself, i.e. in the specific arrangement of logic gates that perform the desired function. This generally makes ASICs much faster and more energy efficient (often by an order of magnitude) than executing the same logic in software on a general purpose processor.
Designing and fabricating an ASIC makes sense if a) performance is so critical that cost is not an issue (like sending a robot to Mars) or b) the ASIC will be mass-produced in volumes that justify an upfront design cost in the tens of millions. There is nothing really special about ASICs; building one is a strategic decision, not some hardware revolution. The TPU is simply an ASIC for low-precision matrix operations, and here we are getting to the crux of why Google is relying on custom hardware.
On a sidenote, today's graphical processing units (GPUs) are often called general purpose GPUs, but that refers to the fact that they can be used for things other than graphical processing (rendering/rasterization), namely numerical operations. Deep learning is primarily done through matrix operations - multiplying inputs with weights, updating the weights of the neural network through gradient descent. GPUs are hence the primary tools for heavy-duty machine learning.
Towards more efficient deep learning
I have commented many times on Google's comprehensive machine learning research and data processing infrastructure. To understand what's going on right now, we will take a short excursion into deep neural network research. A good, (almost) non-technical intro to the topic is given here.
A neural network consists of layers of neurons, weights and activation functions. The output of each neuron is computed by multiplying the inputs from the previous layer with the neuron's weights, adding a bias term, and applying an activation function. Weights are generally floating point numbers (except in toy examples), and the performance of GPUs is given in teraflops, i.e. trillions of floating point operations per second. Large neural networks, e.g. for image classification, have millions to billions of parameters and take weeks to train on up to hundreds of GPUs.
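To make the arithmetic concrete, here is a minimal sketch of a single dense layer in NumPy - inputs times weights, plus a bias, through a ReLU activation. The specific numbers are made up for illustration; real networks stack many such layers with millions of these weights.

```python
import numpy as np

def forward_layer(x, W, b):
    """One dense layer: inputs times weights, plus bias, through ReLU."""
    return np.maximum(0.0, x @ W + b)

# Toy layer: 3 inputs feeding 2 neurons (weights chosen for illustration)
W = np.array([[0.5, -1.0],
              [1.0,  0.5],
              [-0.5, 1.0]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 2.0, 3.0])

print(forward_layer(x, W, b))  # -> [1.1, 2.8]
```

Every production request to a trained network is essentially a chain of these matrix multiplications, which is why hardware optimized for them pays off.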
Overall, deep learning is still a very fast moving field. Every few months, a new research paper changes the way training is done substantially (e.g. batch normalization last year). As with any newer technique, users were first and foremost concerned with how effective it is - not how costly. That's why for a while, a lot of research projects particularly from major tech companies have thrown larger and larger networks and more computational resources at the same problems (e.g. a long-standing image-classification problem) to get the highest classification accuracy. This is still being done for a number of problems, but industry players are now also very much concerned with the economical side of things.
First, how can the training itself be done cheaper and faster? Second, a trained neural network model is essentially just a list of weights, represented as a multi-dimensional matrix (or a tensor) in Google's deep learning framework TensorFlow. What should be done with the weight data after training? What insights can be derived from the final weights?
Once you have trained a neural network on your GPU in floating point representation, every production request (e.g. classify a new image/email/..) is nothing more than passing these requests through the network with these fixed weights. This is great because it means nothing is really stopping you from transforming floating point weights into a representation that is more efficient to compute. As it turns out, you can do really well with integer or even binary weights. Integer (or fixed point) computations are much faster for a variety of reasons, for instance because integer additions can easily be optimized by compilers (loop unrolling, partial sums).
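A minimal sketch of this idea - linear quantization of float weights to 8-bit integers - might look as follows. The function names and the per-tensor scaling scheme are my own illustration, not Google's actual method, but they capture the principle: store and compute with small integers, keeping one float scale factor to map back.

```python
import numpy as np

def quantize_int8(W):
    """Map float weights onto the int8 range via a single per-tensor scale."""
    scale = np.abs(W).max() / 127.0
    q = np.round(W / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

np.random.seed(0)
W = np.random.randn(4, 4).astype(np.float32)  # pretend these are trained weights
q, s = quantize_int8(W)
W_hat = dequantize(q, s)

# The rounding error is bounded by half the scale step
print(np.abs(W - W_hat).max())
```

Inference then runs on the int8 tensors directly, which is exactly the kind of low-precision matrix workload an ASIC like the TPU is built for.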
Another key issue with deep learning is that it is really hard to say in advance how many layers and neurons a neural network needs, so the network is likely to have more weights than it actually needs to learn a task. Designing a neural network (number of layers, neurons per layer) is hence more an art than a science. Another area of research is hence reduced representations of a given network. For instance, one can apply singular value decomposition to a trained network to determine which weights are actually relevant to a classification problem. As it turns out, this allows users to discard a large proportion of weights while getting almost the same classification accuracy. These questions are extremely relevant to all machine learning players right now because compressing large models to a size that allows them to be used on mobile devices is a key problem for machine learning application.
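The SVD trick described above can be sketched in a few lines: factor a weight matrix, keep only the largest singular values, and replace one big matrix with two much smaller ones. The rank and sizes here are arbitrary choices for illustration.

```python
import numpy as np

def compress_svd(W, k):
    """Keep the k largest singular values of W, returning two smaller factors A, B with W ~ A @ B."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

np.random.seed(0)
# A weight matrix that is (nearly) low-rank plus small noise, as trained networks often are
W = np.random.randn(100, 5) @ np.random.randn(5, 100) + 0.01 * np.random.randn(100, 100)

A, B = compress_svd(W, 5)
print(W.size, A.size + B.size)   # 10000 parameters shrink to 1000
print(np.abs(W - A @ B).max())   # reconstruction error stays small
```

A layer's matrix multiply can then be computed through the two thin factors, cutting both storage and arithmetic by a factor of ten in this toy case.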
Google is at the forefront of deep neural network research thanks to its Google brain team and extensive collaborations with leading research universities. The TPU allows Google to leverage all the above insights into compressed, integer/binary representations that are much faster to compute.
Nvidia (NASDAQ:NVDA) is of course very much aware of these issues and has introduced half-precision floats (floating point 16), which essentially double throughput versus normal floats. However, this is still a far cry from using the specific integer or binary representation Google might want. Nvidia as a company has to think in major product cycles and also has to consider that the majority of its revenue still comes from gaming (and, in the not-so-far future, VR/AR). In short, general purpose GPUs cannot really deliver the specifications to support the latest machine learning research at the exact performance/precision/energy-consumption trade-off a company like Google might need.
The machine learning market is shifting from an exploratory phase to the commercialization stage. Serving machine learning models to billions of users in production is what concerns Google. I would expect Microsoft (NASDAQ:MSFT) to follow suit with custom neural network processing hardware shortly. IBM (NYSE:IBM) is also very much involved in AI hardware through its TrueNorth project. The TPU is a closely guarded secret at Google (no access even for research collaborations right now, as far as I know), which is very atypical and illustrates the belief at Google that large scale machine learning will be its differentiator in the cloud.
Nvidia is not threatened by this move right now, but this could change in the medium term if Google, Microsoft and Amazon (NASDAQ:AMZN) determine they would rather design complete machine learning chipsets on their own. Acquiring Nvidia outright might actually be more beneficial in the AI race, but this seems equally unlikely given Google's acquisition strategy.
Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.
I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.