Artificial Intelligence ('AI') is a runaway success and we think it is going to be one of the biggest, if not the biggest driver of future economic growth. There are major AI breakthroughs on a fundamental level leading to a host of groundbreaking applications in autonomous driving, medical diagnostics, automatic translation, speech recognition and a host more.
See for instance the acceleration in speech recognition in the last year or so:
We're only at the beginning of these developments, which is going in several overlapping stages:
- Improvements in big data and neural networks.
- The use of big data in the cloud.
- The use of GPUs for machine learning in the cloud.
- The development of specialist AI chips.
We have described the development of specialist AI chips in an earlier article, where we already touched on the new phase emerging - the move of AI from the cloud to the device (usually the mobile phone).
This certainly isn't a universal movement but involves inference (the application of the algorithms to answer queries), rather than the more computing-heavy training (where the algorithms are improved through iteration rounds with the help of massive amounts of data).
Since GPUs weren't designed with AI in mind, so in principle, it isn't much of a stretch to assume that specialist AI chips will take performance higher, even if Nvidia is now designing new architectures like the Volta with AI in mind at least in part, from Medium:
Although Pascal has performed well in deep learning, Volta is far superior because it unifies CUDA Cores and Tensor Cores. Tensor Cores are a breakthrough technology designed to speed up AI workloads. The Volta Tensor Cores can generate 12 times more throughput than Pascal, allowing the Tesla V100 to deliver 120 teraflops (a measure of GPU power) of deep learning performance... The new Volta-powered DGX-1 leapfrogs its previous version with significant advances in TFLOPS (170 to 960), CUDA cores (28,672 to 40,960), Tensor Cores (0 to 5120), NVLink vs. PCIe speed-up (5X to 10X), and deep learning training speed (1X to 3X).
However, while the systems on a chip (SoC) that drive mobile devices contain a GPU processor, these are pretty tiny compared to their desktop and server equivalents. There is room here too for adding intelligence locally (or, as the jargon has it, 'on the edge').
Why would one want to put AI processing 'on the edge' (on the device rather than in the cloud)? There are a few reasons:
- Better privacy and security
- Lower latency
- Works without internet connection
The privacy issue was best explained by SA contributor Mark Hibben:
The motivation for this is customer privacy. Currently, AI assistants such as Siri, Cortana, Google Assistant, and Alexa are all hosted in the cloud and require Internet connections to access. The simple reason for this is that AI functionality requires a lot of processing horsepower that only datacenters could provide. But this constitutes a potential privacy issue for users, since cloud-hosted AIs are most effective when they are observing the actions of the user. That way they can learn the users' needs and be more "assistive". This means that virtually every user action, including voice and text messaging, could be subject to such observation. This has prompted Apple to look for ways to host some AI functionality on the mobile device, where it can be locked behind the protection of Apple's redoubtable Secure Enclave. The barrier to this is simply the magnitude of the processing task.
Lower latency and a possible lack of internet connection are crucial where there are life and death decisions that have to be taken instantly, for instance in autonomous driving.
Security of devices might benefit from AI-driven behavioural malware applications, which could run more efficient on specialist chips locally, rather than via the cloud.
Specialist AI chips might also provide an energy advantage, especially when some AI applications already use the local resources (CPU, GPU), and/or depend for data on the cloud (especially in scenarios where there is no Wi-Fi available). We understand that this is one motivation for Apple (NASDAQ:AAPL) to develop its own AI chips.
These low-end phones can be about 50 times slower than a good laptop-and a good laptop is already much slower than the data centers that typically run our image recognition systems. So how do we get visual translation on these phones, with no connection to the cloud, translating in real-time as the camera moves around? We needed to develop a very small neural net, and put severe limits on how much we tried to teach it-in essence, put an upper bound on the density of information it handles. The challenge here was in creating the most effective training data. Since we're generating our own training data, we put a lot of effort into including just the right data and nothing more.
One route is what Google is doing by optimizing these very small neural nets and feeding it with just the right amount of data. However, if more resources were available locally on the device, these constraints would loosen. Hence, the search for a mobile AI chip that is more efficient in handling these neural networks.
Dynamiq goes beyond offering just additional flexibility, and will also let chip makers optimize their silicon for tasks like machine learning. Companies will have the option of building AI accelerators directly into chips, helping systems manage data and memory more efficiently. These accelerators could mean that machine learning-powered software features (like Huawei's latest OS, which studies the apps users use most and allocates processing power accordingly) could be implemented more efficiently.
ARM is claiming that DynamiQ will deliver a 50 times increase in "AI-related performance" over the next three to five years. That remains to be seen, but it's noteworthy that they are designing chips with AI in mind.
The major user of ARM designs is Qualcomm and this company is also adding AI capabilities to its chips. It isn't adding hardware, but a machine learning platform called Zeroth, or the Snapdragon Neural Processing Engine.
It's a software development kit that will make it easier to develop deep learning programs directly on the mobile (and other devices run by Snapdragon processors). Here is the selling point ( The Verge):
This means that if companies want to build their own deep learning analytics, they won't have to rent servers to deliver their software to customers. And although running deep learning operations locally means limiting their complexity, the sort of programs you can run on your phone or any other portable device are still impressive. The real limitation will be Qualcomm's chips. The new SDK will only work with the latest Snapdragon 820 processors from the latter half of 2016, and the company isn't saying if it plans to expand its availability.
An exciting development in this field is Facebook's stepped up investment in Caffe2, the evolution of the open source Caffe framework. At this year's F8 conference, Facebook and Qualcomm Technologies announced a collaboration to support the optimization of Caffe2, Facebook's open source deep learning framework, and the Qualcomm Snapdragon neural processing engine (NPE) framework. The NPE is designed to do the heavy lifting needed to run neural networks efficiently on Snapdragon, leaving developers with more time and resources to focus on creating their innovative user experiences.
IBM is developing its own specialist AI chip called True North. It is a unique product that mirrors the design of neural networks. It will be like a 'brain on a phone' the size of the brain of a small rodent, packing 48 million electronic nerve cells, from Wired:
Each chip mimics about a million neurons, and these can communicate with each other via something similar to a synapse, the connections between neurons in the brain.
The chip won't be out for quite some time, but its main benefit is that it's exceptionally frugal, from Wired:
The upshot is a much simpler architecture that consumes less power. Though the chip contains 5.4 billion transistors, it draws about 70 milliwatts of power. A standard Intel computer processor, by comparison, includes 1.4 billion transistors and consumes about 35 to 140 watts. Even the ARM chips that drive smartphones consume several times more power than the TrueNorth.
For now, it will do the less computationally heavy stuff involved in inferencing, not the training part of machine learning (feeding algorithms massive amounts of data in order to improve them). From Wired:
But the promise is that IBM's chip can run these algorithms in smaller spaces with considerably less electrical power, letting us shoehorn more AI onto phones and other tiny devices, including hearing aids and, well, wristwatches.
Considering its energy needs, IBM's True North is perhaps the prime candidate to add local intelligence to devices, even tiny ones. This could ultimately revolutionize the internet of things (IoT), which itself is still in its infancy but based on simple processors and sensors.
Adding intelligence to IoT devices and interconnecting these opens up distributed computing on a staggering scale, but speculation about its possibilities is best left for another time.
Apple is also working on an AI chip for mobile devices, Apple's Neural Engine. There isn't much known in terms of detail; its use is to offload tasks from the CPU and GPU so saving battery and speed up stuff like face and speech recognition and mixed reality.
Then there is the startup called Groq, founded by some of the people who developed the Tensor at Google. Unfortunately, at this stage, there is very little known about the company, apart from the fact that they're developing a Tensor like AI chip. Here is Venture capitalist Chamath Palihapitiya (from CNBC):
There are no promotional materials or website. All that exists online are a couple SEC filings from October and December showing that the company raised $10.3 million, and an incorporation filing in the state of Delaware on Sept. 12. "We're really excited about Groq," Palihapitiya wrote in an e-mail. "It's too early to talk specifics, but we think what they're building could become a fundamental building block for the next generation of computing."
It's certainly a daring venture as the cost of erecting a new chip company from scratch can be exorbitant and the company faces well established competitors with Google, Apple and Nvidia (NASDAQ:NVDA).
What is also unknown is whether the chip is for datacenters or smaller devices providing local AI processing.
The current leader for datacenter "AI" chips (obviously, these are not specific AI chips but GPUs that are used to do most of the massive parallel computing of training neural networks to improve the accuracy of the algorithms.
But it is building its own solution for local AI computing in the form of the Xavier SoC, integrating CPU, CUDA GPU and deep learning accelerators and the GPU now contains the new Volta architecture. It is built for the forthcoming Drive PX3 (autonomous driving).
However, Nvidia's Xavier will feature its own form of TPU that it calls a Tensor Core, and this is built into the SOC.
The advantage for on-device computing in autonomous driving is clear - it reduces latency and the risk of loss of internet connection. Critical autonomous driving functions simply cannot rely on spotty internet connections or long latencies.
From what we understand, it's like a supercomputer in a box, but that's still too big (and too power hungry, sipping 20W) for smartphones. But needless to say, autonomous driving is a big emerging market in and by itself, and in time, this stuff tends to miniaturize, and that TPU itself will be a lot smaller and less energy hungry so it might very well be applicable in other environments.
Before we get too excited, there are serious limitations to putting too much AI computing on small devices like smartphones, here is Voicebot:
The third chip approach seems logical for on-device AI processing. However, few AI processes actually occur on-device today. Whether it is Amazon's Alexa or Apple's Siri, the language processing and understanding occurs in the cloud. It would be impressive if Apple could actually bring all of Siri's language understanding processing onto a mobile device, but that is unlikely in the near term. It's not just about analyzing the data, it's also about having access to information that helps you interpret and respond to requests. The cloud is well suited to these challenges.
Most AI requires massive amounts of computing power and massive amounts of data. While some of that can be shifted from the cloud to devices, especially where latency and secure coverage are essential (autonomous driving), there are still significant limitations for what can be done locally.
However, the development of specialist AI chips for local (rather than cloud) use is only starting today and a new and exciting market is opening up here, with big companies like Apple, Nvidia, STMicroelectronics (NYSE:STM), and IBM all at it. And the companies developing cloud AI chips, like Google and Groq might very well crack this market too, as Google's Tensor seems particularly efficient in terms of energy use.
Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.
I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.