Nvidia has been enjoying picking on the CPU and skating around the TPU, since Google is not selling the chip to customers. Graphcore's IPU is a better ML/DL chip, and its CEO is now publicly providing material revenue guidance based on visibility. All this news has been flying quietly under the Wall Street radar.

Google/Microsoft are now demonstrating that marketing hype in ML/DL is not going to cut it. Both companies have announced very notable enterprise customer wins for non-GPU ml/dl platforms.

Crypto/Gaming noise has resulted in material information being completely ignored. In light of the recent huge surge in hyperscale capex during H1 2018, Nvidia's datacenter business is starting to underperform.

Nvidia: A Datacenter AI Bear Thesis

There is a lot of excitement about investing for an AI-powered future, and as is often the case when investors get excited about a big theme, the IQ of what they are investing in tends to suffer. The aim of this report is essentially to improve investor IQ in a segment, machine intelligence hardware, that desperately needs it. As the public market IQ in ML/DL hardware is currently limited to an infatuation with Nvidia’s datacenter business, this report will focus on the extensive flaws in that narrative. By the time I’m done, you should understand why AI in the datacenter, the main reason often cited to own the stock for the long haul, is in fact the reason to be short the stock today. I am going to cover a lot of topics in the process, but I think the best way to kick off this report is with some brief summaries of what the hyperscale ‘Super 7’ have been up to with respect to non-Nvidia ML/DL hardware lately.

Alibaba (NYSE:BABA) - Intel FPGAs power Alibaba’s Cloud Acceleration-As-A-Service F1 Instance, and Xilinx has been selected to power the next-gen FPGA F2 instance. Also, a few months ago Alibaba announced their custom AI accelerator chip, the Ali NPU, which they claim will deliver a 40x cost-performance improvement over existing architectures for their AI related workloads.

Baidu (NASDAQ:BIDU) - Last summer Baidu introduced a Xilinx (NASDAQ:XLNX) powered cloud Acceleration-As-A-Service instance. Baidu also inked a deal to work with AMD (NASDAQ:AMD) on using their GPU’s for some training related workloads. And a few weeks ago, Baidu introduced their own custom AI accelerator called Kunlun. The announcement includes training chip “818-300” and inference chip “818-100”. Kunlun supposedly can be applied to cloud and edge scenarios, such as data centers, public clouds and autonomous vehicles.

Microsoft (NASDAQ:MSFT) - A couple of months ago Microsoft previewed Project Brainwave, their real-time FPGA based cloud AI as a service platform. They have been touting Jabil’s (NYSE:JBL) use of the service on manufacturing lines over other alternatives, notably GPU’s, because of cost-performance advantages. Microsoft has also recently been hiring custom chip designers for their Azure division.

Facebook (NASDAQ:FB) - Facebook is probably the most open hyperscaler when it comes to sharing what they are doing AI hardware wise. They tell you how and why they use CPU’s for inferencing, and precisely what applications are trained on GPU’s and CPU’s or both and the specific hardware utilized and why. They also tell you that custom accelerators are suited for certain applications, and that they are experimenting with new types of hardware. All this is disclosed in a recently published paper by them which I will take a closer look at later. And just a couple of months ago, Facebook’s Yann LeCun tweeted that they are hiring chip engineers to build custom AI hardware. Recently, they also hired Google’s head of chip development.

Tencent (OTCPK:TCEHY) (OTCPK:TCTZF) - Tencent developed China’s first “FPGA Cloud Computing” service for its cloud service Cloud Virtual Machine. Compared to a CPU-based cloud server, the FPGA integrated CVM provides better computing power to support HPC applications and deep learning development. They have also aggressively invested in AI chip startups.

Google (NASDAQ:GOOG)- Unlike the rest of the bunch, Google’s AI chip initiatives have grabbed decent press and there appears to be some understanding of what their TPU project actually entails. They are now on their third generation of their TPU, of which one pod now delivers 100 petaflops of deep learning processing power. They also recently introduced an edge TPU that will be made available to developers. Google clearly has very aggressive ambitions here and has simultaneously influenced the pace of development at their cloud competitors.

Amazon (NASDAQ:AMZN) - Amazon is known for custom hardware development across AWS, but so far has not made any major announcements. Based on their Blink and Annapurna Labs acquisitions, Amazon has nearly 500 employees with deep chip expertise. The current consensus view is that in-house teams are focused on designing edge AI chips for IOT devices like the Echo. At the datacenter accelerator level, Amazon so far has not revealed an in-house custom project along the lines of their peers.

So, the question you need to ask yourself is why are all these resource rich companies developing custom solutions in this space vs. simply being content buying whatever Nvidia has to offer?

This slide from Nvidia’s Blog may help answer this question.

This is Nvidia touting how much faster AlexNet can be trained today with their cutting-edge hardware vs. what it was trained on in 2012. For those not aware, AlexNet is the breakthrough convolutional neural network that won the ImageNet challenge and ignited the AI ML/DL field.

What's interesting here is that in 2012 those two-year old Nvidia GPU's cost roughly $700 to buy. The new DGX2 costs $400k. So, on these numbers alone, that is a 500x speedup for 571x the cost. Not exactly 'delivering exponential performance improvements'. This is more like performance degradation.

But to really grasp how comfy Nvidia is with their ability to market their narrative, they stick this on their blog and their CEO touts these numbers every time he speaks. This is essentially the bear thesis against them being advertised by them.

Now for someone like me or any other investor trying to understand why Google and nearly every other hyperscaler has built or is exploring building their own custom ml/dl chips you simply need to look at this slide, and you have your answer.

You can buy two used GTX 580's today for about $100. So, you are talking a 500x speedup for 4000x the cost. The more closely you examine these benchmark ‘performance’ claims, the more ridiculous they start to look.
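The two price comparisons above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (a 500x speedup, a $400k DGX2, $700 for the original pair of GPUs in 2012, and about $100 for a used pair today). The helper name is made up for illustration, and it assumes the quoted prices cover the whole system being compared.

```python
# Rough price-performance check using only the figures quoted in the text.
# speedup_per_cost is an illustrative helper, not anything Nvidia publishes.

def speedup_per_cost(speedup, new_cost, old_cost):
    """Return (cost multiple, speedup delivered per unit of cost multiple)."""
    cost_multiple = new_cost / old_cost
    return cost_multiple, speedup / cost_multiple

SPEEDUP = 500              # claimed AlexNet training speedup
DGX2_PRICE = 400_000       # USD, as quoted
GTX580_PAIR_2012 = 700     # USD, assumed to cover both cards
GTX580_PAIR_USED = 100     # USD, used pair today

for label, old_cost in (("2012 price", GTX580_PAIR_2012),
                        ("used price", GTX580_PAIR_USED)):
    multiple, per_cost = speedup_per_cost(SPEEDUP, DGX2_PRICE, old_cost)
    print(f"{label}: {multiple:.0f}x the cost, "
          f"{per_cost:.3f}x speedup per unit of cost")
```

A ratio below 1.0 means speed has grown more slowly than price. On these numbers it comes out at about 0.875 against the 2012 price and 0.125 against the used price.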

And where are a lot of these more appropriately called throughput gains coming from?

Well, at launch the V100 Resnet-50 benchmark numbers were this….

A year later they have bumped up to this….

How do you go from 574 images/sec to 1075 images/sec in less than a year on the same chip?

Simple, you optimize the libraries for these well-known fairly basic CNN models, and you get them to run faster. Google has seen similar impressive Resnet-50 gains on TPU2 with images/second going from 1950 in February to a disclosed 3250 at Cloud Next this past month. So, if we really wanted accurate benchmarks, we should take older architecture GPU’s and apply all current libraries to truly measure processor gains. Intel has complained about the same issues with Nvidia comparing performance on CPU’s that were not using Intel optimized MKL libraries.
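To put the software effect in proportion, here is a minimal sketch that computes the throughput gain on unchanged silicon from the four images/sec figures quoted above. The function name is illustrative, and nothing here separates library gains from other tuning such as batch size or precision.

```python
# Throughput gains on the same chip, using the images/sec figures quoted above.

def throughput_gain(before, after):
    """Ratio of later throughput to earlier throughput on unchanged hardware."""
    return after / before

v100_gain = throughput_gain(574, 1075)    # V100, launch vs. about a year later
tpu2_gain = throughput_gain(1950, 3250)   # TPU2, February vs. Cloud Next

print(f"V100 Resnet-50 gain: {v100_gain:.2f}x")
print(f"TPU2 Resnet-50 gain: {tpu2_gain:.2f}x")
```

Both chips gained well over 1.5x without any change in hardware, which is why a benchmark comparing old silicon on old libraries against new silicon on new libraries overstates the processor gain.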

What happens when you optimize?

Well, papers like this are published which tout CPU performance that allowed researchers at UC Berkeley, UC Davis, and U-Texas to train Alexnet in 11 minutes. This is of course followed with economic arguments highlighting how existing infrastructure can be leveraged to save money for machine intelligence workloads.

“Facebook (Goyal et al. 2017) finishes the 90-epoch ImageNet training with ResNet-50 in one hour on 32 CPUs and 256 NVIDIA P100 GPUs (32 DGX-1 stations). Consider the price of a single DGX-1 station is 129,000 USD, the cost of the whole system is about 32×129,000 = 4.1 million USD. After scaling the batch size to 32K, we are able to use cheaper computer chips. We use 512 KNL chips and the batch size per KNL is 64. We also finish the 90-epoch training in one hour. However, our system cost is much less. The version of our KNL chip is Intel Xeon Phi Processor 7250, which costs 2,436 USD. The cost of our system is only about 2,436 × 512 = 1.2 million USD. This is the lowest budget for ImageNet training in one hour with ResNet-50.”
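The cost comparison in that excerpt is easy to reproduce. This sketch simply multiplies out the unit prices and unit counts given in the quote. It ignores networking, power, and host servers, which the quote does not price, so treat it as a check on the quoted arithmetic rather than a full TCO model.

```python
# Reproduce the system-cost arithmetic from the quoted paper excerpt.

DGX1_PRICE_USD = 129_000      # per DGX-1 station, as quoted
KNL_7250_PRICE_USD = 2_436    # per Intel Xeon Phi 7250 chip, as quoted

def system_cost(unit_price, units):
    """Hardware cost of a system built from identical units."""
    return unit_price * units

gpu_system = system_cost(DGX1_PRICE_USD, 32)        # 32 DGX-1 stations
knl_system = system_cost(KNL_7250_PRICE_USD, 512)   # 512 KNL chips
cost_ratio = gpu_system / knl_system

print(f"GPU system: ${gpu_system:,}")
print(f"KNL system: ${knl_system:,}")
print(f"GPU system costs {cost_ratio:.2f}x the KNL system for the same one-hour run")
```

On the quoted prices the GPU system costs roughly 3.3 times as much for the same one-hour training time.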

Published academic research like this doesn’t fit smoothly into Nvidia’s “the more you buy, the more you save” GPU marketing company slogan.

To be perfectly clear, I am not defending Intel by highlighting these numbers, but rather simply pointing out there is a lot more going on here than most investors realize.

Take Facebook for example. Earlier this year, they published a research paper detailing their AI hardware and specific workloads running on said hardware.

Here is what ML/DL training is like at Facebook:

Now the inferencing part:

And here is an excerpt from that paper discussing inferencing at Facebook:

From a compute standpoint, the vast majority of online inference runs on the abundant 1xCPU (single-socket) or 2xCPU (dual-socket) production machines. Since 1xCPU machines are significantly more power and cost-efficient for Facebook, there is an emphasis toward migrating models from 2xCPU servers to 1xCPU servers whenever possible. With the rise of high-performance mobile hardware, it is even possible to run some models directly on the user’s mobile device to improve latency and reduce communication cost. However, some compute and memory intensive services still require 2xCPU servers for the best performance.

This paper is essentially Facebook showing you there are economic reasons behind all hardware decisions when it comes to AI. All training is not created equal. When you have extensive CPU capacity that can be leveraged in downtime you can use it for certain ml training workloads.

Which brings us back to this myth of the GPU being the future holy grail of AI computing. If software and optimization are one big piece of the pie that hyperscale AI giants can exploit by virtue of their near monopoly on human ml/dl resources, then naturally they are going to take a closer look at what they can do hardware wise.

When you look at what Google has done with the TPU, it says a lot about the compute problem in deep learning. The GTX 580 GPU’s used to train Alexnet had about 200GB/sec of memory bandwidth. The V100 has about 900GB/sec of memory bandwidth. Because of the massively parallel matrix multiplication required and the size of the models and number of coefficients needed, external memories are required with high bandwidth accesses. This means everyone is bottlenecked by the same memory bandwidth technology problem. The TPU simply recognizes this issue and embraces a more economically efficient workaround to their problem. Half the V100 die space is reserved for FP-64 and FP-32 math which is of no use for low precision deep learning workloads.
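For a sense of scale on the memory point, the sketch below computes the per-chip bandwidth gain from the two figures quoted above. Note that this is a per-chip ratio, while the 500x AlexNet claim compares whole systems, so the two numbers are not directly comparable. The variable names are illustrative.

```python
# Per-chip memory bandwidth growth, using the approximate figures quoted above.

GTX580_BANDWIDTH_GBPS = 200   # approx. GB/sec, as quoted
V100_BANDWIDTH_GBPS = 900     # approx. GB/sec, as quoted

bandwidth_gain = V100_BANDWIDTH_GBPS / GTX580_BANDWIDTH_GBPS
print(f"Per-chip memory bandwidth gain: {bandwidth_gain:.1f}x")
```

That is a 4.5x gain per chip over the period, which is the constraint every accelerator designer using the same external memory technology has to live with.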

Why pay for graphics and super-computing when all you want is tensor cores and low precision math with maximum memory bandwidth?

It’s a great economic model for Nvidia to sell a one size fits all $10k chip to everyone, but as we have seen that is not going to be the future. These custom accelerators everyone is building in the hyperscale domain are about differentiation and economically driven total cost of ownership decisions. They don’t want to pay a margin for what they don’t need and can replicate more efficiently with deep knowledge of their own production neural nets and access to the same memory technology. Now the Chinese have figured this out, and naturally this appeals to their way of doing things. It’s the ASIC approach in cryptomining amplified by the fact that everyone looks at Nvidia’s profits here and sees easy dollar signs from essentially designing a stripped-down and reverse engineered custom accelerator card to do low precision linear algebra. Just last week Tencent led a $50 million round into AI chip maker Suiyuan Technologies. This company was founded in March and is just the latest in a long list of Chinese AI chip startups that capital is being rained on to address a problem that is not capital intensive.

And here is a great quote from the PR:

"With China’s industrial upgrade, our country should be able to own independent and controllable key technologies. AI chip is China’s opportunity to catch up in the semiconductor sector," said Leiwen Yao, director at Tencent Investment. "We hope to leverage our industrial resources and work with China’s top teams to change the current situation of foreign AI chip makers’ monopolization."

You don’t need to be a genius investor to conclude that the economic outcome for the incumbent leader in the space won’t be pretty if a whole host of capital has been deployed to build custom ASIC’s. Basically, as far as the GPU goes in ML/DL, the commodity ASIC approach to the problem is a certainty. Will this just be a low-cost alternative for many large scale hyperscale production neural network workloads, while those working on new architectural solutions capture the higher-end? Probably, but that’s bad enough when you consider what that does to all the current assumptions around profitability that analysts are currently projecting. And when you consider the fact that everyone making datacenter AI chips is essentially chasing the same narrow cloud provider customer base, well these TAM numbers become utterly worthless. This is the foundation upon which semiconductor bear theses are built.

The Nvidia AI Datacenter Bear Thesis

Contrary to what some might think, I am not some Nvidia-has-gone-up-way-too-much-and-is-a-good-short bear. Back when sell-side analysts had mid-teen targets on the stock, I was out making an Nvidia bull case based on AI/Autonomous potential. At the time, the AI flavor of the market was Mobileye, which was being dubbed as the next WinTel by the likes of Ron Baron on CNBC. The narrative was so pervasive in sell-side land that Mobileye, which was trading already at double Nvidia’s enterprise value, had targets that were even double that. My surface analysis at the time told me Nvidia’s gaming/graphics business alone was probably worth double Mobileye, and that the nascent but suddenly gaining traction Datacenter GPGPU business as well as automotive future could be worth multiples of the market darling. I spent several months researching both companies and developed great industry contacts as well as broad knowledge on AI in the process. So, to put it bluntly, I get the bull narrative because I laid it out in early 2015.

GPGPU adoption leaves Nvidia well positioned in the Datacenter and Autonomous car future as an arms supplier, and because of this you essentially are buying into what will displace and then morph into Intel’s DCG business.

It’s a great thesis, and when anyone buys Nvidia shares today that’s precisely what they are paying top dollar for. And Nvidia management gets this as well. That’s why every investor presentation or Jensen talk highlights datacenter TAM numbers and the potential savings from GPGPU displacement of Intel-powered servers (often somewhat misleading btw but then that’s tech marketing). The narrative is pretty simple. Cloud hyperscale is a $20bl TAM, Enterprise is another $20bl, and HPC rounds things out with $10bl. If all things go according to plan, Nvidia DCG is the new Intel DCG. I personally used to love this thesis. Now I believe it’s a certainty that it will never happen.

Why did I change my mind?

Most people may think my focus here is exclusively on silicon competition. And the Intel data points I cited earlier, my regular comments on certain AI hardware startup threats, and of course my Google TPU table pounding as far as big picture implications all support such a thesis. Commoditization and margin contraction from competition is not a hard argument to make in semiconductor land, and I have no doubt it will be a notable driver going forward. But it’s not my sole focus here. My bearish view is far more big-picture oriented. I think the best analogy for the current Nvidia bull thesis in AI is the 1999 “Four Horsemen of the Internet”. Sun Microsystems, EMC, Cisco, and Oracle were viewed as the no brainer infrastructure plays on the Web. This time around you can argue that the Nvidia AI datacenter/workstation story sounds a lot like the old Sun Microsystems.

Nvidia's new company tag line is "The AI Computing Company"

Remember Sun Microsystems’ famous slogan?

Sun killed it as enterprises and dot coms gobbled up their UltraSPARC servers to get online. Sun’s model of differentiated silicon and a closed software stack producing very powerful and reliable hardware at super high margins for a massive TAM such as ‘THE Web’ was celebrated by all of Wall Street. Then Linux and x86 came along and commoditized their business nearly overnight. I see many similarities between Nvidia’s DGX stations and Sun’s SPARC servers. The marketing approach is similar, and the future commoditization concerns are also kind of a no brainer comparison. But what I don’t see with Nvidia is anything close to the sweet spot TAM Sun had. Researchers and startups buying workstations for deep learning training is a far smaller market than every dotcom and enterprise on earth seeking to get on The Web. And even if you wanted to debate this point, it’s not the big problem.

These guys are…

Cloud Hyperscale Compute Infrastructure Centralization has completely changed the IT spending landscape.

Hyperscale companies spent $75 billion on capital expenditures in 2017.

The Big Five accounted for 70% of all Hyperscale Capex, and as a group the Hyperscale giants are now approaching 40% of quarterly server purchases. Amazon alone supplied itself with enough servers this past quarter to account for 10% of all global server sales in the quarter. Naturally this type of scale is reshaping the broader IT spending landscape.

Just look at recent server vendor market share data…

ODM Direct surpassed 25% of all global server revenue in this past quarter, and this trend shows no signs of slowing down. The balance of power implications here are staggering.

Just consider these numbers from a Cisco 2016 White Paper on the scale of hyperscale data centers.

Global IT infrastructure is now concentrated in the hands of very few and barring regulatory intervention things are only going to get worse.

The market has recognized this change, and global market capitalizations now reflect this huge paradigm shift.

World’s most valuable companies end of Q2 2010:

Now, end of Q2 2018:

The top seven global tech giants now account for 25% of the $20 trillion market value of the world’s largest 100 companies by market value.

So, what does this all mean???

Well, this past quarter, Amazon/Google/Microsoft bought more servers than Dell/HP combined sold worldwide. Think about that for a second. In 1999 global server sales were $57.5 billion for 3.92 million machines. Last year total server sales were $65 billion for 10.2 million units. That’s what 20 years in distributed computing evolution looks like. This profound shift is obviously visible in the market values of leading tech companies. It’s how we have gone from the ‘Four Horsemen of the Internet’ to ‘Hyperscale Super 7’. This is why Intel makes their new Skylake chips available to hyperscalers several months before anyone else, and precisely why they work closely with the likes of Facebook on hardware like their Lake Crest/Spring Crest accelerators. The fact that Lake Crest has been made available to hyperscale customers for software development purposes ahead of Spring Crest tells you how much the silicon model has changed. Intel essentially advertised this fact at their most recent analyst day.
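The 1999 and last-year server figures in this paragraph imply a sharp drop in the average price per machine. Here is a quick sketch of that arithmetic, using only the four numbers quoted. The figures are nominal dollars with no inflation adjustment, so the real decline would be larger.

```python
# Implied average selling price per server from the figures quoted above.

def average_selling_price(revenue_usd, units):
    """Revenue divided by units shipped."""
    return revenue_usd / units

asp_1999 = average_selling_price(57.5e9, 3.92e6)
asp_last_year = average_selling_price(65e9, 10.2e6)

unit_growth = 10.2e6 / 3.92e6
revenue_growth = 65e9 / 57.5e9

print(f"1999 ASP:      ${asp_1999:,.0f}")
print(f"Last year ASP: ${asp_last_year:,.0f}")
print(f"Units grew {unit_growth:.2f}x while revenue grew {revenue_growth:.2f}x")
```

Units shipped grew about 2.6x while revenue grew only about 1.13x, so the average price per server fell from roughly $14,700 to roughly $6,400.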

The below slide shows that 50% of their datacenter cloud service provider CPU volumes are customized chips, and Intel management went further to state that they have 30 CPU SKU’s across the top ten hyperscalers.

The rise of the hyperscalers is also why Cisco (NASDAQ:CSCO) is designing custom Nexus switches for Azure, and why AMD’s 7nm GPU roadmap is what it is. Being an IT component supplier is a tough business these days, and it’s going to get a lot tougher.

Why?

Simple, a shrinking profit pie. Hyperscalers buy in huge volumes, and as a result they demand significant price concessions. Consequently, component players have become ever more reliant on extracting profits out of non-hyperscale enterprise customers. You don’t hear much about the non-hyperscale enterprise datacenter piece of the pie, but this ever-shrinking group is still roughly 50% of server spend. And unfortunately for them, they are the ones bearing the brunt of the rising component ASP’s. This is of course not what you want to see as a long-term investor in the likes of an Intel or Cisco.

Here is why…

1) Hyperscale purchasing power and in-house resources provide leverage over traditional IT component suppliers to demand price concessions.

2) Non-hyperscale enterprise data center customers whose spend is shrinking because they are moving significant workloads to hyperscale public clouds end up being squeezed on ASP’s by component suppliers for remaining captive spend.

3) The fact that they are paying more than the hyperscale giants for hardware simply increases the value proposition of moving workloads to public clouds.

The cycle of 1-3 leads to the accelerated shrinking of the non-hyperscale more profitable pie, and an ever-increasing reliance on the hyperscale giants.

It’s the Apple supplier effect on a grand scale, and we are now passing the 50% tipping point. It’s critical to really grasp where we are at today, because things are going to change awfully fast. Right now component suppliers are getting the best of both worlds: a big surge in hyperscale capex coinciding with a much more profitable mix on a shrinking enterprise pie. This is not going to be the story going forward. Just like Apple with iPhones, the hyperscale giants are extremely efficient when it comes to taking out infrastructure costs wherever they can. This is why they are designing custom silicon and you see all of them offering FPGA’s, CPU’s, GPU’s in their clouds. Simply put, their core business model going forward will never allow another Intel DCG to exist. They want diversified suppliers, and differentiation from each other with respect to their cloud infrastructure services. This changes the entire IT investing landscape, and the one area where this impact will be the most profound is AI hardware.

I started this report with brief summaries of what many of the hyperscale giants were up to hardware wise because they presently dominate this market. Machine intelligence wouldn’t be where it is at today if it wasn’t for the hyperscale giants. They possess the compute scale, the multi-billion user data generating applications, unlimited financial resources, and machine intelligence human capital. If the hyperscalers had to invent a dream workload for the public cloud model, I assure you they couldn’t dream up one better than ML/DL.

Why?

Well, just look at Facebook/Google/Amazon, no companies on earth provide more consumer enabled technology services that can benefit from becoming more intelligent by leveraging the data generated by these services and ml/dl technology. Consequently, these giants have invested massively in this space to improve their own products and compete with each other. And I am not just talking about offerings like Alexa and Google Assistant. We are talking about all the services these companies provide. Google search, Google translate, Gmail, Amazon product search, tagging photos on Facebook/Instagram, chatbots in Messenger, Amazon’s warehouse robots, Amazon’s Go store technology, AlphaGo, Amazon Drones, Microsoft Brainwave, Bing, and countless other services/technology are being enhanced by AI.

In the process, they have written the major ml/dl software frameworks, built extensive ml/dl tools, run the largest production neural networks, and employ the leading researchers/scientists in the field.

Here is a slide showing how Google TensorFlow is faring in the ML Framework war:

And here is a slide showing the different tools being offered by Amazon/Google/Microsoft:

And now some more good slides from The State of AI 2018 presentation on talent:

And salaries for this talent…

Suffice to say the hyperscalers have fast become AI giants, and it’s no surprise that enterprises are now turning to them to leverage their tools.

Here is a slide of AWS customers leveraging ML/DL tools and services:

This is BIG BUSINESS, and naturally the public cloud giants are pouncing on it. Capital One (NYSE:COF) is integrating with Alexa to provide services for their customers. Expedia (NASDAQ:EXPE) used AWS ML tools to figure out what hotel photos to display to customers. Urban Outfitters (NASDAQ:URBN) is leveraging Cloud AutoML vision tools to identify items of clothing in their catalogue, so customers can better refine their searches. Hearst Communications is using Cloud AutoML natural Language to help tag and organize content across its vast portfolio of publications. Jabil is using Microsoft’s FPGA based Brainwave and Azure’s Machine Learning Workbench to build and deploy a neural net to spot defective components. The list goes on and on.....

As investors we need to look at what is transpiring and ask the right questions. If the transition of traditional compute workloads to the cloud is having such a profound impact on the IT investing landscape, how will AI be different? Well, as a starting point, AI and ML/DL is not a transition. If traditional IT workloads can trace their roots to the enterprise, ML/DL was incubated inside the cloud hyperscale giants. They literally gave birth to the model. The DNA is a mix of their big data, abundant human capital field expertise, unmatchable compute scale, ML frameworks, and ML/DL software/services. Thus, it’s no surprise that these companies also want to exert control over the hardware they are building these businesses around. The economic argument is obvious. At scale, a slight advantage in AI hardware TCO versus a public cloud competitor could literally mean tens of billions of dollars in profits over the next decade. The technological infrastructure differentiation argument is also pretty straightforward. Enterprises already realize that trying to replicate the DNA advantage hyperscalers possess in ml/dl is prohibitively expensive. The rent vs own analysis in ml/dl is currently heavily in favor of the renter. This means that the hyperscale public cloud landlord sales pitch to a customer simply boils down to showing that they can train their models faster and cheaper using their tools and infrastructure, and that this edge is how they will beat their competitors to market with better services. This is exactly why Google is not selling TPU hardware, and why the cloud TPU is built around Tensorflow. 
Google sales engineers are literally spending their time demonstrating to customers how they can avoid wasteful capex (why spend millions building a training farm that will be obsolete in six months…let us worry about that while you focus on your customers), and more importantly how those cloud TPU’s are faster and cheaper than anything their current public cloud competitors have to offer. Once you’ve made that sale, the gravy train is all the add-on services/tools you can cross-sell this customer.

So, the appropriate question isn’t why would the most resource rich companies in history design their own custom accelerators for ml/dl, but rather why would they not?

The only answer to that question is because they can’t, and that’s clearly not the case.

The Next Platform recently published an excellent article titled “DESIGNING CUSTOM CHIPS IN-HOUSE IS THE NEW NORMAL.” Here is an excerpt from it:

Chip design and manufacturing is being disrupted by a new set of technical and economic enablers. Cloud giants designing AI chips is just the tip of a mass-customization asteroid impacting the computer chip manufacturing supply chain. There isn’t a single cause for this impact, there are many factors colliding at the same time:

-Death of Moore’s Law leaves us with fast, large transistor count chips, even in mature previous-generation processes

-New architecture directions based on multi-chip modules (MCM) and system-in-package (SIP)

-Chip design tools maturing into complete development tool chains

-Licensable intellectual property (IP) blocks make it easy to assemble chips

-Multi-Project Wafers (MPW) democratize fab capacity for prototyping and limited production

-Customers writing in-house software frameworks

-Web giants create scale; emerging IoT giants aggregate into scale

Clearly the chip design and manufacturing industry has evolved in such a manner to allow the hyperscale giants to customize as they see fit. Now accepting this reality and everything else I just pointed out regarding hyperscale and ml/dl interplay, semiconductor investors should consider the future implications.

Here is how Nvidia sizes this market:

Now Intel:

And now AMD:

TAM’s are tricky. When they start to exponentially grow, it’s usually a good time to get skeptical. At the peak of the internet bubble, IDC was projecting that global server sales would hit just under $100 billion by 2005. Reality? It took almost 20 years to break the $60 billion peak we hit in 2000. Nvidia was calling their entire datacenter TAM $5bl in 2015, and now it’s $50bl. The Next Platform had a little fun with this in a recent article by comparing the growth in Nvidia’s TAM to the amount of time their CEO’s keynote address at GTC takes.
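To see what that TAM revision implies, here is a minimal sketch that converts the jump from $5bl to $50bl into a compound annual growth rate. The three-year span (2015 to "now", taken as 2018) is an assumption based on the dates mentioned in this report, and the function name is illustrative.

```python
# Implied growth rate of Nvidia's stated datacenter TAM.
# ASSUMPTION: "now" is 2018, giving a three-year span from the 2015 figure.

def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1

TAM_2015_USD = 5e9
TAM_NOW_USD = 50e9
YEARS = 3  # assumed span

tam_multiple = TAM_NOW_USD / TAM_2015_USD
tam_cagr = implied_cagr(TAM_2015_USD, TAM_NOW_USD, YEARS)

print(f"TAM multiple: {tam_multiple:.0f}x")
print(f"Implied annual growth in the stated TAM: {tam_cagr:.0%}")
```

Under that assumption the stated TAM has been growing by roughly 115% a year, which is the kind of number that should prompt the skepticism described above.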

What exactly is a $50bl market for GPU accelerators in the datacenter?

The short answer is 'a heck of a lot'. Intel, with essentially 100% datacenter server share, sold between $14 billion and $15 billion worth of processors and chipsets last year. $50 billion, within the context of the following market characteristics, is, to put it mildly, highly unlikely.

-Highly concentrated target customer base dominated by hyperscale giants. This is not what you want if you’re a tech sales rep used to making good margin off a large enterprise market filled with customers with varying degrees of expertise and bargaining power.

-The same concentrated customer base has the resources and technological incentives to design their own specialty chips or if need be purchase such technology. Again, not good news for long-term market dynamics.

-A shifting semi-conductor ecosystem that is conducive to custom specialty chip solutions. If IP and platform design tools needed to build custom ASIC’s are available to any company with a small talented pool of engineers, barriers to entry for competition fall. The Intel tick-tock days are clearly over, and that means a different type of market is going to take shape when it comes to chip wars.

-The excitement around a new technology has led to a giant influx of capital into silicon startups targeting the space, with the vast majority focused on the specialty training piece of the datacenter pie. We haven’t seen anything like this in 20 years. There are now at least 45 AI chip startups, with over $1.5 billion in capital raised during 2017, focused on building ml/dl accelerators. This is not a capital-intensive business. What happens to an industry when a lot of capital flows into startups that are essentially teams of design engineers all chasing a business model for which survival is essentially predicated on a group of less than a dozen customers adopting your solution? I don’t think I need to answer that question, but you can be sure business model economic assumptions are going to be awfully fluid. One thing is for sure, the end customers are going to be happy.

-The Chinese have made ML/DL specialty semiconductor independence a national security priority. And as this problem is being addressed in real time, versus technologies in which their Western peers had a decades-long head start, you can be sure they will be far more competitive. Cambricon, Bitmain, Baidu’s Kunlun, Horizon Robotics, and Alibaba’s NPU project are just some early evidence of the pace at which they are moving. Lenovo (OTCPK:LNVGY) (OTCPK:LNVGF) selling servers with Cambricon accelerators and Huawei’s Project Davinci are other examples of this trend. A huge influx of capital into a non-capital-intensive sector is bad enough; the Chinese government subsidizing technological development in the same sector for national security purposes is even worse.

At this point you are probably thinking this guy hates everything. How can someone take something as exciting as ML/DL and AI and come out with such a bearish view? This is not true. I actually like a few of the ml/dl startups and would recommend any investor with an opportunity to invest in these companies to do so. This is because I believe that at current valuations there is money to be made off the excitement as these names grab the spotlight and go from no revenue to a few hundred million dollars in a very short time. I also think with flawless execution a few of these startups have the unique opportunity to be a lot more. And while it’s not necessarily the focus of this report, there are some big picture ways for those who agree with my narrative to invest in semiconductor companies today over the long haul.

So, where would I look to invest?

In a chip world where another Intel or Qualcomm (NASDAQ:QCOM) doesn’t seem as likely, one theme I like is commodity chip companies. Micron (NASDAQ:MU) is a classic example of a stock that makes more sense today, especially when you consider how important memory bandwidth is to ml/dl workloads. When you factor in the highly consolidated nature of the industry and the fact that cloud providers/IOT device makers/automobile manufacturers are going to be buying lots of memory, you can make a good case for multiple expansion. Yes, they make a commodity product, but the upfront capex investment protects them from hyperscale competition. The Chinese will eventually have something to say about that, but that’s still a bit of a paradigm shift for now.

The other theme to look at is the flip side of everything I have recently discussed. If you thought there were going to be 50+ well-funded ml/dl specialty chip startups by the end of this year, how would you play that? The best answer to that question is investing in a semiconductor engineer fund, because those guys are the only real guaranteed winners in this environment. If I was running a talent agency, I’d be looking into representing the top talent here. Guys like Jim Keller should be getting LeBron James-esque engineer opt-out clauses in their employment contracts. But as that is not a practical option, investors should focus on what all these startups need to get going. Any well-funded AI chip startup will likely end up at least taping out their chip design. So, chip design tool providers as well as companies like eSilicon make a lot of sense. In fact, if eSilicon files to go public in the near future, that would be a very strong sign that my thesis is spot on. I also like power architecture companies targeting the ml/dl ASIC space, like Vicor (NASDAQ:VICR), which I wrote up a few months ago. Chip companies with solid semi-custom businesses are also appealing. Broadcom (NASDAQ:AVGO) falls into this category, and so does AMD. Taiwan Semi (NYSE:TSM) also makes some sense as the leading global foundry, though moving the needle there will be a lot tougher.

And to be clear, I wouldn’t sleep on Intel. They are hated right now because of process problems, the lack of a CEO, and AMD’s share gains. But the portfolio strategy they have pursued over the past several years now makes a lot more sense. Non-volatile memory, Networking, Storage, IOT, FPGAs, Mobileye, and the recent eASIC acquisition give them an unmatchable portfolio to play custom silicon solutions provider to the cloud giants. You could even argue that the PC CPU commodity business as well as the potential commoditization of datacenter CPUs is not that bad over the long haul. Intel is no longer just a CPU company. They can customize chips for hyperscale customers leveraging a very broad product portfolio. This is a pretty remarkable transformation, and one in which the market is currently uninterested. As the stock continues to get cheaper, I would definitely take a closer look.

Valuing Nvidia’s AI Business

I think if an investor wants to understand what they are paying for Nvidia’s AI business, they need to take a sum of the parts approach to the pieces of the business that are more easily valued.

Here is how I do it based on my extensive research:

| Segment | 2017 revenue ($ml) | Multiple to revenue | Market value ($ml) |
|---|---|---|---|
| Gaming GPU | 4560 | 8x | 36480 |
| Prof GPU | 934 | 6x | 5604 |
| Crypt OEM GPU | 300 | 1x | 300 |
| Other OEM | 477 | 1.5x | 715 |
| Nintendo Switch Tegra/non-gpu gaming | 953 | 2.5x | 2383 |
| Tegra Auto | 558 | 4x | 2232 |
| Nvidia GRID | 180 | 10x | 1800 |
| HPC GPU | 700 | 10x | 7000 |
| Machine Intelligence GPU | 1052 | ? | ? |

So, Nvidia x-datacenter is worth $47.6bl, and once you add in x-MI datacenter you get to $56bl. Note I left MI Datacenter GPU blank because that’s the hard to quantify piece of the pie. On an aggregate basis, Intel and AMD are trading at 3-4x sales. Nvidia is at 13x-14x.

Based on this analysis that leaves $100bl in market value being ascribed to MI datacenter, or roughly 100x fiscal 2018 revenue for the slice. Now you can look at this as pure MI GPGPU residual value or a mix of, let’s say, $50bl MI GPGPU and a $50bl option on future AI related GPU revenue in auto/robotics etc. Though generally speaking, at this juncture I just prefer to look at it as a broad machine intelligence multiple, which at this point is getting 100x sales.
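For readers who want to check the arithmetic, here is a minimal Python sketch of the sum-of-the-parts calculation. The segment figures are copied from the table above. The total market value of roughly $156bl is an assumption, backed out by adding the roughly $56bl valued above to the roughly $100bl residual, and the variable names are purely illustrative. The totals it produces land within rounding of the figures quoted above.

```python
# Segment figures in $ millions: (2017 revenue, multiple to revenue),
# copied from the table in this report.
SEGMENTS = {
    "Gaming GPU": (4560, 8.0),
    "Prof GPU": (934, 6.0),
    "Crypt OEM GPU": (300, 1.0),
    "Other OEM": (477, 1.5),
    "Nintendo Switch Tegra/non-gpu gaming": (953, 2.5),
    "Tegra Auto": (558, 4.0),
    "Nvidia GRID": (180, 10.0),
    "HPC GPU": (700, 10.0),
}
DATACENTER_SEGMENTS = {"Nvidia GRID", "HPC GPU"}
MI_REVENUE = 1052  # Machine Intelligence GPU revenue, $ millions

# ASSUMPTION: total market value of ~$156bl, i.e. the ~$56bl valued here
# plus the ~$100bl residual the report attributes to MI datacenter.
ASSUMED_MARKET_VALUE = 156_000


def segment_value(name):
    """Market value of one segment: revenue times its revenue multiple."""
    revenue, multiple = SEGMENTS[name]
    return revenue * multiple


ex_datacenter = sum(
    segment_value(name) for name in SEGMENTS if name not in DATACENTER_SEGMENTS
)
ex_mi = sum(segment_value(name) for name in SEGMENTS)
mi_residual = ASSUMED_MARKET_VALUE - ex_mi
mi_multiple = mi_residual / MI_REVENUE

print(f"Value ex-datacenter:   ${ex_datacenter / 1000:.1f}bl")
print(f"Value ex-MI:           ${ex_mi / 1000:.1f}bl")
print(f"Residual ascribed MI:  ${mi_residual / 1000:.1f}bl")
print(f"Implied MI multiple:   {mi_multiple:.0f}x sales")
```

Changing any multiple in `SEGMENTS` shows how sensitive the residual is to the assumptions made for the easier-to-value pieces.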

I’ve also broken down MI between the hyperscale super seven and the rest. This has been a little tougher to project as they haven’t disclosed much here, but I have this group going from $200-250ml in fiscal 2017 to $510-640ml in fiscal 2018, based on Nvidia’s disclosed growth rate for the bunch (investor day slide). You can also conclude Amazon is about 30% of the super 7 pie, or roughly 15-20% of their MI revenue.
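The Amazon share follows directly from those ranges. A small sketch, using only the estimates above plus the $1,052ml MI revenue figure from the table (the function name is hypothetical):

```python
MI_REVENUE = 1052               # fiscal 2018 MI GPU revenue, $ millions (from the table)
SUPER7_FY2017 = (200, 250)      # this report's estimate range, $ millions
SUPER7_FY2018 = (510, 640)      # this report's estimate range, $ millions
AMAZON_SHARE_OF_SUPER7 = 0.30   # this report's estimate


def amazon_share_of_mi(super7_revenue):
    """Amazon's share of total MI revenue for a given super 7 revenue level."""
    return super7_revenue * AMAZON_SHARE_OF_SUPER7 / MI_REVENUE


for prior, current in zip(SUPER7_FY2017, SUPER7_FY2018):
    growth = current / prior
    share = amazon_share_of_mi(current)
    print(
        f"Super 7 ${prior}ml -> ${current}ml ({growth:.2f}x growth), "
        f"Amazon at {share:.1%} of MI revenue"
    )
```

Both ends of the range work out to Amazon being somewhere in the mid-to-high teens as a percentage of MI revenue, consistent with the rough 15-20% quoted.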

100x sales is no joke for a company of this size and is precisely why I have spent so much time observing how the market values Nvidia.

What happens if Amazon announces a custom chip or buys an AI datacenter focused chip startup?

If your stock surges on Tesla selecting your Drive PX platform, should it not fall when they dump you for their own chip 2 years later?

Shouldn’t an argument saying you’re an AI platform provider when Tesla selects you turn into one where you are considered simply a compute acceleration hardware provider when they disintermediate you?

Does it make sense when one sell-side analyst tells you hyperscale custom AI chips are not a concern, while the other tells you Google will be a 10% customer this year and hyperscale AI chip demand is the next decade story for Nvidia?

A few months ago I watched an eBay (NASDAQ:EBAY) presentation on their in-house AI hardware (100 Nvidia GPU-based nodes) and how they leveraged it to make eBay image recognition better.

Here is the first slide detailing their AI hardware platform:

Then a couple of weeks ago I read this eBay blog post:

“Training a large-scale visual search model is an extremely challenging task. Using our on-premise hardware, eBay engineers and researchers spent months training a single model to recognize more than 10,000 product categories. We want to iterate much faster, even when we are working with datasets of tens or hundreds of millions of product images. Machine learning hardware is evolving very rapidly, and we faced an important choice when planning our next-generation visual search effort: should we purchase and deploy new ML hardware in-house, or should we move to the cloud? Building cutting-edge shared distributed computing infrastructure in-house requires us to wait months for certain components and is complicated and expensive. And each generation of hardware is soon surpassed by the next. We decided to evaluate Google Cloud Platform, which makes a wide range of powerful ML hardware accelerators available as scalable infrastructure. Because even our smallest datasets contain tens of millions of images, we were especially interested in Cloud TPU Pods, which can deliver up to 11.5 petaflops while providing the experience of programming a single machine. Our results are very promising: an important ML task that took more than 40 days to run on our in-house systems completed in just four days on a fraction of a TPUv2 Pod, a 10X reduction in training time. This is a game changer—the dramatic increase in training speed not only allows us to iterate faster but also allows us to avoid large up-front capital expenditures. We believe ML hardware accelerators such as Cloud TPUs and TPU Pods will become the norm for business AI workloads. With the availability of such resources at public cloud scale, many enterprises large and small will have the capability to innovate with AI. By adopting GCP’s Cloud TPUs as one of our strategic assets, eBay can ensure that our customers see the freshest possible product listings and find what they want every time.”

This isn’t some tiny company we are talking about here; this is eBay. And here they are telling you that Google’s TPU allowed them to ‘avoid large up-front capital expenditures.’ Where would those expenditures have gone? You don’t need to be a genius to figure that out. And eBay’s post makes a perfectly rational argument. Why spend to build when you know the hardware may be obsolete in 18 months? Better to rent from giants who have the infrastructure compute scale business model to justify these cap-ex cycles.

It’s bad enough that Google’s TPU initiative means the world’s leading AI giant will be relying far less on GPUs, but it’s even worse when you consider that other large enterprises can avoid spending with you by renting this hardware.

Has a single sell-sider pointed to this as even the slightest cause for concern? This is not some small enterprise saying that renting TPUs makes sense over building a GPU training cluster, but rather a cash-flow-rich tech monster like eBay. Shouldn't one of the 15 notable sell-siders covering Nvidia have picked up on this? eBay's rationale for this infrastructure decision essentially calls into question Nvidia's entire celebrated ML/DL AI business model, and everyone missed this news? But 100x multiples can turn into 40-50x awfully fast the minute anyone picks up on something like this or on competing new hardware gaining traction in this vertical.
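To put rough numbers on that kind of multiple compression, here is a hedged sketch. It takes this report's round figures ($56bl for everything ex-MI, $1,052ml of MI revenue, a 100x starting multiple) and holds everything else constant, which is a simplifying assumption and not a price target.

```python
EX_MI_VALUE = 56_000   # value of everything except MI datacenter, $ millions
MI_REVENUE = 1_052     # MI GPU revenue, $ millions
BASE_MULTIPLE = 100    # the report's round starting multiple


def total_value(mi_multiple):
    """Total market value if the MI slice trades at the given sales multiple."""
    return EX_MI_VALUE + mi_multiple * MI_REVENUE


def decline_from(base_multiple, new_multiple):
    """Fractional drop in total value when only the MI multiple compresses."""
    return 1 - total_value(new_multiple) / total_value(base_multiple)


for new_multiple in (50, 40):
    drop = decline_from(BASE_MULTIPLE, new_multiple)
    value_bl = total_value(new_multiple) / 1000
    print(f"MI at {new_multiple}x sales: total value ${value_bl:.0f}bl, down {drop:.0%}")
```

Under those assumptions, a move from 100x to 40-50x on the MI slice alone takes roughly a third or more off the total.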

Which brings me to Graphcore….

Graphcore: The Real ML/DL Datacenter Semiconductor Pure Play

Cloud TPUs and potential lower-end ASIC commoditization are just part of the concern when you are paying $100bl for $1bl in machine intelligence hardware revenue. The other major concern is what happens when someone comes along with a new processor architecture that delivers actual performance gains. Graphcore is that company. Their Intelligence Processing Unit takes an in-memory approach to solving the memory bandwidth problems constraining GPUs.

This is what that architecture approach looks like:

And this is what their accelerator card looks like:

Graphcore’s IPUs are already in the hands of early-access customers, and the CEO has said they will be entering volume production this year. In a recent interview he even gave preliminary ‘conservative’ 2018 revenue guidance of $50 million based on current order visibility, and has said they could potentially deliver multiples of that. For some perspective on what that means for Nvidia, just look at the often-touted Resnet-50 benchmark numbers for the IPU.

Here they are:

Basically, 8 IPU cards will deliver the same performance on Resnet-50 as the highly software-optimized $400K DGX-2. Graphcore has also stated these cards will be priced on par with Nvidia’s Volta GPUs. So, if you wanted to dumb this down and look at it just from a popular CNN benchmark perspective, $100ml in Graphcore revenue is not just $100ml out of Nvidia’s pocket; it’s potentially 2-3x that. And you have to remember that CNNs are where these GPUs perform the best; once you get to RNNs and more complex neural nets, the IPU’s performance leap can be up to 100x.
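One way to read the 2-3x figure is at the card level. The sketch below assumes the DGX-2 holds 16 V100 GPUs (taken from Nvidia's public DGX-2 specification, not from this report) and uses Graphcore's claim that 8 IPU cards match it on Resnet-50. The `price_ratio` parameter is hypothetical: 1.0 means exact price parity, and lower values mean the IPU card is cheaper than a V100.

```python
DGX2_V100_COUNT = 16       # ASSUMPTION: from Nvidia's public DGX-2 spec, not this report
IPU_CARDS_FOR_PARITY = 8   # this report's Resnet-50 claim for Graphcore


def displaced_gpu_revenue(ipu_revenue_ml, price_ratio=1.0):
    """GPU card revenue ($ml) displaced by IPU revenue at equal Resnet-50 throughput.

    price_ratio is IPU card price divided by V100 card price (hypothetical input).
    """
    gpus_replaced_per_ipu_card = DGX2_V100_COUNT / IPU_CARDS_FOR_PARITY
    return ipu_revenue_ml * gpus_replaced_per_ipu_card / price_ratio


for ratio in (1.0, 0.8):
    displaced = displaced_gpu_revenue(100, price_ratio=ratio)
    print(f"price ratio {ratio:.1f}: $100ml of IPU revenue displaces ~${displaced:.0f}ml of GPU cards")
```

At price parity each IPU dollar replaces about two V100 dollars on this benchmark; any discount on the IPU card, or counting the full system price of the DGX-2, pushes the ratio higher.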

How do you model the implications of this?

The short answer is you can’t. This is not some potential small headwind we are talking about here, but rather something that could be a serious major disruption in Nvidia’s entire hyperscale datacenter machine intelligence business. Are you going to make major future cap-ex outlays on new training GPUs if you are currently testing a card that delivers significant speedup for the same price? Obviously not, and this is just Graphcore we are talking about here.

Consider a startup like Mythic, which is addressing the inference-at-the-edge problem in ml/dl. Once a model is trained, you have set weight parameters that are static throughout the inference process. The problem is these millions or eventually billions of static parameters consume energy each time they are accessed from memory.

Here is a slide from Mythic's Hot Chips presentation that demonstrates this problem.

How is this weight access energy consumption problem typically addressed at the edge?

This slide from Mythic's Hot Chips presentation shows how.

So, what's Mythic's solution to this problem?

Design an architecture that eliminates the energy cost of reading weights from memory. Here is a slide explaining the solution.

And the net result of this approach....

By developing an analog flash array surrounded by programmable digital circuitry, Mythic can achieve 4 TOPS/W. The Nvidia Jetson TX2 delivers 1 TOPS/W, while the datacenter-housed Tesla V100 clocks in at 0.4 TOPS/W. And naturally Mythic isn't the only player here. A startup named Syntiant has designed an entirely analog network, and expects to be able to deliver 20 TOPS/W.
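To make those efficiency gaps concrete, here is a short sketch that turns the TOPS/W figures quoted above into relative efficiency and into the power needed for a fixed inference workload. The 10 TOPS workload is a hypothetical number chosen only for illustration, and the Syntiant figure is the company's own expectation, not a measured result.

```python
# TOPS per watt, as quoted in this report.
TOPS_PER_WATT = {
    "Mythic": 4.0,
    "Syntiant (expected)": 20.0,
    "Nvidia Jetson TX2": 1.0,
    "Nvidia Tesla V100": 0.4,
}


def efficiency_ratio(chip, baseline):
    """How many times more work per watt `chip` delivers versus `baseline`."""
    return TOPS_PER_WATT[chip] / TOPS_PER_WATT[baseline]


def watts_needed(chip, workload_tops):
    """Power required to sustain a workload at the quoted efficiency."""
    return workload_tops / TOPS_PER_WATT[chip]


WORKLOAD_TOPS = 10  # hypothetical workload size, for illustration only

for chip in TOPS_PER_WATT:
    ratio = efficiency_ratio(chip, "Nvidia Tesla V100")
    watts = watts_needed(chip, WORKLOAD_TOPS)
    print(f"{chip:22s} {ratio:5.1f}x V100 efficiency, {watts:5.1f} W for {WORKLOAD_TOPS} TOPS")
```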

Then you have the ex-Google TPU team at Groq, whose Inference TPU is expected to deliver 8 TOPS/W to hyperscale datacenters.

Here is a slide from their website.

Then there are the specs of the MLU 100 inference card from Cambricon, China's AI chip unicorn, which Lenovo is now offering with their server configurations.

Or Alibaba's FPGA based Deep Learning processor.

And their performance claims....

And let's not forget Google's TPU. The hyperscale giant just updated their cloud TPU website to show the TPU3's performance numbers. Nothing surprising to those who have been following what's been going on there, but it is a bit of a killer for the single-chip Volta criticism that had been harped on before based on the cloud TPU board's 4-chip configuration. The 180 teraflop TPU2 board has already been blowing away the V100 on the big CNN benchmarks, but now the TPU3 board is clocking in at 420 teraflops. That brings the per-chip deep learning performance, as if that really matters to the speed/cost focused, to essentially on par with the V100.

Here is a screenshot from their website.
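The per-chip comparison is simple division. This sketch assumes the TPU3 board keeps the same 4-chip layout cited above for the cloud TPU board, and it uses 125 teraflops as the V100's peak deep learning throughput, which is Nvidia's published tensor core figure and not a number from this report. Peak numbers from different vendors are not measured the same way, so treat the result as a rough guide.

```python
CHIPS_PER_BOARD = 4        # cited above for the cloud TPU board; ASSUMED unchanged for TPU3
TPU2_BOARD_TFLOPS = 180
TPU3_BOARD_TFLOPS = 420
V100_PEAK_TFLOPS = 125     # ASSUMPTION: Nvidia's published tensor core peak


def per_chip_tflops(board_tflops, chips=CHIPS_PER_BOARD):
    """Peak teraflops per chip for a multi-chip board."""
    return board_tflops / chips


for name, board in (("TPU2", TPU2_BOARD_TFLOPS), ("TPU3", TPU3_BOARD_TFLOPS)):
    chip = per_chip_tflops(board)
    print(f"{name}: {chip:.0f} TF per chip, {chip / V100_PEAK_TFLOPS:.0%} of V100 peak")
```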

Now, if you are reading this and are just learning about some of these developments for the first time, know that I have barely scratched the surface of what's going on in this space. This doesn't mean Nvidia is doomed in AI. I actually think from a developer/research perspective their GPUs are going to continue to fare well. But when you talk massive hyperscale production neural nets or inference at the edge, I think you need to start considering that things are not going to play out as advertised by them. As we look out over the next 12 months, there should be no less than half a dozen non-GPU alternatives for cloud providers to evaluate when it comes to ml/dl training hardware. Personally, I’m quite partial to Graphcore being a winner here based on the research that I have done, but there is no denying the noise level in this space is going to get deafening as we make our way into next year.

Picking future chip winners is obviously challenging when you are looking at this type of rapidly shifting landscape, which is why I have focused on the far easier task of picking the clear loser in this type of environment.

Nvidia’s AI Honeymoon is Coming to an End

So far, Nvidia has been pretty much bulletproof when it comes to bad news.

The last six months have gone something like this…..

The CEO claims gaming demand is driving GPU revenue when evidence clearly shows a crypto mania; the market believes him.

Crypto mining collapses, and the market doesn’t care.

Nintendo Switch sales start slowing down, and nobody blinks an eye.

Google introduces a 100 petaflop TPU3 pod and shares close up.

Google announces notable cloud TPU enterprise customer wins, and nobody even mentions them.

Baidu introduces Kunlun, and the market yawns.

Google announces the long-planned availability of V100 in their cloud, and Nvidia shares pop.

Google introduces edge-TPU’s, more yawns.

Tesla selects Nvidia Drive PX and Jensen declares they are 5 years ahead of everyone and the stock rallies as analysts celebrate the potential of Nvidia as a clear autonomous platform provider vs. a component solution like Mobileye.

Tesla dumps Drive PX for custom hardware, clearly demonstrating Drive PX was nothing more than a component to them, and analysts defend the stock citing non-material revenue and Nvidia’s broad platform technology in the space. Again, the stock shakes off a quick dip and closes up.

Jensen announces Uber has adopted Nvidia's automotive technology and the stock rallies.

Uber’s self-driving car kills a pedestrian, and Nvidia shares take some heat for a day. A few days later those losses are reversed as Nvidia's CEO gets on CNBC and says "Uber and Nvidia's automotive technologies are completely different". Nobody bothers to ask what that means in the context of both Tesla and Uber’s news.

Nvidia delivers lackluster operating results, and the story becomes 'graphics reinvented', as if the gaming GPU market and their constant need to drive upgrades and maintain 80% consumer share is what's driving the high multiple. Where were the questions on why datacenter, in the midst of hyperscale capex mania, grew sequentially at the same rate as Intel DCG despite being a fraction of the size?

Are we supposed to get excited about RTX when the combined Steam share of the GTX 1070 and higher series cards is less than 8%? How many gamers building $4k rigs are out there vs miners who were buying 1080 Tis in the last year?

Anytime Nvidia has been hit with any potential concerns, the immediate analyst fallback is to take price targets up because of AI datacenter. What is AI datacenter to them? After an extensive survey of the landscape all I can say is it is exactly whatever Nvidia’s management tells them it is. That’s about to change.

Why?

Well, technological superiority for one. While I think what Google has accomplished is nothing short of amazing and should concern every chip company, let alone Nvidia, the market has so far lacked a quantifiable way of capturing the impact of the TPU. And despite the fact that the TPU is clearly eating directly into V100 sales, Google, as a public cloud provider, has been quite diplomatic in how it positions its baby so far. Now I do expect more disclosures like the eBay cloud TPU blog post to start turning some heads, but it’s the upcoming dedicated silicon competition that will really get Wall Street’s attention. Once cloud service providers start sharing that they are experimenting with or deploying new types of chips for training, well, Nvidia is going to find itself on the defensive, and that’s somewhere it hasn’t been in a long time.

How does the ‘more you buy the more you save’ slogan hold up when there are faster accelerators being sold for the same price or less?

What happens when you can no longer claim fastest chip and node for Resnet-50? (Btw, the reality is the TPU2 is far faster, but the marketing machine at Nvidia has keyed on the fact that a board has 4 chips. This would be like arguing that a GPU makes more sense for mining bitcoin than an Antminer S9 because the S9 is powered by nearly 200 BM1387 chips. And now that TPU3 is up to 420 TF, what's your argument? But like I pointed out earlier, Google is diplomatic.)

Well, you start running 'Supercomputer' sales and leasing promotions.

Picking on the CPU has been easy because it’s not designed for parallel workloads, and so far, thanks to Google’s non-marketing-hype-driven approach, the TPU has not been much of an issue in the financial press. But for the new wave of AI chip startups Nvidia is the target. Volta is going to be ripped by every AI chip competitor as a bloated multipurpose chip on which ml/dl researchers are wasting money. Turing, based on the SIGGRAPH presentation, is more of the same problem. At the ML/DL level, Turing reflects the TPUification of the GPU, as Nvidia is now touting lower precision 4-bit/8-bit integer math headline numbers. So, we have a professional visualization product line whose low-end units are also supposed to double as inference accelerator engines?

“Why are you paying thousands of dollars for HPC and graphics when you simply want the best deep learning accelerator?” will be the first thing they ask customers. “Our AI training accelerator card delivers 3-15x on these benchmarks, and it costs just as much as or less than the V100” will be the next thing they say. Price per deep learning teraflop is going to be a big weapon used to market these new offerings, and Nvidia is going to have to respond. But if their past conference call comments on the TPU are any indication, they don’t seem to be prepared for this environment. Jensen has cited flexibility of the GPU as the main defense against any questions regarding the TPU, and that’s exactly what new AI focused silicon startups are about to target them on. Flexibility, as the competition is obviously going to highlight, means subsidizing Nvidia’s HPC and graphics business by wasting money and valuable performance on technology you don’t need.

Anyway, the next six months should be very interesting, but for now……

Disclosure: I am/we are short NVDA.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.