Date: 2020-12-03

In the long term, Intel’s growth in AI training could pressure Nvidia’s data center segment, although the AI segment overall is expected to keep growing substantially.

Gaudi seems to deliver meaningfully higher performance per mm2 of silicon, which means Habana could take performance leadership from Nvidia if it moves to leading-edge manufacturing.

The first substantial competition is entering the market as AWS, the largest cloud provider, announced EC2 instances for training based on Habana’s Gaudi accelerator. Intel acquired Habana in late 2019.

Investment Thesis

Nvidia’s (NVDA) fastest-growing business is its data center segment. Most of that segment’s growth comes from the compute needs of the AI revolution: Nvidia has optimized its GPUs with Tensor Cores, which are used for training AI models.

Given its performance, its first-mover advantage with Tensor Cores and its software investments, Nvidia has an uncontested lead in this part of AI. In the other part of AI, called inference (running the trained model), the CPU is still the most widely used.

This is now changing. At Amazon’s (AMZN) AWS re:Invent event, Intel (INTC) launched what is arguably the first substantial competition for AI training silicon in the data center. Gaudi chips from Habana, which Intel acquired in late 2019, will be available in AWS EC2 instances in the first half of 2021.

These events are part of the collision course between Nvidia and Intel as described in September.

Habana Gaudi announcement

At AWS re:Invent, Amazon and Intel announced a collaboration to deploy Intel’s Habana Gaudi training chips. Amazon announced EC2 instances with up to 8 Gaudi chips and claims they will deliver 40% better price performance than current instances based on Nvidia GPUs, although it did not say whether those instances use the 2017 Volta or the 2020 Ampere architecture.
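To make the 40% figure concrete: if “price performance” is read as training work done per dollar (an assumption, since the metric is not defined in the announcement as quoted here), a 40% gain means a fixed training job costs 1/1.4 of the baseline, or roughly 29% less. The minimal Python sketch below shows that arithmetic; the helper name `cost_ratio_for_same_work` is hypothetical.

```python
def cost_ratio_for_same_work(price_performance_gain: float) -> float:
    """Cost of a fixed training job on the new instance relative to the old one.

    Assumes price performance means work per dollar, so a gain g
    (0.40 for "40% higher") means each dollar buys (1 + g) times the work.
    """
    return 1.0 / (1.0 + price_performance_gain)


# AWS's claimed 40% price-performance advantage for Gaudi instances.
ratio = cost_ratio_for_same_work(0.40)
print(f"Relative cost: {ratio:.3f} ({1 - ratio:.1%} lower)")
```

Note that a 40% gain in work per dollar does not translate into a 40% cost reduction; the saving on a fixed workload is smaller.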

The following quote indicates that the Gaudi chips will replace existing Nvidia instances, as the first step in a longer-term roadmap:

“We are proud that AWS has chosen Habana Gaudi processors for its forthcoming EC2 training instances. The Habana team looks forward to our continued collaboration with AWS to deliver on a roadmap that will provide customers with continuity and advances over time.” –David Dahan, chief executive officer at Habana Labs, an Intel Company

The instances will become available in the first half of 2021. The chips support the common TensorFlow and PyTorch frameworks, among others. Habana also has its own software suite, called SynapseAI, which Intel says allows developers to port GPU-based models to Habana.

Habana also announced that it has a Gaudi2 in development on TSMC’s (TSM) 7nm process.

Habana, a start-up developing dedicated AI accelerators, was acquired by Intel in late 2019. Its Gaudi and Goya chips (for AI training and inference, respectively) are built on 16nm.

Discussion

Until this announcement, roughly a year after the deal, it seemed as if the Habana acquisition had actually set Intel’s AI strategy back by a year. In mid-2019, Intel’s previous acquisition, Nervana, had presented its NNP-T chip for training. Like Gaudi, this chip is built on 16nm, and it was announced to go into production in 2019. By contrast, during the October earnings call, just weeks ago, CEO Bob Swan said Gaudi was merely at the proof-of-concept stage.

Nevertheless, the Gaudi accelerator has a differentiating feature: Habana claims it is the “only AI processor to provide the game-changing advantages of integrated, on-chip RoCE v2”. RoCE (RDMA over Converged Ethernet) is an industry-standard, Ethernet-based interconnect that allows training to scale to thousands of chips. Nvidia, by comparison, relies on its proprietary NVLink and NVSwitch.

Meanwhile, Nvidia has been selling its 16nm V100 since late 2017 and its 7nm A100 since 2020. These chips have grown Nvidia’s data center business to over $1 billion per quarter in 2020 (excluding Mellanox). While this business is still relatively small compared to Intel’s data center CPU business, AI is expected to remain the fastest-growing workload in the data center.

Intel has seen this coming for several years: as mentioned, it acquired Nervana back in 2016 to compete against Nvidia. Now, with Habana’s Gaudi launching at AWS, the world’s largest cloud provider, Nvidia is getting its first real competition in this space.

Performance

Habana is still a process node behind, and it has given no timeline for Gaudi2. Besides possibly pricing, the value proposition of Habana’s chips is that Nvidia’s chips can still be seen as repurposed GPUs. Habana’s chips, by contrast, were designed from the ground up for AI workloads, so they could have performance and power advantages.

For example, Nervana had previously claimed it could deliver substantially higher performance than the V100, despite about the same theoretical peak throughput, because its more optimized architecture achieved higher effective hardware utilization.

Along those lines, Habana claims an “8-card Gaudi EC2 instance can process about 12,000 images-per-second training the ResNet-50 model on TensorFlow”.

This places Gaudi somewhere between the V100 and the A100. According to Nvidia, the V100 and A100, also using 8 cards, achieve 10,036 images/sec and 17,343 images/sec, respectively. Gaudi’s throughput is thus roughly 20% higher than the V100’s on the same process technology, which also implies higher performance per mm2. (Gaudi’s exact die size is not known, but it is all but guaranteed to be smaller than the V100’s, since the latter is about as large as semiconductor equipment allows a monolithic chip to be.)
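The comparison above is simple ratio arithmetic on the quoted 8-card figures. The minimal Python sketch below uses only the numbers cited in this section and treats Habana’s “about 12,000” as exactly 12,000 (an assumption). It normalizes each chip to the V100 and the A100 and derives a per-card rate; the helper `relative_to` is a hypothetical name used for illustration.

```python
# 8-card ResNet-50 training throughput (images/sec) as quoted in the text:
# Gaudi from Habana ("about 12,000"), V100 and A100 from Nvidia.
throughput_8card = {
    "V100": 10_036,
    "Gaudi": 12_000,
    "A100": 17_343,
}

CARDS = 8


def relative_to(baseline: str, data: dict) -> dict:
    """Return each chip's throughput as a ratio of the baseline chip's throughput."""
    base = data[baseline]
    return {chip: value / base for chip, value in data.items()}


per_card = {chip: value / CARDS for chip, value in throughput_8card.items()}
vs_v100 = relative_to("V100", throughput_8card)
vs_a100 = relative_to("A100", throughput_8card)

for chip in throughput_8card:
    print(
        f"{chip}: {per_card[chip]:.0f} img/s per card, "
        f"{vs_v100[chip]:.2f}x V100, {vs_a100[chip]:.2f}x A100"
    )
```

On these figures Gaudi lands at about 1.2x the V100 and about 0.69x the A100. These are vendor-reported numbers, so the ratios inherit any differences in batch size, precision and software setup between the vendors’ test configurations.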

This suggests that once Habana moves to leading-edge technology, it could take performance leadership from Nvidia in this market, already worth over $4 billion, and potentially gain meaningful market share over time. While the overall market will keep growing, this could ultimately pressure Nvidia’s growth.

Xe HP

Intel has a comprehensive approach to AI. While it is betting on dedicated accelerators with Habana, it is also taking an approach similar to Nvidia’s by putting its own tensor cores, called Xe Matrix Extensions (XMX), in its upcoming discrete GPUs. In Xe HPC (Ponte Vecchio), the equivalent unit is called the data parallel matrix engine.

Earlier in 2020, Intel demoed Xe HP at its Architecture Day event. In this demo, the pre-production 4-tile Xe HP achieved over 40 TFLOPS of regular FP32 throughput, more than twice what the A100 is capable of.
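The “more than twice” claim is a single ratio. In the Python sketch below, the A100 figure of 19.5 TFLOPS peak FP32 (without Tensor Cores) is Nvidia’s published specification and is not stated in this article. The Xe HP value takes the demo’s “over 40 TFLOPS” as a lower bound. Because a measured demo result is being compared with a theoretical peak, the ratio is only indicative.

```python
# Lower bound from Intel's Architecture Day demo ("over 40 TFLOPS", 4-tile, pre-production).
XE_HP_4TILE_FP32_TFLOPS = 40.0
# Nvidia's published peak FP32 rate for A100 without Tensor Cores (not from the article).
A100_FP32_TFLOPS = 19.5

fp32_ratio = XE_HP_4TILE_FP32_TFLOPS / A100_FP32_TFLOPS
print(f"Xe HP (4-tile) vs A100, FP32: at least {fp32_ratio:.2f}x")
```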

If this performance advantage extends to XMX tensor core performance, which seems plausible since Intel has described Xe HP as a “petaflop-scale” GPU, then Intel could also take performance leadership from Nvidia with its discrete Xe GPUs.

Intel announced in November that Xe HP was available in its DevCloud to select developers.

Takeaway

The recent re:Invent announcement is significant because it fundamentally changes the AI (training) landscape. It marks the first time Nvidia is getting substantial competition in this space. AWS claimed Gaudi achieves 40% higher price performance compared to current GPU instances.

The 16nm Gaudi card seems to deliver higher performance than Nvidia’s last-gen V100 on the same process technology, and, given its smaller die, higher performance per mm2. This implies it should be able to compete on absolute performance once future Gaudi chips move to leading-edge manufacturing technology. Habana also has an additional value proposition with its integrated Ethernet-based interconnect, which eliminates the need for proprietary solutions for multi-chip scaling.

As a former start-up, Habana still has some catching up to do on the manufacturing side, but a milestone AWS customer win combined with high performance per dollar should make for a compelling value proposition. Meanwhile, Intel also intends to launch its own AI-infused “petaflop-scale” GPUs for the data center in 2021.

These events are unlikely to slow Nvidia down any time soon, but going forward, Nvidia investors will have to take competition into account as Intel builds its own AI training business.

