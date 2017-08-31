Radeon Instinct performance claims to be better than Pascal, but not Volta.

Rethink Technology business briefs for August 31, 2017.

AMD announces Baidu collaboration, while Baidu partners with Nvidia

Objective, third-party reviews of the Advanced Micro Devices (AMD) Radeon RX Vega 64 graphics card have demonstrated conclusively that, for gaming and other workloads, Vega is far less power efficient than Nvidia's (NVDA) Pascal-generation GPUs. I detailed the difference between AMD Vega and Nvidia Pascal in my article “AMD's Unsolved Graphics Problems.”

Since then, AMD has endeavored in various ways to mitigate the negative impression left by the reviews, which is perfectly understandable. Today's Tech Brief looks at some examples of this.

On August 28, AMD announced a collaboration with Baidu (BIDU). The announcement was titled “AMD and Baidu Join Forces to Advance GPU Computing in the Datacenter with Radeon Instinct MI Series” and contains a quote from Forrest Norrod, AMD SVP of Enterprise, Embedded and Semi-Custom:

AMD is the only company with the capability to deliver both the high-performance GPUs and CPUs required to power the next generation of cloud datacenters. Together, AMD and Baidu will leverage the two companies’ world-class technology and software engineering capabilities to create a comprehensive and open ecosystem to address the growing demand for datacenter workloads and provide more intelligent human-computer interaction.

The headline and quote make it sound as if AMD and Baidu are marching forward hand in hand to achieve advances that no one else can achieve. And they certainly imply that there's no problem whatsoever with the power efficiency of the Vega-based Radeon Instinct GPU accelerators for the datacenter.

The problem is that in the announcement, both AMD and Baidu studiously ignored the fact that Baidu had announced a similar partnership with Nvidia in July. In fact, that announcement was much higher-profile, having been made by the president and COO of Baidu at Baidu's inaugural AI developer conference. Baidu's CEO, Robin Li, made news when he took a Baidu self-driving prototype to the conference.

The Nvidia partnership announcement had a lot more meat on the bones and included the following (to quote from the announcement):

Speaking in the keynote at Baidu's AI developer conference in Beijing, Baidu president and COO Qi Lu described his company's plans to work with Nvidia to:

Bring next-generation Nvidia Volta GPUs to Baidu Cloud, providing cloud customers with the world's leading deep learning platform.

Adopt Nvidia Drive PX platform for Baidu's self-driving car initiative, and develop self-driving cars with major Chinese carmakers.

Optimize Baidu's PaddlePaddle open source deep learning framework for Nvidia Volta GPUs and make it widely available to academics and researchers.

Bring AI capabilities to Chinese consumers by adding Baidu's DuerOS conversational AI system to Nvidia Shield TV.

Baidu's president and COO Lu is also quoted as saying:

Today, we are very excited to announce a comprehensive and deep partnership with Nvidia. Baidu and Nvidia will work together on our Apollo self-driving car platform, using Nvidia's automotive technology. We'll also work closely to make PaddlePaddle the best deep learning framework; advance our conversational AI system, DuerOS; and accelerate research at the Institute of Deep Learning.

Neither announcement speaks to the issue of unit sales of the respective GPU platforms, and that's really the crux of the matter. Baidu, like other cloud providers, may be happy to offer AMD's GPU platform to those cloud developers that might want it. But what Baidu uses internally is another matter altogether.

The Koduri Twitter “interview” and other obfuscations

Wccftech ran an article yesterday that consisted in large part of a series of Tweets from Raja Koduri about Vega efficiency issues:

This is the first time I've ever seen performance/watt dynamic range highlighted as a feature. Of course, any processor, CPU or GPU, has a range of performance/watt that depends on its clock rate and voltage settings. Anyone who has overclocked a CPU or GPU knows that boosting the clock rate (and certain supply voltages) increases performance at the cost of power efficiency.
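The trade-off can be made concrete with the standard first-order model of CMOS dynamic power. This is a textbook approximation, not anything AMD has published for Vega: it ignores leakage (static) power and assumes performance scales linearly with clock frequency f, with α the switching activity factor, C the switched capacitance and V the supply voltage.

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f, \qquad
\text{perf} \propto f
\quad\Longrightarrow\quad
\frac{\text{perf}}{P_{\text{dyn}}} \propto \frac{1}{\alpha\, C\, V^{2}}
```

Because higher clocks generally require a higher supply voltage to remain stable, raising f also raises V, so power grows faster than performance and performance/watt falls. Lowering clocks and voltage, presumably what a power-saving mode does, moves the chip back toward its efficient operating point.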

The article then goes on to make the point that Vega 64 is much more power efficient in its Power Save mode. Flipped around, that suggests that to match the performance of a GTX 1080, AMD had to push the GPU past its efficient operating point, effectively overclocking it, which is why it's so much less efficient than a 1080 in its Balanced power mode.

Even though Vega 64 is more power efficient in Power Save mode, it's still less efficient than a GTX 1080 or 1080 Ti. As measured by Tom's Hardware, the gaming power consumption of the RX Vega 64 in Power Save mode is 196.7 watts. By comparison, the gaming power consumption of the GTX 1080, measured a year ago, was only 173 watts.

In my article “AMD's Unsolved Graphics Problems” I calculated power efficiency using data from Tom's Hardware, which had tested the RX Vega 64 in Balanced Power mode for its gaming results. Just to remind the reader, here are the results from the article:

I revisited the calculations for RX Vega gaming performance only, assuming the Power Save mode power of 196.7 watts and a 4% across-the-board performance hit in frames/sec, which is the performance penalty claimed in the Wccftech article. Tom's Hardware never actually measured performance in Power Save mode.

Using Power Save mode narrows the gap between the RX Vega 64 and the GTX 1080, but it doesn't eliminate it. The year-old GTX 1080 still averages an 18% efficiency advantage, and the GTX 1080 Ti does even better, at 24%.
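The Power Save adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, assuming efficiency is defined as frames/sec divided by gaming power in watts; the function names are hypothetical, and the per-game frame rates behind the 18% and 24% averages are not reproduced here, so the example only isolates the effect of the power figures.

```python
# Efficiency = frames per second / gaming power draw (watts).
# Power figures are the Tom's Hardware numbers quoted in the article;
# the 4% frame-rate penalty is Wccftech's claim, not a measurement.
POWER_SAVE_WATTS = 196.7        # RX Vega 64, Power Save mode
GTX_1080_WATTS = 173.0          # GTX 1080, measured a year earlier
POWER_SAVE_PERF_FACTOR = 0.96   # assumed 4% across-the-board fps hit


def fps_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt of board power."""
    return fps / watts


def power_save_fps(balanced_fps: float) -> float:
    """Estimate a Power Save frame rate from a Balanced-mode measurement."""
    return balanced_fps * POWER_SAVE_PERF_FACTOR


def efficiency_advantage(fps_a: float, watts_a: float,
                         fps_b: float, watts_b: float) -> float:
    """Fractional efficiency advantage of card A over card B (0.18 = 18%)."""
    return fps_per_watt(fps_a, watts_a) / fps_per_watt(fps_b, watts_b) - 1.0


# With identical frame rates the fps terms cancel, leaving the power ratio.
same_fps = efficiency_advantage(1.0, GTX_1080_WATTS, 1.0, POWER_SAVE_WATTS)
print(f"GTX 1080 advantage at equal frame rates: {same_fps:.1%}")
```

At equal frame rates the power figures alone give the GTX 1080 roughly a 13.7% edge. For an actual game, one would pass the 1080's measured fps and `power_save_fps(vega_balanced_fps)` for Vega; the extra 4% frame-rate penalty is what pushes the average advantage higher.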

Radeon Instinct performance claims to be better than Pascal, but not Volta

I have inferred that Vega is less efficient than Pascal in general, and that this must necessarily translate into lower power efficiency in typical datacenter use cases such as deep learning. A spec sheet for the Radeon Instinct MI25, which uses a Vega 10 GPU, begs to differ.

It claims that the MI25 is superior to the Tesla P100 (the highest-performance GPU of the Pascal generation) in performance/watt:

However, the chart above comes with a big footnote. The P100-16 figure of 75 GFLOPS/Watt is not the result of any testing that AMD did. It's simply Nvidia's specified half-precision performance of 18.7 TFLOPS divided by the rated TDP of 250 watts. There's absolutely no way to know whether AMD's test methodology for determining performance is consistent with Nvidia's.
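The footnote's arithmetic is easy to reproduce. The sketch below assumes only the two spec values quoted in the footnote; the helper name `spec_gflops_per_watt` is hypothetical. Nothing in the calculation reflects a measured workload, which is exactly the problem with treating it as a benchmark.

```python
def spec_gflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Spec-sheet efficiency: peak TFLOPS (converted to GFLOPS) divided by TDP."""
    return peak_tflops * 1000.0 / tdp_watts


# Nvidia's published Tesla P100-16 values, as quoted in AMD's footnote.
p100_fp16 = spec_gflops_per_watt(18.7, 250.0)
print(f"Tesla P100-16: {p100_fp16:.1f} GFLOPS/W")  # AMD rounds this to 75
```

The result, 74.8 GFLOPS/W, is a ratio of two ratings, so it says nothing about how either chip behaves on a real deep learning job.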

I've seen many comparisons by companies such as Intel (INTC), Google (GOOG), and AMD claiming that their AI solutions are more energy efficient than Nvidia's GPUs. Invariably, the comparisons are self-serving to the point of being meaningless. I don't believe Nvidia's comparisons, either.

I'd really like to see an independent third party do some legitimate performance comparisons between the various AI-oriented processors, such as GPUs, FPGAs, and CPUs. So far, I haven't seen one I believe.

In any case, I think it's significant that AMD chose to compare with the Tesla P100 rather than the newer Tesla V100. Using Nvidia's specs for the V100 and following AMD's methodology, the comparison is not so favorable:

Tesla V100 non-Tensor core half-precision performance/Watt: 100 GFLOPS/Watt

Tesla V100 Tensor core half-precision performance/Watt: 400 GFLOPS/Watt
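The V100 figures can be reproduced with the same spec-sheet division AMD used for the P100. The article doesn't state its inputs, so this sketch assumes Nvidia's initially announced V100 values of 30 TFLOPS half precision (non-Tensor), 120 Tensor TFLOPS and a 300 W TDP, which yield exactly the numbers above; other V100 variants and later spec revisions differ. The helper name is hypothetical.

```python
def spec_gflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Spec-sheet efficiency: peak TFLOPS (converted to GFLOPS) divided by TDP."""
    return peak_tflops * 1000.0 / tdp_watts


# ASSUMED V100 inputs (not stated in the article), chosen to match its figures.
V100_TDP_WATTS = 300.0
V100_FP16_TFLOPS = 30.0      # non-Tensor half precision
V100_TENSOR_TFLOPS = 120.0   # Tensor Core mixed precision
P100_GFLOPS_PER_WATT = 75.0  # AMD's quoted P100-16 figure

for label, tflops in (("non-Tensor", V100_FP16_TFLOPS),
                      ("Tensor core", V100_TENSOR_TFLOPS)):
    eff = spec_gflops_per_watt(tflops, V100_TDP_WATTS)
    ratio = eff / P100_GFLOPS_PER_WATT
    print(f"V100 {label}: {eff:.0f} GFLOPS/W ({ratio:.2f}x the P100 figure)")
```

By AMD's own method, the V100 comes out about 1.33x the P100 figure without Tensor Cores and about 5.33x with them. The Tensor Core number applies only to mixed-precision matrix math, so it is not a like-for-like comparison with general half-precision throughput.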

The upshot of all this is that I don't see Vega posing any threat to Nvidia in either the datacenter or gaming. The investment thesis for Nvidia, based on growth in the datacenter, gaming, and automotive, remains intact.

Nvidia is part of the Rethink Technology Portfolio and is a recommended buy.

