Nvidia will release the Pascal refresh and Volta next year; AMD will be able to gain some market share in the meantime, but the fact remains that VEGA is arriving nearly one year late.

VEGA has also been shown running DOOM at 4K ultra settings with a maximum of 68 fps, but the GTX 1080 is able to reach higher fps peaks.

MI6 (Polaris 10) and MI8 (Fiji) have heavy architectural limitations against Nvidia P4 and P40, but MI25 (VEGA 10) looks to be a serious contender.

A few days ago, AMD (NYSE:AMD) showed its new professional video/compute card lineup for the enterprise sector, and it has recently given some additional demonstrations of the new Ryzen CPUs.

AMD is obviously focusing public attention on its overall platform, which is becoming more and more interesting, and it is finally providing some very interesting product previews.

In this article, I want to focus on the professional video lineup for the enterprise market, which is divided into three solutions based on three different architectures.

As I will show, the first two solutions are not very competitive due to technical issues and/or outdated architectures, but the top solution, the MI25 powered by the VEGA 10 architecture, looks very interesting and competitive. Of course, the initial enthusiasm must be tempered, since the architecture arrives quite late compared with Nvidia's (NASDAQ:NVDA) Pascal, and VEGA 10 will probably still show the high power consumption that characterizes Polaris 10 and the entire GCN architecture.

In addition, the Pascal refresh and Volta are not far off, and the latter will probably make Nvidia's performance lead rock solid. Still, it is undeniable that AMD has the chance to gain some market share in the high-end enterprise market, at least in 1H 2017.

VEGA 10 will also hit the consumer market, but even here it will have to pay for its late arrival. AMD will probably not gain significant additional market share, apart from those market segments where GCN-optimized programs are the majority. The same goes for the gaming sector, where VEGA will simply win over those who play AMD-favored games and those loyal customers who were simply waiting. By contrast, it is Ryzen that could be the real spark to change AMD's destiny, but the current market price already accounts for this expectation, at least partially.

MI6 and MI8

First of all, the AMD MI6 employs a Polaris 10 GPU, which is able to deliver up to 5.7 FP32 (single-precision floating-point) TFLOPS (tera floating-point operations per second) and has a TDP of 150W. This card is designed similarly to the consumer RX 480, and it also keeps all of its limitations: clock speeds that are not very high and heavy throttling under heavy loads (-33% compared with its turbo clock and -24% compared with its base clock, for a sustained throughput of 3.9 FP32 TFLOPS).

MI8 is instead powered by a Fiji GPU (28nm) and essentially mirrors the consumer R9 NANO, which was a very interesting video card thanks to its HBM modules and a very low TDP of 175W despite the very high core count (4096 GCN cores). However, in order to keep its power consumption and heat generation under the 175W TDP limit, this video card has to throttle its clock massively under heavy loads.

The R9 NANO has a base clock of 1000 MHz for a theoretical computational power of 8.2 FP32 TFLOPS, but this value is usually not reachable: the card generally works at 875 MHz under medium load and barely reaches 7.0 FP32 TFLOPS. Under heavy load, it works even below 700 MHz in order to restrain power consumption, and its computational power drops below 5.7 FP32 TFLOPS. In addition, the HBM adoption is still limited to 4 GB of memory, which may be detrimental against the Nvidia solutions (8-24 GB).
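The R9 NANO figures above follow from the usual peak-throughput formula. Here is a minimal sketch, assuming each GCN core completes one fused multiply-add (2 floating-point operations) per clock cycle; the function name is only illustrative.

```python
def fp32_tflops(cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes every core retires one fused multiply-add (2 FLOPs) per cycle.
    """
    return cores * flops_per_cycle * clock_mhz * 1e6 / 1e12


R9_NANO_CORES = 4096  # core count quoted in the article

# Clock speeds quoted in the article for the three load scenarios.
for label, mhz in [("base clock", 1000), ("medium load", 875), ("heavy load", 700)]:
    print(f"{label:12s} {mhz:5d} MHz -> {fp32_tflops(R9_NANO_CORES, mhz):.2f} TFLOPS")
```

The three clocks give roughly 8.19, 7.17 and 5.73 TFLOPS, so a card that falls below 700 MHz lands under the 5.7 TFLOPS mark.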

These results must be compared against the current Nvidia lineup, composed of the Nvidia Tesla P4 and P40. The P4 is essentially a GTX 1080 (Pascal GPU) with 2560 cores, but with heavily reduced base and boost clocks and heavily restrained power consumption (maximum TDP of 75W instead of 180W). This video card is able to reach up to 5.4 FP32 TFLOPS, while its base clock performance is about 4.1 FP32 TFLOPS; in addition, it employs 8 GB of GDDR5 memory.

The point is that this architecture is known to be efficient: the GTX 1080 reference card works very close to its base clock under heavy loads, while the GTX 1060 works between its base and turbo clocks under heavy load. This is very different behavior from that of Polaris or Fiji.

Not to mention that the power consumption difference is huge (75W against 150W or 175W), and therefore the power efficiency is considerably higher. I am quite sure that the contained TDP of the P4 will not trigger any relevant throttling, because combining a high core count with very low clock speeds is the most efficient way to heavily reduce power consumption (but it is obviously more expensive than a smaller GPU with higher clocks).
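To put a rough number on the efficiency gap, the sketch below divides the peak FP32 figures quoted in this section by the rated TDPs. It assumes that the MI8 shares the 8.2 TFLOPS theoretical peak of the R9 NANO, and it uses peak values, so any throttling would lower the real-world results.

```python
# (peak FP32 TFLOPS, TDP in watts) as quoted in this section.
# Assumption: the MI8 is given the R9 NANO's theoretical 8.2 TFLOPS.
cards = {
    "MI6 (Polaris 10)": (5.7, 150),
    "MI8 (Fiji)": (8.2, 175),
    "Tesla P4 (Pascal)": (5.4, 75),
}


def gflops_per_watt(tflops: float, tdp_watts: float) -> float:
    """Peak FP32 GFLOPS delivered per watt of rated TDP."""
    return tflops * 1000 / tdp_watts


efficiency = {name: gflops_per_watt(t, w) for name, (t, w) in cards.items()}

for name, value in efficiency.items():
    print(f"{name:18s} {value:5.1f} GFLOPS/W")
```

On these paper figures the P4 delivers about 72 GFLOPS per watt, against roughly 38 for the MI6 and 47 for the MI8.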

The Nvidia Tesla P40 is instead the professional version, on steroids, of the new TITAN X. It employs the full GP102 GPU with 3840 CUDA cores and has a maximum FP32 capacity of 11.8 TFLOPS, while the base clock FP32 capacity is about 10.0 TFLOPS, and it is equipped with 24 GB of GDDR5: it is clearly a product of another class.

But the additional issue with the MI6 and MI8 products is that Polaris 10 and Fiji do not support INT8 and FP16 operations. FP16 is essential to neural networks, deep learning and related applications, and INT8 calculations are becoming competitive too, in certain cases (deep learning) too fast to ignore. INT8 provides double the speed of FP16 and lower memory consumption at the cost of lower precision, but a lot of inference and neural network applications do not require higher precision.

Nvidia products, by contrast, are able to execute FP16 operations at 2X the speed of FP32 operations, while INT8 operations are executed at 4X the speed of FP32 operations. This is where AMD really starts at a great disadvantage, and it needs to be extremely aggressive on price (very cheap) in order to gain some market share with these 2 products. Unfortunately, the result will be reduced profits.
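As a quick illustration of what those multipliers imply, the sketch below applies the 2X (FP16) and 4X (INT8) ratios stated above to the peak FP32 figures quoted for the P4 and P40. The ratios are taken from this article as given; actual reduced-precision rates differ from chip to chip, so treat the output as a paper exercise.

```python
# Speed-up relative to FP32, as stated in the article (not measured values).
SPEEDUP_VS_FP32 = {"FP32": 1, "FP16": 2, "INT8": 4}


def peak_throughput(fp32_tflops: float) -> dict:
    """Scale a peak FP32 figure to each precision (tera-operations per second)."""
    return {precision: fp32_tflops * factor
            for precision, factor in SPEEDUP_VS_FP32.items()}


# Peak FP32 TFLOPS quoted in the article.
for card, fp32 in [("Tesla P4", 5.4), ("Tesla P40", 11.8)]:
    rates = peak_throughput(fp32)
    summary = ", ".join(f"{p}: {v:.1f}" for p, v in rates.items())
    print(f"{card:10s} {summary}")
```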

MI25 VEGA

The MI25, which is powered by a VEGA 10 GPU, is a different story. This card is rated for a maximum of 12.5 FP32 TFLOPS with 4096 cores. It will therefore have a turbo clock of around 1,525 MHz, and it will be equipped with HBM 2.0 modules for an expected bandwidth of 512 GB/s.
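The clock estimate comes from inverting the peak-throughput formula. Here is a minimal sketch, again assuming 2 floating-point operations (one fused multiply-add) per core per cycle; the function name is illustrative.

```python
def implied_clock_mhz(tflops: float, cores: int, flops_per_cycle: int = 2) -> float:
    """Clock speed (MHz) needed to reach a given peak FP32 throughput."""
    return tflops * 1e12 / (cores * flops_per_cycle) / 1e6


# MI25 rating quoted in the article: 12.5 FP32 TFLOPS with 4096 cores.
mi25_clock = implied_clock_mhz(12.5, 4096)
print(f"Implied MI25 turbo clock: {mi25_clock:.0f} MHz")
```

The result is just under 1,526 MHz, in line with the figure of around 1,525 MHz.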

Personally, I think that the bandwidth will be beyond 512 GB/s, but it is not a given that the MI25 will hit 1 TB/s of bandwidth. It is true that the GCN architecture is memory-bandwidth hungry, but 1 TB/s would imply a +100% boost in bandwidth for a "mere" +50% of computational power compared with Fiji (remember that the use of FP16 operations essentially reduces the memory size and bandwidth requirements of each operation; therefore, in this case there is no need for additional bandwidth).
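The bandwidth argument can be checked with a few lines of arithmetic. This sketch assumes Fiji's 512 GB/s of HBM bandwidth (implied by the +100% comparison) and its theoretical 8.2 FP32 TFLOPS, and treats 1 TB/s as 1000 GB/s.

```python
FIJI_BANDWIDTH_GBS = 512  # assumption: Fiji's HBM bandwidth
FIJI_TFLOPS = 8.2         # theoretical peak quoted for the R9 NANO
VEGA_TFLOPS = 12.5        # MI25 rating quoted in the article


def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new / old - 1) * 100


def bytes_per_flop(bandwidth_gbs: float, tflops: float) -> float:
    """Memory bandwidth available per FP32 operation at peak throughput."""
    return bandwidth_gbs * 1e9 / (tflops * 1e12)


print(f"Compute gain over Fiji: {pct_change(VEGA_TFLOPS, FIJI_TFLOPS):+.0f}%")
print(f"Fiji: {bytes_per_flop(FIJI_BANDWIDTH_GBS, FIJI_TFLOPS):.3f} bytes/FLOP")
for vega_bw in (512, 1000):
    gain = pct_change(vega_bw, FIJI_BANDWIDTH_GBS)
    ratio = bytes_per_flop(vega_bw, VEGA_TFLOPS)
    print(f"VEGA at {vega_bw} GB/s: {gain:+.0f}% bandwidth, {ratio:.3f} bytes/FLOP")
```

At 512 GB/s VEGA would have less bandwidth per operation than Fiji, while at 1 TB/s it would have noticeably more, which is why a value somewhere in between looks plausible.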

This card will partially battle against the Tesla P40 but also the P100, a solution that implements HBM modules with up to 720 GB/s of bandwidth and offers a maximum computational power of 10.6 FP32 TFLOPS, 21.2 FP16 TFLOPS or 5.3 FP64 TFLOPS.

There are some considerations to be made about this comparison:

- Nvidia Pascal is less memory-hungry than the GCN architecture; therefore, even 720 GB/s of bandwidth against a hypothetical 1 TB/s would not be problematic.

- VEGA is a new architecture, but it still shares various characteristics with the Polaris architecture, and it is designed to reach good frequencies with realistic TDPs (<=300W); therefore, its inherent efficiency must have improved at least a little. In fact, considering that Polaris 10 has a TDP of 150W while VEGA employs +20% turbo frequency and +78% GCN cores, the architecture has obviously undergone a revision in order to keep its power draw below 300W.

However, the energy per GCN core ratio probably implies relevant throttling in order to stay within that TDP under sustained loads, especially since the heat dissipation is passive. I am quite confident that this card will consistently throttle under heavy load, because this architecture, even if it has been redesigned and tweaked, fundamentally remains a GCN architecture on steroids with refinements, built on the same lithography process.

Obviously, this is my personal opinion, but the GCN architecture has always behaved like this, iteration after iteration, and the first data that AMD showed during the presentation lead me to expect VEGA to throttle heavily under heavy loads.

- Nvidia has released the Quadro P6000: while it is designed for a different professional use, this video card has a TDP of only 250W and a base FP32 throughput of 10.9 TFLOPS, meaning that its maximum FP32 capacity will be over 12 TFLOPS.
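To put rough numbers on the VEGA TDP point in the list above, the sketch below scales the 150W Polaris 10 TDP by the quoted +78% cores and +20% turbo frequency. It assumes that power grows linearly with both core count and clock, which is a deliberately crude model: raising clocks usually also requires more voltage, so the real figure would tend to be higher.

```python
POLARIS10_TDP_W = 150  # quoted in the article
CORE_GAIN = 0.78       # +78% GCN cores
CLOCK_GAIN = 0.20      # +20% turbo frequency
TDP_CAP_W = 300        # "realistic" TDP ceiling mentioned in the article


def naive_scaled_power(base_watts: float, core_gain: float, clock_gain: float) -> float:
    """Crude estimate: power scales linearly with core count and clock speed."""
    return base_watts * (1 + core_gain) * (1 + clock_gain)


projected = naive_scaled_power(POLARIS10_TDP_W, CORE_GAIN, CLOCK_GAIN)
required_saving = 1 - TDP_CAP_W / projected

print(f"Naively scaled Polaris power: {projected:.0f} W")
print(f"Efficiency gain needed to fit {TDP_CAP_W} W: {required_saving:.1%}")
```

Even this optimistic model lands at about 320W, so some efficiency improvement (or throttling) is needed to stay under 300W.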

Anyway, the MI25 is obviously competitive and it is FP16 enabled, which is fundamental for the deep learning and neural network markets. However, there is no mention of INT8, and there is a good chance that INT8 calculations will become more and more common in inference applications. That may be a relevant disadvantage, since Nvidia would instantly double its calculation firepower, but at the moment INT8 adoption still has to spread.

Gaming preview

Vega has also been shown running some games, equipped with 8GB of HBM and 4096 GCN cores. While we do not know the effective clock speed, we have been able to see that this GPU can run DOOM at 4K with Vulkan and ultra settings at up to 68 fps. For comparison, even if the GTX 1080 could not reach such targets in its first months, it is possible to find YouTube videos of stock reference GTX 1080 cards reaching similar fps with the same settings, probably thanks to recent driver developments.

There is also confirmation from videocardz.com that custom GTX 1080 cards perform better: custom GTX 1080s run at an average of 68 fps, which means that their peak fps is higher. The result is that the GTX 1080 is comparable or faster at the moment.

We also have to consider that DOOM with Vulkan is a benchmark that favors AMD. Therefore, performance similar to a GTX 1080 makes it quite clear that AMD will have to design a dual GPU video card in order to battle the GTX 1080 Ti and the Titan X.

There is another comparison available online: videocardz.com has made a small comparison, based on the game AOTS, between an editor's system (i7 6800K and GTX 1080) and the VEGA GPU with 8GB of HBM 2.0 and 512 GB/s of bandwidth. It turns out, at least from these tests, that the GTX 1080 is faster by 6% to 70%, depending on the specific VEGA test result.

It must be remembered that the various VEGA configurations may correspond to underclocked/standard/overclocked configurations and that this GPU is still not refined; therefore, it is difficult to define a precise scenario. However, even the highest VEGA benchmark among these results is slightly lower than the GTX 1080 configuration, which probably means that VEGA will be generally comparable to a GTX 1080.

This scenario also shows that VEGA, while more efficient and faster than Polaris, does not provide the same efficiency jump that we saw from the 28nm GPUs to the current 14nm GPUs. It is true that VEGA employs a fully revised V9 architecture, but the result more closely resembles a tweaked and optimized Polaris architecture, maybe exploiting some lithography refinement, like the Pascal refresh will do.

However, AMD still has some interesting tweaks in its latest driver release. In particular, the Chill option is quite interesting and useful, and it may help AMD reduce power consumption by cutting excess performance without lowering the minimum framerates. However, this will be the topic of my next article.

Nvidia Pascal refresh and Volta

But here comes the real issue: AMD is going to release VEGA 10 nearly 8-10 months after Nvidia's Pascal GTX 1070 and GTX 1080, and it is very likely to achieve performance similar to the GTX 1080. Obviously, AMD will likely set its price tag to be competitive, but this video card still comes many months after its competitor, which means that profits will be partially reduced. It also means that a significant part of the potential customers have already bought a GTX 1080 or GTX 1070, and it is hard to think that they will spend additional money for a marginal upgrade or similar performance.

To be frank, it remains to be seen whether the upcoming VEGA solutions will be more price competitive, since they will be equipped with HBM 2.0 memory modules, which are clearly more expensive than GDDR5/X modules.

In addition, Nvidia is working on the Pascal refresh architecture, which is expected to provide an average of +20% performance compared with the first Pascal generation and an average 15% price reduction: expect similar TDPs but higher base and boost clock frequencies, enabled by some architecture refinement and a very likely lithography node refinement.

I personally forecast that Pascal refresh and Vega will provide similar efficiency improvements compared to Polaris and Pascal: While Pascal will simply improve its already good efficiency balance, VEGA will be able to match the GP102/104 performance (on paper) with a higher but decent power consumption; a target that was not possible with Polaris without a crushing clock reduction.

There is also the chance that the Pascal refresh will introduce a situation where the GTX 1060 would be rebranded into the GTX 2050 (1280 CUDA cores), the GTX 1070 would be rebranded into the GTX 2060 (1920/2048 CUDA cores), the GTX 1080 would be rebranded into the GTX 2070 with 2560 CUDA cores, the GTX 1080 Ti would be rebranded into the GTX 2080 (with 3328 CUDA cores), while the GTX 2080 Ti and Titan X would employ the full 3840 CUDA core configuration.

In this way, Nvidia would probably retain a safe lead against the AMD solutions, but this kind of solution must meet architecture improvements and lithography refinements (or a node change from 16nm to 14nm for example).

But the scenario becomes more problematic for AMD since Nvidia will release Volta in the following months, and this architecture is expected to take the game to a completely different level. The first relevant news is about the Summit supercomputer: after some analysis, it turns out that each GV100 HPC card will provide an astonishing 9.5 FP64 TFLOPS (double precision). This means that the Volta architecture will be able to deliver 19 FP32 TFLOPS or 38 FP16 TFLOPS, which roughly doubles what Pascal is able to do right now.
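The step from 9.5 FP64 TFLOPS to the FP32 and FP16 figures is a simple doubling at each precision level. The sketch below assumes that GV100 keeps the 1:2:4 (FP64:FP32:FP16) ratio seen in the P100 figures quoted earlier (5.3 / 10.6 / 21.2 TFLOPS); that ratio is an assumption about an unreleased part.

```python
def throughput_from_fp64(fp64_tflops: float) -> dict:
    """Derive FP32 and FP16 peaks from FP64, assuming a 1:2:4 ratio."""
    return {
        "FP64": fp64_tflops,
        "FP32": fp64_tflops * 2,
        "FP16": fp64_tflops * 4,
    }


pascal_p100 = throughput_from_fp64(5.3)  # P100 FP64 figure quoted in the article
volta_gv100 = throughput_from_fp64(9.5)  # GV100 estimate derived from Summit

for precision in ("FP64", "FP32", "FP16"):
    ratio = volta_gv100[precision] / pascal_p100[precision]
    print(f"{precision}: P100 {pascal_p100[precision]:.1f} -> "
          f"GV100 {volta_gv100[precision]:.1f} TFLOPS ({ratio:.2f}x)")
```

Under this assumption the generational gain is about 1.8x at every precision.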

This seems consistent with the anticipated improvements related to the Xavier SoC (powered by Volta), and it will be interesting to see the future implementation of the CVA (Computer Vision Accelerator), which may be partially responsible for its great INT8 capacity.

The only "luck" for AMD is that Nvidia will probably release Volta in 2017 only for the professional and enterprise market, while it will hit the consumer market at the end of 2017 or in the first months of 2018. This means that for the coming months there will be a more balanced clash between Vega and the Pascal refresh in the consumer market.

What to expect

As I have repeated a couple of times, VEGA looks to be a valid architecture that carries three fundamental issues:

- VEGA 10 will be released (or even simply paper launched) 9-10 months after the high-end Nvidia Pascal launch. This means that even some of the loyal AMD customers have already replaced their video card with a Pascal one, not to mention the rest of the market.

- Given its projected power consumption and the usual throttling that I expect to see (maybe a little lower than with Fiji and Polaris), AMD will probably need to implement a dual GPU video card, or a power-hungry beast, in order to battle the Titan X and GTX 1080 Ti.

- VEGA will likely still be an energy-hungry architecture compared with Nvidia Pascal.

Let's also take a look at the Steam Hardware Survey, which shows video card shares month by month. The overall picture is the following:

GTX 1050 - <0.16% actual share

GTX 1050 Ti - <0.16% actual share

GTX 1060 - +0.50% in November - 1.53% actual share

GTX 1070 - +0.29% in November - 1.59% actual share

GTX 1080 - +0.11% in November - 0.86% actual share

RX 480 - +0.09% in November - 0.37% actual share

RX 470 - <0.16% actual share

RX 460 - <0.16% actual share

We see that the GTX 1080, which is far more expensive than an RX 480, is still growing its share faster than the RX 480, not to mention the GTX 1070 and GTX 1060. Even if Steam is not a sales index but an index of the video cards gamers actually use, it is evident that Polaris 10 is still struggling, while Pascal is simply growing month by month (the GTX 1060 is growing about 5.5 times as fast as the RX 480).
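The relative growth rates can be read directly from the November gains listed above. Here is a minimal sketch using only the survey figures quoted in this article (in percentage points of share):

```python
# November share gains (percentage points) from the survey figures above.
november_gain = {
    "GTX 1060": 0.50,
    "GTX 1070": 0.29,
    "GTX 1080": 0.11,
    "RX 480": 0.09,
}

# How many times faster each card gained share than the RX 480.
ratios = {card: gain / november_gain["RX 480"]
          for card, gain in november_gain.items()}

for card, ratio in ratios.items():
    print(f"{card}: {ratio:.2f}x the RX 480's monthly gain")
```

On these figures the GTX 1060 gained share roughly 5.6 times as fast as the RX 480, the GTX 1070 about 3.2 times, and the GTX 1080 about 1.2 times.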

In particular, we see that the GTX 1070 is still being adopted massively. Such broad adoption partially cuts down the potential of VEGA 10 in the consumer market, because only a few enthusiasts or high-end customers would switch and spend additional money for +10-25% of performance and probably lower overclocking headroom (due to the higher power consumption).

AMD will find better targets in the enterprise and professional markets, at least with VEGA 10 given its massive TFLOPS firepower, even if VEGA 10 will probably suffer some throttling there too in order to meet the TDP requirements (though less throttling than in the desktop environment, thanks to its higher TDP cap). However, the lack of INT8 support may be detrimental if INT8 adoption continues to grow in the inference market.

The AMD MI6 and MI8 solutions, instead, already look quite far behind due to their lack of FP16 support in addition to the absence of INT8. FP16 is fundamental for the deep learning market; therefore, AMD will have to push hard on price in order to counterbalance the lower performance and the missing operation support, in addition to the considerably higher power consumption. But at the same time, a heavily reduced price tag will obviously cut heavily into the potential profits of this market segment.

Frankly, I do not see much room for relevant improvement for AMD in the discrete GPU field, apart from the loyal AMD customers and those market segments tied to programs that are more optimized for the GCN architecture. AMD will surely gain some market share, but probably only for a short period of time.

Where AMD is more likely to find a real new source of revenues and profits is the CPU and APU markets with ZEN/RYZEN. The first preview looks very good compared with the FX 8xxx and 9xxx series, where the FX 8370 simply collapsed in certain scenarios. But for a deeper view of this topic, wait for my next article.

Lastly, the market view: AMD has risen a lot this year, reaching a peak of $12.42 per share. The growth has been astonishing, and obviously, the market is expecting great results from AMD, in particular from RYZEN.

AMD is building a good structure spanning GPUs (professional lineup, Polaris and VEGA, consumer lineup) and CPUs (RYZEN and the related future APUs with integrated HBM modules), and it is certainly better positioned than in previous years. The various joint ventures have also helped AMD get the cash needed for these projects.

However, from the stock price point of view, my fear is that too much expectation is already included in the current stock price: the risk of seeing the classic "buy on the rumor, sell on the news" is more than a personal opinion. If the stock price were under $10 per share, I would be very positive on a buy position, but with a price well beyond $10 per share, I consider a retracement more than a simple possibility: there have been a lot of buyers at $3.00-$5.00-$7.00-$9.00 per share, and the current price is already alluring enough to prompt some additional sales.

It does not mean that AMD cannot rise beyond such a price:

AMD will gain back the attention of its loyal consumer base with VEGA 10, which provides higher margins than Polaris.

Xbox Scorpio and PlayStation 4 Pro are sources of low profits but solid revenues, at least for a couple of years.

MI25 is a valid product that could battle Nvidia, at least for some months until the arrival of Volta.

Ryzen, at least judging by the first third-party benchmarks of a series of engineering samples with slightly lower clocks than the final product is supposed to reach, finally looks competitive: in multi-thread benchmarks, the 8-core version seems to place its performance between the i7 6800K and the i7 6900K. Considering that the i7 6900K is priced at up to $1100, the expected price of Ryzen's top offer - $500-$600 - could be really competitive and very difficult for Intel (NASDAQ:INTC) to counter in the short term. By contrast, the gaming benchmarks tell a different story, where Ryzen is generally slower than Intel but improves a lot on older AMD CPUs. However, the gaming benchmarks do not show the real single-thread difference that single-thread benchmarks may reveal. I personally expect to see a wider disadvantage against Intel in single-thread programs, which makes Ryzen a less suitable CPU for the consumer market, but very effective for the professional market.

Ryzen's top offer is expected to hit the market at $500-$600, a price which would really give Intel some competition. AMD may be able to regain some market share by exploiting the competitive price and the considerably improved performance. In addition, some AMD fans switched to Intel unwillingly, and these customers will be glad to go back to AMD. This is not something to be underestimated.

It is obvious that AMD will be able to improve its revenues and profits during the next months, which would be a very welcome breath of fresh air. But AMD must exploit this situation to the fullest, because Intel has several cartridges left to fire:

10nm architecture at the end of 2017, while AMD will still be releasing ZEN+ on 14nm lithography. Intel will be able to add more cores without increasing the die area too much and without raising TDPs.

FPGA implementation in conjunction with 3D Xpoint technology.

3D Xpoint SSDs.

Intel is reportedly working on a completely new architecture to replace the Core architecture. It is rumored that this architecture will drop legacy compatibility instruction sets and SIMD in order to boost single-thread and multi-thread capabilities. It will enable far more efficient and smaller cores. This is expected in 2019/2020.

To sum up, my expectation is that if AMD delivers on the more optimistic performance forecasts in conjunction with a competitive price tag, the share price will spike. However, I personally expect ZEN to perform well in multi-core applications but to show some disadvantage in single-core applications and in the consumer market, where its growth potential may not be so high. Therefore, my fear of a "sell on the news" retracement remains strong.

If you are a conservative player, simply wait for the first official benchmarks and buy on the eventual spike related to fantastic results: you will not exploit the surprise factor, but the strategy carries nearly no risk.

By contrast, if the results are as expected, look out for a possible retracement: if that happens, the buy opportunity below $8-$9 per share may be really tempting.

On the other hand, if you are an aggressive player, give it a shot even at these prices, maybe investing a small part of your portfolio (5%): you may be able to exploit the surprise factor, and if the results pose a serious threat to Intel in the short term, you can expect a very high share price increase towards $20 per share.

