In my last article, I mentioned AMD's (NYSE:AMD) upcoming enterprise solution, code-named "Naples." This CPU is expected to offer up to 32 cores and 64 threads, in order to be competitive in the enterprise computing and server sectors.
I am quite convinced that this CPU will be a substantial improvement over AMD's current solutions, but if we look at the recent benchmarks, something seems off compared with Intel's (NASDAQ:INTC) current offering.
Since Naples uses the ZEN architecture, it shares the same architectural improvements: SMT (Simultaneous Multi-Threading), improved branch prediction (each BTB, or Branch Target Buffer, entry can track two branches), larger caches, higher cache bandwidth, larger schedulers, a quad-issue FPU, FPU load latency reduced from 9 to 7 cycles, a stack engine, move elimination and other enhancements.
But I want to focus on the L3 cache system: Naples is made of up to 8 CPU complexes (CCXs), and each CCX contains 4 cores with their own L0, L1 and L2 caches. Each CCX also has 8MB of 16-way set-associative L3 cache, divided into one 2MB slice per core, and each 2MB slice is split into two 1MB sub-areas.
It must be underlined that the L3 cache is fully inclusive and fully shared.
Over the last month, several Geekbench results for Naples have leaked, and they paint a peculiar picture of its performance.
The first benchmark, run with Geekbench 4, showed a score of 1141 in the single-thread test and 15620 in the multi-thread test. Notably, it also showed a read error for the L3 cache, which was reported as 0 KB.
The second series of benchmarks, run with Geekbench 3 instead, showed a single-thread score of 984 and multi-thread scores between 15041 and 16957. WCCFTECH wrote about a gain over the first benchmark, but this is not correct: Geekbench 3 generally scores 10-15% higher than Geekbench 4, so talking about a gain makes no sense. These benchmarks are different and hardly comparable.
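To see why the two leaks cannot simply be compared, here is a small Python sketch that converts the first leak's Geekbench 4 scores into a Geekbench 3-equivalent range. It assumes, purely for illustration, that the 10-15% offset applies uniformly to both single- and multi-thread scores; `gb4_to_gb3_range` is a hypothetical helper, not part of any Geekbench tool.

```python
def gb4_to_gb3_range(gb4_score: float, low: float = 0.10, high: float = 0.15) -> tuple:
    """Map a Geekbench 4 score to a Geekbench 3-equivalent (min, max) range,
    assuming Geekbench 3 scores run 10-15% higher across the board."""
    return gb4_score * (1 + low), gb4_score * (1 + high)

# (first leak, Geekbench 4 score) vs (second leak, best Geekbench 3 score)
leaks = {
    "single-thread": (1141, 984),
    "multi-thread": (15620, 16957),
}

for test, (gb4, gb3) in leaks.items():
    lo, hi = gb4_to_gb3_range(gb4)
    print(f"{test}: first leak ~{lo:.0f}-{hi:.0f} (GB3-equivalent), second leak {gb3}")
```

Under this assumption, the first leak's multi-thread score corresponds to roughly 17182-17963 Geekbench 3 points, already above the best second-leak result of 16957, so the apparent gain vanishes.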
Against Xeon E5-2699 V4
Since the 32-core version is projected to be AMD's top-performing CPU, I wanted to compare it with the most similar configuration currently available from Intel: a dual-socket Xeon E5-2699 V4 system with 22 cores per CPU. This means comparing 44 cores and 88 threads from Intel against 64 cores and 128 threads from AMD.
Even the slowest Geekbench 3 result available for this Intel system under Linux shows it scoring 2507 in the single-thread test (+155% over Naples' 984) and 81629 in the multi-thread test (+381% over Naples' best result of 16957).
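The percentages follow directly from the raw scores. A minimal Python check, assuming the Naples reference points are the 984 single-thread score and the best multi-thread score of 16957; the hypothetical helper `lead_percent` simply computes Intel/AMD - 1.

```python
def lead_percent(intel_score: float, amd_score: float) -> float:
    """Percentage by which the Intel score exceeds the AMD score."""
    return (intel_score / amd_score - 1) * 100

# Dual Xeon E5-2699 V4 (slowest Linux GB3 run) vs Naples GB3 results
single_lead = lead_percent(2507, 984)     # Naples single-thread: 984
multi_lead = lead_percent(81629, 16957)   # Naples best multi-thread: 16957

print(f"single-thread: +{single_lead:.0f}%, multi-thread: +{multi_lead:.0f}%")
```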
This difference is simply enormous, but it may have several causes:
- Naples may still be behind on its development roadmap, but that would be quite strange, since Zen is just around the corner and Naples uses the same architecture,
- Naples may have run only at its base clock frequency, but this would not account for such a huge performance gap,
- Geekbench (or the operating system) may be misreading the amount of L3 cache; if the cache is not working properly, performance would suffer (though not to this extent),
- Since this is a dual-processor configuration, one of the two CPUs may not be working yet, although Geekbench does not scale particularly well with core count anyway. Moreover, Naples currently reaches a x17 scaling ratio (multi-thread score divided by single-thread score), while Intel reaches a x23 ratio with 2 CPUs and a x15 ratio with a single CPU. In any case, the performance gap would still be huge.
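The scaling ratio is simply the multi-thread score divided by the single-thread score. A quick sketch using only the Naples Geekbench 3 figures quoted earlier (`scaling_ratio` is a hypothetical helper); for context, it also compares the result with ideal linear scaling across the 64 physical cores of a dual 32-core setup, ignoring any SMT benefit.

```python
def scaling_ratio(multi_thread: float, single_thread: float) -> float:
    """Multi-thread score divided by single-thread score."""
    return multi_thread / single_thread

# Naples, Geekbench 3: 984 single-thread, best multi-thread result 16957
naples_ratio = scaling_ratio(16957, 984)

# Dual-socket Naples: 2 CPUs x 32 cores; ideal linear scaling (ignoring SMT) is x64
physical_cores = 2 * 32
efficiency = naples_ratio / physical_cores

print(f"Naples scaling: x{naples_ratio:.1f} ({efficiency:.0%} of ideal x{physical_cores})")
```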
Obviously, Naples has been designed to power servers running many virtualized workloads and independent threads, thanks to its 32 cores/64 threads per CPU, but this is an advantage only if its performance is at least comparable to its competitors'. In addition, Intel is going to release its new Skylake server CPUs and Xeon Phi solutions in multiple form factors in the near future.
Xeon Skylake and Xeon Phi
Intel is introducing a new socket for the server and deep learning sectors, and its size is simply astonishing: it is nearly twice as large as LGA 2011-3.
LGA 3647 is designed to support the new Xeon Phi CPUs, code-named Knights Landing. Knights Landing is equipped with 16 GB of 3D MCDRAM (a proprietary stacked memory similar to HBM) and 72 enhanced Silvermont cores (binary-compatible with Haswell), each with two 512-bit AVX-512 VPUs. This CPU is simply enormous, and it also provides six DDR channels.
Being binary-compatible with Haswell, this CPU can run autonomously either as a socketed processor or as a coprocessor, greatly expanding the computing capabilities of Intel's solutions. It also uses 4-way SMT, running 4 threads on each core. On the coprocessor side, it faces competition from Nvidia (NASDAQ:NVDA), while AMD is going to release its HBM-equipped enterprise APUs, combining the ZEN and GCN architectures, in mid-2017. However, given the projected core counts and performance gap, it is hard to imagine a fight with similar firepower: a Naples CPU with an integrated GCN GPU and HBM modules cannot deliver much additional performance because of thermal limits, since the Naples CPU alone is already projected at around 150W TDP. In fact, rumors point to a 16-core version with a Polaris 10-class GPU on the die, which would deliver about 4 TFLOPS in FP32: too low to compete with Knights Landing, which delivers more than 6 TFLOPS in FP32 and has the additional advantage of being fully autonomous. Not to mention that we are comparing 32 threads and four DDR channels (AMD) against 288 threads and six DDR channels (Intel). The targets are essentially different, and the AMD APU cannot use its integrated GPU to run general-purpose threads, since the GPU must be driven by the CPU part.
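Peak FP32 throughput is just cores × FLOPs per core per cycle × clock. The sketch below applies that to Knights Landing, assuming 2 AVX-512 VPUs per core, 16 FP32 lanes per 512-bit vector, a fused multiply-add counted as 2 FLOPs, and an illustrative 1.5 GHz clock (real clocks vary by SKU); `peak_fp32_tflops` is a hypothetical helper. It also reproduces the 288 vs 32 thread counts compared above.

```python
def peak_fp32_tflops(cores: int, flops_per_core_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS (GFLOPS / 1000)."""
    return cores * flops_per_core_per_cycle * clock_ghz / 1000

# Knights Landing core: 2 AVX-512 VPUs x 16 FP32 lanes x 2 FLOPs per FMA
KNL_FLOPS_PER_CYCLE = 2 * 16 * 2
knl_tflops = peak_fp32_tflops(72, KNL_FLOPS_PER_CYCLE, 1.5)  # 1.5 GHz: illustrative assumption

knl_threads = 72 * 4        # 4-way SMT per core
snowy_owl_threads = 16 * 2  # rumored 16 ZEN cores with 2-way SMT

print(f"Knights Landing peak: {knl_tflops:.2f} TFLOPS FP32")
print(f"hardware threads: {knl_threads} (Intel) vs {snowy_owl_threads} (AMD APU)")
```

This is a theoretical ceiling, not a measured figure, but it shows how the "more than 6 TFLOPS" number arises from the core count and vector width.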
We also have to consider that Intel is going to release its 10 nm CPUs between the end of 2017 and 2018, while Knights Hill and Knights Mill will arrive in 2018 on the 10 nm process, bringing a substantial performance upgrade. AMD, instead, is going to release its 12 nm and 7 nm products only by the end of 2018 and in 2019.
In addition, Intel is going to use the LGA 3647 socket for Skylake processors too, opening the door to CPUs bigger than the rumored 26-28-core versions, which means Naples' relative core-count advantage may essentially disappear. Skylake (Purley) will also integrate the 100G OmniPath interconnect, AVX-512 instructions, Cannonlake graphics and FPGA integration, the first tangible result of the Altera acquisition: a great package that will surely entice many customers.
Naples L3 Cache issue
Another point that must be underlined is the L3 cache issue: the most recent benchmarks show Naples with 64MB of L3 cache per CCX, which would add up to a monstrous 512MB of L3 cache per CPU, and various websites are excited about these tremendous numbers. The point is that this figure is probably completely wrong.
This is likely a reading error in Geekbench, which is easily explained by the fact that the cache is fully shared and inclusive. Naples is simply based on the ZEN architecture, which provides 2MB of L3 cache per core, with the four cores of each CCX sharing their L3 cache. With 8 CCXs, the total comes to 64MB of L3 cache. Since the architecture is not yet official, a software error is easy to encounter.
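The expected total is easy to reproduce from the per-core and per-CCX figures above. The sketch also shows that the leaked per-CCX value (64MB) equals the expected total for a whole CPU; a tool reporting the CPU-wide total as if it were per CCX would then arrive at 512MB. That reading of the leak is an assumption, not something the benchmark confirms.

```python
L3_PER_CORE_MB = 2   # ZEN: 2MB L3 slice per core
CORES_PER_CCX = 4
CCX_PER_CPU = 8      # 32-core Naples

l3_per_ccx_mb = L3_PER_CORE_MB * CORES_PER_CCX   # expected: 8MB per CCX
l3_per_cpu_mb = l3_per_ccx_mb * CCX_PER_CPU      # expected: 64MB per CPU

# Leaked reading: 64MB reported per CCX, multiplied across 8 CCXs
leaked_per_ccx_mb = 64
leaked_per_cpu_mb = leaked_per_ccx_mb * CCX_PER_CPU

print(f"expected: {l3_per_ccx_mb}MB per CCX, {l3_per_cpu_mb}MB per CPU")
print(f"leaked:   {leaked_per_ccx_mb}MB per CCX, {leaked_per_cpu_mb}MB per CPU")
print(f"leaked per-CCX value equals expected per-CPU total: {leaked_per_ccx_mb == l3_per_cpu_mb}")
```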
In addition, if we were to build 512MB of L3 cache using Intel's denser 14nm process, the L3 cache alone would take nearly 1000 square mm of die area: more than one and a half times the roughly 610 square mm die of Nvidia's GP100, and we are still not accounting for the other components such as the 32 CPU cores, the instruction caches, multimedia codecs and encoders, 4 memory controllers, other controllers and so on. It would be essentially impracticable and extremely expensive, and the usefulness of such a monstrous cache would be highly questionable.
AMD Naples and Snowy Owl (the APU version of Naples) are still far from their release dates; however, judging by the first benchmarks, the comparison against Intel's upcoming solutions does not look good for AMD, at least for now.
It is true that these are only preliminary benchmarks, but Naples uses the ZEN architecture, which is going to hit the market within three to four months, so Naples cannot be that far behind on its development roadmap.
As for Snowy Owl (the enterprise APU), although no benchmark is available yet, it does not look like it will provide enough raw power: it will have 16 cores, HBM modules and a GPU equivalent to the current Polaris 10, delivering 4+ TFLOPS in FP32. This is too low to counter traditional Xeon CPUs or Knights Landing, with its 3D MCDRAM, 6 DDR channels, 6+ TFLOPS in FP32, 72 cores/288 threads and its ability to exploit every optimization for Haswell binaries. Not to mention the Xeon Skylake with 6 DDR channels, FPGA integration and Intel OmniPath, or the Knights Hill/Knights Mill combination set for 2018.
Where AMD may really be able to challenge Intel is the consumer APU sector, for those who want to play low-to-mid-range games and get decent graphics performance from integrated graphics. AMD may release its Raven Ridge solution with a single integrated HBM module, providing 128 GB/s of video memory bandwidth, and 768 GPU cores. The point is that the projected APU will have to stay under a 45W TDP while integrating 4 CPU cores, and even the HBM module will add some heat and power consumption. This means that the GPU will have a maximum of 20-25W of TDP for graphics. Therefore, the integrated GPU will probably deliver 1.5+ TFLOPS in FP32 at a frequency of around 1,000+ MHz (768 cores × 2 FLOPs per clock × ~1 GHz).
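The 1.5+ TFLOPS and 128 GB/s figures can be sanity-checked with standard peak formulas. This Python sketch assumes each GPU core performs one fused multiply-add (2 FLOPs) per clock, as in GCN, and models the HBM module as a single first-generation stack with a 1024-bit interface at 1 Gb/s per pin; both helper names are hypothetical.

```python
def gpu_peak_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming 1 FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz / 1e6

def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

igpu_tflops = gpu_peak_tflops(768, 1000)      # 768 GPU cores at ~1,000 MHz
hbm_gbs = memory_bandwidth_gbs(1024, 1.0)     # one first-generation HBM stack (assumed)

print(f"integrated GPU: {igpu_tflops:.2f} TFLOPS FP32, HBM: {hbm_gbs:.0f} GB/s")
```

Raising the clock above 1,000 MHz scales the TFLOPS figure linearly, which is why the 20-25W GPU budget effectively caps the result.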
Take, for example, the M385X in Rise of the Tomb Raider: it scores nearly 20 fps in FHD on Notebookcheck, and the future integrated graphics will probably deliver around 0.85x the M385X's performance (accounting for architectural improvements), or something around 17 fps. This is not so far from the Intel HD Graphics, which scores 12.2 fps under the same conditions. Considering also that Kaby Lake graphics provide a boost of around +20/+40%, I do not think we will see big differences between these two kinds of top-graphics APUs/SoCs. Anyway, this is the sector where AMD may be able to battle Intel, and where AMD must focus its efforts, since the laptop market may deliver some positive results.
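The frame-rate estimate is a simple chain of multiplications. A sketch using the figures above, where the 0.85 factor is the article's own rough estimate and the +20/+40% Kaby Lake uplift is applied to the 12.2 fps Intel HD Graphics result:

```python
m385x_fps = 20.0        # M385X, Rise of the Tomb Raider, FHD (Notebookcheck)
relative_perf = 0.85    # estimated AMD iGPU performance relative to the M385X
intel_hd_fps = 12.2     # Intel HD Graphics, same conditions

amd_igpu_fps = m385x_fps * relative_perf
kaby_low, kaby_high = intel_hd_fps * 1.20, intel_hd_fps * 1.40  # +20% / +40% uplift

print(f"AMD integrated GPU: ~{amd_igpu_fps:.1f} fps")
print(f"Kaby Lake graphics: ~{kaby_low:.1f}-{kaby_high:.1f} fps")
```

On these assumptions, a +40% Kaby Lake uplift (about 17.1 fps) would essentially match the projected 17 fps for AMD's integrated GPU, which is why the gap may end up small.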
That said, given the odds, my view on AMD remains negative for the moment. There is not sufficient evidence of a real comeback for AMD, while Intel is delivering very nice upgrades with Knights Landing, Kaby Lake and the upcoming Skylake for servers (in particular OmniPath and FPGA integration). I reiterate my advice to stay away from AMD, at least until new, positive third-party benchmarks of ZEN come out.
Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.
I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.
Additional disclosure: The author does not guarantee the performance of any investments and potential investors should always do their own due diligence before making any investment decisions. Although the author believes that the information presented here is correct to the best of his knowledge, no warranties are made and potential investors should always conduct their own independent research before making any investment decisions. Investing carries risk of loss and is not suitable for all individuals.