Advanced Micro Devices (AMD) is set to release an update on the company's big core architecture on January 14th, but due to a server error at PurePC.pl (which has been taken down, but I'm sure these slides will circulate like wildfire before the 14th) the official slide deck was inadvertently leaked.
Before I begin, a quick note about my articles. Some have asked that I break the length of my articles up, whereas some enjoy the length. Also, some have suggested that I do not include new information in the conclusion. Understand that I write for a large audience, so I write with very specific intentions:
- Provide enough data and links to allow readers to come to their own conclusions
- I suggest clicking most of the links in my article if you want more information; I do not cherry pick facts, and you can see that my articles are already very long, so I cannot include everything I would like
- I write my conclusions as almost standalone pieces, so those who aren't interested in the details can skip to the conclusion to get the gist
- I include new information in the conclusions frequently because the stuff you read at the beginning and the end of the article will be most remembered and have the most impact. Because of the length, I prefer to try and make the conclusion "stick." So in this interest, I will often include some new information in the conclusion that ties all the pretty pictures and bar charts back to something that is useful
Notice that page views are not in my list of intentions? I prefer to provide what I feel is a more complete overview, and would much rather put out a complete picture in one sitting than in multiple smaller articles. That being said, if you're in a crunch for time, just read the conclusion. If you're interested in the details but get lost in the middle, take a break and come back. And as always, I welcome feedback in the comments section.
Let's roll our sleeves up and get started...
Kaveri Design Goals
Kaveri will be the first major retooling of AMD's big core lineup since Bulldozer was launched. The focus of Kaveri, much like Haswell, is to push performance at lower wattages in order to better compete in the notebook space.
Kaveri was also designed around GlobalFoundries' 28nm bulk process.
AMD directly addresses the one sticking point I see with this decision in the company's tech presentation: at higher frequencies, TDP scales much worse.
Why do I consider this a sticking point? In pure x86 performance, Kaveri will be measured against Richland when the chip launches on January 14th. When these reviews hit the web, I'm pretty sure some are going to be quick to point out that there is no real performance gain between Kaveri and Richland in those pure x86 workloads. This is due to the lower clocks offsetting the architectural enhancements.
Note: This is a leaked benchmark which may or may not turn out to be accurate, but it jibes with my "gut" of what we will see with Kaveri. The score is roughly on parity with an A10-6800k in this one instance, and Cinebench is a popular benchmark for review sites. This matches the point in the slide regarding lower clocks that I pointed out above.
However, lower clocks are where AMD states we will see the biggest improvements.
PCMark 8 tests overall computing experience (web browsing, light productivity, etc.), and 3DMark Fire Strike tests the GPU. So when comparing Kaveri and Richland chips at similar wattages, it looks like there will be a slight improvement in pure CPU performance, and a much more marked improvement in GPU performance. In theoretical performance, Kaveri achieves the same score as Iris Pro, and is ~67% faster than HD 4600.
I think the above slide illustrates the true potential of Kaveri: a large percentage of the performance looks to scale down to lower wattages. In theoretical GPU benchmarks, the 45W Kaveri chip retains 80% of the performance of the 100W chip.
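To put a rough number on that scaling, here is a minimal back-of-the-envelope sketch in Python. It takes the 80% figure from the slide and assumes that rated TDP (45W and 100W) approximates actual power draw, which is a simplification. The function name is mine, purely for illustration.

```python
def relative_perf_per_watt(perf_fraction, low_tdp_w, high_tdp_w):
    """Perf-per-watt of the low-TDP part relative to the high-TDP part.

    perf_fraction: low-TDP performance as a fraction of the high-TDP part's.
    Assumption: rated TDP stands in for real power draw.
    """
    low_part = perf_fraction / low_tdp_w   # performance per watt, low-TDP chip
    high_part = 1.0 / high_tdp_w           # performance per watt, high-TDP chip
    return low_part / high_part


# Figures from the slide: 45W chip retains 80% of the 100W chip's GPU score.
ratio = relative_perf_per_watt(0.80, 45, 100)
print(f"45W part: about {ratio:.2f}x the GPU performance per watt")
```

Under those assumptions, the 45W part would deliver a bit under twice the theoretical GPU performance per watt of the 100W part.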
But most users care how their chip functions more than the 3DMark numbers it posts.
This shows expected framerates when gaming on various titles at 1080p while using the settings as shown (low, medium, high, max). I have picked these 3 benchmarks specifically because each uses the title's built-in benchmarking application, which should make the tests more directly comparable.
For BioShock Infinite on low quality settings at 1080p, performance looks to be roughly on parity with Richland. However, looking at TDP scaling in BioShock shows that mobile Richland takes a massive performance hit.
Sleeping Dogs (using ~38 fps for Kaveri) and Just Cause 2 (~70 fps) show gains of 40% and 10%, respectively.
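For readers who want the implied Richland numbers, here is a rough sketch that backs them out of the Kaveri framerates and percentage gains quoted above. It assumes the gains are measured relative to Richland at the same settings, and the ~38 and ~70 fps inputs are my own eyeball readings of the slide, so treat the results as approximate. The helper name is hypothetical.

```python
def implied_baseline_fps(new_fps, gain_pct):
    """Back out the baseline framerate from a new framerate and a % gain."""
    return new_fps / (1 + gain_pct / 100.0)


# (title, approximate Kaveri fps read off the slide, quoted gain in %)
results = [
    ("Sleeping Dogs", 38.0, 40.0),
    ("Just Cause 2", 70.0, 10.0),
]

for title, kaveri_fps, gain in results:
    baseline = implied_baseline_fps(kaveri_fps, gain)
    print(f"{title}: implied baseline of roughly {baseline:.1f} fps")
```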
Looking at overall performance, based on this initial information it looks like the top end Kaveri parts will offer somewhat similar performance to top-end Richland chips in standard x86 workloads, and anywhere from similar to drastically improved GPU performance.
The real kicker will be if AMD is truly able to scale down performance to lower TDP parts, and we will have to wait for later benchmarks to show if the company can accomplish this.
The Unknowns of Kaveri: Mantle and HSA
AMD's Mantle API aims to reduce CPU overhead to enhance gaming performance. This can lead either to higher framerates, or to similar framerates with higher graphical fidelity.
The above slide shows a huge performance gain for a Mantle-enabled title. What I am not saying is that all games will see a 3x improvement in FPS. However, I think there will definitely be situations where Mantle would be extremely beneficial.
Mantle will allow developers to squeeze more performance from their games, making the titles playable on more hardware, which will increase the number of potential consumers. The tradeoff is the extra work required by devs, and this is the reason I call Mantle a "wildcard." It looks extremely interesting, but it is ultimately up to developer support, and that will depend on how useful Mantle is.
Unfortunately, DICE had to go "bug stomping" in BF4, so we will not see the Mantle update until sometime later this month, but this slide piqued my interest:
Again, this slide has some marketing speak ("up to") built in, so we will have to wait until we see actual results. However, if you recall the BF4 benchmark teetering at 30 fps above, even a 10% improvement in frame rates puts a buffer between Kaveri and the 30 FPS line, which most gamers and review sites consider the cutoff for playability.
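As a quick sanity check on that buffer, here is a small Python sketch. The 30 fps baseline and the 10% gain come from the discussion above; the 5% and 20% cases are hypothetical values I added only to show how the headroom moves, and the names are mine.

```python
PLAYABLE_FPS = 30.0  # playability cutoff cited in the text


def fps_after_gain(base_fps, gain_pct):
    """Framerate after applying a percentage improvement."""
    return base_fps * (1 + gain_pct / 100.0)


# 10% is the case discussed in the text; 5% and 20% are illustrative only.
for gain in (5, 10, 20):
    fps = fps_after_gain(30.0, gain)
    headroom = fps - PLAYABLE_FPS
    print(f"{gain}% gain: {fps:.1f} fps, {headroom:.1f} fps above the cutoff")
```

A 10% gain on a 30 fps baseline works out to 33 fps, so the buffer is small but real.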
If AMD is bundling BF4 with Kaveri, this could help to move a few chips out the door.
Source: AMD's CES Conference, no link
Moving on to HSA, there is likely going to be a lack of available software for benchmarking when Kaveri first releases.
The above benchmarks show, respectively, what I believe to be Sandra's options number-crunching algorithms (options of the financial kind; think something along the lines of HFT), spreadsheet analysis of share price data, and binary tree searches. According to SiSoftware's website, its software has optimizations for Kaveri.
These big bars above aren't meant to distort reality in any sense; they show special use cases where AMD's architecture will be very strong, similar to Litecoin mining. Note that in the source slides, AMD has dedicated several slides to currency mining as well. During the CES conference, the LibreOffice presenter (didn't catch his name) compared this performance gain to shaving off milliseconds on HFT platforms, whose operators will pay ridiculous prices for shorter lengths of fiber to connect to various financial networks.
All these special use cases involve working with large number sets to perform numerous simple calculations as quickly as possible, which is where leveraging GPU strength comes in.
I believe the "i5-3870k" is a typo, and should be an i7-4770k based on the fine print at the bottom. The i5-3870k does not exist. The closest chip would be an i5-3570k.
AMD explains this with a brief discussion of the GCN architecture. You may hear the term "FLOPs" thrown around, which stands for floating point operations. "IOPs" stands for integer operations, and these higher IOPs help do things like manipulate picture files. IOPs are also apparently good for cryptocurrency mining.
In general, Kaveri is not going to change the standing in pure and traditional x86 workloads between AMD and Intel (INTC), and the clock speed reduction will likely mitigate many of the gains at the high end regarding pure CPU performance. However, there are likely some very specific special use cases where AMD's chips will shine, but keep in mind these will appeal to only very specific audiences.
If AMD is able to retain a majority of the performance at lower TDPs, this would drastically change this argument, but we will have to wait for official benchmarks to know if this is the case.
Also, AMD's GCN architecture is much better for GPU compute when compared to Richland's iGPU, and this should help in OpenCL and various productivity benchmarks that exploit the GPU (specifically benchmarks that do not leverage HSA, only GPU compute).
Conclusion: Understanding Kaveri's Target Market and Performance Point
I know I spent the balance of this article discussing performance per watt, but in actuality AMD has been gaining traction in the desktop market, and more specifically in China, where AMD's market share is ~40%.
China is AMD's biggest market -- AMD generates more revenue in China than in the rest of AMD's markets combined, and by a large margin:
The desktop space, and specifically the APU product, is what bears like to describe as uncompetitive. This market share increase is being driven by Richland, whose biggest advantages over Haswell are price and iGPU performance.
Regarding pure CPU performance, it will likely be nothing to write home about, with architectural gains being offset by lower clock speeds. Integrated GPU performance and special use cases which utilize OpenCL and HSA features will likely show the most drastic improvements, with the iGPU improvements being fairly title-specific, depending on where the bottleneck is in the title being rendered.
Regarding the margin impact from die size, Kaveri is roughly the same size as Richland, so there is no positive here. The price of Kaveri appears to be slightly above that of Richland, so there is not much of a difference here either.
But if the iGPU gains actually deliver, as well as potential gains with Mantle, then this is an improvement on the front which I believe is probably driving sales of these chips.
If these "uncompetitive" products have been driving increases in desktop market share (verifiable), and if this performance can be scaled down to lower TDPs (we'll have to wait and see on this one), then Kaveri will likely shine a little brighter.
Intel HD 5100 (Iris without Graphics Cache): 948
Performing an extremely rough back-of-the-envelope calculation, if 3DMark scales somewhat linearly with TDPs in the 35W-45W range, we're looking at around 1000 or so, which is within the same range as desktop Richland chips, and just above Intel's 40 EU Iris GPUs (excluding the graphics cache).
Reading through the Raymond James technology conference transcript and various AMD marketing slides, it is abundantly apparent that AMD is marketing the company's chips by "showing the benchmarks." I believe this is why we see the focus on PCMark 8/BasemarkCL/3DMark Firestrike scores as a common theme throughout the APU lineup and AMD presentations, and in these metrics AMD's marketing slides show an improvement in all areas, and larger improvements at lower TDPs.
Regarding benchmarks that review sites use, think about the flavors of benchmarks that are tested on chips with integrated graphics:
- Pure/theoretical CPU performance
- Pure/theoretical iGPU performance
- Real World CPU performance
- Real World GPU performance
- Special use cases, such as productivity benchmarks that use OpenCL acceleration
- Power consumption
Off the bat, my guess is that we will see CPU performance roughly on parity with Richland at the ~100W TDP performance level, with highly single threaded apps showing a slight performance regression due to lower clocks, and multi-threaded benchmarks showing a slight improvement due to the architectural changes.
I think productivity apps that are OpenCL accelerated will see a decent little uptick in performance, along with some decent gains for iGPU in specific titles that are not bottlenecked by memory bandwidth.
iGPU represents an interesting scenario in this case.
Note, I do not believe these slides indicate performance when the Kaveri APU is run in crossfire with the GPU. The slides show very comparable framerates in various gaming titles on an R9 270X.
AMD's APUs when crossfired with the GPUs did not receive very good reviews before, so this is a big if here, but if crossfire works better with the new APUs and GPUs, then it begins to make much more sense to pair an AMD APU and GPU together so the iGPU performance isn't wasted. This could play to drive gains in both notebooks and desktops, as AMD just launched three mobile GPU SKUs.
The thing that will influence the feel of a PC the most for the largest percentage of users is mechanical vs. solid state storage. I doubt the majority of users care what the Whetstone or Dhrystone benchmarks show for CPU performance. They care about the time it takes from pressing the power button on their PC until they can watch videos of babies riding around on Roombas (do not do this with your baby) on YouTube.
The common use cases (web browsing, running office apps, etc.) are all heavily influenced by storage speed.
Gaming and price are another big case, and I believe this is why we see AMD gaining market share in the desktop. AMD offers very good integrated graphics at a solid value.
If you are on a limited budget, the money saved with an AMD build can be spent elsewhere, like on a dedicated GPU that can crossfire with the AMD APU, or on an SSD. The same savings also let vendors launch products at lower price points, which is why I believe we're seeing Steam Machines based on AMD hardware launching at the lower price points.
For those that point out that "good enough" is a pipe dream, I will ask then why are tablets devouring PC sales, and why are AMD APUs gaining market share in desktop?
As for the special use cases and power consumption, the best case for Kaveri is that DICE irons out the bugs in BF4 and we see reviews launch with Mantle for Kaveri, but this happening before the 14th may be quite the stretch. Hopefully we also see review sites with the 45W part in hand, giving us an indication of what kind of performance we can expect from mobile SKUs.
In summary, Kaveri looks to improve the feature that I believe has helped AMD move the A10 desktop APUs, namely integrated graphics, and I can see pure CPU performance increases being a toss-up.
But the real benefit of Kaveri, if the gist of the presentation is correct, is that it looks to scale the bulk of this benefit to mobile TDPs, while the focus on the desktop will be delivering a better all-around experience. Mantle and HSA are wild cards, but hopefully if Mantle is ready by the 14th we will see some performance metrics for Kaveri and Mantle, and early HSA benchmarks look to show strong performance gains.
The same arguments used against Kaveri applied equally to Richland, and that did not stop Richland from gaining traction in the desktop.
The same arguments could also have been made for Intel as of late in the desktop space. THG's review of Haswell: The Core i7-4770k Review - Haswell is Faster; Desktop Enthusiasts Yawn. Here is the concluding statement from the review:
"So, for the second time in a week, we're disappointed. Haswell has a lot to offer, just not to desktop enthusiasts. Intel's attention is fully in the mobile space, and we can tell."
Kaveri in the desktop space is more about the overall user experience, and in the mobile space it looks to provide more performance at similar TDPs. Note that everything in this article is just my best guess at where we'll see Kaveri land, but we should see a slew of new benchmarks next week that show how the chip actually fares.
Additional disclosure: I own both shares and options in AMD, and actively trade my position. I may add or liquidate shares at any point, and may initiate a small hedge via puts prior to the next earnings release.