In this article, I will take an in-depth look at Advanced Micro Devices' (AMD) upcoming "Steamroller" architecture and Kaveri APU. I will begin with a brief excerpt from the Q&A session during the most recent conference call in order to relate the product launch back to the investment thesis.
This piece will be fairly technical, so if you do not enjoy technical material, skip the paragraphs on the architectural changes and read "Hardware Summary," the summary at the end of the "28nm Production" section, "Software Summary," and finally the "Conclusion." Also, much of this article is based on publicly available information as well as leaks, so some details in this article may differ from the actual design.
Wafer Payment Update And A Brief Recap Of The WSA And Amendments
During the Q3 earnings call, Mr. John Pitzer of Credit Suisse asked:
Thanks Rory. And then my follow-up for Devinder, given inventory target for the December quarter and can you guess bring this up to speed on where we stand with the wafer side of supply agreement with GlobalFoundries for this year relative to your obligation that you had in Q3, Q4?
Mr. Kumar replied by stating:
Yeah. As I said in my prepared remarks, I expect inventory levels to remain essentially flat from where we ended Q3. I recall that it's a steep ramp in the business, we introduced of the semicustom products, the new product that Lisa referenced earlier R7, R9 series. So I expect that in the Q4 timeframe, [indiscernible] Talking about the WSA from Globalfoundries. We are on track to meet the commitment for the 2013 WSA and on the 2014, if that's what you are referring, we are in discussions to figure out the pricing and wafer of volumes for 2014 and I expect those to close within the 30 or 60 days.
This question and answer followed Mr. Pitzer's opening question regarding a softer PC market going into the first half of 2014.
AMD's 2009 annual report states that the company is obligated to use GlobalFoundries to manufacture certain products. However, during 2012, AMD went through a series of WSA (wafer supply agreement) amendments that both established a different cost structure for purchasing wafers and allowed AMD to manufacture "certain 28nm APU products" at a foundry other than GlobalFoundries for a "specified period of time."
All available evidence points to Kaveri being one of the major products GlobalFoundries will manufacture for AMD, so it will serve to drive revenues in the PC and server market, as well as meet wafer requirements at GlobalFoundries. Therefore, I feel it is worth a long look at the architecture so investors understand the potential benefits of the upcoming architecture.
Looking At The Evolution of AMD's "Big Cores"
AMD's most recent line of "Big Core" chips began with the "Bulldozer" (BD) architecture. Joel Hruska of ExtremeTech explains the design tradeoffs AMD made in order to hit higher frequencies and make BD cores easier to ramp, but he goes on to explain how some of these tradeoffs caused other performance issues throughout the core.
"Piledriver," released in October of last year, consisted mainly of tweaks to the existing BD architecture -- not a major redesign. These tweaks resulted in slightly higher IPC and frequency for AMD's updated architecture. AnandTech shows identical die size and transistor counts for these chips. Bulldozer and Piledriver are both manufactured using a 32nm node at GlobalFoundries.
Comparing the performance of AMD's Bulldozer-based FX-8150 with the Piledriver-based FX-8350, using data from HotHardware, we can see AMD delivered on both fronts.
The FX-8350 delivered 8% higher single-threaded performance and 16% higher multi-threaded performance, using Cinebench as the metric. Note that the FX-8350 has a slightly higher stock frequency than the FX-8150, yet AMD achieved lower overall power consumption for the FX-8350 despite the higher frequency, meaning AMD improved performance/watt from Bulldozer to Piledriver.
Contrasting AMD's results with Intel's (INTC), Intel obtained a 7% increase in both single- and multi-threaded performance while also improving performance/watt. Perf/watt has been a key driver for Intel as the focus in computing shifts to mobile devices.
In a previous article, I noted that at the high end, Intel's performance has been stagnating as the company focuses on improving performance/watt. This is evident from the performance progression of Intel's CPUs.
Again using benchmark data from HotHardware, Intel achieved an even smaller gain between Ivy Bridge and Haswell, only about 5%, albeit at lower power consumption.
AMD is set to release the "Kaveri" APU by the end of this year, with the APU available for retail very early in Q1 of 2014.
Intel's most recent leaked desktop roadmap suggests that plans have changed and Intel will release a socketed Broadwell chip for desktop. However, during Intel's most recent conference call, CEO Mr. Brian Krzanich disclosed that Broadwell would be one quarter behind original plans.
With Intel pushing Broadwell back an additional quarter, and leaked roadmaps suggesting desktop Broadwell appearing late in 2014, AMD has a better window of opportunity here for Kaveri, with the window growing or shrinking depending on availability.
AMD's Steamroller Architecture Detailed
I began the above section with a reference to Joel Hruska's breakdown of the Bulldozer architecture. Going a little more in-depth, part of his explanation for why Bulldozer struggled is that AMD lengthened the pipeline stages throughout the chip and accepted higher latencies in order to allow higher clocks. Another reason is thought to be a front-end bottleneck in the decode hardware.
As the slide above (again from HotHardware) shows, Bulldozer struggles to keep up with Thuban in single-threaded performance (yellow bar). This is an even bigger deal considering Bulldozer's higher clock speed.
Steamroller will be the first major revamp of the Bulldozer architecture, with one of the design goals being to explicitly improve single threaded performance.
Additional design goals include improving overall performance by eliminating bottlenecks caused by shared resources, as well as more active power management to improve performance/watt.
Improving the Decode Front-End
I know enough to understand what I am reading, but rather than try and put this in my own words I will defer to Mr. Anand Lal Shimpi's explanation:
One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:
Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle. Don't expect a doubling of performance since it's rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller.
He goes on to explain that the tradeoffs of beefing up the decode hardware are increased die size and power consumption, but that these can be offset by improvements in other areas. The image below is taken from PC Watch:
Although it does not appear to be an official drawing, it does look to be the most detailed image depicting the in-depth changes made to the front-end.
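To make the decode change concrete, here is a toy model of my own (a simplification, not AMD's actual design) that counts the peak number of instructions each core can have decoded per cycle when both cores in a module are busy. It assumes the shared 4-wide decoder simply alternates between the two cores every other cycle, as described in the quote above; the function name and cycle count are illustrative.

```python
# Toy model (an illustrative assumption, not AMD's actual scheduling logic):
# per-core decode bandwidth when two busy cores share one 4-wide decoder that
# alternates between them every other cycle (Bulldozer/Piledriver), versus
# each core owning a dedicated 4-wide decoder (Steamroller).

DECODE_WIDTH = 4  # instructions per cycle, per the description above


def peak_decode_per_core(cycles: int, shared: bool) -> float:
    """Peak average instructions decoded per cycle for one core of a busy module."""
    decoded = 0
    for cycle in range(cycles):
        if shared:
            # The shared decoder serves this core only on even cycles.
            if cycle % 2 == 0:
                decoded += DECODE_WIDTH
        else:
            # A dedicated decoder can serve this core every cycle.
            decoded += DECODE_WIDTH
    return decoded / cycles


if __name__ == "__main__":
    for label, shared in (("shared (BD/PD)", True), ("dedicated (SR)", False)):
        rate = peak_decode_per_core(1000, shared)
        print(f"{label}: {rate:.1f} instructions/cycle per core (peak)")
```

The model only captures the ceiling: shared decode caps each busy core at an average of 2 instructions per cycle, while dedicated decoders raise that cap to 4. As the quote notes, real code rarely comes close to saturating a 4-wide front end, so the practical gain is far smaller than 2x.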
In the article on PC Watch (linked to above), Mr. Hiroshige Goto explains that he believes Kaveri will come equipped with 512 stream processors, equating to 8 CUs (compute units) at 64 stream processors per GCN CU.
To give an idea of GPU performance, Tom's Hardware (THG) reviewed AMD's HD 7730 GPU and referred to it as the "Harbinger of Kaveri."
Comparing similar data from AnandTech and THG, we can see that the HD 7730 and Iris Pro perform very similarly. Note that the benchmarks differ slightly (THG uses a higher resolution but lower quality settings than AnandTech).
I would like to illustrate two points. First, the HD 7730 brings a popular, but aging, AAA title to the verge of playability at full HD resolution with some of the eye candy turned up. Second, the GDDR5 version of the HD 7730 offers a sizeable performance increase over the DDR3 version.
These points matter for two reasons. First, Kaveri could make newer titles playable on an economical iGPU solution, whereas Intel's Iris Pro iGPU is reserved for the more expensive Intel CPUs. Second, 8 CUs seems to be about the right balance between processing power and memory bandwidth limitations if AMD goes with DDR3 instead of the rumored GDDR5.
As a final point of contrast, AMD's Jaguar chips use 2 CUs, equating to 128 stream processors, and a single-channel IMC (integrated memory controller).
1080p resolution has 2.25x as many pixels as 720p (1920 x 1080 vs. 1280 x 720). Even with barely any AF (anisotropic filtering) applied in Skyrim, the 2 CUs in Jaguar cannot reach the 30fps playability threshold. Compare this to the 6 CUs in the HD 7730, which nearly reaches 30fps at 1080p with higher detail settings and more eye candy. Given that Kaveri will likely use a dual-channel DDR3 IMC and 4x the GPU hardware of Jaguar, Kaveri should be a huge step up from Kabini and Temash.
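For readers who want to check the ratios above, the arithmetic is short. This sketch assumes 64 stream processors per GCN compute unit, which is consistent with the 512 SP = 8 CU and 128 SP = 2 CU figures quoted earlier; the Kaveri figure is Goto's estimate, not a confirmed spec.

```python
SP_PER_CU = 64  # stream processors per GCN compute unit (assumed, consistent with the text)


def pixel_count(width: int, height: int) -> int:
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


def compute_units(stream_processors: int) -> int:
    """Convert a stream processor count into GCN compute units."""
    return stream_processors // SP_PER_CU


ratio_1080p_to_720p = pixel_count(1920, 1080) / pixel_count(1280, 720)  # 2.25
kaveri_cus = compute_units(512)  # Goto's estimate for Kaveri
jaguar_cus = compute_units(128)  # Jaguar-based APUs (Kabini/Temash)
hd7730_cus = 6                   # HD 7730, per the text

if __name__ == "__main__":
    print(f"1080p / 720p pixels: {ratio_1080p_to_720p}")
    print(f"Kaveri CUs: {kaveri_cus}, Jaguar CUs: {jaguar_cus}, HD 7730 CUs: {hd7730_cus}")
    print(f"Kaveri vs. Jaguar GPU hardware: {kaveri_cus / jaguar_cus:.0f}x")
    print(f"Kaveri vs. HD 7730 CUs: {kaveri_cus / hd7730_cus:.2f}x")
```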
Application Acceleration Using GPU Compute
AMD released a white paper in June of last year detailing many of the changes from VLIW4 to the GCN architecture. A common theme throughout the paper is the added resources that make GCN more suitable for GPU compute than VLIW4.
Looking at CompuBench results for AMD's Radeon HD 7660G iGPU (the GPU in the A10-4600M) vs. the Radeon HD 8400 (the iGPU in the A6-5200 Kabini), the 25W top-of-the-line, low-wattage Jaguar part is competitive with the 35W Trinity APU. This becomes more significant when you compare die size. The Jaguar chip is just over 100 mm^2, whereas the Trinity APU is 246 mm^2. The GPU likely constitutes around 30% or less of the Jaguar die, whereas on Trinity it is likely closer to 35%-40%. The OpenCL benchmarks above run on the GPU.
Lastly, I would like to consider OpenCL performance in relation to a recent blog post from AMD. An article on LegitReviews picks apart the blog post and explains the claimed compute and iGPU performance advantage over the i5-4670K.
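The die-area argument can be put into rough numbers. This sketch simply multiplies the die sizes above by my estimated GPU share of each die (assumptions, not measured floorplans), so treat the output as an order-of-magnitude comparison.

```python
KABINI_DIE_MM2 = 100.0   # "just over 100 mm^2"
TRINITY_DIE_MM2 = 246.0

KABINI_GPU_SHARE = 0.30           # "around 30% or less" (upper estimate, assumed)
TRINITY_GPU_SHARE = (0.35, 0.40)  # "closer to 35%-40%" (assumed range)


def gpu_area(die_mm2: float, share: float) -> float:
    """Estimated GPU area given total die area and the GPU's fraction of it."""
    return die_mm2 * share


kabini_gpu = gpu_area(KABINI_DIE_MM2, KABINI_GPU_SHARE)
trinity_gpu = [gpu_area(TRINITY_DIE_MM2, s) for s in TRINITY_GPU_SHARE]

if __name__ == "__main__":
    print(f"Kabini GPU area: ~{kabini_gpu:.0f} mm^2")
    for area in trinity_gpu:
        print(f"Trinity GPU area: ~{area:.0f} mm^2 ({area / kabini_gpu:.1f}x Kabini)")
```

Under these assumptions, Trinity devotes roughly 2.9x to 3.3x as much silicon to its GPU as Kabini (more, if Kabini's GPU share is below 30%), which is what makes Kabini's competitive CompuBench results notable.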
Keeping in mind GCN is designed for GPU compute, there should be additional performance gains when AMD releases GCN based APUs.
As a final note, take a look at two additional benchmarks from THG:
Note that the Intel CPUs are not using OpenCL in the above benchmarks, although they do support it. This review is somewhat older and was created specifically with AMD's APUs in mind, so the Intel CPU-only tests (not using OpenCL) were added as a point of comparison.
To quote THG directly:
Add it all up, and the results definitively show that GPU-based acceleration of some sort should be mandatory for anyone with a significant amount of editing work to process. Not only do content creators need to keep an eye out for hardware able to accelerate their favorite applications, but they also need to pay attention to how their software of choice utilizes that hardware. And if the tools you're using don't yet take advantage of GPU-based acceleration, it's worth finding out why. Not all workloads are ideally suited to the sort of parallelism that a GPU introduces. But when it comes to media-oriented tasks, many do, in fact, benefit. The question now becomes how quickly vendors will make this support widely available throughout their wares.
I want to draw attention to the last sentence because it points out that the performance gains are not free. But if you watched the latest Apple event on October 22nd, you'll notice that even Apple highlighted OpenCL acceleration as an important change in "Mavericks." I believe OpenCL acceleration will be an important emphasis in computing going forward, and one that plays to the strengths of AMD's graphics IP.
Why The OpenCL Benchmarks Above Are Irrelevant
The entire aim of HSA (Heterogeneous System Architecture) is to create cohesive hardware capable of efficiently utilizing GPUs or other specialized processors such as DSPs, with minimal software overhead.
HSA has many facets, but the one I would like to touch on briefly is "heterogeneous queuing," or "hQ" for short.
Images in this section courtesy of Softpedia
The "old way" (meaning the benchmarks above, devoid of HSA) requires the CPU to pass data to the GPU before the GPU can assist with workloads.
HSA changes this dynamic, allowing the application to communicate directly with the GPU and bypass the CPU. It still allows the GPU and CPU to pass data back and forth, but eliminates the requirement that all data be routed through the CPU. Avoiding the need to copy data between the GPU and CPU is more power efficient, and the reduced latency increases performance at the same time.
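As a loose analogy only (this is not HSA's actual API or memory model), the difference between the two approaches can be sketched in Python: the "old way" copies a buffer into separate device memory and back, while the HSA-style path lets the "GPU" work directly on the same memory the CPU uses. The `gpu_kernel`, `old_way`, and `shared_way` functions and the byte counts are hypothetical stand-ins.

```python
def gpu_kernel(buf) -> None:
    """Hypothetical 'GPU' work: double every byte value (mod 256) in place."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def old_way(host: bytearray) -> int:
    """Copy host -> device, run the kernel, copy device -> host.

    Returns the number of bytes copied (a stand-in for data-movement cost).
    """
    device = bytearray(host)  # copy #1: "CPU memory" -> "GPU memory"
    gpu_kernel(device)
    host[:] = device          # copy #2: "GPU memory" -> "CPU memory"
    return 2 * len(host)


def shared_way(host: bytearray) -> int:
    """Both sides see the same memory; the kernel works through a view of it."""
    with memoryview(host) as view:  # a view onto the same bytes, no copy
        gpu_kernel(view)
    return 0


if __name__ == "__main__":
    a = bytearray(range(8))
    b = bytearray(range(8))
    print("old way bytes copied:", old_way(a))
    print("shared way bytes copied:", shared_way(b))
    print("same result:", a == b)
```

Both paths produce the same result; the only difference is the data movement, which is the overhead HSA aims to remove.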
The last slide I will touch on is more to illustrate an ideal of HSA. The HSA Foundation is centered around taking an open source approach to heterogeneous computing. I use the example of this last slide to demonstrate one of the ways in which this open source concept is implemented. Information that is capable of being passed directly to the GPU will be formatted to a specific protocol, ensuring standardization between various hardware vendors. Contrast this against CUDA, Nvidia's (NVDA) proprietary standard for GPU compute. In CUDA, programs are written specifically for Nvidia hardware. If you wanted to run that same program on AMD, you would have to re-write the code. HSA will attempt to eliminate this overhead needed to port code between vendors via code designed to open standards that will run across a wider array of hardware.
This is not a guarantee by any means. Sometimes better technology fails to catch on. Laser discs were awesome. Who didn't like record sized CDs?
hQ will likely require some extra work from programmers, or at the very least require them to become accustomed to a new programming style. Neither is an overnight change.
Add to this that Intel does quite well on OpenCL benchmarks, and its more expensive parts using Crystalwell are impressive in GPU compute.
Comparing LuxMark benchmarks from THG and AnandTech illustrates a few important points. First (as previously mentioned), GCN is much better at most compute tasks than VLIW4 (Radeon HD 6000 series). Second, Intel's Iris Pro mops the floor with AMD's VLIW4 APUs, as well as Nvidia's mobile dGPUs.
To sum up this section, the GPU compute performance of AMD's new APUs should be a large improvement, but right now there are a lot of unknowns. Current benchmarks may be a valid guide when Kaveri first launches, but they will not really be indicative of performance if HSA catches on.
Hardware Summary
If you skipped the technical details, begin reading here. This section also contains some new information.
"Steamroller" and "Kaveri" will be important for the future of AMD.
An APU (accelerated processing unit) consists of both a CPU and an integrated GPU.
Based on leaked and public information, AMD appears to have attempted to fix some of the bottlenecks on the CPU side of Bulldozer's design. These fixes also include attempts to improve performance/watt.
Looking at the GPU, AMD is updating the APU with hardware that is better for gaming and general purpose computing.
The above technical description is not an attempt to over-inflate hopes for Kaveri by any stretch of the imagination. Specifically looking at the GPU, I do expect the iGPU in Kaveri to be stronger than the iGPU in Richland and Trinity (the APUs Kaveri is replacing). Regarding CPU performance, I am not quite sure where it will fall. My guess and hope is that CPU performance is slightly to moderately improved, along with an improvement in performance/watt. By adding hardware in key parts of the processor, such as the duplicated decoders, AMD can make sure the cores are not starved for instructions. At the same time, this could allow the company to lower clock frequency, making the chip use less power and possibly making it a little easier to manufacture.
28nm Production at GlobalFoundries
I will provide a brief summary of this information at the end of this section. It contains some fairly technical information.
Rumors surfaced back in 2011 of AMD scrapping plans for 28nm chips originally planned at GlobalFoundries. 2012 was the year AMD dealt with restructuring and penalty payments regarding the WSA (wafer supply agreement). In hindsight, we can see the benefit of paying GlobalFoundries to allow TSMC (TSM) to serve as a second source for AMD. GlobalFoundries lost out on quite a bit of business from console chips, as well as AMD's Jaguar chips.
AMD generated about $80M in revenue in the first quarter that the console chips were being manufactured. To put this in perspective, the WSA amendment allowing TSMC to serve as a second source cost AMD $703M, split between a $425M cash charge and the transfer of AMD's remaining stock in GlobalFoundries to GlobalFoundries itself. Given a console life cycle of five years or so, the revenue and income generated from consoles alone could make up for the $700M charge in the first 2 years or so, provided there is sufficient demand for consoles.
Waiting for GlobalFoundries' 28nm process to mature likely also benefits AMD's cost structure for Kaveri and Steamroller (my speculation). Yields and performance could be higher now than earlier, meaning AMD could see more good dies per wafer and better-binned chips for higher-performing parts. Binning is the sorting of finished chips by their measured characteristics, such as attainable clock speed and power draw, into different product grades.
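Here is the payback arithmetic spelled out. It assumes console revenue stays flat at the roughly $80M reported for the first quarter of production (it could rise or fall), and it compares revenue, not profit, against the charge, so a profit-based payback would take longer.

```python
import math

WSA_CHARGE_M = 703          # total cost of the WSA amendment ($M)
CASH_CHARGE_M = 425         # cash portion of the charge ($M)
CONSOLE_REV_PER_QTR_M = 80  # first-quarter console revenue ($M), assumed flat

# The non-cash remainder is the value implied for the GlobalFoundries stock transfer.
stock_transfer_m = WSA_CHARGE_M - CASH_CHARGE_M


def quarters_to_match(charge_m: float, revenue_per_quarter_m: float) -> int:
    """Quarters of flat revenue needed for cumulative revenue to reach the charge."""
    return math.ceil(charge_m / revenue_per_quarter_m)


if __name__ == "__main__":
    q = quarters_to_match(WSA_CHARGE_M, CONSOLE_REV_PER_QTR_M)
    print(f"Implied stock transfer value: ${stock_transfer_m}M")
    print(f"Quarters of console revenue to match ${WSA_CHARGE_M}M: {q} (~{q / 4:.2f} years)")
```

At a flat $80M per quarter, cumulative console revenue passes $703M in the ninth quarter, a little over two years, in line with the estimate above.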
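To illustrate why a maturing process matters, here is a common textbook approximation: a gross dies-per-wafer estimate combined with a simple Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. I use Trinity's 246 mm^2 die as a stand-in because Kaveri's die size is not given here, and I assume a 300mm wafer; the defect densities are hypothetical, not GlobalFoundries data.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300mm wafer


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Gross dies per wafer: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


def good_dies_per_wafer(die_area_mm2: float, defects_per_cm2: float) -> int:
    """Expected defect-free dies per wafer (rounded down)."""
    return math.floor(dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2))


if __name__ == "__main__":
    DIE_AREA = 246.0  # Trinity's die size, used as a stand-in for a large APU
    for d0 in (0.5, 0.2):  # hypothetical "immature" vs. "mature" defect densities
        print(f"D0={d0}/cm^2: {dies_per_wafer(DIE_AREA)} gross dies, "
              f"{poisson_yield(DIE_AREA, d0):.0%} yield, "
              f"{good_dies_per_wafer(DIE_AREA, d0)} good dies")
```

With these made-up inputs, cutting defect density from 0.5 to 0.2 per cm^2 roughly doubles the good dies per wafer, and a large die like an APU is hit harder by defects than a small mobile SoC.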
To quote TechPowerUP:
It looks like TSMC and GlobalFoundries are both having serious yield problems with their 28 nm process nodes, according to Mike Bryant, technology analyst at Future Horizons and this is causing a rash of non-working wafers - to the point of having nothing working with some chip designs submitted for production. It seems that the root cause of these problems are to do with the pressures of bringing products to market, rather than an inherent problem with the technology; it just takes time that they haven't got to iron out the kinks and they're getting stuck...
Mr. Mike Bryant, the analyst quoted by TechPowerUP, also refers to GlobalFoundries' "gate-first" approach as "problematic."
An article in EEJournal explains that a "gate-last" approach is easier to implement, with the trade-off being that "gate-first" is more dense; you get more transistors/mm^2 using a "gate-first" approach. The article also states that Intel switched to a "gate-last" approach at 45nm, with TSMC making the switch at 28nm. If "gate-last" is indeed easier, it could explain why GlobalFoundries has lagged TSMC in 28nm production.
However, we are now starting to see signs of production coming to life at GlobalFoundries and other members of the Common Platform Alliance.
The Common Platform Alliance was formed by the members shown above to share manufacturing costs and R&D efforts between companies to reduce cost and accelerate roadmaps.
A rumor from DigiTimes, a sometimes way-off site, mentions that GlobalFoundries and Samsung have been attempting to steal customers from TSMC as the companies have managed to improve yields. Based on data from ChipWorks, we know Apple's A7 chip is made using Samsung's 28nm process, which is also a "gate-first" process. The process has excellent density, with the A7 packing around 10M transistors per mm^2. GlobalFoundries is shipping 28nm mobile SoC products for RockChip. Note that GlobalFoundries has 3 different versions of 28nm for different applications, and Kaveri will likely use a slightly different process from the mobile chips.
Based on available information, I believe that Kaveri and other Steamroller-based chips will fill AMD's wafer requirements at GlobalFoundries going forward, along with 32nm Piledriver CPUs and APUs. According to HotHardware, Kaveri was taped out at GlobalFoundries in November 2012. The author of the article dismisses the DigiTimes rumors of a Kaveri delay by pointing out how aggressive a one-year timeline from tape-out to production would be.
This chip will likely be one of the bigger and more complex chips AMD will manufacture at GlobalFoundries, provided GlobalFoundries is serving as the foundry.
Leaked information from SemiAccurate shows that the TDPs of these chips are within the range of current APUs and much higher than those of true mobile products, meaning these chips likely use a slightly different process than the RockChip SoCs.
At this point, we have seen tweets regarding APU development boards being shipped to devs (notice the date), various leaks of Kaveri Benchmarks (I, II), Dr. Su's Kaveri demo at Computex, the Oculus Rift Kaveri demo more recently, and during the Q3 conference call management stated Kaveri is still on track "for the channel this quarter."
BSN described the initial headaches of ramping Llano as due to manufacturing the GPU portion of the APU chip on a process designed for a CPU, and reported that the Bulldozer ramp went well. GlobalFoundries is shipping 28nm products for other companies, and the RK3188 chip being manufactured for RockChip comes equipped with a CPU and GPU on the same die. Note though that the chip is tiny at ~25mm^2, according to ChipWorks.
To tie the information in this section together, I believe Kaveri will likely be manufactured at GlobalFoundries. Based on wafer purchasing requirements, AMD has to make something at GlobalFoundries, and based on leaks I believe that "something" will be Kaveri. AMD and GlobalFoundries have had issues before with Llano, and AMD reportedly scrapped the company's Wichita and Krishna projects that were supposed to be 28nm chips manufactured at GlobalFoundries.
But now Samsung and GlobalFoundries, both members of the Common Platform, are shipping 28nm products containing both a CPU and a GPU. Although the RockChip SoC that GlobalFoundries is shipping is tiny, the A7 chip Samsung is manufacturing for Apple is larger. It is also impressive, packing in a ton of transistors/mm^2. AMD maintains that Kaveri will launch into the channel this quarter as well.
Kaveri shipping is an important milestone for AMD. If Kaveri was taped out less than a year ago, getting it into the hands of consumers by now would have been a feat. Also, by waiting on GlobalFoundries 28nm production to mature, AMD could have saved itself some headaches in the form of yield issues affecting margins or revenue (my speculation). To quote an article from Anandtech:
Also note that AMD isn't going to be as focused on delivering high performance products on the absolute latest process node. It views Brazos as one of its biggest successes to date and that architecture was built on a 40nm process with an easily synthesizable architecture. It's likely that the future of AMD is built around more of these easy to manufacture SoCs rather than highly custom, bleeding edge CPUs.
The nerd in me wants to see more information regarding Kaveri to see what the chip is capable of, and the investor in me wants to see more information so I can see the actual timing of AMD's product release and the ensuing impact to financials. I expect to see more information regarding Kaveri at APU 2013, and an update from a financial standpoint during the Q4 earnings report. Keeping in mind Intel stating Broadwell has been delayed one quarter, my hope is to see mobile Kaveri chips for retail prior to Intel launching Broadwell.
To look at software, I will examine two common consumer use cases for PCs: gaming and content creation.
Games like Candy Crush can be played on a phone. Games that are more demanding require better hardware. If users want to play the most demanding games they need a CPU capable of preventing bottlenecks, along with enough GPU hardware to render the game at whatever fidelity the consumer is shooting for.
The above graphics refer to AMD's "Mantle." Notice the references to CPU performance. The first graphic mentions CPU overhead and the inability to access all the CPU horsepower. The second graphic mentions better CPU and GPU performance by optimizing the software to run better across more CPU cores.
Games are typically designed with consoles being the target platform and are then ported to PC. Going forward, games that are designed for the consoles will be designed to utilize the CPU cores more evenly - and this happens regardless of Mantle's adoption.
To illustrate my point, I would like to look at comments from two separate articles on the Eurogamer website. The first comment is from an article with Mr. Matt Higby, one of the developers working on Planetside 2.
"It's very challenging to split those really closely connected pieces of functionality across in multiple threads. So it's a big engineering task for them to do, but thankfully once they do it, AMD players who've been having sub-par performance on the PC will suddenly get a massive boost - just because of being able to take the engine and re-implement it as multi-threaded.
"I'm very excited about that because I have a lot of friends, lots of people who are more budget minded, going for AMD processors because nine times out of ten they give a lot of bang for the buck. Where it really breaks down is on games with one really big thread. Planetside's probably a prime example of that."
Because the next generation consoles are utilizing 8 of AMD's lower power Jaguar cores, developers have to ensure game engines and games are designed with that in mind.
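Amdahl's law (a standard model I am bringing in, not something from the quoted developers) shows why "one really big thread" hurts a design with more, individually slower cores, and why better threading helps it. The per-core speed factors below are hypothetical round numbers, not measurements of the FX-8350 or any Intel chip.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when `parallel_fraction` of the work scales perfectly."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def relative_throughput(cores: int, per_core_speed: float, parallel_fraction: float) -> float:
    """Throughput relative to a single core of speed 1.0."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical chips: 4 faster cores vs. 8 cores that are each 20% slower.
    for p in (0.5, 0.95):
        four = relative_throughput(cores=4, per_core_speed=1.0, parallel_fraction=p)
        eight = relative_throughput(cores=8, per_core_speed=0.8, parallel_fraction=p)
        print(f"parallel fraction {p:.0%}: 4 fast cores = {four:.2f}, "
              f"8 slower cores = {eight:.2f}")
```

With half of the work serial, the four faster cores win; once 95% of the work is spread across threads, the eight slower cores pull ahead. That is the shift the developers above are describing.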
The second comment, from another article on Eurogamer, is regarding building a future proof PC.
We approached a number of developers on and off the record - each of whom has helped to ship multi-million-selling, triple-A titles - asking them whether an Intel or AMD processor offers the best way to future-proof a games PC built in the here and now. Bearing in mind the historical dominance Intel has enjoyed, the results are intriguing - all of them opted for the FX-8350 over the current default enthusiast's choice, the Core i5 3570K.
Perhaps it's not entirely surprising - Crytek's Crysis 3 is a forward-looking game in many ways, and as these CPU tests by respected German site PC Games Hardware demonstrate, not only does the FX-8350 outperform the i5, it also offers up an additional, minor margin of extra performance over the much more expensive Core i7 3770K - a processor that's around £100 more expensive than the AMD chip. Only the six-core Intel Core i7 3930K - a £480 processor - beats it comprehensively.
The article does state that if power consumption is a concern, Intel would be the better choice. But in terms of absolute performance in gaming applications, the FX-8350 will be the better processor to future proof a gaming rig according to the article. This comment makes more sense using the first comment as a frame of reference. Games will be developed to more equally weight the workload of each CPU core, which will play to the strength of AMD's design.
The adoption of Mantle would be the icing on the cake to the above idea.
Content creation, media consumption, and Microsoft (MSFT) Office style applications are the other common use cases for PCs. Of these usage scenarios, content creation requires the most compute horsepower. OpenCL acceleration should help AMD close the gap here on performance.
The design goals for Kaveri seem to be to improve overall performance, but while keeping performance/watt in mind. My hope and guess for Kaveri's CPU is overall performance will be at least mildly improved, but at a lower power consumption.
My hope for the GPU is that it will be better all around - at both GPU compute and for gaming. If you take a large step back and look at the 10,000 foot view, you will see that the graphics cores being utilized in AMD's next generation APU, HSA, and OpenCL application acceleration all de-emphasize the importance of the CPU while playing to the strength of the GPU. Mantle and more equally weighted threads will serve to de-emphasize the CPU in gaming scenarios.
After the Q3 conference call, AMD bears cited Intel's push into the low end of the PC market as a concern for AMD. I consider it somewhat of a concern as well in the near term, as Intel aims Bay Trail at devices that will encroach on the low end of the PC space, which has typically been AMD's playground.
Overall, this could prove to be a precarious situation for Intel. For example, in Q3 Intel's PC Group generated $8.4B in revenue. I have also previously estimated that AMD makes ~$250M +/- per quarter in server revenues, leaving ~$500M or so for PC sales, using Q3 as an example. Exact amounts are not as important as orders of magnitude.
Intel's PC Group revenue is ~17x AMD's revenue from PCs. And Intel is competing in this space with the newly released Bay Trail line, a ~$30 chip. If these chips are so good, why do consumers need to spend $100 or more on a Core processor for their laptop? Seeking Alpha contributor Ashraf Eassa recently published a piece on the Bay Trail-powered Asus T100 2-in-1. My question is how many consumers who would have opted for a Core-powered laptop will go for the nifty 2-in-1s instead?
I call Intel's situation precarious because the company seems to be coming down to pick up pennies at the low end (the $500M or so in revenue AMD generates each quarter) at the risk of accidentally dropping dollars at the high end (sales of $30 processors cannibalizing sales of $120+ Core processors for mobile).
PC market share could prove to be a sticking point for AMD in the near term, provided Intel is successful at targeting the low end. However, I question whether this strategy will be sustainable for Intel, for the reasons above. Also, with Kaveri's release impending, I would not be surprised to see PC revenues further depressed, as Richland sales could fall while consumers wait for Kaveri.
While Intel is coming down to the low end to pick up pennies, AMD has set its sights higher with the Kaveri APU. Mind you, this APU will not knock Intel from its performance perch, and I am not suggesting it will.
What I am suggesting is that Kaveri will be the first major overhaul of AMD's Bulldozer architecture since the design's inception in late 2011. In Q2, AMD drove sequentially higher desktop revenues with the introduction of Richland. In Q3, AMD stated this momentum in desktop sales continued (although no specific processor was cited during the call). A strong showing from Kaveri could more than offset Intel's attempts to gain traction at the low end. And keeping in mind that Kaveri will essentially be a different architecture (improved CPU + new GPU) than Richland, this alone could spur upgrades and drive revenues.
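The "pennies versus dollars" argument can be sketched with the round numbers above. This is a revenue-only sketch that ignores margins and assumes average selling prices of $30 for Bay Trail and $120 for a mobile Core chip, per the figures in the text; unit counts are in millions, so results are in $M.

```python
INTEL_PC_GROUP_REV_M = 8_400  # Intel PC Group revenue, Q3 ($M)
AMD_PC_REV_M = 500            # my rough estimate of AMD PC revenue, Q3 ($M)
BAY_TRAIL_ASP = 30            # approximate Bay Trail chip price ($, assumed)
CORE_MOBILE_ASP = 120         # approximate mobile Core chip price ($, assumed)

revenue_ratio = INTEL_PC_GROUP_REV_M / AMD_PC_REV_M  # ~17x


def net_revenue_change_m(incremental_units_m: float, cannibalized_units_m: float) -> float:
    """Net change in Intel revenue ($M) from Bay Trail sales.

    incremental_units_m: Bay Trail sales (millions) that displace no Intel chip.
    cannibalized_units_m: Bay Trail sales (millions) that replace a Core sale.
    """
    gained = incremental_units_m * BAY_TRAIL_ASP
    lost = cannibalized_units_m * (CORE_MOBILE_ASP - BAY_TRAIL_ASP)
    return gained - lost


if __name__ == "__main__":
    units_to_take_all_of_amd = AMD_PC_REV_M / BAY_TRAIL_ASP
    core_losses_to_erase_it = AMD_PC_REV_M / (CORE_MOBILE_ASP - BAY_TRAIL_ASP)
    print(f"Intel PC Group / AMD PC revenue: {revenue_ratio:.1f}x")
    print(f"Bay Trail units (M) to capture AMD's ~$500M: {units_to_take_all_of_amd:.1f}")
    print(f"Cannibalized Core units (M) that erase that gain: {core_losses_to_erase_it:.1f}")
```

Under these assumptions, each Core sale lost to Bay Trail must be offset by three incremental Bay Trail sales just to break even on revenue, and losing roughly 5.6M Core sales would cancel out capturing all of AMD's ~$500M.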
Desktop Kaveri is supposed to show up in early Q1 2014, with mobile Kaveri showing up sometime in Q2. Provided the performance of the desktop unit is strong, this should translate to mobile performance.
AMD is attempting to shift the software to more favor the company's hardware. Part of this shift will require effort on AMD's part to push HSA and OpenCL acceleration in software, and this will not be an overnight change. The software side of the equation that will not require much effort on AMD's part is gaming, based on the explanation above regarding games being designed to more equally utilize each core. This will not likely be a quick change either, as coders have to get used to balancing the CPU weighting.
The part of the change that will happen quickly is the hardware side: Kaveri and Steamroller. Kaveri will attempt to eliminate some of the bottlenecks of the original Bulldozer design and will bring a drastic update to the GPU.
With Kaveri launching so late in Q4, it will likely not have time to affect this quarter's financials positively. What I am hoping for in the near term is evidence as to how Kaveri could affect AMD's product stack going into 2014.
Given that Kaveri is AMD's first HSA chip to be released to the public, I am hoping that at APU 2013 AMD will lock down a more definitive date for retail availability, disclose more of the hardware design, and possibly announce HSA software support.
A $50M decline in PC revenues for AMD was cited as the cause of the 30% sell off post Q3, and this $50M was lost mainly due to lower notebook sales. During the Q3 call, an analyst from Nomura estimated AMD's notebook sales were down 10%. It seems as if analysts are concerned as to whether or not AMD can protect the PC revenues the company generates.
Kaveri's launch in Q1 2014 could continue to build on the momentum AMD started in the desktop channel when the company introduced the Richland APU in Q2. Provided the performance is good, AMD could then carry this momentum into the second quarter in 2014 with a strong showing for Kaveri in mobile. This strength could allow AMD to grow, rather than preserve, revenues going into 2014. But all this is dependent on performance and availability, which we will not know more about until near the end of this quarter.
Additional disclosure: I own both shares and options in AMD and actively trade my position. I may add or liquidate shares/options at any time. I am short NVDA via a very small number of puts that I may liquidate at any time.