So, for the past three months I have been banging the table, metaphorically of course, about the future prospects for Advanced Micro Devices (AMD). And it seems like the market has finally caught up with that analysis. Last week, AMD's stock closed very strongly, a point noted by Peter Pham over the weekend, and set itself up for a breakout above $2.75 per share.
Well, with the announcement of the details of hUMA - heterogeneous Uniform Memory Access - on Tuesday, the modest price targets Peter discussed in his article were pretty much blown out of the water. AMD closed Wednesday at $3.22 and followed up on Thursday with a close at $3.41. Volume on both days was very heavy, indicating both strong buying and a bit of short covering.
AMD is not the only major tech stock to break out recently, and one could make the argument that it was just AMD's turn, after Microsoft's (MSFT) better-than-expected earnings allayed investors' fears about the future. Microsoft closed on Thursday above $33 per share, a multi-year high and a harbinger of a potentially massive move out of its very long range trade. Intel (INTC) pushed back over $24 on news that it has picked Paul Otellini's successor. Even Nvidia (NVDA) looks like it wants to make a run to $15 per share, though for the life of me, I still do not understand that investment thesis.
The Two-Headed Hydra Solution
hUMA (part of AMD's Heterogeneous System Architecture, or HSA) is something that we've known AMD has been working towards for a long time. The problem lay in not knowing the details, and now we have them. Not only will the CPU and GPU share the same memory space directly, but because that memory is cache coherent, the hardware itself enforces data integrity, which makes software development easier. Moreover, because hUMA can address virtual memory, it opens the way for mixing and matching processors of all types to create truly purpose-built computing units.
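To make the difference concrete, here is a minimal toy sketch in Python. It is not AMD's API or any real driver interface: plain Python buffers stand in for CPU and GPU memory, and the function names (run_with_copies, run_shared) are hypothetical. It only illustrates the idea that a discrete memory model has to stage data back and forth, while a shared, coherent address space lets both processors work on the same bytes.

```python
# Toy model only: ordinary Python buffers stand in for CPU and GPU memory.
# The function names are hypothetical and do not correspond to any real API.

def run_with_copies(data: bytearray) -> int:
    """Discrete-memory model: stage data into a separate 'GPU' buffer, then copy back.

    Returns the number of bytes moved between the two buffers.
    """
    gpu_buffer = bytearray(data)              # copy 1: 'CPU' memory -> 'GPU' memory
    for i in range(len(gpu_buffer)):          # stand-in for a GPU kernel
        gpu_buffer[i] = (gpu_buffer[i] * 2) % 256
    data[:] = gpu_buffer                      # copy 2: 'GPU' memory -> 'CPU' memory
    return 2 * len(data)


def run_shared(data: bytearray) -> int:
    """Shared-memory model: the 'GPU' works on the very same bytes the 'CPU' owns.

    Returns the number of bytes moved, which is zero because nothing is staged.
    """
    view = memoryview(data)                   # a view of the same memory, not a copy
    for i in range(len(view)):                # stand-in for a GPU kernel
        view[i] = (view[i] * 2) % 256
    return 0


if __name__ == "__main__":
    a = bytearray([1, 2, 3, 4])
    b = bytearray([1, 2, 3, 4])
    print("bytes moved with copies:", run_with_copies(a), "result:", list(a))
    print("bytes moved when shared:", run_shared(b), "result:", list(b))
```

Both paths produce the same result; the only difference in this sketch is the amount of data shuffled around to get there, which is the overhead a shared address space is meant to remove.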
Make no mistake; HSA is an important step in the evolution of computing.
Why is this? General-purpose CPUs are good at single-threaded tasks while GPUs are parallel processing beasts. When AMD started down this road it anticipated a market that is only just emerging. With Bulldozer it made the design decision to, in effect, cripple its single-threaded floating point ability and let the GPU, which is much better suited to the task, handle the multi-threaded tasks. The problem is that most software at the time was still heavily single-threaded. Some of this was due to Windows, some of it was simply that the benefits did not outweigh the cost of the changes, and some of it was software designers writing code optimized for the dominant player in the market, Intel.
But with each successive APU the firm has released - from Brazos and Llano to Kabini and Richland - the merger of the CPU and the GPU has continued, simplifying the design and raising performance. That has not been enough to challenge Intel's single-threaded performance, but the products have been getting better, albeit slowly. As we continue into 2013, more high-end games and applications are heavily multi-threaded, and performance per dollar is swinging AMD's way as GPGPU computation becomes more important along with price.
We are getting a hint of what hUMA will offer with Kabini/Temash, and Sony's (SNE) lead architect Mark Cerny recently confirmed that the PS/4's 8-core Jaguar-based SoC will have a shared memory architecture, one that is likely hUMA-based. We will see AMD release a consumer version of this SoC at some point in 2013 - likely having 4 cores.
But with the upcoming Kaveri, based on AMD's next big CPU core, code-named 'Steamroller,' we will finally see a fully integrated memory architecture to which the CPU and GPU will have equal access.
The important takeaway about hUMA and HSA in general is that it can radically simplify the SoC, creating a number of design synergies that do away with currently redundant caches and copy/fetch units that are needed to maintain data integrity. This article at Ars Technica does a good job of breaking it all down.
Power Savings Through Simplicity
Now system designs can make better use of a smaller number of components (read: memory chips that need to be powered up and down), improving overall system power usage as well as streamlining the code needed to make it all work. Power utilization is not just a function of process node. If a chip on a 22nm process needs twice as many transistors as a chip on a 28nm process to achieve the same performance, there is no advantage to the 22nm process. In fact, the smaller process becomes a necessity just to compete with the more streamlined architecture being produced on a cheaper process.
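A rough back-of-the-envelope check of that claim, under a loudly stated assumption: that the area per transistor scales with the square of the node name. Real processes do not follow this idealized rule exactly, so treat the number as illustrative only. The helper name relative_die_area is hypothetical.

```python
def relative_die_area(node_nm: float, baseline_nm: float, transistor_ratio: float) -> float:
    """Die area relative to the baseline chip.

    ASSUMPTION: area per transistor scales with (node_nm / baseline_nm) ** 2.
    This is an idealized rule of thumb, not a property of any real process.
    """
    area_per_transistor = (node_nm / baseline_nm) ** 2
    return transistor_ratio * area_per_transistor


if __name__ == "__main__":
    # A 22nm chip that needs twice the transistors of a 28nm chip.
    ratio = relative_die_area(node_nm=22, baseline_nm=28, transistor_ratio=2.0)
    print(f"22nm die area relative to the 28nm die: {ratio:.2f}x")
```

Under that assumption the 22nm chip ends up roughly 1.23 times the area of the 28nm chip, so the shrink is more than consumed by the extra transistors.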
Simpler designs will drive die size down along with heat generated, simplifying overall system design and reducing costs. I keep hearing from Intel apologists that AMD sells its chips so cheaply because it has no pricing leverage. That may have been true in 2011-12, but in 2013 it will be Intel that has to accept lower margins as the price/performance gap has narrowed significantly, especially at the price points where most computers are sold.
One can almost make the argument that part of the reason we have seen so much done to improve power gating and process shrinkage is the enormous complexity of the multiple memory sub-systems. Haswell is a product dominated at the design level by the attempt to overcome these complexity issues through sophistication. On the other hand, combining the system memory into a single unit that processors of all stripes can write to directly should open up all new design vectors for power and heat management by reducing complexity and raising total CPU/GPU utilization rates.
Think about this in the context of the upcoming PS/4, its SoC and the 8GB of GDDR5 that it will be using. There is no way Sony will be selling the PS/4 for $599 to start out with. To be competitive in today's market, the price will have to be below $399, and more likely $349 or $299, to have a prayer, and the console will have to do everything in no more than 100 watts under full load, likely a lot less. This is, by far, the most ambitious APU AMD has designed and yet it will power a device that has to be cheaper than a typical mid-range laptop.
The HSA standard that AMD is putting together along with other members of the HSA Foundation, which include Qualcomm (QCOM), ARM Holdings (ARMH) and Texas Instruments (TXN), is creating a framework for different instruction sets to co-exist on the same chip, fulfilling the promise of OpenCL and the future of instruction-set-neutral code.
The move this week in AMD's stock on hUMA's announcement is confirmation that the market finally sees where this all leads.