The Hybrid Memory Cube ("HMC") is an interesting bird. It was announced by Micron (NASDAQ:MU) and Samsung (OTC:SSNLF) back in 2011. The aim was to reduce power consumption while dramatically increasing performance - a computing panacea. Mass production was originally targeted for 2015, but that date was quietly pushed back (Micron is only sampling modules, and those are quite stale). I believe this delay might explain Micron's lack of execution:
Since the HMC is misunderstood by nearly everyone outside of the supercomputing industry, allow me to explain why it was invented in the first place. Intel (NASDAQ:INTC) outlines the problem to be solved in one of its patents on the technology:
Optimization of memory bandwidth, power efficiency and form factor are becoming increasingly important as memory causes significant bottlenecks to future microprocessor systems. It is common for most CPU systems to utilize a dynamic random access memory ("DRAM") based bulk memory solution to provide capacity and bandwidth. However, DRAM process technology is primarily optimized for capacity and cost to the sacrifice of both bandwidth and power efficiency. On the other hand, logic process technology conventionally used for CPUs are optimized for logic density, power efficiency and bandwidth with the drawback being higher cost and lower memory density.
Essentially, DRAM is built using a process that is incompatible with the much more capable CPU-style logic process. But because DRAM requires on-chip logic to control the physical memory cells, and that logic has to be built on the same DRAM process, manufacturers have to make sacrifices in bandwidth and power consumption. Thus, the logic on today's DRAM chips runs only in the hundreds of megahertz - an order of magnitude below where it could be in an idealized situation.
With double, triple and quad data rate techniques, the industry has conjured a clever solution to deal with the slow DRAM clock, but it is less than ideal. The industry has also invented "embeddable" DRAM ("eDRAM") that is compatible with high-performance CPU fabrication techniques, but it is lower in density and consumes more power.
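To make the data rate trick concrete, here is a minimal sketch of the arithmetic: the effective transfer rate is simply the clock multiplied by the number of transfers squeezed into each clock cycle. The 400 MHz clock and the function name are my own assumptions for illustration, not figures from any DRAM specification.

```python
# Illustrative only: the 400 MHz clock is an assumed figure, not a spec value.
def transfers_per_second(clock_hz: float, transfers_per_clock: int) -> float:
    """Effective transfer rate when several transfers share one clock cycle."""
    return clock_hz * transfers_per_clock


CLOCK_HZ = 400e6  # assumed DRAM-side clock

for name, per_clock in [("single", 1), ("double", 2), ("quad", 4)]:
    rate = transfers_per_second(CLOCK_HZ, per_clock)
    print(f"{name:>6} data rate: {rate / 1e6:.0f} MT/s")
```

The clock itself never gets faster in this model; only the number of transfers per cycle changes, which is why the approach works around the slow clock rather than removing it.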
The HMC was invented to remedy the compromises of DRAM: separate the DRAM "control logic" from the physical DRAM cells and then reconnect them using chip stacking technology. These slides are a recap from a previous article, but they help explain HMC. With the information presented above, they should make more sense:
HMC is much closer to the ideal because it has eliminated the compromise of integrated DRAM and logic (i.e., the logic base is now all logic while the DRAM is now all DRAM). Without the burden of integrated logic, the DRAM-only chips can be fabricated with higher density, higher performance and lower power consumption.
Once stacked onto the "logic base" chip, the DRAM chips can then be partitioned into "slices" to facilitate a balance of bandwidth and power consumption: each slice can be thought of as a single DIMM, so unused portions can be powered off in order to decrease power consumption at the expense of bandwidth.
Conversely, data reads and writes can be divided up over many (or all) of the slices in order to increase performance - the current HMC specification delivers nearly 4 terabits per second. This is an astronomical number compared to today's best-performing DDR4 systems, which deliver roughly a tenth of that.
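The slice trade-off can be sketched as a toy model in which bandwidth and power both scale with the number of slices left powered on. Everything here is an assumption for illustration: the class name, the slice count, the per-slice bandwidth and the per-slice power are hypothetical, chosen only so that all slices together land at the roughly 4 terabits per second mentioned above. Real devices will not scale this linearly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StackedMemory:
    slices: int = 16            # assumed slice count (hypothetical)
    slice_gbps: float = 250.0   # assumed bandwidth per slice, gigabits/s
    slice_watts: float = 0.5    # assumed power per active slice

    def operating_point(self, active: int) -> Tuple[float, float]:
        """Return (bandwidth in Gb/s, power in W) with `active` slices on."""
        if not 0 <= active <= self.slices:
            raise ValueError("active must be between 0 and the slice count")
        return active * self.slice_gbps, active * self.slice_watts


cube = StackedMemory()
for active in (4, 8, 16):
    gbps, watts = cube.operating_point(active)
    print(f"{active:2d} slices on: {gbps / 1000:.1f} Tb/s at {watts:.1f} W")
```

Powering down slices walks down both columns at once: less power, less bandwidth.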
With all control logic moved onto the dedicated "logic base" chip, you can now fab it with high-performance, CPU-style techniques (think up to 4 GHz instead of 0.4 GHz). Most importantly, the logic base now has the processing power that is necessary to communicate over a low-cost serial link (where low-cost "serial" is the opposite of high-cost "parallel").
Today's DRAM, with its compromised integrated logic, is higher in cost because it relies on a wide, low-speed parallel interface to work around the compromised clock speed. For those of you who were around before USB ("Universal Serial Bus"), a parallel printer cable was the only way to go - and it was a substantial investment because each signal required a dedicated wire. If your printer was ten feet away, then you were in for a >$100 investment.
With USB, the variety of signals was "serialized" onto fewer wires using a much higher clock speed that could "time share" those wires in order to communicate the same information at a substantial cost savings. Nearly all of computing has transitioned from parallel to serial interfaces - parallel ATA was replaced by serial ATA, parallel PCI was replaced by serial PCI Express, and USB displaced a variety of other parallel buses.
The compromise of integrated logic and DRAM is the only obstacle to the serialization of system memory. For example, a DDR4 DIMM parallel interface has 288 pins that need to be routed through the motherboard to various resources:
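The "time share" idea can be sketched in a few lines: instead of sending each 8-bit word over eight wires at once, the bits are sent one after another over a single wire and reassembled at the far end. This is a minimal illustration with function names of my own choosing; it ignores the line encoding, clock recovery and framing that a real serial bus such as USB needs.

```python
def serialize(words, width=8):
    """Flatten parallel words into one bit stream, most significant bit first."""
    bits = []
    for word in words:
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits, width=8):
    """Rebuild the original words from the bit stream."""
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


data = [0x48, 0x4D, 0x43]  # the ASCII bytes for "HMC"
stream = serialize(data)
assert deserialize(stream) == data
print(len(stream), "bit times on one wire instead of", len(data), "word times on 8 wires")
```

The single wire needs eight times as many time slots to carry the same data, which is why the serial link has to run at a much higher clock to break even.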
With the clever solutions put forth in the HMC, a substantial amount of cost, complexity and space can be removed from computing systems because memory can now be accessed over a narrow, high-speed serial bus instead of a wide, low-speed parallel bus.
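The narrow-versus-wide trade can be put as simple arithmetic: for a fixed total bandwidth, the fewer the wires, the faster each one has to run. In this sketch the 64 data lines correspond to a standard non-ECC DDR4 DIMM data bus, while the 128 Gb/s target and the 8-lane serial link are assumed figures picked for round numbers, not values from the HMC specification.

```python
def per_wire_rate(total_bits_per_s: float, data_wires: int) -> float:
    """Signalling rate each wire must sustain to hit the total bandwidth."""
    return total_bits_per_s / data_wires


TARGET = 128e9      # assumed total bandwidth, bits per second
WIDE_WIRES = 64     # data lines on a non-ECC DDR4 DIMM
NARROW_WIRES = 8    # hypothetical serial lane count

wide = per_wire_rate(TARGET, WIDE_WIRES)
narrow = per_wire_rate(TARGET, NARROW_WIRES)
print(f"wide bus:   {WIDE_WIRES} wires at {wide / 1e9:.0f} Gb/s each")
print(f"narrow bus: {NARROW_WIRES} wires at {narrow / 1e9:.0f} Gb/s each")
```

The narrow bus only works if the logic driving it can keep up with the higher per-wire rate, which is exactly what the fast logic base provides.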
This serial interface is what differentiates the HMC from the competing High Bandwidth Memory ("HBM") from AMD (NASDAQ:AMD) and Nvidia (NASDAQ:NVDA), who still maintain the expensive parallel interface. The HMC's serial interface is a huge advantage because it allows flexible, daisy-chained arrangements of HMC devices. From a Micron HMC patent application:
There is also an HMC hub specification that can be configured in a variety of structures. From the same patent application:
It looks complicated but it is just Lego-like processing and memory. With this sort of flexibility, you can create a computing, memory and storage fabric that can scale in all three dimensions - with integrated fault tolerance. This is really astonishing technology.
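To give a feel for the daisy-chain idea, here is a minimal sketch in which the host talks only to the first cube, and each cube either services a request addressed to it or passes the request down the chain. The class, the method names and the integer cube IDs are hypothetical and are not taken from the patent application or the HMC specification.

```python
from typing import Optional


class Cube:
    def __init__(self, cube_id: int):
        self.cube_id = cube_id
        self.memory = {}                     # address -> value
        self.next: Optional["Cube"] = None   # next cube in the chain

    def handle(self, target_id, op, addr, value=None, hops=0):
        """Service the request locally or forward it; returns (data, hops)."""
        if target_id == self.cube_id:
            if op == "write":
                self.memory[addr] = value
                return None, hops
            return self.memory.get(addr), hops
        if self.next is None:
            raise LookupError(f"no cube {target_id} in chain")
        return self.next.handle(target_id, op, addr, value, hops + 1)


def build_chain(n: int) -> Cube:
    """Link n cubes in a row and return the one attached to the host."""
    cubes = [Cube(i) for i in range(n)]
    for a, b in zip(cubes, cubes[1:]):
        a.next = b
    return cubes[0]


host_link = build_chain(4)
host_link.handle(2, "write", 0x10, 99)
value, hops = host_link.handle(2, "read", 0x10)
print(f"read {value} from cube 2 after {hops} forwarding hops")
```

Adding capacity in this model means linking another cube onto the end of the chain; the host connection does not change, at the cost of extra hops for the far cubes.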
Summary and Conclusion
If you ever teach a yodeling class, probably the hardest thing is to keep the students from just trying to yodel right off. You see, we build to that.
- Deep Thoughts by Jack Handey
This article is the first in a series (serial?) of articles that are prerequisite to understanding what is about to happen in the computer memory and storage markets.
I've discovered a likely explanation for this curious structure but it isn't what you think.
Part Two: Everybody Gets a Core.
Disclosure: I am/we are long INTC, MU.
I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.
Editor's Note: This article discusses one or more securities that do not trade on a major U.S. exchange. Please be aware of the risks associated with these stocks.