Intel Optane: Someone Unplug The Smoke Alarm

About: Intel Corporation (INTC)

Summary

Intel rolled out a DRAM-killing Optane on a Sunday.

They forgot to issue a press release and website feature.

The final product specifications are curious.

Last Sunday in front of a dark and empty auditorium, Intel (INTC) executives silently allowed a journalist embargo to expire. This embargo was put in place during the week prior for journalists who were invited to Intel's Folsom campus for a disclosure on the P4800X memory product - the first new commercial memory technology to hit the market in the last 30 years.

Instead of restating the sentiment in my own words, I will just link to an article by David Manners over at ElectronicsWeekly.com - The Curious Saga of 3D XPoint. It is worth a full read, but it can be summarized as follows:

The strangest product launch in the industry's history has been the saga of Intel's 3D XPoint.

I'm in full agreement with David here but, as usual, I'm going to up the ante a bit. The P4800X specifications are the key to this end. We know from the previously-disclosed photos of 3D XPoint that the chips themselves contain a bunch of extra space for error correction. If you look closely, you can see each memory "tile" separated by the faint horizontal and vertical address lines:

[Figures: 3D XPoint Dies; 3D XPoint ECC Word]

From this inspection, we can see there are 576 memory tiles on each chip (remember, there are two decks in 3D, so each tile has one beneath it). Since memory architecture is only efficient in powers of two (2^n), we know that Intel and Micron (MU) designed the chip with an extra "spare tire" bit for every byte (where a "byte" is 8 bits).

So, for actual data storage, there are 512 memory tiles, each with 32 megabytes, for a total of 16 gigabytes. The additional "spare tire" capacity amounts to the 64 remaining tiles, which works out to 2 gigabytes. Without going into senseless minutiae, just realize that each chip is designed to be an island unto itself.

That is to say that if a single 16 gigabyte chip is deployed into a product, then that chip is designed to meet the rated specifications for capacity, latency, bandwidth and durability without any outside assistance. If they needed more durability, then they would have built more spare tires on the chip.
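The tile arithmetic above can be checked with a few lines of Python. This is only a sketch of the article's own numbers: the tile counts and the 32 megabytes per tile come from the text, the variable names are made up for illustration, and binary units (1 gigabyte = 1,024 megabytes) are assumed so that the totals come out to the round figures quoted.

```python
# Figures taken from the article; names are illustrative only.
TOTAL_TILES = 576      # tiles counted on the die photo
DATA_TILES = 512       # largest power of two that fits in 576
MB_PER_TILE = 32       # capacity per tile as stated in the text
MB_PER_GB = 1024       # assumption: binary units

spare_tiles = TOTAL_TILES - DATA_TILES
data_gb = DATA_TILES * MB_PER_TILE / MB_PER_GB
spare_gb = spare_tiles * MB_PER_TILE / MB_PER_GB

# Spare bits available per 8-bit byte of data.
spare_bits_per_byte = spare_tiles / DATA_TILES * 8

print(f"spare tiles:         {spare_tiles}")
print(f"data capacity:       {data_gb:g} GB")
print(f"spare capacity:      {spare_gb:g} GB")
print(f"spare bits per byte: {spare_bits_per_byte:g}")
```

The 64 spare tiles against 512 data tiles are where the "one extra bit for every byte" reading comes from: 64/512 is one part in eight.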

Curious

From the PC Perspective write-up (and many others), we also know that each of the P4800X cards includes 28 of the 16 gigabyte 3D XPoint chips ("dies" as they are sometimes called after being sliced and diced from the wafer) for a total of 448 gigabytes. If there is already an extra two gigabytes for error correction on each 16 gigabyte chip, then why is the total capacity of the P4800X rated at only 375 gigabytes?

Upon further inspection, it is revealed by Intel's Memory Drive product specification sheet that the capacity is further reduced to 320 gigabytes when using the P4800X as system memory in lieu of DRAM:

3. Total physical capacity is 375GB. Total usable capacity toward Memory Drive is 320 GiB.

There is really only one reason to explain such a poor utilization of raw capacity in a device like this: durability is too low. That is, the spare memory will be put into use as the actual in-use memory wears out and is retired. This is done with NAND flash now but to a much, much smaller degree (e.g., a drive with 512 gigabytes of raw capacity is now typically sold as a 500 gigabyte drive with 12 gigabytes of spare capacity).
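To put the capacity figures quoted so far side by side, here is a rough sketch that expresses spare capacity as a fraction of usable capacity. The function name and the dictionary of cases are illustrative. Note one assumption: the Memory Drive figure is quoted by Intel as 320 GiB, but it is treated here as 320 gigabytes, the same way the rest of this article treats it.

```python
def spare_fraction(raw_gb, usable_gb):
    """Spare capacity expressed as a fraction of usable capacity."""
    return (raw_gb - usable_gb) / usable_gb

CHIPS = 28           # 3D XPoint dies per P4800X card
GB_PER_CHIP = 16     # data capacity per die
raw_gb = CHIPS * GB_PER_CHIP  # 448 GB of raw data capacity

# (raw GB, usable GB) for each case discussed in the text.
cases = {
    "P4800X as SSD": (raw_gb, 375),
    "P4800X as Memory Drive": (raw_gb, 320),  # 320 GiB treated as GB (assumption)
    "typical NAND SSD": (512, 500),
}

for name, (raw, usable) in cases.items():
    print(f"{name}: {spare_fraction(raw, usable):.1%} spare")
```

On these assumptions the NAND drive carries about 2.4 percent spare capacity, while the P4800X carries roughly 19.5 percent as an SSD and 40 percent as a Memory Drive.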

When put into use in place of DRAM, wear is highest. So they need that extra 40 percent spare tire capacity in order to meet their advertised durability. And, wouldn't you know it, the current durability rating also hints toward this. From the PC Perspective write-up:

Those with a keen eye (and calculator) might have noted that the early (terabytes written) values only put the P4800X at 30 DWPD [drive writes per day] for a 3-year period. At the event, Intel confirmed that they anticipate the P4800X to qualify at that same 30 DWPD for a 5-year period by the time volume shipment occurs.
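For readers without the calculator handy, the conversion between drive writes per day and total terabytes written is simple multiplication. The sketch below assumes the 375 gigabyte rated capacity, 365-day years and decimal terabytes; the function name is illustrative and not taken from any Intel document.

```python
def terabytes_written(dwpd, capacity_gb, years):
    """Total terabytes written implied by a DWPD rating over a warranty period."""
    days = 365 * years  # assumption: no leap days
    return dwpd * capacity_gb * days / 1000  # GB -> TB (decimal)

tbw_3yr = terabytes_written(30, 375, 3)
tbw_5yr = terabytes_written(30, 375, 5)

print(f"30 DWPD for 3 years: {tbw_3yr:,.0f} TB written")
print(f"30 DWPD for 5 years: {tbw_5yr:,.0f} TB written")
```

That works out to roughly 12.3 petabytes written over three years and roughly 20.5 petabytes over five, so the five-year qualification implies about two-thirds more total writes from the same hardware.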

Intel committed to shipping Optane "for revenue" in Q1 and it appears that they've met that goal. But, like their photonic products, they're only shipping a very curious Optane product to "select customers." Does anyone know who these customers are?

Curiouser

If you take a walk down Memory Lane, you might remember that, back in 2014, HGST (now Western Digital (WD)) pieced together some of Micron's crusty old 45 nanometer phase change memory chips in order to snatch the IOPS (input/output operations per second) record:

It is important to note that this record still stands today as the Intel device is advertised at only 550,000 IOPS (at a queue depth of 16, for the techies).

The 2014 HGST attempt with Micron's old PCM handily outperforms Intel's much-lauded Optane P4800X. But, as Seeking Alpha's Electric Phred has already pointed out, the previously advertised 3D XPoint performance specifications are all over the board, with Micron's 8-lane add-in cards (even their tiny 200GB part) advertised at 1.9 million IOPS.

Amiss

The entire HGST exercise was done simply to demonstrate that, although the NVMe protocol was an improvement over the PCIe protocol, it was still leaving lots of performance on the table. HGST explained that they were only able to achieve such impressive numbers by choosing a fundamentally different PCIe architecture called polling. Their 2014 white paper sounds a lot like Intel does today:

The main motivation for the work we present in this paper is the desire to build a block storage device that takes advantage of the fast readout of PCM to achieve the greatest number of input-output operations per second (IOPS) permitted by the low physical latency of the memory medium. While spectacular numbers [6] of IOPS are touted for flash-based devices, such performance is only possible at impractically high queue depths. The fact remains that most practical data center usage patterns revolve around low queue depths [7, 8], especially under completion latency bounds [9]. The most critical metric of device performance in many settings is the round-trip latency to the storage device as opposed to the total bandwidth achievable: the latter scales easily with device bus width and speed, unlike the former. Under this more stringent criterion, modern flash-based SSDs top out around 13 kIOPS for small random reads at queue depth 1, limited by over 70 µs of readout latency of the memory medium (our measurements).
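The link the HGST authors draw between latency and IOPS at a fixed queue depth can be approximated with Little's law: average operations in flight equals throughput times average latency. The sketch below is a steady-state approximation only, with illustrative function names; it ignores host software overhead and says nothing about how any particular device behaves internally.

```python
def iops(queue_depth, latency_us):
    """Steady-state IOPS for a given queue depth and mean round-trip latency."""
    return queue_depth / (latency_us * 1e-6)

def mean_latency_us(queue_depth, iops_value):
    """Mean round-trip latency (microseconds) implied by IOPS at a queue depth."""
    return queue_depth / iops_value * 1e6

# Flash at queue depth 1 with 70 us of readout latency (figure from the quote).
print(f"QD1 at 70 us: {iops(1, 70):,.0f} IOPS")

# Latency implied by the quoted ~13 kIOPS flash ceiling at queue depth 1.
print(f"13 kIOPS at QD1: {mean_latency_us(1, 13_000):.1f} us")

# Latency implied by the P4800X's advertised 550,000 IOPS at queue depth 16.
print(f"550 kIOPS at QD16: {mean_latency_us(16, 550_000):.1f} us")
```

On this approximation, 70 microseconds caps a queue-depth-1 device at about 14,300 IOPS, the quoted 13 kIOPS corresponds to about 77 microseconds per operation, and 550,000 IOPS at a queue depth of 16 implies an average round trip of about 29 microseconds.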

The paper goes on to describe how they implemented this polling architecture. When 3D XPoint was announced, Intel updated their storage development site to outline this very need for 3D XPoint. They previously articulated the problem well in plain English (this language has since been removed, so the link above is to an archived version of the site):

Polled Mode Drivers (PMDs) are continually awaiting work instead of being dispatched to work. Think of the challenge of hailing a cab downtown on a busy Saturday night, hands waving as cab after cab passes with someone already in the back seat. Think of the unpredictability of the wait, the impossibility of saying how many minutes might be spent waiting on the curb for a ride. This is what it can be like to get a "ride" for a packet or block of data in a traditional interrupt-dispatched storage I/O driver. On the other hand, imagine the process of getting a cab at the airport. There is a cab driver watching, sitting at the front of the line, pulling up reliably in a few seconds to transport passengers and cargo to their intended destinations. This is how PMDs work and how all the components of (storage performance development kit) are designed. Packets and blocks are dispatched immediately and time spent waiting is minimized, resulting in lower latency, more consistent latency (less jitter), and improved throughput.

This raises a number of questions. If Intel has already developed the capability to unlock the vast majority of performance in Optane, then why aren't they using it? Why are the advertised specifications going to change between now and general availability? Was HGST really using Micron's old PCM, or is it possible that they were using pre-production 3D XPoint in order to achieve 3 million IOPS?

Conclusion

I have concluded nothing. This week's Optane disclosures have brought us no closer to the truth aside from Intel finally admitting that they'll be using the technology to displace and/or augment (depending on the application) DRAM as system memory.

This sort of announcement normally would have devastated DRAM makers Micron, Samsung (OTC:SSNLF), SK Hynix (OTC:HXSCF) and the others. And it definitely should have rattled AMD (NASDAQ:AMD) - but it didn't because something is burning in the kitchen over at IMFT. This burning could be the reason that Mark Adams and Mark Durcan have jumped out of the CEO seat.

Or it could be my over-active imagination again.

Homework

Please study the photonic disclosures that Intel presented at the 2010 Hot Chips conference - especially this one:

Disclosure: I am/we are long INTC, MU.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.

Editor's Note: This article discusses one or more securities that do not trade on a major U.S. exchange. Please be aware of the risks associated with these stocks.