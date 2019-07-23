Science-fiction pioneer Robert A. Heinlein was an author, aeronautical engineer, and retired Naval officer who coined the term "grok," which is defined, in part, as "to understand intuitively," where "intuition" is "a quick and ready insight."

This is an important refinement of the term "understanding" when it comes to Moore's law because, while many people are quick to recite their well-rehearsed, superficial understanding of Gordon Moore's transistor-doubling observation that we've all come to know and love, they don't grok it, in that they couldn't tell you what Intel (INTC) meant when it criticized AMD (AMD) for "gluing chips together" instead of making a proper single-chip CPU.

Moore's law is the observation that the number of transistors in a dense integrated circuit doubles about every two years.
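To make the doubling concrete, here is a minimal sketch of the arithmetic behind that observation. The function name and the one-billion-transistor starting point are hypothetical, chosen only for illustration; the two-year doubling period is the one stated above.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count assuming one doubling per fixed period.

    This is just the observation restated as N(t) = N0 * 2**(t / T),
    not a model of any particular manufacturer's roadmap.
    """
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    start = 1_000_000_000  # hypothetical starting point: one billion transistors
    for years in (2, 4, 10):
        print(f"after {years:>2} years: {projected_transistors(start, years):,.0f}")
```

Ten years at this pace is five doublings, or a 32-fold increase, which is why even a small slip in the doubling period compounds quickly.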

I've been toying with this article for too long - first rejected by Seeking Alpha for being "in the weeds" and then becoming way too verbose for my own taste. For the sake of brevity, I need to get to the point. Moore's law seems simple if we just substitute "single chip" for "dense integrated circuit" because that's what's been true since the advent of the observation. But the industry now concedes that it will need to leverage chiplets in order to keep pace. This is where the "glue" accusation came in. Intel is essentially implying that AMD's implementation of chiplets doesn't qualify as a "dense integrated circuit." And they're right.

It is necessarily going to take some paragraphs in order to make this clear (albeit still very technical). So, starting at the beginning, let's go back to the nanoscale image of a Chipworks cross section for a typical modern processor that I discussed in my last article, which is a prerequisite for this one if this image seems alien to you:

Source: Dick James, Chipworks

Time Enough for Love

This is a very important image to be grokked by investors. To recap, while this is a good snapshot of the nanoscale complexity contained within a typical modern processor, realize that it doesn't even come close to representing the full "3D" complexity at the macro level. But that's neither here nor there with respect to the scope of this article. Just realize that transistors are impossibly small.

Also realize that, while a typical processor now contains billions of transistors, there are only 28 of them present in this cross section - tiny fins in the silicon (dark layer at the bottom). This is important because all that other stuff above is just wiring (in light grey) called "metalization." The smaller metalization layers at the bottom connect the transistors to each other in the complex arrangements required for binary computation. The higher levels of metalization become increasingly coarse in order to interface the chip with the outside world.

This is the most important aspect to understanding Moore's law - it is not currently feasible to externally interface transistors from chip-to-chip at nanoscale precision. To get a better idea of this issue with respect to this precision requirement, we can zoom out on this cross section:

Source: Dick James, Chipworks, annotations by Stephen Breezy

That seemingly enormous bump on top of those tiny circuits is called a "micro-bump" in the chip packaging industry. They use the prefix "micro" because, on a human scale, these bumps are still quite tiny (measured in the tens of microns). The cool kids simplify the term "micro-bump" down to "bump" in order to save a few thousand syllables over the course of their every day. The point here, again, is to realize that transistors are impossibly small.

These bumps are there to get the data in and out of the chip in the form of binary electrical signals. In a multi-chip computing solution - which is pretty much everything non-trivial in today's world of divided, never-the-twain processor and memory - sending data through these bumps is a tremendous sacrifice versus doing it all in the time- and energy-efficient nanometer silicon at the bottom of a single chip.

The more that you can do on a single chip, the more you can avoid the giant speed bumps. When Intel talks about "gluing chips together," they're talking about interfacing multiple chips through these speed bumps. In order to relay a signal to another chip, you've got to fill a gigantic pipeline of capacitance using a tiny spigot. And this is requisite in both directions - out of the processor and into the other chips and then back. This takes an eternity and consumes much more energy versus on-chip data transport.
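The "pipeline of capacitance" point can be put in rough numbers. The sketch below uses the standard relation that charging a capacitance C to a voltage V draws an energy of C x V^2 from the supply. The capacitance and voltage values are placeholders picked only to show how the cost scales; they are not measurements of any Intel or AMD part, and the function names are hypothetical.

```python
def switching_energy_joules(capacitance_farads, voltage_volts):
    """Energy drawn from the supply to charge a capacitance C up to V: E = C * V**2."""
    return capacitance_farads * voltage_volts ** 2


def energy_ratio(off_chip_farads, on_chip_farads, voltage_volts):
    """How many times more energy the off-chip path costs at the same voltage."""
    off_chip = switching_energy_joules(off_chip_farads, voltage_volts)
    on_chip = switching_energy_joules(on_chip_farads, voltage_volts)
    return off_chip / on_chip


if __name__ == "__main__":
    # Placeholder assumptions, for illustration only:
    on_chip_wire = 20e-15   # 20 femtofarads for a short on-chip wire
    off_chip_path = 2e-12   # 2 picofarads for bump + package + board trace
    volts = 1.0
    print(f"off-chip costs ~{energy_ratio(off_chip_path, on_chip_wire, volts):.0f}x the energy")
```

Because the energy scales linearly with capacitance, any signal that has to leave the die inherits the full load of the bump and everything behind it, no matter how small the transistor driving it is.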

If a processor wants to read or write data to DRAM, for example, the data needs to traverse many hundreds - sometimes thousands - of these bumps in both directions (out of the CPU and then again into the memory). A single DDR4 module has 288 pins, for example. Most of these pins are tied directly back to the processor. At the bleeding edge, Intel is rolling out processors ("Cascade Lake") with 12 memory channels, necessitating 5,903 individual connections to the system board.
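A quick back-of-the-envelope check shows how much of that connection count memory alone could account for. This is a rough upper bound, assuming one module per channel and every module pin routed to the processor (in practice some pins are power and ground). The constants come from the figures above; the function name is hypothetical.

```python
DDR4_PINS_PER_MODULE = 288   # from the article
MEMORY_CHANNELS = 12         # from the article (Cascade Lake example)
SOCKET_CONNECTIONS = 5903    # from the article


def memory_pin_upper_bound(channels, pins_per_module=DDR4_PINS_PER_MODULE):
    """Upper bound on memory-related pins: one module per channel, every pin routed."""
    return channels * pins_per_module


if __name__ == "__main__":
    memory_pins = memory_pin_upper_bound(MEMORY_CHANNELS)
    share = memory_pins / SOCKET_CONNECTIONS
    print(f"{memory_pins} memory pins, about {share:.0%} of {SOCKET_CONNECTIONS} connections")
```

Even as a loose bound, the result suggests that well over half of the package's connections exist to reach memory.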

Source: AnandTech, Ian Cutress

This absurdity is neither a sustainable nor cheap method of compensating for the increasing divide between processor and memory. In the world of computer science, this divide is known as the von Neumann Bottleneck. In many instances, modern processors largely sit idle, just waiting for data to traverse these bumps between processor and memory.

This excruciating journey of the electron between chips is the key to grokking Moore's law, which is rooted in both time and energy.

Time for the Stars

In the past, I've tried to articulate the conundrum of existing photonic technology. To recap, photonic ("optical") communication technology is orders of magnitude faster than electrical communication technology but the materials and processes used to make photonic chips have been fundamentally incompatible with the silicon ("CMOS") manufacturing process. That is, photonic chips have been fabricated on ultra-fast, standalone chips and then interfaced through an ultra-slow, high-energy electrical interface back to the regular computing environment.

This is backwards.

So there's necessarily lots of time and energy wasted converting between the electrical and optical domains, which is why high-speed photonics are only currently leveraged in expensive data center and telecom gear. Just realize that those implementations are compromised by the high-energy, high-latency, electrical chip-to-chip interface. Curiously, in 2016, Intel announced that they had developed a process that would allow the two technologies to coexist on the same chip - panacea. But, to date, Intel has yet to release any processors with integrated photonics.

Moore's law was required because of those pesky bumps, which could only be pushed to pathetic nanosecond speeds. With Intel's silicon photonic capabilities sitting next to those silicon transistors at the bottom of the chip, the need for Moore's law largely vanishes. Transistors, which operate with picosecond switching speeds, can now speak directly to photons, which operate with even faster attosecond switching speeds - without traversing the metalization layers that include those giant speed bumps. Behold, a wafer of Intel's forthcoming Ice Lake processor chips:

Source: CNET

Clearly visible on each processor die are integrated photonic cavities. Integrated silicon photonics finally realized. Grok that these ports bypass all of the metalization layers and speed bumps and drill down directly to the transistors - bypassing all of the electrical delays that largely necessitated Moore's law. Chip-to-chip communication at light speed. For the first time, a "dense integrated circuit" can span multiple chips and Moore's law matters heaps less.

In my opinion, this is one of the most important technology developments to come along since the advent of the transistor. Affordably, and at full speed, we can now leverage light instead of electricity to tie chips together at the transistor level, chip level, motherboard level, and beyond. Intel indicates that these are Thunderbolt ports. Thunderbolt was initially developed as a photonic technology but was eventually converted to electrical, with the promise of changing back some "years" into the future. It looks like we've reached that point.

With Ice Lake's four 40Gbps Thunderbolt ports, this works out to a total of 160Gbps of bandwidth. Now, Thunderbolt is composed primarily of PCIe lanes, with four lanes operating at 8Gbps each for 32Gbps of each port's 40Gbps. With PCIe 4.0 already shipping, this number gets bumped up to 264Gbps. Both of those numbers are staggering for a processor that has a power specification ranging from 9 to 28 watts. It is unlikely that they're going to leverage all of this bandwidth for external connections.
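The bandwidth arithmetic can be laid out step by step. The article does not show its working for the 264Gbps figure, so the sketch below is only one reading that reproduces it, alongside an alternative reading for comparison. It assumes that PCIe 4.0 doubles the per-lane rate to 16Gbps and that the non-PCIe remainder of each port stays at 8Gbps. All names are hypothetical.

```python
PORTS = 4                 # Thunderbolt ports on Ice Lake, per the article
PORT_TOTAL_GBPS = 40      # per-port total, per the article
PCIE_LANES_PER_PORT = 4   # per the article
PCIE3_LANE_GBPS = 8       # per the article
PCIE4_LANE_GBPS = 16      # assumption: PCIe 4.0 doubles the per-lane rate


def pcie_gbps_per_port(lane_gbps):
    """PCIe bandwidth carried by one port at a given per-lane rate."""
    return PCIE_LANES_PER_PORT * lane_gbps


total_today = PORTS * PORT_TOTAL_GBPS                       # all four ports today
pcie_today = pcie_gbps_per_port(PCIE3_LANE_GBPS)            # PCIe share of one port
non_pcie = PORT_TOTAL_GBPS - pcie_today                     # remainder of one port
pcie4_all_ports = PORTS * pcie_gbps_per_port(PCIE4_LANE_GBPS)

# Reading A: doubled PCIe on all ports plus the remainder counted once.
reading_a = pcie4_all_ports + non_pcie
# Reading B: doubled PCIe plus the remainder on every port.
reading_b = PORTS * (pcie_gbps_per_port(PCIE4_LANE_GBPS) + non_pcie)

if __name__ == "__main__":
    print(total_today, pcie_today, non_pcie, pcie4_all_ports, reading_a, reading_b)
```

Under these assumptions, reading A gives 264Gbps and reading B gives 288Gbps; either way, the aggregate is far more than a 9 to 28 watt part would plausibly dedicate to external cables.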

Stranger in a Strange Land

Fourteen years ago, Intel told us that their expectations were that the metalization layers ("copper wiring" in the context of the linked article) would become the fundamental limitation of computing. They also told us that they were betting on photonics in order to overcome this limitation. The relevant quote is here:

Optical connections can carry thousands of times more data per second than copper wires can. But existing optical components, which are made out of such exotic semiconductors as gallium arsenide and indium phosphide, are far too expensive for use in individual computers or even local networks. If you could make optical devices out of silicon, which is cheap and, at least for a company like Intel, easy to manufacture, that would change everything. The move to silicon optics would add a basic new capability to silicon chips: the ability to manipulate and respond to light. Companies would likely exploit that capability first by replacing copper connections with optical links in networks. But eventually, silicon photonics might also replace copper wires between processors within a single chip. Chip designers also envisioned using silicon optics in the internal clocks that microprocessors use to execute instructions, dramatically increasing clock speeds–and thus computing speeds. -Pat Gelsinger, Intel SVP

In a previous article, I surmised that the marketing term "Optane" was a mash-up of the words optical and octane. With some additional years' worth of Intel disclosures now public, I am much more confident in that guess. It all starts to come together with Ice Lake:

The photonic PCIe lanes now offer much lower latency so it makes a lot more sense to put system memory on that bus.

For extra performance and durability, the mysterious and unexplained "memory-side cache" that Intel introduced with Skylake will sit in front of next-gen Optane, which will probably leverage 16 to 32 layers of cells for a capacity of at least 128GB to 256GB per chip to be used as system memory and storage for both CPU and GPU.

In order to save the cost of DDR4's 288 pins per channel, the cached Optane will be tied back to the processor with a photonic link (likely running the Thunderbolt ports in a lightweight, high-speed NVMe mode).

It is likely that low-end systems will leverage one of the four photonic ports while high-end systems leverage two.

To further mitigate the slower Optane performance as system memory, Intel's mysterious high-bandwidth, low-latency cache will be implemented as a last level cache on the CPU-side. They may play some other tricks in this cache but that's out of the scope for this article.

Because this new architecture will do much more with much less, Intel's "Project Athena" exists in order to shift consumer focus from pushing cores, gigahertz, and total system memory over to actual real world computing power ("experiences," in Intel lingo).

Intel is giving away the Thunderbolt specification to be used in USB 4.0 because a single-chip processor with integrated photonics wins versus a two-chip solution, every time. The hodgepodge of HDMI, DisplayPort and other cables will finally collapse and Intel will be very likely to find themselves providing both sides of multimedia connections (inside both set-top box and television or monitor, for example).

The Moon is a Harsh Mistress

Yes, with their Cascade Lake series of processors, Intel is also now guilty of "gluing" together processor chips, albeit with much more elegance and energy efficiency than AMD. Behind the scenes, however, Intel has what I am calling "Moore's Glue" or what academics have been calling a "photonic network-on-chip" ("NoC" in shorthand). I suspect that this is what provoked the "glue" accusation in the first place - Intel has photonics coming down the pipeline in a big way.

When Optane first arrived, it was criticized by everyone because its performance was hindered by its placement on the very high-latency (by comparison) PCIe bus. With an on-chip photonic PCIe bus, latency will decrease dramatically. It will also bring the high-performance Omni-Path fabric capabilities on-chip. I fully expect that we'll see Intel's "near-far" memory hierarchy built on some form of their on-chip photonic technology.

All of this means that the "dense integrated circuit" definition of Moore's law will now be extended beyond a single chip, which is why Intel is still producing chips at 14 nanometers and not investing much in EUV technology.

Source: Hotchips/Intel

Takeaways

When Intel announced their mesh processor fabric, I giggled a little bit because this all adds up. With a photonic NoC tying together the various chiplets of the processor (a la Cannonlake), all of the cores become directly accessible to each other in an all-to-all manner. So a ring bus no longer makes sense. With photonics present on Ice Lake, I can't fathom how far ahead Intel is versus their competitors.

Curiously, one would assume that the insiders at Apple (AAPL) would be the first to see what's on Intel's road map. And that's what's puzzling - from reputable sources, there have been plenty of rumors that Apple will be dropping Intel in the near future in an effort to homogenize their development platform on ARM processors. It follows that Apple has rebuffed Intel for something better.

I have a hunch what "something better" is but that is an article unto itself. Here's a hint:

Micron (MU) manufactured photonic DRAM for the academics back in 2016 (video here).

Intel has acknowledged Micron for their "photonics processing" (such as in the Hotchips slide 28).

The chalcogenide material used in 3D Xpoint / Optane is widely considered the Holy Grail of photonics by industry experts.

From an Intel suit against a former employee who joined Micron, we know that "only a few hundred people in the world have specialized knowledge pertaining to 3D XPoint, and the processes for developing and manufacturing 3D XPoint are not written in any textbook or taught in any school."

With this week's trade kerfuffle between Japan and South Korea, Apple, Amazon (AMZN), Microsoft (MSFT), and Google (GOOGL) (GOOG) have all sent representatives to gauge the fallout. I've speculated in the past that, as memory gains computational features, it becomes an even more important part of the system.

This could lead to some interesting acrobatics in the industry this year.

Disclosure: I am/we are long MU. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.