The first AMD (NASDAQ:AMD) Ryzen CPUs are on the trucks for delivery to customers this week. This represents the first real competition to Intel (NASDAQ:INTC) in the personal computing and data center space in over a decade. AMD's share price has grown like bacteria in a Petri dish over the past year as the performance of Ryzen solidified versus Intel's current technology.
But this is where AMD investors need to be careful.
Image courtesy of Kalin Nikolov Koev.
About a year ago, I came across an Intel patent application which I had dismissed as vaporware because it seemed simply too far-fetched - there was just an astronomical amount of technology involved. And I've found that, sometimes, patents are filed to cover the bases "just in case" the sky happens to turn blue over something insurmountable.
This particular Intel patent application seemed like it fell into this bucket and so I quickly moved on. As the year progressed, however, Intel began regularly using "CNL" as an abbreviation for their upcoming Cannonlake processor. When I came across the patent more recently, I noticed that the CNL abbreviation was present - I'd discovered Cannonlake's design.
And, boy, is it a doozy!
METHOD AND APPARATUS FOR STACKING A PLURALITY OF CORES
US Patent Application 20160092396
Published: March 31, 2016
Assignee: INTEL CORPORATION
Filed: September 26, 2014
A choice quote from the application:
 The embodiments described herein provide value on several vectors:
 1) Cost reduction: Two smaller 3D-stacked die have better yield than one large, monolithic die. A cost savings of up to $350M per server processor program has been estimated depending on the actual configuration used.
 2) Reduced time-to-market: The uncore (bottom) die can be designed ahead of time and the sea-of-cores built when the client core is ready. This will enable server processors to ship about 3-6 months after the client processor introductions (today this time lag is over one year). In particular, all the I/O circuits (DDR, PCIe, QPI) can be debugged on the platform ahead of the core availability.
 3) Better mesh RC performance: Since the uncore (bottom) die is built in an older process technology, it has lower mesh latency and higher frequency due to lower RC delays in the older process and shorter mesh routing underneath the cores. The mesh frequency increase is 10% with the reduced 1-cycle horizontal latency or up to 38% with the existing 2-cycle horizontal latency.
 4) Mixing odd and even processes: Can support both Core (even process) and Atom (odd process) cores with the same uncore die. This is particularly useful for the micro-server market segment where there are presently two different product lines and a lack of an integrated south complex with the large cores.
 5) Lego-like ability to incorporate big and little cores, graphics, FPGAs, customer designed accelerators and additional L3 slices: This provides an unprecedented flexibility to customize server processors at assembly time for specific OEM work-loads and compute requirements.
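To make the yield argument in item 1 concrete, here is a minimal sketch using the textbook Poisson yield model, Y = exp(-A * D0). Nothing in it comes from the patent: the die area, the defect density, and the function names are hypothetical, both dies are assumed to share one defect density, and stacking itself is assumed to lose no parts. It only shows the direction of the effect, not Intel's actual numbers.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    return math.exp(-area_cm2 * defect_density_per_cm2)


def silicon_per_good_product(total_area_cm2: float, n_dies: int,
                             defect_density_per_cm2: float) -> float:
    """Silicon area consumed, on average, to build one working product.

    The product is split into n_dies equal dies that are tested separately,
    so a defect only scraps the small die it lands on (assumption: perfect
    known-good-die testing and lossless assembly).
    """
    die_area = total_area_cm2 / n_dies
    die_yield = poisson_yield(die_area, defect_density_per_cm2)
    return n_dies * die_area / die_yield


if __name__ == "__main__":
    TOTAL_AREA_CM2 = 6.0   # hypothetical large server die
    DEFECT_DENSITY = 0.2   # hypothetical defects per square centimeter

    mono = silicon_per_good_product(TOTAL_AREA_CM2, 1, DEFECT_DENSITY)
    split = silicon_per_good_product(TOTAL_AREA_CM2, 2, DEFECT_DENSITY)
    print(f"monolithic: {mono:.2f} cm^2 of silicon per good product")
    print(f"two dies:   {split:.2f} cm^2 of silicon per good product")
```

Under these assumptions the split design always consumes less silicon per working part, and the gap widens as the die gets larger or the process gets dirtier. A real comparison would also have to charge for stacking and bonding losses.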
And probably the most curious bit:
 FIG. 7 is a block diagram contrasting the use of a software instruction converter to convert binary instructions in a source instruction set to binary instructions in a target instruction set according to embodiments of the invention. [...] Similarly, FIG. 7 shows the program in the high level language 702 may be compiled using an alternative instruction set compiler 708 to generate alternative instruction set binary code 710 that may be natively executed by a processor without at least one x86 instruction set core 714 (e.g., a processor with cores that execute the MIPS instruction set of MIPS Technologies of Sunnyvale, Calif. and/or that execute the ARM instruction set of ARM Holdings of Sunnyvale, Calif.). The instruction converter 712 is used to convert the x86 binary code 706 into code that may be natively executed by the processor without an x86 instruction set core 714. This converted code is not likely to be the same as the alternative instruction set binary code 710 because an instruction converter capable of this is difficult to make; however, the converted code will accomplish the general operation and be made up of instructions from the alternative instruction set. Thus, the instruction converter 712 represents software, firmware, hardware, or a combination thereof that, through emulation, simulation or any other process, allows a processor or other electronic device that does not have an x86 instruction set processor or core to execute the x86 binary code 706.
There are a number of possible implications here. The stated goal is to run x86 (i.e. Intel architecture) code on non-Intel architectures like ARM (owned by SoftBank (OTCPK:SFTBY)) and not the other way around. Is Cannonlake a processor-agnostic foundry platform that will be open to all comers? Or is Intel moving away from x86? Is Apple (NASDAQ:AAPL) involved? Does this have something to do with the FPGA provision? I'll leave this discussion as an exercise for the reader.
We're also seeing STT-MRAM used in lieu of SRAM as an option. They're showing a three-fold increase in density (9MB MRAM versus 3MB SRAM). The relevant discussion is here:
 As illustrated in FIG. 18, an additional embodiment may include an L3 cache 1801 implemented as a sea-of-cores in the latest process technology. In one embodiment, L3 cache slices are added in 2 MB increments using 6T SRAM or 6 MB increments using STTM. While this is an expensive option, high-performance computing users may be willing to pay extra for configurations beyond the L3 cache size supported on the basic uncore die. In addition to the extra 48 MB of L3 cache 1801 (for a total of 84 MB L3 cache per socket), this embodiment also includes 8 large cores 1802, and a customer-designed accelerator 1803 on a single bottom die 1804.
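The arithmetic in that passage can be checked with a short sketch. The slice sizes and cache totals are taken from the quote above; the 36 MB base figure is my own inference from 84 minus 48, and the idea that one slice occupies the same footprint in either memory technology is an assumption, not something the excerpt states.

```python
# Figures quoted in the patent excerpt
SRAM_SLICE_MB = 2    # L3 increment using 6T SRAM
STTM_SLICE_MB = 6    # L3 increment using STTM
EXTRA_L3_MB = 48     # additional L3 in the FIG. 18 embodiment
TOTAL_L3_MB = 84     # total L3 per socket in that embodiment


def slices_needed(extra_mb: int, slice_mb: int) -> int:
    """Number of slices needed to supply extra_mb, rounded up."""
    return -(-extra_mb // slice_mb)  # ceiling division on integers


# Inferred, not quoted: the L3 already present on the basic uncore die
base_l3_mb = TOTAL_L3_MB - EXTRA_L3_MB

# Capacity per slice, STTM relative to SRAM (assumes equal slice footprint)
density_ratio = STTM_SLICE_MB / SRAM_SLICE_MB

if __name__ == "__main__":
    print("base L3 on uncore die (MB):", base_l3_mb)
    print("SRAM slices for the extra cache:", slices_needed(EXTRA_L3_MB, SRAM_SLICE_MB))
    print("STTM slices for the extra cache:", slices_needed(EXTRA_L3_MB, STTM_SLICE_MB))
    print("STTM capacity per slice vs SRAM:", density_ratio)
```

If the equal-footprint assumption holds, the same 48 MB would take a third as many STTM slices as SRAM slices, which is consistent with the three-fold density figure.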
The whole document is peppered with references to high-performance computing, which typically means some combination of data center and supercomputing. Will we ever see Cannonlake in the PC and mobile space?
Over at The Fool, Ashraf Eassa appears to have discovered that Coffee Lake is just a 14 nanometer version of Cannonlake. This would help to explain all of the head-scratching that is going on with respect to Intel's four-year dwell at 14 nanometers - they're moving to a radically new "Lego-like" 3D architecture and that is taking some time. In order to side-step the bull in all markets at once, they need to use all of their available fab capacity.
The key is that stacking transistors in three dimensions is a great way to put Moore's Law onto life support. The critical paths between transistors can get much shorter - so you can run them much faster (as described in paragraph 0096 in the patent). Coffee Lake is a telling example: it has drawn criticism for being the fourth 14 nanometer part in as many years.
But we know that Coffee Lake was only recently added to Intel's road map. Why? Because Cannonlake's 3D design performs so well that it's able to breathe life into the older process. Intel will create an "uncore" base die for PC and mobile markets and slap their latest cores on top. Will there be a smartphone variant that runs on the battery-sipping Atom cores on the road and morphs into a full-blown Core i7 when you plug it into your dock?
In bullfighting, the term "suerte de capote" translates into "act of the cape." That is, the bull is guided into attacking the matador's cape instead of the matador himself. I believe that this is what Intel has done. Now that AMD has caught up, they're likely to find themselves rushing through the matador's cape with lots of inertia. This applies to TSMC (NYSE:TSM), GlobalFoundries and Samsung (OTC:SSNLF) as well.
While Intel is taking a lot of heat for the continual delays, it appears that they are quietly investing in the security of their future. Although the company is touting that Coffee Lake will have ">15%" better performance than their current family of CPUs, that greater-than symbol might as well be a cape.
If there were partners to welcome into this secret, Microsoft (NASDAQ:MSFT) and Apple would be at the top of my list (I'll leave my tinfoil hat aside for the time being). Could Coffee Lake reach levels of power efficiency that obsolesce desktop processors?
The good news is that Coffee Lake was pulled forward into 2H17. That's still an eternity in the face of AMD's margin-eating Ryzen. Intel is going to be in for a couple of rough quarters.
But that should be a good buying opportunity.
Disclosure: I am/we are long INTC.
Editor's Note: This article discusses one or more securities that do not trade on a major U.S. exchange. Please be aware of the risks associated with these stocks.