As my readers are quite well aware, I'm a big fan of Apple's (AAPL) silicon development team. These folks are first to market with an ARM64-based chip, and quite frankly, that's nothing short of impressive. However, the Street is now turning its focus to a potential Intel (INTC) versus Apple debate with respect to low power CPU design prowess, using the benchmark "Geekbench 3" (which is really the only cross-platform, low-level benchmark available) as validation of some pretty bold claims.
It is my belief that the test results that have been published (and talked about even in my own article) with respect to the "Silvermont" part are neither comparable nor representative of what the chip, in a properly optimized software environment, is capable of. Further, it is my belief that the actual test itself, while certainly helpful for trying to get a feel for how certain processors behave, may not be entirely representative of CPU performance in any meaningfully complex workload.
Silvermont Versus Cyclone: Apples And Oranges?
Performance in a given piece of code isn't just dependent on "how good" the processor is, but also on how well the software is optimized for a given processor architecture. The quality of the code generated by the compiler can make a night-and-day difference on the performance of a given chip in a given software environment.
Anyway, the benchmark that is being used to compare Silvermont and Apple's Cyclone is known as "Geekbench 3". This is a benchmark that essentially runs a bunch of very small sub-tests and then spits out a score that many in the mobile world have been using as the be-all, end-all of processor performance benchmarks. I'll get to why this is likely flawed in just a moment, but the really interesting tidbit is that only Real World Tech's David Kanter (seriously, if you care at all about semiconductors and chips, you need to follow his work and his Twitter feed) seems to have asked which compilers were being used for which platforms.
Well, Mr. Kanter did, and here's what we know:
Primate Labs uses Apple's Xcode (Apple's software development suite) for iOS and OS X, GCC ("GNU Compiler Collection") 4.8 on Android, and Visual Studio 2012 for Windows. For those of you unfamiliar with these, GCC is a compiler that has been ported to a wide variety of processor architectures, including ARM, MIPS, x86-64, POWER, SPARC, and more (the list can be found here).
Now, what's interesting is that Intel actually develops its own software development tools, which include its very own compiler. While the GCC and Visual Studio 2012 compilers are excellent and can generate code for a wide variety of processor architectures, Intel's compilers are specifically designed and optimized to generate the best code for Intel's own chips - very much in the same fashion that Apple's compiler very likely generates the very best code for its own processor architectures.
To give you an idea of how big the difference can be on the same processor when a better, more targeted compiler is used, here are some Windows and Linux SPECint_base2006 results (the SPEC CPU benchmarks are the gold-standard in the industry for processor-oriented benchmarks) comparing Microsoft's (MSFT) Visual Studio 2012 and Intel's C++ compilers:
In a well-known benchmark, a more suitable compiler can lead to performance gains of anywhere from 32% to a whopping 145%.
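To make those percentages concrete, here is a minimal Python sketch of how a gain like this is computed from a baseline score and an optimized-compiler score. The function name and the sample scores are purely illustrative assumptions; they are not actual SPEC results.

```python
def percent_gain(baseline_score, optimized_score):
    """Percentage improvement of optimized_score over baseline_score."""
    return (optimized_score / baseline_score - 1.0) * 100.0


# Hypothetical scores for the same chip under two compilers (not real SPEC data)
baseline = 20.0
for optimized in (26.4, 49.0):
    gain = percent_gain(baseline, optimized)
    ratio = optimized / baseline
    print(f"{baseline} -> {optimized}: {gain:.0f}% gain, {ratio:.2f}x the score")
```

Note that a 145% gain means the optimized binary scores 2.45 times the baseline, not 1.45 times.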
GCC Isn't Optimized For Silvermont Yet
At the Intel Developer Forum, during a briefing on the Silvermont core, Silvermont chief architect Belli Kuttanna noted that Intel's own compiler ("ICC") is optimized for the Silvermont microarchitecture today, and that the upcoming GCC 4.9 will also include Silvermont optimizations. The current GCC 4.8 (with which Geekbench is compiled for Android) is not Silvermont-optimized. Now, keep in mind that every processor is different, both in terms of instruction set extensions and in terms of the actual microarchitecture. Generated code that runs really well on, say, a Cortex A15 may not run so well on a Silvermont, or vice versa.
So when we look at the Geekbench 3 results and we see the code for the Apple chip being compiled by a compiler written by the same company that designed the chip, while the binary running on the Intel chip is compiled by what is likely a less-optimized compiler (one that isn't really aware of the Silvermont core yet from an optimization standpoint), things start to look pretty non-comparable for those trying to make absolute proclamations about the relative performance levels of these CPU cores.
Comparing Intel's "Sandy Bridge" to Apple's "Cyclone" In Geekbench Subtests
Something that I thought would be interesting to look at would be the 64-bit Apple Cyclone Geekbench results that Anandtech got against an Intel "Sandy Bridge" down-clocked to a constant 1.6GHz, to get a feel for how Apple's "desktop class" architecture stacks up against an actual desktop class architecture. The Apple Cyclone results are from Anandtech, and the Sandy Bridge results were run on an i7 2600K down-clocked to 1.6GHz (I couldn't get it to run any lower). I present single-thread results only, as the i7 has twice as many cores (and each core is multi-threaded), so a multi-threaded comparison wouldn't exactly be fair.
Integer Results
|Subtest||Intel "Sandy Bridge" @ 1.6GHz||Apple "Cyclone" @ 1.3GHz|
|AES||1200 MB/s||846.2 MB/s|
|Twofish||62.3 MB/s||55.6 MB/s|
|SHA1||162.5 MB/s||477.3 MB/s|
|SHA2||71.0 MB/s||102.2 MB/s|
|BZip2 Compress||5.61 MB/s||4.52 MB/s|
|BZip2 Decompress||7.51 MB/s||7.56 MB/s|
|JPEG Compress||20.5 MPixels/s||16.8 MPixels/s|
|JPEG Decompress||52.1 MPixels/s||40.3 MPixels/s|
|PNG Compress||1.04 MPixels/s||1.14 MPixels/s|
|PNG Decompress||15.8 MPixels/s||15.2 MPixels/s|
|Sobel||53.5 MPixels/s||58.0 MPixels/s|
|Lua||1.15 MB/s||1.33 MB/s|
|Dijkstra||4.51 Mpairs/s||4.05 Mpairs/s|
Floating Point Results
|Subtest||Intel "Sandy Bridge" @ 1.6GHz||Apple "Cyclone" @ 1.3GHz|
|BlackScholes||6.50 MNodes/s||5.92 MNodes/s|
|Mandelbrot||1.24 GFLOPS||929.9 MFLOPS|
|Sharpen Filter||837.2 MFLOPS||857 MFLOPS|
|Blur Filter||1.07 GFLOPS||1.26 GFLOPS|
|SGEMM||4.42 GFLOPS||3.34 GFLOPS|
|DGEMM||2.30 GFLOPS||1.66 GFLOPS|
|SFFT||1.33 GFLOPS||1.59 GFLOPS|
|DFFT||1.07 GFLOPS||1.47 GFLOPS|
|N-Body||641.8 KPairs/s||582.6 KPairs/s|
|Ray Trace||2.02 MPixels/s||2.31 MPixels/s|
I have to say that even in a test that may or may not be representative of "real", more complex CPU workloads (and given that Apple's own compiler can do a much better job optimizing for its own chip than a generic compiler for the Intel chip), I am incredibly impressed by what the Apple team has done. They've effectively eschewed the nonsensical "core count" race and instead have built what looks to be a "Core-level" CPU core at low frequency for a smartphone (remember: scaling to 3x the frequency, which is what the Intel "Core" chips can do, is completely non-trivial). Of course, performance is a combination of frequency and performance per clock (and I'd bet that it'd be very difficult to scale "Cyclone" up frequency wise without a major redesign and some compromises along the way), but the point is that Apple designed an excellent smartphone part that maximizes the "user experience". More per-core performance, rather than "moar coars".
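Since the two chips in the tables run at different clocks (1.6GHz versus 1.3GHz), the raw scores understate Cyclone's per-clock showing. The Python sketch below is one rough way to normalize the integer results by frequency. It assumes throughput scales linearly with clock speed, which is a simplification, and the helper names are my own rather than anything from Geekbench.

```python
from math import prod

SANDY_BRIDGE_GHZ = 1.6
CYCLONE_GHZ = 1.3

# (subtest, Sandy Bridge score, Cyclone score); units match within each row
INTEGER_RESULTS = [
    ("AES", 1200.0, 846.2),
    ("Twofish", 62.3, 55.6),
    ("SHA1", 162.5, 477.3),
    ("SHA2", 71.0, 102.2),
    ("BZip2 Compress", 5.61, 4.52),
    ("BZip2 Decompress", 7.51, 7.56),
    ("JPEG Compress", 20.5, 16.8),
    ("JPEG Decompress", 52.1, 40.3),
    ("PNG Compress", 1.04, 1.14),
    ("PNG Decompress", 15.8, 15.2),
    ("Sobel", 53.5, 58.0),
    ("Lua", 1.15, 1.33),
    ("Dijkstra", 4.51, 4.05),
]


def per_clock_ratio(sandy_score, cyclone_score):
    """Cyclone score per GHz divided by Sandy Bridge score per GHz."""
    return (cyclone_score / CYCLONE_GHZ) / (sandy_score / SANDY_BRIDGE_GHZ)


def geometric_mean(values):
    """Geometric mean, which damps the effect of a single outlier subtest."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


ratios = {name: per_clock_ratio(s, c) for name, s, c in INTEGER_RESULTS}
for name, ratio in ratios.items():
    print(f"{name:17s} {ratio:5.2f}x")
print(f"{'Geometric mean':17s} {geometric_mean(ratios.values()):5.2f}x")
```

A ratio above 1.0 means Cyclone does more work per clock in that subtest. The geometric mean is used for the summary line so that one lopsided result, such as SHA1, does not dominate the way it would in a simple average.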
Good Job Apple, But Intel's Job Is Still Harder
Apple's CPU core is very impressive, and on a per-clock basis, in Geekbench 3, is very competitive with Intel's "Sandy Bridge" CPU core. I'd like to see how the Intel chips (both "Silvermont" and the bigger cores) perform when the Intel platform has the benefit of an optimized compiler (as Apple's chip does), and I would also like to see what kind of power consumption Apple's Cyclone core gets at full CPU load, but this is impressive on Apple's part.
Now, before you think that this is an ultra-bearish note on Intel, do remember that Apple's entire CPU R&D efforts focus on a single CPU core, optimized for one particular device, with a fairly fixed software ecosystem. While Apple has built one SoC, the "A7", based on the Apple "Cyclone" core, off-the-shelf GPU IP, and probably a bunch of its own other IP sprinkled throughout, think about what Intel (and the other merchant vendors) need to do.
First off, Intel builds two CPU cores: a "small" core (Atom) and a "big" core (Core), each of which goes into many different SoCs. Here's the list of the different SoCs that Intel has built just around the "Silvermont" core:
- Merrifield - this is a 2 Silvermont core + Imagination Tech Series 6 SoC intended for smartphones
- Bay Trail-T - 4 Silvermont core + Intel Gen 7 GPU SoC intended for tablets
- Bay Trail-M/D - Bay Trail-T with integrated Ethernet, SATA, and other PC I/O
- Avoton - 8 core micro-server oriented chip (with integrated Ethernet, SATA, USB, etc.)
- Rangeley - Avoton with a built-in cryptography engine
- Bay Trail-I - a part for in-vehicle infotainment (unknown specifications at this time)
Then, of course, here are all of the chips built around the "Ivy Bridge" and "Haswell" cores coming this year:
- Haswell-DT - 2-4 core part that comes packed with GT2 graphics
- Haswell-ULT - 2 core part with either GT2 or GT3 graphics, as well as an on-package PCH
- Haswell + Crystalwell - this is a notebook oriented part with 4 CPU cores, GT3 graphics, and an on-package, Intel-designed eDRAM
- Ivy Bridge-EP (6 core variant) - this is a server oriented chip that uses Intel's "Ivy Bridge" CPU core, with a new 4-channel memory controller and 15MB of L3 cache
- Ivy Bridge-EP (10 core variant) - another server oriented chip using "Ivy Bridge". Shares the same memory controller as the 6-core variant, but includes more L3 cache
- Ivy Bridge-EP (12 core variant) - yet another server chip, but this time it's a 12-core variant with 30MB of L3 cache. The difference here is that in order to keep the cores better fed, it splits the memory controller into two parts, and the die itself features a more complex ring interconnect structure
- Ivy Bridge-EX (15 core variant) - this is a 15 core version of Ivy Bridge for servers, but it supports both DDR3 and DDR4 controllers, and needs to be validated for use in 8 socket systems.
Now, on top of all of these SKUs for all of these end markets (that all have to be validated rigorously for use cases that are often much stricter than phones/tablets), the company needs to engage and support a wide variety of customers with a wide variety of needs (product, cost, support, etc.). On top of all of this, Intel not only designs these chips, but it actually manufactures them on a leading edge manufacturing process (that it develops entirely in house). Let's also not forget that Intel does its own packaging and test in-house, something that the vast majority of semiconductor companies outsource to third parties such as Amkor (AMKR) or Advanced Semiconductor Engineering (ASX).
Also, did I mention that Intel has to support a wide variety of operating systems including Microsoft's Windows, many flavors of Linux, Google's (GOOG) Android, and so on, across consumer, embedded, and datacenter environments?
So, yes, Apple designed a great, low power CPU core and system-on-chip for its iOS devices, but to compare it to a merchant vendor like Intel from both technical and business perspectives is a bit naïve.
So, What's The Bottom Line?
The bottom line is simple: Apple's new processor/SoC is a very impressive part for smartphones, and the benefit of owning the entire software and hardware ecosystem from soup to nuts is pretty evident here in that the transition to 64-bit will probably be seamless. It is important, however, not to get carried away. Apple products sell because they're Apple products, so I'm not sure that comparing an in-house chip optimized for a high-margin device, running code compiled with Apple's own compiler, against Intel's chips is really going to yield anything scientifically conclusive.
On the business side of things, we can conclude the following:
- Apple Will Not Switch To Intel Designs In The iPad/iPhone - I think it's clear that Silvermont and Cyclone are in the same performance league (although nothing is conclusive about power consumption at full load for Cyclone - my guess is that the Intel part is more frugal on power by virtue of process technology and more sophisticated power management), and given the pace of enhancements in both camps, it is unlikely that Intel will put out something so incredibly ahead of what Apple's team can do in the low power space that Apple will feel compelled to "switch".
- MacBook Air And Above Still Likely Not To Shift To Internal Design - While the per-core, per-clock performance of the Apple Cyclone is comparable to "Sandy Bridge" in Geekbench 3, Apple's design is probably targeted at a low operating frequency, while Intel's Core chips can scale to ~3x the clock speed of an A7 in max turbo. Further, Intel's process lead is much more evident in the "big core" space than it is in the low power space, so it'd be really difficult for Apple to compete there on performance/watt in either CPU or graphics.
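To put the frequency point in numbers, here is a back-of-the-envelope Python sketch that treats performance as per-clock throughput multiplied by clock speed. The 3.9GHz figure is simply 3x the A7's 1.3GHz, and the equal per-clock ratio of 1.0 is an assumption for illustration; real chips do not scale this cleanly, since memory latency and power limits get in the way.

```python
def relative_performance(per_clock_ratio, freq_ghz, baseline_freq_ghz):
    """Throughput relative to a baseline core: per-clock ratio times clock ratio."""
    return per_clock_ratio * (freq_ghz / baseline_freq_ghz)


A7_GHZ = 1.3                  # Cyclone clock used in the Geekbench tables
CORE_TURBO_GHZ = 3 * A7_GHZ   # "~3x the clock speed of an A7" (assumed, ~3.9 GHz)

# Hypothetical: both cores do identical work per clock (ratio of 1.0)
advantage = relative_performance(1.0, CORE_TURBO_GHZ, A7_GHZ)
print(f"Core at {CORE_TURBO_GHZ:.1f} GHz vs A7 at {A7_GHZ} GHz: {advantage:.1f}x")
```

Under these assumptions, matching per-clock performance still leaves the higher-clocked part roughly 3x ahead in single-thread throughput, which is the gap Apple would have to close through frequency scaling.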