Intel's (INTC) latest and greatest Haswell chip already looks outdated. While Haswell brought massive power efficiency improvements to the table, coupled with some seriously slick new instructions that double peak floating point throughput, the processor world ain't seen nothin' yet. Haswell's architecture will be succeeded (after the "Broadwell" die shrink) by a chip known as Skylake in 1H 2015, and from what I can already tell, it appears that it will be a performance monster.
How Do You Achieve More Performance (Per Watt)?
Traditionally speaking, there are a number of ways to improve the performance of a processor. These typically involve one or more of the following:
- Increase The Clockspeed - the clock speed (or clock rate) of a processor tells you the frequency (cycles per second) the chip is running at. The higher the clockspeed, the more work can get done per second.
- Increase The Instructions Per Clock - whenever you see us tech nerds talk about "IPC," we are talking about "instructions per clock." Now, note that a processor can execute thousands of different instructions and that each type of instruction can take different amounts of time to execute. When people say "IPC" they really mean "the average number of instructions executed per cycle in a representative workload." Improving this is hard to do, particularly when the primary constraint is power consumption.
- Increase Parallelism (i.e. "More Cores") - this is the most "straightforward" way to try to improve the raw performance of a chip: stick more processors onto the die. Of course, not all software can take advantage of multiple cores, and even then many parallel workloads still have serial code segments that bottleneck the whole thing. For consumer CPUs, the "optimal" range is 2-4 cores.
- Increase The Work Per Instruction - another way to improve performance is to add instructions that simply do more work. The upside is that, if implemented properly and if software is aware of the instructions, this can lead to a substantial performance benefit. The downside is that software developers aren't always so keen to go and optimize all of their software for the latest and greatest instruction sets right away.
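The levers above can be sketched with a quick back-of-the-envelope model - the classic "iron law" of processor performance plus Amdahl's law for the multi-core case. All of the numbers below are made up for illustration; they aren't real chip specs:

```python
# Illustrative model of the performance levers above (invented numbers,
# not actual Intel specs).

def runtime_seconds(instructions, ipc, clock_hz):
    """Iron law of performance: time = instructions / (IPC * frequency)."""
    return instructions / (ipc * clock_hz)

base         = runtime_seconds(1e10, ipc=2.0, clock_hz=3.0e9)  # baseline chip
faster_clock = runtime_seconds(1e10, ipc=2.0, clock_hz=3.6e9)  # +20% clock speed
better_ipc   = runtime_seconds(1e10, ipc=2.4, clock_hz=3.0e9)  # +20% IPC

# Either lever buys the same 1.2x speedup here; the hard part is doing it
# without blowing the power budget.
print(base / faster_clock, base / better_ipc)  # → 1.2 1.2

def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: extra cores only help the parallel portion of a workload."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Four cores on 90%-parallel code get nowhere near a 4x speedup:
print(round(amdahl_speedup(0.9, 4), 2))  # → 3.08
```

The Amdahl's law result is why "just add more cores" runs out of steam so quickly for consumer software: the serial segments dominate as core counts climb.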
The problem that CPU designers face each and every day is to improve performance while also improving the performance per watt ratio. Since the power envelopes for today's devices remain pretty much fixed, this really means "keep power consumption the same, improve performance" or "lower power consumption but keep performance the same." Typically speaking, it takes a move to a new process node to see a dramatic improvement, but of course a move to a smaller process isn't enough - there needs to be a chip design to take advantage of all that the new process has to offer.
That's why Intel does its "tick-tock" methodology; in one year, it moves an existing micro-architecture to a new process node, buying a good chunk of performance/watt improvement by virtue of the better transistors, and then in the next it develops a brand new micro-architecture to fully capitalize on this process node. Rinse and repeat.
Skylake With AVX3.2
Intel's latest "tock" - known as Haswell - did two of the above to improve performance per watt: it does more work per clock than its predecessor, "Ivy Bridge," and it does more work per instruction (substantially so in the case of floating point). Clock speeds have remained flat from "Ivy Bridge" and the core count stayed the same. The chip brought about massive battery life gains (to the point where people are now ditching their iPads for a shiny new Haswell-powered MacBook Air) while at the same time bringing modest performance increases for legacy (i.e. not optimized for the new instructions) code and huge gains for optimized code (peak floating point throughput has doubled, and integer vector widths have doubled). Haswell is pretty sweet, but it's old news. Say hello to Skylake - the "tock" that follows Broadwell.
The details on this core are still scarce, but we know the following:
- Built on the 14nm process (so that gives the chip designers a lot more room to play with on both power and transistor count)
- Utilizes next generation DDR4 memory
- Supports the next generation PCIe4 standard (the AMD (AMD) bulls/Nvidia (NVDA) bears that claim that Intel is killing PCIe look pretty foolish right now)
- Supports the AVX3.2 instruction set
Now, the last part is pretty important because we know that AVX3 brings yet another doubling of the peak floating point throughput for the processor cores.
This suggests that for anything involving floating point intensive calculations (think games, spreadsheets, 3D modeling, and even all of the perceptual computing stuff that Intel keeps touting), and assuming Intel's software developer relations team does its job right, Skylake should be a monster. Now, the interesting thing is that Intel is going to be packing this kind of processing power into Ultrabooks and high end tablets.
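To see why wider vectors alone double the peak, it helps to work the arithmetic. Peak FLOPS is roughly cores × clock × (doubles per vector) × (FMA units) × 2, since a fused multiply-add counts as two floating point operations. The core counts, clock speed, and FMA-unit count below are assumptions for illustration, not confirmed Skylake specs:

```python
def peak_gflops(cores, clock_ghz, simd_bits, fma_units=2):
    """Rough peak double-precision GFLOPS for a hypothetical chip.

    Each fused multiply-add (FMA) counts as 2 FLOPs, and a vector
    register holds (simd_bits / 64) double-precision values.
    """
    doubles_per_vector = simd_bits // 64
    flops_per_cycle = doubles_per_vector * fma_units * 2  # 2 FLOPs per FMA
    return cores * clock_ghz * flops_per_cycle

# Hypothetical 4-core, 3 GHz part:
avx2_peak = peak_gflops(4, 3.0, simd_bits=256)  # Haswell-class 256-bit AVX2
avx3_peak = peak_gflops(4, 3.0, simd_bits=512)  # 512-bit vectors, as rumored for AVX3

print(avx2_peak, avx3_peak)  # → 192.0 384.0 - doubling the width doubles the peak
```

Of course, "peak" is the operative word - real code only approaches these numbers when it is compiled (or hand-tuned) for the new instructions, which is exactly why Intel's developer relations work matters here.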
But more importantly, this type of advance helps to extend Intel's lead in the HPC space, where software is always compiled for the latest and greatest, and where floating point performance per watt is absolutely critical. Anybody who thinks that Intel has forgotten about the folks who need high performance is, once again, proven wrong. Chips packing 16+ of these cores into a single 130W+ package for supercomputers and data centers will be absolute beasts that are sure to show up on the Top 500 list once the systems built around them ship.
Intel is serious about high performance computing, and the fact that the company is - yet again - doubling the peak floating point performance of its high end processor cores is a testament to this. This is yet one more data point showing that Intel isn't the "dying" company that the media likes to portray it as, and Intel investors should take heart in knowing that the company's products are getting even more competitive and addressing even wider segments of the broader processing market with each generation.