Preview: The 8-core battle at the high end
By Nebojsa Novakovic
Monday, 7 September 2009, 13:23
LAST WEEK we covered some details about the upcoming IBM POWER7 processor, which is expected to be the second shipping 8-core general purpose server CPU after Intel's Nehalem-EX.
And no, Sun's Niagara with its ultralight cores is not a general purpose CPU, so it doesn't count.
Just like Intel's ultra high-end server offering, POWER7, IBM's flagship CPU for 2010, is a huge-die, large-cache monster, immensely powerful on its own yet capable of being very well connected to many of its siblings to compose very large, well-scaled multiprocessor systems.
How do these two processors compare? Well, both are 45nm process behemoths with 8 cores per die, each with out-of-order execution and some degree of internal multithreading. The Nehalem-EX is expected to have 8 cores with 2 threads each, running at anywhere between 2.66GHz and 3GHz at launch in the next 4 months, while the POWER7 will have 8 cores with 4 threads each, running at up to 4GHz at launch sometime in mid-2010. So, POWER7 should be faster and more powerful from the raw hardware resources point of view, but at the cost of being half a year later to market.
Looking at each core, the Nehalem-EX core can process up to 4 instructions - some simple, some complex - per cycle, and 4 floating-point (FP) operations per cycle. Not bad at all for what is the most powerful X86 core in the business today. POWER7 can do up to 6 simple instructions per cycle, and up to 8 FP operations per cycle if running 4 fused multiply-adds. Again, the raw power of the POWER7 core is somewhat higher. But then, so was the POWER6's, yet it fared badly in benchmarks.
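As a rough back-of-the-envelope check on those figures, the sketch below multiplies out the theoretical per-chip peaks, taking the expected clocks and per-cycle FP numbers quoted above at face value. The `Chip` class and its field names are made up purely for illustration, and sustained real-world throughput will be well below these peaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical container for the per-chip figures quoted in the text."""
    name: str
    cores: int
    threads_per_core: int
    clock_ghz: float
    fp_ops_per_cycle: int  # per core, as quoted in the text

    @property
    def hardware_threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def peak_gflops(self) -> float:
        # cores x FP ops per cycle x GHz = theoretical peak in GFLOPS
        return self.cores * self.fp_ops_per_cycle * self.clock_ghz


# Clock speeds are the expected launch figures, not confirmed specs.
nehalem_ex_low = Chip("Nehalem-EX @ 2.66GHz", 8, 2, 2.66, 4)
nehalem_ex_high = Chip("Nehalem-EX @ 3GHz", 8, 2, 3.0, 4)
power7 = Chip("POWER7 @ 4GHz", 8, 4, 4.0, 8)

for chip in (nehalem_ex_low, nehalem_ex_high, power7):
    print(f"{chip.name}: {chip.hardware_threads} threads, "
          f"{chip.peak_gflops:.1f} peak GFLOPS")
```

On these assumptions POWER7 peaks at 256 GFLOPS per chip against roughly 85 to 96 GFLOPS for Nehalem-EX, with twice the hardware threads (32 versus 16). That is a paper figure only; as the POWER6 showed, it says little about benchmark results.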
The caches? Both are really cache-rich, so to speak. Nehalem-EX's 8 cores have a shared pool of 24MB L3 SRAM cache with a fast kilobit-wide ringbus between the different cache segments to speed up access. On the other hand, POWER7 has 32MB of L3 eDRAM cache for its 8 cores. In either case, each processor core has its private low-latency 256KB L2 cache too.
How about memory? Nehalem-EX has 4 buffered DDR3 channels per chip, where, using on-board buffers, every channel splits into two actual 64-bit DDR3-1333 DRAM paths. If the buffers had abilities like those of the FBD AMB (Advanced Memory Buffer) chips, you might be able to do simultaneous read and write transactions on each channel, effectively doubling the bandwidth. Either way, you're looking at some 50GBps of memory bandwidth per CPU chip, not bad at all.
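For reference, the raw DRAM-side arithmetic behind that layout can be sketched as follows. It only multiplies out the path count and DDR3-1333 transfer rate given above; the function and constant names are illustrative. The result is the theoretical ceiling of the DRAM paths themselves, which sits above the 50GBps estimate, so the buffered channel links are presumably the tighter limit. That last reading is an assumption, not a confirmed spec.

```python
def ddr_path_gbps(transfers_mtps: float, bus_bits: int = 64) -> float:
    """Raw peak of one DRAM path in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_bits / 8
    return transfers_mtps * 1e6 * bytes_per_transfer / 1e9


# Layout as described in the text: 4 buffered channels, 2 DRAM paths each.
CHANNELS = 4
PATHS_PER_CHANNEL = 2

per_path = ddr_path_gbps(1333)  # one 64-bit DDR3-1333 path
raw_dram_peak = CHANNELS * PATHS_PER_CHANNEL * per_path

print(f"Per DRAM path: {per_path:.1f} GB/s")
print(f"All {CHANNELS * PATHS_PER_CHANNEL} paths: {raw_dram_peak:.1f} GB/s raw peak")
```

That works out to about 10.7GBps per path and about 85GBps across all eight paths, a number no real workload would sustain.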
In the case of POWER7, though, there are two 4-channel DDR3 memory controllers, for a total of 8 channels of memory and a claimed 100GBps total memory bandwidth. Now, this definitely cannot fit into the rumoured common G34 socket with AMD's Magny-Cours or Bulldozer CPUs, as those only have 4 memory channels.
Neither would POWER7's proprietary multipath 360GBps (yes, gigabytes, not gigabits) connections to neighbouring CPUs, up to 32 of them, fit into the nearly 4 times slower 4-channel HyperTransport 3 setup on the AMD G34 socket. The Nehalem-EX 4-channel QPI interconnect, if running at 6.4GTps, would give you above 100GBps bandwidth to the other 4 neighbouring CPUs - yes, roughly three times slower than the POWER7, but still far from slow in reality. Also, the Nehalem-EX's symmetrical north-south-east-west QPI arrangement can scale to hundreds of sockets without extra glue logic. Look at the SGI - sorry, Rackable - UltraViolet and such systems coming soon.
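The QPI figure is easy to reconstruct. A minimal sketch, assuming each QPI link moves 2 bytes of data per transfer in each direction and that the aggregate counts both directions on all 4 links; the names below are illustrative, and the 360GBps POWER7 number is simply the claimed figure from the text.

```python
QPI_DATA_BYTES_PER_TRANSFER = 2  # 16 data bits per direction per transfer


def qpi_link_gbps(gtps: float, both_directions: bool = True) -> float:
    """Peak data bandwidth of one QPI link in GB/s."""
    one_way = gtps * QPI_DATA_BYTES_PER_TRANSFER
    return one_way * 2 if both_directions else one_way


QPI_LINKS = 4
POWER7_CLAIMED_GBPS = 360.0  # claimed figure quoted in the text

qpi_total = QPI_LINKS * qpi_link_gbps(6.4)
ratio = POWER7_CLAIMED_GBPS / qpi_total

print(f"QPI aggregate: {qpi_total:.1f} GB/s")
print(f"POWER7 claimed / QPI aggregate: {ratio:.2f}x")
```

Four links at 6.4GTps come to 102.4GBps, which matches the "above 100GBps" estimate and leaves the claimed POWER7 interconnect about 3.5 times ahead on paper.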
Now, last but not least, the instruction set architecture, probably the most important point. POWER7 continues on the old POWER ISA path, including the PowerPC-specific Altivec extensions that were in the POWER6. While PowerMac is no more, IBM still has sizable markets in mainframes, minicomputers and of course servers and clusters for the new CPU.
On the other hand, Nehalem-EX is, simply, 64-bit X86. A straight win there, whether you like the X86 or not. Everything runs, all the vendors have to use it, and there'll be a myriad of support chipsets, peripherals, software, drivers, apps, and of course every operating system out there, minus AIX and VMS, I guess. You'll even have dual-processor extreme workstations, some overclockable, with the dual "Beckton" Nehalem-EX CPUs for 16-core Skulltrail follow-on monsters to appease gamers' wet dreams and engineers' complex visualisations. Just like their server counterparts, many of these will be easily upgradeable to the expected "Eagleton" 12-core 32nm chips with 36MB cache a year or so later. Unfortunately, I don't think that we'll ever see a POWER7 workstation.
Why does that matter? Well, I think workstations are important because they give as many developers as possible access to a given architecture, resulting in more optimised and tuned code, and of course more apps in the end. Whatever raw performance gains POWER7 has, there will always be more effort put into X86 chip code tuning and optimisation.
Finally, the price. It's too early to talk about POWER7 prices, but, if the current trends are anything to watch, expect a Nehalem-EX to be at least 3 times cheaper than the POWER7 per total system CPU unit. I won't be surprised to see an even larger price differential.
That's all for now. As more details emerge, look for more coverage here. µ
POWER7 advantages:
- absolute raw performance - CPU, memory, I/O
- immense scalability within the 32-socket limit
- a committed large vendor behind it, despite a mostly single-platform environment (Power Linux didn't take off as expected)
Nehalem-EX advantages:
- it is the fastest X86 chip at launch, and it is X86, so everything runs, workstation or server
- near-limitless scalability without custom wizardry, most of it easy to reach even with Windows
- much cheaper, and it comes out half a year earlier