by Greg Pfister
In simple speed, the newest Cell’s clock rate is actually noticeably faster than expected for Larrabee. Cell has shipped for years at 3.2 GHz; the more recent PowerXCell version uses newer fabrication technology to lower power (heat), not to increase speed. Public Larrabee estimates say that when it ships (late 2009 or 2010) it will be somewhere around 2 GHz., so in that sense Cell is about 1.25X faster than Larrabee (both are in-order, both count FLOPS double by having a multiply-add).
Larrabee is “faster” only because it contains much more stuff – many more transistors – to do more things at once than Cell does. This is true at two different levels. First, it has more processors: Cell has 8, while Larrabee at least 16 and may go up to 48. Second, while both Cell and Larrabee gain speed by lining up several numbers and operating on all of them at the same time (SIMD), Larrabee lines up more numbers at once than Cell: The GFLOPS numbers above assume Larrabee does 16 operations at once (512-bit vector registers), but Cell does only four operations at once (128-bit vector registers). To get maximum performance on both of them, you have to line up that many numbers at once. Unless you do, performance goes down proportionally.
This means that to match today’s and several years’ ago Cell performance, next year’s Larrabee would have to not just emulate it, but extract more parallelism than is directly expressed in the program being emulated. It has to find more things to do at once than were there to begin with.


