by Tony M. Brewer
The last decade has seen continuous improvements in the cost per unit of performance of commodity processors, leading to their near-universal adoption by the high performance computing (HPC) community. In recent years, however, clock rates of commodity processors have flattened and per-core performance has stagnated. Blame this condition on the laws of physics: as processor clock speed (and thus power) increases while die size remains roughly the same, power density rises until no practical way exists to dissipate the heat.
How can we circumvent the laws of physics and break the power/performance wall? One common method is to leverage the venerable Moore’s law: use the billions of transistors now available on a processor die to add cores, enlarge on-chip caches and devise clever ways of overlapping operations. But, by all accounts, effective programming for multi-core is difficult, and the other changes improve performance only incrementally.
We’re left with the conclusion that the only solution is to increase performance pound-for-pound and watt-for-watt over what we’re currently getting out of the hardware in our data centers. In other words, find a creative way for a handful of transistors to get 10x, 100x or 1000x the performance of the equivalent number of transistors in a commodity processor.


