by Steve Reinhardt
Organizations that depend heavily on high-performance computing (HPC) are salivating at the performance potential of general-purpose GPUs (graphics processing units), which often deliver 10X or even 100X higher performance per chip, usually at lower power consumption than mass-market x86 sockets. However, the path to widespread realization of this performance contains considerable obstacles. Advanced software can overcome these obstacles, though establishing a language environment that is both stable and highly productive in the GPU context will require investments that few HPC-dependent organizations are likely to be expecting.
The pioneers of the stream processing technology behind GPUs understood that mapping typical HPC programs to general-purpose microprocessors uses memory bandwidth inefficiently, so they designed better memory interfaces and execution structures that often provide 10X and sometimes 100X higher performance for well-suited code. Unfortunately, the current state of the art in GPU languages and compilers exposes this new memory hierarchy to the programmer, who must explicitly arrange data appropriately to reap the potential speedup.


