by Michael Feldman, Editor of HPCwire
Following my blog post last week about the transition to GPU computing in HPC, I ran into a couple of items that cast the subject in a somewhat different light. One was a paper written by a team of computer science researchers at Georgia Tech titled “On the Limits of GPU Acceleration” (hat tip to NERSC’s John Shalf for bringing it to my attention). The other item surfaced as a result of an Intel presentation on the relative merits of CPU and GPU architectures for throughput computing, titled “Debunking the 100X GPU vs. CPU Myth.” I think you can guess where this is going.
Turning first to the Georgia Tech paper, authors Richard Vuduc and four colleagues set out to compare CPU and GPU performance on three typical computations in scientific computing: iterative sparse linear solvers, sparse Cholesky factorization, and the fast multipole method. If you don’t know what those are, you can look them up later. Suffice it to say that they are representative of HPC-type algorithms that are neither completely regular, like dense matrix multiplication, nor completely irregular, like graph-intensive computations.
When it comes to the CPU vs. GPU performance wars, it pays to know who’s running the benchmarks: not only their vendor loyalties, but also their programming skills, the software tools they used, and so on. It’s also worth comparing like with like as far as processor generations go. In this regard, I think the NVIDIA Fermi GPU should be used as a sort of ground floor for all future benchmarks. To my mind, it represents the first GPU that can really be called “general-purpose” without rolling your eyes.