Feature - Inflated performance
The more things change, the more they stay the same. Since the advent of scientific computing, researchers have sought out ways to get more research done in less time and for less money. Proponents of new technologies have, in turn, claimed improvements that may seem too good to be true. How, then, is a researcher to tell a genuine gain from an inflated claim?
Back in 1991, Berkeley Lab scientist David Bailey published an entertaining tongue-in-cheek paper in the now-defunct Supercomputing Review entitled "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers" (PDF). As with any paper discussing computer performance, some aspects of it are now obviously out of date, such as the shift from megaflops to teraflops.
"Also, the focus of comparison at the time was vector supercomputers then being manufactured by Cray Research, Inc.," Bailey said in correspondence with iSGTW. "Nowadays comparisons would more likely be made to vendor-supported highly parallel systems, such as from IBM and Cray."
According to Bailey, the issues he outlines in his paper now arise most often in relation to graphics processing units (GPUs), processors originally designed for graphics and games.
"Here, as before, vendors (and even some research scientists using the systems) are eyeing potential performance gains of 100X over conventional systems," Bailey said, "but in real-world tests this advantage is typically much more modest, say a factor of 5 or so (and that only after considerable reprogramming)."
Uday Bondhugula, Rajesh Bordawekar, and Ravi Rao of IBM have published two IBM research reports closely related to this issue.
"We had a long running application from the computational biology domain, and we were interested in accelerating it with the latest available hardware," explained Bondhugula. "Programming and optimizing well for GPGPUs requires a large amount of additional effort (several days or even weeks). Unless this effort pays off with a very good speedup, a programmer might not be willing to go the distance."
In the first report, entitled "Believe It or Not! Multi-core CPUs can Match GPU Performance for a FLOP-intensive Application!," they compare the performance of an image processing application on a GPU (the nVidia GTX 285) and a multi-core CPU. Their second report, entitled "Can CPUs match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU," examines the same problem with the newly released nVidia Fermi GPU.
Their conclusions? Given the development time to optimize for a GPU, multicore CPUs remain the more attractive solution for some applications. In fact, in terms of productivity and performance, they believe that multicore CPUs continue to hold the advantage over GPUs. (They make no mention of cost.)
At the 2010 SciDAC conference, Richard Vuduc of Georgia Tech gave a presentation closely related to these questions, entitled "On the limits of and opportunities for GPU acceleration" (PDF). He cites both Bailey and Bordawekar et al., and presents a thorough analysis of a variety of tests comparing GPUs and CPUs.
"I don't believe that the speedups people report from their GPU ports are inflated," Vuduc said. "However, these speedups also don't magically occur just because you put a GPU into your system. Rather, they occur because you had to rewrite your code, explicitly parallelizing it, and probably tuning it."
Of course, the code structure that is ideal for a GPU is not the ideal code for a CPU, and vice versa.
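To see how the choice of baseline shapes a reported speedup, consider a rough sketch in Python. The timings below are invented for illustration, chosen only to echo the 100X and 5X figures Bailey mentions; they are not measurements from any of the reports discussed here, and the names `timings` and `speedup` are hypothetical.

```python
# Hypothetical wall-clock times in seconds for the same job.
# These values are assumptions for illustration, not measured results.
timings = {
    "cpu_serial_untuned": 100.0,   # original single-threaded code
    "cpu_multicore_tuned": 5.0,    # same job after parallelizing and tuning for the CPU
    "gpu_tuned": 1.0,              # same job after porting and tuning for the GPU
}


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds


# Headline figure: tuned GPU code against the untouched serial code.
headline = speedup(timings["cpu_serial_untuned"], timings["gpu_tuned"])

# Like-for-like figure: tuned GPU code against equally tuned CPU code.
like_for_like = speedup(timings["cpu_multicore_tuned"], timings["gpu_tuned"])

print(f"GPU vs untuned serial CPU:  {headline:.0f}X")
print(f"GPU vs tuned multicore CPU: {like_for_like:.0f}X")
```

With these made-up numbers, the same GPU run can be reported as either a 100X or a 5X speedup, depending on whether the CPU code it is compared against received a similar amount of tuning.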
"The interesting question is whether GPU designs are fundamentally better than conventional CPU designs," Vuduc added. "The answer is that it depends - sometimes it is and sometimes it isn't - but the focus on large speedups obscures our collective understanding of when it is or isn't and why."
-Miriam Boon, iSGTW