The news late last year that China's GPU-rich Tianhe-1A supercomputer was ranked the fastest system in the world focused attention, and sparked a lot of discussion, within the HPC community on the advantages of graphics processing units (GPUs) versus the central processing units (CPUs) used in many systems.
Although GPUs were originally intended to be used as fast video game engines to process 3D functions, simulate movement, and perform other mathematically intensive operations that might otherwise strain some CPUs, they are capable of more than making whiz-bang images or movies. Peter Varhol, a contributing editor for the online magazine Desktop Engineering, says GPUs can perform high-end computations such as those used in engineering applications, in some instances as much as 20 times faster than CPUs.
However, as Varhol wrote in an article for DE late last year, GPUs don't necessarily trump CPUs every time. In fact, when it comes to engineering-intensive applications, comparing CPUs with GPUs "is like comparing apples with oranges."
"The GPU remains a specialized processor, and its performance in graphics computation belies a host of difficulties to perform true general-purpose computing," Varhol wrote. "The processors themselves require rewriting any software; they have rudimentary programming tools, as well as limits in programming languages and features."
Moreover, most of today's GPU systems are based on a variety of proprietary software systems, while most CPUs used in high-end supercomputers are based on widely accepted industry standards.
"It's expensive to port software in general, and more so when the GPU standard is an evolving target," Varhol said when interviewed for this article. "It's problematic for software vendors to support multiple platforms unless there is a clear market need to do so. So it's almost a chicken and egg problem. It won't become a market need unless we support it, yet it's not cost-effective for us to do so unless the market has already arrived."
Thanks to ongoing support and development by some companies, notably NVIDIA and AMD, software vendors are seeing that it is in their interest to take the plunge, added Varhol. NVIDIA's CUDA parallel computing architecture, for example, has a single industry-standard processor, usually running Windows or Linux, and is based on industry-standard languages such as C and C++, with third-party compilers available for Fortran and other standards.
Plus, NVIDIA and AMD recently announced plans to combine CPUs and GPUs in one chip. This change will eliminate some of the primary architectural bottlenecks that now limit performance.
Within the TeraGrid, several systems are currently running GPUs: the Lincoln cluster at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, Nautilus at the National Institute for Computational Sciences, TeraDRE at Purdue University, and the Longhorn and Spur systems at the Texas Advanced Computing Center at the University of Texas at Austin. There's also the Keeneland Project, developed under a partnership that includes the Georgia Institute of Technology, the University of Tennessee at Knoxville, and Oak Ridge National Laboratory.
"GPUs fit into a small class of new compute technologies that do not get overtaken by the evolutionary progress of CPU technology," says John Towns, director of persistent infrastructure at NCSA and TeraGrid Forum Chair. "They are more comparable to the advent of vector processors, message passing, or even commodity processor technologies applied to HPC, as opposed to technologies such as field programmable gate arrays and the like.
"Such technologies do not establish themselves overnight, however. Each of the successful technologies required a period of time - long in our community - for the development and maturation of the expertise and tools necessary to effectively harness the technology."
More! Better! Faster!
Indeed, GPUs are gaining popularity across many scientific domains: atmospheric modeling, high-energy physics, cosmology, quantum chemistry, molecular dynamics, and even drug design. One common mantra among researchers in just about every science domain is "more, better, faster!" and GPUs are seen as part of the solution to compressing compute times dramatically.
So does this mean that CPUs are becoming obsolete? Not really, because GPUs still require CPUs to access data from disk, or to exchange data between compute nodes in a multi-node cluster. CPUs are excellent for executing serial tasks and every application has those. And as more and more cores are combined on a single chip, CPUs are becoming parallel as well. Plus, GPUs are not right for every application.
"CPUs are designed to handle complexity well, while GPUs are designed to handle concurrency well," says Axel Kohlmeyer, associate director with the Institute for Computational Molecular Science at Temple University. Kohlmeyer, a TeraGrid Campus Champion, is working to make GPU codes faster for molecular dynamics applications. "To get good performance from a typical GPU, one needs to write code that parallelizes well across thousands of threads. So the challenge of writing good GPU code is to find sufficient concurrency, then dispatch that work to the GPU while handling other tasks more efficiently on the host processor."
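The split Kohlmeyer describes can be illustrated with a small sketch. This is not GPU code and is not taken from any project mentioned in this article. It is a CPU-only Python analogy in which a thread pool stands in for the GPU, and the function names (kernel, launch, host_step) are hypothetical. It shows only the shape of the problem: work where every element is independent is handed off in bulk, while a step written as a serial loop stays on the host.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(i, a, x, y):
    """Work for one 'thread': element i depends on no other element."""
    return a * x[i] + y[i]


def launch(a, x, y, workers=4):
    """Stand-in for a GPU launch (assumption: a thread pool plays the GPU).

    One independent task is dispatched per element; pool.map returns
    results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: kernel(i, a, x, y), range(len(x))))


def host_step(values):
    """Bookkeeping kept on the host: a running total written as a serial
    loop, where each iteration uses the result of the previous one."""
    totals, running = [], 0.0
    for v in values:
        running += v
        totals.append(running)
    return totals


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    out = launch(2.0, x, y)   # concurrent part
    print(out)                # [12.0, 24.0, 36.0, 48.0]
    print(host_step(out))     # [12.0, 36.0, 72.0, 120.0]
```

On a real GPU the same idea would be expressed as a kernel launched across thousands of threads. Four elements are used here only to keep the example readable.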
"Essentially, if the effort has been made to port the code to GPUs then the performance improvement over CPU systems can be phenomenal," says Ross Walker, an assistant research professor with the San Diego Supercomputer Center at UC San Diego, another TeraGrid partner. Walker and his team at SDSC have been working with NVIDIA to analyze how GPUs can benefit research in the areas of biomolecules, biofuels, and flu viruses. His research involves high-speed simulations using AMBER, a widely used package of molecular simulation programs.
"GPUs are, for the first time, giving us the increases in capability we have been desperate for since the beginning of the multicore era," says Walker. "I'm confident that we will soon be achieving throughput with GPU-enabled AMBER that is at least an order of magnitude better than we could ever hope to achieve with CPU-based clusters."
Stay tuned for Part 2 of this article in next week's issue. This article originally appeared on the TeraGrid website.