Dirac Testbed Reveals How Applications are Written
Creating a platform to explore the range of GPUs in scientific computation

The Dirac testbed, a cluster of 48 NVIDIA GPUs (44 of them Fermi-generation) at the National Energy Research Scientific Computing Center, is being put through its paces to determine which scientific codes might benefit most from being adapted to run on graphics processing units. Courtesy of Margie Wylie, Lawrence Berkeley National Laboratory
Graphics processing units, or GPUs, may have been invented to power video games, but today these massively parallel devices are being pressed into high-performance computing, or HPC. With improving programming toolsets, commercial computer vendors have become more confident in selling GPU-accelerated systems but, in the world of science, GPUs are still almost as experimental as the problems they are expected to solve.

“There’s a lot of interest in GPU technology for high performance scientific computing,” says Katherine Yelick, associate laboratory director for computing sciences at the Lawrence Berkeley National Laboratory (Berkeley Lab).

She notes that GPUs can offer energy-efficient performance boosts to traditional processors, since they contain massive numbers of simple processors, which are more energy-efficient than a smaller number of larger processors. They also are available at reasonable cost, since they are already being mass-produced for video gaming. Moreover, GPUs are now used in some of the world’s fastest computers.

“The question is whether GPUs offer an effective solution for a broad scientific workload or for a more limited class of computations,” Yelick says.

That’s why the National Energy Research Scientific Computing Center (NERSC), in collaboration with the computational research division at the Berkeley Lab, launched a general-purpose GPU computing testbed called Dirac in April. Named in honor of Paul A.M. Dirac, the 1933 Nobel laureate in physics, the 48-node Dirac cluster is being put through its paces by NERSC users. Paul Hargrove, a computer scientist at Berkeley Lab, purchased the system with funds from a Department of Energy (DoE) program designed to give researchers access to advanced computer architectures.

“The DoE offered funds to buy a system to study how our community writes its applications, in contrast to the typical NERSC system that is intended primarily for running them,” says Hargrove. “With this goal in mind, a GPU cluster was an obvious choice to offer NERSC’s users access to a technology that is positioned to change how many HPC applications are written.”

Each of Dirac’s 48 nodes contains two Intel 5530 2.4-gigahertz processors, each with eight megabytes of cache, along with 24 gigabytes of memory and one NVIDIA Tesla GPU. Four nodes have a Tesla C1060 GPU, which has four gigabytes of memory and 240 parallel processor cores; the other 44 nodes have a Tesla C2050 (Fermi) GPU, which has three gigabytes of memory and 448 parallel CUDA processor cores.
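To put those per-node figures in perspective, the short Python sketch below adds them up across the cluster. It uses only the node counts, core counts and memory sizes quoted above; the dictionary layout and the function name are illustrative choices, not anything published by NERSC.

```python
# Aggregate GPU resources implied by the per-node figures quoted in the article.
# One GPU per node is assumed, as the article states.
NODES = {
    "Tesla C1060": {"nodes": 4, "cores": 240, "memory_gb": 4},
    "Tesla C2050": {"nodes": 44, "cores": 448, "memory_gb": 3},
}


def cluster_totals(nodes):
    """Return (node count, GPU cores, GPU memory in GB) summed over all node types."""
    total_nodes = sum(spec["nodes"] for spec in nodes.values())
    total_cores = sum(spec["nodes"] * spec["cores"] for spec in nodes.values())
    total_mem = sum(spec["nodes"] * spec["memory_gb"] for spec in nodes.values())
    return total_nodes, total_cores, total_mem


if __name__ == "__main__":
    n, cores, mem = cluster_totals(NODES)
    print(f"{n} nodes, {cores} GPU cores, {mem} GB of GPU memory")
```

By this count the testbed offers 20,672 GPU cores and 148 gigabytes of GPU memory in total, although a single job sees only the three or four gigabytes attached to each GPU it runs on.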

This system was installed to allow users to explore the applicability of GPUs to scientific simulations and to various data-visualization problems. The system is not just used for programming individual GPUs, but for scaling codes on a GPU cluster. It also gives users experience with the current set of GPU programming languages, such as CUDA and OpenCL, often in combination with a cluster-programming library like MPI.
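Scaling a code across a GPU cluster usually starts with deciding which slice of the problem each MPI process, and therefore each GPU, will own. The Python sketch below shows one common way to do that, an even block decomposition. It is an illustration only: the function name block_range is hypothetical, and in a real program the rank and the number of ranks would come from the MPI library (for example via MPI_Comm_rank and MPI_Comm_size) rather than being passed in by hand.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    The first (n_items % n_ranks) ranks receive one extra item, so the
    block sizes differ by at most one.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Example: one million grid points spread over 48 processes, one per node.
    for rank in (0, 1, 47):
        print(rank, block_range(1_000_000, 48, rank))
```

Each process would then copy only its own block to its GPU, run the kernel there, and use MPI to exchange whatever boundary data its neighbors need.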

Early applications
Despite Dirac’s relatively recent launch, about 100 users already take advantage of the testbed. Yelick points out that about 500 different applications are used throughout NERSC, so the center’s workload represents a very broad spectrum of scientific codes.

“We thought it would be best to make this GPU system available to users and then see what their experience was with it,” she explains.

According to Yelick, some scientific areas are very computationally intensive and seem to have the most to gain from being ported to GPUs. In fact, two postdoctoral researchers at Berkeley Lab have made extensive use of the GPUs in Dirac.

“I am currently working on accelerating various computational chemistry codes using the Dirac GPU cluster at NERSC,” says Jihan Kim, a postdoctoral researcher at NERSC. “Given that a GPU can execute thousands of parallel threads concurrently, we can potentially obtain significant speedups over the same application code optimized for a CPU. This kind of performance boost is exciting for chemists who extensively use numerical simulations to model large molecular systems.”

Meanwhile, another NERSC postdoctoral researcher, Filipe Maia, is using Dirac to solve partial differential equations and perform x-ray tomographic imaging and diffraction imaging. “The imaging applications make extensive use of large fast Fourier transforms, which are particularly well-suited to GPU due to their regularity. Using GPUs can provide large increases in performance in many applications, which is often of crucial importance to test a wide range of conditions,” explains Maia. “Unfortunately, this comes at the cost of having to rewrite the application for a many-core architecture.”

Dealing with the details
Going from a CPU-based system to one that includes GPUs requires computer scientists to adjust their thinking. To test how an application can be improved, a researcher cannot simply take code written for a CPU and run it on a GPU. Such an approach would not take advantage of the parallel power that GPUs offer. Instead, porting an application to a GPU requires reworking the code: deciding which parts to run on the GPU and then figuring out how best to modify them. For example, running an application efficiently on a GPU often requires keeping the data on or near the device, to reduce the time spent moving data between the host’s memory and the GPU. A programmer must also decide how to feed results from the GPU back into the CPU program. None of this is trivial, so the Dirac Web site offers users tips for porting applications to the system.
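A back-of-the-envelope model shows why data placement matters so much. The Python sketch below compares two strategies for a computation that launches the same GPU routine many times: copying the data to the GPU and back around every step, versus copying it once and leaving it resident on the device. The function name and all of the numbers in the example (array size, transfer bandwidth, time per step) are made-up placeholders chosen for easy arithmetic, not measurements from Dirac.

```python
def total_time(steps, bytes_moved, bandwidth_bytes_per_s, kernel_s, resident):
    """Estimate wall-clock seconds for `steps` GPU computations.

    resident=False: the data is copied to the GPU and back around every step.
    resident=True:  the data is copied in once, all steps run, and the
                    result is copied back once.
    """
    one_way = bytes_moved / bandwidth_bytes_per_s  # seconds per one-way copy
    round_trips = 1 if resident else steps
    return round_trips * 2 * one_way + steps * kernel_s


if __name__ == "__main__":
    # Hypothetical workload: a 1 GB array, 5 GB/s transfers, 10 ms of GPU work per step.
    args = dict(steps=100, bytes_moved=1e9, bandwidth_bytes_per_s=5e9, kernel_s=0.01)
    print("copy every step:", total_time(resident=False, **args), "s")
    print("data resident:  ", total_time(resident=True, **args), "s")
```

With these assumed figures the copy-every-step version spends about 40 of its roughly 41 seconds just moving data, while the resident version finishes in about 1.4 seconds. Real codes fall somewhere in between, which is why deciding what stays on the GPU is a central part of any port.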

When using CUDA, for example, the site advises: “NVIDIA’s CUDA parallel programming model allows programmers to write data-parallel applications for GPUs at the ‘kernel’ level by specifying what operations take place on an individual data element.” It also gives CUDA users code for an error-catching wrapper that can be used when debugging CUDA programs.
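The kernel idea can be illustrated without a GPU at all. In the Python sketch below, saxpy_kernel describes what happens to a single data element, and launch stands in for the GPU runtime by applying that description to every index. Both names are hypothetical and this is not CUDA code or the code from the Dirac site; on a real GPU the per-element invocations would run as parallel threads rather than in a serial loop.

```python
def saxpy_kernel(i, a, x, y, out):
    """The 'kernel': the operation applied to ONE element, identified by index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU launch: invoke the kernel once for each index 0..n-1.

    A GPU would execute these invocations as concurrent threads; this
    sketch simply loops over them one at a time.
    """
    for i in range(n):
        kernel(i, *args)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), 2.0, x, y, out)
    print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because each invocation touches only its own element, no invocation depends on another, and that independence is what lets a GPU spread the work across hundreds of cores.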

The Dirac Web site also gives users information about queues and policies. For example, the site states: “Interactive access is intended for code development and short runs. To run interactive jobs on Dirac, users must request compute node resources.” Then, the site provides code that lets users make such a request.

Expanding opportunities
As researchers gain experience using Dirac, this testbed should prove useful to a wider range of applications. Nevertheless, some applications might be more difficult to port to GPUs than others. This includes areas of research that use commercial codes or community codes, where users don’t have control over the software.

For example, Yelick notes that climate modelers often use the Community Climate System Model (CCSM), which is maintained by the National Center for Atmospheric Research (NCAR). “This would be very difficult to move to a GPU-based system. Not only is the code itself difficult to modify, but because there is a large group of people that work on it, the committee decides what goes in next,” Yelick says.

“Here at NERSC, some parts of our workload may run well on GPUs and some on more traditional processors. Determining which applications are well-suited for GPUs is the reason for building the testbed,” Yelick says. “The energy efficiency and performance gains that you get from using GPUs are the reasons we need to push forward with this research.”

Mike May is a freelance writer for science and technology based in Houston, TX.