Scientific Computing

Multicores and Manycores and GPGPUs, Oh My!
A fresh look at alternative processor strategies
Steve Conway 

To recap the problem: Most high performance computing applications, like PC applications, were originally written to run on one single-threaded processor. Thirty years of hardware-software innovations have enabled most codes to exploit only a modest number of parallel processing elements — IDC research found that 56 percent of HPC codes are no more than eight-way parallel and only six percent can run on more than 128 processor cores.

The bulk of sustained performance progress on real-world applications has come not from hardware parallelism, but from generational jumps in single-threaded processor speeds governed by Moore’s Law. This evolutionary gravy train hit a heat-and-power wall half a dozen years ago. Since then, several processor strategies have emerged to help get the train moving forward again. This is important, because another IDC study showed that 12 percent of HPC sites have at least some codes that run more slowly on their newest HPC system than on the prior one, and 50 percent of the sites expect unprecedented retrograde performance like this on some codes within 12 months.

At the recent HPC User Forum meeting in Seattle, steering committee chairman Steve Finn of BAE Systems led a panel of experts from AMD, Army Research Laboratory, Cray, ET International, Intel, NVIDIA, and Pacific Northwest National Laboratory (PNNL) through a discussion of alternative processor strategies. The session, which took place in September 2010, was titled: “Heterogeneous Multicore, Manycore and GPGPU Computing in HPC.” (Note: The pros and cons in the descriptive bullets below represent the author’s views, not those of the panelists.)

• Multicore processors, the most conservative alternative, pack multiple industry-standard (typically x86) processor cores into each socket.
Pros
• more peak computing power per socket
• more parallel processing elements available
• for programmers, no departure from the single, established instruction set architecture
Cons
• continually multiplying the number of parallel processing elements needing to be managed
• per-core memory has not kept pace
• tuned-down single-core speeds throttle performance on poorly scaling codes
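
The last point can be illustrated with Amdahl's law, the standard model for this effect (the article itself does not name it). The sketch below is illustrative only: the function name, the parallel fractions, the core count, and the clock ratio are all assumptions chosen for the example, not figures from the IDC studies.

```python
def relative_speed(parallel_fraction, cores, clock_ratio=1.0):
    """Speed relative to one core at full clock, per Amdahl's law.

    clock_ratio is the per-core clock of the multicore part divided by
    the clock of the older single-core part (hypothetical values below).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    # Serial work runs on one core; parallel work is split across all cores.
    amdahl = 1.0 / (serial + parallel_fraction / cores)
    # Every core runs at the reduced clock, so scale the whole result.
    return clock_ratio * amdahl


if __name__ == "__main__":
    # Assumed scenario: 8 cores, each clocked at 70% of the old single core.
    for p in (0.0, 0.3, 0.9, 0.99):
        print(f"parallel fraction {p:.2f}: {relative_speed(p, 8, 0.7):.2f}x")
```

Under these assumed numbers, a code that is only 30 percent parallel ends up slightly slower on the eight-core part than on the old single core, while a code that is 99 percent parallel runs more than five times faster.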

• Heterogeneous multicores employ more than one type of core in order to address multiple problem types on the same processor.

• Manycore processors, few of which exist yet, include so many cores per processor socket (tens of them or more) that additional networking may be needed on the processor.
Pros and cons: similar to those for multicore processors, only more so.

• General-purpose graphics processing units (GPGPUs), implemented today as co-processors in CPU-based systems, provide a way to accelerate the data-level parallelism that resides in many HPC codes. In this sense, GPGPUs are like the vector processors that once dominated the HPC market, but much more cost-effective.
Pros
• substantial acceleration and compelling price/performance on the right codes and portions of codes
Cons
• still not easy to program, although improving
• moving data between the CPU and GPGPU adds time
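
The cost of data movement can be estimated with a simple model: time on the accelerator is transfer time plus kernel time. The following is a rough sketch under stated assumptions, not a measurement. The function name, the 50x kernel speedup, the 4 GB payload, and the 8 GB/s link rate are hypothetical, and the model ignores launch latency and any overlap of transfers with computation.

```python
def offload_speedup(cpu_seconds, kernel_speedup, bytes_moved, link_bytes_per_s):
    """End-to-end speedup from offloading work to a co-processor.

    Total accelerator time = transfer time + kernel time.
    Overlap of transfer and compute is deliberately not modeled.
    """
    if min(cpu_seconds, kernel_speedup, link_bytes_per_s) <= 0 or bytes_moved < 0:
        raise ValueError("times and rates must be positive, bytes non-negative")
    transfer_seconds = bytes_moved / link_bytes_per_s
    kernel_seconds = cpu_seconds / kernel_speedup
    return cpu_seconds / (transfer_seconds + kernel_seconds)


if __name__ == "__main__":
    # Same assumed payload and link in both cases; only the work per byte differs.
    light = offload_speedup(1.0, 50.0, 4e9, 8e9)    # little compute per byte
    heavy = offload_speedup(100.0, 50.0, 4e9, 8e9)  # much more compute per byte
    print(f"light workload: {light:.1f}x, heavy workload: {heavy:.1f}x")
```

With these assumed values, the light workload gains less than 2x end to end despite the 50x kernel, because the half-second transfer dominates; the heavy workload reaches 40x because the same transfer is amortized over far more computation.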

Panel discussion highlights
No one objected when a panelist stressed that multi-threaded software is a must for any of these alternatives, adding that this is a new model for a large collection of legacy codes. Massive multi-threading is the way of the future.

Intel CPUs have a commanding revenue share of the global HPC market. The company’s Stephen Wheat noted that Intel has had hyper-threading, Intel’s multi-threading technology, for some time. Intel’s tick-tock processor roadmap now extends out to eight nanometer process technology in 2017 and beyond — meaning that in its more recent, multi-threaded incarnation, Moore’s Law is alive and well. Intel’s Many Integrated Core (MIC) technology is intended, as the name implies, as the company’s first general-purpose manycore architecture. It incorporates GPGPU-like capabilities from Intel’s scaled-back Larrabee project, is implemented as a co-processor, and shares the same standard Intel IA programming environment as the company’s CPUs. Next up in the MIC roadmap is the 50-core processor codenamed Knights Corner.

Michael Houston of AMD explained that people put a lot of work into codes for locality optimization on the GPGPU, but then don’t port the code back to the CPU. If they did, they would see that the GPGPU advantage shrinks, because this work also benefits the CPU. AMD makes both processor types, of course. AMD CPUs power six of the world’s top 20 supercomputers (www.top500.org), including number one; and some major HPC buyers have told IDC they are excited about the company’s GPGPU roadmap. Houston added that compiler technologies won’t come to the rescue of “dusty deck,” legacy codes.
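
Houston's point about porting locality work back to the CPU comes down to which baseline the GPGPU is compared against. The sketch below shows the arithmetic; the function name and all three timings are hypothetical values invented for illustration, not results reported by the panel.

```python
def gpu_advantage(naive_cpu_seconds, tuned_cpu_seconds, gpu_seconds):
    """Return (apparent, fair) GPGPU speedups.

    apparent: GPGPU versus the original, untuned CPU code.
    fair:     GPGPU versus CPU code that received the same locality tuning.
    """
    if min(naive_cpu_seconds, tuned_cpu_seconds, gpu_seconds) <= 0:
        raise ValueError("all timings must be positive")
    apparent = naive_cpu_seconds / gpu_seconds
    fair = tuned_cpu_seconds / gpu_seconds
    return apparent, fair


if __name__ == "__main__":
    # Assumed timings: 120 s untuned CPU, 40 s tuned CPU, 4 s GPGPU.
    apparent, fair = gpu_advantage(120.0, 40.0, 4.0)
    print(f"apparent advantage: {apparent:.0f}x, fair advantage: {fair:.0f}x")
```

In this made-up case the headline figure of 30x shrinks to 10x once the CPU code benefits from the same optimization effort.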

PNNL’s Rob Farber reported that CUDA-based GPU computing is part of the curriculum at more than 200 universities around the world, including marquee names such as MIT, Harvard, Cambridge, Oxford, the Indian Institutes of Technology, National Taiwan University, and the Chinese Academy of Sciences. GPGPUs can provide major speedups if they’re given enough work to do. A 10x speedup can make computational workflows more interactive; 100x acceleration could fundamentally affect scientific research by removing time-to-discovery barriers; and there are even examples of 1000x speedups through the use of optimized transcendental functions and/or multiple GPUs.

Dave Wallace at Cray said the company expects fast changes in accelerator technologies and is restructuring MPI applications into hybrid versions that can be compiled efficiently for multicore nodes, with or without accelerators. The benefit: restructuring the application once, and then using high-level abstractions to make adjustments going forward. The Cray compiler aims to preserve the code base while supporting multicores, manycores and GPGPUs. The programming effort will be a key factor for heterogeneous systems. Reducing this effort will be at the expense of absolute performance, an acceptable trade-off. Automatic detection and exploitation of parallelism in codes is the Holy Grail.

NVIDIA is leading the charge today in the adoption of GPGPU technology for high performance computing. Stan Posey with NVIDIA explained that, for “dusty deck” codes, the use of pre-processors is necessary. NVIDIA is working on a project with Livermore Software Technology Corporation (LSTC) to take LS-DYNA through a pre-processor to produce a code that will run in CUDA. Eventually, compilers can evolve to handle these types of constructs. For new codes, NVIDIA is now seeing independent software vendors (ISVs) adapting their codes for GPGPUs from the start.

ET International is an ISV specializing in software for advanced multicore platforms from IBM and others. The company’s CEO, Guang Gao, said he doubted the feasibility of truly scalable solutions for complex, heterogeneous codes. The HPC community must take a serious look at whether it needs a fundamentally different parallel execution model. It’s time to re-examine the entire software investment.

The panel discussion took place at a time when there is growing apprehension within the HPC community that many codes may need to be fundamentally rethought and rewritten in the next five years or so — to take better advantage of larger, more highly parallel, increasingly heterogeneous HPC systems. Naturally, users want to avoid rewrites if possible, or to rewrite only once if not. (Rewrites may be unavoidable in some cases, such as if the underlying science advances materially.) The alternative processor strategies, singly or in combination, could effectively postpone and in some instances obviate the need for rewriting codes.

Steve Conway is Research VP, HPC at International Data Corporation (IDC). He may be reached at editor@ScientificComputing.com.

 


Advantage Business Media
Rockaway NJ 07866

Advantage Business Media © 2012 Advantage Business Media