Scientific Computing

Hybrid-core Computing: Punching through the power/performance wall
Squeezing more out of the same number of transistors is the name of the game
Tony M. Brewer 

Figure 1: Hybrid-core computing is the technique of extending a commodity instruction set with application-specific instructions
The last decade has seen continuous improvements in the cost per unit of performance of commodity processors, leading to their near-universal adoption by the high performance computing (HPC) community. But, in recent years, clock rates of commodity processors have flattened and performance per processor core has stagnated. Blame this condition on the laws of physics: as processor clock speed (and thus power) increases while die size remains roughly the same, power density increases until no practical way exists to dissipate the heat.
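
The heat argument can be made concrete with the standard first-order model of CMOS dynamic power, P = a * C * V^2 * f, divided by die area. The sketch below is illustrative only: the function name and every numeric value are assumptions chosen for the example, not figures for any real processor.

```python
def power_density(activity, capacitance_f, voltage_v, clock_hz, die_area_mm2):
    """Return dynamic power density in W/mm^2 using P = a * C * V^2 * f."""
    dynamic_power_w = activity * capacitance_f * voltage_v ** 2 * clock_hz
    return dynamic_power_w / die_area_mm2


# Illustrative (assumed) values, not measurements of a real chip.
base = power_density(0.2, 1.0e-7, 1.2, 2.0e9, 200.0)
faster = power_density(0.2, 1.0e-7, 1.2, 4.0e9, 200.0)

# Same die area, doubled clock: the density doubles in this model.
print(f"{base:.3f} W/mm^2 at 2 GHz, {faster:.3f} W/mm^2 at 4 GHz")
```

In this simplified model, density scales linearly with clock rate. In practice, higher clock rates usually also require a higher supply voltage, so power tends to grow faster than linearly.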

How can we circumvent the laws of physics and break the power/performance wall? One common method is to leverage the venerable Moore’s law: use the billions of transistors now available on a processor die to add cores, increase the size of on-chip caches and devise clever ways of overlapping operations. But, by all accounts, effective programming for multi-core is difficult, and other miscellaneous changes only incrementally improve performance.

We’re left with the conclusion that the only solution is to increase performance pound-for-pound and watt-for-watt over what we’re currently getting out of the hardware in our data centers. In other words, find a creative way for a handful of transistors to get 10x, 100x or 1000x the performance of the equivalent number of transistors in a commodity processor.

Heterogeneous processing
One way to accomplish this is heterogeneous processing: employing specialized hardware that accelerates specific portions of an application. Examples include attached “array processors” and, garnering attention today, I/O cards containing graphics processing units (GPUs) used as application accelerators. Another approach is to use reconfigurable hardware, generally field programmable gate arrays (FPGAs), to execute performance-critical portions of an application.

Such specialized hardware architectures can provide very high performance with much better power efficiency. However, heterogeneous systems are notoriously difficult to program, mostly due to the complexity involved in distributing and managing execution across multiple architectural models.

The next logical step is to combine application-specific hardware with a commodity instruction set in a single, integrated architecture. Software development in such an environment follows a well-established programming model (industry-standard programming languages and a shared view of memory), while providing the increased performance of application-specific hardware.

Hybrid-core computing
Hybrid-core computing extends a commodity instruction set (e.g. x86-64) with hardware-based, application-specific instructions to accelerate HPC applications. The key feature that differentiates hybrid-core computing, especially from I/O-based accelerators, is the programming model, which supports multiple instruction sets in a single address space. The off-the-shelf processor executes normal instructions, and the coprocessor executes the application-specific instructions.

The host processor and coprocessor share the same cache-coherent view of virtual memory; that is, the coprocessor is treated as if it were just another processor on the system bus. An application executable can contain both x86 and coprocessor instructions; a dispatch mechanism supports execution of coprocessor instructions on the coprocessor (Figure 1). Coprocessor instructions execute in the same address space and on the same data as x86 instructions.
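
The practical consequence of a single address space can be sketched in a few lines. This is a toy model, not Convey's implementation: the function names are hypothetical, and plain Python functions stand in for x86 code, coprocessor instructions and an I/O-attached accelerator.

```python
from array import array


def host_scale(data, factor):
    """Stand-in for ordinary x86 code: scales the buffer in place."""
    for i in range(len(data)):
        data[i] *= factor


def coprocessor_add(data, delta):
    """Stand-in for a coprocessor instruction: same buffer, no copy."""
    for i in range(len(data)):
        data[i] += delta


def io_accelerator_add(data, delta):
    """Contrast case: an I/O-attached device works on a staged copy."""
    staged = array(data.typecode, data)   # explicit copy, host to device
    for i in range(len(staged)):
        staged[i] += delta
    data[:] = staged                      # explicit copy, device to host
    return 2                              # number of explicit transfers


buf = array("d", [1.0, 2.0, 3.0])
host_scale(buf, 2.0)        # the host path touches buf
coprocessor_add(buf, 0.5)   # the coprocessor path touches the very same buf
```

In the shared-memory case, the program never stages data; in the I/O-attached case, every offloaded step pays for two transfers, which the programmer must manage.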

Coprocessor instructions are grouped into sets, or “personalities,” that are loaded at runtime, allowing the system to present a customized set of instructions to each application. Instruction functionality is flexible; personalities can contain anything from instruction-level acceleration (e.g. emulating a standard vector processing programming model) to complete algorithms using a Multiple Instruction Multiple Data (MIMD) programming model.
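
As a rough illustration of the personality idea, the sketch below models a coprocessor that holds one instruction set at a time and swaps it at runtime. The class, the opcode names and both personalities are invented for the example; they do not correspond to any actual Convey personality or API.

```python
class Coprocessor:
    """Hypothetical model of a coprocessor holding one personality at a time."""

    def __init__(self):
        self.personality_name = None
        self._instructions = {}

    def load_personality(self, name, instructions):
        """Replace the current instruction set with the named personality."""
        self.personality_name = name
        self._instructions = dict(instructions)

    def execute(self, opcode, *operands):
        """Run one application-specific instruction from the loaded personality."""
        if opcode not in self._instructions:
            raise ValueError(
                f"{opcode!r} is not in personality {self.personality_name!r}"
            )
        return self._instructions[opcode](*operands)


# Two invented personalities: instruction-level vector work and a
# sequence-comparison primitive.
VECTOR = {"vadd": lambda a, b: [x + y for x, y in zip(a, b)]}
SEQUENCE = {"mismatches": lambda s, t: sum(c != d for c, d in zip(s, t))}

cp = Coprocessor()
cp.load_personality("vector", VECTOR)
total = cp.execute("vadd", [1, 2], [3, 4])

cp.load_personality("sequence", SEQUENCE)
diff = cp.execute("mismatches", "GATTACA", "GACTATA")
```

Once the second personality is loaded, `vadd` is no longer available, which mirrors the idea that each application sees only the instruction set tailored to it.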

For example, Convey has developed a series of personalities for several application areas, including specific algorithms in life sciences, oil and gas, and financial analytics. In addition, a Personality Development Kit (PDK) provides a toolset that allows users to develop their own personalities for specific applications.

Ideally, a hybrid-core system supports a unified development environment that generates executables containing both x86 and coprocessor code, allowing a single executable to utilize both elements. Such hybrid executables can also run directly on a commodity x86 server, taking only the x86 code paths, although much more slowly. When the coprocessor is present, compiler-generated profitability analysis code determines at runtime the optimal code path.
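
The runtime choice described above might look something like the following sketch. Everything here is an assumption made for illustration: the function names, the use of problem size as the only profitability input, and the threshold of 1024 elements. A real compiler would generate its own analysis code.

```python
def saxpy_x86(a, x, y):
    """Host code path: computes a*x + y element by element."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_coprocessor(a, x, y):
    """Stand-in for the coprocessor code path; same result, assumed faster."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def choose_path(n, coprocessor_present, min_profitable_n=1024):
    """Toy profitability test: offload only when hardware exists and n is large."""
    if not coprocessor_present:
        return "x86"
    return "coprocessor" if n >= min_profitable_n else "x86"


def saxpy(a, x, y, coprocessor_present):
    """Single entry point that contains both code paths and picks one at runtime."""
    path = choose_path(len(x), coprocessor_present)
    kernel = saxpy_coprocessor if path == "coprocessor" else saxpy_x86
    return path, kernel(a, x, y)


small = saxpy(2.0, [1.0, 2.0], [3.0, 4.0], coprocessor_present=True)
large = saxpy(2.0, [1.0] * 4096, [0.0] * 4096, coprocessor_present=True)
no_hw = saxpy(2.0, [1.0] * 4096, [0.0] * 4096, coprocessor_present=False)
```

The same call works on a plain x86 server, where it always falls back to the host path, and on a hybrid-core system, where small problems stay on the host and large ones are dispatched.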

Putting the green in HPC
When used as nodes in a high performance computing cluster, hybrid-core systems deliver higher per-node performance and substantially better performance per watt than conventional clusters. Put another way, a hybrid-core server delivers performance comparable to that of the commodity servers it replaces while using far less power, cooling and floor space.

For example, an early application of the Convey hybrid-core computer involved implementing a proteomics application developed by the computer science department at the University of California, San Diego (UCSD). The application, called InsPecT MSAlignment [1], involves finding close matches of mass spectrometer spectra (representing amino acid sequences in a sample) in a large database of known proteins. UCSD estimates they can replace eight racks of commodity servers with a single rack of Convey servers, reducing power requirements, air-conditioning and floor space by as much as 91 percent.
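
The rack arithmetic is easy to check. The sketch below is only a back-of-the-envelope calculation: the article does not say how the 91 percent figure was derived, so the second function simply asks what that figure would imply if it applied to total power measured in units of one commodity rack. Both function names are made up for the example.

```python
def percent_reduction(before, after):
    """Percentage saved when going from `before` units to `after` units."""
    return 100.0 * (before - after) / before


def remaining_budget(before, claimed_percent):
    """Units left after a claimed percentage reduction."""
    return before * (1.0 - claimed_percent / 100.0)


# Going from eight racks to one, counted purely in racks.
rack_reduction = percent_reduction(8, 1)

# Assumption: the 91 percent saving applies to total power, with one
# commodity rack's draw as the unit.
remaining = remaining_budget(8.0, 91.0)

print(f"rack count reduction: {rack_reduction:.1f} percent")
print(f"implied remaining draw: {remaining:.2f} commodity-rack equivalents")
```

Counting racks alone gives a reduction of 87.5 percent. Under the stated assumption, a saving of 91 percent would mean the single replacement rack draws less than one commodity rack does.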

Hybrid-core computing is arguably the inevitable evolution of heterogeneous computing, and ultimately an inevitable path toward providing more performance with less power to the HPC community. Squeezing an order of magnitude or more performance out of the same number of transistors is the name of the game, and hybrid-core computing can do just that.

1. Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0404, USA.

Tony Brewer is Chief Technology Officer and Cofounder of Convey Computer. He may be reached at editor@ScientificComputing.com.

 


© 2010 Advantage Business Media