HPC Power and Cooling Heat Up
Until recently, advances in HPC hardware took center stage, but now strategies for coping with escalating power and cooling requirements are in the spotlight
In Stephen Leacock’s nonsense story,¹ “Gertrude the Governess,” the hero, in extremis, “… flung himself upon his horse and rode madly off in all directions.” A fitting description for the state of power and cooling in today’s high performance computing (HPC) industry.
Researchers and engineers at companies, government agencies and educational institutions worldwide are exploring a wide variety of solutions to problems posed by petascale systems that require enormous quantities of energy and produce voluminous amounts of heat. Exascale, still a shimmering mirage on the far horizon, but gradually coming into focus, is adding impetus to their efforts. Power and cooling solutions under investigation range from geothermal-powered data centers in Iceland to exotica, such as carbon nanotubes or supercool superconducting circuits.
In the realm of cooling, it’s déjà vu all over again. Introduced in 1976, that iconic machine, the Cray-1, was liquid cooled using Freon. But, as the HPC industry progressed, less expensive solutions featuring air cooling became the norm. Data centers sprouted A/C units including chillers, ducts, heat exchangers and noisy fans. Thomas Sterling, Professor, School of Informatics and Computing at Indiana University, may have experienced the ultimate in HPC-generated decibels. A while back, he visited Moscow State University’s massive supercomputer, which featured high-density packing and relied on air cooling. “Loudest machine I have ever experienced,” he recalls.
Today, liquid cooling is having a resurgence as hardware densities escalate. Cray, for example, offers both liquid and air-cooled versions of its new XC30 system and CS300 HPC cluster. Sterling points to a recent single-rack, liquid-cooled system from RSC in Russia, which has achieved a peak performance of one petaflop. And in Japan, NEC has developed a phase-change approach in which coolant is evaporated to disperse heat more quickly.
There are innumerable other approaches to liquid cooling underway, some of them involving cryogenics. But the poster child for the practical application of warm water liquid cooling has to be the Department of Energy’s National Renewable Energy Laboratory (NREL) in Golden, CO. (See related story on page 10 of this issue.)
HP and Intel are supplying NREL’s petascale HPC system. And HP has developed a new component-level liquid cooling system that reflects the company’s design philosophy.
“What we are providing is a fully-integrated solution,” explains Nicolas Dube, Distinguished Technologist at HP. The idea, he says, is to go beyond just cooling the supercomputer’s processors and memory to all other system components, such as the power supplies, voltage regulators, network interconnect silicon, etcetera.
Lose the Chiller
By embracing warm-water cooling, HP is able to get rid of that standby of air-cooled HPC data centers: the chiller.
Chillers and associated computer room air conditioning have become essential for cooling highly concentrated HPC clusters. But they come with a price — not only are they expensive, but they are energy hogs, consuming huge amounts of electricity and requiring their own dedicated power infrastructure.
As of last summer, NREL’s data center has been consuming just over one megawatt of power. But it’s in the winter that this system shines. Water to the servers is supplied at 75 degrees Fahrenheit and returned at 100 degrees Fahrenheit.
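To get a feel for what those temperatures imply, here is a rough sketch of the water flow needed to carry away a given heat load, using the basic relation heat = mass flow × specific heat × temperature rise. It assumes, purely for illustration, that the full one megawatt ends up in the water loop, which the article does not state. The function names and the specific-heat constant are illustrative choices, not anything from NREL or HP.

```python
# Back-of-the-envelope sketch, not NREL's actual engineering figures.
# Assumption: the entire heat load is removed by the water loop.

WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), approximate value for liquid water


def fahrenheit_delta_to_kelvin(delta_f: float) -> float:
    """Convert a temperature difference (not an absolute temperature)."""
    return delta_f * 5.0 / 9.0


def required_flow_kg_per_s(heat_watts: float, supply_f: float, return_f: float) -> float:
    """Mass flow of water needed to absorb heat_watts across the given rise."""
    delta_k = fahrenheit_delta_to_kelvin(return_f - supply_f)
    if delta_k <= 0:
        raise ValueError("return temperature must exceed supply temperature")
    return heat_watts / (WATER_SPECIFIC_HEAT * delta_k)


# 1 MW load, water supplied at 75 F and returned at 100 F (figures from the text).
flow = required_flow_kg_per_s(1_000_000, 75.0, 100.0)
print(f"Required flow: {flow:.1f} kg/s")  # roughly 17 kg/s, about 17 liters per second
```

Under these assumptions, a wider gap between supply and return temperatures means less water has to be pumped for the same heat load.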
Dube notes that HP and NREL are looking beyond achieving a favorable power usage effectiveness (PUE), the ratio of a data center’s total energy use to the energy consumed by its IT equipment. “What I’m pushing for is ERE, energy reuse efficiency,” he says. “The heated water from the supercomputer’s cooling system acts as a furnace, heating the lab and offices and is even being channeled under walkways to melt snow in the Colorado winter.”
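The two metrics are simple ratios, and a short sketch shows how reusing waste heat changes the picture. The definitions follow those published by The Green Grid, which calls ERE energy reuse effectiveness. The energy figures in the example are invented for illustration and are not NREL measurements.

```python
# Sketch of the PUE and ERE ratios. All input numbers below are hypothetical.


def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    return total_facility_kwh / it_kwh


def ere(total_facility_kwh: float, reused_kwh: float, it_kwh: float) -> float:
    """Energy reuse effectiveness: energy reused elsewhere is subtracted first."""
    return (total_facility_kwh - reused_kwh) / it_kwh


# Hypothetical facility: 1,000 kWh to IT gear, 60 kWh of overhead,
# and 300 kWh of waste heat reused to warm offices.
total, it_load, reused = 1060.0, 1000.0, 300.0
print(f"PUE = {pue(total, it_load):.2f}")          # PUE = 1.06
print(f"ERE = {ere(total, reused, it_load):.2f}")  # ERE = 0.76
```

PUE cannot drop below 1.0 no matter how efficient the facility is, while ERE can, which is why reusing heat shows up only in the second metric.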
And speaking of power, at SC13 in Denver this last November, the Green500 published its list of power-efficient systems, dominated yet again, to no one’s surprise, by heterogeneous supercomputers sporting Intel CPUs combined with NVIDIA GPUs.
Taking top honors was the TSUBAME-KFC. Developed at the Tokyo Institute of Technology, the supercomputer, powered by a combination of Intel Ivy Bridge processors and NVIDIA Kepler GPUs, claimed an efficiency of 4.5 gigaflops/watt. The Japanese supercomputer features a unique oil cooling system.
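To put 4.5 gigaflops/watt in perspective, the sketch below converts an efficiency figure into the power a machine would draw at a given performance level. It assumes the efficiency holds constant as the system scales, an idealization that real machines are unlikely to match, and the function name is an illustrative choice.

```python
# Idealized arithmetic: assumes flops-per-watt stays constant at any scale.

GIGA = 1e9
PETAFLOP = 1e15  # floating-point operations per second
EXAFLOP = 1e18


def power_watts(flops: float, gflops_per_watt: float) -> float:
    """Power drawn to sustain `flops` at the given efficiency."""
    if gflops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return flops / (gflops_per_watt * GIGA)


efficiency = 4.5  # gigaflops/watt, the figure quoted in the text
print(f"1 petaflop: {power_watts(PETAFLOP, efficiency) / 1e3:.0f} kW")  # about 222 kW
print(f"1 exaflop:  {power_watts(EXAFLOP, efficiency) / 1e6:.0f} MW")   # about 222 MW
```

Even at the efficiency of the list leader, the arithmetic suggests why exascale designers are chasing large further gains in performance per watt.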
Just a few years ago, recalls Steve Keckler, Senior Director of Architecture Research at NVIDIA, energy efficiency was not a critical design criterion. For example, when designing the company’s Fermi class GPUs, energy considerations were important but not a top priority. “The jump (in energy efficiency) from Fermi to our Kepler class systems was monumental,” says Keckler, noting that the top 10 supercomputers on the Green500 list use Kepler GPUs.
He claims that throughput-oriented computing is the way to go for energy efficiency, as opposed to the more traditional approaches using high-power, high-performance CPUs.
Designed for a small number of threads or jobs done very quickly, CPUs have a high energy overhead. From NVIDIA’s perspective, important applications that demand true high performance are parallel in nature and are amenable to more throughput-oriented systems designed to execute thousands or millions of tasks.
At NREL, architecture plays a key role in energy efficiency. In this case, the HPC system is using HP servers based on Intel Xeon processors, including the 22 nm Ivy Bridge architecture and Intel Xeon Phi coprocessors. Says Stephen Wheat, general manager of HPC at Intel, “As the optimized and tuned application is run in production, the achieved performance per watt on both Xeon Phi and Xeon processors has allowed achieving the results with the lowest energy use.”
Madly Moving in All Directions
Of course, the push to achieve exascale computing in the not-too-distant future is motivating a lot of very smart people to devise radical new power and cooling alternatives. The power budget of 50 megawatts for an exascale system built on today’s technology platforms is hardly acceptable. Unless, of course, you can afford your own small nuclear power plant or build your data center in locations where power is virtually unlimited — e.g., Niagara Falls, the Columbia River Gorge or the Bay of Fundy. And then there’s Reykjavik, where you not only have volcanoes spewing out geothermal power, but plenty of ice to boot.
But, according to Keckler, there is no one big silver bullet that will allow us to efficiently power and cool today’s and tomorrow’s HPC systems. Rather, it will be a collection of technological advances, some mundane, some quite startling, including some Black Swans — those impossible to predict events that have major repercussions.
For example, full server oil immersion is making inroads — Intel, among others, has been experimenting with the technology for several years. Intel also has been looking into near-threshold voltage circuit design.
Three-dimensional architectures include FinFETs, a 15-year-old technology, now being revitalized by semiconductor manufacturers, that allows you to make smaller, more densely packed, energy-efficient transistors. In another approach, called heterogeneous three-dimensional (3-D) integration, multiple wafers, stacked vertically, have the potential to consume less power and provide higher performance than current two-dimensional chips.
Lawrence Berkeley National Laboratory, among others, has been researching the use of carbon nanotubes for microprocessor cooling applications.
Microsoft, Amazon, Google and others are deploying highly energy-efficient modular computer containers to support their cloud services efforts.
Superconducting circuits, chilled to a few degrees above absolute zero, have the potential to be not only faster, but far more energy efficient than conventional semiconductors.
The list goes on. With the emphasis on green computing and the need to navigate the rocky road to exascale, some of these solutions will take hold and evolve, others will die on the vine and, most interestingly, mysterious Black Swans are bound to arise.
Says Thomas Sterling, “It may be in the future that we will see other completely alien forms of computing, which may operate with a completely different relationship between effective computation and energy and power consumption.”
It should be an interesting decade, one in which power and cooling, in all their myriad forms, take center stage in the HPC data centers of the world.
1. Gertrude the Governess: or, Simple Seventeen: http://www.online-literature.com/stephen-leacock/nonsense-novels/5/
John Kirkley, President, Kirkley Communications, is a writer and editor specializing in HPC. He may be reached at editor@ScientificComputing.com.