Two Megawatts of Computing Power in a Ford Fiesta?
Using immersion cooling to reach the next level of power density and efficiency

Progress in leadership-class computing is being hindered by the limitations of conventional air cooling technology. Multicore chip architectures, faster memory and increases in parallelism have meant an increase in the amount of computational power that must be devoted to communication. While evolving technologies such as 3-D packaging, low-loss materials and improved Z-axis and optical interconnect will play an important role in increasing off-chip and inter-node bandwidth, decreasing signal path length through increased packaging density remains a tried-and-true strategy.

However, the physical proximity of devices, particularly at the node and inter-node level, is limited by the space required for heatsink attachment and airflow. Air cooling constrains facility power density as well. If built with conventional air cooling technology, some proposed exascale facilities will be the size of small cities. At these scales, the power consumed by air cooling equipment is more than an unpalatable operating cost; it can affect overall project feasibility.

It is generally recognized that liquid cooling can dramatically increase energy efficiency and power density. The power density in the Power7 IH supercomputer has reached an unprecedented 250 kW per oversize rack with water cooling. In Zurich, researchers working on the Aquasar system¹ have achieved thermal coupling between the CPU junction and the water that is tight enough to let the coolant temperature rise to 60°C, making it more economical to reject waste heat and more practical to capture and utilize it.

However, this level of performance is not without costs. Cold plates, manifolds, pumps, hoses, heat exchangers, couplings and other components add cost and take precious space within the compute nodes. Controlling the flow and mitigating the loss of coolant through this myriad of components within a node, rack or facility is an engineering challenge exacerbated by the number and variety of devices on each “hot swappable” node.

One need only look inside one of these machines to see why liquid cooling has been relegated to the world of supercomputers. The cost and complexity barriers, even in large-scale production, are simply too high for much of the HPC industry to bear.

Cooling by immersion in a dielectric liquid is, arguably, one of the more elegant ways to capture all of the heat generated by a complex electronic assembly, because much of the aforementioned hardware can be eliminated. Companies like Green Revolution Cooling, Hardcore Computer and Iceotope are making immersion systems based on liquid-phase mineral oil or fluorochemical fluids. These systems are certainly quieter and more efficient than their air-cooled equivalents, but their efficiency and density are ultimately limited by the need for extended surfaces (finned heat sinks) on devices and by the inferior liquid-phase transport properties common to dielectric coolants.

Figure 1: Experiment with 4-way SLI with NVIDIA GF-100 GPUs

Enter 2-phase immersion
If an immersion working fluid is sufficiently volatile, it will change phase from liquid to vapor and gather the device heat much more efficiently. How efficiently? In recently published work, researchers demonstrated a junction-to-fluid thermal resistance of 0.076°C/W while passively removing 240 W from four overclocked NVIDIA GF-100 GPUs (Figure 1). A 100 µm-thick metallic boiling enhancement coating (BEC) had been soldered atop the lids (shown on a CPU in Figure 2). The inexpensive BEC ensures a temperature difference of less than 3°C between the lid surface and the boiling fluid. By incorporating it onto a slightly thicker lid at manufacture and using solder in place of thermal grease, as is common practice for CPUs, a resistance of 0.050°C/W should be attainable without impacting the package assembly process. At that resistance, a 200 W device runs only 10°C above the fluid, so an 85°C junction temperature could be maintained with a fluid that is boiling (and condensing) at 75°C. The heat could then be transferred efficiently by condensation to facility water entering at 60°C and leaving at 70°C.
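The temperature budget in this paragraph follows from the usual steady-state relation, junction temperature = fluid temperature + power × thermal resistance. The sketch below is only a minimal illustration of that arithmetic using the figures quoted above; the function names are hypothetical and not taken from the published work.

```python
def junction_temperature_c(fluid_temp_c, power_w, resistance_c_per_w):
    """Steady-state junction temperature for a device boiling into the fluid."""
    return fluid_temp_c + power_w * resistance_c_per_w


def max_fluid_temperature_c(junction_limit_c, power_w, resistance_c_per_w):
    """Warmest fluid (boiling point) that still holds the junction limit."""
    return junction_limit_c - power_w * resistance_c_per_w


# Figures quoted in the article: 200 W device, 85 °C junction limit.
projected = max_fluid_temperature_c(85.0, 200.0, 0.050)  # projected lid + solder
measured = max_fluid_temperature_c(85.0, 200.0, 0.076)   # measured GPU value

print(f"Fluid may boil at up to {projected:.1f} °C with 0.050 °C/W")
print(f"Fluid may boil at up to {measured:.1f} °C with 0.076 °C/W")
```

With the projected 0.050°C/W the fluid can boil at 75°C, which leaves a 5°C margin above facility water leaving at 70°C. Applying the measured 0.076°C/W to the same 200 W device is an extrapolation made here for comparison, not a result reported in the article.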

Figure 2: CPU without and with the BEC soldered atop the lid

Very high power density
So, what about power density? In recent experiments conducted by 3M, a 17 × 20 cm PCB was populated with twenty 200 W CPU simulators, each composed of a ceramic heater with an enhanced lid bonded to it. This 4,000 W board was run in a small tank with a 4 mm gap between the lids and the adjacent wall. The total system volume was one liter, and it used only 200 cc of fluid (the same cost as a 1U heat sink). At 4 kW/liter, one could fit 2.4 MW of computing power into the passenger compartment of a Ford Fiesta.
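The headline number can be checked with a few lines of arithmetic. This sketch uses only the figures quoted above; the function names are hypothetical, and the cabin volume it prints is simply what the 2.4 MW and 4 kW/liter figures imply, since the article does not state a volume for the passenger compartment.

```python
def power_density_kw_per_l(total_power_w, system_volume_l):
    """Volumetric power density of the test system in kW per liter."""
    return total_power_w / 1000.0 / system_volume_l


def volume_needed_l(target_power_mw, density_kw_per_l):
    """Volume in liters required to house target_power_mw at a given density."""
    return target_power_mw * 1000.0 / density_kw_per_l


board_power_w = 20 * 200                              # twenty 200 W simulators
density = power_density_kw_per_l(board_power_w, 1.0)  # one-liter system
implied_cabin_l = volume_needed_l(2.4, density)

print(f"Board power:   {board_power_w} W")
print(f"Power density: {density:.1f} kW/liter")
print(f"2.4 MW needs:  {implied_cabin_l:.0f} liters")
```

In other words, the Fiesta comparison treats the passenger compartment as roughly 600 liters of usable volume filled entirely with boards at the demonstrated density.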

So, why hasn’t this been done before? Previous computing systems that used passive 2-phase immersion used “direct die” techniques, meaning that the fluid contacted the silicon directly. Unfortunately, the heat flux that can be achieved by direct boiling of dielectric coolants is limited to about 20 W/cm², well below the operating heat flux of most CMOS devices. Designers migrated to complex active techniques like spray cooling that enable heat fluxes well over 100 W/cm². The resultant heat transfer coefficients, however, are quite low. The spray impacting the chips in the Cray X1 supercomputer, for example, produced heat transfer coefficients of about 1 W/cm²·K. Applied over the 23 × 23 mm NVIDIA GPU, this would mean a chip-to-fluid thermal resistance of 0.19°C/W, well above the 0.076°C/W junction-to-fluid value measured with passive boiling.
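The 0.19°C/W figure follows from the convective relation R = 1 / (h × A). The sketch below reproduces it under the assumption that the full 23 × 23 mm footprint is the wetted heat transfer area; the function name is hypothetical, and the final ratio compares a chip-to-fluid value with a junction-to-fluid value, as the article itself does.

```python
def convective_resistance_c_per_w(h_w_per_cm2_k, area_cm2):
    """Thermal resistance of a surface with heat transfer coefficient h."""
    return 1.0 / (h_w_per_cm2_k * area_cm2)


side_cm = 2.3                  # 23 mm GPU footprint, assumed fully wetted
area_cm2 = side_cm * side_cm   # about 5.29 cm^2

spray_r = convective_resistance_c_per_w(1.0, area_cm2)  # Cray X1 spray figure
boiling_r = 0.076              # measured passive-boiling value from the article

print(f"Spray cooling:   {spray_r:.2f} °C/W")
print(f"Passive boiling: {boiling_r:.3f} °C/W")
print(f"Ratio:           {spray_r / boiling_r:.1f}x")
```

On these numbers the spray-cooled resistance is roughly two and a half times the passive-boiling value, even though spray cooling sustains a much higher heat flux than direct boiling on bare silicon.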

Figure 3: Open bath immersion cooling concept

Open bath immersion
How might passive 2-phase immersion technology be implemented for commodity HPC equipment? The traditional approach used in many high-voltage inverters is clearly not viable. Those systems contain the electronics in sealed pressure vessels with hermetic electrical connections. Creating such an enclosure for hot-swappable computer hardware might be more challenging and expensive than water cooling. However, immersion cooling can be applied, some believe, in a much simpler way to produce systems that are denser, less expensive and more efficient than any other liquid-cooled system. This new twist on immersion cooling, which was detailed in several recent publications, has been termed “open bath immersion.”

With open bath immersion, servers would be immersed side-by-side in modular baths of a volatile dielectric fluid (Figure 3). The baths remain closed when access is not needed, but are allowed to “breathe” so they operate at atmospheric pressure, a technique that really only works for computing equipment that operates at a near-steady power load. It requires no specialized hermetic electrical connections. Instead, nodes plug into a submerged backplane fed by a simple conduit that runs beneath the liquid level and exits the top of the tank. The vapor generated by boiling rises to a warm-water-cooled condenser integrated into the tank. Condensed vapor simply falls back to the bath. Servers can be hot swapped without disturbing their neighbors by simply removing them from the bath. Like PCBs cleaned in a vapor degreaser, they exit the bath dry, resulting in minimal and easily quantified fluid losses.

Among the advantages of open bath immersion (OBI) compared to more traditional liquid cooling schemes is that all server-level and most rack-level cooling hardware is eliminated, along with considerations relating to its integration, reliability and power consumption. All devices are kept at the same temperature. The fluids being proposed are neither flammable nor even combustible. In fact, they are already sold globally as halon replacements, so fire protection is intrinsic to the technology.

Historically, power density has been constrained by cooling technology. In a strange twist, the reverse is now true: the cooling technology is constrained by the power density of the available hardware. Indeed, the true merits of the technology can only be demonstrated with very high-density hardware. Nevertheless, small-scale demonstrations are being built using conventional air-cooled hardware to validate fluid loss models and demonstrate the energy efficiency merits of this elegant technology.

1. IBM Zurich’s Aquasar system

Phil Tuma is an Advanced Application Development Specialist in the Electronics Markets Materials division of 3M Company. He may be reached