Increased Speed Lowers Costs
Comparing network fabrics, performance and the cost of power

Until recently, when the cost of power was negligible compared with the cost of the system, high-performance computing was all about speed: the fastest CPU, the fastest interconnect, the fastest storage. Today, with the increased cost of power and cooling, power costs are comparable to the system cost over a four-year usage model. As a result, we have witnessed a paradigm shift in CPU technology, where raw frequency has been replaced with multi-core designs.

Instead of increasing the CPU frequency for higher performance, the frequency is kept the same or even reduced, and more CPU cores are added, so the power consumption of the CPU stays roughly the same. On the storage side, we have seen the same power-aware development with solid-state drive technology, which provides higher performance with lower latency and lower power consumption, since there are no moving parts.

On the interconnect technology side, the race for higher speed is still on. Ethernet has reached 10 Gb/s and InfiniBand 40 Gb/s, with a clear roadmap to 56 Gb/s (InfiniBand FDR) and 100 Gb/s (InfiniBand EDR). This article reviews how increasing the interconnect speed actually reduces the power consumed per compute task, or simulation job.
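The power-per-job idea can be illustrated with simple arithmetic. The sketch below uses hypothetical numbers (a fixed cluster power draw and two runtimes), not measurements from the studies discussed in this article:

```python
# Illustrative energy-per-job arithmetic; all figures are hypothetical,
# not results from the benchmarks discussed in this article.

def energy_per_job_kwh(cluster_power_kw: float, job_runtime_hours: float) -> float:
    """Energy one job consumes: cluster power draw multiplied by runtime."""
    return cluster_power_kw * job_runtime_hours

# Same 10 kW cluster; a faster interconnect shortens the job from 4.0 h to 2.5 h.
slow_fabric = energy_per_job_kwh(10.0, 4.0)
fast_fabric = energy_per_job_kwh(10.0, 2.5)

print(slow_fabric)  # 40.0 kWh per job
print(fast_fabric)  # 25.0 kWh per job
```

Even at identical wattage, finishing sooner means less energy billed to each simulation.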

Figure 1: Eclipse performance comparison
Network topologies
Constant bi-sectional bandwidth (CBB), or fat-tree, networks have emerged as a key ingredient in delivering non-blocking, scalable bandwidth and the lowest latency for high-performance computing and other large-scale data center clusters. In addition to wiring the network to provide a physical CBB topology, system architects also need to pay close attention to the underlying networking technology. The spanning tree algorithm required by Ethernet layer 2 switches (to prevent network loops) cannot exploit physical CBB fat-tree topologies. Moving to expensive layer 3 switches solves the spanning tree-related scaling problem, but adds the overhead of additional layer 3 algorithms and store-and-forward switching.
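To make the CBB fat-tree idea concrete, here is a small sketch of how host and switch counts relate in a two-tier (leaf/spine) non-blocking topology built from identical k-port switches. The function and the port count used are illustrative assumptions, not details from the article:

```python
# Sketch: sizing a non-blocking two-tier (leaf/spine) fat tree built from
# identical k-port switches. Hypothetical helper for illustration only.

def two_tier_fat_tree(ports_per_switch: int) -> dict:
    half = ports_per_switch // 2
    # Non-blocking CBB: each leaf splits its ports evenly between
    # hosts (down) and spine uplinks (up).
    spine_switches = half              # one uplink from every leaf to each spine
    leaf_switches = ports_per_switch   # each spine port reaches a distinct leaf
    hosts = leaf_switches * half       # half of each leaf's ports face hosts
    return {"leaves": leaf_switches, "spines": spine_switches, "hosts": hosts}

print(two_tier_fat_tree(36))  # {'leaves': 36, 'spines': 18, 'hosts': 648}
```

The takeaway is that full bi-sectional bandwidth requires as many uplinks as host links per leaf, so the wiring plan and the switching technology both have to cooperate to preserve it.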

InfiniBand, on the other hand, combines automatic configuration with a flexible forwarding algorithm to take full advantage of the underlying CBB network. By scaling deterministically, maintaining full wire speed across the cluster and using simple cut-through switching, InfiniBand helps keep costs down. This makes it fairly easy and cost effective to build systems with multiple layers of switches, and reduces the number of network elements and the associated power consumption.
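The store-and-forward versus cut-through difference mentioned above comes down to how much of a frame must arrive at a switch before forwarding can begin. A sketch of the per-hop arithmetic, with made-up frame sizes, link speeds and switch delays:

```python
# Hypothetical per-hop latency comparison; frame sizes, link speeds and
# switch delays below are illustrative, not measured values.

def store_and_forward_latency_us(frame_bytes: int, link_gbps: float,
                                 switch_delay_us: float) -> float:
    """The entire frame must be received before forwarding begins."""
    serialization_us = frame_bytes * 8 / (link_gbps * 1e3)  # bits over Gb/s -> microseconds
    return serialization_us + switch_delay_us

def cut_through_latency_us(header_bytes: int, link_gbps: float,
                           switch_delay_us: float) -> float:
    """Forwarding starts as soon as the header has been parsed."""
    header_us = header_bytes * 8 / (link_gbps * 1e3)
    return header_us + switch_delay_us

# A 4 KB frame on a 10 Gb/s link, assuming a 1 us switch delay and 64-byte header.
print(store_and_forward_latency_us(4096, 10, 1.0))  # about 4.28 us per hop
print(cut_through_latency_us(64, 10, 1.0))          # about 1.05 us per hop
```

The serialization penalty of store-and-forward is paid again at every switch tier, which is why it matters most in the multi-layer fat trees described above.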

Figure 2: Eclipse power consumption per job
Commercial and open source applications
Micro-benchmarks such as latency and throughput are useful because they make it easy to compare networks with simple, easy-to-run tests. However, there is nothing like real applications for comparing network fabrics, performance and the cost of power. We have compared application performance and the power cost per simulation job (power per job, or power efficiency) using InfiniBand, 10 GigE and 1 GigE. The results show that faster simulations actually save power for the same amount of computation. (More details on these results can be found in HPC Advisory Council publications available on the council's Web site.)

• Case 1: Eclipse
Eclipse, a commercial product of Schlumberger, is a reservoir simulation application that is extremely popular in the oil and gas industry. Figure 1 shows the performance of 1 GigE, 10 GigE and InfiniBand in elapsed time (seconds) as the number of nodes is varied, while Figure 2 shows the power consumption comparison between InfiniBand, 10 GigE and 1 GigE when four simulation jobs were run on the cluster (the comparison is power per job).

Figure 3: MPQC Performance Comparison for aug-cc-pVDZ benchmark
Figure 4: MPQC Performance Comparison for aug-cc-pVTZ benchmark
• Case 2: MPQC
MPQC is a Massively Parallel Quantum Chemistry program developed at Sandia National Laboratories. It computes properties of atoms and molecules from first principles using the time-independent Schrödinger equation. Figures 3 and 4 show MPQC performance comparisons for the aug-cc-pVDZ and aug-cc-pVTZ benchmarks, respectively. Figure 5 shows the power consumption comparison between InfiniBand, 10 GigE and 1 GigE when four simulation jobs were run on the cluster (the comparison is power per job).

We have reviewed the performance and power consumption per simulation job (in other words, productivity) for two cases: a commercial application and an open source application. Many more cases (nearly 50) can be found on the HPC Advisory Council Web site under the best practices section; these results are the outcome of the HPC|Works subgroup.

The main target of high-performance technologies and solutions is to reduce application or simulation runtime. In any research activity or product design and development effort, whether weather forecasting, automobile and airplane manufacturing, drug discovery or metal forming, thousands or even millions of compute simulations may be required. The more simulations per day a compute system can execute, the lower the power cost per simulation, and the faster the product can be designed and brought to market.
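The throughput argument above can be sketched as simple cost arithmetic: at a fixed electricity price and cluster power draw, doubling the number of simulations completed per day halves the power cost billed to each one. The power draw, throughput and price below are invented numbers for illustration:

```python
# Illustrative cost-per-simulation arithmetic; power draw, throughput and
# electricity price are invented numbers, not data from the article.

def cost_per_simulation(cluster_power_kw: float, jobs_per_day: float,
                        price_per_kwh: float) -> float:
    """Daily energy cost divided across the jobs completed that day."""
    daily_energy_kwh = cluster_power_kw * 24
    return daily_energy_kwh * price_per_kwh / jobs_per_day

# A 100 kW cluster at $0.10/kWh: doubling throughput halves the cost per job.
print(cost_per_simulation(100.0, jobs_per_day=10, price_per_kwh=0.10))  # about $24 per job
print(cost_per_simulation(100.0, jobs_per_day=20, price_per_kwh=0.10))  # about $12 per job
```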

Multicore CPU technology increases CPU performance without increasing the power envelope, and the faster the interconnect, the more CPU cycles can be utilized (CPU efficiency) and the more simulation jobs can be performed. By providing the highest throughput and lowest latency, InfiniBand delivers the best power/performance and power-efficiency ratios, lowering the power cost per simulation.

Figure 5: MPQC power consumption per job

Another HPC clustering element that helps reduce power consumption per simulation is the GPU, and we expect to see this architecture grow in use as more applications become able to take advantage of GPU technology.

Gilad Shainer is Chairman of the HPC Advisory Council, Pak Lui is HPC Advisory Council Cluster Center Manager, Tong Liu is Director of the HPC Advisory Council China Center of Excellence, and Brian Sparks is the HPC Advisory Council Media Relations and Events Director. They may be reached at