Ethernet Advances Allow Intensive Applications to Reap Full Benefits
Convergence of 10-gigabit Ethernet with low-latency technologies is reducing latency and improving efficiency
In June of this year, the TOP500 organization once again released its biannual report on the top 500 supercomputers in the world. These new rankings mark the first time in
Cut one way, these numbers look very impressive and tell the story of a dominant technology being readily adapted to another type of application. Cut another way, however, they show that, despite the many HPC systems that use Ethernet for their MPI fabric, those systems as a whole account for only roughly 35 percent of the total Gflops. These numbers are at once proof of the commercial viability of Ethernet as a cost-effective interconnect for a wide range of HPC applications, and of the need to improve the performance of Ethernet before it can be accepted into the most intensive scientific computing applications. The two major barriers to the adoption of Ethernet in this regard have been bandwidth and latency.
The question of bandwidth is being answered today by the emergence of 10-gigabit Ethernet (10GbE). Sales of 10GbE ports in 2005 climbed at an accelerated rate and prices dropped just as quickly. With the projected market topping $2 billion, 2006 will see 10GbE follow earlier Ethernet technologies into “critical mass,” where economies of scale and market competition make it a highly attractive interconnect option. Accordingly, 2006 could see the first TOP500-listed HPC cluster using 10GbE for its MPI.
There are HPC applications, however, for which latency is at least as important as available bandwidth. For these applications, Ethernet historically has not been a viable option. Where typical Ethernet has an end-to-end latency of greater than 80 microseconds (µs), specialty fabrics such as Myrinet and InfiniBand can achieve latency of less than 5 µs, and Quadrics can achieve 3 µs. So, despite Ethernet’s other advantages, latency-sensitive applications have required the selection of one of the other alternatives.
Today, however, dozens of companies are enhancing 10-gigabit Ethernet (largely in collaboration within standards bodies such as the Institute of Electrical and Electronics Engineers (IEEE), the Internet Engineering Task Force (IETF), the Remote Direct Memory Access (RDMA) Consortium and others) to make it performance-competitive with the specialized alternatives. An emerging suite of technologies, including MX, RDMA, iWARP and hardware-based transmission control protocol (TCP) offload, is focused on reducing the latency and improving the overall efficiency of 10GbE endpoints.
This widespread and multi-pronged effort is another example of the thriving ecosystem that exists for Ethernet and the services that run over Ethernet, unmatched by any other type of interconnect, and it will provide multiple viable low-latency alternatives for MPI applications. The nearly $500 million of venture capital funding estimated to be already dedicated to the development of 10GbE network interface cards (NICs) is strong evidence of this fact.
However, for latency-sensitive HPC applications, focusing on the endpoints is not enough. The time it takes data to pass through a network of switches can be a significant source of end-to-end latency. Traditional 10-gigabit Ethernet switches add as much as 3 to 5 µs of delay per switch for a normal packet. At first glance, this may not appear to be much, but in a large-scale HPC cluster with, for example, five hops between endpoints, the aggregate switch-induced latency could total up to 25 µs.
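The per-hop arithmetic can be made concrete with a short sketch; the hop count and per-switch delay below are the example figures from this article, not measured values:

```python
def aggregate_switch_latency_ns(hops: int, per_switch_ns: int) -> int:
    """Cumulative delay contributed by the switch fabric alone:
    each hop adds one switch's forwarding delay."""
    return hops * per_switch_ns

# Traditional 10GbE switches: up to ~5 us (5,000 ns) per hop, five hops.
print(aggregate_switch_latency_ns(5, 5000) / 1000)  # → 25.0 (microseconds)
```

The model deliberately ignores cable propagation and congestion; it captures only the fixed forwarding delay that scales linearly with hop count.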
The switch latency issue is being addressed by 10GbE switch chip vendors, who are releasing new chips for switches that can serve in even the most demanding of scientific computing applications.
Use of these techniques has produced a dramatic drop in the latency of a 10GbE switch chip (a key component in interconnect latency) to around 200 nanoseconds. Putting this chip into the switches used in the five-hop MPI fabric described earlier yields a vastly improved aggregate switch latency of roughly 1 µs. Combined with RDMA- or TCP/IP offload engine-supported NICs, the total end-to-end latency of this example could be as low as 2.5 µs, down from the more than 80 µs of previous-generation Ethernet interconnects and placing it in the same arena as the specialty fabrics.
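Assuming the endpoints (NIC plus offloaded protocol stack) contribute the remainder of the 2.5 µs figure, about 1.5 µs in total, the improved budget can be sketched as follows; the split between endpoint and fabric is an illustrative assumption, not a vendor specification:

```python
def end_to_end_latency_ns(endpoint_ns: int, hops: int, per_switch_ns: int) -> int:
    """One-way latency budget: endpoint processing plus per-hop switch delay."""
    return endpoint_ns + hops * per_switch_ns

# 200 ns switch chips over five hops contribute 1,000 ns of fabric delay.
# ~1,500 ns is assumed for the RDMA/TOE-assisted endpoints (illustrative).
total = end_to_end_latency_ns(1500, 5, 200)
print(total / 1000)  # → 2.5 (microseconds)
```

Working in integer nanoseconds keeps the arithmetic exact and makes the contrast with the 25 µs figure for traditional switches immediate.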
The strength of Ethernet has always been its adaptability. Now the convergence of 10-gigabit Ethernet with low-latency technologies is making it possible for even the most intensive of HPC applications to reap the full benefits of the Ethernet ecosystem. For someone looking to deploy and manage a scalable HPC cluster, these benefits are compelling.
Beyond the initial benefit of falling equipment costs, however, Ethernet offers additional, less tangible benefits that are arguably more valuable. No other interconnect technology is as easy to install and scale over time as Ethernet. Part of this is due to the very nature of Ethernet as a protocol, but the overall knowledge base and comfort level of the average Ethernet user is also an important factor. Unlike the specialized interconnect technologies, Ethernet is ubiquitous, which means that almost everyone has some level of understanding of how it works.
Finally, with Ethernet as the interconnect fabric, the possibility emerges of creating a homogeneous environment where input/output, MPI, storage and management communications all operate across the same Ethernet infrastructure. This “big tent” philosophy is becoming popular in the corporate data center, where gains in reliability and responsiveness, and a reduction in operational expenditure through simplified network management, are often realized. Now, with the enhanced performance of Ethernet, there is no reason why scientific computing deployments cannot realize the same type of benefits.
Uri Cummings is the Chief Technology Officer and co-founder of Fulcrum Microsystems. He may be reached at editor@ScientificComputing.com