Brad Gibson is a galactic archeologist, mining the fossil chemistry record of millions of stars to discover the origin of the Milky Way galaxy.
Nina Dethlefs is teaching computers to talk and listen to people.
Michael Fagan is discovering why bones have the shape they do and how they develop and change with age.
These and other researchers at the University of Hull in Yorkshire, England, are salivating at the opportunity to sink their teeth, as it were, into Viper, the first university-wide High-Performance Computing (HPC) cluster. Based on elements of the Intel Scalable System Framework—the latest Intel Xeon E5-2680v4 (Broadwell) processors and Intel Omni-Path Architecture—Viper gives Hull the opportunity to make its mark in academic research, says Graeme Murphy, head of research and enterprise ICT services.
“Hull’s compute systems up to now have been smaller-scale departmental installations, but now we are expecting the University’s new HPC to be the highest-performance machine of any northern university and be in the top 10 of university HPCs nationally,” Murphy says. “No other university in the UK currently has the technology we are getting here in Hull. We will be unique, and this is a huge step-change for us.”
Murphy says that researchers were spending time running computers rather than researching, and there was a lot of duplicated effort. “There was also a real sense that we were missing out on opportunities with respect to research grants and recruiting talent,” Murphy says. “Academics were turning up at Hull expecting to see HPC resources. We wanted to make a mark with our research, and our vision is to be acknowledged as a leader in the HPC field.”
Adds Dr. David Benoit, senior lecturer in physical chemistry at the university, “The sense of scale was missing; you couldn’t scale out your models easily, couldn’t connect them well. This limited the progress we could make in our research.”
“A machine way beyond our expectations.”
To create a coherent institutional HPC strategy and the machine to go with it, the University of Hull put out a public tender for an HPC system, with a £1 million (US$1.5 million) hardware budget.
“We wanted vendors to give us as much performance for that £1 million as possible,” Murphy says. “We thought our budget would buy about half a petabyte of storage and 1,000 to 1,500 cores, but the machine we got was way beyond our expectations.”
The “way beyond” bid came from ClusterVision, a leading HPC systems provider in Europe. ClusterVision has designed and built some of the fastest and most complex computational and database clusters on the continent, including many TOP500 systems.
Viper (see specs in adjacent sidebar) is one of the world’s first HPC systems to use the Intel Omni-Path Architecture (Intel OPA) high-speed interconnect. Intel OPA is a next-generation fabric with the ability to scale to tens of thousands of nodes—and eventually more—at a price competitive with current-generation fabrics. The Intel OPA 100 Series product line includes PCIe* adapters, silicon, switches, cables, and management software. As the successor to Intel True Scale Fabric, Intel OPA is built on a combination of enhanced intellectual property and Intel technology. ThinkParQ GmbH, the Fraunhofer HPC spin-off behind the parallel file system BeeGFS, recently certified BeeGFS over Intel OPA.
“We didn’t specify the interconnect and other Viper components, but a key aspect of our HPC strategy is that we didn’t want our first system to be a one-off,” Murphy says. “We wanted it to be first of many HPC systems and be permanently supported. What we liked about the ClusterVision design and use of the Intel Scalable System Framework was that it gave us a sense of longevity. We know that we have at least three years with the Broadwell chips, and with Intel OPA we’re getting a really fast interconnect. We’re already talking about Viper 2, and that’s a message our researchers haven’t heard before. We’re at the start of a journey versus the end.”
Viper by the numbers
Viper is based on the Linux* operating system and comprises approximately 5,500 processing cores across the following specialized node types:
- 180 compute nodes, each with 2x 14-core Intel Xeon E5-2680v4 processors (2.4–3.3 GHz), 128 GB DDR4 RAM
- 4 high-memory nodes, each with 4x 10-core Intel Xeon E5-4620v3 processors (2.0 GHz), 1 TB DDR4 RAM
- 4 GPU nodes, each identical to compute nodes with the addition of 4x Nvidia Tesla K40m GPUs per node
- 2 visualization nodes, each identical to compute nodes with the addition of 2x Nvidia GeForce 980Ti GPUs per node and the ability to perform remote logins
- Intel Omni-Path Architecture interconnect (100 Gbps)
- 500 TB BeeGFS parallel file system
- UPS and generator power failover
Brad Gibson, director of the E.A. Milne Centre for Astrophysics in the department of physics & mathematics at the University of Hull, has pioneered the use of computational fluid dynamics married to chemical element evolution as a means of isolating and studying the complex physics underpinning the formation of galaxies such as the Milky Way. Mining the fossil record of billions of stars throughout our galaxy is one of the most audacious experiments in science today, involving an array of space and ground-based telescopes and instruments.
“Viper gives us the opportunity to make a step change in our field, letting us potentially tackle Grand Challenge problems on a scale that would be all but impossible to astrophysicists limited to (typically) sub-million core-hour problems,” Gibson says. His team employs a blend of particle- and grid-based approaches to hydrodynamics coupled within a gravitational N-body framework, allowing researchers to explore problems ranging in scale from the origin of the chemical elements within the central nuclear furnaces of stars to the shaping of galaxies and clusters throughout the cosmos.
“Our codes demand a fast and efficient interconnect with low latency, with ready access to significant memory,” Gibson says. “Viper, with its Intel Omni-Path interconnect, is an exciting new venture for us, building on our extensive track record in optimizing parallel codes for distributed facilities.”
At the smaller end of the team’s science goals, supernovae from massive stars are the first to contribute to the chemical evolution of stars and galaxies, and produce elements fundamental for life to develop, including oxygen and iron. Due to their short lifetimes, they are responsible for the earliest chemical fingerprints observed in old stars, a field referred to as galactic archeology, as such stars bear the fossil record of the ashes of the very first, primordial, stellar generations.
Over the next five years, staff and students within the E.A. Milne Centre aim to produce the first suite of supernova nucleosynthetic yields from three-dimensional simulations of stars, something never before undertaken beyond simple one-dimensional stellar models. To generate a set of 50 models, spanning a range of initial masses and metallicities, current codes deployed on contemporary systems would take upwards of ~100 Mcore-hrs. The Milne Centre team, led by Dr. Marco Pignatari, is currently optimizing its nucleosynthesis code and anticipates a factor-of-five speedup in performance.
“Given the anticipated lifecycle of Viper, we anticipate amortizing our ~20 Mcore-hr needs over about four years,” Gibson says. “Such an investment of time would be exceedingly difficult to secure on other regional HPC resources, but Viper will provide the opportunity to revolutionize the field of nuclear astrophysics through the provision of a database of chemical yields, which will support the entire community for the next 10 to 15 years.”
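The core-hour figures above reduce to simple arithmetic. The sketch below checks them, using only numbers from the article; the sustained-core estimate at the end is our own back-of-envelope extrapolation, assuming Viper’s roughly 5,500 cores.

```python
# Back-of-envelope check of the Milne Centre's core-hour figures.
# Inputs come from the article; the sustained-core estimate is illustrative.

baseline_cost = 100e6      # ~100 Mcore-hrs for 50 models on current codes
speedup = 5                # anticipated factor-of-five optimization
optimized_cost = baseline_cost / speedup   # ~20 Mcore-hrs, as quoted

years = 4
hours = years * 365 * 24   # wall-clock hours in four years
sustained_cores = optimized_cost / hours   # cores needed, running flat out

print(f"Optimized budget: {optimized_cost / 1e6:.0f} Mcore-hrs")
print(f"Sustained cores over {years} years: {sustained_cores:.0f}")
print(f"Fraction of Viper's ~5,500 cores: {sustained_cores / 5500:.1%}")
```

In other words, the project would occupy roughly a tenth of the machine continuously for four years, which is why such an allocation would be hard to secure on shared regional resources.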
Professor Michael Fagan, head of medical engineering and professor of medical and biological engineering at the University of Hull, is looking forward to using Viper to gain new insights into bone behavior.
Bones are difficult to model accurately because their geometries are so complex. Biological structures such as bones not only have complex internal and external geometries but also variable material properties, and they experience complex and varying applied loads, for example from muscular activity. Experimental measurements can be extremely difficult, are often not possible in humans, and are ethically undesirable in animals. Computational modeling is a compelling alternative with many advantages over experimental methods.
Fagan and team have developed voxel-based finite element software, VOX-FE, for analyzing very large-scale, high-resolution models of bone and other biological structures. Voxel-based modeling allows Fagan to go directly from microCT or synchrotron scan data to finite element models with minimal loss of data and geometry simplifications. It also allows inclusion of local variation in material properties based on each voxel’s grey scale value. It is these microscopic-level geometry and material property variations that explain much of the complex macroscopic behavior of bone.
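The article does not specify VOX-FE’s actual grey-value-to-stiffness mapping, but a common approach in the bone-modeling literature is a power law relating apparent density (inferred from CT grey values) to Young’s modulus, applied per voxel. The sketch below is purely illustrative; the scale, density, and power-law coefficients are assumed representative values, not VOX-FE’s.

```python
import numpy as np

def voxel_stiffness(grey, grey_max=255.0, rho_max=1.8, c=6950.0, p=1.49):
    """Map CT grey values to per-voxel Young's moduli (MPa).

    Illustrative only -- not VOX-FE's actual mapping:
    grey_max : full-scale grey value of the scan (8-bit assumed)
    rho_max  : apparent density (g/cm^3) at full grey scale (assumed)
    c, p     : power-law coefficients (representative literature values)
    """
    density = np.asarray(grey, dtype=float) / grey_max * rho_max
    return c * density ** p

# A tiny 2x2x2 "scan": each voxel becomes one finite element with its
# own modulus, so local material variation is preserved automatically.
scan = np.array([[[0, 64], [128, 192]], [[255, 255], [32, 16]]])
E = voxel_stiffness(scan)
print(E.shape)   # one modulus per voxel: (2, 2, 2)
```

This is what lets voxel-based modeling skip the geometry-simplification step: the scan itself is the mesh, and each voxel’s grey value carries its material property.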
Commercial finite element software is typically limited to models of fewer than 10 million elements, but using VOX-FE, Fagan’s team has solved models with more than 200 million elements and is working towards models with billions of elements. This step change in model size, together with an adaptive remodeling capability, means the number of potential applications and opportunities for VOX-FE is enormous, but so are the computing resources required, especially for the most complex, adaptive models.
Viper provides those opportunities and offers the prospect of some very exciting science, including high-resolution modeling of whole bones to understand normal and pathological bone biomechanics, for conditions such as osteoporosis; growth and development of bones; and in silico design and testing of new dental and orthopedic implants. The software will also be used in predictive biology and virtual experimentation to reduce animal experiments, and in understanding the biomechanics of living and extinct animals in general, from insects to dinosaurs.
Dr. David Benoit, senior lecturer in physical chemistry, is focused on materials science, particularly interfaces and how molecules stick to surfaces. He and his colleagues’ research probes three main areas: molecule–surface interactions, pH effects on biological molecules, and new materials for gas adsorption. All three topics help to answer questions relating to surface adhesion—for example, the impact of ocean acidification on marine life, and carbon capture.
His team actively develops hybrid approaches to potential energy landscape exploration and vibrational structure determination, which are tailored to use both grid computing and HPC. Benoit has pioneered the area of accurate vibrational predictions for adsorbed systems and showed that understanding the vibrational behavior of molecules adsorbed on surfaces is key to understanding adhesion. Indeed, the subtle change in the oscillations of molecules that occurs when they bind to a surface is a telltale sign of the causes of sticking.
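The “telltale sign” idea can be made concrete with a toy harmonic-oscillator model (this is not Benoit’s hybrid method, and the softened force constant below is an invented example): a molecular vibration’s wavenumber scales with the square root of the force constant over the reduced mass, so a change in bonding on adsorption shows up directly as a frequency shift.

```python
import math

def frequency_cm1(k, mu):
    """Harmonic wavenumber (cm^-1) from force constant k (N/m)
    and reduced mass mu (kg)."""
    c = 2.99792458e10            # speed of light in cm/s
    return math.sqrt(k / mu) / (2 * math.pi * c)

amu = 1.66053906660e-27          # atomic mass unit in kg
mu_co = (12.0 * 15.995) / (12.0 + 15.995) * amu   # reduced mass of CO

gas_phase = frequency_cm1(1902.0, mu_co)  # free CO, ~2170 cm^-1
adsorbed = frequency_cm1(1830.0, mu_co)   # assumed softened k on binding
print(f"shift on adsorption: {gas_phase - adsorbed:.1f} cm^-1")
```

A shift of a few tens of wavenumbers is exactly the kind of subtle change in the oscillations that, measured or computed accurately, reveals how the molecule is bound.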
Benoit’s team’s work on biological molecules concentrates on conformations, solvent effects, and accurate computation of nuclear magnetic resonance spectra. It also investigates molecular capture mechanisms in new materials using advanced reaction path sampling and high-end 3-D visualization techniques. “Low latency and efficient dispatching of large amounts of data are paramount to accelerating our research, and having access to technology like Viper can truly revolutionize the way we do our research,” Benoit says.
BeeGFS + Intel OPA + Trinity + Docker = great performance
Virtual machines (VMs) are the normal mechanism for building isolated computing environments in a cluster, but OpenStack lacks physical provisioning tools for cloud environments, and VMs carry a performance penalty.
ClusterVision gets around this performance and tooling mismatch with Trinity, an open source cluster manager. Trinity uses Docker containers, rather than VMs, to achieve workload isolation without the performance impact that VMs impose. Viper is configured with main and test virtual clusters so that software development and other small tests can side-step the main queue if necessary, while ensuring that users can scale up their jobs without reconfiguring the containers.
Using Trinity in combination with the BeeGFS file system and Intel OPA running in Docker containers, ClusterVision achieves performance of 9 GB/sec writes and 8.4 GB/sec reads.
Workload isolation can also deliver security benefits. Several researchers can use the same machine without stepping on one another’s toes. This is important from a stability and a performance perspective. Without containers, an error in one user’s job could bring down the entire system. Or one user’s CPU-intensive job could steal cores from other jobs. A containerized environment prevents this and keeps everyone’s jobs separate.
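The isolation mechanism itself can be sketched with plain Docker (Trinity’s own job-launch commands are not shown in the article, and the image name, mount paths, and limits below are hypothetical): pinning a job to specific cores and capping its memory means it cannot steal resources from jobs running alongside it.

```shell
# Hypothetical illustration -- image name, paths, and limits are invented;
# Trinity's actual invocation is not documented here.
# Pin the job to one 14-core socket, cap its memory, and mount the
# parallel file system so the containerized solver sees its data.
docker run --rm \
    --cpuset-cpus="0-13" \
    --memory=64g \
    -v /beegfs/project:/data \
    my-solver:latest ./run_simulation --input /data/model.cfg
```

Because the container shares the host kernel rather than emulating hardware, the job runs at near-native speed while still being fenced off from its neighbors.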
With Viper, Benoit and team will be able to improve the accuracy of their models and explore larger systems while simultaneously reaching longer simulation timescales. “With access to the latest Intel processors, we’ll have the opportunity to devise and test novel computational techniques that can harness the power of emerging technologies such as accelerator-based HPC,” Benoit says.
Dr. Nina Dethlefs is studying computational language learning—computers that can understand and produce human language and ultimately talk to people to assist in everyday tasks, such as querying the web, making a restaurant reservation, or just having a social conversation.
Her team uses deep learning and neural networks to learn language models of syntax and lexis, which require a great deal of data to work well. “We’ve trained small models with relative success, but language is such a complex system that we have not been able to scale up,” Dethlefs says. “That’s where Viper comes in. Our first priority is to scale up and train larger models that can talk about more than one thing at a time. With more processing power, our models will work better, and we’ll have more real-world impact in deploying conversational computer systems.”
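To make the idea of a statistical language model concrete (this toy is not Dethlefs’ system, which uses deep neural networks, and the corpus is invented), the sketch below builds a bigram model that predicts the next word from the previous one. The limitation it exposes is exactly the scaling problem Viper addresses: with so little data, the model can only “talk about” what it has already seen.

```python
from collections import Counter, defaultdict

# Toy corpus -- real systems train on orders of magnitude more text.
corpus = "book a table for two . book a room for two . book a taxi".split()

# Count how often each word follows each other word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(word):
    """Most likely next word after `word` under the bigram counts."""
    return counts[word].most_common(1)[0][0]

print(predict("book"))   # -> 'a'   ('a' follows 'book' three times)
print(predict("for"))    # -> 'two' ('two' follows 'for' twice)
```

Neural models replace the count table with learned continuous representations, which generalize across words and topics, but only when trained on very large datasets, hence the need for HPC-scale compute.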
One area where Dethlefs particularly hopes to make progress is assistive technology. Her group has been working on conversational systems for dementia patients. These systems provide cognitive stimulation therapy—a non-pharmacological intervention that has been shown to slow the progression of the disease by offering quizzes and memory activities through conversation. For this to work well, however, patients need to use the system regularly and continuously, which they will only want to do if it is fun to use and can talk about many things in a socially engaging manner. Viper will allow researchers to do this by giving them enough processing power to train large models of language that know a little bit about everything that people may want to talk about and have the capability to learn from conversations.
More immersive visualization
Dr. Helen Wright, senior lecturer and SimVis group leader in computer science, is interested in computational steering, virtual reality, and visualization. Computational steering allows scientists to do “what if” investigations by running their simulations online at the same time as they visualize the results. A program’s parameters are usually input via sliders and dials on a separate control panel, but Wright’s team has developed a way to interact directly with the simulation using the image of its output. This is a difficult problem, because the information content of data reduces as it is transformed into an image, ultimately becoming just pixels on a screen. Solving this problem is important for visualizing results in immersive setups such as the Hull Immersive Visualization Environment (HIVE).
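One common way to tackle this inverse problem (the article does not describe Wright’s actual method, and the grid sizes below are invented) is to keep an “item buffer” alongside the rendered image, recording which simulation cell produced each pixel. A mouse click then maps straight back to a data cell whose parameters can be steered.

```python
import numpy as np

ny, nx = 4, 6          # toy simulation grid
scale = 50             # each cell renders as a 50x50 block of pixels

# Item buffer: for every pixel, the flat index of the cell that drew it.
cells = np.arange(ny * nx).reshape(ny, nx)
item_buffer = np.repeat(np.repeat(cells, scale, axis=0), scale, axis=1)

def pick(row, col):
    """Return the (i, j) simulation cell under a clicked pixel."""
    return divmod(int(item_buffer[row, col]), nx)

print(pick(0, 0))      # -> (0, 0)
print(pick(120, 260))  # -> (2, 5)
```

The image still loses information, but the auxiliary buffer preserves just enough of the data-to-pixel mapping to invert it, which is what makes steering through the image possible.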
Computational steering and visualization will be among the immediate beneficiaries of Viper. “We will be using Viper to steer larger and more complex simulations than we can at present, for example feeding into our new disaster-planning application,” Wright says. “In HIVE, we already present models and scenes to users in an immersive and interactive way using head-tracking, which introduces an immediate, real-time constraint to the visualization. Very large geometries introduce lag (the cause of so-called ‘simulator sickness’), so the prospect of also offloading the rendering step to our new supercomputer could herald a step-change in our visual environments capability.”
Evangelize and grow
With Hull’s HPC program in its infancy, there are fewer than 50 researchers and PhD students using Viper today. But that’s going to change fast.
“We’re working hard to expose Viper to more users,” Murphy says. “Just the other day I had a conversation with someone from the drama department who was interested in using HPC in research on the psychology of dance and music. I’m hearing ideas I never would have imagined; people are coming out of the woodwork. We’re eager to expand our HPC community at the same time as we are building up our experience and capability. This is undoubtedly the most exciting time I’ve experienced at the university.”
Jane Glasser is a high-tech writer based in Portland, OR.