Supercomputing: The Reality and Vision
To most of the world, “supercomputing” is what it means to the user
The door to the research group's tea room swung open and Rebecca marched in. She smiled at the team members already there and announced joyfully, “The review committee approved the business case: we can buy our supercomputer!” One person cheered; another grinned widely. Most simply grunted “good,” “about time,” “well done” or similar minimal shows of enthusiasm, not quite believing yet that they had succeeded. This was a natural symptom of a year-long, multi-hurdle battle to convince senior management that a 30x faster computing system would deliver real benefits.
Some months later, Robert stood back proudly from the new computer system. “It’s ready,” he declared. But no one was there to hear him. In a side room off the research corridor, the group’s new supercomputer now stood waiting for the team to use it. Thirty times more powerful than their previous central computing system, it meant that the bigger model runs could now be done in hours rather than several days. And they could now solve much bigger models than ever before.
The researchers could now look forward to weeks of porting their codes to the new supercomputer, learning how long different model sizes take to run, and so on. And those undertaking simulations to support customer projects were much more confident of delivering results on time, and even of having time to perform quality checks through additional runs!
Of course, there would be some steep learning curves to negotiate too: each core of the new supercomputer was clocked only a few percent faster than those in the old system. Most of the performance increase came from the vector math units and the sheer number of cores, nearly 400 in total.
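The arithmetic behind that learning curve is worth making explicit. As a rough sketch, assuming the quoted 30x refers to peak arithmetic throughput and taking “a few percent” to mean a clock ratio of about 1.05 (both assumptions for illustration only), peak performance is the product of core count, clock rate and floating-point operations per cycle per core, the last of which grows with vector width:

```latex
\begin{aligned}
P_{\text{peak}} &= N_{\text{cores}} \, f_{\text{clock}} \, F_{\text{cycle}} \\
\frac{P_{\text{new}}}{P_{\text{old}}} &= \frac{N_{\text{new}}}{N_{\text{old}}} \cdot \frac{f_{\text{new}}}{f_{\text{old}}} \cdot \frac{F_{\text{new}}}{F_{\text{old}}} \\
30 &\approx \frac{N_{\text{new}}}{N_{\text{old}}} \cdot 1.05 \cdot \frac{F_{\text{new}}}{F_{\text{old}}}
\;\Rightarrow\; \frac{N_{\text{new}}}{N_{\text{old}}} \cdot \frac{F_{\text{new}}}{F_{\text{old}}} \approx 28.6
\end{aligned}
```

Under these assumptions, almost all of the gain has to come from more cores and wider vector units, so a serial, non-vectorized code would see only the few-percent clock improvement.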
Yes, 400 cores can be a supercomputer. And it can be life-changing for those who see a 30x performance increase from their first computing facility upgrade in several years.
This story may seem a little “small” to many attendees at ISC’13 and other big supercomputing conferences. But I venture to suggest that it is much closer to home for the majority of high performance computing (HPC) users worldwide than the talk of exascale, new programming paradigms, novel many-core processors and so on that seems to dominate the HPC community conversation.
HPC vs. Supercomputing?
Indeed, when it comes to defining HPC, many like to distinguish between HPC and supercomputing — lots of people might do HPC, but only the few biggest systems in the world are “supercomputers.” (Note that the group of “biggest” is always defined in such a way as to include one’s own supercomputer.)
This, politely speaking, is twaddle! Supercomputing is what it means to the user. If it is “super” to them, that is, if increased computing capability brings a qualitative and quantitative change in their ability to do modelling, then it is supercomputing.
We are proud and lucky to have a very active international HPC community including users, providers of hardware, software and expertise, and technology researchers. However, I think many of the users of HPC, especially those outside of the Top500 arena, would not naturally identify themselves as being part of the HPC world.
So, does the conversation at conferences like ISC’13 matter to these “normal” HPC users? Well, yes, I think it does.
Having an exascale computer may be so far in the future for those normal HPC users that it ranks alongside the end of the Sun in their planning priorities. But, honestly, the same is probably true of petascale computers (even though those at the top consider petascale to be ordinary now). And yet, the technology those normal users employ for their research group supercomputer is essentially the same as that in the petascale computers. This includes processors, memory hierarchies, software stacks, programming methods and so on.
Relevance of Exascale
So, it is reasonable to conclude that the technologies of the first few dozen exascale computers will be the same as those used to build the “normal” research group supercomputers of that time. Sure, the first one or two exascale machines might be special beasts. But I’m willing to bet that the next 20 exascale machines will use essentially the same technology and techniques as the rest of the HPC facilities at smaller scales.
Thus, our current obsession with many-core processors, such as GPUs and Xeon Phi, and the inevitable programming challenges they bring are very relevant to the near future of those normal HPC users. Some of those normal HPC systems may even be using a few GPUs or Xeon Phi cards already to accelerate specific workloads. But, even if their current systems do not, successfully exploiting high levels of parallelism on a single node will almost certainly be a feature of their next HPC system, and is thus within their planning horizon for application development, skill sets and so on.
Likewise, the rest of the exascale discussion — data movement, programming standards, efficiency of resource usage, etcetera — will soon be relevant too. We are probably a little ahead of the normal HPC users with many of these. But, once the first exascale machines come into sight, these topics will be pertinent to the research group systems too — because the core technologies used to build the supercomputers of any given era will be the same whether that machine is research-group scale or the world’s biggest.
There are some possible exceptions to this. For example, one of the toughest likely exascale challenges will be affordable resiliency (or fault tolerance). This might affect only the largest systems, since many of the system faults, data corruptions and similar problems arise from sheer scale.
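To see why scale matters so much, a standard first-order model (an assumption here, not a claim from the text) treats node failures as independent, with exponentially distributed times between failures. The system then fails whenever any one of its N nodes fails, so its mean time between failures shrinks in proportion to N. The node reliability and node counts below are hypothetical, chosen only for illustration:

```latex
\begin{aligned}
M_{\text{sys}} &= \frac{M_{\text{node}}}{N}, \qquad M_{\text{node}} = 5\ \text{years} \approx 43{,}800\ \text{h} \\
N = 25 &\;\Rightarrow\; M_{\text{sys}} \approx 1{,}752\ \text{h} \approx 73\ \text{days} \\
N = 100{,}000 &\;\Rightarrow\; M_{\text{sys}} \approx 0.44\ \text{h} \approx 26\ \text{minutes}
\end{aligned}
```

On this simple model, a research group machine built from the same components could run for months between hardware faults, while the largest machines would see one roughly every half hour.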
HPC Expertise in the Normal User World
However, it is worth highlighting another characteristic of the normal HPC user world. Remember proud Robert, who got the fictitious group’s new supercomputer ready? No one else was there to observe or help. That is often the reality for the majority of normal supercomputer facilities: only one person looks after the computer systems. It may be a researcher who does the system management as a side job, or a dedicated computer support person who looks after everything from the group’s laptops to the networks to the supercomputer configuration. The users normally do the programming themselves, or use off-the-shelf packages (whether open source or commercial) from outside sources.
The key thing is that there is usually no dedicated HPC expert within the group. HPC expertise is the rare skill set that combines a good-enough understanding of the many parts of the ecosystem (hardware architecture, system configuration, software tools, algorithms, parallel software engineering and science needs) to ensure that the supercomputer facility delivers the maximum science and engineering output. The reality for most HPC users is that they have to find these skills themselves, either by self-education or by finding a colleague who can help. For those who cannot find the skills on their own, HPC expertise providers, such as NAG, can help.
Note that none of my discussion makes any assumptions about whether these normal HPC users are in academia, government labs or industrial research and development. I don’t think it matters — most of the issues are the same. This includes the budget limitations, the hard work to get management agreement to invest in capability (people, software and hardware), the scale being so far removed from the Top20 supercomputers, the relevance of the exascale and many-core discussions, and the purpose of HPC itself.
In summary, to most of the world, “supercomputing” is what it means to the user, not what it means in terms of peak FLOPS or Top500 position. The same is true of the biggest supercomputers, too. What really matters is ensuring that they fulfill their potential as powerful multi-discipline science instruments or capability-defining engineering design and validation tools.
These science/engineering facilities just happen to be built from computer technology but are no longer really just computers. Indeed, it comes back to one of my favorite catchphrases — high performance computing is much more than just a high performance computer.
Andrew Jones is Vice President of High Performance Computing Consulting at Numerical Algorithms Group. He may be reached at editor@ScientificComputing.com.