High Performance Data Analysis: Big Data Meets HPC
High performance computing (HPC) has already contributed enormously to scientific innovation, industrial and economic competitiveness, national and regional security, and the quality of human life. The crucial role of HPC has been stressed in recent years by the U.S. and Russian presidents, as well as by senior officials in Europe and Asia.
To date, most data-intensive HPC jobs in the government, academic and industrial sectors have involved the modeling and simulation of complex physical and quasi-physical systems. The systems range from product designs for cars, planes, golf clubs and pharmaceuticals, to subatomic particles, global weather and climate patterns, and the cosmos itself. But from the start of the supercomputer era in the 1960s, and even earlier, an important subset of HPC jobs has involved analytics: attempts to uncover useful information and patterns in the data itself. Cryptography, one of the original scientific-technical computing applications, falls predominantly into this category.
The financial services industry was the first commercial market to adopt supercomputers for advanced data analytics. In the 1980s, large investment banks began hiring particle physicists from Los Alamos National Laboratory and the Santa Fe Institute to employ HPC systems for daunting analytics tasks, such as optimizing portfolios of mortgage-backed securities, pricing exotic financial instruments, and managing firm-wide, global risk. This practice has continued: in 2013, Goldman Sachs lured a particle physicist away from the Large Hadron Collider work at CERN.
What Is Driving HPDA Demand?
High performance data analysis (HPDA) is both an evolutionary and a revolutionary story. The data explosion fueling the growth of HPDA stems from a mix of long-standing and newer factors:
- The ability of increasingly powerful HPC systems to run data-intensive modeling and simulation (M&S) problems at larger scale, at higher resolution, and with more elements (e.g., inclusion of the carbon cycle in climate ensemble models).
- The proliferation of larger, more complex scientific instruments and sensor networks, from "smart" power grids to the Large Hadron Collider and Square Kilometer Array.
- The increasing transformation of certain disciplines into data-driven sciences. Biology is a notable example, but this transformation extends even to humanities disciplines such as archeology and linguistics.
- The growth of stochastic modeling (financial services), parametric modeling (manufacturing) and other iterative problem-solving methods, whose cumulative results produce large data volumes.
- The availability of newer advanced analytics methods and tools: MapReduce/Hadoop, graph analytics, semantic analysis, knowledge discovery algorithms, and others.
- The escalating need to perform advanced analytics in near-real time, a need that is causing a new wave of commercial firms to adopt HPC for the first time.
Existing HPC Disciplines Expand Analytics
Some members of the climate research community have begun to augment existing methods with analytics-based knowledge discovery algorithms to promote new insights. Perhaps no field has stronger potential for benefiting from HPC-based analytics than bioscience. Data-intensive applications already in motion in this varied field range from advanced research—notably in genomics, proteomics, epidemiology and systems biology—to commercial initiatives to develop new drugs and medical treatments, agricultural pesticides and other bio-products.
One of the world's most socially and economically important HPDA thrusts will almost surely be the multi-year transition from today's procedures-based medicine to personalized, outcomes-based health care. Identifying highly effective treatments in near-real time by comparing an individual's genetic makeup, health history and symptomology against tens of millions of archived patient records poses enormous HPDA challenges that may take another decade to master. When this capability matures, it will likely serve as a decision-support tool of unprecedented utility for the global health care community.
In yet another bioscience example, U.S.-based Schrödinger is using HPC public cloud resources to identify promising candidates for new drugs to combat cancer and other diseases. IDC believes that at least half a dozen pharmaceutical firms are following in Schrödinger's footsteps.
Newer analytics methods and tools are likely to benefit all existing HPC vertical segments at least to some extent. These segments also include computer-aided engineering, chemical engineering, digital content creation and distribution, electronic design automation, financial services, geosciences and geo-engineering (oil and gas), defense, government labs and academia. But the story doesn't end there.
High-potential horizontal analytics applications are also starting to make an important impact in the world of high performance computing. Fraud detection, cyber security and insider threats are increasingly crucial challenges for established HPC users in government, academia and industry to meet — and they are causing a new wave of commercial organizations to move up to HPC for the first time. Prominent examples range from PayPal to Italy's Istituto Nazionale della Previdenza Sociale and the U.S. Postal Service.
Tackling these problems often requires moving beyond today's needle-in-a-haystack, static searches for items already known to exist in a database. The challenge presented by these problems is to discover hidden patterns and relationships — things you didn't know were there — and then to track patterns dynamically as they form and evolve.
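The difference between a static search and pattern discovery can be illustrated with a minimal sketch. It is not any vendor's or user's actual method; the records, account names and the choice of connected components as the discovery technique are all assumptions made for illustration. The static search answers only the question it is asked, while the graph traversal surfaces groups of accounts linked through shared attributes that nobody queried for.

```python
from collections import defaultdict

# Hypothetical records: (account, shared attribute such as a device or address).
records = [
    ("acct_1", "device_A"),
    ("acct_2", "device_A"),
    ("acct_2", "addr_X"),
    ("acct_3", "addr_X"),
    ("acct_4", "device_B"),
]


def static_search(records, known_account):
    """Needle-in-a-haystack lookup: is an already-known item present?"""
    return any(account == known_account for account, _ in records)


def discover_clusters(records):
    """Group accounts that are linked, directly or indirectly, by shared attributes.

    Builds an undirected account/attribute graph and returns its connected
    components, keeping only the account names in each component.
    """
    graph = defaultdict(set)
    for account, attribute in records:
        graph[account].add(attribute)
        graph[attribute].add(account)

    accounts = {account for account, _ in records}
    seen = set()
    clusters = []
    for start in sorted(accounts):
        if start in seen:
            continue
        # Depth-first traversal of everything reachable from this account.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(n for n in component if n in accounts))
    return clusters


found = static_search(records, "acct_3")
clusters = discover_clusters(records)
```

In this toy data, acct_1 and acct_3 never share an attribute directly, yet they land in the same cluster through acct_2. Tracking such patterns dynamically would mean updating the graph as new records arrive, which this sketch does not attempt.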
The HPDA vendor scene is becoming increasingly heterogeneous and vibrant. The analytics side of the formative HPDA market is where traditional HPC users and first-time commercial adopters are converging most rapidly. Established vendors that have served each of these customer groups are exploiting this convergence by following their buyers into the new HPDA analytics territory.
IDC forecasts that revenue for HPDA-focused servers will grow robustly (13.3% CAGR), from $743.8 million in 2012 to nearly $1.4 billion in 2017. HPDA storage revenue will approach $800 million in 2017. The most serious technical challenge to unlocking HPDA growth is data movement and management, although the HPDA market should be seen more fundamentally as a war among clever algorithms.
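The server forecast can be checked with the standard compound annual growth rate (CAGR) formula. The sketch below uses only the figures quoted above; the function and variable names are illustrative, and the five-year span assumes growth is compounded annually from 2012 to 2017.

```python
def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


servers_2012 = 743.8   # $ millions, from the forecast
forecast_rate = 0.133  # 13.3% CAGR, from the forecast
years = 2017 - 2012    # assumption: five annual compounding periods

servers_2017 = project(servers_2012, forecast_rate, years)
implied_rate = cagr(servers_2012, servers_2017, years)

print(f"Projected 2017 server revenue: ${servers_2017:,.1f} million")
print(f"Implied CAGR: {implied_rate:.1%}")
```

Compounding $743.8 million at 13.3% for five years gives about $1.39 billion, consistent with the forecast.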
The growing market for high performance data analysis (HPDA), the use of HPC for data-intensive challenges, is already enlarging HPC's contributions to science, commerce and society. HPDA promises to play a major role in addressing the foremost opportunities and challenges of the 21st century.
Steve Conway is Research VP, HPC at IDC. He may be reached at editor@ScientificComputing.com.