Transforming Power Grid Operations

High performance computing is an enabling force in the integrated real-time platform

The phrase “high performance computing” may evoke visions of biological and molecular systems, social systems, or environmental systems, as well as images of supercomputers analyzing human genes, massive data analysis revealing carbon evolution on the planet, and “Deep Blue” defeating world chess champion Garry Kasparov. Such achievements are exhilarating, but little thought is given to the electricity that powers these brilliant computational feats and the power system that generates and delivers it.

Since the first commercial power plant was built in 1882 in New York City, the power grid in North America has evolved into a giant system comprising hundreds of thousands of components across many thousands of miles. This power grid has been called the most complex machine man has ever built. It is also one of the most expensive, with assets estimated at $800 billion or more. The reliable and relatively inexpensive electricity supplied by the power grid is the foundation of all other engineering advances and of our nation’s prosperity.

Figure 1: Functional structure of real-time power system operations. The state estimator drives functions such as contingency analysis, optimal power flow (OPF), and economic dispatch.

Sadly, the computational technologies used to plan, monitor and control the power grid, and to keep it stable and efficient, lag far behind its growth. Grid operations rely on centuries-old algorithms that typically run on traditional computational hardware such as personal computers. Given the increased complexity and size of the power grid, these tools are unable to predict imminent problems in the grid.

The consequences are the inability to predict, avoid and/or mitigate events such as the 2003 blackout on the East Coast and the extensive blackouts in the West in 1996. These events reveal the vulnerability of the grid to disruption — whether caused by technical difficulty, natural disaster or malicious intent. High performance computing offers great advancements to power grid operations and may prevent or mitigate devastating blackouts.

Researchers at the U.S. Department of Energy’s Pacific Northwest National Laboratory (PNNL) in Richland, WA, believe that traditional power grid computation can be reformulated and converted to high performance computing platforms (i.e., PC clusters, reconfigurable hardware, scalable multi-core shared memory computers and multithreaded architectures). The improved performance is expected to have a huge impact on how power grids are operated and managed, and to ultimately lead to better reliability and asset utilization in the power industry.

The new computational capabilities will be developed, tested and demonstrated on the comprehensive grid operations platform in the Electricity Infrastructure Operations Center (EIOC), a new PNNL facility for developing and testing technologies that enhance energy infrastructure and operations. PNNL has a wealth of computing resources representing both leading and emerging classes of high-end computer architectures.

Figure 2: Grid computational paradigm. Real-time grid operations are based on a static model formulation, although power grids are inherently dynamic. Dynamic simulation has low computational efficiency; for example, it takes about 10 minutes to simulate 30 seconds of the western U.S. power grid.

The Advanced Computing Technology Laboratory at PNNL hosts a range of architectures such as the 128-processor SGI Altix with scalable non-uniform memory access shared memory. Shared memory is useful for efficient implementation of sparse matrix and irregular computations like those arising in power systems. Another smaller configuration of the SGI Altix with directly connected field-programmable gate arrays (FPGAs) is available to explore hybrid and reconfigurable computing options.

The Cray MTA parallel multithreaded shared-memory architecture, which, unlike the SGI Altix, provides shared memory hardware in the uniform memory access configuration, is also available. Future Cray architectures, like the Cray XMT (also known as “Eldorado”) scalable multithreaded system with advanced compiler technologies, are expected to offer excellent support for the latency hiding and fine-grain parallelism lacking in traditional architectures.

Software for power grid operations

Power grid software performs many important functions. “State estimation” is central for driving other key functions (Figure 1). State estimation typically receives telemetered data from the supervisory control and data acquisition (SCADA) system every four seconds and extrapolates a full set of grid conditions for operators based on the grid’s current configuration and a theoretically based engineering power flow solution.

Figure 3: The iterative conjugate gradient algorithm outperforms the state-of-the-art direct algorithm, SuperLU, on the 128-processor SGI Altix. The conjugate gradient algorithm performs well in both scalability and absolute performance. Execution time increases when SuperLU runs on more than two processors. The shortest time to solution using conjugate gradient (16 processors) is 4.75 times faster than the best time using SuperLU (2 processors) [1].

With today’s computers and algorithms, the state estimation process can be updated only about every two minutes, much slower than the SCADA measurement cycle. Contingency analysis assesses the effect of various combinations of power system component failures based on state estimates and can be updated only about every five minutes.

This is not fast enough to predict system status because a power grid could become unstable and collapse within seconds. In addition, both of these analyses are conducted for areas within one utility company’s boundaries, and examine an incomplete set of contingencies, prioritized from experience as the most significant. Other processes in Figure 1 take even longer (about 10 minutes) and are less frequently used.

Power grid models are built on complex mathematical theories and algorithms that have been optimized to the maximum performance of single-processor computers (Figure 2). Unfortunately, manufacturers are unable to increase single processor speed to meet the computational demands of power grid operations.

Due to thermal limitations in conventional semiconductor technology, clock frequencies cannot be increased significantly on single processors. However, parallel architectures are available to improve performance and transform the power grid operations paradigm. High performance computing can accomplish this transformation in three steps: fast and robust state estimation, faster-than-real-time dynamic simulation, and real-time dynamic contingency analysis.

Fast and robust state estimation

Figure 4: Fully parallelized state estimation package. A 10x speedup of the state estimation process is achieved when running on 16 Cray MTA-2 processors versus a single MTA-2 processor [2].

Applying high performance computing technologies does not mean running the same software on advanced computers. In adapting software to high performance computing architectures, data and computations need to be partitioned so all available processors are deployed to perform the required computations. This challenging task requires redeveloping certain algorithms to take advantage of parallel computing architectures.
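To make the idea of partitioning concrete, the following is a minimal sketch that splits a sparse matrix-vector product into contiguous row blocks and hands each block to a worker. The function names (`partition`, `block_matvec`, `parallel_matvec`) and the row-list matrix layout are assumptions made for illustration; they are not taken from the PNNL software. Python threads are used only to show the structure, and in CPython they would not deliver a real speedup for this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous blocks of near-equal size."""
    base, extra = divmod(n_items, n_parts)
    blocks, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def block_matvec(rows, x, block):
    """Compute the entries of A*x that belong to one block of rows.

    Each row is a list of (column, value) pairs, a simple sparse layout.
    """
    return [sum(a * x[j] for j, a in rows[i]) for i in block]


def parallel_matvec(rows, x, n_workers):
    """Compute A*x by giving each worker its own block of rows."""
    blocks = partition(len(rows), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda blk: block_matvec(rows, x, blk), blocks))
    # Blocks are contiguous and ordered, so concatenation restores row order.
    return [y for part in parts for y in part]


# Hypothetical 4x4 sparse matrix and vector, for illustration only.
A = [
    [(0, 4.0), (1, -1.0)],
    [(0, -1.0), (1, 4.0), (2, -1.0)],
    [(1, -1.0), (2, 4.0), (3, -1.0)],
    [(2, -1.0), (3, 4.0)],
]
x = [1.0, 2.0, 3.0, 4.0]
print(parallel_matvec(A, x, n_workers=2))
```

Contiguous blocks are the simplest choice. When rows differ widely in their number of nonzeros, as they can in irregular power system matrices, a partition balanced by nonzero count may distribute the work more evenly.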

State estimation is no exception. Fast and robust state estimation lays the foundation for shortening the update interval of system operations from minutes to seconds. Mathematically, state estimation means finding the set of states that optimally fits the measured quantities. The weighted least squares method, which involves solving a large and sparse system of linear equations at every iteration of the state estimation process, is the most widely used. The two major categories of algorithms for solving such linear equations are direct and iterative. Direct algorithms perform well on sequential computers, but they do not scale well, and they exhibit lower performance on parallel computers due to the sequential nature of the operations.
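As an illustration of the weighted least squares idea, here is a minimal sketch for a linear measurement model z = Hx + e, where each measurement is weighted by 1/sigma^2. It forms the normal equations (H^T W H) x = H^T W z and solves them with a small dense direct solver. The linear model, the tiny 3-measurement example, and all names are assumptions for illustration; a real state estimator works with a nonlinear model, solves a system like this at every iteration, and uses sparse methods.

```python
def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def wls_estimate(H, z, sigma):
    """Minimize sum(((z_k - (H x)_k) / sigma_k)^2) for a linear model."""
    m, n = len(H), len(H[0])
    w = [1.0 / (s * s) for s in sigma]  # measurement weights
    # Gain matrix G = H^T W H and right-hand side g = H^T W z.
    G = [[sum(H[k][i] * w[k] * H[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    g = [sum(H[k][i] * w[k] * z[k] for k in range(m)) for i in range(n)]
    return solve_dense(G, g)


# Hypothetical example: two states, three redundant measurements
# (x1, x2, and the difference x1 - x2), with assumed standard deviations.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
z = [1.0, 0.5, 0.5]
sigma = [0.01, 0.01, 0.02]
x_hat = wls_estimate(H, z, sigma)
print(x_hat)
```

The measurements in this example are mutually consistent, so the estimate reproduces the underlying states. With noisy measurements, the weights determine how strongly each measurement pulls the estimate.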

When it comes to parallel computing for power grid simulation, some iterative algorithms are more suitable for parallelization and exhibit better performance than direct algorithms. Conjugate gradient is one of those. At PNNL, a parallel version of the conjugate gradient algorithm was developed in a shared-memory form. Tested on the 128-processor SGI Altix, the conjugate gradient algorithm outperforms the state-of-the-art direct algorithm, the University of California, Berkeley’s SuperLU package (Figure 3). SuperLU was designed for solving linear systems of equations in other computational areas such as computational fluid dynamics. This illustrates that a specific algorithm may perform well for some computational problems, but a good fit of the algorithm, the characteristics of the problem domain data, and high performance computing hardware is needed to achieve the desired results.
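For readers unfamiliar with the method, the following is a minimal serial sketch of the textbook conjugate gradient iteration for a symmetric positive definite system. It is not the PNNL shared-memory implementation; the row-list sparse layout and the helper names are assumptions for illustration. The matrix-vector product and the dot products inside the loop are the operations that a parallel version would distribute across processors.

```python
def matvec(rows, x):
    """Sparse A*x, where each row is a list of (column, value) pairs."""
    return [sum(a * x[j] for j, a in row) for row in rows]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = b[:]            # residual b - A x, equal to b because x starts at 0
    p = r[:]            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:     # residual norm small enough: done
            break
        Ap = matvec(rows, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def tridiagonal(n):
    """Hypothetical test matrix with 2 on the diagonal and -1 beside it."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


A = tridiagonal(5)
x_true = [1.0] * 5
b = matvec(A, x_true)
x = conjugate_gradient(A, b)
print(x)
```

Each iteration needs only one matrix-vector product and a few vector operations, with no factorization step, which is one reason iterative methods of this kind tend to map onto parallel hardware more easily than direct solvers.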

Based on the shared-memory conjugate gradient algorithm, a parallel state estimation package was developed that achieves an order-of-magnitude speedup (10x) on the Cray MTA-2 shared-memory parallel computer (Figure 4). The test case is a very large state estimation problem involving matrices on the order of 28,000 by 28,000. This problem was derived from the western North American power grid, whose footprint encompasses a geographical area equivalent to over half the United States. Further tuning and optimization of the parallel state estimation package are showing promising results, with better performance and scalability. It is expected to bring the solution time of state estimation down to match SCADA cycles.
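The figures quoted in the article can be put in perspective with some back-of-the-envelope arithmetic, sketched below. The inputs (10x speedup on 16 processors, a two-minute state estimation update, a four-second SCADA cycle) come from the text; the derived quantities are not reported in the article and are shown only as an illustration. The Karp-Flatt metric used here is a standard way to estimate the serial fraction implied by a measured speedup.

```python
def parallel_efficiency(speedup, n_procs):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_procs


def karp_flatt_serial_fraction(speedup, n_procs):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


def required_speedup(current_seconds, target_seconds):
    """Speedup needed to shrink an update interval to a target interval."""
    return current_seconds / target_seconds


efficiency = parallel_efficiency(10.0, 16)             # 10x on 16 processors
serial_fraction = karp_flatt_serial_fraction(10.0, 16)
gap = required_speedup(2 * 60, 4)                      # two minutes vs. 4 s

print(f"parallel efficiency:     {efficiency:.3f}")
print(f"implied serial fraction: {serial_fraction:.3f}")
print(f"speedup needed to match the SCADA cycle: {gap:.0f}x")
```

The 10x figure is measured against a single MTA-2 processor, while the two-minute update time refers to today's production systems, so the two numbers cannot be combined directly. The sketch only shows the scale of the gap between the current update interval and the SCADA cycle.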

Faster-than-real-time dynamic simulation

The lack of dynamic information in real-time grid operations results in an incomplete picture of the inherently dynamic power grid. The technical hurdle is the low computational efficiency of grid dynamic simulation. For example, it takes 10 minutes to simulate 30 seconds of the western U.S. grid.

Figure 5: Integrated real-time operations platform for state estimation and grid simulation. Applying high performance computing to power grid simulation enables real-time state estimation, faster-than-real-time dynamic simulation, and dynamic contingency analysis.

Grid dynamics are typically represented by a set of differential-algebraic equations. Dynamic simulation involves numerical integration and the solution of linear equations. High performance computing technologies can significantly improve the computational performance of both. Numerical integration is inherently parallel because each equation is integrated separately. Methods for linear equation solution are similar to one step of the parallelized state estimation.
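A heavily simplified sketch can show why the integration step parallelizes. It integrates the classical swing equation for several generators, each modeled against an infinite bus so that the algebraic part reduces to P_e = P_max sin(delta). This single-machine model, the forward Euler method, and the per-unit parameter values are all assumptions chosen for brevity. A production simulator would couple the machines through a network solution, which is the linear-equation part mentioned above, and would typically use a more robust integration scheme.

```python
import math


def swing_step(delta, omega, p_mech, p_max, inertia, damping, dt):
    """Advance one machine by one forward Euler step.

    delta: rotor angle (rad); omega: speed deviation (rad/s).
    """
    p_elec = p_max * math.sin(delta)   # algebraic relation (infinite bus)
    d_omega = (p_mech - p_elec - damping * omega) / inertia
    return delta + dt * omega, omega + dt * d_omega


def simulate(machines, t_end, dt):
    """Integrate every machine from its initial angle with zero initial speed."""
    steps = int(round(t_end / dt))
    states = [(m["delta0"], 0.0) for m in machines]
    for _ in range(steps):
        # In this simplified model each update uses only that machine's own
        # state, so the updates within a step could run on separate processors.
        states = [
            swing_step(d, w, m["p_mech"], m["p_max"],
                       m["inertia"], m["damping"], dt)
            for (d, w), m in zip(states, machines)
        ]
    return states


# Hypothetical per-unit parameters, for illustration only.
equilibrium = math.asin(0.5)
machines = [
    {"delta0": equilibrium, "p_mech": 0.5, "p_max": 1.0,
     "inertia": 0.1, "damping": 0.05},
    {"delta0": equilibrium + 0.1, "p_mech": 0.5, "p_max": 1.0,
     "inertia": 0.1, "damping": 0.05},
]
final_states = simulate(machines, t_end=60.0, dt=0.01)
for delta, omega in final_states:
    print(f"delta = {delta:.6f} rad, omega = {omega:.6f} rad/s")
```

For scale, the article's figure of 10 minutes of computation for 30 seconds of simulated time corresponds to running 20 times slower than real time, which is the gap a faster-than-real-time simulator has to close.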

Combined with fast state estimation, dynamic simulation can be calibrated continuously with up-to-the-second system information. This concept of combined state estimation and dynamic simulation enables a dynamic view of the current and predicted future status of the grid. The integrated simulation algorithm for the combined simulation is being developed and implemented on parallel computing hardware at PNNL. Dynamic simulation is expected to run much faster than real time, so a skilled operator can anticipate developing problems in the grid. With fast state estimation and dynamic simulation, both current and future states of a power grid can be available to operators.

Real-time dynamic contingency analysis

The third step addresses grid reliability: the ability of the grid to withstand the potential loss of one or several components. Assessing this ability requires real-time dynamic contingency analysis.

The Grid Command Center: Putting power ideas to the test

New computational solutions to managing the nation’s power grid in real time are on the horizon, but integrating them into the daily activities of grid operators poses challenges. Will the new technologies deliver the promised results? Will utilities invest in them? Will operators learn to use them and trust them during a crisis?

The U.S. Department of Energy (DOE) Pacific Northwest National Laboratory’s Electricity Infrastructure Operations Center (EIOC) is a platform for research, development and deployment of technologies to improve the reliability and effectiveness of the nation’s electric grid. The EIOC is a fully functional control room that looks and feels just like the 130 control centers across North America where grid operators work every day. The EIOC has the capability and services to operate as an actual backup control center.

With $3 million in energy management software provided by AREVA T&D, secure networks, 30 workstations, more than 100 servers, 25 special-purpose computers, and a 115-square-foot video wall, EIOC guests have referred to it as “the Star Trek room.” Access to live data from North America’s power grids makes the EIOC an ideal location for testing new technologies without the cost and risk associated with conducting research on a system that is actually controlling infrastructure.

The EIOC provides a test bed for researchers to develop and refine technologies. They can focus on solutions that will enable increased reliability, lower costs, expanded capacity and enhanced security for the U.S. electric grid. The EIOC includes a training room where operators try out new tools. Training modules help operators see beyond the obvious to spot potential problems.

Researchers are applying high performance computing to speed up the state estimation process and related applications. Today, it takes minutes for a static snapshot of the grid to reach the operators for analysis; a power grid could become unstable and collapse within seconds.

Because PNNL is a DOE national laboratory, the EIOC leverages the high performance computing capabilities of DOE’s Office of Science to advance computing architecture and algorithms for the electric grid. Researchers believe these processing capabilities will soon be commonplace for utilities.

The EIOC is available to government agencies, utilities, manufacturing companies and researchers interested in testing solutions, understanding potential benefits, solving problems or vetting new tools and integrating them with actual data in the environment where the technology eventually will be used.

State estimation is the basis for contingency analysis. Real-time state estimation at the SCADA cycle also enables real-time contingency analysis. Faster-than-real-time dynamic simulation further enhances contingency analysis because it can be performed with full dynamics compared with the traditional static power flow-type contingency analysis.

Contingency cases are relatively independent of one another, so contingency analysis is a parallel process. Mathematically, there is a relatively straightforward parallelization path, but several technical issues remain: contingency case selection, computational load management and computational results management. Case selection involves a simplified version of the power flow solution, for which parallel algorithms have yet to be developed. Computational load management means balancing the workload among processors, since not all cases require the same amount of time. Computational results management is a data management problem. Contingency analysis, especially dynamic contingency analysis, generates a lot of data. Extracting information from large data sets must be addressed to maintain high computational efficiency and present complex information to the operator in a digestible manner. Human factors may be involved in this aspect.
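The load management issue can be sketched with a simple static scheduling heuristic: sort the cases by estimated cost and repeatedly give the next case to the least-loaded processor (the "longest processing time first" rule). The case names and cost estimates below are made up for illustration, and this is only one possible strategy. In practice case run times are not known in advance, so a dynamic work queue that hands out cases as processors become idle is a common alternative.

```python
import heapq


def assign_cases(case_costs, n_workers):
    """Greedy longest-first assignment of cases to the least-loaded worker.

    case_costs maps a case name to its estimated run time.
    Returns (assignment, loads), both keyed by worker index.
    """
    heap = [(0.0, w) for w in range(n_workers)]   # (current load, worker)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for case, cost in sorted(case_costs.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)             # least-loaded worker
        assignment[w].append(case)
        heapq.heappush(heap, (load + cost, w))
    loads = {w: sum(case_costs[c] for c in cases)
             for w, cases in assignment.items()}
    return assignment, loads


# Hypothetical contingency cases with assumed run-time estimates (seconds).
case_costs = {
    "line_A": 8.0, "line_B": 7.0, "gen_C": 6.0, "xfmr_D": 5.0,
    "line_E": 4.0, "gen_F": 3.0, "line_G": 2.0, "xfmr_H": 1.0,
}
assignment, loads = assign_cases(case_costs, n_workers=3)
for w in sorted(assignment):
    print(f"worker {w}: {assignment[w]} (load {loads[w]:.1f} s)")
```

The wall-clock time of a batch is set by the most heavily loaded processor, so the quantity to minimize is the maximum load rather than the average.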

Another challenge is the depth of the contingencies. Current practice requires only N-1 contingency analysis, or one contingency depth, which means the grid should sustain the loss of any one component at any time. However, real-life cascading blackouts usually involve multiple component failures. Due to the combinatorial and computational burden, contingency analysis is usually configured to look at only the most likely multiple-contingency situations. However, mitigating blackouts requires N-x contingency analysis, which would significantly increase the number of cases that need to be examined in real time.

For example, the western U.S. power grid has about 3,000 generators, 5,500 transformers and 13,000 transmission lines. Failure of any two of these components (N-2 contingencies) would generate on the order of 10^8 cases, making both contingency selection and analysis far more challenging. With N-3 or N-4 contingencies, the problem can easily grow to petascale, which would make high performance computing the only option for an effective solution.
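The growth in the number of cases can be checked with a short calculation. The sketch below assumes that every unordered combination of components counts as one contingency case, which is the simplest reading of the figures above; the component counts are the approximate ones given in the text.

```python
from math import comb

# Approximate component counts for the western U.S. grid, from the text.
components = {
    "generators": 3000,
    "transformers": 5500,
    "transmission_lines": 13000,
}
total = sum(components.values())


def contingency_cases(n_components, depth):
    """Number of distinct sets of `depth` simultaneously failed components."""
    return comb(n_components, depth)


for depth in (1, 2, 3, 4):
    cases = contingency_cases(total, depth)
    print(f"N-{depth}: {cases:.3e} cases")
```

With 21,500 components, the N-2 count is 231,114,250, consistent with the order of 10^8 quoted above. Each additional level of depth multiplies the count by several thousand.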


The three high performance computing-based steps (fast and robust state estimation, faster-than-real-time dynamic simulation, and real-time dynamic contingency analysis) form an integrated real-time platform for state estimation and grid simulation, and provide a complete picture of the current, future and potential states of a power grid (Figure 5). This integrated real-time platform, with its computational power, will transform current grid operations into a more effective and reliable control function. For the first time, grid operators will be able to see ahead and prepare for what’s coming.

This transformation will not happen overnight. High performance computing is the enabling force, but to bring these computational solutions into control rooms requires proof of their feasibility, practicality and cost-effectiveness. Testing and demonstration of the integrated real-time operation platform in the EIOC is the next step.


1. J. Nieplocha, A. Marquez, V. Tipparaju, D. Chavarria-Miranda, R. Guttromson, H. Huang, Towards Efficient Power System State Estimators on Shared Memory Computers, Proceedings of IEEE Power Engineering Society Annual Meeting, Montreal, Canada, 18-22 June 2006.
2. J. Nieplocha, A. Marquez, J. Feo, D. Chavarria-Miranda, G. Chin, C. Scherrer, N. Beagley, Evaluating the Potential of Multithreaded Platforms for Irregular Scientific Applications, Proceedings of ACM Computing Frontiers, May 2007.

Zhenyu Huang is senior research engineer in energy science and technology; Ross Guttromson manages the Electricity Infrastructure Operations Center; Jarek Nieplocha is a Laboratory Fellow in computational and information sciences; and Rob Pratt manages the GridWise program and the Electricity Infrastructure Operations Initiative, all at Pacific Northwest National Laboratory. They may be contacted at