Numerical Precision: How Much is Enough?

Tue, 06/30/2009 - 7:30am
Rob Farber

As we approach ever-larger and more complex problems, scientists will need to consider this question

The advent of petascale computers and teraflop-per-board graphics processors has raised the question, "how do we know that anything we compute is correct?" Errors from approximation, rounding and truncation can accumulate quickly when a machine performs a trillion to thousands of trillions of floating-point operations per second.

This problem confronts every scientist and applications developer because of the speed of current computational hardware. In asking for even more capable computational systems, we might be caught by the adage, "beware of what you ask for because you might get it."

While not many people will gain access to a petascale supercomputer in the next few years, GPU computing is becoming ever more ubiquitous. NVIDIA states they now have an installed base of over 100 million CUDA-capable graphics processors. With teraflop-per-board capability, graphics processors have dramatically increased the computational capabilities available for scientific calculations — and at a commodity price-point. A challenge with GPU computing is that peak performance for the current generation of hardware can only be achieved when using single-precision, 32-bit, floating-point arithmetic. The use of double-precision, 64-bit arithmetic will result in a significant decrease in performance. As a result, anyone planning to use the current generation of graphics processors for scientific computation must consider the question "how important is single-precision compared to double-precision (64-bit) arithmetic for my application?" Happily, some GPUs have limited 64-bit floating-point capability, which opens the possibility of performing multi-precision calculations where the 64-bit operations are only used sparingly when the additional precision might make a difference — say to calculate a sum of a vector or similar such operations.

Numerical accuracy is one of those opaque areas of scientific computing that people try to solve with the hammer of 64-bit arithmetic. The habit comes from assuming that more bits of precision are always better, while acknowledging that, unfortunately, no number of bits is ever really quite enough. The very real fear behind this thinking is that too low a precision can introduce non-physical artifacts into physical simulations, cause important criticality phenomena to be missed, or result in the application exhibiting other undesirable or pathological behavior.

Compatibility with legacy software is also a significant problem. For these applications, producing the exact same result as other computers is a paramount concern when evaluating newer hardware and compilers.

Alistair Rendell provided some nice examples in his SciDAC talk last summer, "Build Fast, Reliable, and Adaptive Software for Computational Science." In his talk, he discussed Rump’s example, a well-known demonstration that increasing floating-point precision does not necessarily yield greater accuracy. As can be seen below, the correct answer is never found, even as the floating-point arithmetic used to calculate the solution increases from 32 to 64 and finally 128 bits of precision:

$$f = (333.75 - a^2)\,b^6 + a^2\,(11a^2b^2 - 121b^4 - 2) + 5.5\,b^8 + \frac{a}{2b}$$

where a = 77617 and b = 33096
32-bit: f = 1.172604
64-bit: f = 1.1726039400531786
128-bit: f = 1.1726039400531786318588349045201838
Correct: f = -0.827396059946821368141165095479816 …
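
The effect can be reproduced with a short Python sketch that evaluates the same expression once in 64-bit floating point and once in exact rational arithmetic. The helper name `rump` is an illustrative assumption, not something from Rendell's talk. The printed floating-point value depends on the hardware and on the order of evaluation, so it need not match the figures listed above.

```python
from fractions import Fraction

def rump(a, b):
    """Evaluate Rump's expression in whatever number type a and b share."""
    T = type(a)                      # float or Fraction
    c1, c2 = T(333.75), T(5.5)       # both constants are exactly representable
    return ((c1 - a**2) * b**6
            + a**2 * (11 * a**2 * b**2 - 121 * b**4 - 2)
            + c2 * b**8
            + a / (2 * b))

# 64-bit IEEE arithmetic: the huge terms cancel and the small ones are lost.
f64 = rump(77617.0, 33096.0)

# Exact rational arithmetic: no rounding at any step.
exact = rump(Fraction(77617), Fraction(33096))

print("64-bit float:", f64)
print("exact       :", exact, "=", float(exact))
```

The exact result is the rational number -54767/66192, which agrees with the correct value quoted above. Exact arithmetic is far too slow for production runs, but it is a handy reference when checking a small, suspicious kernel.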

The proposed solution is to use intervals to indicate when there is a problem. Unfortunately, most hardware and software systems do not support an interval arithmetic data type.
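
To show how an interval flags a problem, here is a minimal sketch of interval addition and subtraction in Python. The `Interval` class is hypothetical and deliberately simplified: instead of true directed rounding it steps each bound outward by one floating-point number using `math.nextafter`, which assumes Python 3.9 or later.

```python
import math

class Interval:
    """Closed interval [lo, hi], widened outward after each operation."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _widen(lo, hi):
        # A round-to-nearest result lies within one float of the true value,
        # so moving each bound one float outward keeps the true value inside.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._widen(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval._widen(self.lo - other.hi, self.hi - other.lo)

    def width(self):
        return self.hi - self.lo

# The true value of (1e16 + 1) - 1e16 is 1.
plain = (1e16 + 1.0) - 1e16
boxed = (Interval(1e16) + Interval(1.0)) - Interval(1e16)

print("plain float:", plain)
print("interval   : [%r, %r], width %r" % (boxed.lo, boxed.hi, boxed.width()))
```

The plain calculation silently drops the 1.0, while the interval result still contains the true value and is wider than the value itself, which is the warning sign that the answer cannot be trusted.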

A very simple common-sense approach is to run the software system at both 32- and 64-bit precision and compare the results. For many applications, 32-bits of precision are sufficient to get numerically acceptable results. This, in turn, can justify the use of high-performance graphics processors or higher-performance 32-bit data types on conventional processors to speed your scientific computations.
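
A sketch of this comparison, assuming a toy workload (a harmonic sum) in place of a real application: the helper `f32` emulates 32-bit arithmetic by rounding every intermediate result to single precision with the standard `struct` module. The function names and the 1e-3 tolerance are illustrative assumptions; a real code would choose a tolerance from its own accuracy requirements.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def f64(x):
    """Identity: Python floats are already 64-bit."""
    return x

def harmonic(n, rnd):
    """Sum 1/k for k = 1..n, rounding every intermediate result with rnd."""
    total = 0.0
    for k in range(1, n + 1):
        total = rnd(total + rnd(1.0 / k))
    return total

def relative_gap(n):
    """Relative difference between the 32-bit and 64-bit runs."""
    single = harmonic(n, f32)
    double = harmonic(n, f64)
    return abs(single - double) / abs(double)

gap = relative_gap(10_000)
print("relative difference between 32- and 64-bit runs:", gap)
print("acceptable at 1e-3 tolerance:", gap < 1e-3)
```

Agreement between the two runs is evidence, not proof: as Rump's example shows, both precisions can agree with each other and still be wrong.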

Taking a multi-precision approach can enhance the accuracy of a calculation and justify the use of mainly single-precision arithmetic (for performance) along with the occasional use of double-precision (64-bit) arithmetic for precision-sensitive operations. The paper by Koji Yasuda, "Accelerating Density Functional Calculations with Graphics Processing Unit," uses the multi-precision approach to justify the use of graphics processors to perform ab initio density functional calculations. In his paper, Yasuda notes that since roundoff error should be unbiased and nearly random, the sum of N single-precision numbers with an absolute value, s, contains the relative error of $2^{-23} s N^{-1/2}$ and that the error estimate will decrease as the number of terms increases. Using this relative error, along with a straightforward analysis of the acceptable error in calculating the desired result (the total electronic energy of a molecule), Yasuda was able to describe how to split the calculation to gain the performance of the single-precision graphics processors while exploiting the hardware double-precision capability of the host computer to perform his calculations with acceptable error tolerance.
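
The general idea of keeping the data in single precision while accumulating in double precision can be sketched as follows. This is a toy illustration, not Yasuda's actual partitioning scheme; the helper names and the test data (100,000 copies of the single-precision value nearest 0.1) are assumptions chosen to make the effect easy to see.

```python
import struct
from fractions import Fraction

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Accumulate in single precision: round after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

def sum_mixed(values):
    """Single-precision terms, double-precision accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [f32(0.1)] * 100_000
exact = sum(Fraction(v) for v in values)   # exact reference sum

err_single = abs(Fraction(sum_single(values)) - exact)
err_mixed = abs(Fraction(sum_mixed(values)) - exact)

print("error, single-precision accumulator:", float(err_single))
print("error, double-precision accumulator:", float(err_mixed))
```

Only the accumulator changes between the two versions, so the extra cost of 64-bit arithmetic is confined to one addition per term while the bulk of the data stays in 32-bit form.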

Understanding how numerical errors accumulate within an application is a necessary step to utilizing the current generation of graphics processors. In addition, many applications can achieve significant speed-ups (on the order of 2X) on conventional hardware just by understanding where single-precision data types can be safely used without incurring significant numerical errors. As we approach ever-larger and more complex problems with computers that can perform $10^{12}$ to $10^{15}$ floating-point operations per second, and look forward to supercomputers that can perform $10^{18}$ floating-point operations per second, it appears that, at some point, the scientists using these systems must consider the question, "how do we know the result we calculated is correct?"

Happy "accurate" supercomputing!

1. Alistair Rendell talk:
2. Yasuda, Koji, "Accelerating Density Functional Calculations with Graphics Processing Unit", J. Chem. Theory Comput. 2008, 4, 1230-1236.

Rob Farber is a senior research scientist in the Molecular Science Computing Facility at the William R. Wiley Environmental Molecular Sciences Laboratory, a Department of Energy national scientific user facility located at Pacific Northwest National Laboratory in Richland, WA. He may be reached at 

