Scientific Computing

High Performance Development with Python
Mon, 11/17/2008 - 8:04am
Diane Garey and Steve Lang

Achieving high productivity prototype development for mathematical and statistical modeling



Python is an open source dynamic language that is growing in popularity. As a scripting language, it is extremely powerful for rapid prototyping and model development, but it is not the fastest language for deploying high performance computing applications into production environments. This article outlines how to achieve high productivity Python prototyping for mathematical and statistical modeling without sacrificing performance, wide deployment or flexibility. Combining the ease-of-use of Python with the high performance of numerical libraries means that ideas can become applications more quickly, increasing the productivity of developers and the value of high performance computing (HPC) environments while overcoming the limitations of prototype-only tools.

Python rising
Because they are high productivity languages, scripting languages have been popular since they were first introduced in the 1960s, and they remain so today. Most scripting languages are easy to learn and, as they mature, become quite powerful. Python, for example, comes with a comprehensive standard library, yet it can be learned in just a few days. According to the TIOBE Programming Community Index, Python has risen in popularity over the past year and currently ranks as the sixth most popular development language in the world. Python is supported by a large open source community that contributes to the development of the language, to alternative implementations such as Jython (for the Java platform) and IronPython (for .NET), and to various tool extensions, including NumPy and SciPy.

Given its minimalist development philosophy, clear syntax, flexible programming approaches and large standard library, Python provides a high productivity development language for any domain expert, including experts in HPC. It allows for fast prototyping, much like MATLAB and similar tools, making it useful for research projects and ad hoc development. As an open source language, though, Python allows users to avoid the cost of purchasing a commercial tool and eliminates the need to learn a proprietary prototyping language.

Certainly, in traditional HPC fields like science, engineering and financial services, Python use is growing. A significant contributor to this growth is the rich set of tools available that enable prototype and application development for any environment. These tools include:
NumPy: the de facto standard in Python for handling and manipulating large arrays
SciPy: a popular set of open source analytics for scientific work
Matplotlib: a library that provides high quality 2-D charting
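As a brief illustration of how these tools fit together, the sketch below uses NumPy for whole-array arithmetic and two standard SciPy routines, `scipy.integrate.quad` and `scipy.optimize.brentq`. The integrand and the equation being solved are arbitrary choices for the example, and it assumes reasonably recent NumPy and SciPy releases are installed.

```python
import numpy as np
from scipy import integrate, optimize

# NumPy: operate on a whole array at once; the loop runs in compiled code
x = np.linspace(0.0, np.pi, 1001)   # 1,001 evenly spaced points on [0, pi]
y = np.sin(x) ** 2                  # element-wise, no explicit Python loop
print("mean of sin^2 on the grid:", y.mean())

# SciPy: adaptive quadrature of the same function; the exact answer is pi/2
area, abs_err = integrate.quad(lambda t: np.sin(t) ** 2, 0.0, np.pi)
print("integral of sin^2 over [0, pi]:", area, "+/-", abs_err)

# SciPy: bracketed root finding for cos(t) = t on [0, 1]
root = optimize.brentq(lambda t: np.cos(t) - t, 0.0, 1.0)
print("root of cos(t) - t:", root)
```

Matplotlib could then plot `y` against `x`; it is left out here so the sketch runs without a display.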
Since these tools are well-suited to research, algorithm development and data visualization, we feel that Python should definitely be considered for application prototyping. The question we need to answer is: While high productivity prototyping is obviously a good thing, is Python also a viable language for production HPC applications? In many cases, the answer may be “yes.” In some situations, however, developers might want to migrate a Python prototype to a more traditional HPC language like C or Fortran.

Python is supported by a highly involved user community and there are various sources for open source analytics developed for Python. However, it is often difficult or impossible to vet the numerical methods employed and the accuracy of the results, especially across different platforms and OS combinations. Also, open source analytics for Python are typically not designed to be called from languages like C and Fortran when a problem calls for a native language implementation for deployment in a large HPC environment.

We believe that for both performance and potential quality considerations, any limitations — real or perceived — can be easily overcome. There are interesting Python projects currently underway, as well as commercial options that allow Python, combined with traditional HPC languages like Fortran or C, to be both a high productivity and high performance option for application development.

Parallel computing and Python
There are several Python projects currently underway that target different parallel architectures, including shared memory architectures (symmetric multiprocessing, or SMP) and distributed memory architectures using message passing with MPI. One promising development is ongoing work to enable the popular IPython shell to be used for parallel computing. (Project details can be found on the SciPy Web site.)

On the commercial side, vendors of algorithms for HPC applications are recognizing the growth in Python and providing options for customers. The Numerical Algorithms Group (NAG) in the United Kingdom, for example, provides a white paper on its Web site that shows developers how to wrap NAG Fortran algorithms in Python. By following the instructions, a developer can take a NAG Fortran algorithm and write a custom wrapper that makes it available in Python. Visual Numerics recently completed a project to wrap all of its C Library algorithms in Python. That work is now released as PyIMSL and is freely available to all IMSL C Library users. In both cases the concept is similar: developers write applications in Python that leverage high performance algorithms “under the covers.”
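NAG's white paper and PyIMSL each define their own wrapping approach, and neither is reproduced here. As a generic illustration of the underlying idea, the sketch below uses Python's standard `ctypes` module to call two functions from the C math library, declaring each C signature so that Python values are converted correctly on the way in and out. It assumes a Linux or macOS system where the math library can be located as `m`; a vendor numerical library would instead be loaded from its own shared library, with its own signatures and calling conventions.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library (assumption: Linux or macOS). find_library may
# return None on some systems, so fall back to the usual glibc soname.
_libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C prototypes so ctypes converts arguments and results correctly:
#   double cos(double x);
#   double ldexp(double x, int exp);
_libm.cos.argtypes = [ctypes.c_double]
_libm.cos.restype = ctypes.c_double
_libm.ldexp.argtypes = [ctypes.c_double, ctypes.c_int]
_libm.ldexp.restype = ctypes.c_double


def c_cos(x):
    """Python-facing wrapper around the C library's cos()."""
    return _libm.cos(float(x))


def c_ldexp(mantissa, exponent):
    """Python-facing wrapper around ldexp(): returns mantissa * 2**exponent."""
    return _libm.ldexp(float(mantissa), int(exponent))


if __name__ == "__main__":
    print(c_cos(0.5), math.cos(0.5))  # native call vs. Python's math module
    print(c_ldexp(0.75, 4))           # prints 12.0
```

The wrapper functions hide the `ctypes` details, so calling code sees ordinary Python functions; the same layering is what lets application code stay unchanged while the native routine underneath does the work.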

Bridging the gap between the prototype and production stages
A commercial approach that blends Python with C or Fortran algorithms offers developers a significant benefit: the ability to quickly and easily create a prototype and then turn that prototype into a production application.

In a common scenario today, an HPC expert creates a prototype using a scripting language like MATLAB or Python. Typically, in this prototype stage, the domain expert wants to quickly prove a concept or create a model. Eventually, this prototype or model might become part of a production application.

While options for performing HPC work in scripting languages are growing, especially for classes of problems where, given the problem size, the benefits of scripting offset some performance degradation, most large production HPC applications are still written in Fortran or C for performance. Therefore, there can be a significant gap between the prototype and production stages (the prototype to production, or P2P, gap). Bridging it typically requires a redesign and rewrite, which is time-consuming, costly and risky. Moreover, if the algorithm package used for numerical analysis in the prototype stage differs from the one used in production, the application can behave differently in several ways, including its numerical results, its capabilities and its error handling.

One approach that bridges the gap uses PyIMSL, combining Python prototyping with C production development while using the same underlying algorithms. When the prototype is ready for production use, the Python application can be rewritten in C using the IMSL C Library. Because the commercial library algorithms have been wrapped in Python, the same numerical algorithms are used behind the scenes at every stage of development. The IMSL C Library also provides extensive documentation of the underlying algorithms, with cited references to academic papers, allowing for more informed use of the analytical methods. This means that building prototypes ready for production use becomes easier, faster and less risky. Developers get the high productivity benefits of an easy-to-use scripting language without sacrificing high performance.
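PyIMSL and the IMSL C Library are not shown here. As an open source analog of the same idea, the sketch below assumes NumPy: `numpy.linalg.solve` is documented as computing its result with LAPACK's `gesv` routine (LU factorization with partial pivoting), so a C production version that calls the same routine, for example through `LAPACKE_dgesv`, would run the same algorithm, provided both are linked against the same LAPACK implementation. The residual helper is a hypothetical check of the kind that could be kept to compare results across the two stages.

```python
import numpy as np


def solve_prototype(a, b):
    """Prototype-stage solver for a @ x = b.

    numpy.linalg.solve calls LAPACK's gesv (LU with partial pivoting).
    A C production version could call LAPACKE_dgesv on the same data;
    if both link the same LAPACK build, the same algorithm runs in both stages.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.linalg.solve(a, b)


def residual_norm(a, x, b):
    """2-norm of a @ x - b: a stage-independent correctness check (hypothetical helper)."""
    return float(np.linalg.norm(np.asarray(a) @ x - np.asarray(b)))


a = [[4.0, 1.0],
     [2.0, 3.0]]
b = [1.0, 2.0]
x = solve_prototype(a, b)      # exact solution is x = (0.1, 0.6)
print(x)
print(residual_norm(a, x, b))  # close to zero
```

Running the same residual check against the production C code's output would flag any divergence between the prototype and production results.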

Overcoming Python constraints
Combining Python with a wrapped commercial numerical library for HPC application development also helps developers overcome one of the major constraints of using Python for HPC work: the global interpreter lock (GIL). The GIL is at the heart of the C implementation of Python (CPython) and allows only one thread at a time to execute Python bytecode, which limits the ability to use threading to run multiple code blocks simultaneously. Understanding this limitation is key to exploiting the performance of multi-core processors through shared memory parallelization. One workaround is to launch and manage multiple Python processes and communicate between them, which has more overhead than simple threading. As described in the previous section, the option to quickly and easily rewrite the production application in C or Fortran for higher performance on shared memory architectures is often the only viable way to overcome such limitations.
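The multiple-process workaround can be sketched with Python's standard `multiprocessing` module. Each worker process runs its own interpreter with its own GIL, so CPU-bound chunks can execute on different cores at the same time; the cost is that arguments and results must be pickled and passed between processes. The sum-of-squares workload and the chunking scheme are arbitrary choices for illustration, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp


def partial_sum_of_squares(bounds):
    """CPU-bound work for one chunk: the sum of i * i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into chunks and sum them in separate worker processes."""
    if n <= 0:
        return 0
    step = -(-n // workers)  # ceiling division so every index is covered
    chunks = [(s, min(s + step, n)) for s in range(0, n, step)]
    # Assumption: POSIX "fork" start method, so workers inherit this module's
    # functions without re-importing it.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        # map() pickles each chunk, sends it to a worker and gathers the results
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    n = 2_000_000
    total = parallel_sum_of_squares(n)
    # Closed form for comparison: (n - 1) * n * (2n - 1) / 6
    print(total, total == (n - 1) * n * (2 * n - 1) // 6)
```

With threads in place of processes, the same pure-Python chunks would contend for a single GIL in CPython and would not run in parallel, which is why the process-based design accepts the extra communication overhead.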

For distributed memory architectures (clusters and supercomputers), one hybrid approach is to use Open MPI and the Python mpi4py interface so that Python initiates and steers MPI applications whose core analytics are implemented in C for performance. Again, an approach such as PyIMSL can be used for prototype work, after which the core logic is converted to C code that calls the IMSL C Library, ensuring consistent results and high performance. This approach leverages Python for job management and steering, including any graphics or user interface needs, and high performance native code for the analytics themselves.
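A minimal sketch of the mpi4py pattern follows, assuming mpi4py and NumPy are installed. It uses the classic midpoint-rule estimate of π as a stand-in workload: each MPI rank evaluates its share of the integrand with a NumPy kernel (standing in for the native C analytics described above), and `allreduce` combines the partial sums. So that the sketch also runs on a machine without MPI, it falls back to a single serial process when mpi4py cannot be imported; that fallback is a convenience for illustration only.

```python
import math

import numpy as np

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
except ImportError:  # illustration-only fallback: run serially without MPI
    MPI = None
    comm = None


def estimate_pi(n_intervals):
    """Midpoint-rule estimate of pi = integral of 4 / (1 + x^2) over [0, 1]."""
    rank = comm.Get_rank() if comm is not None else 0
    size = comm.Get_size() if comm is not None else 1
    h = 1.0 / n_intervals
    # Each rank takes every size-th interval, starting at its own rank.
    x = (np.arange(rank, n_intervals, size) + 0.5) * h
    # Vectorized kernel: a stand-in for the native C analytics in the text.
    local = float(h * np.sum(4.0 / (1.0 + x * x)))
    if comm is None:
        return local
    # Sum the partial results across ranks; every rank receives the total.
    return comm.allreduce(local, op=MPI.SUM)


if __name__ == "__main__":
    pi_est = estimate_pi(1_000_000)
    # Rank 0 handles reporting (the "steering" role); all ranks share the work.
    if comm is None or comm.Get_rank() == 0:
        print(f"pi ~ {pi_est:.12f} (error {abs(pi_est - math.pi):.1e})")
```

With Open MPI, a launch such as `mpiexec -n 4 python estimate_pi.py` (file name hypothetical) would start four ranks, each computing a quarter of the intervals.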

Conclusion
For high productivity, high performance application development, we believe that Python is an excellent alternative. Combining the ease-of-use of Python with the high performance of numerical libraries means that ideas can become applications more quickly, increasing the productivity of developers and the value of the high performance computing environment.

Diane Garey is a product marketing manager and Steve Lang is an international senior project manager at Visual Numerics. They may be reached at editor@ScientificComputing.com.

Acronyms
GIL Global Interpreter Lock | HPC High Performance Computing | MPI Message Passing Interface | NAG Numerical Algorithms Group | P2P Prototype to Production | SMP Symmetric Multiprocessing (shared memory architectures)

Related Resources
PyIMSL: www.vni.com/products/imsl/pyimsl/overview.php
SciPy: ipython.scipy.org/moin
TIOBE Programming Community Index: www.tiobe.com/index.php/content/paperinfo/tpci/index.html
