Scientific Computing

From Scripting to Scaling
Mon, 04/12/2010 - 7:22am
Craig Lucas

Multi-core is challenging even the most battle-scarred programmer
 

1. HECToR, the UK National Supercomputing Service
www.hector.ac.uk 

2. Should Fortran be taught to Undergraduates?
www.walkingrandomly.com/?p=1397 

3. The Numerical Algorithms Group (NAG)
www.nag.co.uk

4. MPI is a library specification; it is the most common way of enabling parallelism on large parallel machines
www.mpi-forum.org 

5. Microsoft Pathways Web site for Star-P customers 
www.microsoft.com/pathways/star-p

6. MATLAB Parallel Computing Toolbox 
www.mathworks.co.uk/products/parallel-computing

7. Scilab 
www.scilab.org 

8. Octave 
www.gnu.org/software/octave

9. Co-array Fortran
www.co-array.org

10. Unified Parallel C (UPC)  
upc.gwu.edu

11. DARPA Language Project  
hpls.lbl.gov/wiki/index.php/Main_Page

12. Parallel Processing and Multiprocessing in Python 
wiki.Python.org/moin/ParallelProcessing

13. Best Practices for a MATLAB to C Workflow Using Real-Time Workshop 
www.mathworks.co.uk/company/newsletters/digest/2009/nov/MATLAB-to-c.html?s_cid=MLD1109ukTA2&s_v1=5234304_1-6K8T15

14. HPL Benchmark: measures performance of Top 500 Supercomputers
www.netlib.org/benchmark/hpl; www.top500.org

15. NAG Toolbox for MATLAB
www.nag.co.uk/numeric/MB/start.asp 

16. NAG and Python
www.nag.co.uk/Python.asp

17. Python Wrappers to IMSL C Numerical Library Algorithms
www.vni.com/products/imsl/pyimsl/overview.php 

Over the past year or so, whenever I talk to anyone involved in supporting researchers who use HPC facilities and the subject of programming comes up, there seems to be a growing consensus: the next generation of scientific programmers is not using Fortran or C/C++. An increasing number of researchers can fulfil their computational needs using Python, MATLAB, or some other package or high-level language. Recently, someone came to HECToR [1], the UK’s national supercomputing service, with a MATLAB script, wanting to run much larger simulations than their desktop allowed. So, what do we tell researchers like this?

I am sure many would be tempted to raise an eyebrow and say you should have learned a more, shall we say, traditional scientific computing language when you were an undergraduate! That debate rages on and on, as this recent contribution shows [2], so we will avoid it here. However, I did recently teach Fortran to some graduate students who had only used MATLAB thus far and had Fortran 77 thrust upon them by their supervisor. The sheer panic on their faces only reinforced how much easier some people find prototyping in a scripting language like MATLAB than in a lower-level language.

I also am not going to debate why people use Python, MATLAB, et al. Their usability, flexibility, easy graphics, fast prototyping capability, post-processing features, interactivity, ease of data generation, clean syntax and the whole range of modules and toolboxes that are available are well documented. It is easy to see why they are so attractive.

I am a big believer in scientists being allowed to do their science. Spending a large amount of research time learning a new language, and then learning a parallel one too, doesn’t seem a great use of time, especially as those Ph.D. years tick away. So, the question is: is that what they should do?

Users of HECToR are lucky in that the research councils in the UK supply a computational science and engineering (CSE) support mechanism, provided by NAG [3]; it provides not only a core team to help users, but also software development grants and personnel for creating software for the machine. As a consequence, we have seen some great results in improving code performance. So, this is one solution: let the HPC expert convert your script into Fortran, say, with the Message Passing Interface (MPI) [4]. With this approach, the user could be left with a code they don’t understand and would struggle to modify even slightly. But, maybe this is the game we are now in! Using parallel computers isn’t getting any easier; multi-core is challenging even the most battle-scarred programmer.

So, perhaps the better option would be for researchers to stay in their comfort zone, with the environment at least. So, first up, MATLAB. Personally, I was a big fan of MATLAB Star-P, developed by Interactive Supercomputing (ISC). Star-P was middleware that took your MATLAB commands and converted them into library calls on the HPC machine where your distributed data sat. Backslash calls were magically swapped for ScaLAPACK calls, and your large linear system was solved without any knowledge of parallelism at all. But Microsoft has bought ISC and, whilst the technology will survive, the future of Star-P as a product is not clear [5].

The MathWorks have their own products, namely the Parallel Computing Toolbox and, if you want to go off node, the Distributed Computing Server [6]. Together these offer some parallelism of both data and tasks, but they are not yet widely used. Maybe one of the free MATLAB clones offers an interesting alternative.

Scilab [7] and Octave [8] are popular and, together with MATLAB itself, they have a wide range of parallel versions, many of them third party, too many to do justice to here. Perhaps that is the point: there are too many. Some use I/O for communication, some toolboxes offer MPI or a subset of it, and others offer a way to task farm over many instances of MATLAB or a clone. Many are doing these things very well. But, given the sheer number of projects and the variation in their funding and in the support they can offer, I wouldn’t want to hang my hat on anything just yet.

However, if I did, I personally would go with something that involves writing MPI. I have always found that it is the concepts behind parallel programming that students struggle with; it is never as easy as just learning the half-dozen essential routines. But the skills you learn are as future-proof as you are going to get. Despite the existence of languages like Co-array Fortran [9], UPC [10] and those coming out of DARPA’s HPCS program [11], MPI is still the de facto standard for large parallel machines.
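
The concepts in question can be sketched without an MPI installation. The block below is not MPI and makes no claim about any MPI library's interface: it only imitates the message-passing model using Python threads and queues, so that the ideas of ranks, data decomposition and explicit send/receive are visible. The names `parallel_sum`, `send`, `recv` and `worker` are hypothetical, chosen for illustration.

```python
import queue
import threading


def parallel_sum(data, size=4):
    """Sum `data` with `size` ranks that exchange partial results by message."""
    # One inbox per rank; partial results travel only through these queues.
    inbox = [queue.Queue() for _ in range(size)]
    result = []  # used by rank 0 to hand the final answer back to the caller

    def send(dest, message):
        inbox[dest].put(message)

    def recv(rank):
        return inbox[rank].get()  # blocks until a message arrives

    def worker(rank):
        # Concept 1: every rank runs the same code on its own slice of the
        # data. (For brevity the slice is taken from one shared list here.)
        partial = sum(data[rank::size])
        if rank != 0:
            # Concept 2: a result is invisible to other ranks until it is sent.
            send(0, partial)
        else:
            # Concept 3: rank 0 must post exactly one receive per peer;
            # one too many would block forever, one too few loses data.
            total = partial
            for _ in range(size - 1):
                total += recv(0)
            result.append(total)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


total = parallel_sum(list(range(101)))
```

The handful of calls is the easy part. Deciding how to split the data and who must talk to whom, and in what order, is where the difficulty lies.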

So, what about Python? Perhaps this offers a more obvious transition into HPC. There are several MPI implementations for it [12]. Python is free and widely available, it has a huge fan base, and it doesn’t look like it is going away anytime soon. I suspect that, eventually, even the most traditional programmers will be talking about Python in the same breath as Fortran and C, rather than muttering derogatorily under it.

On HECToR, we see people using Python in parallel mainly for post-processing tasks. Those using Python for parallel programming are C or Fortran programmers too; they are using Python, or embedding it, simply because they like using it.
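
As a rough illustration of that kind of parallel post-processing, here is a minimal task-farm sketch using only the Python standard library. The function names `summarise` and `farm` and the sample data are hypothetical stand-ins for real simulation output. A thread pool is used so the sketch runs anywhere, which suits post-processing that is dominated by reading files.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def summarise(samples):
    """Post-process one simulation output: return (min, mean, max)."""
    return (min(samples), mean(samples), max(samples))


def farm(tasks, workers=4):
    """Hand independent tasks to a pool of workers; results keep task order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarise, tasks))


# Hypothetical stand-in for the outputs of three separate simulation runs.
outputs = [[1.0, 2.0, 3.0], [4.0, 6.0], [10.0]]
results = farm(outputs)
```

For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface. In that case the calling code is normally placed under an `if __name__ == "__main__":` guard.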

OK, so we have avoided the issue of performance. Let’s just say that no one is going to expect the performance of a scripting language to match that of a compiled one. The MathWorks recently talked about this themselves, suggesting a translation to C [13]. I know this is HPC heresy: we are supposed to squeeze out every drop of performance, but maybe we don’t have to. We have to factor in the development time and, coming back to the researcher who wants to research, what if it took weeks rather than months to develop their code? Maybe it is OK to sacrifice a little, or even quite a lot, of performance if they can solve that bigger problem. And, let’s not pretend that every HPC code out there runs like the HPL benchmark [14]!

Also, performance can be gained by using libraries that are written in higher-performing languages. Here at NAG, we have been providing numerical libraries for many years. Whilst we have provided Fortran, C and multi-core libraries, we know the importance of other environments, so we make our library available through our Toolbox for MATLAB [15], and people are also using NAG with Python [16]. Our colleagues at IMSL also have an offering for Python [17].
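
The principle can be sketched with the standard library alone, assuming the usual CPython interpreter, where `sum`, `map` and `operator.mul` are implemented in C. The two functions below are hypothetical illustrations and do not use any NAG or IMSL routine. The first runs the inner loop in the interpreter; the second hands the whole loop to compiled code, which is what a numerical library does on a much larger scale.

```python
import math
import operator


def dot_interpreted(x, y):
    """Dot product with the loop executed step by step by the interpreter."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_delegated(x, y):
    """Same dot product, with the iteration carried out inside built-ins."""
    return sum(map(operator.mul, x, y))


x = [0.5 * i for i in range(1000)]
y = [2.0] * 1000

# The two results agree to within floating-point rounding.
agree = math.isclose(dot_interpreted(x, y), dot_delegated(x, y))
```

Timing the two with the standard `timeit` module on your own machine is the honest way to see how much, if anything, the delegation buys.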

So, to conclude, you either:

  • Shoe-horn parallelism into your familiar environment — still hard work, and maybe performance isn’t great
  • Take the longer route — scrap your code and your way of working, and learn MPI and Fortran or C
  • Get someone else to write a parallel code for you and hope they won’t mind maintaining it too!

Well, I work in HPC. So, I have to take the hard option, option 2, at least for now.

Craig Lucas is a Senior Technical Consultant at Numerical Algorithms Group. He may be contacted at editor@ScientificComputing.com.

 
