Will Your Next Supercomputer Come from Costco?
A leading-edge architecture for just $600
A fun topic for April (and not an April Fool's joke) is that you can purchase a commodity 200+ Gflop/s (single-precision) Linux supercomputer, the Sony PlayStation 3 (PS3), for around $600 from your favorite electronics vendor.
Figure 1: Peak single-precision floating-point rate comparison
I don't want to mislead you into thinking this is the right platform for many — perhaps most — scientific codes, but it can perform well for some types of calculations. Los Alamos National Laboratory has made a significant investment in Roadrunner, a hybrid supercomputer composed of 16,000 Opteron and 16,000 Cell B.E. processors for a projected 1.6 petaflops (1600 trillion floating point operations per second), which indicates they are certainly believers in the Cell architecture.
So, what's so hot about the Cell B.E.?
• For starters, it is energy efficient, meaning the Cell B.E. processors provide a nice ratio of performance to power consumed. That's great for petascale supercomputers — especially since power limitations are now a significant computer design factor — as well as for placing lots of computational power at a remote sensing site, or in your living room.
• Under Linux on the PS3, six synergistic processing elements (SPEs) are accessible for computation. (A seventh runs in a special mode and is dedicated to aspects of the OS and security, and an eighth is disabled to improve production yields.) Each SPE can run a different program, and the internal communication fabric allows programmers to arrange the data flow in different ways, using parallel, pipelined or streamed processing models.
• A wonderful DMA (direct memory access) engine moves data on and off the Cell. Unfortunately, DMA is also a drawback, as the programmer must manually choreograph data movement and computation.
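To give a feel for why that choreography matters, here is a minimal timing model in Python, not Cell code. It compares fetching and processing chunks one after another against double buffering, where the DMA transfer for the next chunk overlaps with computation on the current one. The function name `stream_time` and the per-chunk times are hypothetical values chosen only for illustration.

```python
def stream_time(n_chunks, t_dma, t_compute, double_buffered):
    """Model the time for one SPE to stream n_chunks through its local store.

    t_dma and t_compute are hypothetical per-chunk times (seconds) for the
    DMA transfer and for the computation on that chunk.
    """
    if n_chunks == 0:
        return 0.0
    if not double_buffered:
        # One buffer: every chunk is fetched, then processed, strictly in turn.
        return n_chunks * (t_dma + t_compute)
    # Two buffers: while chunk i is processed, chunk i+1 is already in flight.
    # Only the first transfer and the last computation are fully exposed.
    return t_dma + (n_chunks - 1) * max(t_dma, t_compute) + t_compute


# Illustrative numbers only: 1000 chunks, 2 us per transfer, 3 us per compute.
serial = stream_time(1000, t_dma=2e-6, t_compute=3e-6, double_buffered=False)
overlapped = stream_time(1000, t_dma=2e-6, t_compute=3e-6, double_buffered=True)
print(f"single buffer: {serial * 1e3:.3f} ms")
print(f"double buffer: {overlapped * 1e3:.3f} ms")
```

In this model the overlapped version approaches the cost of whichever stage is slower, which is the payoff for the extra programming effort. Real code must also handle buffer sizes, alignment and synchronization, none of which the model captures.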
What's not so hot?
• Plan to spend lots of time programming assembly code to get close to those wonderful peak performance numbers. You can run C code on the SPEs, but your performance will suffer. My personal experience is that tight loops of C code will generate around 4 Gflop/s per SPE.
• It offers limited memory, as each SPE has only 256 kilobytes (yes, kilobytes) of memory for both program and data. Thus, only tight loops can run on each SPE.
• Double-precision (64-bit) performance is poor relative to single-precision (32-bit) performance. The PS3's Cell processor can produce around 204 Gflop/s in single precision but only 15 Gflop/s in double precision.
• Be prepared for lots of programming headaches when trying to optimize computation and data movement.
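The numbers in this list can be sanity-checked with a little arithmetic. The sketch below assumes a 3.2 GHz SPE clock, four single-precision lanes per SIMD instruction and a fused multiply-add counted as two operations. Those three constants are my assumptions about the hardware, not figures from this column; the 4 Gflop/s C rate and the 256-kilobyte local store come from the text above.

```python
# Assumed hardware parameters (not stated in the column itself).
CLOCK_GHZ = 3.2        # assumed SPE clock rate
SIMD_LANES_SP = 4      # assumed: four 32-bit floats per SIMD instruction
FLOPS_PER_FMA = 2      # a fused multiply-add counts as two operations


def peak_gflops(n_spes):
    """Theoretical single-precision peak for n_spes SPEs, in Gflop/s."""
    return n_spes * CLOCK_GHZ * SIMD_LANES_SP * FLOPS_PER_FMA


per_spe = peak_gflops(1)       # peak of a single SPE
all_eight = peak_gflops(8)     # all eight SPEs on the chip
linux_six = peak_gflops(6)     # the six SPEs accessible under Linux

# Figures taken from the column: ~4 Gflop/s per SPE for tight C loops,
# and a 256-kilobyte local store shared by program and data.
c_code_six = 6 * 4.0
fraction_of_peak = c_code_six / linux_six
floats_in_local_store = (256 * 1024) // 4   # upper bound, ignoring the program

print(f"peak per SPE:         {per_spe:.1f} Gflop/s")
print(f"peak, 8 SPEs:         {all_eight:.1f} Gflop/s")
print(f"peak, 6 SPEs:         {linux_six:.1f} Gflop/s")
print(f"C code, 6 SPEs:       {c_code_six:.1f} Gflop/s "
      f"({fraction_of_peak:.0%} of peak)")
print(f"floats in 256 KB:     {floats_in_local_store}")
```

Under these assumptions, the roughly 204 Gflop/s figure corresponds to all eight SPEs, so the six available under Linux have a lower ceiling, and plain C code reaches only a modest fraction of it.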
Let's compare the PS3 to a 4-way (dual-socket, dual-core) 2.4 GHz Opteron with 1 GB of DDR2-667 RAM, according to three of the balance measures I mentioned in my February 2007 column, "HPC Balance and Common Sense":
• single-precision floating point performance
• the ratio of RAM capacity to floating point performance (GB per Gflop/s)
• the ratio of RAM bandwidth to floating point performance (GB/s per Gflop/s).
Figure 2: Assumed "good" C code performance comparison
Assuming a "good" floating point rate for C code (approximately 4 Gflop/s per SPE), we see the PS3 still remains unbalanced, but the Opteron system starts looking better.
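For readers who want to compute the balance ratios themselves, here is a small sketch. The `Machine` class and its field names are hypothetical. The PS3 memory capacity (256 MB) and bandwidth (25.6 GB/s) are commonly quoted specifications that I am assuming here rather than values from the figures, while the 204 Gflop/s peak and the 4 Gflop/s per SPE C rate are the ones quoted in this column.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    gflops: float         # single-precision rate, Gflop/s
    ram_gb: float         # main memory capacity, GB
    ram_gb_per_s: float   # main memory bandwidth, GB/s

    def capacity_balance(self):
        """RAM capacity per unit of compute: GB per Gflop/s."""
        return self.ram_gb / self.gflops

    def bandwidth_balance(self):
        """RAM bandwidth per unit of compute: GB/s per Gflop/s."""
        return self.ram_gb_per_s / self.gflops


# Assumed PS3 memory figures (not from the column): 256 MB at 25.6 GB/s.
ps3_peak = Machine("PS3, peak", gflops=204.0, ram_gb=0.25, ram_gb_per_s=25.6)
ps3_c = Machine("PS3, C code", gflops=6 * 4.0, ram_gb=0.25, ram_gb_per_s=25.6)

for m in (ps3_peak, ps3_c):
    print(f"{m.name:12s} capacity {m.capacity_balance():.4f} GB per Gflop/s, "
          f"bandwidth {m.bandwidth_balance():.3f} GB/s per Gflop/s")
```

An Opteron entry can be added the same way using its 1 GB of RAM together with floating point and bandwidth numbers measured on your own system; I have left it out rather than guess at them.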
This graphically makes our point: the Cell B.E. architecture is highly specialized for certain types of applications. It can be a wonderful platform if your application fits the architecture. Otherwise, it will probably be a difficult platform upon which to get your code running, and it may perform extremely poorly relative to general-purpose processors.
Hopefully, Linux on the Sony PS3 (as well as this column) has piqued your curiosity. Have some fun and try it! For $600, you can gain experience with a leading-edge computational architecture, as well as make your kids (perhaps even yourself) very happy. My guess is that for most people, trying to program the Cell B.E. in the PS3 will generate an appreciation for more balanced and mature computer architectures, but if your application fits the architecture, it will be very fast!
Cell Processors for Scientific Computing: www.cs.berkeley.edu/~samw/projects/cell/CF06.pdf
Cell Workshop Slides, LANL: www.cs.utk.edu/~dongarra/cell2006/cell-slides/04-Ken-Koch.pdf
Graph Exploration Algorithms: hpc.pnl.gov/people/fabrizio/papers/ipdps07-graphs.pdf
LANL Newsletter, Roadrunner News Announcement: www.lanl.gov/news/newsletter/091106.pdf
Rob Farber is a senior research scientist in the Molecular Science Computing Facility at the William R. Wiley Environmental Molecular Sciences Laboratory, a Department of Energy national scientific user facility located at Pacific Northwest National Laboratory in Richland, WA. He may be reached at editor@ScientificComputing.com.