Thursday, July 17, 2014

Jedi vs. the Droids

How does the performance of a 20-year-old supercomputer compare to the devices we use today? Let's compare the Cray J916 to a recent laptop and a smartphone.

According to an archived copy of the Cray J90 Series webpage, the vector processors have a theoretical peak performance of 200 mflops each, giving our 8-CPU system 1.6 gflops. But your mileage may vary depending on the code being run. One of the standard benchmarks for supercomputers is LINPACK. Results for a J916 with the same configuration as ours are listed in Performance of Various Computers Using Standard Linear Equations Software by Jack J. Dongarra (June 1995). An 8-CPU system was measured at 1.436 gflops, an efficiency of about 90% of the theoretical peak.
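The peak and efficiency figures above are simple arithmetic. Here is a minimal Python sketch that uses only the numbers quoted in this post; the constant names and the `efficiency` helper are illustrative, not part of any benchmark tool.

```python
# Figures quoted above for the Cray J916 (8 CPUs).
PEAK_PER_CPU_MFLOPS = 200.0   # theoretical peak per vector CPU
NUM_CPUS = 8
MEASURED_GFLOPS = 1.436       # LINPACK result reported by Dongarra (June 1995)


def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Fraction of the theoretical peak that the benchmark actually achieved."""
    return measured_gflops / peak_gflops


# System peak = per-CPU peak times CPU count, converted from mflops to gflops.
peak_gflops = NUM_CPUS * PEAK_PER_CPU_MFLOPS / 1000.0
print(f"peak: {peak_gflops:.1f} gflops")
print(f"efficiency: {efficiency(MEASURED_GFLOPS, peak_gflops):.0%}")
```

The same two lines of arithmetic apply to every system in this post: measured result divided by theoretical peak.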

Cray J90 webpage
"Just right for you" - it certainly is for us...

Next we'll run Linpack for Android. My Samsung Galaxy Note II has a 1.6 GHz ARM Cortex-A9 with four cores. Running LINPACK multi-threaded gives about 200 mflops, just a little faster than a single J90 processor. So, yes, that is (nearly) a mid-1990s entry-level supercomputer in my pocket, at least on paper. We're really just exercising the ability to do floating-point calculations, which is not necessarily a good measure of system throughput on a real problem.

I estimate that the theoretical peak performance of the ARM is about 3 gflops, giving well below 10% efficiency. (I'm ignoring the GPU, as I have no way to run LINPACK on it.) I should mention that the Android version of LINPACK is based on the Java version, and the low efficiency is partly due to the Java Virtual Machine.

Overall, the Cray system, with a 100 MHz clock, has roughly 7.5 times the performance of an Android phone running at 1.6 GHz.
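One rough way to make the clock-speed contrast concrete is to divide each system's measured gflops by its clock rate in GHz, giving floating-point operations completed per clock cycle across the whole system. This is an illustrative framing, not a metric LINPACK reports, and it ignores core counts, memory bandwidth, and everything else that matters on a real problem. The names below are hypothetical.

```python
# Measured LINPACK results and clock rates quoted in this post.
CRAY_GFLOPS, CRAY_CLOCK_GHZ = 1.436, 0.1   # Cray J916, 8 CPUs at 100 MHz
ARM_GFLOPS, ARM_CLOCK_GHZ = 0.2, 1.6       # Galaxy Note II, 4 cores at 1.6 GHz, Java LINPACK


def flops_per_cycle(gflops: float, clock_ghz: float) -> float:
    """Whole-system floating-point operations per clock cycle.

    gflops / GHz = (1e9 flop/s) / (1e9 cycles/s) = flop per cycle.
    """
    return gflops / clock_ghz


cray = flops_per_cycle(CRAY_GFLOPS, CRAY_CLOCK_GHZ)
arm = flops_per_cycle(ARM_GFLOPS, ARM_CLOCK_GHZ)
print(f"Cray J916: {cray:.2f} flops/cycle")
print(f"ARM:       {arm:.3f} flops/cycle")
print(f"Cray does about {cray / arm:.0f}x more work per clock tick")
```

By this measure, the J916 completes a bit over 100 times as many floating-point operations per clock cycle as the phone does under Java LINPACK.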

Android running LINPACK

Now we'll run the Intel LINPACK Benchmark on a Dell Precision M6400 with a 2.5 GHz Intel Core 2 Duo. At roughly 15 gflops, it is an order of magnitude faster than the Cray system. The Intel chip has a Composite Theoretical Performance of 39.684 gflops, so the system is running at less than 40% efficiency. (Again, we are ignoring the GPU.) Running the Java version gives results in the 1.5 gflops range, an order of magnitude lower than the Intel version and very close to the J916.
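Gathering the numbers from this post in one place makes the efficiency comparison easier to see. This sketch recomputes the fraction of peak and the speed relative to the J916 from the figures quoted above. The ARM peak is the rough 3 gflops estimate, the laptop's Java run is assumed to share the same 39.684 peak, and the `SYSTEMS` table and `summarize` helper are illustrative names only.

```python
# (system, measured LINPACK gflops, theoretical peak gflops), as quoted in this post.
# The ARM peak is a rough estimate; GPUs are ignored throughout.
SYSTEMS = [
    ("Cray J916, 8 CPUs", 1.436, 1.6),
    ("Galaxy Note II, Java LINPACK", 0.2, 3.0),
    ("Precision M6400, Intel LINPACK", 15.0, 39.684),
    ("Precision M6400, Java LINPACK", 1.5, 39.684),
]
J916_GFLOPS = 1.436


def summarize(systems):
    """Return (name, fraction of peak achieved, speed relative to the J916) per system."""
    return [(name, measured / peak, measured / J916_GFLOPS)
            for name, measured, peak in systems]


for name, eff, vs_j916 in summarize(SYSTEMS):
    print(f"{name:32s} {eff:6.1%} of peak  {vs_j916:6.2f}x the J916")
```

The table shows the pattern described above: the Cray gets closest to its peak, while the same laptop drops by an order of magnitude when LINPACK runs on the JVM.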

Laptop running LINPACK

There are newer and faster devices than these today, and we're not even considering modern supercomputers, but the Cray holds up remarkably well for a system that was designed nearly 100 dog years ago.

Raw processor speed isn't everything, however. In the results above, I have glossed over memory bandwidth and numerous other important details. To get maximum performance, you need a "balanced" system. This is why the Cray comes much closer to reaching its full potential. It is also why Apple's 1999 claim of making a "personal supercomputer" is a bit of an exaggeration.
“Anyone can build a fast CPU. The trick is to build a fast system.” – Seymour Cray
In future posts I'll be exploring this when I describe the I/O Subsystem and the processor to memory interconnect.
