Showing posts with label Supercomputing. Show all posts

Tuesday, July 29, 2014

Silicon to Supercomputer

The J90 logic is implemented using application-specific integrated circuit (ASIC) chips fabricated by IBM. There are 10 unique ASICs that are found in the processor and memory modules. A typical J90 system could contain about 230 of these CMOS chips. The photo below shows a processor module with the cover removed. Each module contains 4 scalar/vector processors. The space at the top of the board can be used for optional HIPPI interfaces or Y1 Channels to additional I/O Processors.

Cray J90 processor module
A Cray J90 quad processor module.

The ASIC chip types are:
  • MBI - DRAM memory interface
  • MAD - Memory side of memory crossbar for read data
  • MAR - Memory side of memory crossbar for write data
  • VA - CPU side of memory crossbar for write data
  • VB - CPU side of memory crossbar for read data
  • CI - Channel interface (I/O)
  • JS - Shared registers for multi-CPU applications
  • PC - Scalar processor and processor control
  • VU - Vector processor
  • MC - Maintenance and clock distribution

Each scalar processor is implemented in a single chip (the PC), with one additional chip (the VU) for its vector unit. That accounts for only 8 of the 26 chips on a processor module; the remaining 18 handle communication between the processors, and between the processors and the memory banks. This circuitry is the key to a "balanced" system, where the memory bandwidth is great enough to sustain the rate at which the processors can operate on the data.
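The chip budget for a module can be tallied with simple arithmetic (the counts are from the text above; how the remaining chips are divided among the ASIC types is not broken down here, so only totals are computed):

```python
# Chip budget for one Cray J90 processor module (counts from the text above).
cpus_per_module = 4
chips_per_cpu = 2            # one PC (scalar) + one VU (vector) per processor
cpu_chips = cpus_per_module * chips_per_cpu
total_chips_per_module = 26  # stated total for a processor module

interconnect_chips = total_chips_per_module - cpu_chips
print(cpu_chips)             # 8 chips implement the four CPUs
print(interconnect_chips)    # 18 chips move data between CPUs, I/O and memory
```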

Thursday, July 17, 2014

Jedi vs. the Droids

How does the performance of a 20 year old supercomputer compare to the devices that we use today? Let's compare the Cray J916 to a recent laptop and a smart phone.

According to an archived copy of the Cray J90 Series webpage the vector processors have a theoretical peak performance of 200 mflops each, giving our 8 CPU system 1.6 gflops. But, your mileage may vary depending on the code that is running. One of the standard benchmarks used for supercomputers is LINPACK. Results for a J916 with the same configuration as ours are listed in Performance of Various Computers Using Standard Linear Equations Software by Jack J. Dongarra from June 1995. An 8 CPU system was measured at 1.436 gflops, an efficiency of about 90% of the theoretical peak.
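The efficiency figure follows directly from the numbers above:

```python
# Peak vs. measured LINPACK performance for the 8-CPU J916 (figures from the text).
peak_per_cpu_mflops = 200.0
cpus = 8
peak_gflops = peak_per_cpu_mflops * cpus / 1000.0   # 1.6 gflops theoretical peak

measured_gflops = 1.436                             # Dongarra, June 1995
efficiency = measured_gflops / peak_gflops          # just under 90% of peak
print(f"{efficiency:.0%}")
```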

Cray J90 webpage
"Just right for you" - it certainly is for us...

Next we'll run Linpack for Android. My Samsung Galaxy Note II has a 1.6 GHz ARM Cortex-A9 with four cores. Running LINPACK multi-threaded gives about 200 mflops, just a little faster than a single J90 processor. So, yes, that is (nearly) a mid-1990s entry level supercomputer in my pocket. At least on paper. We're really just exercising the ability to do floating point calculations, and this is not necessarily a good measure of system throughput on a real problem.

I estimate that the theoretical peak performance of the ARM is about 3 gflops or so, giving well below 10% efficiency. (I'm ignoring the GPU as I have no way to run LINPACK on it to benchmark it.) I should mention that the Android version of LINPACK is based on this Java Version and the low efficiency is in part due to the Java Virtual Machine.
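The same back-of-the-envelope calculation for the phone (using the measured figure and my rough peak estimate from above) shows just how far it falls short of its theoretical rate:

```python
# Efficiency estimate for the phone's LINPACK run (figures from the text;
# the 3 gflops peak is a rough estimate for the quad-core Cortex-A9,
# ignoring the GPU).
measured_mflops = 200.0
estimated_peak_mflops = 3000.0
efficiency = measured_mflops / estimated_peak_mflops
print(f"{efficiency:.1%}")   # well below 10% of the estimated peak
```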

But, overall, the Cray system with a 100 MHz clock speed has roughly seven times the measured LINPACK performance of an Android running at 1.6 GHz.

Tuesday, July 8, 2014

Galaxy Collision

Two galaxies colliding and merging.



Generated using the simulation code GADGET-2 running on a small number of nodes on the HPC cluster at the Center for Computation & Visualization and rendered using IFrIT. The T= shows simulation time in billions of years.

The source code computes the gravitational forces between the ordinary matter within the galaxies and the dark matter halos surrounding them. The dark matter is not rendered in this visualization. The code was created in 2000 and last updated in 2005. It is optimized for massively parallel computers with distributed memory.
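The gravitational calculation at the heart of such a simulation can be sketched with a minimal direct-summation kernel. This is for illustration only: GADGET-2 itself uses a much faster tree/particle-mesh method, and the units, softening length, and toy initial conditions here are my own assumptions.

```python
import numpy as np

# Minimal O(N^2) direct-summation gravity kernel (illustrative; GADGET-2
# uses a tree/particle-mesh algorithm to scale to millions of particles).
G = 1.0        # gravitational constant in code units (assumption)
EPS = 0.01     # softening length, avoids singular forces at close encounters

def accelerations(pos, mass):
    """Return the gravitational acceleration on each particle from all others."""
    # pairwise separation vectors: r[i, j] = pos[j] - pos[i]
    r = pos[None, :, :] - pos[:, None, :]
    dist2 = np.sum(r**2, axis=-1) + EPS**2
    inv_r3 = dist2**-1.5
    np.fill_diagonal(inv_r3, 0.0)   # no self-interaction
    # a_i = G * sum_j m_j * r_ij / |r_ij|^3
    return G * np.einsum('ij,j,ijk->ik', inv_r3, mass, r)

# two toy "galaxies" of point masses approaching each other
rng = np.random.default_rng(0)
pos = np.concatenate([rng.normal(-5, 1, (100, 3)), rng.normal(5, 1, (100, 3))])
mass = np.full(200, 1.0)
acc = accelerations(pos, mass)
```

In a real run this kernel would sit inside a time-integration loop, and the dark matter halos would simply be additional particle species contributing to the same sum.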

A somewhat better quality version can be viewed here.

Sunday, July 6, 2014

The Theory Cluster

The use of high performance computers has had a tremendous impact on the progress of science. These machines have enabled us to advance our understanding of everything from elementary particles to the large scale structure of the universe. The fastest systems of any era are referred to as supercomputers. For many years supercomputing was synonymous with the machines designed by Seymour Cray at Control Data Corporation and later at Cray Research.

Supercomputers have always been very large and expensive. They require a great deal of electrical power and exotic cooling systems, and they are typically a shared resource found only at large government research laboratories and academic institutions. By the late 1980s a new class of machine, the minisupercomputer, was introduced. With prices starting at less than one million dollars, these smaller air-cooled systems could be used exclusively by a single research group or academic department.

In the mid 1990s the Brown University Department of Physics was the first physics department in the U.S. to acquire a Cray system. In August 1995 a Cray EL98 was installed. This was followed in late 1996 by the installation of a Cray J916. They were used for high-energy and condensed matter theoretical physics. Details of the research are at the Computational High Energy Physics group page.

The Theory Cluster website
The Theory Cluster Machines webpage of the High Energy Physics Group at Brown University. The page was created in late 1995 and includes a publicity photo of the Cray EL98 that had just been installed. The snapshot was captured using NCSA X Mosaic on a SPARCstation 5 running Solaris.