The Eleven Rules of Supercomputer Design

Excerpted from: The Architecture of Supercomputers: Titan, a Case Study, Siewiorek & Koopman, Academic Press, 1991, ISBN: 0-12-643060-8

In a videotaped lecture (Bell, 1989), C. Gordon Bell listed eleven rules of supercomputer design. They sum up his lessons learned from the Titan experience, and are annotated here in light of the discussions of previous chapters.

1) Performance, performance, performance. People are buying supercomputers for performance. Performance, within a broad price range, is everything. Thus, performance goals for Titan were increased during the initial design phase even though it increased the target selling price. Furthermore, the focus on the second generation Titan was on increasing performance above all else.

2) Everything matters. The use of the harmonic mean for reporting performance on the Livermore Loops severely penalizes machines that run poorly on even one loop. It also brings little benefit for those loops that run significantly faster than other loops. Since the Livermore Loops was designed to simulate the real computational load mix at Livermore Labs, there can be no holes in performance when striving to achieve high performance on this realistic mix of computational loads.

3) Scalars matter the most. A well-designed vector unit will probably be fast enough to make scalars the limiting factor. Even if scalar operations can be issued efficiently, high latency through a pipelined floating point unit such as the VPU can be deadly in some applications. The P3 Titan improved scalar performance by using the MIPS R3010 scalar floating point coprocessing chip. This significantly reduced overhead and latency for scalar operations.

4) Provide as much vector performance as price allows. Peak vector performance is primarily determined by bus bandwidth in some circumstances, and the use of vector registers in others. Thus the bus was designed to be as fast as practical using a cost-effective mix of TTL and ECL logic, and the VRF was designed to be as large and flexible as possible within cost limitations. Gordon Bell's rule of thumb is that each vector unit must be able to produce at least two results per clock tick to have acceptably high performance.

5) Avoid holes in the performance space. This is an amplification of rule 2. Certain specific operations may not occur often in an "average" application. But in those applications where they occur, lack of high speed support can significantly degrade performance. An example of this in Titan is the slow divide unit on the first version. A pipelined divide unit was added to the P3 version of Titan because one particular benchmark code (Flo82) made extensive use of division.

6) Place peaks in performance. Marketing sells machines as much or more so than technical excellence. Benchmark and specification wars are inevitable. Therefore the most important inner loops or benchmarks for the targeted market should be identified, and inexpensive methods should be used to increase performance. It is vital that the system can be called the "World's Fastest", even though only on a single program. A typical way that this is done is to build special optimizations into the compiler to recognize specific benchmark programs. Titan is able to do well on programs that can make repeated use of a long vector stored in one of its vector register files.

7) Provide a decade of addressing. Computers never have enough address space. History is full of examples of computers that have run out of memory addressing space for important applications while still relatively early in their life (e.g., the PDP-8, the IBM System 360, and the IBM PC). Ideally, a system should be designed to last for 10 years without running out of memory address space for the maximum amount of memory that can be installed. Since dynamic RAM chips tend to quadruple in size every three years, this means that the address space should contain 7 bits more than required to address installed memory on the initial system. A first-generation Titan with fully loaded memory cards uses 27 bits of address space, while only 29 bits of address lines are available on the system bus. When 16M bit DRAM chips become available, Titan will be limited by its bus design, and not by real estate on its memory boards.

8) Make it easy to use. The "dusty deck" syndrome, in which users want to reuse FORTRAN code written two or three decades early, is rampant in the supercomputer world. Supercomputers with parallel processors and vector units are expected to run this code efficiently without any work on the part of the programmer. While this may not be entirely realistic, it points out the issue of making a complex system easy to use. Technology changes too quickly for customers to have time to become an expert on each and every machine version.

9) Build on other's work. One mistake on the first version of Titan was to "reinvent the wheel" in the case of the IPU compiler. Stardent should have relied more heavily on the existing MIPS compiler technology, and used its resources in areas where it could add value to existing MIPS work (such as in the area of multiprocessing).

10) Design for the next one, and then do it again. In a small startup company, resources are always scarce, and survival depends on shipping the next product on schedule. It is often difficult to look beyond the current design, yet this is vital for long term success. Extra care must be taken in the design process to plan ahead for future upgrades. The best way to do this is to start designing the next generation before the current generation is complete, using a pipelined hardware design process. Also, be resigned to throwing away the first design as quickly as possible.

11) Have slack resources. Expect the unexpected. No matter how good the schedule, unscheduled events will occur. It is vital to have spare resources available to deal with them, even in a startup company with little extra manpower or capital.

A final comment from (Bell 1989) is that supercomputers are extremely complex and difficult to build. It takes $50 million to start up and ship a product for a small-scale supercomputer startup company. The entry cost is high, and the job is tough. But, the team at Stardent has shown that it can be done successfully.

Phil Koopman -- koopman@cmu.edu