S-1 (supercomputer)

S-1, short for Stanford-1, was a supercomputer designed at Lawrence Livermore National Laboratory (LLNL) by Lowell Wood's "O-group" beginning in 1975. It was developed primarily by the engineering department at Stanford University, while the MIT AI Lab designed its Amber operating system. Funding was provided by the US Navy.

The design was built around a single "uniprocessor" that could be connected to others in a multiprocessor configuration. Early designs supported up to sixteen uniprocessors, linked through a crossbar switch to sixteen memory banks of up to 1 GiB each. Each uniprocessor also had cache memory to reduce the number of trips through the switch, and a separate system allowed small amounts of data to be passed quickly between the processors.

The immediate goal was to produce a single-processor machine with the performance of the CDC 7600 at a much lower cost. This would be quickly followed by a machine with 16 faster processors, each with the performance of a Cray-1, for an aggregate performance about 10 times that of the Cray. Successive process shrinks would follow, culminating in the Mark V design, planned for 1985, which would be a "supercomputer on a wafer".^[1]

The single-processor Mark I was completed in 1978, but the follow-up multiprocessor Mark IIA was repeatedly delayed until around 1985. The IIA proved to be highly unreliable and was abandoned after about a year of use. None of the later generations of the design were built, and the S-1 project ended in 1988. The project's only lasting legacy was SCALD, the CAD program used to design it, which became a successful third-party product.

History

Background

Lowell Wood was a physicist at LLNL and a protégé of Edward Teller. Wood ran the speculative "O-group" within the lab, which was not tied specifically to weapons design. In the early 1970s, Wood noticed that other branches of the military were not using computers in places where he felt they could be useful. In particular, he noticed that the US Navy's SOSUS system was gathering far more data than the Navy had the ability to process.^[2] A computer capable of processing the data cost around $10 million (equivalent to $58,435,000 in 2024), which was too expensive to be practical in an era of tightening military budgets.^[3]

Wood's concern with contemporary designs was that they were built in a way that made it difficult to adapt them to the latest advances in chip fabrication. He proposed a concept specifically intended to take advantage of these advances as they became available.^[4] The idea was to use a simplified CPU design that could be implemented with medium-scale integration (MSI) chips, then move to large-scale integration (LSI), and finally to a single-chip implementation. Ultimately, the goal was to combine a number of these single-chip processors along with memory into a single wafer-scale integration supercomputer.^[1]^[5]

Wood was an interviewer for the Hertz Foundation scholarships, which put him in touch with many of the brightest students in many fields.^[1] He would often offer summer jobs at Livermore to Hertz applicants in relevant fields.^[6] Through this process, Wood hired Curt Widdoes in 1973. In 1975, Widdoes was using his Hertz scholarship to pay his way through the Ph.D. program in computer science at Stanford University, where he was working on the Minerva multiprocessor system. During the summer of 1975, Tom McWilliams was also given a summer job at the lab. Wood called Widdoes, introduced the two, and convinced them to begin the design of a supercomputer.^[6] Wood then sold the concept to the Navy, which funded development.^[2]

The Livermore Computer Center, the organization within the lab that provided computing support, was openly hostile to Wood's S-1.^[1] The Center had led the development of supercomputer systems with a variety of commercial vendors, notably Control Data Corporation (CDC) and Cray Research, and had recently been involved in the aborted development of the Westinghouse SOLOMON, the first massively parallel computer. When SOLOMON ended, it funded development of the CDC Star-100, the first vector computer.^[7]

SCALD

Widdoes had used SUDS (Stanford University Drawing System) while working on the Minerva design. SUDS allowed the user to lay out individual chips on screen and then provide a list of connections between them, which it would plot as a complete schematic diagram. For the S-1 project, Widdoes expanded this system into SCALD, short for Structured Computer-Aided Logic Design.^[1] The basic idea behind SCALD was to build SUDS models out of other SUDS models in a programmed fashion.^[8]

For instance, an individual adder circuit in a design might consist of five LSI chips connected together with various wires. The uniprocessor's arithmetic logic unit (ALU) might contain 36 of these adders. Using SCALD, the adder could be designed once and imported 36 times into the ALU design, and the ALU could then be imported into the machine's overall design. In this way the design could be developed hierarchically, and corrections to a macro would propagate up through the entire design. The system later gained the ability to take the designs and produce a list of instructions for an automated wire wrap machine, allowing SCALD to be used all the way from initial design to physical boards.^[9]
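
The macro-expansion idea can be sketched in a few lines of Python. This is a minimal illustration, not SCALD's actual data model: the `Module` class, its fields, and the 100 "glue" chips at the top level are hypothetical, and the sketch counts chips instead of tracking wires. It shows how a macro defined once and instanced many times lets a fix to the lower-level macro reach every level above it automatically.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A hypothetical design macro: chips placed directly plus instances of other macros."""
    name: str
    chips: int = 0                               # primitive chip packages in this macro
    parts: list = field(default_factory=list)    # (sub-macro, instance count) pairs

    def chip_count(self) -> int:
        # Expand the hierarchy recursively, as flattening the design would.
        return self.chips + sum(sub.chip_count() * n for sub, n in self.parts)


adder = Module("adder", chips=5)                  # designed once...
alu = Module("alu", parts=[(adder, 36)])          # ...imported 36 times
cpu = Module("cpu", chips=100, parts=[(alu, 1)])  # hypothetical top level with 100 glue chips

print(alu.chip_count(), cpu.chip_count())  # 180 280
adder.chips = 6                            # a correction to the adder macro...
print(alu.chip_count(), cpu.chip_count())  # ...reaches every level: 216 316
```

Because instances refer to the macro rather than copying it, nothing above the adder has to be edited when the adder changes.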

During the early design stages, Widdoes and McWilliams had to use borrowed time on the DEC PDP-10 at the Stanford Artificial Intelligence Laboratory (SAIL). John McCarthy gave them time on the machine from 5 AM until the "real users" showed up at 9.^[10] They wrote the system in the Pascal programming language, with the aim of allowing it to be ported to other machines such as the IBM 370. At the time, only Pascal and FORTRAN were highly portable, and FORTRAN was deemed unsuitable.^[11]

Mark I

The first-generation S-1, Mark I, was completed in 1978. The core logic was deliberately based on the PDP-10, using an expanded version of the PDP-10 instruction set. Like the PDP-10, the Mark I used a 36-bit word length with 9-bit bytes.^[12]^[a] The floating-point unit (FPU) supported three formats: an 18-bit halfword, a 36-bit single word, and a 72-bit double word. There were 32 registers, mapped onto the first 32 words of memory. Register 3 (R3) was the program counter.^[13] Because the machine lacked virtual memory address translation, a key concept was the use of 18-bit relative pointers, which gave an offset from the current address. These could be used as pointers into blocks of memory whose base address might move but whose internal relative addresses would not.^[13]
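
The arithmetic behind a relative pointer can be sketched as follows. The source does not specify the exact semantics, so this assumes the 18-bit field is a two's-complement offset measured from the address of the word holding the pointer; the function names are hypothetical. Because only the difference between two addresses is stored, relocating a block leaves its internal pointers valid.

```python
OFFSET_BITS = 18
HALF_RANGE = 1 << (OFFSET_BITS - 1)   # 131,072


def sign_extend(field: int) -> int:
    """Interpret an 18-bit field as a two's-complement integer."""
    field &= (1 << OFFSET_BITS) - 1
    return field - (1 << OFFSET_BITS) if field >= HALF_RANGE else field


def encode(pointer_addr: int, target_addr: int) -> int:
    """Store target_addr as an 18-bit offset from the pointer's own address (assumed rule)."""
    offset = target_addr - pointer_addr
    if not -HALF_RANGE <= offset < HALF_RANGE:
        raise ValueError("target out of range for an 18-bit relative pointer")
    return offset & ((1 << OFFSET_BITS) - 1)


def resolve(pointer_addr: int, field: int) -> int:
    """Turn a stored 18-bit field back into an absolute address."""
    return pointer_addr + sign_extend(field)


# A block at base 1000 holds, at word 10, a pointer to its own word 50.
ptr = encode(1000 + 10, 1000 + 50)
# Relocate the block to base 5000: the stored field is unchanged and still valid.
print(resolve(1000 + 10, ptr), resolve(5000 + 10, ptr))  # 1050 5050
```

Under these assumptions an 18-bit field reaches 131,072 words backward and 131,071 words forward, so such pointers suit references within a block rather than across all of memory.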

The CPU included a simple branch prediction unit, a system that allows the CPU to guess which side of a branch will be taken and begin processing those instructions before the test has completed.^[14] This is a key concept in processor pipelining, as it keeps the pipeline full and so adds parallelism to the CPU. The Mark I's system was based on two status bits, the "prediction bit" and the "dynamic reverse bit". The reverse bit was set if the branch prediction failed, that is, if the CPU guessed the wrong outcome. If this occurred a second time, the reverse bit would already be set; in that case it was cleared and the prediction bit was changed, so that the other outcome would be predicted from then on. A failed prediction caused the processor to flush the pipeline.^[13]
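
The two-bit scheme can be modelled as a small state machine. This is a sketch under assumptions: the source does not say what happens to the reverse bit after a correct prediction, so this version clears it, and the `BranchBits` class and its method are hypothetical. The effect is hysteresis: a single anomalous outcome, such as the exit from a loop, costs one flush but does not change the prediction.

```python
from dataclasses import dataclass


@dataclass
class BranchBits:
    """Hypothetical model of one branch's 'prediction bit' and 'dynamic reverse bit'."""
    predict_taken: bool = False
    reverse: bool = False

    def update(self, taken: bool) -> bool:
        """Record the real outcome; return True if the guess was wrong (pipeline flush)."""
        mispredicted = taken != self.predict_taken
        if mispredicted and not self.reverse:
            self.reverse = True                          # first miss: arm the reverse bit
        elif mispredicted:
            self.predict_taken = not self.predict_taken  # second miss: change the prediction
            self.reverse = False
        else:
            self.reverse = False                         # assumption: a hit clears the reverse bit
        return mispredicted


bits = BranchBits()
outcomes = [True, True, True, False, True]  # a branch that is mostly taken
print([bits.update(t) for t in outcomes])   # [True, True, False, True, False]
print(bits)  # BranchBits(predict_taken=True, reverse=False)
```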

The system also relied on caches that were large for the era to reduce the need to fetch from main memory. Each processor had a 4 kW (kiloword) data cache and a separate 4 kW instruction cache. In a multiprocessor machine, all the processors shared access to a common main memory of up to 16 GB, although any one processor could access only one 1 GB bank at a time.^[13]

In contrast to other high-performance machines emerging at the same time, like the Star-100 and Cray-1, the S-1 did not have an explicit vector processor and lacked a scatter/gather system. Arithmetic was scalar rather than vector-based, although the machine did include several unusual hardware implementations, such as single-instruction trigonometric functions, matrix transpose, and even a Fourier transform.^[13]

The processor was built from chips in the ECL-10k family, which used emitter-coupled logic (ECL).^[14] Although the system was designed to support as many as 16 processors, the Mark I was built with only one. This required 5,300 chips, arranged on twelve boards, each about 18 by 24 inches.^[15] The boards were mounted on either side of three vertical cabinets, known as "pages". The pages were hinged along the back so they could be closed up like a book into a relatively compact form while still being able to be opened for servicing.^[13]

Like the CDC designs, the S-1 used I/O processors (IOPs, or channel controllers) to handle input and output. As most devices worked on 8-bit bytes, the IOPs also handled translation between the 9-bit and 8-bit formats, as well as dealing with endianness. The original IOP was simply a custom-programmed DEC PDP-11 connected to the S-1 via the Unibus.^[13]
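
The 9-bit to 8-bit conversion performed by the IOPs might look something like the sketch below. The source gives no details, so the packing order (most significant 9-bit byte first), the rule of keeping only the low 8 bits, and the rejection of values above 255 are all assumptions, as are the function names; the `little_endian` flag marks where an IOP would reverse byte order for devices that expect it.

```python
def unpack_word(word: int) -> list[int]:
    """Split a 36-bit word into four 9-bit bytes, most significant first (assumed order)."""
    if not 0 <= word < 1 << 36:
        raise ValueError("not a 36-bit word")
    return [(word >> shift) & 0x1FF for shift in (27, 18, 9, 0)]


def to_octets(words: list[int], little_endian: bool = False) -> bytes:
    """Convert 36-bit words to 8-bit bytes for a device (hypothetical IOP behaviour)."""
    out = bytearray()
    for word in words:
        nines = unpack_word(word)
        if little_endian:
            nines.reverse()  # device expects the least significant byte first
        for b in nines:
            if b > 0xFF:
                raise ValueError(f"9-bit byte {b:#o} has no 8-bit equivalent")
            out.append(b)
    return bytes(out)


def from_octets(data: bytes, little_endian: bool = False) -> list[int]:
    """Pack groups of four 8-bit bytes back into 36-bit words, one per 9-bit slot."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    words = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        if little_endian:
            chunk = chunk[::-1]
        word = 0
        for b in chunk:
            word = (word << 9) | b  # the 9th (high) bit of each slot stays 0
        words.append(word)
    return words


word = from_octets(b"S-1!")[0]
print(to_octets([word]))                      # b'S-1!'
print(to_octets([word], little_endian=True))  # b'!1-S'
```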

The goal was for the single-CPU machine to match the performance of the CDC 7600, the fastest machine available at the time.^[14] Benchmarks placed it at about one-third of that, around 10 MIPS. This was respectable performance for a machine of its size, and especially of its cost. Although it did not meet its original performance goal, it did meet the goal of producing a useful high-performance machine for much less cost, and was a success in that regard.^[2]

Mark IIA

Work began on the second-generation machine, Mark II, in the fall of 1978, with a working multiprocessor version expected sometime in 1983. The Mark IIA was designed using an updated version of SCALD running on the Mark I. This version added the Timing Verifier module, which simulated signals travelling through the circuitry to ensure there was enough time for the inputs to become valid.^[12] It also included the fully automated wire wrap system, which was built because manual wiring left bits of metal in the dense mats of twisted-pair wires, and these would sometimes pierce the insulation of surrounding wires.^[16]
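
The kind of question a timing verifier answers can be illustrated with a simplified longest-path calculation. This is a swapped-in simplification, not the SCALD Timing Verifier's actual algorithm, which the text describes as simulating signals: it assumes a combinational netlist with fixed gate delays, and the gate names, delays in nanoseconds, clock period, and setup time are all hypothetical. A negative slack means a signal would not become valid before the next clock edge.

```python
from functools import cache

# Hypothetical netlist: gate -> (propagation delay in ns, gates that drive its inputs).
NETLIST = {
    "reg_a": (0.0, []),               # register outputs switch at the clock edge
    "reg_b": (0.0, []),
    "and1": (2.0, ["reg_a", "reg_b"]),
    "xor1": (3.0, ["reg_a", "and1"]),
    "or1": (2.5, ["and1", "xor1"]),   # feeds the next register
}


def setup_slack(netlist, endpoint, clock_period, setup_time):
    """Clock period left over after the slowest path to `endpoint` settles."""

    @cache
    def arrival(gate):
        # A gate's output is valid only after its latest input plus its own delay.
        delay, inputs = netlist[gate]
        return delay + max((arrival(g) for g in inputs), default=0.0)

    return clock_period - setup_time - arrival(endpoint)


print(setup_slack(NETLIST, "or1", clock_period=10.0, setup_time=1.0))  # 1.5
print(setup_slack(NETLIST, "or1", clock_period=8.0, setup_time=1.0))   # -0.5: violation
```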

The core team on the Mark II began with Widdoes and McWilliams, who were joined by Mike Farmwald and Jeff Rubin. Including the technicians who actually built the machine, the team numbered about 20 to 30 people. The logic design itself was done by the team of four and took two and a half years to complete.^[8] During this process, Widdoes and McWilliams left the project to commercialize SCALD at their new company, Valid Logic Systems.^[15]

One simple change in the Mark II was the move from ECL-10k to ECL-100k chips, whose higher clock speeds raised performance to 15 MIPS.^[17] The design also added a 4 kW instruction cache and a 16 kW data cache to improve memory performance. The main change was a greatly expanded floating point unit (FPU) with support for transcendental functions, as well as the first implementation of vector instructions, including a hardware implementation of the Fast Fourier Transform. The vector instructions were memory-to-memory, like those of the Star-100 but unlike the Cray-1's register-to-register architecture. The machine also lacked a scatter/gather unit, so it worked best on large contiguous data sets.^[15]
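
The practical effect of memory-to-memory vector instructions without scatter/gather hardware can be sketched as below. This is an illustrative model, not the Mark II's instruction semantics: memory is a Python list, the function names are hypothetical, and each vector operand is assumed to be a contiguous run of words. Data scattered through memory must first be copied into a contiguous scratch area by ordinary scalar code before a vector operation can use it.

```python
def vadd(mem: list[float], dst: int, a: int, b: int, n: int) -> None:
    """Memory-to-memory vector add on contiguous operands: mem[dst+i] = mem[a+i] + mem[b+i]."""
    for i in range(n):
        mem[dst + i] = mem[a + i] + mem[b + i]


def gather_by_scalar_loop(mem: list[float], indices: list[int], scratch: int) -> None:
    """Without gather hardware, scattered operands are copied one word at a time."""
    for i, addr in enumerate(indices):
        mem[scratch + i] = mem[addr]


mem = [0.0] * 32
mem[0:4] = [1.0, 2.0, 3.0, 4.0]        # vector A, contiguous
mem[4:8] = [10.0, 20.0, 30.0, 40.0]    # vector B, contiguous
vadd(mem, 8, 0, 4, 4)                  # a single vector operation
print(mem[8:12])   # [11.0, 22.0, 33.0, 44.0]

# A copy of A spread through memory must be gathered first, using scalar work.
mem[16], mem[19], mem[22], mem[25] = 1.0, 2.0, 3.0, 4.0
gather_by_scalar_loop(mem, [16, 19, 22, 25], scratch=28)
vadd(mem, 12, 28, 4, 4)
print(mem[12:16])  # [11.0, 22.0, 33.0, 44.0]
```

The extra scalar copy loop is the overhead that makes such a machine work best when the data is already laid out contiguously.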

This greatly expanded uniprocessor required 64 boards with a total of 25,000 ECL-100k chips.^[17] The effort was continually pushed back, missing several completion dates announced at conferences or in reports, and the machine did not become fully functional until 1985, and then only in single-processor form. While it did meet its goal of matching the performance of the Cray-1, much faster computers, including the Cray-2, were available by that time.^[2]

The machine proved to be very difficult to keep operating due to the enormous number of wire wrap connections, leading to failures roughly every week. These would require the machine to be opened up and the offending wrap fixed.^[18] After about a year, the team tired of the constant maintenance and the machine was abandoned. Additional processors were apparently under construction, but it is not clear if any were completed or connected to the first.

Mark IIB/AAP

At some point during the construction of the Mark IIA, attention turned to an improved Mark IIB, which was later known as the "Advanced Architecture Processor", or AAP. This version abandoned the original PDP-10-like instruction set and moved to one that was RISC-like. It also replaced the original crossbar switch with a ring-like design that allowed up to 256 processors on the bus.^[2]

It is not clear how much actual work on the AAP was carried out, and no complete machine was ever built. By the time it was being designed, many of the original S-1 team had left for industry, where RISC designs were starting to come to market. The MIPS R2000, released in January 1986, offered roughly the same performance as the Mark IIA's CPU in a four-chip implementation that easily fit in a desktop case.^[2]

Notes

[a] At the time, "byte" referred to any fixed length of bits, whereas today it almost always refers to 8 bits.

References

Citations

[1] MacKenzie 1998, p. 120.
[2] Smotherman 2013.
[3] MacKenzie 1998, p. 119.
[4] Widdoes 1979, pp. 2–3.
[5] Wood 1985, p. 5.
[6] Stump 2008, p. 2.
[7] MacKenzie 1998, pp. 118–119.
[8] Stump 2008, p. 9.
[9] Stump 2008, pp. 3–4.
[10] Stump 2008, p. 4.
[11] Stump 2008, p. 12.
[12] Stump 2008, p. 14.
[13] Hockney & Jesshope 1988, p. 39.
[14] Stump 2008, p. 3.
[15] Stump 2008, p. 7.
[16] Stump 2008, p. 8.
[17] Wood 1985, p. 17.
[18] Wood 1985, p. 54.

Bibliography

Hockney, R. W.; Jesshope, C. R. (1988). Parallel Computers 2: Architecture, Programming and Algorithms. CRC Press.
"The S-1 Project: developing high-performance digital computers". Energy and Technology Review. Lawrence Livermore National Laboratory. September 1979. pp. 1–15.
MacKenzie, Donald (1998). Knowing Machines: Essays on Technical Change. MIT Press.
Murray, Michael (20 June 1990). Diagnostics on the S-1 Advanced Architecture Processor (Technical report). Lawrence Livermore National Laboratory.
Smotherman, Mark (April 2013). "S-1 Supercomputer (1975–1988)". Clemson University.
Wood, Lowell (11 April 1985). The S-1 Project: Advancing the Digital Computing Technology Base for National Security Applications (Technical report). Lawrence Livermore National Laboratory.
McWilliams, Tom; Widdoes, Curt (12 February 2008). "SCALD Oral History" (Interview). Interviewed by Stump, Holly. Computer History Museum.
Widdoes, Curt (1979). S-1 Project: developing high-performance digital computers (Technical report). Lawrence Livermore National Laboratory.
