Stack Computers: the new wave © Copyright 1989, Philip Koopman, All Rights Reserved.
This appendix is a survey of the stack architectures included in Chapter 2. It includes most stack machines which have been presented in conferences and described in journals. Each machine has a summary of its taxonomy category, implementation technology, problem solving applications, and design history. Additionally, each entry has references and a summary description.
Taxonomy category: SL1
Implementation: 16-Bit minicomputer
Applications: Direct APL execution, military environment
Who and when: Raytheon for the US Navy, 1971
References: Nissen, S. & Wallach, S. (1973)
The AADC (All Applications Digital Computer) was designed for direct execution of the APL language. The target application area was Naval platforms (especially Naval aircraft), so small size and weight were important. The APL language was chosen for efficient machine code and execution. In particular, APL was chosen for its conciseness, which was predicted to give smaller programs and therefore fewer page faults in a virtual memory environment. The AADC converted expressions from infix to Polish notation on-the-fly at execution time. Programs were interpreted at run-time by the program management unit and executed by an arithmetic processor. The execution unit used 1-operand stack notation.
Taxonomy category: SS1
Implementation: 16-Bit microcoded silicon-on-sapphire
Applications: Radiation hard for military use, multi-tasking
Who and when: Rockwell International, 1981
References: Best et al. (1982)
The AAMP (Advanced Architecture MicroProcessor) was designed for military and space use. A stack architecture was chosen for ease of compilation and good code density since it can use mostly 1-byte instructions. AAMP uses a single stack with a frame pointer for activation records as well as expression evaluation with a separate stack pointer. The expression evaluation area is just on top of the current frame. Many instructions are 1 byte long, with the possibility of using local variable addresses relative to the frame pointer for 1-operand addressing. Four top-of-stack registers are used for evaluation, with spillage into program memory.
Taxonomy category: MS0
Implementation: 16-Bit microcoded bit-sliced
Applications: Direct execution of Forth
Who and when: Computer Tools, 1979
References: Rust (1981)
The ACTION Processor FORTHRIGHT is a microcoded Forth-language processor. Typical of Forth hardware implementations, it has a data stack used for expression evaluation and parameter passing as well as a return address stack used for subroutine return address storage. The top elements of both stacks are kept in registers in the bit slices. Stacks reside in program memory to reduce hardware costs.
Taxonomy category: SS0
Implementation: 64-Bit processor
Applications: High reliability, multiprocessor spacecraft computer
Who and when: Intermetrics, 1973
References: Miller & Vandever (1973)
The Aerospace Computer used stack instructions to save program memory space, which has a major impact on reducing size, weight, power, and cost for spacecraft applications. Stack instructions were also chosen to direct support high order languages to enhance software reliability. The design draws heavily from the B6700 architecture. All computation was done in floating point in the ALU, with integers converted to floating point format as fetched from memory. When set, the highest bit of an operand on the stack indicated that the element was really a pointer to memory, which caused an transparent fetch.
Taxonomy category: ML0
Implementation: Emulator on various early European computers
Applications: Conceptual machine emulated for transportable ALGOL
programming
Who and when: ALCOR joint project, 1958-60
References: Samelson & Bauer (1962)
The ALCOR (ALgol COnverteR) joint project was a very early conceptual design for the interpretation of ALGOL 60. The European group devised a high level language machine architecture which was emulated on various machines. The conceptual machine had two stacks which were used for expression parsing and evaluation. One stack held pending operations, while the other stack held intermediate results. Variables and return addresses were statically allocated in program memory.
Taxonomy category: ML0
Implementation: Research prototype project
Applications: Direct execution of ALGOL
Who and when: Burroughs, 1961
References: Anderson (1961)
The exploration for a direct execution architecture was motivated by the observation that two-thirds of computer time was then spent doing compilation and debugging. The focus of the research was on making computers easier to use. The approach taken was to directly execute a high level language. The machine discussed used three hardware stacks to execute ALGOL constructs. Two stacks formed a value and operator stack pair for expression parsing and evaluation. The third stack held subroutine return address information.
Taxonomy category: SL2
Implementation: 32-Bit microprocessor
Applications: General purpose RISC processor
Who and when: Advanced Micro Devices (AMD) 1987
References: Johnson (1987)
The AM29000 is a RISC processor. While its instructions are not stack-oriented, it provides considerable hardware support for stacks for parameter passing for high level languages. It has 192 registers, 64 of which are conventional registers, the other 128 of which are used as a stack cache. A stack frame pointer into the register file provides relative addressing of registers. If the stack cache overflows, it is spilled to program memory under software control. The chip has the capability of dividing the 256 register address space into 16 banks for multi-tasking. Each instruction may access registers either globally, or based on the register stack pointer.
Taxonomy category: MS0
Implementation: Microcoded emulation on IBM 360/25 with WCS
Applications: Direct execution of APL
Who and when: International Business Machines, 1973
References: Hassitt et al. (1973)
APL is an inherently interpreted language, so creating an APL direct execution machine is an attractive alternative to interpreters on conventional machines. This project used an IBM 360 Model 25 with writable control store to emulate an operational APL machine. The machine used two stacks resident in program memory: one for expression evaluation, the other for temporary allocation of variable space.
Taxonomy category: SS1
Implementation: 32-Bit microcoded emulation on a B1700
Applications: Block structured language execution
Who and when: State University of New York at Buffalo, 1972
References: Lutz (1973)
The BSM (Buffalo Stack Machine) was a microcoded emulation of a stack architecture that ran on a Burroughs B1700 system. The architecture was designed to support ALGOL-60 type languages. Variables were stored as tagged data in memory with 32 data bits and 4 tag bits. Interrupts were treated as hardware invoked procedure calls, thus saving state automatically on the stack. A sticky point with doing this was that interrupts on stack overflow/underflow had to be made before the stack is completely full/empty to prevent a system crash.
Taxonomy category: SS0
Implementation: A family of minicomputers
Applications: General purpose multi-user computing
Who and when: Burroughs Corporation: 1961-77 (and beyond)
References: Carlson (1963), Doran (1979), Earnest (1980), Organick
(1973)
The Burroughs line of stack computers originated with the ALGOL-oriented B5000 machine in 1961. One of the motivations for this machine was the observation that conventional machines required compilers that were so complex that they were too expensive to run (in 1961).
The B5000 was a 0-operand pure stack machine that kept the stack elements in program memory. The top two stack elements of the B5000 were kept in special registers in the CPU. A special feature of these registers is that there were hardware status bits that allowed 0, 1, or 2 of the registers to contain valid data. This reduced the amount of memory bus traffic by eliminating redundant reads and writes to memory (for example, a POP followed by a PUSH would not cause the same value to be read in from memory, then written back out). The B7700 elaborated on this scheme by using a 32-register stack buffer.
The stacks on these machines were used both for expression evaluation and stack frames for ALGOL procedure calls. Thus, return addresses were interleaved with parameters on the stack. One of the advantages to keeping the stacks resident in program memory was rapid response to interrupts and a low cost for task swapping. Stacks enabled the hardware to treat procedure calls, interrupts, and task calls in a uniform manner.
Taxonomy category: SS0
Implementation: 8-Bit microcoded VLSI chip
Applications: University VLSI design project
Who and when: California Institute of Technology, 1979
References: Efland & Mosteller (1979)
This stack machine was implemented as a student project for a VLSI design course. The objective was to design and lay out the simplest possible computer in a two and one-half week period. To keep the design simple, the students chose a 0-operand stack machine. The stack on this machine was maintained in program memory with 2 registers containing the top two stack elements on-chip. The instruction set was patterned after the primitives needed by a student-written Pascal compiler.
Taxonomy category: SL2
Implementation: 32-Bit CMOS microprocessor
Applications: C language RISC machine.
Who and when: AT&T Bell Laboratories, 1987
References: Ditzel et al. (1987a), Ditzel et al. (1987b), Ditzel &
McLellan (1982), Ditzel & McLellan (1987)
The CRISP microprocessor is a RISC machine optimized for executing the C programming language. It is designed as a register-less machine, with all operands memory-resident. However, since the C language uses stacks to allocate storage for local parameters, most of the operand data references are to memory locations relative to a stack pointer register. To support these stack references, CRISP has a 32-element stack cache on-chip. Thus, when a memory-to-memory operation is performed on data near the top of the stack, the operands are fetched and stored using the on-chip cache. CRISP also supports branch folding, a technique where branches are executed in parallel with some instructions.
Taxonomy category: SL2
Implementation: 2-Chip microprocessor.
Applications: Experimental multiprocessor design.
Who and when: Xerox Palo Alto Research Center, 1985
References: Atkinson & McCreight (1987)
The Dragon is an experimental design created with an emphasis on compact binary instruction encodings and fast procedure calls. Variable length instructions and the use of stack-register addressing keep instruction size small while allowing the use of 3-operand instructions. The Dragon has a 128-element execution unit register stack with variable size frames implemented by a pointer pair that define the upper and lower frame bounds.
Taxonomy category: SS1
Implementation: Conceptual design
Applications: Structured programming
Who and when: Vrije University, The Netherlands, 1978
References: Tanenbaum (1978)
This often-cited paper gives a discussion of how structured programming techniques should impact machine design, then presents an example design, the Experimental Machine-1 (EM-1). The motivation behind the EM-1 is to provide an efficient environment for well-structured programs. To do this, it uses a single memory-resident stack in typical block-structured language style, and provides 1-operand addressing to access local variables on the stack frame for evaluation using the top of stack. The design skirts the issue of memory bus contention between stack items and instructions by presuming the existence of a stack cache independent of an instruction cache.
Taxonomy category: SS0
Implementation: Microcoded interpreter on IBM 360/30
Applications: Research into implementing direct high level
interpreters in microcode
Who and when: IBM Systems Development Division, 1967
References: Weber (1967)
EULER is an extension of the ALGOL programming language. The EULER project discussed by this paper was an early attempt to implement a direct interpretation machine by adding special-purpose microprogramming to a standard IBM Model 360 computer. In operation, an EULER program was compiled to an intermediate byte-code format. Each byte-code invoked a routine resident in microprogram memory. This system may well have been the first "p-coded" machine. The use of microcoded interpretation was justified for this project by the fact that EULER supports structures such as dynamic typing, dynamic storage allocation, and list processing that were poorly handled by available compilers. The EULER implementation used a single stack resident in program memory for dynamic data allocation and expression evaluation. Data operation instructions were 0-operand RPN byte codes.
Taxonomy category: ML0
Implementation: Discrete LS-TTL
Applications: Execution of Forth programming language
References: Winkel (1981)
The Forth Engine was a discrete TTL microcoded stack processor for the Forth language. In addition to a hardware stack for evaluation and subroutine parameter passing and the hardware stack for return address storage, this processor featured a 60-bit writable control store for microcode.
Taxonomy category: MS0
Implementation: Conceptual Design
Applications: Direct execution of the FORTRAN language
Who and when: University of Science and Technology of China, PRC,
1980
References: Chen et al. (1980)
This paper presents a conceptual design paper for a direct execution FORTRAN machine. The proposed machine would have several hardware stacks for return address, loop limit and branch address, and expression evaluation storage. While the implementation method was not specified, isolated memory space stacks would certainly be appropriate to reduce memory traffic. As with most other direct execution machines, stacks were mandatory to support program parsing.
Taxonomy category: ML0
Implementation: 32-Bit 2 micron silicon compiler CMOS
microprocessor
Applications: General purpose space-borne computing and control.
Optimized for the Forth language
Who and when: Johns Hopkins University, 1986
References: Fraeman et al. (1986), Hayes (1986), Hayes & Lee
(1988), Hayes et al. (1987), Williams et al. (1986)
The Johns Hopkins University/APL Forth processing chip is designed for spacecraft processing applications. The chip executes Forth primitives, and allows multiple operations to be compacted into microcode-like fields in the instruction. Although Forth is a 0-operand language, the chip allows selecting any of the top 4 stack elements to be used with the top stack element for an operation, thus making it a "1/2" operand machine. The on-chip data and return stacks are rather small: 16 elements each, forced mostly by technology constraints.
Taxonomy category: SL0
Implementation: 32-Bit processor simulation
Applications: Graph Reduction
Who and when: Oregon Graduate Center, 1985
References: Kieburtz (1985)
The G-Machine was specially built to perform graph reduction in support of executing functional programming languages. It executed G-code, which was a zero-address machine language designed to manipulate its single stack. Program memory was highly structured to support the requirements of graph reduction. Each memory word included four fields used for reference counting and two 32-bit cells used for graph pointers.
Taxonomy category: SS0
Implementation: Conceptual design
Applications: Multiple communicating processors.
Who and when: University of Washington, 1973
References: Herriot (1973)
The GLOSS conceptual design was an attempt to define a generic high level language machine for a variety of languages, including ALGOL 68, LISP 1.5, and SNOBOL 4. It was based on using a demand driven data-flow system where sub-processes were invoked on multiple parallel processors in a manner similar to procedure calls. Each processor had a set of evaluation stacks resident in memory.
Taxonomy category: SS0
Implementation: Add-on hardware to a minicomputer
Applications: Experimental minicomputer addition
Who and when: Keio University, Japan, 1974
References: Ohdate et al. (1975)
The stack hardware discussed in this paper was back-fitted onto an existing HITAC-10 minicomputer. In order to simplify the design and construction, the stack hardware was added as an I/O device on the system bus. The stack controller had four top-of-stack registers. All extra elements were stored in an area of memory using DMA. The controller had two stack limit pointers for memory bounds checking. The stack controller was used for subroutine parameter passing; no arithmetic could be performed on stack elements.
Taxonomy category: SS1
Implementation: 16-Bit Minicomputer family
Applications: General purpose multi-user computer.
Who and when: Hewlett Packard, 1976-1980's
References: Bartlett (1973), Bergh & Mei (1979), Blake (1977)
The HP3000 family is a commercial line of minicomputers based on a 1-operand stack architecture. The origins of the family may be found in the HP300 computer, which could be considered a 3-address machine that had two top-of-stack registers buffering a program memory resident stack. Later, the HP3000 series used a stack/accumulator addressing mode, and included four top-of-stack registers. The stacks were featured in the architectures to ease implementation of reentrancy, recursion, code sharing, program protection, and dynamic storage allocation in a multi-user environment. The stack is used not only for expression evaluation, but also for parameter passing and subroutine return address storage.
Taxonomy category: MS0
Implementation: 16-Bit AM2903 bit-sliced processor
Applications: Spacecraft experiment control. Optimized for the
Forth language
Who and when: Johns Hopkins University, Applied Physics Laboratory
1982
References: Ballard (1984)
The HUT processor was designed to control the Hopkins Ultraviolet Telescope (HUT) Space Shuttle experiment. At the time it was designed, no space-qualified microprocessors were powerful enough for the task, so a bit-sliced processor was custom designed for the job. The designers chose to implement a Forth language processor for simplicity of implementation, extensibility, and flexibility.
Taxonomy category: SS1
Implementation: Family of mini-computers
Applications: General purpose computing.
Who and when: ICL, 1975
References: Keedy (1977)
The designers of the ICL family were concerned with protection and code sharing in a multiprogrammed environment, as well as efficient compilation and execution with compact object code. While often compared with the contemporary Burroughs machines, the ICL machines had several distinct characteristics. One of these was the use of a stack/accumulator 1-operand addressing scheme and several specialized registers. With this capability, a register or memory location could be used with the top stack element for operations.
Taxonomy category: SS2 (when used in stack mode)
Implementation: Family of 16 and 32-bit microprocessors
Applications: General purpose computing
Who and when: Intel Corporation 1980's
References: Intel (1981)
The 80x86 processor family, which includes the 8088, 8086, 80186, 80286, and 80386 is a family of microprocessors with a general purpose register architecture. Simple PUSH and POP instructions are supported to manipulate the stack. Many high level language compilers produce code that uses the BP (base pointer) register as a frame pointer to a combined return address and parameter passing stack. When used in this mode, the 80x86 family can be considered to be doing stack processing. In the context of stack computers, the 80x86 is simply included in this listing as a representative example of a conventional machine that can be used as an SS2 architecture.
Taxonomy category: MS0
Implementation: Conceptual design
Applications: Directly interpretable languages
Who and when: North Electric Co., 1973
References: Welin (1973)
The Internal Machine was a conceptual design for a machine that could efficiently execute directly interpretable languages. A stack instruction model was picked for generality. The design specifies two stacks: one for expression evaluation and parameter passing, and a second stack for subroutine return addresses.
Taxonomy category: SS1
Implementation: Conceptual design for microcoded interpreter
Applications: General purpose computing
Who and when: Rand Corporation, 1958
References: Shaw et al. (1959)
The Information Processing Languages (IPL's) were a series of conceptual language designs for implementing high level programs. IPL-VI was a language designed meant to be implemented as an interpreted language with microcode support. IPL-VI emphasized advanced (for 1959) computing structures for non-numerical computing, especially list manipulation. A stack was used to pass parameters between subroutines. Since all memory in the IPL-VI design was formatted as list elements, the subroutine parameter LIFO consisted of a list of elements that pointed to the next element further down in the list. IPL-VI instructions used 1-operand addressing.
Taxonomy category: SS0
Implementation: 16-Bit microprocessor
Applications: Direct execution of Pascal P-code
Who and when: Nippon Electric Co., 1980
References: Tanabe & Yamamoto (1980)
The ITS processor was designed to execute UCSD Pascal P-code. The designers claimed a several-times speedup over fully compiled code on a contemporary microprocessor (presumably an 8086). The ITS had a 256-word stack on-chip, which was apparently only used for expression evaluation. The top two stack elements were kept in registers for speed.
Taxonomy category: ML0
Implementation: 48-Bit mainframe
Applications: General purpose computing using ALGOL
Who and when: English Electric, 1960
References: Allmark & Lucking (1962), Duncan (1977), Haley
(1962)
The KDF-9 was perhaps the first true stack computer. It was inspired by the advent of ALGOL-60, and introduced many of the features found on modern stack computers. The KDF-9 had an expression evaluation stack which could be used for parameter passing, as well as a separate return address stack. Unfortunately, these stacks were limited by technology considerations to only 16 elements apiece (constructed from magnetic cores!). A problem with the design was that while 16 elements is quite sufficient for expression evaluation, the ALGOL compiler was constrained by the 16-element stack depth, causing slow compilation.
Taxonomy category: ML0
Implementation: 16-Bit wide AM2903 bit-slice with writable control
store.
Applications: Academic Research
Who and when: Kobe University, Kobe Japan, 1983
References: Kaneda et al. (1983), Wada et al. (1982a), Wada et al.
(1982b)
This machine was designed to execute both Forth and Pascal efficiently using a stack architecture. Forth was executed by directly implementing Forth primitives in microcode. Pascal was executed by supporting a UCSD P-code emulator. This machine had separate data memory in addition to program memory.
Taxonomy category: SS0
Implementation: Microcoded interpreter on Varian V73
Applications: Experimental
Who and when: Group for Datalogical Research & Royal Institute
of Technology, Sweden, 1980
References: Bage & Thorelli (1980)
The LAX2 architecture was implemented as a partially microcoded interpreter with the goals of cost effective software production along with good memory and execution time economy for string manipulation and interactive applications. The architecture used tagged data types. Each process in the machine had a private memory area shared between the evaluation stack and a garbage-collected heap for temporary string storage.
Taxonomy category: ML1/MS1
Implementation: 16-Bit AM2901 bit-sliced processor
Applications: Direct execution of Modula-2 M-code and interactive
user interfaces.
Who and when: ETH (Swiss Federal Institute of Technology), 1979.
References: Ohran (1984), Wirth (1979)
Lilith was a Modula-2 execution machine developed by Niklaus Wirth. The goals of the machine were to provide efficient support for the Modula-2 language as well as an effective user interface. A stack architecture was chosen for compact code. The Lilith had two stacks: a program memory resident stack for parameter passing, and a hardware expression evaluation stack. The instruction set was stack-based, with the ability to read an element from any location in the parameter stack and operate upon it with the top element of the evaluation stack.
Taxonomy category: ML1
Implementation: Various machines. 1982 to present.
Applications: List processing, artificial intelligence research
Who and when: Various companies (such as Symbolics)
References: Lim (1987), Moon (1985), Sansonnet et al. (1982)
LISP machine architecture is a whole topic in its own right; this is simply a summary of some of the common characteristics of LISP-specific processors. LISP machines tend to have multiple stacks and 1-address (relative to top of stack) instruction formats. Some machines have substantial hardware stacks (over 1K word) that can overflow into program memory. Procedure calls tend to be very important, because of the recursion commonly used in traversing lists. These machines typically store data elements in a garbage-collected program/data memory.
Taxonomy category: SS1
Implementation: Unspecified
Applications: Execution of Modula-2 M-code (using StarMod language)
Who and when: University of Wisconsin - Madison, 1980
References: Cook, R. & Donde, N. (1982), Cook, R. & Lee, I.
(1980)
The MCODE machine was designed to execute Modula-2 M-code, which was compiled from a language called StarMod, a Modula-2 derivative. MCODE was based on Tanenbaum's EM-1 design, but with several improvements to solve problems that arise in real computers. One improvement was the use of a set-mode instruction that changed the interpretation of the data types (integer, floating point, etc.) for all subsequent operations.
Taxonomy category: SS0
Implementation: Architectural family for workstations
Applications: Graphics-intensive workstations (Alto, Dorado
machines)
Who and when: Xerox Office Products Division, 1979
References: Johnsson & Wick (1982), McDaniel (1982), Sweet &
Sandman (1982)
Mesa was actually a modular high level language expanded to include an architecture for a family of processors. The goals of the architecture were efficient implementation, compact encoding, and technology independence. In order to accomplish these goals, the Mesa architecture specified a single stack for use in expression evaluation and parameter passing to procedures for the purpose of producing compact 0-operand stack instructions. While the stack implementation was not specified, Mesa follows the general pattern of machines with a small stack buffer and the bulk of the stack in program memory. An interesting instruction was the "recover" operation, which captured a previously popped stack value that had not yet been overwritten by doing a stack push without writing a value.
Taxonomy category: ML0
Implementation: TTL discrete components, 16-bit machine
Applications: General purpose processing
Who and when: Xycom & Advance Processor Designs, 1987
References: Burnley & Harkaway, 1987
The Advanced Processor Design MF1600, which is the processor used on the Xycom XVME-616 product, is a high performance Forth machine design that makes use of fast TTL logic devices. It features a 16-bit data path and a microcode memory ROM that can be customized by the manufacturer for specific applications.
Taxonomy category: SL1
Implementation: Simulated machine
Applications: Functional Programs
Who and when: University of Utah, 1982
References: Castan & Organick (1982)
The micro-3L processor project used the 3L-model (Lisp Like
Language model) for specifying a processor that is well suited to
list processing. The project proposed creating a multiprocessor system to
execute functional languages. Each micro-3L processor was to use a 256-element
register file. 128 of the registers were intended to be used as a return
address stack, with overflow handled by swapping into main memory. Data
manipulations were performed using an Accumulator and an operand from the
bottom 128 elements of the register file.
Taxonomy category: SS0
Implementation: Microcode upgrade to a 16-bit register-oriented
minicomputer.
Applications: Running a version of PL/I
Who and when: Microdata Corporation, 1973
References: Burns & Savitt (1973)
The Microdata 32/S was a version of the Microdata 3200 general-purpose minicomputer that had additional microcode to implement stack instructions. The 3200 system was a 16-bit minicomputer implemented in discrete TTL technology. The reason for adding the stack-based capabilities was that compilers of the time could not produce efficient code. Stack architectures made code generation easier. The reason good code generation is important was to remove the impetus for programming in assembly language. The main memory stack was used for expression evaluation and parameter passing, with up to four stack elements buffered in registers.
Taxonomy category: MS0
Implementation: 16-Bit 2.0 Micron HCMOS gate array
Applications: Low cost real time control
Who and when: Minimum Instruction Set Computer, Inc., 1988
References: MISC (1988)
The MISC M17 microprocessor is a low cost, embedded micro-controller. The M17 instruction set is based on Forth primitives. In contrast with most other Forth machines, the M17 reduces hardware costs at some compromise in performance by keeping its two stacks in program memory with a few top-of-stack buffer registers on-chip.
Taxonomy category: MS2 (when used in stack mode)
Implementation: Family of 32-bit microprocessors
Applications: General purpose computing
Who and when: Motorola Corporation 1980's
References: Kane et al. (1981)
The 680x0 processor family, which includes the 68000, 68010, 68020, and 68030, is a family of microprocessors with a general purpose register architecture. Registers are divided into two groups: address registers and data registers. The address registers support postincremented and predecremented addressing modes. This allows a programmer to use up to eight stacks, one stack per address register. By convention, the A7 register is used as the stack frame pointer for most languages. Of course, the 680x0 family is usually not used as a multiple-stack machine, but nonetheless this capability exists.
Taxonomy category: SS1
Implementation: Minicomputer
Applications: Research
Who and when: University of Manchester, 1971
References: Morris & Ibbett (1979)
The MU5 used a 1-operand instruction format with a single stack in program memory. Stack instructions were used because they led to easy code generation, compact programs, and easily pipelined hardware. An interesting twist is that there were five registers accessible to the programmer, all of which were simultaneously at the "top-of-stack." Pushing a value into any register pushed the previous register value onto the single stack. This arrangement is subtly different from having the top five stack elements accessible as registers.
Taxonomy category: ML0
Implementation: 16-Bit Gate Array processor
Applications: Real Time Control, direct support for Forth
programming language
Who and when: Novix, 1985
References: Golden et al. (1985), Jennings (1985), Miller (1987),
Novix (1985)
The NC4000, later renamed the NC4016, was the first chip designed to execute Forth. Since it is on a gate array, the two hardware stack memories reside off-chip, connected to the processor by dedicated stack busses. The top two data stack elements are buffered on-chip, as is the top return stack element. The processor executes most Forth primitives including subroutine call in a single clock cycle, and allows multiple primitive operations to be compressed into a single instruction in some circumstances.
Taxonomy category: SL0
Implementation: Experimental machine using MSI/LSI standard logic
and gate arrays
Applications: Functional programming/graph reduction
Who and when: Burroughs Corporation Austin Research Center, 1986
References: Scheevel (1986)
The Normal Order Reduction MAchine (NORMA) is a research processor developed by Burroughs for high speed graph reduction operations in support of functional programming languages. Five specialized functional units that handle arithmetic, graph memory, garbage collection, graph processing, and external I/O are connected using a central bus. The graph processor maintains a single stack used during the depth-first traversal of the tree-structured program graphs.
Taxonomy category: ML0
Implementation: Emulator running on Lilith computer
Applications: Support for Pascal & Modula-2 programs
Who and when: Federal Institute of Technology, Zurich, 1984
References: Schulthess (1984)
The Object Pascal Architecture (OPA) is a design for a machine that efficiently executes compiled Pascal code. The OPA contains three stacks: one for descriptors and expression evaluation, one for storing subroutine parameters, and one for return addresses. The OPA instruction set is billed as a "reduced high level language" instruction set, since it supports Pascal constructions with a small number of opcodes.
Taxonomy category: ML0
Implementation: Experimental processor
Applications: Direct execution of Tiny-Pascal source code
Who and when: University of Maryland, 1981
References: Lor & Chu (1981)
The Pascal interactive computer is an experimental system for direct execution of Pascal source code. Since the system includes a hardware compiler as well as execution unit, hardware stacks in the system abound. Some of the stacks are used to store return addresses, operator precedence, expression evaluation values, and subprogram nesting levels. Since expressions are evaluated as they are interpreted, the actions taken by the execution unit are the same as would be taken by a 0-operand stack architecture.
Taxonomy category: MS1 (when used in stack mode)
Implementation: Family of mini & microcomputers (also, later
the VAX family)
Applications: General purpose mini-computer
Who and when: Digital Equipment, 1970
References: Bell et al. (1970)
The DEC PDP-11 was an early general-purpose computer to integrate stack usage into a general-purpose register machine. While the machine is clearly register-oriented, it includes as a subset of its capabilities those of a one-address stack machine. By using register-indirect addressing with auto-postincrement and auto-predecrement, a general-purpose register can be used as a stack pointer for an evaluation stack. The PDP-11 also has a stack pointer for use with interrupts, traps, and subroutine calls. Later, the VAX line of computers introduced hardware support for single-stack dynamic frame allocation for block-oriented languages. Of course the PDP-11 is really a general-purpose register machine, but Bell's article describes how it can be used in an MS1 stack mode.
Taxonomy category: SS1
Implementation: Bit-sliced processor (AMD 290x)
Applications: Research into emulating intermediate forms for block
structured languages
Who and when: Stanford University, 1980
References: Harris (1980)
The Pascal Oriented MicroProcessor (POMP) project used a bit-sliced processor to execute stack code. Stack code was chosen to reduce program size from 3 to 8 times smaller than traditional compiler outputs. In fact, the POMP code was claimed to be only 50% larger than Flynn's ideal DEL encoding, but is much easier to decode since it was encoded in byte-wide blocks. The stack machine could accesses up to 8 local variables for operations, making it a 1-operand machine.
Taxonomy category: ML2
Implementation: Architectural proposal
Applications: General purpose computing
Who and when: University of Illinois, 1985
References: Eickemeyer & Patel (1985)
The Parallel Stack Processor (PSP) architecture is an attempt to preserve the function of a normal general purpose register machine yet reap the benefits of having hardware stacks for saving registers on a subroutine call. To accomplish this, the machine hides a stack behind every register in the machine. Whenever a subroutine call is encountered, each register is pushed onto its own stack simultaneously, performing a single-cycle multiple register save. Strictly speaking, this is more of a register machine that has hardware to save registers than a stack processor architecture, but the idea is intriguing for other stack applications.
Taxonomy category: SL2
Implementation: 32-Bit minicomputer
Applications: General purpose RISC processor
Who and when: Pyramid Technology, 1983
References: Ragan-Kelley & Clark (1983)
The Pyramid 90x was one of the first commercial processors to have many RISC attributes. The 90x uses a register stack that is organized as 16 non-overlapped windows of 32 registers plus 16 global registers for a total of 528 registers. The registers are spilled to memory if subroutine nesting is more than 15 levels deep.
Taxonomy category: ML0
Implementation: Architectural study
Applications: Direct support for Forth programming language
Who and when: Queens College of CUNY, 1984
References: Vickery (1984)
The QFORTH architecture was built for multitasking single-user execution of the Forth programming language. The internal architecture included two source busses (which could be read the top two elements of the data stack) and a single destination bus to write the top-of-stack back. The stack management unit internally buffered the top stack elements in high speed registers, and allowed for a single stack memory to be partitioned into several simultaneously used stacks.
Taxonomy category: ML0
Implementation: Laboratory model
Applications: Execution of reduction language programs
Who and when: GMD Bonn, 1979
References: Kluge & Schlutter (1980)
Reduction languages use structures of the form: apply function to argument. These structures are well represented by subtrees with a function node having children that are its operands. Since the execution of a program involves evaluating these tree structures, three major stacks are central to the operation of the machine. One stack acts as a program source, another as a program sink, and the third as a temporary evaluation stack area. An interesting feature of the machine is that there is no program memory, and the operation of the machine does not involve any addresses as such. All programs are shuffled between the source and sink stack memories.
Taxonomy category: ML0
Implementation: 1.5 Micron CMOS using 3 gate arrays
Applications: Object oriented programming
Who and when: Linn Products, 1984-88
References: Pountain (1988)
Rekursiv is designed for fast execution of object-oriented programs. It supports a very high level instruction set that may be extended using a large amount of off-chip microcode, and has extensive support for memory management designed into the system. An evaluation stack is used for expression evaluation, while a control stack is used for microcode procedure return address storage.
Taxonomy category: SL2
Implementation: 32-Bit microprocessor
Applications: RISC processor for C and other high level languages
Who and when: University of California, Berkeley, 1981
References: Patterson & Piepho (1982), Patterson & Sequin
(1982), Sequin & Patterson (1982), Tamir & Sequin (1983)
The RISC I was the first highly publicized RISC computer. It owes a substantial amount of its performance to the use of register windows. The "gold" RISC I chip uses an overlapped register window scheme with 78 registers. At any given time, there are 32 addressable registers: 10 global registers, 6 registers shared with the calling subroutine, 10 private registers, and 6 registers used to pass parameters to subroutines at the next deeper nesting level. The registers are accessed using normal 2-operand register-to-register instructions. The RISC I allows accessing the contents of a register as a memory location by automatically mapping the memory access into the register space. This solves the up-level addressing problem that can occur in languages like Pascal.
Taxonomy category: MS0
Implementation: Forth-in-ROM on 6502 and 68000 microcontrollers.
Applications: Embedded controllers that run Forth programs.
Who and when: Rockwell International, 1983
References: Dumse (1984)
While not strictly speaking hardware-supported stack machines, microcontrollers that have Forth burned into their ROM's are an interesting member of the stack-based computer family. The R65F11, based on the 6502 processor, and the F68K processor, based on the 68200 microcontroller of the 68000 processor family, are general purpose microcontrollers that come with preprogrammed Forth primitives. These chips in effect emulate a two-stack Forth engine, using variables and program memory to provide the emulation. Other dedicated Forth microcontrollers have been made since (including the Zilog Super8 chip), but Rockwell was the first to do it.
Taxonomy category: ML0
Implementation: 16-Bit, 2 micron standard cell CMOS microprocessor
Applications: Semicustom design for application-specific designs.
Optimized for Forth programming language
Who and when: Harris Semiconductor, 1987-89
References: Danile & Malinowski (1987), Harris Semiconductor
(1988a), Harris Semiconductor (1988b), Jones et al. (1987)
The RTX (Real Time Express) is a macrocell in the Harris standard cell library. This allows the processor to be built as a stand-alone microprocessor, or as an integrated microprocessor with I/O devices, hardware multiplier and stack memory on-chip. The instruction set directly corresponds to FORTH programming language primitives. The design uses an unencoded instruction format that allows multiple operations to be compacted into each instruction. As with many Forth processors, the RTX 2000 supports single-cycle subroutine calls.
Taxonomy category: ML0
Implementation: 32 Bits, 2.5 micron CMOS
Applications: Stack-based processing for real time control and
expert systems.
Who and when: Harris Semiconductor and WISC Technologies, 1987-89
References: Koopman (1987c), Koopman(1987d), Koopman (1989)
The Harris RTX 32P is a prototype 32-bit stack processor chip set. A unique feature of the RTX 32P is the combination of an opcode with a next-address field in every instruction. This allows zero-cost subroutine calls, returns, and unconditional branches by overlapping the next address computation with the opcode execution. The system can execute one opcode and a subroutine call each memory cycle.
Taxonomy category: ML0
Implementation: 16-Bit AM2901 bit-slice microcoded processor
Applications: Research processor for Forth language
Who and when: Wright State University, 1984
References: Grewe & Dixon (1984)
The RUFOR system is a conventional bit-sliced approach to building a machine optimized for the Forth programming language. There are two hardware stacks, one for data and one for return addresses. The top entry of each stack is held in one of the 2901 internal registers, so that only a single input bus to the ALU and a single output bus back to the stacks is required.
Taxonomy category: ML2
Implementation: 3-Chip, 32-bit microprocessor using 3 micron CMOS
Applications: High level language support for real time control
Who and when: Wright State University, 1987-88
References: Dixon (1987), Longway (1988)
The SF1 (which stands for Stack Frame computer number
1) is an experimental multi-stack processor designed to efficiently
execute high level languages, including both Forth and C. The current
implementation has five stacks, any two of which may be selected as the source
and destination for an instruction. The SF1 allows arbitrary access to its
stack elements by using a 13 bit address relative to the top stack element in
the instruction format.
Taxonomy category: SL2
Implementation: Microprocessor
Applications: Support for Smalltalk-80 language
Who and when: University of California, Berkeley, 1984
References: Bush et al. (1987)
The Smalltalk On A RISC project (SOAR) modified the Berkeley RISC II architecture to adapt it to Smalltalk-80. Since Smalltalk-80 is a stack-oriented bytecode language, this is an exercise in mapping stack code onto a register-oriented RISC, which in turn has its registers arranged in an overlapped window register stack. The window size of the register stack was only 16 registers, half that of RISC II, since Smalltalk methods tend to be smaller than procedures in traditional programming languages.
Taxonomy category: ML2
Implementation: Conceptual design
Applications: Use of bubble memories for main program storage
Who and when: University of Massachusetts/Amherst, 1975
References: Foster (1975)
SOCRATES (Stack-Oriented Computer for Research and Teaching) was a design that proposed using magnetic bubble memories as its main storage. At the time of the design, bubble memories were projected to cost 100 times less per bit than other memories. The only problem was that they could only be accessed sequentially. SOCRATES took advantage of this situation by proposing 64 addressable registers of 32 bits, with each register being the top element of a 32K word bubble memory configured as a LIFO stack.
Taxonomy category: ML1
Implementation: Conceptual design
Applications: Execution of block-structured languages
Who and when: Academy of Sciences of the USSR, 1968
References: Myamlin & Smirnov (1969)
This paper presented a design for a stack computer for executing block-structured languages. The design had two stacks: one for holding arithmetic operations and one for holding operands. While not a directly interpreting machine, it was apparently intended to have source programs maintain an infix format with infix to postfix conversion done on-the-fly. Stacks could be addressed as part of program memory if desired, but were physically separate components.
Taxonomy category: MS0
Implementation: Discrete TTL prototype machine
Applications: Research
Who and when: Iowa State University, 1971
References: Ditzel & Kwinn (1980), Hutchison & Ethington
(1973)
The SYMBOL project constructed an operational computer using no software. The editor, debugger, and compiler were all implemented using random logic circuits. User programs were entered in source code, then compiled and executed using hardwired control circuits. The compilation unit transformed code into a stack-based intermediate form before execution. Several other stacks were used elsewhere as required by the compiler.
Taxonomy category: SS0
Implementation: Family of 16- and 32-bit microprocessors
Applications: Parallel processing
Who and when: INMOS Limited, 1983
References: Whitby-Strevens (1985)
The Transputer is a single-chip microprocessor system designed for parallel processing. Since replicating a complete processor with memory and peripherals is very expensive, the Transputer attempts to squeeze an entire functional system onto a single chip to hold costs down for systems with large numbers of processors. This constraint places program memory space at a premium, so a stack-based instruction set was selected to reduce program size. The Transputer uses 3 registers to form an expression evaluation stack.
Taxonomy category: ML0
Implementation: Simulated design
Applications: Research
Who and when: Carnegie Mellon University, 1980
References: Harbison (1982)
The Tree Machine (TM) architecture was an attempt to make compilers simpler by performing common compiler optimizations using a value cache. This cache would do common subexpression elimination and invariant code motion in hardware by caching results to recently computed expressions. A stack-based architecture was chosen because this allowed better operation with the value caching hardware and eliminated the compiler complexity associated with register allocation. The TM used two stacks: a data stack for expression evaluation, and a control stack for dependency information and return address storage.
Taxonomy category: MS0
Implementation: Conceptual design
Applications: Executing block-structured languages
Who and when: Massey University, New Zealand, 1971
References: Doran (1972)
Doran's tree machine recognized that good programs have an inherent tree structure, and was tailored to execute these well-structured programs. The machine had three stacks resident in program memory: a control stack for return addresses, a value stack to store intermediate results for non-tree-leaf nodes, and a data stack for scratch storage allocation. An interesting feature of the machine was that conditional branches are not required. All conditional execution were accomplished with a conditional procedure return to the parent program node.
Taxonomy category: ML0
Implementation: Conceptual design
Applications: Support for Forth programming language
References: Vaughan & Smith (1984)
This paper discusses the design of a Forth-based computer. The architecture was chosen because Forth is good at representing the tree nature of structured programs. Forth's small subroutine size allows good code compaction through subroutine reuse. The proposed design featured two independent hardware stacks. The return stack had one top-of-stack register, while the data stack had two registers.
Taxonomy category: SS0
Implementation: 5-Chip LSI set
Applications: Direct execution of Pascal P-code
Who and when: Western Digital, 1979
References: O'Neill (1979)
The Western Digital Pascal micro-engine (the WD9000 chip set) was built to execute Pascal P-code. Since P-code presumes the existence of a single data stack, the WD9000 supported a single program memory resident stack for expression evaluation and parameter passing.
Taxonomy category: ML0
Implementation: Discrete LS-TTL, 16-bit data paths
Applications: Stack-based processing
Who and when: WISC Technologies, 1986
References: Haydon & Koopman (1986), Koopman (1986), Koopman
(1987b), Koopman & Haydon (1986)
The WISC CPU-16 is a user-microcodable processor with a Forth-language machine heritage. It has both a data stack and a return address stack. Additionally, it has a 2K word by 30 bit writable control store for user-defined microcode. The architecture is general, and allows supporting other languages besides Forth. An interactive single-step capability is intended for use in teaching microcode techniques to students.
Taxonomy category: ML0
Implementation: 32 Bits, discrete TTL
Applications: Stack-based processing for real time control and
expert systems.
Who and when: WISC Technologies, 1986-87
References: Koopman (1987c), Koopman(1987d)
The WISC CPU/32 is the discrete TTL system upon which the Harris RTX 32P is based. The RTX 32P and CPU/32 are microcode and instruction set compatible.
Bulman (1977):
A general tutorial on stack architectures with emphasis on the B5500 and
HP3000 as example architectures. Also mentions the Data General Eclipse and
PDP-11 as examples of conventional machines incorporating some stack concepts.
Carlson (1975):
A good survey of various high level language computer architectures, many
of which are stack-oriented.
Doran (1975):
Reviews the use of stacks for expression evaluation, data tree traversal,
and subroutine return address saving, concentrating on the Burroughs
architectures.
McKeeman, W. (1975):
A comprehensive tutorial on the operation of stack-based computers and
the role of stacks in general purpose computing.
Myers (1982):
While not specifically about stack machines, this text describes many of
the architectural innovations that are pertinent to discussions on stack
machines. Of particular interest is the discussion of semantic gap.
Siewiorek, Bell & Newell (1982):
This computer architecture text contains many chapters on stack machines.
Yamamoto, M. (1981):
Gives a large list of high level language machines developed in Japan,
many of which are stack-oriented.
Phil Koopman -- koopman@cmu.edu