Stack Computers: the new wave © Copyright 1989, Philip Koopman, All Rights Reserved.
Chapter 5. Architecture of 32-bit Systems
The Johns Hopkins University/Applied Physics Laboratory (JHU/APL) FRISC 3 is a hardwired 32-bit processor optimized for executing the Forth programming language. The name "FRISC" stands for "Forth Reduced Instruction Set Computer." The "3" acknowledges two previous prototype stack processors. The focus of the FRISC 3 is on single cycle execution of Forth primitives in a real time control environment.
JHU/APL developed the FRISC 3 in response to their need for a fast Forth language processor for spaceborne control processing applications in satellites and Space Shuttle experiments. The roots of the FRISC 3 project may be traced back to the JHU/APL HUT project (see Appendix A), which was a bit-slice processor optimized for the Forth language.
After the completion of the HUT processor, the design team at Johns Hopkins designed a prototype 4.0 micron silicon-on-sapphire 32-bit Forth processor (FRISC 1) and a 3 micron bulk CMOS version (FRISC 2), both of which were full-custom designs. The latest version, FRISC 3, is the commercial quality processor that is an outgrowth of their earlier work.
Silicon Composers has purchased commercial production rights to the FRISC 3, and has renamed the design the SC32. The description in this section applies to both the FRISC 3 and the SC32, although we shall call the design the FRISC 3 throughout the remainder of the book.
The primary use of the FRISC 3 is for embedded real time control, especially in spacecraft (which is the focus of the JHU/APL group), but also for other industrial and commercial applications.
Figure 5.1 is an architectural block diagram of the FRISC 3.
Figure 5.1 -- FRISC 3 block diagram.
The Data Stack and Return Stack are implemented as identical hardware stacks. They each consist of a stack pointer with special control logic feeding an address to a 16 element by 32 bit stack memory arranged as a circular buffer. The top four elements of both stacks are directly readable onto the Bbus. In addition, the topmost element of the Data Stack may be read onto the Tbus (Top-of-stack bus) and the topmost element of the Return Stack may be read onto the Abus (return Address bus). Both stack buffers are dual-ported, which allows two potentially different elements of the stacks to be read simultaneously. Only one stack element may be written at a time.
One of the innovative features of the FRISC 3 is the use of stack management logic associated with the stack pointers. This logic automatically moves stack items between the 16-word on-chip stacks and a program memory stack spilling area to guarantee that the on-chip stack buffers never experience an overflow or underflow. This logic steals program memory cycles from the processor to accomplish this, avoiding the extra stack data pins on the chip in exchange for a small performance degradation spread throughout program execution. The designers of the FRISC 3 call this feature a stack cache, because it caches the top few stack elements for quick access on-chip. This cache is not like normal data or instruction caches in that it does not employ an associative memory lookup structure to allow access to data residing in scattered areas of memory.
The ALU section of the FRISC 3 includes a standard ALU that is fed by latches on the Bbus and the Tbus. These two ALU sources on separate busses allow the topmost Data Stack element (via the Tbus) and any of the top four Data Stack elements (via the Bbus) to be operated on by an instruction since the Data Stack is dual-ported. The Bbus can feed any non-stack bus source through the B side of the ALU as well.
The latches from the Bbus and Tbus that feed the ALU inputs are used to capture data during the first half of a clock cycle. This allows the Bbus to be used to write data from the ALU to other registers within the chip on the second half of the clock cycle. The shift block on the B input of the ALU is used to shift the B input left one bit for division, but can also pass data through unshifted. Similarly, the shift unit on the ALU output can shift data right one bit for multiplication, if desired, while feeding the Bbus.
The latch on the ALU output allows pointer-plus-offset addressing to access memory. On the first clock cycle of a memory fetch or store, the ALU adds the literal field value via the Tbus to the selected data stack word from the Bbus. On the second cycle, the Bbus is used to transfer the selected "bus destination" to or from memory.
The flag register (FL) is used to store one of 16 selectable condition codes generated by the ALU for use in conditional branches and multiple precision arithmetic. The ZERO register is used to supply the constant value 0 to the Bbus.
Four User Registers are provided to store pointers into memory or other values. Two of these registers are reserved for use by the stack control logic to store the location of the top element of the program memory resident portions of the Data Stack and Return Stack.
A Program Counter (PC) is used to supply the A-bus with program memory addresses for fetching instructions. The PC may also be routed via the ALU to the Return Stack for subroutine calls. The Return stack may be used to drive the Abus instead of the PC for subroutine returns. The Instruction Register may be used to drive the Abus for instruction fetching, subroutine calls, and for branching.
Figure 5.2 shows FRISC 3's four instruction formats: one for control flow, one for memory loads and stores, one for ALU operations, and one for shift operations. The FRISC 3 uses unencoded instruction formats similar in spirit to those found on the NC4016, RTX 2000, and M17. All instruction formats use the highest 3 bits of the instruction to specify the instruction type.
Figure 5.2a -- FRISC 3 instruction format -- control flow.
Figure 5.2a shows the control flow instruction format. The three control flow instructions are subroutine call, unconditional branch, and conditional branch. The conditional branch instruction is taken if the FL register was set to zero by the most recent instruction to set the FL register. The address field contains a 29 bit absolute address. Unconditional branches may be used by the compiler to accomplish tail-end recursion elimination.
Figure 5.2b -- FRISC 3 instruction format -- memory access.
Figure 5.2b shows the memory access instruction format. Bits 0-15 contain an unsigned offset to be added to the address supplied by the bus source operand. This is accomplished by latching the bus source and the offset field from the instruction at the ALU inputs, performing an addition, and routing the resultant ALU output to the Abus for memory addressing.
Bits 16-19 specify control information for incrementing and decrementing the Return Stack Pointer and/or Data Stack Pointer. Bits 20-23 specify the Bbus Destination. In this notation, "TOS" means Top of Data Stack, "SOS" means Second on Data Stack, "3OS" means 3rd element of Data Stack, "TOR" means Top of Return Stack, etc. Bits 24-27 specify the Bus Source for the Bbus in a similar manner.
Bit 28 specifies whether the next instruction fetched will be addressed by the top element of the Return Stack or the Program Counter. Using bit 28 to specify the Return Stack as the instruction address is combined with a Return Stack pop operation to accomplish a subroutine return in parallel with other operations.
Bits 29-31 specify the instruction type. For the memory access format instructions, the four possible instructions are: load from memory, store to memory, load address (low), and load address (high). The load and store from/to memory instructions use the bus source to supply an address, and the bus destination field to specify the data register destination or source. The load and store instructions are the only instructions that take two clock cycles, since they must access memory twice to accomplish both data movement and the next instruction fetch.
The two load address instructions simply load the computed memory address into the destination register without accessing memory at all. This may also be thought of as an add-immediate instruction. The load address high instruction shifts the offset left 16 bits before performing the addition. The load address instructions are also the means for loading literal values, since the address register can be selected to the ZERO register. In this manner a load address high followed by a load address low instruction can be used to synthesize a full 32-bit literal.
Figure 5.2c -- FRISC 3 instruction format -- ALU operations.
Figure 5.2c shows the ALU instruction format. Bits 0-6 of this instruction format specify the ALU operation to be performed. The A side of the ALU is connected to the Tbus, while the B side is connected to the Bbus. Bit 7 enables loading the FL Register with the condition code selected by bits 10-13 of the instruction. These condition codes provide various combinations of a Zero bit, Negative bit, Carry out bit, and oVerflow bit, as well as constant 0 and 1. Bits 8-9 select the carry in to the ALU operation. Bit 14 selects whether the actual ALU result or the contents of the FL register is driven onto the Bbus. Bit 15 is a 0, indicating that the instruction is an ALU operation.
Bits 16-28 are identical to the memory access instruction format shown in Figure 5.2b. Bits 29-31 specify the ALU/shift operation instruction type.
Figure 5.2d -- FRISC 3 instruction format -- shift operations.
Figure 5.2d shows the shift instruction format. Bit 0 of this instruction format is unused. Bit 1 specifies whether the FL register input is taken from the condition codes selected by bits 10-13 or the shift-out bit of the selected shift register. Bits 2-3 select special step operations for performing multiplication and restoring division. Bit 4 selects whether the shift-right input bit comes from the FL register or the ALU condition code. Bits 5-6 specify either a left or right shift operation. Bit 7 specifies whether the FL Register is to be loaded with the shift output bit or the condition code generated by bits 10-13 and bit 1. Bits 8-9 select the carry-in for the ALU operation, while bit 14 determines whether the ALU output or the FL register is driven to the Bbus. Bit 15 is a 1, indicating that the instruction is a shift operation.
Bits 16-28 are identical to the memory access instruction format shown in Figure 5.2b. Bits 29-31 specify the ALU/shift operation instruction type.
All instructions execute in one clock cycle, with the exception of the memory load and memory store instructions, which take two clock cycles. Each clock cycle is broken during execution into a source phase and a destination phase. During the source phase, the selected Bbus source and the Tbus value are read into the ALU input latches. During the destination phase, the Bbus destination is written. Each instruction is fetched in parallel with the execution of the previous instruction.
Subroutine calls are accomplished in a single clock cycle. Subroutine returns take no extra time to the extent that they can be combined with other instructions.
0 >R 0< @ 0= AND 0> BRANCH 0BRANCH CALL 1 DROP 1+ DUP 1- EXIT 2* LITERAL 2+ NEGATE 2/ NOT 4+ OR + OVER -1 R> - R@ < S->D <> U< = U> > XOR
Table 5.1(a) FRISC 3 Instruction Set Summary -- Forth Primitives. (see Appendix B for descriptions)
The FRISC 3 is capable of a very large number of compound Forth primitives. Space precludes listing all of them, so we shall give some illustrative examples. LIT + @ (address plus offset fetch) LIT + ! (address plus offset store) <variable> @ (fetch a variable) <variable> ! (store a variable) 2 PICK (copy the third element on the stack) 3 PICK (copy the fourth element on the stack) R> DROP R@ < SWAP DROP OVER OVER + LIT + DROP LIT OVER + DUP LIT + OVER - DROP DUP DUP + DROP OVER DUP AND OVER @ DUP XOR 2 PICK @ DUP 1+ 3 PICK @ OVER + OVER ! 2 PICK + 2 PICK ! 3 PICK + 3 PICK ! R@ + + >R R> + DUP >R DUP < DUP R> DROP DUP > R> DROP DUP The flexibility of the FRISC 3 also supports many operations not encompassed in the Forth language, such as stack manipulation words on the Return Stack (e.g. Return Stack SWAP).
Table 5.1(b) FRISC 3 Instruction Set Summary -- Compound Forth Primitives.
Most of the usual Forth primitives as well as manipulations of the top four stack elements on both the Data Stack and Return Stack are supported by the FRISC 3 instruction set. Table 5.1 shows a representative sample of FRISC 3 instructions.
Like all the other machines designs discussed so far, the FRISC 3 has a separate memory address bus (the Abus) for fetching instructions in parallel with other operations. In addition, the FRISC 3 does not have a dedicated top-of-stack register for the Data Stack, but instead uses a dual-ported stack memory to allow arbitrary access to any of the top four stack elements. This provides a more general capability than a pure stack machine and can speed up some code sequences.
The stack control logic is a means to prevent catastrophic stack overflow and underflow during program execution by "dribbling" elements onto and off of the stack to keep at least 4 elements on the stack at all times without overflowing. This demand-fed approach to stack buffer management is discussed in greater detail in Section 220.127.116.11. Each stack has 16 words used as a circular buffer. The stack controllers perform stack data movement to and from memory whenever there would be less than four or more than 12 elements in an on-chip stack. The movement is performed one element at a time, since the stack pointers can only be incremented or decremented once per instruction. Each stack element transfer to or from memory consumes two clock cycles. Chapter 6 discusses the cost of these extra cycles, which the FRISC 3 designers claim is typically below 2% of overall program execution time for their machine.
The FRISC 3 is implemented on 2.0 micron CMOS technology with a silicon compiler using 35,000 transistors. It is packaged in an 85 pin Pin Grid Array (PGA). The FRISC 3 runs at up to 10 MHz.
The FRISC 3 is designed for real time control applications, especially in the area of spaceborne systems. It is designed to execute Forth efficiently, although it should be reasonably efficient at running C or other conventional languages. C support is enhanced by the capability of using one of the User Registers as a frame pointer and using the offset of the memory load and store instructions to do frame pointer plus offset addressing.
The information in this section is based on the description of the FRISC 3 in Hayes & Lee (1988). Information on previous versions of the FRISC architecture may be found in Fraeman et al. (1986), Hayes (1986), and Hayes et al. (1987).
Phil Koopman -- email@example.com