Stack Computers: 8.3 SYSTEM IMPLEMENTATION APPROACHES

8.3 SYSTEM IMPLEMENTATION APPROACHES

Once a decision has been made between a 16-bit and a 32-bit processor, there still remains the choice of manufacturer. Each of the seven stack machines covered in detail in this book makes a different set of tradeoffs in system complexity, flexibility, and performance, and these tradeoffs determine its suitability for different applications. One of these tradeoffs is the choice between hardwired and microcoded control.

8.3.1 Hardwired systems vs. microcoded systems

The question of whether the control circuitry should be hardwired or microcoded is an old debate in all computing circles. The advantage of the hardwired approach is that instructions supported directly by the hardware can execute faster. The disadvantage is that hardwired machines tend to support only simple instructions, and must often execute many instructions to synthesize a complex operation.

Microcoded machines are more flexible than hardwired machines, because an arbitrarily long sequence of microcode may be executed to implement a very complicated instruction. Each instruction may be thought of as a subroutine call to a microcoded procedure. In machines with microcode RAM, the instruction set may be extended with application-specific instructions that provide significant speed increases for a particular program.
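The idea of an instruction as a call into a microcoded procedure, and of extending the instruction set by loading new microcode into RAM, can be sketched as a toy interpreter. Everything here is hypothetical and not modeled on any particular chip: the names `MicrocodedCPU`, `u_add`, `u_dup`, the `DOUBLE` instruction, and the assumption of a 16-bit data path and one micro-operation per micro-cycle.

```python
from typing import Callable, Dict, List

# A micro-operation acts directly on the data stack (a Python list, top at the end).
MicroOp = Callable[[List[int]], None]
MASK16 = 0xFFFF  # assume a 16-bit data path


def u_add(stack: List[int]) -> None:
    """Pop N and T, push (N + T) mod 2**16."""
    t = stack.pop()
    n = stack.pop()
    stack.append((n + t) & MASK16)


def u_dup(stack: List[int]) -> None:
    """Push a copy of the top element."""
    stack.append(stack[-1])


class MicrocodedCPU:
    """Toy microcoded stack machine with a writable control store (hypothetical)."""

    def __init__(self) -> None:
        self.stack: List[int] = []
        self.micro_cycles = 0
        # Control store: opcode name -> sequence of micro-operations.
        self.control_store: Dict[str, List[MicroOp]] = {
            "ADD": [u_add],
            "DUP": [u_dup],
        }

    def define(self, opcode: str, micro_ops: List[MicroOp]) -> None:
        """Load a new, application-specific instruction into microcode RAM."""
        self.control_store[opcode] = list(micro_ops)

    def execute(self, program: List[str]) -> None:
        for opcode in program:
            # Each instruction behaves like a call to a microcoded procedure:
            # the sequencer steps through its micro-ops, one per micro-cycle.
            for micro_op in self.control_store[opcode]:
                micro_op(self.stack)
                self.micro_cycles += 1


# Usage: add a hypothetical "DOUBLE" instruction (DUP then ADD) at run time.
cpu = MicrocodedCPU()
cpu.define("DOUBLE", [u_dup, u_add])
cpu.stack = [21]
cpu.execute(["DOUBLE"])
print(cpu.stack, cpu.micro_cycles)  # [42] 2
```

The new opcode costs one instruction fetch but two micro-cycles. The payoff of microcode RAM is that a frequently used sequence can be executed from microcode instead of being fetched as several separate instructions.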

The hardwired stack machines all support some rather complex stack operations that are combinations of data stack manipulations, arithmetic operations, and subroutine exits. This is accomplished by manipulating different fields in the instruction format. To the degree that this is possible, the hardwired machine instruction formats are rather like microcode. In fact, Novix has called the NC4016 instructions a form of "external microcode."
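To show how independent fields in one instruction word can combine an ALU operation, a stack adjustment, and a subroutine exit, the sketch below decodes an invented 16-bit format. The bit layout (ALU function in bits 11..8, a "pop" bit 7, a "return" bit 5) and the names `encode`, `decode`, and `step` are assumptions for illustration only. They are not the actual NC4016 or RTX 2000 encodings.

```python
from dataclasses import dataclass
from typing import List

MASK16 = 0xFFFF
# ALU function codes for a hypothetical 4-bit field (not any real chip's encoding).
ALU_PASS, ALU_ADD, ALU_AND, ALU_SUB = 0, 1, 2, 3


@dataclass
class Fields:
    alu: int   # bits 11..8: ALU function applied to N and T
    pop: bool  # bit 7: consume N as well as T (stack shrinks by one)
    ret: bool  # bit 5: return from subroutine in the same cycle


def encode(alu: int, pop: bool = False, ret: bool = False) -> int:
    return (alu << 8) | (0x80 if pop else 0) | (0x20 if ret else 0)


def decode(word: int) -> Fields:
    return Fields(alu=(word >> 8) & 0xF, pop=bool(word & 0x80), ret=bool(word & 0x20))


def step(word: int, data: List[int], rstack: List[int], pc: int) -> int:
    """Execute one ALU instruction; all fields act together. Returns the next PC."""
    f = decode(word)
    t = data[-1]
    n = data[-2] if len(data) >= 2 else 0
    if f.alu == ALU_ADD:
        result = (n + t) & MASK16
    elif f.alu == ALU_AND:
        result = n & t
    elif f.alu == ALU_SUB:
        result = (n - t) & MASK16
    else:
        result = t  # ALU_PASS
    data.pop()          # T is replaced by the result
    if f.pop:
        data.pop()      # N is consumed too (no underflow checking in this sketch)
    data.append(result)
    return rstack.pop() if f.ret else pc + 1


# Usage: "+ ;" (add, then return) packed into a single instruction word.
data, rstack = [2, 3], [100]
next_pc = step(encode(ALU_ADD, pop=True, ret=True), data, rstack, pc=10)
print(data, next_pc)  # [5] 100
```

Because each field drives separate hardware, the decoder does little more than route bits. That is why such formats resemble horizontal microcode.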

In the microcoded stack machines, simple operations such as addition may often take longer than on a hardwired machine. However, complicated operations, such as double-precision arithmetic, cannot be packed into a single instruction on the hardwired machines. For these complex operations, the microcoded machines can run faster by providing special opcodes. This increased flexibility can more than make up for the raw speed gap between the two kinds of processors. In general, which type of processor is faster for a particular application cannot be determined without evaluating both approaches. The important point is to evaluate the requirements of an application carefully before selecting a stack processor.
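A back-of-the-envelope cycle count shows why the answer depends on the instruction mix. The per-operation costs below are invented purely for illustration and are not measurements of any processor discussed in this book. The model assumes every hardwired instruction takes one cycle but that a double-precision add (`dadd`) must be synthesized from eight of them. It also assumes that each microcoded opcode pays one fetch/decode cycle plus its micro-steps.

```python
from typing import Dict

# Hypothetical per-operation cycle costs, chosen only to illustrate the tradeoff.
# Hardwired: one cycle per instruction, but "dadd" is synthesized from 8 instructions.
HARDWIRED: Dict[str, int] = {"add": 1, "dadd": 8}
# Microcoded: 1 fetch/decode cycle plus micro-steps, so "add" is slower,
# but "dadd" is a single opcode (assumed 1 + 4 micro-steps).
MICROCODED: Dict[str, int] = {"add": 2, "dadd": 5}


def total_cycles(costs: Dict[str, int], mix: Dict[str, int]) -> int:
    """Total cycles for an instruction mix given as {operation: count}."""
    return sum(costs[op] * count for op, count in mix.items())


def faster(mix: Dict[str, int]) -> str:
    hw = total_cycles(HARDWIRED, mix)
    mc = total_cycles(MICROCODED, mix)
    return "hardwired" if hw < mc else ("microcoded" if mc < hw else "tie")


simple_mix = {"add": 100, "dadd": 0}     # only simple operations
dp_heavy_mix = {"add": 100, "dadd": 50}  # many double-precision operations
print(faster(simple_mix), faster(dp_heavy_mix))  # hardwired microcoded
```

With these assumed costs, the simple mix takes 100 versus 200 cycles and favors the hardwired design. The double-precision-heavy mix takes 500 versus 450 cycles and favors the microcoded one. Different real costs would move the crossover point, which is why the application must be evaluated first.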

8.3.2 Integration level and system cost/performance

In addition to exploring the implementation tradeoffs between hardwired control and microcoded control, the 16-bit stack processors discussed in Chapter 4 display the full range of integration level decisions. Integration level is the amount of system hardware that is placed onto the processor chip. The more system functions that are placed on the processor chip, the higher the integration level. Also at issue, however, are the cost/performance tradeoffs made in the design with respect to the minimum number and type of components necessary to run the system.

The WISC CPU/16 displays the lowest integration level of the processors examined. It uses off-the-shelf building blocks to create a processor with dozens of components. Of course, this design approach avoids the large initial investment in chip layout that must be recouped when producing a single-chip version.

The MISC M17 is a simple single-chip stack processor. Since it uses program memory for its stacks, only the processor chip and program memory are required for operation. The integration level is reasonably high, and the system complexity is low. The penalty paid for the simplicity of the design is that speed is somewhat lower than what is possible with separate stack memories.

The Novix NC4016 is also a single-chip processor, with an integration level comparable to that of the M17. Not surprisingly, both processors are fabricated using gate arrays of roughly comparable sizes. The major distinction of the NC4016 is that it uses separate memory chips for both stacks. Separate stack memories provide faster potential processing rates for a given clock speed because of the increased memory bandwidth available, but at the cost of requiring more components at the system level.
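The bandwidth argument comes down to counting memory cycles. The sketch below assumes that each physical memory serves one access per cycle and that distinct memories operate in parallel. It also ignores any on-chip top-of-stack registers or buffering that a real processor may have. The function name `memory_cycles` and the three-access instruction step are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def memory_cycles(accesses: List[str], bank_of: Dict[str, str]) -> int:
    """Minimum memory cycles for one instruction step.

    accesses: logical memories touched (e.g. "program", "data_stack").
    bank_of:  maps each logical memory to the physical memory that holds it.
    Assumes each physical memory serves one access per cycle and distinct
    memories operate in parallel; on-chip stack buffering is ignored.
    """
    per_bank = Counter(bank_of[a] for a in accesses)
    return max(per_bank.values(), default=0)


# Hypothetical step that fetches an instruction and touches both stacks.
step_accesses = ["program", "data_stack", "return_stack"]

# Stacks kept in program memory (the M17 approach): one shared memory.
shared = {"program": "main", "data_stack": "main", "return_stack": "main"}
# Separate stack memory chips (the NC4016 approach).
separate = {"program": "main", "data_stack": "dstack", "return_stack": "rstack"}

print(memory_cycles(step_accesses, shared), memory_cycles(step_accesses, separate))  # 3 1
```

Under these assumptions, the shared-memory design serializes the three accesses. The separate-memory design overlaps them, at the price of two extra memory chips on the board.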

The Harris RTX 2000 increases the level of system integration beyond the NC4016 by including on-chip stack memories. This actually reduces system complexity while providing potential speed increases, since on-chip memory can be faster than off-chip memory. The cost is more transistors on the chip. However, these extra transistors do not necessarily increase chip size by much. This is because the RTX 2000 uses a different design methodology called standard cell design that is well suited to providing on-chip memories. In fact, RTX 2000 customized systems can be designed that include program memory as well as stack memory on-chip, providing a single-chip stack computer system.

It is likely that most stack computers designed in the future will have differing tradeoffs in the areas of data path widths (16-bit and 32-bit widths for most processing, and perhaps 24-bit widths for signal processing and 36-bit widths for tagged data architectures), level of system integration, required off-chip support, and raw performance. These characteristics must all be taken into consideration when matching a processor selection to cost, performance, and other requirements in a target application.


Phil Koopman -- koopman@cmu.edu