Stack Computers: the new wave © Copyright 1989, Philip Koopman, All Rights Reserved.

Chapter 7. Software Issues


The choice of which programming language to use to solve a particular problem should not be taken lightly. Sometimes the choice is dictated by external forces, such as when using Ada for a US Department of Defense contract. In other cases the language choice is constrained by the existence of a limited number of compilers. In general, though, there is considerable choice available in selecting the language for a new system design.

Software selection should not be considered as an isolated issue. The language used should reflect the entire system being developed, including the system operating environment, the suitability of the language to solve the problem at hand, development time and costs, the maintainability of the finished product, the strengths of the underlying processor at running various languages, and the previous programming experience of the programmers assigned to the project. Note that the experience of the programmers was not placed first on the list. A poor choice based on programmer bias for a familiar language can result in problems that more than offset any perceived gain in productivity.

7.2.1 Forth: strengths and weaknesses

Forth is the most obvious language to consider using on a stack machine. That is because the Forth language is based upon a set of primitives that execute on a virtual stack machine architecture. All the stack machines presented in this book support highly efficient implementations of Forth. All of these machines can use a Forth compiler to generate efficient machine code. The biggest advantage of using Forth then, is that the highest processing rate possible can be squeezed from the machine.

One of the characteristics of Forth is its very high use of subroutine calls. This promotes an unprecedented level of modularity, with approximately 10 instructions per procedure being the norm. Tied in with this high degree of modularity is the interactive development environment used by Forth compilers. In this environment programs are designed from the top down, using stubs as appropriate. Then they are built from the bottom up, testing each and every short procedure interactively as it is written. On large projects, the top-down and bottom-up phases are repeated in cycles.

Since Forth is a stack-based, interactive language, no testing programs or "scaffolding" need be written. Instead, the values to be passed to the procedure are pushed onto the stack from the keyboard, the procedure to be tested is executed, and the results are returned on the top of the stack. This interactive development of modular programs is widely claimed by experienced Forth programmers to result in a factor of 10 improvement in programmer productivity, with improved software quality and reduced maintenance costs. Part of this gain may come from the fact that Forth programs are usually quite small compared to equivalent programs in other languages, requiring less code to be written and debugged. A 32K byte Forth program, exclusive of symbol table information, is considered a monster, and may take several hundred thousand bytes of source code to generate.

One of the advantages of the Forth programming language is that it covers the full spectrum of language levels. Some languages, such as assembly language, allow dealing only at the hardware level. Other languages, such as FORTRAN, deal at an abstract level that has little to do with the underlying machine. Forth programs can span the full range of programming abstraction. At the lowest level, Forth allows direct access to hardware ports in the system for real time I/O handling and interrupt servicing. At the highest level, the same Forth program can manage a sophisticated knowledge base.

The one facet of Forth that is most interesting (and baffling to many casual observers) is that it is an extensible language. As every procedure is added to the language, the apparent language available to the programmer grows. In this manner, Forth is much like LISP. There is no distinction made between core procedures in the language and extensions added by the programmer. This enables the language to be flexible to an extent beyond comprehension to people who have not extensively used the capability.

The extensibility of Forth does have mixed blessings. Forth tends to act as a programmer amplifier. Good programmers become exceptional when programming in Forth. Excellent programmers can become phenomenal. Mediocre programmers generate code that works, and bad programmers go back to programming in other languages. Forth also has a moderately difficult learning curve, since it is different enough from other programming languages that bad habits must be unlearned. New ways of conceptualizing solutions to problems must be acquired through practice. Once these new skills are acquired, though, it is a common experience to have Forth-based problem solving skills involving modularization and partitioning of programs actually improve a programmer's effectiveness in other languages as well.

Another problem with some Forth systems is that they do not include a rich enough set of programming tools to suit many programmers. Also, older Forth systems cooperate poorly with resident operating systems. These traits stem from Forth's history of use on very small machines with few hardware resources. In real time control applications, these limitations are generally not much of a problem. Other applications need better support tools. Fortunately, the trend is for newer Forth systems to provide much better development environments and library support than in the past.

The result of all these effects is that Forth is best used on medium-sized programming projects involving no more than two or three programmers who have compatible programming styles. In any very large programming project, clashing styles and abilities tend to prevent the production of extremely high quality software. However, within these constraints, Forth programs are consistently delivered in a very short time with excellent results, often solving problems that could not be solved in any other language, or at least, not solved within budget and development time constraints.

7.2.2 C and other conventional languages

Of course, there will always be applications that are better done in conventional languages. Probably the most common reason for using a conventional language will be the existence of a large body of existing source code that must be ported onto a better processor.

To illustrate the tradeoffs involved, let us look at the problem of porting an application written in C onto a stack processor using a C compiler written for the stack machine. We shall skip over the problem of translating the program from the source C code into an intermediate form, since this is independent of the machine upon which the program is to run. The portion of the C compiler that is of interest is the so-called "back end." The back end is the portion of the compiler that takes a predigested intermediate form of a program and produces code for the target machine.

Actually, generation of stack-based code for expression evaluation is relatively straightforward. The topic of converting infix arithmetic expressions to stack-based (postfix/RPN) expressions is well researched (Bruno & Lassagne 1975, Couch & Hamm 1977, Randell & Russell 1964).

The problem in generating code for stack machines from C is that there are several assumptions about the operating environment deeply entrenched in the language. The most profound of these is that there must be a single program-memory-resident stack that contains both data and subroutine return information. This assumption can not be violated without "breaking" many C programs. As an example, consider the case of a pointer that references a local variable. That local variable must reside in the program memory space, or it can not be properly referenced by the program.

To make matters worse, C programs typically push large volumes of data, including strings and data structures, onto the C-stack. Then, C programs make arbitrary accesses within the area of the current stack frame (the portion of the stack containing variables belonging to the current procedure). These restrictions make it unfeasible to attempt to keep the C stack on a stack machine's Data Stack.

How then can a stack machine be made efficient at running C programs? The answer is that the stack machine must efficiently support frame-pointer-plus-offset addressing into program memory. The RTX 2000 can use its User Pointer to accomplish this efficiently. The FRISC 3 can use one of its user-defined registers and the load/store with offset instructions. The RTX 32P's commercial successor will have a frame pointer register and dedicated adder for computing memory addresses. In all cases, the access to local variables can be made in the same time as required for any other memory operation: two memory cycles -- one for the instruction and one for the data. This is the best that can be hoped for in any processor that does not resort to expensive techniques such as separated data and instruction caches.

The other notion often found in C, and other high level languages, that does not map well onto stack machines is that of a "register variable." Since stack machines do not have a set of registers, this implies that compiler optimization opportunities may be missed by stack machines. This is only partially true. While it is true that stack machines are not well suited for juggling a large number of temporary values on the stacks, a small number of frequently accessed values can be kept on the stack for quick reference. For example, these values might include a loop counter kept on the return stack and two addresses for a string compare kept on the data stack. In this manner, most of the efficiency of the hardware can be captured for the majority of C programs.

There is one additional concept that can make most C programs as fast as Forth programs on stack machines. That is the concept of supporting Forth as the "assembly language" of the processor. This approach is being vigorously pursued on by several stack machine vendors. Using this approach, existing C programs are transferred to the stack machine. Their execution characteristics are then profiled. This profiling information is used to identify the few critical loops within the program. These loops are then rewritten in Forth for better speed, perhaps augmented with application specific microcode in the case of the RTX 32P. Using this technique, C programs can attain virtually the same performance as all-Forth programs with very little effort.

When this good performance is added to the other stack machine qualities of low system complexity with high processing speed, C becomes a viable language in which to program stack machines.

7.2.3 Rule-based systems and functional programming

There is evidence that programming languages used to implement rule-based systems, such as those written in Prolog, LISP, and OPS-5 are very well suited to stack machines. One very exciting possibility is the marriage of real time control applications with rule-based system knowledge bases. Preliminary research into this area has been encouraging. Much work has been done using Forth as an implementation vehicle. Areas explored include: LISP implementations (Hand 1987, Carr & Kessler 1987), an OPS-5 implementation (Dress 1986), a Prolog implementation (Odette 1987), neural network simulations (Dress 1987), and development environments for real time expert systems (Matheus 1986, Park 1986). Most of these Forth implementations have subsequently been ported to stack machine hardware with excellent results. For example, the rule-based Expert-5 system described by Park (1986) runs 15 times faster on the WISC CPU/16 than on a standard IBM PC. A similar rule-based system (actually closer to Park's Expert-4, which is slower than Expert-5) runs approximately 740 times faster on the RTX 32P than on a standard 4.77 MHz 8088 PC. This speedup of nearly three orders of magnitude is astonishing to some, but merely reflects the suitability of using a stack machine, which is good at tree traversal, for solving problems that use decision trees.

The speedup observed for the rule-based system is actually based on a principle that applies to a wide variety of problem areas. Stack machines can treat a data structure as an executable program. Consider for a moment an example of a tree data structure, with pointers at the internal nodes and program action tokens at the leaves. The nodes of the trees that are pointers can just be the addresses of the children, which equates to subroutine calls in many stack processors. The leaves of the trees can be executable instructions, or subroutine calls to procedures that accomplish some task. A conventional processor would have to use an interpreter to traverse through the tree in search of the leaves. A stack processor can just directly execute the tree instead. Since stack machines execute subroutine calls very quickly, the results can be extremely efficient. The technique of directly executing tree-formatted data structures is responsible for the tremendous speed of the RTX 32P example cited in the previous paragraph.

Stack machines are well suited to LISP programming as well as to expert systems. This is because LISP and Forth are very similar languages in many respects. Both treat programs as lists of function calls to other lists. Both are extensible languages. Both use Polish notation for arithmetic operations. The major difference is that LISP involves dynamic storage allocation for its cells, while Forth uses a static storage allocation. Since there is no reason that a stack machine should be any worse at garbage collection than other machines, LISP should run efficiently on a stack machine.

Many of the same arguments about stack machines' suitability for LISP apply to Prolog. In a Prolog implementation for the RTX 32P, this author made an additional discovery about how to efficiently map Prolog onto stack machines. Prolog uses typed data that can be either an actual data element or a pointer to other data. A possible encoding for Prolog data elements is one that uses the highest 9 bits of a 32-bit word for a data type tag. The lowest 23 bits are then used as either a pointer to another node, a pointer to a 32 bit literal value, or a short literal value. Using this data format, data items are actually executable as instructions. Instructions for the RTX 32P can be constructed that allow traversing an arbitrarily long series of pointer dereferences at the rate of one dereference per memory cycle, simply by executing the data structure as a program. Nil pointer checking can be accomplished by defining the nil pointer value to be a subroutine call to an error trapping routine. These kinds of data handling efficiencies are simply not possible with other types of processors.

Functional programming languages offer the promise of a new way of solving problems using a different model of computation than that used by conventional computers (Backus 1978). A particular method of executing functional programs is the use of graph reduction. The same techniques of direct execution of the program graphs that were discussed for rule-based systems above are equally applicable to graph reduction. Thus, stack machines should be good at executing functional programming languages. Belinfante (1987) has published a Forth-based implementation of graph reduction. Koopman & Lee (1989) describe a threaded interpretive graph reduction engine.

From a theoretical point of view, efficient graph reduction machines such as the G-machine and Norma fall into

the SL0 category of the taxonomy in Chapter 2. ML0 machines have a superset of the capabilities of SL0 machines, and should therefore be efficient at graph reduction as well. Initial investigations into this area by this author show that the RTX 32P, which is a very simple stack machine, can compete quite effectively with even very complex graph reduction machines such as Norma.

One of the side effects of using a functional programming language is that a high degree of parallelism is available during program execution. This raises the idea of a massively parallel computer made of stack processors that is programmed with a functional programming language.


HOME Phil Koopman --