(A Usenet posting that was printed in Usenet Nuggets, Computer Architecture News, Vol. 21, No. 1, March 1993)
This is a consolidated attempt to address some of the points people have been bringing up about stack-based architectures. It's written from the point of view of someone who has actually made a living designing and selling them (first with WISC Technologies, later with the Harris Semiconductor RTX family).
Stack processors don't need to be pipelined for ALU and operands, because the operands are immediately available in the top of stack buffer registers. Access to the on-CPU stack RAM can be completely hidden by pipelining. Access to off-chip memory is typically not pipelined in current implementations, but can be (P. Koopman, "Some Ideas for Stack Computer Design", 1991 Rochester Forth Conference, Pg. 58). It makes sense that the false dependencies introduced by stack addressing could be overcome with a superscalar implementation if one were so inclined (but, I'm not).
On-chip stack buffers only need to be about 16 deep. Spilling can be done by stack overflow interrupts (or, for that matter, statically scheduled by the compiler the same as register spills). Cost for interrupt-driven overflows is less than 1% for only 16 registers and essentially 0% for 32 registers for reasonable programs (P. Koopman, Stack Computers, pp. 139-146).
A neat thing about stack CPUs is that context switching for interrupts takes essentially zero time -- no registers need to be saved; you just put the ISR values onto the top of the stack and clean them off when you're done (presumes enough stack space is available -- no big deal to arrange). (P. Koopman, Stack Computers, pp. 146-152.)
Program size doesn't matter (much) for workstations. But, for embedded control it matters a lot, especially when you're limited to on-CPU chip memory, and the CPU has to cost less than $5-$10. Anecdotal evidence indicates stack computer program size can be smaller than CISC programs by a factor of 2.5 to 8 (and, another factor of 1.5 to 2.5 smaller than RISC, depending whom you want to believe). This comes not just from compact opcodes, but also from reuse of short code segments and implicit argument passing with subroutine calls. Code size comparisons I've seen don't take this into account. (P. Koopman, Stack Computers, pg. 118-121.)
Stack compilers aren't currently very efficient -- but that's because no-one has really tried all that hard. I've published an algorithm and experimental results that suggest that stacks can be made about as efficient as registers in terms of keeping local variables at the top level of the memory hierarchy. The work is based on GNU C intermediate code (P. Koopman, A preliminary exploration of optimized stack code generation", Journal of Forth Applications and Research, 6(4), 1994.)
Overall, I'd say that stack machines are an excellent fit for high performance in a low cost system (not necessarily highest performance given unlimited cost). They should do especially well in embedded applications.
Phil Koopman -- koopman@cmu.edu