Charlotte Rekiere
crekiere@cs.cmu.edu
Commercial computer systems have largely escaped the fault-tolerance scrutiny
typically reserved for mission-critical systems. As computer systems become an
integral part of daily activities, people are beginning to depend on and expect
fault-free behavior. Adding a fault-management middleware layer to an existing
operating system can be an effective way to quickly bring fault-management
features to commercial computer systems.
This paper defines a taxonomy for, and evaluates, the implementations of four
fault-management middleware layers in three commercial off-the-shelf operating
systems: pSOS (embedded), Mach 3.0 (micro-kernel) and HP-UX (monolithic
kernel).
The middleware development process for HP-UX is described, and the resulting
middleware is analyzed for performance and system overhead. Adding assertions
to the HP-UX middleware shows how easily fault-management features can be
implemented. As a demonstration, assertions are used to protect an application
from incorrect kernel behavior that was exposed in the unmodified operating
system by running the Robustness Benchmarks [Dingman96].
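
To make the assertion idea concrete, the following is a minimal sketch of a
checked wrapper that sits between an application and a kernel call. It is an
illustration of the general technique only, not the HP-UX middleware described
in this paper: the names checked_read and AssertionViolation are hypothetical,
and Python's os.read stands in for the underlying system call.

```python
import os


class AssertionViolation(Exception):
    """Hypothetical error raised when a checked call breaks an assertion."""


def checked_read(fd, count):
    """Hypothetical middleware-style wrapper around os.read.

    Checks a precondition on the arguments before entering the kernel and
    a postcondition on the result before handing it to the application.
    """
    # Precondition: do not pass obviously invalid arguments to the kernel.
    if not isinstance(fd, int) or fd < 0:
        raise AssertionViolation(f"invalid file descriptor: {fd!r}")
    if not isinstance(count, int) or count < 0:
        raise AssertionViolation(f"invalid byte count: {count!r}")

    data = os.read(fd, count)

    # Postcondition: the kernel must never return more than was requested.
    if len(data) > count:
        raise AssertionViolation("kernel returned more bytes than requested")
    return data
```

An application would call checked_read wherever it would otherwise call
os.read directly; a violated assertion then surfaces as an exception the
application can handle, rather than as silently incorrect behavior.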