

Middleware-Enabled Fault Management for Commercial Operating Systems


Charlotte Rekiere
Carnegie Mellon University

crekiere@cs.cmu.edu
ph: (412)268-6480
fax: (412)268-5229


Abstract:

Commercial computer systems have escaped the fault-tolerance scrutiny typically reserved for mission-critical systems. As computer systems become an integral part of daily activities, people are beginning to depend on them and to expect fault-free behavior. Adding a fault-management middleware layer to an existing operating system can be an effective way to quickly bring fault-management features to commercial computer systems.

This paper defines a taxonomy for, and evaluates, the implementations of four fault-management middleware layers in three commercial off-the-shelf operating systems: pSOS (embedded), Mach 3.0 (microkernel), and HP-UX (monolithic kernel).

The development process for the HP-UX middleware is described, and the middleware is analyzed for performance and system overhead. Adding assertions to the HP-UX middleware shows how easily fault-management features can be implemented in it. As a demonstration, assertions are used to protect an application from incorrect kernel behavior that running the Robustness Benchmarks [Dingman96] exposed in the unmodified operating system.

