Robustness Testing of a Distributed Simulation Backplane

Kimberly Fernsler
Philip Koopman

Carnegie Mellon University
Pittsburgh, Pennsylvania

Paper presented at: ISSRE 99, Boca Raton, FL, November 1-4, 1999.


Abstract

Creating robust software requires not only careful specification and implementation, but also quantitative measurement. This paper describes Ballista exception handling testing of the High Level Architecture Run-Time Infrastructure (HLA RTI). The RTI is a standard distributed simulation system intended to provide completely robust exception handling, yet implementations have normalized robustness failure rates as high as 10%. Non-robust testing responses include exception handler crashes, segmentation violations, "unknown" exceptions, and task hangs. Other issues include different robustness failure modes across ports to two operating systems, and mandatory client machine rebooting after a particular RTI failure. Testing the RTI led to scalable extensions of the Ballista architecture for handling exception-based error reporting models, testing object-oriented software structures (including callbacks, pass by reference, and constructors), and operating in a state-rich, distributed system environment. These results demonstrate that robustness testing can provide useful feedback to high-quality software development processes, and can be applied to domains well beyond the previous work on testing operating systems.
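The response classification described in the abstract can be illustrated with a small sketch. This is not the Ballista harness or the RTI API. Every name below (`DocumentedError`, `set_time`, `classify`, `failure_rate`) is hypothetical, and the failure rate is a plain per-function fraction rather than the normalized rate reported in the paper. The sketch assumes an exception-based error reporting model in which a documented exception counts as a robust response and any other exception counts as an "unknown" exception failure. Crashes and hangs are not covered, since detecting them would need a separate process and a timeout.

```python
import itertools


class DocumentedError(Exception):
    """Hypothetical stand-in for an exception the API specification permits."""


def set_time(name, t):
    """Hypothetical function under test; not part of any real RTI interface."""
    if not isinstance(name, str):
        # Bad input reported through the documented exception: robust.
        raise DocumentedError("name must be a string")
    # float(None) raises TypeError, which is not documented: a robustness failure.
    return float(t)


def classify(func, args):
    """Run one test case and classify the response."""
    try:
        func(*args)
    except DocumentedError:
        return "robust"
    except Exception:
        return "unknown_exception"
    return "pass"


def failure_rate(func, value_pools):
    """Fraction of test cases ending in an undocumented exception.

    Test cases are every combination of the per-parameter value pools.
    """
    cases = list(itertools.product(*value_pools))
    failures = sum(
        1 for args in cases if classify(func, args) == "unknown_exception"
    )
    return failures / len(cases)


# Assumed test values: a mix of valid and exceptional inputs per parameter.
NAME_VALUES = ["federate", "", None]
TIME_VALUES = [0.0, -1.0, None]

if __name__ == "__main__":
    rate = failure_rate(set_time, [NAME_VALUES, TIME_VALUES])
    print(f"robustness failure rate: {rate:.1%}")
```

Building test cases from per-parameter value pools keeps the harness independent of any one function signature, so the same loop can be pointed at other functions by swapping the pools.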


Paper preprint: pdf file (198 KB)

Presentation slides: pdf file (459 KB)


Ballista Home Page

koopman@cmu.edu