Philip Koopman
ECE Department & Institute for Complex Engineered Systems
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
©1999 IEEE. Published in the post-proceedings of Computer Security, Dependability, and Assurance: From Needs to Solutions (CSDA '98), 11-13 November 1998, Washington, D.C.
Quantitative assessment tools are urgently needed in the areas of fault tolerance, software assurance, and computer security. The assessment methods typically employed, in various combinations, are fault injection, formal verification, and testing. However, these methods are expensive because they are labor-intensive, with costs scaling at least linearly with the number of software modules tested. Additionally, they are subject to human lapses and oversights because they require two different representations of each system and then base their results on a direct or indirect comparison of those representations.
The Ballista project has found that robustness testing forms a niche in which scalable quantitative assessment can be achieved at low cost. This scalability stems from two techniques: associating state-setting information with test cases based on data types, and using one generic, but narrow, behavioral specification for all modules. Given that this approach has succeeded in comparing the robustness of various operating systems, it is natural to ask if it can be made more generally applicable.
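To make the two techniques concrete, the following is a minimal sketch of Ballista-style type-based test generation. The value pools, the target function, and the names used here are hypothetical illustrations, not the actual Ballista implementation (which targeted C-language POSIX calls): exceptional test values are keyed by parameter data type rather than by module, and every module is checked against one generic behavioral specification (here: the call must not terminate abnormally).

```python
import itertools

# Hypothetical pools of exceptional test values, keyed by parameter
# data type -- analogous to (but far smaller than) Ballista's
# data-type-based test value databases.
TEST_VALUES = {
    "int": [0, -1, 2**31 - 1, -(2**31)],
    "str": ["", "a" * 10, None],
}

def robustness_test(func, param_types):
    """Call func with every combination of exceptional values and
    classify each outcome against one generic specification:
    the call must not raise an unhandled exception."""
    results = {"ok": 0, "robustness_failure": 0}
    pools = [TEST_VALUES[t] for t in param_types]
    for args in itertools.product(*pools):
        try:
            func(*args)
            results["ok"] += 1
        except Exception:
            # Under the single generic spec, an uncaught exception on
            # an exceptional input counts as a robustness failure.
            results["robustness_failure"] += 1
    return results

# Example target: a naive prefix routine that mishandles None input.
def take_prefix(s, n):
    return s[:n]

print(robustness_test(take_prefix, ["str", "int"]))
# -> {'ok': 8, 'robustness_failure': 4}
```

Because the test values travel with the data types, adding a new module to test requires only declaring its parameter types, which is what makes the approach scale across an entire API.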
It appears that Ballista-like testing can be used in the fault tolerance area to measure the generic robustness of a variety of API implementations, and in particular to identify reproducible ways to crash and hang software. In software assurance, it can be used as a quality check on exception handling, and in particular as a means to augment black box testing. Applying it to computer security appears more problematic, but might be possible if there is a way to orthogonally decompose various aspects of security-relevant system state into analogs of Ballista data types.
While Ballista-like testing is no substitute for traditional methods, it can serve to provide a useful quality assurance check that augments existing practice at relatively low cost. Alternately, it can serve to quantify the extent of potential problems, enabling better informed decisions by both developers and customers.