Also, check the publications list for available technical publications.
What is the goal of the Ballista project?
The goal of the Ballista project is to automatically test off-the-shelf software components for robustness problems. The approach is intended to apply to as broad a class of software components as possible, but in general it works best on modules that take all of their inputs from a parameter list.
While we are also looking at ways to automatically harden components to prevent robustness failures, in the near term only automated testing will be useful for industry projects.
What is robustness?
The IEEE Standard Glossary of Software Engineering Terminology (IEEE Std 610.12-1990) defines the following terms:
Strictly speaking, the current goal of Ballista is to test the error tolerance of software (per the IEEE 610.12 definitions). However, because the hardware fault tolerance community generally treats the term "error tolerance" as synonymous with "fault tolerance," we usually describe Ballista's goal as addressing the exceptional-input portion of robustness. We are currently working on research to extend Ballista to testing under stressful environmental conditions as well.
What is a "robustness failure"?
A robustness failure is defined within the context of Ballista to be a test case which, when executed, produces a non-robust reaction: a system crash, a Restart failure (task hang), or an Abort failure (a "core dump" or, more generally, a signal that causes abnormal task termination).
Whether robustness failures are due to actual bugs is discussed below.
How does Ballista work?
Ballista testing works by bombarding a software module with combinations of exceptional and acceptable input values. The system's reaction is monitored for a catastrophic OS failure (generally in the form of a machine reboot), a task "hang" (detected by a watchdog timer), or a task "abort" (detected by observing that a child process terminates abnormally). The current implementation of Ballista draws upon a list of heuristic test cases and runs all, or a large number, of the combinations of these inputs. In the future, more thorough testing will be performed.
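The following C sketch illustrates the detection side of this description on a POSIX system. It is not Ballista's actual code; the names run_test_case and enum outcome are hypothetical. Each test case runs in a forked child so that a crash cannot take down the tester, an alarm() in the child acts as the watchdog timer for hangs (Restart failures), and termination by any other signal is counted as an Abort failure. A catastrophic failure takes down the machine running the harness itself, so it cannot be observed from inside this process and is not modeled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome classes mirroring the FAQ's terminology. */
enum outcome { OUTCOME_PASS, OUTCOME_RESTART, OUTCOME_ABORT, OUTCOME_HARNESS_ERROR };

/* Run one test case in a child process and classify how it ended. */
static enum outcome run_test_case(void (*test_case)(void), unsigned timeout_s)
{
    pid_t pid = fork();
    if (pid < 0)
        return OUTCOME_HARNESS_ERROR;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);   /* make sure the watchdog signal terminates us */
        alarm(timeout_s);           /* watchdog timer for hung test cases */
        test_case();                /* call the module under test */
        _exit(0);                   /* returned normally (possibly with an error code) */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return OUTCOME_HARNESS_ERROR;
    if (WIFSIGNALED(status))        /* abnormal termination by a signal */
        return WTERMSIG(status) == SIGALRM ? OUTCOME_RESTART : OUTCOME_ABORT;
    return OUTCOME_PASS;
}
```

A test case that returns normally yields OUTCOME_PASS, and one that calls abort() yields OUTCOME_ABORT. Setting the alarm in the child keeps the parent simple, at the cost of misclassifying a test case that itself raises SIGALRM.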
Our publication archive has more extensive information.
What is an example of the kind of robustness failures Ballista tests for?
Ballista in general tests for non-robust reactions to exceptional input conditions. As a really simple example, consider the reaction of the function:
int atoi(const char *nptr);
to an input value of NULL. Because atoi() takes a character string pointer as an input value and attempts to convert the string to an integer, a NULL input value is an exceptional condition (it doesn't point to a legitimate string, and is not documented to provide any useful function). So, if atoi(NULL) causes an Abort failure (a "core dump"), that particular test case is said to generate a robustness failure.
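A minimal C sketch of this single test case on a POSIX system, following the child-process approach described under "How does Ballista work?". The helper atoi_aborts is a hypothetical name used only for illustration; it reports whether the call terminated the child by a signal, which is what the FAQ calls an Abort failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns true if atoi(input) terminated the process abnormally.
   The call runs in a child so that a crash does not kill the tester. */
static bool atoi_aborts(const char *input)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        /* volatile store keeps the compiler from dropping the call */
        volatile int result = atoi(input);
        (void)result;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }
    return WIFSIGNALED(status);
}
```

atoi_aborts("42") should return false. What atoi_aborts(NULL) returns depends on the C library, because the C standard leaves atoi(NULL) undefined; an implementation that does not check for NULL will typically crash, and this check would report that as an Abort failure.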
Ballista also tests more complicated situations, including some that involve setting machine state information before a test is executed. For example, tests involving file descriptors will create a file with certain properties and feed the corresponding file handle to a function as part of a test case. Ballista can also test functions that take more than one input argument.
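The sketch below shows both ideas from this paragraph under stated assumptions: a test value that needs setup (a temporary file created with mkstemp() and then possibly closed) and exhaustive combinations across the three parameters of POSIX write(). The value lists and the functions make_fd, run_write_case, and count_write_failures are hypothetical illustrations, not Ballista's actual test dictionaries.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test values for an int file-descriptor parameter. */
enum fd_value { FD_OPEN_FILE, FD_CLOSED, FD_NEGATIVE, FD_VALUE_COUNT };

/* Set up machine state for one fd test value; returns -2 if setup fails. */
static int make_fd(enum fd_value v)
{
    if (v == FD_NEGATIVE)
        return -1;                      /* never a valid descriptor */
    char path[] = "/tmp/ballista_fd_XXXXXX";
    int fd = mkstemp(path);             /* create a real, writable file */
    if (fd < 0)
        return -2;
    unlink(path);                       /* file is removed once the fd is closed */
    if (v == FD_CLOSED)
        close(fd);                      /* the number now refers to nothing */
    return fd;
}

/* Run write(fd, buf, count) in a child. Returns 1 if the child was killed
   by a signal, 0 if write() returned at all, -1 on harness error. */
static int run_write_case(enum fd_value fv, const void *buf, size_t count)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = make_fd(fv);           /* per-test setup happens in the child */
        if (fd == -2)
            _exit(2);
        ssize_t r = write(fd, buf, count);
        (void)r;                        /* an error return such as EBADF is robust */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}

/* Exhaustively combine fd, buffer, and count test values. */
static int count_write_failures(int *cases_run)
{
    static const char data[16] = "ballista test\n";
    const void *bufs[] = { data, NULL };
    const size_t counts[] = { 0, sizeof data };
    int failures = 0;

    *cases_run = 0;
    for (int f = 0; f < FD_VALUE_COUNT; f++)
        for (size_t b = 0; b < sizeof bufs / sizeof bufs[0]; b++)
            for (size_t c = 0; c < sizeof counts / sizeof counts[0]; c++) {
                if (run_write_case((enum fd_value)f, bufs[b], counts[c]) == 1)
                    failures++;
                (*cases_run)++;
            }
    return failures;
}
```

Three fd values, two buffers, and two counts give 12 test cases. Returning an error code (for example, EBADF for the closed descriptor) counts as a robust reaction here; only termination by a signal is scored as a failure.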
Are the things Ballista finds really bugs?
Generally a "bug" is considered to be a software defect resulting in the behavior of a program differing from what is in the requirements specification. (Unfortunately, IEEE Std 610.12 doesn't help much with this area of terminology.) Whether what Ballista finds is really a "bug" depends on several factors discussed below. The short version of the answer is that Ballista may or may not be finding "bugs" depending on your philosophical bent. But, what it is finding are what we call "robustness failures," which are problems that cause the software to be non-robust (which is not necessarily the same as being defect-free).
Software specifications generally treat the reaction to exceptional inputs in one of the following three ways:
Ballista declares things to be robustness failures based on considering "doesn't crash; doesn't hang" to be an implicit part of any software specification. So, if Ballista finds a robustness problem in a piece of software, it might or might not be a "true bug", but can be considered a "robustness failure". Ballista sometimes finds ways to crash entire operating systems; typically people agree that such situations result from software defects regardless of any written specification.
It turns out that many software specifications are intentionally incomplete, ambiguous, or otherwise less than completely specific when it comes to robustness requirements. In many cases responses to exceptional inputs are left entirely at the discretion of implementors (but, if implemented, a standard mechanism for returning error codes might be provided). And there are legitimate reasons for this, especially when committees had to reach consensus. Again, this is a case where software might be defect-free from a specification point of view but still non-robust. Part of the purpose of the Ballista work is, in this era of increased computerization of areas critical to daily life, to raise awareness of robustness so that future software specifications and standards will hopefully address it more thoroughly.
What is the relationship between number of software defects and the number of robustness failures?
The number of software defects (i.e., source code "bugs") will in general be far smaller than the number of robustness failures reported by Ballista. This is because Ballista generates exhaustive combinations of parameter values when doing testing in order to elicit robustness failures in cases that only occur with certain combinations of parameter values. (Note that in some cases Ballista only samples the testing space in the interest of execution time, but still usually there are fewer software defects than robustness failures reported.)
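A small C sketch with made-up numbers illustrates why the two counts diverge. Assume a hypothetical three-parameter function whose test-value lists have 5, 4, and 6 entries, and a single defect: a missing NULL check on the first parameter. Enumerating every combination yields 120 test cases, of which 24 fail, even though only one source code defect exists.

```c
#include <stddef.h>

/* Hypothetical function with three parameters; index 0 of the first
   parameter's test-value list stands for NULL. */
enum { N_PARAMS = 3 };
static const size_t values_per_param[N_PARAMS] = { 5, 4, 6 };

/* Model of a single defect: crash whenever the first argument is NULL,
   regardless of the other arguments. */
static int crashes(const size_t idx[N_PARAMS])
{
    return idx[0] == 0;
}

/* Visit every combination with a mixed-radix counter and count how many
   test cases the single defect turns into robustness failures. */
static void count_failures(size_t *total, size_t *failures)
{
    size_t idx[N_PARAMS] = { 0 };
    *total = 0;
    *failures = 0;
    for (;;) {
        (*total)++;
        if (crashes(idx))
            (*failures)++;
        size_t p = 0;                   /* advance the counter, carrying over */
        while (p < N_PARAMS && ++idx[p] == values_per_param[p]) {
            idx[p] = 0;
            p++;
        }
        if (p == N_PARAMS)
            break;                      /* wrapped around: all combinations seen */
    }
}
```

Sampling the space, as mentioned above, would shrink both numbers but would usually still report several failures for the one defect.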
Ballista is predicated on the principle that source code will not be available for off-the-shelf software components. (This assumption may not always be true, but is often true and is useful from a research point of view for seeing how far the technology can be taken with that simplifying assumption.) Given this assumption, and the fact that Ballista is more about testing (in a quality assurance sense) than debugging, the only metric available for reporting robustness problems is the number of test case failures rather than root cause software defects.
Doesn't providing robustness cost speed?
In most current software, robustness costs a modest amount of speed. But, as hardware architectures evolve, we think that the speed penalty will be reduced. From what we've seen, it is fairly common for robustness checks to cost only a few percent in the execution time of real programs, and we have found ways to make that cost even lower.
Some ways to address the speed issue include:
There are arguments that complete robustness is impossible because it isn't possible to check all inputs in all situations. Our current opinion is that robustness is largely achievable with some care and a little bit of invention.
There is also an argument that it is better to "dump core" than to return an error condition, on the theory that core dumps are more urgent to fix and pinpoint errors better than error return codes do. We'd counter that a "convert errors into core dumps" switch would be a neat thing to have in a library, but once you ship a finished application out the door, core dumps aren't what you want; what you want is to implement error checking in your application.
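As a sketch of both halves of this argument, the C function below (robust_atoi, a hypothetical name) checks its arguments and reports problems through an error code, while a hypothetical compile-time switch, ROBUST_ERRORS_ABORT, turns those same errors into abort() calls for development builds. Neither is part of any standard library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical switch: build with -DROBUST_ERRORS_ABORT to turn detected
   errors into an immediate abort() ("core dump") during development. */
#ifdef ROBUST_ERRORS_ABORT
#define ROBUST_FAIL(code) abort()
#else
#define ROBUST_FAIL(code) return (code)
#endif

/* A robust alternative to atoi(). Returns 0 on success, EINVAL for NULL,
   empty, or non-numeric input, ERANGE if the value does not fit in an int. */
static int robust_atoi(const char *s, int *out)
{
    if (s == NULL || out == NULL)
        ROBUST_FAIL(EINVAL);
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')       /* no digits, or trailing garbage */
        ROBUST_FAIL(EINVAL);
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        ROBUST_FAIL(ERANGE);
    *out = (int)v;
    return 0;
}
```

In a normal build, robust_atoi(NULL, &v) returns EINVAL instead of crashing, and robust_atoi("42", &v) stores 42 in v and returns 0.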
How does Ballista relate to other robustness testing?
Ballista is the product of many years of research in the area of dependability, and in particular fault injection, at Carnegie Mellon's Institute for Complex Engineered Systems. The focus of our research has been on repeatability and portability. A complete survey of other approaches is beyond the scope of this FAQ, but one can be found in our various publications. However, two approaches warrant special note:
Isn't Ballista just the same as Purify or Boundschecker?
NuMega's Boundschecker and Rational's Purify are tools that help with detecting and debugging invalid memory references and other problems that are similar in some respects to what Ballista tests for.
Both these tools appear useful, but do not necessarily address the specific problem we are working on. Put another way, they're quite interesting, and play in the same general "market", but are not really the same. Some key differences are:
How can I get access to Ballista testing?
You can download Ballista for free under the GPL. See the Ballista home page for pointers to the download site.
Are you related to the Ballista OS security product?
No. The company in Canada that was selling a product called "Ballista" for doing SATAN-like testing on Windows products has nothing to do with us, and it has since changed the name of that product. Ballista® is a registered trademark of Carnegie Mellon University.
Last updated 6/2/2002. Special thanks to the folks who have exchanged e-mail with me about the above issues.