Ballista Project
Frequently Asked Questions

What is the goal of the Ballista project?
What is robustness?
What is a robustness failure?
How does Ballista work?
What is an example of the kind of robustness failures Ballista tests for?
Are the things Ballista finds really bugs?
What is the relationship between number of software defects and the number of robustness failures?
Doesn't providing robustness cost speed?
How does Ballista relate to other robustness testing?
Isn't Ballista just the same as Purify or Boundschecker?
How can I get access to Ballista testing?
Are you related to the Ballista OS security product?

Also, check the publications list for available technical publications.

What is the goal of the Ballista project?

The goal of the Ballista project is to automatically test off-the-shelf software components for robustness problems. These results will apply to as broad a class of software components as possible, but in general will work best on modules that take all inputs from a parameter list.

While we also are looking at ways to automatically harden components to prevent robustness failures, in the near term only automated testing will be useful for industry projects.

What is robustness?

The IEEE Standard Glossary of Software Engineering Terminology (IEEE Std 610.12-1990) defines the following terms:

"Robustness. The degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions."
"Error tolerance. The ability of a system or component to continue normal operation despite the presence of erroneous inputs."

Strictly speaking, the current goal of Ballista is to test the error tolerance of software (per the IEEE 610.12 definitions). However, because the term "error tolerance" is generally synonymous with "fault tolerance" in the hardware fault tolerance community, we generally say that the goal of Ballista is the exceptional input portion of improving robustness. We are currently working on research to extend Ballista to address testing in stressful environmental conditions as well.

What is a "robustness failure"?

A robustness failure is defined within the context of Ballista to be a test case which, when executed, produces a non-robust reaction: a system crash, a Restart failure (task hang), or an Abort failure (a "core dump" or, more generally, a signal that causes abnormal task termination).

Whether robustness failures are due to actual bugs is discussed below.

How does Ballista work?

Ballista testing works by bombarding a software module with combinations of exceptional and acceptable input values. The reaction of the system is measured for either catastrophic OS failure (generally in the form of a machine reboot), a task "hang" (detected by a watchdog timer), or a task "abort" (detected by observing that a child process terminates abnormally). The current implementation of Ballista draws upon a list of heuristic test cases and runs all or a large number of combinations of these inputs. In the future more thorough testing will be performed.
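As a rough illustration of the detection step described above, the following C sketch runs one test case in a child process, uses alarm() as a stand-in watchdog timer, and classifies the result from the child's exit status. This is not the actual Ballista implementation. The names outcome_t, run_test_case, and the three case_* functions are hypothetical and exist only for this example. It assumes a POSIX system, and it cannot observe a catastrophic failure, since a machine reboot would take the harness down with it.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcome classes, named after the failure categories used in this FAQ. */
typedef enum {
    OUTCOME_PASS,          /* child returned normally */
    OUTCOME_ABORT,         /* child was killed by a signal (e.g. SIGSEGV) */
    OUTCOME_RESTART,       /* child hung and the watchdog fired */
    OUTCOME_HARNESS_ERROR  /* fork() or waitpid() itself failed */
} outcome_t;

/* A test case is any function that exercises the module under test once. */
typedef void (*test_case_fn)(void);

/* Run one test case in a child process and classify how the child ended. */
outcome_t run_test_case(test_case_fn test, unsigned timeout_seconds)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return OUTCOME_HARNESS_ERROR;

    if (pid == 0) {
        /* Child: arm the watchdog, then run the test. The default action
           of SIGALRM terminates the child if the call never returns. */
        alarm(timeout_seconds);
        test();
        _exit(0);
    }

    /* Parent: wait for the child and inspect how it terminated. */
    if (waitpid(pid, &status, 0) < 0)
        return OUTCOME_HARNESS_ERROR;
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGALRM ? OUTCOME_RESTART : OUTCOME_ABORT;
    return OUTCOME_PASS;
}

/* Hypothetical stand-ins for calls into a module under test. */
void case_returns_normally(void) { (void)abs(-1); }
void case_aborts(void)           { abort(); }
void case_hangs(void)            { for (;;) pause(); }
```

Calling run_test_case(case_hangs, 1) from a main() function, for instance, should report a Restart failure after about one second. A real harness would replace the stand-in functions with calls that pass exceptional values to the module being tested.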

Our publication archive has more extensive information.

What is an example of the kind of robustness failures Ballista tests for?

Ballista in general tests for non-robust reactions to exceptional input conditions. As a really simple example, consider the reaction of the function:
int atoi(const char *nptr);
to an input value of NULL. Because atoi() takes a character string pointer as an input value and attempts to convert the string to an integer, a NULL input value is an exceptional condition (it doesn't point to a legitimate string, and is additionally not documented to provide a useful function). So, if atoi(NULL) causes an Abort failure (a "core dump"), that particular test case is said to generate a robustness failure.

Ballista also tests more complicated situations, including some that involve setting machine state information before a test is executed. For example, tests involving file descriptors will create a file with certain properties and feed the corresponding file handle to a function as part of a test case. Ballista can also test functions that take more than one input argument.
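To make the idea of state-dependent test values concrete, here is a small C sketch that prepares two file descriptors with particular properties (one open read-only, one already closed) and feeds each to write(). The helper names make_read_only_fd, make_closed_fd, and write_rejects_fd are hypothetical and are not taken from Ballista. The use of /dev/null is an assumption made to keep the example free of temporary files. On a POSIX system, write() is documented to fail with EBADF in both cases, which is the robust reaction.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Test value: a descriptor that is open for reading only. */
int make_read_only_fd(void)
{
    return open("/dev/null", O_RDONLY);
}

/* Test value: a descriptor number that was valid once but is now closed. */
int make_closed_fd(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd >= 0)
        close(fd);
    return fd;
}

/* Returns 1 if write() rejected the descriptor with EBADF, else 0. */
int write_rejects_fd(int fd)
{
    char byte = 'x';
    errno = 0;
    return write(fd, &byte, 1) == -1 && errno == EBADF;
}
```

A function that crashed or hung when handed one of these descriptors, instead of returning an error code, would count as a robustness failure for that test case.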

Are the things Ballista finds really bugs?

Generally a "bug" is considered to be a software defect resulting in the behavior of a program differing from what is in the requirements specification. (Unfortunately, IEEE Std 610.12 doesn't help much with this area of terminology.) Whether what Ballista finds is really a "bug" depends on several factors discussed below. The short version of the answer is that Ballista may or may not be finding "bugs" depending on your philosophical bent. But, what it is finding are what we call "robustness failures," which are problems that cause the software to be non-robust (which is not necessarily the same as being defect-free).

Software specifications generally treat the reaction to exceptional inputs in one of the following three ways:

The function is documented to return an error code for an exceptional input, but does not do so. This is pretty clearly a bug.
The behavior of the function to exceptional inputs is unspecified. This is clearly a "robustness failure" in that it reduces the robustness of the software, but it is often not held as being a bug by software developers.
The behavior of the function is specifically stated to be undefined for exceptional input values. Software developers will in general strenuously claim it is not a bug. And in fact, there have in the past been good reasons for specifying reaction to exceptional conditions as undefined (for example: to avoid overspecification in multiple-error situations; to leave room for extensions; and to achieve consensus or save time in a standards meeting [thanks to D.C. for that list]). But the Ballista work still considers this to be, arguably, a robustness problem that is undesirable in a mission- or safety-critical system.

Ballista declares things to be robustness failures based on considering "doesn't crash; doesn't hang" to be an implicit part of any software specification. So, if Ballista finds a robustness problem in a piece of software, it might or might not be a "true bug", but can be considered a "robustness failure". Ballista sometimes finds ways to crash entire operating systems; typically people agree that such situations result from software defects regardless of any written specification.

It turns out that many software specifications are intentionally incomplete, ambiguous, or otherwise less than completely specific when it comes to robustness requirements. In many cases responses to exceptional inputs are left entirely at the discretion of implementors (but, if implemented, a standard mechanism for returning error codes might be provided). And, there are legitimate reasons for this to have been done, especially when committees had to reach consensus. Again, this is a case where software might be defect-free from a specification point of view, but still non-robust. Part of the purpose of the Ballista work is, in this era of increased computerization of areas critical to daily life, to raise awareness in the area of robustness so that hopefully future software specifications and standards will address it more thoroughly.

What is the relationship between number of software defects and the number of robustness failures?

The number of software defects (i.e., source code "bugs") will in general be far smaller than the number of robustness failures reported by Ballista. This is because Ballista generates exhaustive combinations of parameter values when doing testing in order to elicit robustness failures in cases that only occur with certain combinations of parameter values. (Note that in some cases Ballista only samples the testing space in the interest of execution time, but still usually there are fewer software defects than robustness failures reported.)
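A small worked example may help show why the counts differ. The C sketch below models a hypothetical two-parameter function with exactly one defect, a missing NULL check on its first parameter, and enumerates every combination of two small test-value pools. The pools, the tally_t type, and the function names are all illustrative assumptions, not part of Ballista. With 4 values per parameter there are 4 x 4 = 16 test cases, and the single defect accounts for 4 failing test cases (one for each value of the second parameter).

```c
#include <stddef.h>

/* Hypothetical test-value pools for a two-parameter function f(s, n). */
static const char *const string_pool[] = { NULL, "", "42", "not a number" };
static const int int_pool[] = { -1, 0, 1, 1000 };

#define POOL_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Models a single defect: f dereferences s without a NULL check, so a
   test case fails whenever s is NULL, whatever the value of n is. */
static int test_case_fails(const char *s, int n)
{
    (void)n;
    return s == NULL;
}

typedef struct {
    size_t total;   /* number of test cases executed */
    size_t failed;  /* number of test cases reported as failures */
} tally_t;

/* Enumerate every combination of pool values and count the failures. */
tally_t run_all_combinations(void)
{
    tally_t t = { 0, 0 };
    size_t i, j;

    for (i = 0; i < POOL_LEN(string_pool); i++) {
        for (j = 0; j < POOL_LEN(int_pool); j++) {
            t.total++;
            if (test_case_fails(string_pool[i], int_pool[j]))
                t.failed++;
        }
    }
    return t;
}
```

Here run_all_combinations() reports 4 failures out of 16 test cases even though only one line of source code is at fault, and the ratio grows as more parameters and pool values are added.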

Ballista is predicated on the principle that source code will not be available for off-the-shelf software components. (This assumption may not always be true, but is often true and is useful from a research point of view for seeing how far the technology can be taken with that simplifying assumption.) Given this assumption, and the fact that Ballista is more about testing (in a quality assurance sense) than debugging, the only metric available for reporting robustness problems is the number of test case failures rather than root cause software defects.

Doesn't providing robustness cost speed?

In most current software, robustness costs a modest amount of speed. But, as hardware architectures evolve we think that the speed penalty will be reduced. From what we've seen it is fairly common for robustness checks to cost only a few percent in execution time of real programs, and we have found ways to make it even less.

Some ways to address the speed issue include:

Have versions of software with and without input checking. This gives a software designer the choice to trade speed against robustness. There might be a temptation to turn on checking in development and turn it off in production code, but this problem has existed as long as the concept of assertions has existed. There seems to be evidence that a substantial fraction of run-time errors involves improper exception handling (we're collecting information, but at the moment it's anecdotal), so we recommend checking as much as possible in production code.
In many embedded applications the speed of the computer portion of the system is limited by dollars, not available technology. So a designer could simply elect to spend a little more money in return for increasing software robustness (and, with Ballista, now there is at least a preliminary way to quantify robustness).
We think there are ways to minimize the speed penalty by exploiting hardware architectural features and using clever software techniques. See our publications page for more information, and in particular John DeVale's thesis and our 2002 DSN paper.

There are arguments that complete robustness is impossible because it isn't possible to check all inputs in all situations. Our current opinion is that robustness is largely possible with some care and a little bit of invention.

There is also an argument that says it is better to "dump core" than return an error condition, on the theory that core dumps are more urgent to fix and give better pinpointing of errors than using error return codes. We'd counter that having a "convert errors into core dumps" switch would be a neat thing to have in a library, but really once you ship a finished application out the door, core dumps aren't what you want; what you want is to implement error checking in your application.
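One way to picture an input-checked version of a function together with a "convert errors into core dumps" switch is the C sketch below. It is only an illustration under stated assumptions: checked_atoi and errors_dump_core are hypothetical names, not part of any real library or of Ballista, and the checks are built on the standard strtol() function. With the switch off, exceptional inputs produce an error return code; with it on, the same inputs call abort() so that a developer gets a core dump at the point of detection.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical switch: nonzero turns detected errors into core dumps. */
int errors_dump_core = 0;

/* Checked counterpart of atoi(): returns 0 and stores the value in *out
   on success, or returns -1 with errno set on exceptional input. */
int checked_atoi(const char *text, int *out)
{
    char *end;
    long value;

    if (text == NULL || out == NULL) {
        errno = EINVAL;              /* exceptional pointer argument */
        goto fail;
    }

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text) {
        errno = EINVAL;              /* no digits were converted */
        goto fail;
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        errno = ERANGE;              /* value does not fit in an int */
        goto fail;
    }

    *out = (int)value;
    return 0;

fail:
    if (errors_dump_core)
        abort();                     /* development mode: dump core */
    return -1;                       /* production mode: error code */
}
```

The design choice here is that the check is always performed and only the reaction changes, so shipping with the switch off keeps the robustness benefit while still letting developers opt in to core dumps during debugging.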

How does Ballista relate to other robustness testing?

Ballista is the product of many years of research in the area of dependability, and in particular fault injection, at Carnegie Mellon's Institute for Complex Engineered Systems. The focus of our research has been on repeatability and portability. A complete survey of other approaches is beyond the scope of this FAQ, but can be found in our various publications. However, two approaches warrant special note:

The University of Wisconsin Fuzz project, which tests operating system calls for responses to randomized input streams.
The "crashme" benchmark, which tests robustness both in terms of exceptional conditions and overloading the system with many simultaneous tests. This benchmark was highly effective at getting an entire machine to crash. In fact, some of our early work was what we called "cmu_crashme", and was a way to increase repeatability of the crashme work.

Isn't Ballista just the same as Purify or Boundschecker?

NuMega's Boundschecker and Rational's Purify are tools that help with detecting and debugging invalid memory references and other problems that are similar in some respects to what Ballista tests for.

Both these tools appear useful, but do not necessarily address the specific problem we are working on. Put another way, they're quite interesting, and play in the same general "market", but are not really the same. Some key differences are:

Ballista actively seeks out potential robustness failures, whereas the other tools instrument software to detect if any robustness failures are encountered in normal execution. Thus, Ballista is more likely to find problems that are caused by exceptional conditions that aren't tested or encountered during development. Many system failures are caused by improper handling of exceptions that weren't covered by testing, so we think this is an important difference. Furthermore, Ballista automatically generates exceptional condition tests, so you don't have to create the test scripts.
Part of our results are aimed at quantitatively comparing the robustness of multiple implementations of the same API. This is not a goal of the commercial products, which are more aimed at an application developer/testing audience.
We are particularly interested in situations for which source code for a module that is to be hardened is not available, and/or cannot be modified. This happens in instances for which you have purchased or decided to re-use some commercial-off-the-shelf (COTS) module. While this may not be typical in an application programming shop, it is a very typical situation in a system integration environment where you are trying to get software components from multiple (competing) vendors to work in a single system. Probably those other tools would have some utility in instrumenting new application code that couples with COTS modules, but that is not their specific purpose.
NuMega's product in particular seems to have been pre-taught about OS interfaces. Ballista is only using OS interfaces as a convenient example; our goal is to accomplish hardening of arbitrary software in a way that is as highly automated as possible. So rather than a tool which "knows" about OS interfaces, we are trying to build a tool that is very easily taught some new interface.

How can I get access to Ballista testing?

You can download Ballista for free under the GPL. See the Ballista home page for pointers to the download site.

Are you related to the Ballista OS security product?

No. The company in Canada that was selling a product called "Ballista" for doing SATAN-like testing on Windows products has nothing to do with us, and they have since changed the name of that product. Ballista® is a registered trademark of Carnegie Mellon University.

Ballista home page.

koopman@cmu.edu

Last updated 6/2/2002. Special thanks to the folks who have exchanged e-mail with me about the above issues.
