Ballista OS Robustness Test Suite - Catastrophic Detection

While running the Ballista OS Robustness Test Suite, catastrophic robustness failures may be uncovered. Catastrophic failures manifest as computer crashes or panics and require a reboot to recover. Because such failures take the whole system down, they are not automatically counted or recorded by the Ballista OS Robustness Test Suite. The manual steps needed to verify and reproduce a catastrophic failure, and then to restart the test suite, follow.

  1. First, determine the last function tested. One way to do this, starting from the .../ballista directory, is:
    > cd outfiles
    > ls -lrt outfile.*
    The last outfile.* file listed corresponds to the last function the system recorded testing. The text between the first and second periods ('.') is the function name; the text after the second period is the period-separated list of parameters. For example:
    .../ballista>cd outfiles
    .../ballista/outfiles>ls -lrt outfile.*
    <...>
    -rw------- 1 kdevale system 214 Feb 15 19:29 outfile.fgetpos.b_ptr_file.b_ptr_fpos_t
    -rw------- 1 kdevale system 279 Feb 15 19:29 outfile.fsetpos.b_ptr_file.b_ptr_fpos_t

    The last recorded function is fsetpos. The parameters associated with this test were b_ptr_file and b_ptr_fpos_t.
    Note: on Linux, the order of the outfile.* files is occasionally slightly jumbled. We suggest comparing the last four or five outfile.* files listed and determining which one corresponds to the latest entry in callTable.all or callTable<system>.all. Use that function specification as the last recorded function and continue.
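
    The naming rule above can be applied mechanically. The following is a minimal sketch and not part of the Ballista distribution: parse_outfile_name is a hypothetical helper, and it assumes that neither the function name nor any parameter name contains a period.

    ```shell
    # Hypothetical helper (not shipped with Ballista).
    # Prints "<function> <param> <param> ..." for a file name of the form
    # outfile.<function>.<param>.<param>...
    parse_outfile_name() {
        name=${1##*/}           # drop any leading directory
        spec=${name#outfile.}   # drop the "outfile." prefix
        # The remaining periods separate the function name from its parameters.
        printf '%s\n' "$spec" | tr '.' ' '
    }
    ```

    From the outfiles directory, the helper could be applied to the most recently modified file:

    > parse_outfile_name "$(ls -rt outfile.* | tail -n 1)"

    For the listing in the example above this prints fsetpos b_ptr_file b_ptr_fpos_t.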

  2. Next, verify that the last recorded function caused the catastrophic failure.
    1. Copy callTable.all or callTable<system>.all to a backup file, where <system> corresponds to your operating system (e.g., DUNIX). If an appropriate callTable<system>.all file exists, use it instead of callTable.all.
      > cd ..
      > cp callTable.all callTable.backup
    2. Modify callTable.all or callTable<system>.all so that the only entry it contains is the last recorded function. Both the function name and the parameter names should match the name of the outfile.* file.

      callTable.all

      stdio.h function int fsetpos b_ptr_file b_ptr_fpos_t

      In our example, the only entry in callTable.all should be the line corresponding to fsetpos with parameters b_ptr_file and b_ptr_fpos_t.
    3. Run ostest.pl
      .../ballista>perl ostest.pl
    4. If the system crashes, continue with step 3. Otherwise, one additional function must be checked as the source of the catastrophic failure. The Ballista OS Robustness Test Suite may have been unable to record the function under test before the system crashed, so we need to determine the function that immediately follows the one just checked.

      In the callTable backup file, find the function just tested. The "following" function specification is the first subsequent entry that is not commented out.

      callTable.all

      <...>
      stdio.h function int fgetpos b_ptr_file b_ptr_fpos_t
      stdio.h function int fsetpos b_ptr_file b_ptr_fpos_t

      # getchar intentionally omitted since it requires stdin
      stdlib.h function long labs b_long

      In our example, the blank line and the commented line following fsetpos are ignored, and labs with parameter b_long is the "following" function specification.

      Now repeat sub-steps 2 and 3 of step 2 (modify the callTable file, then run ostest.pl), substituting the "following" function for the last recorded function. If this function reproduces the robustness failure, continue with step 3.

      If running the test suite on the "following" function does not reproduce the crash, it is unlikely that the crash you encountered is associated with the operation of the test suite. Copy the callTable backup file back to its original name and try restarting the test suite.

      > cp callTable.backup callTable.all
      > perl ostest.pl
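
    The search for the "following" function specification in sub-step 4 can also be scripted. This is a sketch under stated assumptions rather than part of the test suite: next_entry is a hypothetical helper, it assumes each callTable entry ends with the function name followed by its parameters (as in the examples above), and it treats any line starting with # as a comment.

    ```shell
    # Hypothetical helper (not shipped with Ballista).
    # Usage: next_entry <callTable file> <function> [<param> ...]
    # Prints the first entry after the given one, skipping blank lines
    # and lines that start with '#'. Prints nothing if there is none.
    next_entry() {
        table=$1; shift
        awk -v spec="$*" '
            # Once the given entry has been seen, the first real entry wins.
            found && NF > 0 && $0 !~ /^#/ { print; exit }
            $0 !~ /^#/ {
                line = $0
                gsub(/[ \t]+/, " ", line)   # collapse runs of whitespace
                sub(/ $/, "", line)         # drop trailing whitespace
                n = length(line) - length(spec)
                # Assumption: an entry ends with "<function> <param> ...".
                if (n > 0 && substr(line, n + 1) == spec && substr(line, n, 1) == " ")
                    found = 1
            }
        ' "$table"
    }
    ```

    Reading from the backup keeps the original table intact. For example, a single-entry callTable.all for the "following" function could be produced with:

    > next_entry callTable.backup fsetpos b_ptr_file b_ptr_fpos_t > callTable.all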
  3. At this point you should know the function (and parameters) that caused the robustness failure. Make a note of the function, then omit its specification from the test suite and restart testing.
    1. Copy the callTable backup file created in sub-step 1 of step 2 back to its original name.
      > cp callTable.backup callTable.all
    2. Comment out the identified function in callTable.all or callTable<system>.all. Comments are denoted by a # in the first column.
      In our example, let's say that the catastrophic failure was associated with fsetpos. In callTable.all, this function entry should therefore now be preceded by a # character:

      callTable.all

      <...>
      stdio.h function int fgetpos b_ptr_file b_ptr_fpos_t
      #stdio.h function int fsetpos b_ptr_file b_ptr_fpos_t

      # getchar intentionally omitted since it requires stdin
      stdlib.h function long labs b_long
    3. Make a note of the function associated with the catastrophic failure. You will need this information later. If relevant, you may want to copy the outfile.* file associated with the function to another location for further processing. (The outfiles subdirectory will be deleted as part of rerunning the test suite.)

    4. Run the test suite.
      > perl ostest.pl
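
    Commenting out the entry can be done by hand or, as a rough sketch, with a small filter. comment_out_entry is a hypothetical helper, not part of the Ballista distribution; it assumes each callTable entry ends with the function name followed by its parameters, as in the examples above. It writes the modified table to standard output and leaves the input file untouched.

    ```shell
    # Hypothetical helper (not shipped with Ballista).
    # Usage: comment_out_entry <callTable file> <function> [<param> ...]
    # Copies the table to standard output, putting '#' in the first
    # column of the entry that matches the given specification.
    comment_out_entry() {
        table=$1; shift
        awk -v spec="$*" '
            {
                line = $0
                gsub(/[ \t]+/, " ", line)   # collapse runs of whitespace
                sub(/ $/, "", line)         # drop trailing whitespace
                n = length(line) - length(spec)
                # Assumption: an entry ends with "<function> <param> ...".
                if ($0 !~ /^#/ && n > 0 && substr(line, n + 1) == spec && substr(line, n, 1) == " ")
                    print "#" $0
                else
                    print
            }
        ' "$table"
    }
    ```

    Run against the backup file, this would cover sub-steps 1 and 2 in one command:

    > comment_out_entry callTable.backup fsetpos b_ptr_file b_ptr_fpos_t > callTable.all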