Delta Debugging
Introduction
Have you ever been fuzzing a program and received a crash, only to find the input file was huge? Trying to manually determine which portions of an input file trigger the bug can be an extremely frustrating and time consuming process. Huge input files can make the triage of bugs much harder. This blog post describes a technique known as delta-debugging which can help you automatically produce an input file that is as small as possible while still triggering the bug in the original input file.
Background
Delta debugging is a technique to iteratively test and reduce an input file until it is as small as possible while still maintaining a desired characteristic. Originally developed by Andreas Zeller, delta debugging works by removing increasingly smaller amounts of the input file and testing to see whether the inputs change the test outcome. Zeller has released an open source Python implementation of delta debugging available on the delta debugging web page at Saarland University. Delta debugging is very effective at minimizing test cases, and has been applied to many complex programs, such as web browsers. GRIMM has expanded the original delta-debugging implementation to ease the associated development burden, and work with Python 3. The most up to date version of the code is available on GRIMM’s github repository.
Example - ccd2cue
Setting up the example
In order to illustrate delta debugging’s usefulness, let’s walk through using delta debugging to analyze some crashes from one of GRIMM’s recent NotQuite0DayFriday release. The bugs in ccd2cue described in the NotQuite0DayFriday write-up were found via fuzzing, and as such resulted in large input files.
The first thing we’ll need to do is build ccd2cue and test the crash. We tested this on a 64-bit Ubuntu Linux machine with build-essential and gcc-multilib. The ccd file is from the NotQuite0DayFriday linked above.
$ wget 'https://ftp.gnu.org/gnu/ccd2cue/ccd2cue-0.5.tar.gz'
$ tar xf ccd2cue-0.5.tar.gz
$ cd ccd2cue-0.5/
$ env CFLAGS=-m32 ./configure --prefix=$PWD/../install
$ make
$ make install
$ cd ../
$ ./install/bin/ccd2cue -o /dev/null toc_and_make_reference_name.ccd
Running ccd2cue with the toc_and_make_reference_name.ccd file, results in a SIGABRT signal being sent to the process. However, the toc_and_make_reference_name.ccd file is the result of fuzzing, and as such includes a number of unnecessary parts. In order to speed up the root-cause analysis, we’ll use delta-debugging to minimize the test file.
Developing the Delta Debugging Test Function
Before we can apply delta debugging, we must first develop a test function. This function will take an input to test and return whether or not the input is interesting. For our example, our test function will run ccd2cue in GDB and check for a SIGABRT signal. The full source code for the ccd2cue delta debugging test script is available on GitHub.
The first thing that our test function does is write the input from the delta debugging algorithm to a file, as ccd2cue is expecting the input as a file.
def _test(self, deltas):
# Build input file
input_filename = mkstemp(prefix="ccd2cue-crash-", suffix=".ccd")[1]
with open(input_filename, "wb") as f:
for (index, byte) in deltas:
f.write(bytes([byte]))
Next, our test function runs ccd2cue with the input from the delta debugging algorithm inside of GDB. To ease the burden of creating delta debugging test functions, GRIMM has developed a python GDB wrapper that allows our test function to easily run ccd2cue in GDB and analyze the results.
g = Gdb(self.executable)
# Append the input filename to the target args & run
args = []
for entry in self.target_args:
args.append(entry)
args.append(input_filename)
debug("Running: {} {}".format(self.executable, " ".join(args)))
t = Thread(target=self.wait_for_gdb, kwargs={"gdb": g})
t.start() # Start waiting for gdb
response = g.run(args=args, read_to_prompt=True)
debug("gdb response = {}".format(response))
Finally, our test function checks the GDB output for SIGABRT, indicating whether or not ccd2cue aborted. If ccd2cue aborted, our test function reports FAIL, otherwise it reports PASS.
# Check for cases where we didn't crash
if ("exited normally" in response or "exited with code" in response):
g.quit()
t.join()
return self.PASS
if "SIGABRT" in response:
g.quit()
t.join()
return self.FAIL
# Shouldn't ever happen
error("Unhandled exception occurred {}".format(response))
g.quit()
t.join()
return self.UNRESOLVED
Obtaining the Results
Now that we’ve created our test function, we can run the delta debugging script, and obtain the minimized test file.
$ python3 scripts/dd-ccd2cue.py --input-file toc_and_make_reference_name.ccd --target-args "-o /dev/null" ./install/bin/ccd2cue
The resulting ccd file is written to crash-minimal.ccd. The minimized file is only 147 bytes, as compared to the original’s 2403 bytes. A quick look at this file (shown below) and the bug becomes obvious. More details on this bug are available in the NotQuite0DayFriday write-up.
TocEntries =111111111111111111111111111111111111111111111111111111111111111111111111111 acksScrambled = 101a2a3a4a5a6
Sessions=3
[Entry2
afl-tmin
While delta debugging is great, it is by no means the only way to minimize an input file. The popular afl-fuzz fuzzer comes with the test case minimizer afl-tmin. afl-tmin uses a different approach to minimizing input files that focuses on efficiency. afl-tmin iteratively removes decreasing blocks of the data from the input file until no more bytes can be removed while still causing the same behavior.
So which tool is better? Both tools produce similar results, but it really comes down to the goal of the minimization. Like afl-fuzz, afl-tmin is limited in the programs that it can operate on, which means file-processing programs that can easily be run via the Linux command line and exit when done processing input. Additionally, afl-tmin has been designed and optimized to run very quickly, resulting in quicker test case minimizations than delta debugging. On the other hand, delta debugging can handle larger, more complex programs, such as web browsers, and can send inputs in whatever manner the test script writer chooses, such as via the network.
While delta debugging is more flexible in this sense, it is also more work. In order to employ delta debugging, the user must write a test function for delta debugging that analyzes the target program’s response to an input and determines if it passes or fails. However, this same flexibility can allow for interesting use cases. If a target program has multiple independent bugs, a test function can be created to analyze the target program’s crash and fail only on the desired bug, as demonstrated in GRIMM’s sqlite dynamic debugging test script. This same problem would prevent afl-tmin from properly minimizing the test file, as both bugs would result in a crash and it would converge on whichever bug had the smallest triggingering input.
Conclusion
The next time you’re frustrated trying to minimize huge test cases, don’t worry. Just break out some delta debugging and let it clean things up for you. Once you’ve minimized your test cases, where do you go from there? In future blog posts, we’ll be describing some of the options available for crash analysis to help you quickly get to the root of the problem.
If you’d like help in finding or triaging bugs, feel free to contact us.