People tend to think that it’s exciting and fun when a fuzzer finds a bunch of crashes, and it is… the first time. However, when there are 181 supposedly unique crashes and it’s time to go through each of them to determine the impact (that is, which ones are exploitable rather than only a denial of service), it’s a lot less fun. In fact, it can be downright grueling.
Here’s what the process really looks like:
- Seed files -> Fuzzers -> Crashing inputs
- Crashing inputs -> Minimization -> Bucketing -> Per bug crashes
- Per bug crashes -> automated analysis -> automated triage report
- Automated triage report + input file -> Human using disassemblers and debuggers -> Proof of Concept
- Proof of Concept -> exploit development -> exploit
If you’re looking for academic work on this topic, it’s often called “root cause analysis” in the literature. As each step in this process could easily span a series of several long blog posts, we’ll only briefly describe each step and some of our favorite tools used in each step.
Fuzzing
Fuzzing is an automated technique for repeatedly testing a program’s ability to process a given input. It’s one of the most popular techniques for vulnerability research. Fuzzing is a vast topic and could easily be its own blog post or series of blog posts. As such, we’ll just briefly cover a few of the more popular fuzzing tools.
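To make the idea concrete, here is a minimal sketch of a mutational fuzzing loop. The target function, the seed bytes, and the "crash" condition are all hypothetical stand-ins that we made up for illustration; a real fuzzer runs an actual program and watches for signals rather than Python exceptions.

```python
import random
from typing import List

def toy_target(data: bytes) -> None:
    """Hypothetical stand-in for a real parser: 'crashes' on one byte pattern."""
    if len(data) >= 2 and data[0] == 0x42 and data[1] > 0x7F:
        raise ValueError("simulated crash")

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of the seed with one randomly chosen byte replaced."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, iterations: int, rng: random.Random) -> List[bytes]:
    """Run the target on mutated inputs and collect the ones that crash it."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            toy_target(candidate)
        except ValueError:
            crashes.append(candidate)
    return crashes

# A fixed RNG seed keeps the run repeatable.
crashing_inputs = fuzz(b"\x42\x00AAAA", iterations=5000, rng=random.Random(1234))
print(f"{len(crashing_inputs)} crashing inputs collected")
```

Even this toy loop produces many crashing inputs for what is really a single bug, which is exactly the situation the rest of the triage process has to deal with.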
afl-fuzz
afl-fuzz is one of the most popular fuzzers, and for good reason: AFL has found a large number of bugs in real-world software. AFL instruments the target program so that it receives feedback while fuzzing. This feedback substantially improves the functional coverage of the target program, increasing the likelihood that a vulnerable path is found. One of AFL’s main selling points is its ease of use; it is very easy to set up and run. Furthermore, AFL is extremely efficient and has been highly optimized to increase fuzzing speed. AFL runs on most UNIX-like systems; however, it requires source code for the target application on all of them except Linux. There is also a fork by Ivan Fratric called WinAFL, which, as the name implies, runs on Windows. It does not require source code for the target application.
Peach Fuzzer
Peach Fuzzer is another popular fuzzer that can be used to search for vulnerabilities in programs. Unlike AFL, where fuzzing requires no knowledge of the input format, Peach Fuzzer requires a model of the input format before fuzzing can start. This model tells Peach Fuzzer how to mutate the input. This allows Peach to be much more efficient when mutating highly structured inputs, as it knows how each piece of the input relates to the other pieces. For instance, if Peach Fuzzer expands the input file, it will know whether it must also increase a length field inside that file.
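The length-field example can be sketched in a few lines. This is not Peach’s API or its model format; the two-byte length prefix and every function name below are assumptions we invented to show why knowing the format matters.

```python
import struct

def build(payload: bytes) -> bytes:
    """Hypothetical format: 2-byte big-endian length, then the payload."""
    return struct.pack(">H", len(payload)) + payload

def parse(blob: bytes) -> bytes:
    """Reject inputs whose length field disagrees with the payload size."""
    (length,) = struct.unpack(">H", blob[:2])
    payload = blob[2:]
    if length != len(payload):
        raise ValueError("length field does not match payload")
    return payload

def expand_naive(blob: bytes, extra: bytes) -> bytes:
    """Format-unaware mutation: append bytes and leave the length field stale."""
    return blob + extra

def expand_model_aware(blob: bytes, extra: bytes) -> bytes:
    """Format-aware mutation: grow the payload and recompute the length field."""
    return build(parse(blob) + extra)

original = build(b"abc")
print(parse(expand_model_aware(original, b"XY")))  # still accepted by the parser
try:
    parse(expand_naive(original, b"XY"))
except ValueError as err:
    print("naive mutation rejected:", err)
```

The naive mutation is thrown away by the first sanity check, so it never reaches the deeper parsing code where bugs tend to live.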
Minimization
Because fuzzing involves randomly mutating a test input file, the crashing input files it generates are often large and complex. To make analysis easier, we’d like each crashing input file to contain only the bytes required to cause the crash. There are a number of tools for automatically minimizing crashing test cases, but the two best-known approaches are afl-tmin and delta debugging.
afl-tmin
The popular afl-fuzz fuzzer comes with the test case minimizer afl-tmin. afl-tmin uses an approach to minimizing input files that focuses on efficiency. afl-tmin iteratively removes decreasing blocks of the data from the input file until no more bytes can be removed while still causing the same behavior. afl-tmin is a very fast and simple way to cut a single test case down to size.
While afl-tmin is great for a lot of programs, it is, like afl-fuzz, limited in the programs it can operate on. afl-tmin focuses on file-processing programs that can easily be run from the Linux command line. Generally, larger and more complex programs, such as web browsers, can’t be run under afl-tmin.
Delta Debugging
Delta debugging is a technique to iteratively test and reduce an input file until it is as small as possible while still maintaining a desired characteristic. Compared to afl-tmin, delta debugging is more work to use, but it is also more flexible. Delta debugging can minimize input files for a wider range of programs and can work around some of the shortcomings in afl-tmin. One of GRIMM’s previous blog posts describes the delta debugging process and gives an example using a real-world software application.
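As a rough illustration, here is a simplified delta debugging loop. It only performs the "remove one chunk and retest" step of the classic algorithm, and the crash predicate is a hypothetical stand-in; in practice the predicate would run the target program and check that it still crashes in the same way.

```python
from typing import Callable

def ddmin(data: bytes, still_crashes: Callable[[bytes], bool]) -> bytes:
    """Shrink data while still_crashes(data) stays true (simplified ddmin)."""
    if not still_crashes(data):
        raise ValueError("the starting input must reproduce the crash")
    granularity = 2
    while len(data) >= 2:
        chunk = -(-len(data) // granularity)  # ceiling division
        reduced = False
        for start in range(0, len(data), chunk):
            # Try the input with one chunk removed.
            candidate = data[:start] + data[start + chunk:]
            if still_crashes(candidate):
                data = candidate
                granularity = max(granularity - 1, 2)
                reduced = True
                break
        if not reduced:
            if granularity >= len(data):
                break  # no single byte can be removed any more
            granularity = min(granularity * 2, len(data))
    return data

def toy_crash(data: bytes) -> bool:
    """Hypothetical predicate: the 'bug' triggers whenever BAD appears."""
    return b"BAD" in data

print(ddmin(b"xxxxBADyyyyzz", toy_crash))
```

The flexibility mentioned above comes from the predicate: anything that can answer "does this input still show the behavior?" can be plugged in, including a script that drives a browser.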
Bucketing
Once input files have been minimized to ensure that all unnecessary components are removed, it’s useful to remove any duplicate input files that trigger the same bug. Rather than having 181 input files that trigger three bugs, this step filters out duplicate input files such that only one file per bug remains.
crashwalk
crashwalk is a tool for automating some analyses of crashing input files. crashwalk debugs the target program while it processes each input file. When the input file causes a crash, crashwalk analyzes the program state using the debugger. It then buckets each input file based on a hash of the program’s backtrace. This method ensures that the bucketed input files all follow the same program execution path prior to triggering the bug. However, if there are multiple methods to trigger a bug, this method will have a separate bucket for each method. crashwalk is also limited to macOS and Linux; it doesn’t support Windows.
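The bucketing idea can be sketched as follows. This is not crashwalk’s actual hashing scheme; the function names, file names, and backtraces are hypothetical and only illustrate grouping by a hash of the backtrace.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

def bucket_id(backtrace: List[str]) -> str:
    """Hash the frames of a backtrace into a short bucket identifier."""
    joined = "\n".join(backtrace)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]

def bucketize(crashes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group crashing input files by the hash of their backtrace."""
    buckets = defaultdict(list)
    for filename, backtrace in sorted(crashes.items()):
        buckets[bucket_id(backtrace)].append(filename)
    return dict(buckets)

# Hypothetical backtraces, innermost frame first.
crashes = {
    "crash_001": ["parse_chunk", "read_image", "main"],
    "crash_002": ["parse_chunk", "read_image", "main"],
    "crash_003": ["parse_chunk", "read_thumbnail", "main"],
    "crash_004": ["free_palette", "cleanup", "main"],
}
buckets = bucketize(crashes)
for bucket, files in buckets.items():
    print(bucket, files)
```

Note that crash_003 lands in its own bucket even though it faults in the same function as crash_001, which is the "multiple methods to trigger a bug" limitation described above.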
afl-cmin
The popular afl-fuzz fuzzer also comes with the corpus minimization tool afl-cmin. afl-cmin uses the afl-fuzz instrumentation to track the execution path a program takes when processing an input. It then compares the execution paths for all of the inputs and saves the smallest file that traverses each unique program execution path. afl-cmin is typically used to reduce the size of a fuzzing corpus, but it can also be used for bucketing crashing inputs. However, if two input files take different program execution paths before triggering the same bug, afl-cmin may include both files.
Automated Analysis
Once minimization and bucketing are complete, the next step in the crash triage process is to analyze the crashes. Before manually sitting down with a debugger and disassembler, it’s useful to look at the information that automated tools can provide.
crashwalk
In addition to bucketing inputs, crashwalk provides some automated analysis through jfoote’s exploitable GDB plugin. This plugin analyzes crashes inside GDB to determine the type of bug that caused each crash (stack overflow, heap problem, null pointer dereference, etc.) and whether it is likely exploitable. With this information, it’s possible to rank a set of bugs based on severity.
valgrind
valgrind is a dynamic analysis tool that automatically checks for memory management and threading bugs in a program. Valgrind instruments a program at runtime to detect incorrect stack and heap access, uninitialized data use, use-after-free bugs, and a number of other bugs that can lead to memory corruption. Valgrind detects the bugs at their time of use, rather than when they cause the program to crash, which makes the analysis of the crashing input much easier. Furthermore, valgrind is very easy to use and requires no modifications to the target program. One downside to valgrind is that it only works on Linux and macOS, but similar alternatives exist that support Windows, such as Dr. Memory.
Taint Analysis
One category of automated analyses that can help while triaging bugs is taint analysis. Taint analysis involves determining which computations are affected by the input file. By focusing on the specific input bytes that are used by the crashing portions of the target program, we can reduce the size of the input file that needs to be analyzed.
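Here is a toy sketch of byte-level taint tracking. The Tainted class, the parser, and the file layout are all hypothetical; real taint engines track this at the instruction or memory level rather than with wrapped Python integers.

```python
import operator
from typing import List, Optional

class Tainted:
    """An integer that remembers which input offsets influenced it."""

    def __init__(self, value: int, sources) -> None:
        self.value = value
        self.sources = frozenset(sources)

    def _combine(self, other, op) -> "Tainted":
        # The result is tainted by the union of both operands' sources.
        if isinstance(other, Tainted):
            return Tainted(op(self.value, other.value), self.sources | other.sources)
        return Tainted(op(self.value, other), self.sources)

    def __add__(self, other) -> "Tainted":
        return self._combine(other, operator.add)

    def __mul__(self, other) -> "Tainted":
        return self._combine(other, operator.mul)

def load(data: bytes) -> List[Tainted]:
    """Label every input byte with its own file offset."""
    return [Tainted(byte, {offset}) for offset, byte in enumerate(data)]

def toy_parser(data: bytes) -> Optional[Tainted]:
    """Hypothetical parser: bytes 4 and 5 form an index into a 16-entry table."""
    cells = load(data)
    index = cells[4] + cells[5] * 256
    if index.value >= 16:
        return index  # a real program would read out of bounds here
    return None

bad_index = toy_parser(b"HDR\x00\xff\x01rest")
print(bad_index.value, sorted(bad_index.sources))
```

In this made-up case the analyst can ignore everything in the file except offsets 4 and 5.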
PANDA
PANDA is an open-source Platform for Architecture-Neutral Dynamic Analysis. It is built upon the QEMU whole system emulator, and so analyses have access to all code and data in the guest operating system. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Because PANDA can view the entire system at once, it can easily provide taint analysis of the input file and determine which portions of the file are used in the crashing input. In addition to taint analysis, PANDA has a large number of plugins that can provide other types of analyses.
Symbolic Execution
In the last ten years, there has been a great deal of research in using symbolic execution for computer security. Like ordinary processor execution, symbolic execution runs through the instructions of a program and executes them. However, rather than using a concrete piece of data from the input file, symbolic execution operates on variables which represent any input value. Thus, it’s possible to analyze a program for any values which may make it crash, rather than only checking a single value, as is done with fuzzing.
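The following toy sketch shows the core loop: treat the input as a range of possible values, fork at every branch, and keep the constraints for each path. Real engines hand path constraints to an SMT solver; here we assume a single input byte and replace the solver with simple interval arithmetic. The target program and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive range of values the symbolic byte may take

def narrow(iv: Interval, op: str, k: int, taken: bool) -> Optional[Interval]:
    """Intersect iv with the branch condition `x op k`, or with its negation."""
    lo, hi = iv
    if op == ">":
        lo, hi = (max(lo, k + 1), hi) if taken else (lo, min(hi, k))
    elif op == "<":
        lo, hi = (lo, min(hi, k - 1)) if taken else (max(lo, k), hi)
    else:
        raise ValueError(f"unsupported operator {op!r}")
    return (lo, hi) if lo <= hi else None  # None means the path is infeasible

# Hypothetical target: crash() is reached only if every branch is taken.
#   if x > 100:
#       if x < 110:
#           crash()
BRANCHES = [(">", 100), ("<", 110)]

def explore(iv: Interval = (0, 255), depth: int = 0) -> List[Tuple[str, Interval]]:
    """Fork at each branch and return (outcome, input range) for every feasible path."""
    if depth == len(BRANCHES):
        return [("crash", iv)]
    op, k = BRANCHES[depth]
    paths = []
    not_taken = narrow(iv, op, k, taken=False)
    if not_taken is not None:
        paths.append(("ok", not_taken))
    taken = narrow(iv, op, k, taken=True)
    if taken is not None:
        paths.extend(explore(taken, depth + 1))
    return paths

paths = explore()
for outcome, input_range in paths:
    print(outcome, input_range)
```

A fuzzer would need to guess one of only nine values out of 256 to hit this crash, whereas the symbolic approach derives the whole crashing range directly from the branch conditions.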
Symbolic execution is a complex program analysis technique, which can easily be its own series of blog posts. However, symbolic execution can lead to a number of sophisticated automated analysis tools, such as automated exploit generation. If you’re interested in using these tools, we’d recommend reading some of the research literature to learn more.
Unfortunately, most symbolic execution frameworks require customization for the user’s specific purpose. Compared to the other tools listed in this blog post, the symbolic execution frameworks listed here require a substantial time investment in order to use effectively. These tools provide frameworks for developing crash analysis tools, rather than ready-made applications.
Angr
Angr is a Python framework for analyzing binaries. Angr provides symbolic execution capabilities as well as a number of other analyses. An analyst can use these capabilities to build tools which can greatly expedite crash analysis and even some portions of exploit development. Angr is largely architecture independent and is capable of analyzing i386, AMD64, ARM, MIPS, and PowerPC programs.
Manticore
Manticore is another Python framework that provides binary analysis. Manticore supports symbolic execution, taint analysis, and a number of other powerful program analyses. As compared to Angr, Manticore provides a simpler interface with an easier learning curve. Manticore supports Linux i386, AMD64, and ARM programs.
Triton
Triton is a dynamic binary analysis framework. It provides components such as a symbolic execution engine, a taint analysis engine, abstract syntax tree representations of the processor instruction set semantics, SMT simplification passes, an SMT solver interface, and, last but not least, Python bindings. Based on these components, an analyst can build powerful program analysis tools, automate reverse engineering, and perform crash analysis.
Manual Analysis
While automated analysis can provide a large amount of information about a crash, at some point a human is required to look at the crash (at least for now). Typically, the goal of this step is for the analyst to understand the crash. This takes different forms depending on the analyst’s goal, but it typically ends with a detailed write-up, a patch for the affected software, or a proof-of-concept input file that illustrates exploitability.
Disassemblers
In order to see the processor instructions that make up the target program, a disassembler is required. Disassemblers convert the program’s machine code back into instruction mnemonics that are readable by a human. These instructions can then be reviewed to determine what exactly is happening in the crash. While disassemblers as simple as objdump can be useful, interactive disassemblers provide a much more effective interface for reviewing a program’s disassembly. Interactive disassemblers allow the user to browse the disassembly graphically, add comments, rename variables, and provide a number of other helpful tools. The industry standard (and most expensive) disassembler is IDA Pro, but several other promising alternatives exist such as Binary Ninja, Hopper, and radare2.
Debuggers
While disassemblers allow an analyst to statically analyze a program, debuggers are used to dynamically analyze the program. There are a large number of different debuggers available that provide a host of different features. Typically, the debugger that is used varies with the host operating system, such as GDB for Linux, WinDbg for Windows, and LLDB for macOS. However, there are several alternative debuggers that have extended features.
pwndbg, GEF, and PEDA
Rather than creating a completely new debugger, several projects attempt to add features to GDB and customize it to aid in vulnerability research, exploit development, and reverse engineering. pwndbg, GEF, and PEDA are three examples of this type of project. These projects help the manual analysis process by adding extra commands with features such as heap inspection and instruction emulation. Additionally, these projects customize the GDB layout to more clearly show the memory and register contents, and the program’s code flow.
rr
rr is a debugger that records the program’s execution and then allows the user to deterministically replay the execution as often as needed. Because this execution is deterministic, the same addresses and any randomized parameters will be chosen every time the target program is run. Additionally, rr supports efficient reverse execution, which can be used to analyze the program’s memory before the crash happens. This allows the user to more quickly understand how things went wrong and what caused the crash.
QIRA
QIRA is a “timeless” debugger. It records the registers and memory contents after each instruction and allows the user to quickly view the state of the program at any point in its execution. QIRA can answer questions such as “At what points in the program was this variable changed?” and “What was the value of this register each time the instruction at a specific address was executed?” QIRA provides a web-based graphical interface for accessing the program’s state, as well as integration with the IDA Pro disassembler.
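To illustrate the kind of questions a recorded trace can answer, here is a small sketch over a hand-written trace. It does not reflect QIRA’s storage format or API; the Step record, the register names, and the addresses are assumptions made up for the example.

```python
from typing import Dict, List, NamedTuple

class Step(NamedTuple):
    index: int                 # position in the recorded execution
    address: int               # address of the instruction that just ran
    registers: Dict[str, int]  # register contents after that instruction

def changes_of(trace: List[Step], reg: str) -> List[int]:
    """Indices of the steps after which reg differs from the previous step."""
    changes = []
    previous = None
    for step in trace:
        value = step.registers[reg]
        if previous is not None and value != previous:
            changes.append(step.index)
        previous = value
    return changes

def values_at(trace: List[Step], address: int, reg: str) -> List[int]:
    """Value of reg each time the instruction at address was executed."""
    return [step.registers[reg] for step in trace if step.address == address]

# Hypothetical two-iteration loop.
trace = [
    Step(0, 0x1000, {"rax": 0, "rcx": 3}),
    Step(1, 0x1004, {"rax": 1, "rcx": 3}),
    Step(2, 0x1000, {"rax": 1, "rcx": 2}),
    Step(3, 0x1004, {"rax": 2, "rcx": 2}),
]
print(changes_of(trace, "rax"))
print(values_at(trace, 0x1004, "rax"))
```

Because the whole execution is recorded, both queries are plain lookups and need no re-running of the target.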
Exploit Development
Like the previous steps, exploit development is a large and complex topic. Further, the tools required vary based on the platform and nature of the bug being exploited. However, there are a few generic tools and frameworks that can hasten exploit development.
ROP Chain Development
One common task in exploit development is the creation of a Return-Oriented Programming (ROP) chain that sets up arbitrary code execution. This task involves isolating the return, call, and jump instructions in a target binary and looking for useful snippets of code before them. Several tools exist to help find and use these code snippets. Some of the more advanced tools also try to connect these code snippets together to form larger chains that are easier to use.
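A very rough sketch of the first step, finding candidate gadgets, is shown below. It only looks for the x86 ret opcode (0xC3) and returns the raw bytes in front of it; it does not handle call or jump instructions, and it does not disassemble the candidates to check that they decode to useful instructions, which real tools do. The function name and base address are hypothetical.

```python
from typing import List, Tuple

RET = 0xC3  # x86 near return

def find_ret_gadgets(code: bytes, base: int = 0, max_len: int = 4) -> List[Tuple[int, bytes]]:
    """Return (address, bytes) for every byte run of up to max_len bytes ending in ret."""
    gadgets = []
    for end, byte in enumerate(code):
        if byte != RET:
            continue
        for length in range(1, max_len + 1):
            start = end - length
            if start < 0:
                break
            gadgets.append((base + start, code[start:end + 1]))
    return gadgets

# On x86-64 these bytes decode as: nop; pop rax; pop rdi; ret
code = bytes.fromhex("90585fc3")
gadgets = find_ret_gadgets(code, base=0x400000)
for address, raw in gadgets:
    print(hex(address), raw.hex())
```

Each candidate starts at a different offset, so the same ret can yield several gadgets, including ones that begin in the middle of what the compiler intended as a single instruction.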
Exploit Development Frameworks
While each exploit has unique requirements, there can be a lot of common tasks, such as network communication, structure packing, and buffer encoding. To help reduce the development time of exploits, exploit frameworks create an API that provides helper functions for these common tasks. This allows the exploit developer to focus on the exploit-specific components, without having to worry about things like socket programming. Some examples of exploit development frameworks are pwntools and Metasploit.
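As an example of the structure-packing helpers these frameworks provide, here is a standard-library stand-in. The p64 name mirrors the pwntools helper of the same name, but the code below is our own sketch built on struct, and the offset and addresses are made-up values.

```python
import struct
from typing import Iterable

def p64(value: int) -> bytes:
    """Pack an integer as 8 little-endian bytes."""
    return struct.pack("<Q", value)

def build_payload(offset: int, addresses: Iterable[int], filler: bytes = b"A") -> bytes:
    """Pad up to the saved return address, then append the packed addresses."""
    return filler * offset + b"".join(p64(address) for address in addresses)

# Hypothetical values: 40 bytes of padding followed by two addresses.
payload = build_payload(40, [0xDEADBEEF, 0x400123])
print(len(payload), payload[40:48].hex())
```

Small helpers like this remove a whole class of byte-order and off-by-one mistakes from exploit scripts.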
Conclusion
As you can see, triaging the outputs from fuzzing can be a rather involved process. Thankfully, many tools exist to ease this burden. This blog post lists some of the ones that we’ve found useful during our work investigating crashes at GRIMM. Hopefully, this list will direct you to the right tools for your needs.
If you’d like help finding, triaging, or exploiting bugs, feel free to contact us.