There are two primary ways to produce a fuzzed data buffer:
-
Automatic generation of fuzzed data
- Mutation of a sample of valid data (mutation template), obtained from a capture or created by some other test automation
Generation can be split into two steps:
-
Generation of a valid mutation template
-
Mutation of the template to produce a fuzzed buffer
Generation vs. Mutation
While it has been said that monkeys typing randomly can produce Hamlet entirely by chance [4], the number of permutations for accomplishing this is NP complete. Such complexity is feasible for simple parsers (see the following Step 1); for complex parsers, however, fuzzers will have to run for an impractically long time. This amount time is necessary for getting deep inside complex parsing code, to create sufficiently valid data to pass the various validations and checks that trigger parsing errors and, thus, prevent data from reaching the inner components of parsers.
A much more efficient technique for creating good fuzzing coverage is to produce fuzzing buffers by mutating well-formed buffers—mutation templates. Each such template is an equivalence class of the input.
For example, consider code that contains the following statement:
if (packet->Referrer && strcmp (packet->Referrer, sMySite) == 0 && packet->UserAgent && packet->UserAgent == uaIE6 && p->WwwAuthenticate && p->WwwAuthenticate != NULL)
{
if (packet->AcceptLanguage && strstr (packet->AcceptLanguage, "en-us") == 0)
{
printf ("%s", packet->AcceptCharset);
}
}
The flaw in this code is the assumption that the AcceptCharset header is always present in the request, if all of the other conditions hold. To find such a flaw, an HTTP request that contains all four headers—Referrer, UserAgent, WwwAuthenticate, and AcceptLanguage, and without the AcceptCharset header—should be attempted. Typically, when AcceptCharset is dereferenced in the call to printf, this code will crash.
If our fuzzing engine is able to add or delete random headers from the request, it will take a long time (if it is at all possible) to generate an HTTP request that has exactly those four headers present. However, if we provide a template buffer with all of the possible HTTP headers present, the fuzzer will very quickly create the bogus request with all of the headers, but without the AcceptCharset header present.
Selecting or Creating the Optimal Set of Mutation Templates
Figure 1. Fuzzing-process details
Selection of the mutation templates can be performed by using the same methodology by which equivalence classes for test matrixes are determined. This equivalence-class determination can be done manually for simple parsers.
A few automation methods to create templates for fuzzing of complex parsers are proposed:
- Automatic template generation, using model-based or pairwise testing. You can define all of the parameters and their relationships that affect the parser, and use a modeling [10] tool or pairwise-testing tool, such as PICT [5], to generate sets of parameters for each template. Then, develop a template-generation tool to create a valid data buffer, according to the parameters that are contained in the model.
- Select an optimal set of templates captured from a production environment, based on the code-coverage difference. You can select a large number of sample templates (that is, capture 100,000 HTTP requests by using a network sniffer or 100,000 files by using a Web crawler). Run the target code with each template as input, recording code-coverage information. Then, sort the templates according to the marginal coverage that they add, and pick up only those templates that add substantial coverage (at least three percent).
- Leverage existing test automation, which already accounts for test matrixes. Then, invoke the fuzzing engine on data that is produced by the test, before delivering it to the tested software.
Fuzzing Engine
One systematic way to produce fuzzed data is by developing a fuzzing mutation engine. The mutation engine should implement the following requirements:
-
Analyze the mutation template, and divide it into a memory tree of groups of fields and individual fields.
-
Randomly select one of the fields or groups, and fuzz it individually in memory.
-
Serialize the memory tree back into a fuzzed buffer.
Figure 2. Fuzzing mutation engine
Analyzing Mutation Template
In order to analyze and parse a mutation template, the fuzzing engine must have knowledge about the data that it represents. The data description can be done either programmatically or by using a data-definition language, such as a schema. The schema might include recursive definitions of groups, elements and their relations, and layering (for example, of field encodings, such as URL encoding or base-64 encoding)—providing information on how to parse each field.
Fuzzing Fields
To fuzz an individual field, the fuzzing engine has to implement a library of fuzzers for each atomic data type. The fuzzers should be implemented to produce values of equivalence classes for each basic data type, which are likely to trigger typical code-defect patterns. For example:
-
For an integer field, a fuzzer can assign boundary values (for example, min_int, max_int, 0, max_int/2, max_int/4, min_int +1, max_int -1).
-
For string values, a fuzzer can inject null characters or format specifiers (for example, “%s”).
-
For a binary field, a fuzzer can flip bits randomly.
Fuzzing Groups
Group fuzzing can be done through manipulation of the fields that a group contains—once again, by following defect patterns. Examples are:
-
Addition of 1,000 fields.
-
Removal of some or all fields.
-
Duplication of existing fields.
Relations
In many data formats, multiple fields in the data buffer are related. One field often governs one or more other fields. For example, Web browsers use the Content-Length header to determine the length of the HTTP message body that follows. If a fuzzer produces an HTTP message body whose length is different from the mutation template, the Content-Length header will not be consistent, and fuzzed data will not be interpreted by the browser as it should be.
Typical relations that are found in the data include the following:
-
One field governs a length of another field.
-
One field governs an offset of another field.
-
One field governs a count of fields in a group.
-
A bit-field value governs the presence of an optional field.
-
A field is a checksum of some part of the buffer.
The mutation engine must be able to read governing field values during template parsing to understand the rest of the buffer. The mutation engine also must update governing fields after fuzzing governed fields.
Encodings
Often, a data buffer contains a recursive structure, in which one field encodes a group of other fields. One example is a mail message that contains an attachment that is encoded with base-64 encoding. To parse the attachment structure, the mutation engine must decode the encapsulated structure, and then continue parsing the inner decoded structure. After fuzzing has been performed on one of the inner fields, the structure should be encoded back (for example, by using base-64, signing, or encryption) and embedded in the fuzzed buffer.
Delivering the Fuzzed Buffer to the Tested Software
After the fuzzed buffer has been created, it must be delivered to the tested software. Wherever possible, this should be done in a stress-like fashion—doing so concurrently from multiple threads in a loop (many fuzzing bugs manifest only under stress). The data delivery itself is dependent on tested software. It can be as simple as sending fuzzed data in a single network packet or opening fuzzed data as a file. In more complex cases, special tools are needed—for example, to send IOCTLs or call APIs using fuzzed data as parameters.