comby-decomposer is a tool that automatically decomposes program syntax into
(1) a set of templates and (2) a set of concrete fragments extracted from the
syntax. Basically, it un-substitutes some input file to create a corpus of
templates and fragments.
Here's a visual example of outputs when we decompose a function with respect to parentheses (...) and braces {...}.
You can then use mix-and-match templates and fragments and substitute these back in to create new combinations of output "programs". It's been most useful for fuzzing compilers.
- Install
comby - Install
fdupesfor your distribution (typicallysudo apt-get install fdupesorbrew install fdupes)
Copy the programs you want to decompose into the sources directory. If you don't have any programs ready, you
can try it with some example programs in this repository:
cp try_example_solidity/* sourcesThen, just run:
./run.sh .sol .sol
This says "decompose programs for all .sol files in the sources directory using the .sol language (defined by comby -list).
You get these outputs:
fragmentscontains the extracted fragmentstemplatescontains the templatized versions of the input string.
These are deduplicated and unique. You can try the same for example rust programs:
cp try_example_rust/* sources
./run.sh .rs .rsand check out examples in templates and fragments!
By default programs are decomposed around delimiter syntax like (...),
{...} and [...]. This means that comby-decomposer looks for this syntax
and un-substitutes the program around these patterns. It will also look for nested syntax inside these. You can use any comby
pattern you like though! I These patterns are defined inside
extraction_specifications using a simple configuration file and you'll find the default ones
for parentheses, braces, and brackets in there already. But if you wanted to
un-substitute around numbers in the input program, you can simply add a
specification like extract_number.toml that matches the regex [0-9]+ like this:
[extract_number]
match=':[~[0-9]+]'
(Tip: the string :[~[0-9]+] is just using the comby syntax :[...] to define a regular expression [0-9]+)
Now if you delete the other specifications and ./run.sh .sol .sol again on
the example program, you'll only get one decomposition (since there's only one
number to un-substitute), and your templates and fragments folders contain
these:
templates
function f(uint:[1] arg) public {
g(notfound);
}fragments
256
If you have many or large programs to decompose, you can run ./postprocess.sh after ./run.sh. This will:
- only keep templates and fragments that are 8KB to 28KB in size
- delete templates with no holes
- delete templates with more than 10 holes
Look inside the postprocess.sh to customize it.
If you have templates and fragments and want to randomly generate new
programs by substituting fragments into templates, you can use the included
server. Just run these commands:
npm
86B5
install express body-parser @iarna/toml minimist
export NODE_OPTIONS="--max-old-space-size=8192"
node server.js --generate 1.0Then in a separate terminal, request a new input:
curl -X POST http://localhost:4448/mutate
Every request generates a random input. The server picks a random template from
the templates directory, and substitutes up to 10 random fragments from
the fragments directory. If you want more control over server generation
you'll have to dig into server.js.
See https://github.com/agroce/afl-compiler-fuzzer#01-usage to use the server with an AFL fuzzer
Comby supports a lot of languages (see comby -list for supported ones),
but even if you just created one, you'll probably get some mileage from using
the generic parser and default patterns: ./run .my-language-extension .generic.
comby-decomposer was created to generate inputs to fuzz compilers. We can substitute
and recombine templates and fragments to create new inputs.
If you want to use it with an AFL fuzzer that's been modified to request inputs from the server, see https://github.com/agroce/afl-compiler-fuzzer#01-usage.
The fun part is comby-decomposer is
syntax-aware and works on most any language (or structured input like JSON) to
some degree, and you can get started right away: You just need some example
programs for your program. No need to define a grammar to generate inputs, or
know exactly how the grammar of your target language works. That's because it
just uses Comby to do the decomposition
work.
The basic idea is that comby-decomposer creates inputs that are much more
likely to be syntactically valid expressions inside previously valid (now
templatized) programs. Generating inputs with templates and likely-valid
expressions help nudge a feedback-directed fuzzer (like AFL) to explore
"deeper" code in a compiler earlier than, say, just starting with a
compiler's initial test suite. So, instead of using only an initial test
corpus, comby-decomposer helps generate templates and fragments for
synthesizing new test programs on demand. It banks on the idea that these
inputs are unlikely to be generated early (or at all) by default input
mutations in a fuzzer's feedback loop.
For example, comby-decomposer generated this Solidity program that sent the compiler into an infinite loop:
contract C {
bytes32 constant x = x;
function f() public pure returns (uint t) {
assembly {
t := x
}
}
}It also generated new valid-looking programs using the Zig compiler's test suite to find crashes deeper in the analysis and code generation phases of the compiler:
export fn entry() void {
var x = false;
const p = @ptrCast(?*u0, &x);
return p < null;
}You can find more Zig examples here.