NOTE: This is a Paper Reading for Topics in Programming Languages: Automated Testing, Bug Detection, and Program Analysis. The original paper can be found here.

What is the problem being tackled?

The paper tackles the problem of generating random, syntactically valid inputs to exercise various code paths in the semantic analysis stages of programs and leveraging feedback to generate new inputs via mutations.

How was it addressed by prior work?

On one hand, QuickCheck-like random-input generators allow generating random, syntactically valid inputs. On the other hand, coverage-guided fuzzing tools such as AFL and libFuzzer randomly mutate known byte sequences to produce new byte sequences, and if the mutated byte sequences lead to new code coverage in the test program, they are saved for subsequent mutation.

What are the innovation(s) proposed in this paper?

The paper proposes Zest, a technique for automatically guiding QuickCheck-like random-input generators to exercise various code paths in the semantic analysis stages of programs. It first converts a QuickCheck-like random-input generator to a parametric generator, which can generate a syntactically valid input from a byte sequence. It then uses a coverage-guided fuzzing technique with the parametric generator in order to produce syntactically valid input that can increase code coverage in the semantic analysis stages.

How are those innovations evaluated? How does the paper's evaluation match with the proposed problem statement?

The authors integrated Zest into the open-source JQF framework and evaluated Zest on five real-world Java benchmarks, comparing it to QuickCheck and AFL. They evaluated the three techniques on two fronts:

The amount of code coverage achieved in the semantic analysis stage after a fixed amount of time.
Their effectiveness in triggering bugs in the semantic analysis stage.

QuickCheck and Zest make use of generators for synthesizing syntactically valid input, and do not exercise code paths corresponding to parse errors in the syntax analysis stage. In contrast, AFL performs mutations directly on raw input strings, and spends most of its time testing error paths within the syntax analysis stages.

The experimental results suggest that when given QuickCheck-like random-input generators, Zest excels at exercising semantic analyses and is very effective at discovering semantic bugs.

The paper's evaluation matches well with the proposed problem statement, as the experimental design accurately assesses factors directly correlated with the problem of "generating random, syntactically valid inputs to exercise various code paths in the semantic analysis stages of programs and leveraging feedback to generate new inputs via mutations", and the experimental results support the effectiveness of the proposed approach.

Which technical innovations are most compelling to you?

The most compelling technical innovation is Zest's design of generating a syntactically valid input from a byte sequence given a QuickCheck-like random-input generator, by using bytes from the byte sequence to "fill in" randomly generated primitive data types of various length (bool, char, int, etc.) required within the random-input generator. This allows bit-level mutations on byte sequences to correspond to high-level structural mutations in the space of syntactically valid inputs, enabling Zest to leverage the mature coverage-guided fuzzing algorithm originally designed for byte sequence inputs.

What remains unclear after reading the paper? Are there any clarification questions whose answers would substantially change your opinion of the paper?

The author states that the Zest algorithm "extends the CGF algorithm by keeping track of the coverage achieved by semantically valid inputs", and that "we hypothesize that this biases the search towards generating even more valid inputs and in turn increases code coverage in the semantic analysis stage". However, how semantically valid inputs are used is not stated in the description of the algorithm.

Which problems remain unsolved after this paper? Do you foresee any barriers to the applicability of the technique proposed in the paper? If so, how could these barriers be overcome?

Zest assumes the availability of QuickCheck-like random-input generators to exercise the semantic analysis classes and find semantic bugs, which may be unavailable for specialized data structures. There has also been some recent interest in automatically generating input grammars from existing inputs, using machine learning and language inference algorithms. These techniques are complementary to Zest - the grammars generated by these techniques could be transformed into parametric generators for Zest.

Paper Reading > Topics in Programming Languages: Automated Testing, Bug Detection, and Program Analysis

Paper Reading: Semantic Fuzzing with Zest

https://abbaswu.github.io/2022/10/11/Paper-Reading-Semantic-Fuzzing-with-Zest/

Author

Jifeng Wu

Posted on

October 11, 2022

Licensed under

Paper Reading: QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing Previous

Paper Reading: How We Refactor, and How We Know It Next