LAVA: Large Scale Automated Vulnerability Addition
Evaluating and improving bug-finding tools is currently difficult due to a shortage of ground-truth corpora (i.e., software that has known bugs with triggering inputs). LAVA attempts to solve this problem by automatically injecting bugs into software. Every LAVA bug is accompanied by an input that triggers it, whereas normal inputs are extremely unlikely to do so. These vulnerabilities are synthetic but, we argue, still realistic, in the sense that they are embedded deep within programs and are triggered by real inputs. Our work forms the basis of an approach for generating large ground-truth vulnerability corpora on demand, enabling rigorous tool evaluation and providing a high-quality target for tool developers.
LAVA is the product of a collaboration between MIT Lincoln Laboratory, NYU, and Northeastern University.
At a high level, LAVA adds bugs to programs in the following manner. Given an execution trace of the program on some specific input, we:
1) Identify locations in the execution trace where input bytes are available that do not determine control flow and have not been modified much. We call these quantities DUAs, for Dead, Uncomplicated, and Available data.
2) Find potential attack points that occur after a DUA in the program trace. Attack points are source code locations where a DUA could be used to make the program vulnerable, if only the DUA's value were available there as well.
3) Add code to the program to make the DUA value available at the attack point and use it to trigger the vulnerability.
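The three steps above can be sketched in Python on a toy trace. This is a minimal illustration under stated assumptions, not LAVA's implementation: each trace value is assumed to already carry the input byte offsets it derives from, a count of branches those bytes have influenced (for "dead"), and a count of operations applied to them (for "uncomplicated"). The thresholds, the 4-byte magic-value trigger, the source locations, and all names (`TraceValue`, `is_dua`, `find_bugs`, `triggering_input`, `injected_check`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds; LAVA's real criteria may differ.
MAX_LIVENESS = 0     # branches the bytes have already influenced ("dead" means none)
MAX_COMPLEXITY = 1   # operations applied to the bytes since they were read
MAGIC = 0x6C617661   # hypothetical 32-bit trigger value ("lava" in ASCII)


@dataclass
class TraceValue:
    step: int                  # position in the execution trace
    location: str              # source location (illustrative only)
    offsets: Tuple[int, ...]   # input byte offsets the value is derived from
    liveness: int
    complexity: int


@dataclass
class AttackPoint:
    step: int
    location: str


@dataclass
class Bug:
    dua: TraceValue
    attack_point: AttackPoint


def is_dua(v: TraceValue) -> bool:
    # Step 1: keep 4-byte input-derived values that are dead and uncomplicated.
    return len(v.offsets) == 4 and v.liveness <= MAX_LIVENESS and v.complexity <= MAX_COMPLEXITY


def find_bugs(values: List[TraceValue], points: List[AttackPoint]) -> List[Bug]:
    # Step 2: pair each DUA with every attack point that occurs later in the trace.
    duas = [v for v in values if is_dua(v)]
    return [Bug(d, p) for d in duas for p in points if p.step > d.step]


def triggering_input(seed: bytes, bug: Bug) -> bytes:
    # Step 3, input side: write MAGIC (little-endian) into the DUA's input bytes.
    data = bytearray(seed)
    for i, off in enumerate(bug.dua.offsets):
        data[off] = (MAGIC >> (8 * i)) & 0xFF
    return bytes(data)


def injected_check(data: bytes, bug: Bug) -> bool:
    # Step 3, program side: the added code carries the DUA's value to the
    # attack point and fires the vulnerability only when it equals MAGIC.
    value = sum(data[off] << (8 * i) for i, off in enumerate(bug.dua.offsets))
    return value == MAGIC


values = [
    TraceValue(step=3, location="parse.c:10", offsets=(8, 9, 10, 11), liveness=0, complexity=0),
    # Influences two branches, so it is not dead and is rejected.
    TraceValue(step=5, location="parse.c:20", offsets=(0, 1, 2, 3), liveness=2, complexity=0),
]
points = [AttackPoint(step=2, location="parse.c:5"), AttackPoint(step=7, location="util.c:30")]
bugs = find_bugs(values, points)
seed = bytes(16)
trigger = triggering_input(seed, bugs[0])
print(len(bugs), bugs[0].attack_point.location,
      injected_check(seed, bugs[0]), injected_check(trigger, bugs[0]))
# prints: 1 util.c:30 False True
```

The attack point at step 2 is discarded because it precedes the DUA, and the original seed input does not fire the bug while the derived triggering input does. Pairing every DUA with every later attack point is simply the most direct reading of step 2; a practical tool would likely also need to limit and validate the resulting pairs.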