VDiscover: predict vulnerability discovery of binary-only programs
VDiscover is a tool designed to train a vulnerability detection predictor. Given a vulnerability discovery procedure and a large enough number of training test cases, it extracts lightweight features to predict which test cases are potentially vulnerable. This repository contains an improved version of a proof-of-concept used to show experimental results in our technical report (available here).
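Install the dependencies, then clone the repository and install VDiscover for the current user:
apt-get install python-numpy python-matplotlib python-setuptools python-scipy
git clone https://github.com/CIFASIS/VDiscover.git
cd VDiscover
python setup.py install --user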
By default, this local installation places the VDiscover command-line utilities in ~/.local/bin, so it is recommended to add that directory to the PATH variable. The tool is composed of two main components:
- fextractor: to extract dynamic and static features from test cases.
- vpredictor: to train a new vulnerability prediction model or to predict using a previously trained one. It can also be used to cluster and visualize a set of test cases.
fextractor is a Python script that performs static and dynamic feature extraction from a test case. To learn more about the features, consult the technical report, or read the VDiscover source code to see exactly how they are extracted.
Static features are meant to capture information relevant to the whole program, and they are obtained without running the code on particular inputs. The lightweight approach we implemented performs random walks over the approximate control flow graph of the binary, collecting sequences of potential C standard library calls. Parameters such as --max-subtraces-collected and --max-subtraces-explored control how many call sequences are sampled from the control flow graph.
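The sampling idea can be illustrated with a toy Python sketch. It is not VDiscover's implementation: the control flow graph is a hand-written dictionary, the function name is hypothetical, and the meaning given to max_subtraces_explored (number of random walks attempted) and max_subtraces_collected (maximum number of distinct call sequences kept) is an assumption modelled on the command-line options above.

```python
import random


def sample_call_sequences(cfg, calls, entry, max_subtraces_collected,
                          max_subtraces_explored, max_depth=20, seed=0):
    """Sample libc call sequences by random walks over a toy CFG (illustrative only).

    cfg:   dict mapping a basic block to the list of its successor blocks
    calls: dict mapping a basic block to the libc function it calls, if any
    """
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    collected = set()
    for _ in range(max_subtraces_explored):      # assumed: number of walks tried
        if len(collected) >= max_subtraces_collected:  # assumed: cap on kept sequences
            break
        node, trace = entry, []
        for _ in range(max_depth):               # bound the walk so loops terminate
            if node in calls:
                trace.append(calls[node])
            successors = cfg.get(node, [])
            if not successors:
                break                            # reached an exit block
            node = rng.choice(successors)
        if trace:
            collected.add(tuple(trace))
    return sorted(collected)


if __name__ == "__main__":
    cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["c"], "c": ["exit"]}
    calls = {"a": "strcpy", "b": "malloc", "c": "free"}
    print(sample_call_sequences(cfg, calls, "entry", 10, 50))
```

Random walks trade completeness for speed: on a real binary the number of paths explodes, so bounding both the walks attempted and the sequences kept keeps extraction lightweight.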
Dynamic features are meant to capture a sample of a program's behaviour in terms of its concrete sequence of calls to the C standard library, together with the final state of the execution. They are extracted by running a test case for a limited time, hooking program events and collecting them in a sequence. Since an execution can involve several modules, command-line options are provided to include or ignore them during extraction.
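The "limited time" and "final state" parts of this process can be sketched with Python's standard subprocess module. This hypothetical helper does not hook any library calls; it only runs a command with a timeout and maps the outcome to a coarse state label, whose format is made up for illustration.

```python
import signal
import subprocess
import sys


def run_with_limit(argv, timeout=5.0):
    """Run a test case for at most `timeout` seconds and return a coarse final state."""
    try:
        proc = subprocess.run(argv, stdin=subprocess.DEVNULL,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timeout"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the process was killed by a signal.
        return "signal:" + signal.Signals(-proc.returncode).name
    return "exit:%d" % proc.returncode


if __name__ == "__main__":
    print(run_with_limit([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

Distinguishing a clean exit, a signal such as SIGSEGV and a timeout is the kind of final-state information that is useful alongside the call sequence.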
Including or ignoring modules
By default, dynamic features are extracted by analyzing all the libraries used by a program. However, modules can be included or ignored using --inc-mods and --ign-mods, respectively. Modules should be listed in a file, one per line, and every line is matched against all the libraries linked by the program. For example, to include or ignore libjpeg.so.8, it should be enough to add the following line:
Libraries loaded at run time (e.g. through dlopen) are not supported, although support should be relatively easy to implement.
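A minimal sketch of how such a module list could be applied. The README does not specify the matching rule, so plain substring matching, the order of the include and ignore checks, and the function names are all assumptions made for illustration.

```python
def parse_module_patterns(lines):
    """Return one pattern per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in lines if line.strip()]


def read_module_patterns(path):
    """Read a module list file (one module per line)."""
    with open(path) as f:
        return parse_module_patterns(f)


def select_modules(linked_libraries, include=None, ignore=None):
    """Filter linked libraries; with no lists given, every library is kept.

    Assumption: a pattern matches a library if it is a substring of its path,
    and the include list is applied before the ignore list.
    """
    selected = []
    for lib in linked_libraries:
        if include is not None and not any(p in lib for p in include):
            continue
        if ignore is not None and any(p in lib for p in ignore):
            continue
        selected.append(lib)
    return selected


if __name__ == "__main__":
    libs = ["/lib/x86_64-linux-gnu/libc.so.6",
            "/usr/lib/x86_64-linux-gnu/libjpeg.so.8"]
    print(select_modules(libs, ignore=["libjpeg.so.8"]))
```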
vpredictor is a Python script to train a new vulnerability prediction model or to predict using a previously trained one.
For either training or prediction, the input data must be provided in CSV format, delimited by tabs ("\t"). Conveniently, fextractor already writes its output in this format.
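A minimal sketch of reading such a file with Python's csv module. The column layout (program path first, then a column of space-separated feature tokens, then any further columns such as a label) is an assumption inferred from the sample output shown below; the function name is hypothetical.

```python
import csv
import io


def read_feature_rows(stream):
    """Yield (program, feature_tokens, extra_columns) from tab-delimited rows."""
    # QUOTE_NONE: treat quote characters inside feature tokens as ordinary text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        program = row[0]
        tokens = row[1].split() if len(row) > 1 else []
        yield program, tokens, row[2:]


if __name__ == "__main__":
    sample = io.StringIO("/usr/bin/bc\tisatty:0=Num32B0 setvbuf:0=Ptr32\n")
    for program, tokens, extra in read_feature_rows(sample):
        print(program, len(tokens), extra)
```

For a real file, pass an object opened with open(path, newline=""), as the csv documentation recommends.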
By default, vpredictor works in prediction mode, so a previously trained model is mandatory and must be specified with the --model command-line option.
For example, dynamic features of the bc calculator can be extracted with:
fextractor --dynamic bc
The resulting extracted features are:
/usr/bin/bc isatty:0=Num32B0 isatty:0=Num32B8 setvbuf:0=Ptr32 setvbuf:1=NPtr32 setvbuf:2=Num32B8 setvbuf:3=Num32B0 …
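The tokens appear to follow a call:argument-index=value-class pattern (e.g. setvbuf:2=Num32B8). Under that assumption, inferred only from this sample, a line can be split and summarized as in the following sketch; the function names are hypothetical and tokens with any other shape are skipped.

```python
import re
from collections import Counter

# Assumed token shape, inferred from the sample output: <call>:<arg index>=<value class>
TOKEN_RE = re.compile(r"^(?P<call>[^:=\s]+):(?P<index>\d+)=(?P<value>\S+)$")


def parse_feature_token(token):
    """Return (call, arg_index, value_class), or None if the token has another shape."""
    m = TOKEN_RE.match(token)
    if m is None:
        return None
    return m.group("call"), int(m.group("index")), m.group("value")


def summarize(line):
    """Return the program path, the parsed tokens and the number of tokens per call."""
    program, *tokens = line.split()
    parsed = [p for p in (parse_feature_token(t) for t in tokens) if p is not None]
    # Counts argument tokens, not distinct calls: setvbuf with 4 arguments counts 4.
    tokens_per_call = Counter(call for call, _, _ in parsed)
    return program, parsed, tokens_per_call


if __name__ == "__main__":
    line = ("/usr/bin/bc isatty:0=Num32B0 isatty:0=Num32B8 setvbuf:0=Ptr32 "
            "setvbuf:1=NPtr32 setvbuf:2=Num32B8 setvbuf:3=Num32B0")
    print(summarize(line)[2])
```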
This raw data can be used to train a new vulnerability prediction model or predict using a previously trained one. Additionally, more detailed (but outdated) documentation is available here.
Copyright (C) 2013 Neuromancer