CodeAlchemist: Semantics-aware Code Generation for Finding JS engine Vulnerabilities

CodeAlchemist

CodeAlchemist is a JavaScript engine fuzzer that improves on classic grammar-based JS engine fuzzers with a novel test case generation algorithm called semantics-aware assembly. The details of the algorithm are in our paper, “CodeAlchemist: Semantics-Aware Code Generation to Find Vulnerabilities in JavaScript Engines”, which appeared in NDSS 2019. This is a stable version of CodeAlchemist, and it currently supports ChakraCore, V8, SpiderMonkey, and JavaScriptCore.

CodeAlchemist Architecture

At a high level, CodeAlchemist takes as input a JS engine under test, a set of JS seed files, and a set of user-configurable parameters, and it outputs a set of bugs found in the engine.
CodeAlchemist consists of three major components: SEED PARSER, CONSTRAINT ANALYZER, and ENGINE FUZZER.

The SEED PARSER module breaks the given JS seeds into a set of code bricks. The CONSTRAINT ANALYZER module then infers an assembly constraint for each code brick and annotates the brick with it; the annotated bricks constitute a code brick pool. Finally, the ENGINE FUZZER module assembles code bricks from the pool based on their assembly constraints to generate test cases, and executes the generated test cases against the target JS engine.
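
To make the idea of a code brick annotated with an assembly constraint more concrete, here is a minimal Python sketch. It is not CodeAlchemist's actual data model: the `CodeBrick` class, the `uses`/`defines` maps, the type names, and the helper functions are all assumptions made for illustration. The sketch models a constraint as the symbols a brick reads (with the types they must have) and the symbols it defines (with the types they end up with).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CodeBrick:
    """A JS fragment plus a hypothetical assembly constraint."""
    source: str
    # Symbols the brick reads, mapped to the type each must already have.
    uses: Dict[str, str] = field(default_factory=dict)
    # Symbols the brick defines, mapped to the type each has afterwards.
    defines: Dict[str, str] = field(default_factory=dict)


def can_append(env: Dict[str, str], brick: CodeBrick) -> bool:
    """True if every symbol the brick uses is in env with the expected type."""
    return all(env.get(sym) == ty for sym, ty in brick.uses.items())


def append(env: Dict[str, str], brick: CodeBrick) -> Dict[str, str]:
    """Return a new environment that also holds the symbols the brick defines."""
    return {**env, **brick.defines}


# A tiny hand-written pool; a real pool would be derived from seed files.
pool = [
    CodeBrick("var s0 = [1, 2, 3];", defines={"s0": "array"}),
    CodeBrick("var s1 = s0.length;", uses={"s0": "array"}, defines={"s1": "number"}),
    CodeBrick("var s2 = s1.toFixed(2);", uses={"s1": "number"}, defines={"s2": "string"}),
]
```

Under these assumptions, `pool[1]` cannot be placed first because `s0` is not yet defined, but it becomes eligible once `pool[0]` has been appended.
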
1) SEED PARSER: This module first parses each JS seed into an AST based on the ECMAScript language specification [9]. The Parse function returns an AST for a given seed as long as the seed is syntactically correct. To retain only semantically unique code bricks, the Split function breaks the ASTs into code bricks and normalizes the symbols in them. Every resulting code brick represents a valid AST, although it is not yet tagged with an assembly constraint.
2) CONSTRAINT ANALYZER: This module determines an assembly constraint for each of the code bricks. First, the Analyze function identifies which symbols are used and defined in each code brick using a classic data-flow analysis [1]. The Instrument function then traces the types of the variables by dynamically instrumenting the code bricks. As a result, CONSTRAINT ANALYZER returns a set of code bricks, each of which is tagged with an assembly constraint. We call such a set a code brick pool, which is later used to generate test cases, i.e., JS code snippets, for fuzzing.
3) ENGINE FUZZER: Now that we have code bricks to play with, the ENGINE FUZZER module uses them to fuzz the target JS engine. Specifically, the Generate function iteratively assembles code bricks based on their assembly constraints in order to generate test cases. It also takes a set of user-configurable parameters that adjust how the code bricks are combined (see §V-F of the paper). Finally, the Execute function runs the target JS engine on the generated test cases. If the engine crashes, it stores the corresponding test case (a JS file) on the file system.
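
The Generate and Execute steps described above can be sketched roughly as follows. This is an illustrative Python approximation, not the project's implementation: the `CodeBrick` shape, the `max_bricks` parameter, the `engine_cmd` argument, and the crash heuristic (a process killed by a signal on POSIX) are all assumptions chosen to keep the example small.

```python
import os
import random
import subprocess
import tempfile
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CodeBrick:
    source: str
    uses: Dict[str, str] = field(default_factory=dict)     # symbol -> required type
    defines: Dict[str, str] = field(default_factory=dict)  # symbol -> resulting type


def generate(pool: List[CodeBrick], max_bricks: int, rng: random.Random) -> str:
    """Assemble up to max_bricks bricks whose used symbols are already defined."""
    env: Dict[str, str] = {}
    lines: List[str] = []
    for _ in range(max_bricks):
        # Only bricks whose constraint is satisfied by the program so far qualify.
        candidates = [b for b in pool
                      if all(env.get(s) == t for s, t in b.uses.items())]
        if not candidates:
            break
        brick = rng.choice(candidates)
        lines.append(brick.source)
        env.update(brick.defines)
    return "\n".join(lines)


def execute(engine_cmd: List[str], test_case: str, crash_dir: str,
            timeout_s: float = 5.0) -> Optional[str]:
    """Run the engine on test_case; keep the JS file and return its path on a crash."""
    os.makedirs(crash_dir, exist_ok=True)
    fd, path = tempfile.mkstemp(suffix=".js", dir=crash_dir)
    with os.fdopen(fd, "w") as f:
        f.write(test_case)
    try:
        proc = subprocess.run(engine_cmd + [path], capture_output=True,
                              timeout=timeout_s)
        crashed = proc.returncode < 0  # POSIX: negative means killed by a signal
    except subprocess.TimeoutExpired:
        crashed = False  # this sketch does not treat hangs as crashes
    if crashed:
        return path
    os.remove(path)  # discard test cases that did not crash the engine
    return None
```

A fuzzing loop would then call `generate` repeatedly and pass each result to `execute` with the command line of the engine under test. Seeding `random.Random` makes a run reproducible, which helps when a crash needs to be regenerated.
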

Install && Use

Copyright (c) 2019 HyungSeok Han, DongHyeon Oh, and Sang Kil Cha at SoftSec, KAIST