Datalog Disassembly v1.8 releases: A fast and accurate disassembler
Datalog Disassembly
A fast disassembler that is accurate enough for the resulting assembly code to be reassembled. The disassembler is implemented using the datalog (souffle) declarative logic programming language to compile disassembly rules and heuristics. The disassembler first parses ELF file information and decodes a superset of possible instructions to create an initial set of datalog facts. These facts are analyzed to identify code locations, symbolization, and function boundaries. The results of this analysis, a refined set of datalog facts, are then translated to the GTIRB intermediate representation for binary analysis and reverse engineering. The GTIRB pretty printer may then be used to print the GTIRB as reassemblable assembly code.
The analysis contains two parts:
- The C++ files take care of reading an ELF file and generating facts that represent all the information contained in the binary.
- src/datalog/*.dl contains the specification of the analyses in datalog. It takes the basic facts and computes likely EAs, chunks of code, etc. The results are represented in GTIRB or can be printed to assembler code using the gtirb-pprinter.
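
The split between fact generation and rule-based analysis can be illustrated with a toy sketch. This is not ddisasm's actual analysis, which is written in souffle and relies on many heuristics. The fact layout (address, size, kind, target), the function name infer_code, and the sample addresses are all hypothetical and exist only for illustration. The sketch starts from a superset of candidate instructions, including overlapping ones, and keeps only those reachable from an entry point, roughly what a pair of datalog rules such as "code(EA2) :- code(EA), successor(EA, EA2)." would compute.

```python
def infer_code(instructions, entry_points):
    """Derive the addresses likely to be code by a naive fixpoint.

    instructions: iterable of hypothetical facts (address, size, kind, target),
    where kind is "fallthrough", "jump", "cond_jump" or "ret".
    """
    by_addr = {ea: (size, kind, target) for ea, size, kind, target in instructions}
    code = set()
    worklist = [ea for ea in entry_points if ea in by_addr]
    while worklist:
        ea = worklist.pop()
        if ea in code:
            continue
        code.add(ea)
        size, kind, target = by_addr[ea]
        successors = []
        # Execution continues at the next address unless control leaves.
        if kind in ("fallthrough", "cond_jump"):
            successors.append(ea + size)
        # Direct jumps also reach their target.
        if kind in ("jump", "cond_jump") and target is not None:
            successors.append(target)
        for nxt in successors:
            if nxt in by_addr and nxt not in code:
                worklist.append(nxt)
    return code


# Superset of decoded candidates: 0x1001 overlaps the instruction at 0x1000,
# and 0x1004 is only reachable through that overlapping candidate.
candidates = [
    (0x1000, 2, "fallthrough", None),
    (0x1001, 3, "fallthrough", None),
    (0x1002, 2, "jump", 0x1006),
    (0x1004, 2, "fallthrough", None),
    (0x1006, 1, "ret", None),
]
likely_code = infer_code(candidates, [0x1000])
```

With these sample facts, the overlapping candidate at 0x1001 and the block at 0x1004 are never derived as code, because no rule application reaches them from the entry point.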
Changelog v1.8
- Prefer LOCAL symbols over GLOBAL ones when selecting symbols for symbolic expressions for ISAs other than MIPS.
- Support GTIRB sections with holes (byte intervals only covering part of the section).
- Use pre-existing code blocks as hints when disassembling a RAW binary.
- Better data access computation for MIPS binaries.
- Detect incremental linking regions in PE binaries.
- Create elfStackSize and elfStackExec auxdata from ELF PT_GNU_STACK segments.
- In PE binaries, every exported code symbol is considered a function entry.
- Fixed bug where elfSymbolTabIdxInfo aux data could refer to non-existent UUIDs.
- Fixed unrecognized tls_get_addr pattern that could result in missed symbolic expressions.
- Binaries with zero-sized OBJECT symbols no longer produce missing code blocks.
- $t symbols in ARM binaries now force creation of Thumb-mode code blocks.
- In PE binaries, duplicate imports no longer create duplicate symbols.
- Added pattern to match missed symbolic data in pointer arrays.
- Fix symbols associated to functions (Auxdata functionNames) for PE binaries when Ddisasm is run with option -F.
- Requires gtirb >=2.0.0, gtirb-pprinter >=2.0.0
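
As an illustration of the elfStackSize and elfStackExec entry above, the following sketch shows where such values could come from in an ELF file. It is a minimal example, not ddisasm's implementation: it handles only little-endian ELF64 images, and it assumes that the stack size corresponds to the segment's p_memsz field and the executable flag to the PF_X bit of p_flags. The function name stack_properties and the synthetic image are hypothetical.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type for the GNU stack segment
PF_X = 0x1                 # "executable" bit in p_flags

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes


def stack_properties(data):
    """Return stack size and executability from PT_GNU_STACK, or None."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    fields = EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        (p_type, p_flags, _offset, _vaddr, _paddr,
         _filesz, p_memsz, _align) = PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            # Assumption: size taken from p_memsz, exec flag from PF_X.
            return {"elfStackSize": p_memsz, "elfStackExec": bool(p_flags & PF_X)}
    return None


def demo_image():
    """Build a synthetic header-only image with one PT_GNU_STACK entry."""
    ident = b"\x7fELF\x02\x01\x01" + b"\x00" * 9
    ehdr = EHDR.pack(ident, 2, 62, 1, 0, 64, 0, 0, 64, 56, 1, 0, 0, 0)
    phdr = PHDR.pack(PT_GNU_STACK, 0x6, 0, 0, 0, 0, 0x100000, 16)  # flags: read + write
    return ehdr + phdr


props = stack_properties(demo_image())
```

For the synthetic image, the segment is readable and writable but not executable, so the sketch reports a non-executable stack with a size of 0x100000 bytes.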
Install && Use
Copyright (C) 2019