reko: a binary decompiler
Reko (Swedish: “decent, obliging”) is a C# project containing a decompiler for machine code binaries. This project is freely available under the GNU General Public License.
The project consists of front ends, the core decompiler engine, and back ends to help it achieve its goals. A command-line, a Windows GUI, and an ASP.NET front end exist at the time of writing. The decompiler engine receives inputs from the front ends in the form of either individual executable files or decompiler project files. Reko project files contain additional information about a binary file, helpful to the decompilation process or for formatting the output. The decompiler engine then proceeds to analyze the input binary.
Reko has the ambition of supporting decompilation of various processor architectures and executable file formats with minimal user intervention.
Reko consists of a central .NET assembly Reko.Decompiler.dll which contains the central core logic. Leaving aside the user interface for a moment, the Reko can at a glance be considered a pipeline. The first stage of the pipeline loads the executable we wish to decompile. Later stages perform different kinds of analyses, extracting information from the machine language where they can and aggregating it into structured information (such as Procedures and data types). The final stage is the output stage, where the source code is emitted into files.
A central tenet is that Reko is extensible: wherever possible, we strive to avoid hard-coding knowledge about specific platforms, processors, or file formats in the core decompiler. Instead, such special knowledge is farmed out in separate assemblies. Examples:
- Reko.Arch.X86.dll – provides support for disassembling Intel X86 binaries.
- Reko.ImageLoaders.MzExe.dll – understands how to load MS-DOS executable files and all related formats
- Reko.ImageLoaders.Elf.dll – understands the ELF executable file format.
The ImageMap data structure keeps track of the address space of a binary that Reko is analyzing. A binary, after being loaded into memory, consists of one or more ImageSegments, each of which has a starting address and size (or extent). Each image segment also has a reference to the MemoryArea that contains its bytes. You can think of the ImageSegments as functioning like “windows” onto a MemoryArea. Note that there one-to-many relation between MemoryAreas and ImageSegments. A typical PE or ELF executable will have logically distinct segments and memory areas:
while an MS-DOS executable will typically be loaded into a single memory area, and all segments will refer to the same memory area
In addition to maintaining a map of all the segments, the ImageMap also maintains in its Items dictionary the locations of any identified items as Reko performs its analysis. Some executable image file formats contain information like the entry points or symbols identifying machine language procedures or data locations. Reko can use this information to populate the image map before starting its scanning phase. The scanning phase will discover more items and add them to the ImageMap. The user interface also allows the user to add items that Reko can’t discover itself.
- John Källén
- Artur Kuptel
- Павел Томин (Pavel Tomin)
- Robin Eklind
- Lukas Dresel
- Andrew Allen
- Mikael Kalms
- Christian Hostelet
- Vladislav Rassokhin
- Дмитрий Макаров (Dmitri Makarov)