diaphora v3.1 releases: IDA Python BinDiffing plugin
diaphora is the most advanced program diffing tool (working as an IDA plugin) available as of today (2023). It was released first during SyScan 2015 and has been actively maintained since this year: it has been ported to every single minor version of IDA since 6.8 to 8.3.
Diaphora supports versions of IDA >= 7.4 because the code only runs in Python 3.X (Python 3.11 was the last version being tested).
Diaphora has many of the most common program diffing (bindiffing) features you might expect, like:
- Diffing assembler.
- Diffing control flow graphs.
- Porting symbol names and comments.
- Adding manual matches.
- Similarity ratio calculation.
- Batch automation.
- Call graph matching calculation.
- Dozens of heuristics based on graph theory, assembler, bytes, functions’ features, etc…
However, Diaphora has also many features that are unique, not available in any other public tool. The following is a non extensive list of unique features:
- Ability to port structs, enums, unions and typedefs.
- Support for compilation units (finding and diffing compilation units).
- Microcode support.
- Parallel diffing.
- Pseudo-code based heuristics.
- Pseudo-code patches generation.
- Diffing pseudo-codes (with syntax highlighting!).
- Scripting support (for both the exporting and diffing processes).
This is a bug fix release. It fixes the following issues since the previous version:
BUG: It seems that the directory name “database” might conflict with an IDA supplied module also called database.py
BUG: The WHERE clause for the Microcode SPP heuristic was wrong.
BUG: Do not crash in patch diffing sessions if there is no pseudo-code available.
GUI: BUG: The main UI dialog might not be 100% visible with some screen resolutions.
GUI: BUG: Adding manual matches was partially wrong.
GUI: BUG: After closing the “Interesting matches” tab there was no way to show it again (like F3 does with other choosers).
WORKAROUND: Fix for issue #263. It seems IDA might crash if the decompiler is not available and Diaphora tries to use it.
git clone https://github.com/joxeankoret/diaphora.git
To run Diaphora, simply, unpack the compressed distribution file wherever you prefer and directly execute “diaphora.py” from the IDA Pro menu File → Script file. Once the script diaphora.py is executed, a dialog like the following one will be opened:
This dialog, although it can be a bit confusing at first, is used for both exporting the current IDA database to SQLite format as well as for performing diffing against another SQLite exported format database. The first field is the path of the SQLite file format database that will be created with all the information extracted from the current database. The 2nd field is the other SQLite format database to diff the current database against. If this field is left empty, Diaphora will just export the current database to SQLite format. If the 2nd field is not empty, it will diff both databases. The other fields, the check-boxes, are explained below:
- Use the decompiler if available. If the Hex-Rays decompiler is installed with IDA and IDA Python bindings are available, Diaphora will use the decompiler to get much interesting information that will help during the bindiffing process.
- Export only non-IDA generated functions. Self-explanatory, only functions with non-IDA autogenerated names will be exported.
- Do not export instructions and basic blocks. Export only function summaries. When exporting huge databases, it may help speed up operations. However, the diffing capabilities will be more limited.
- Use probably unreliable methods. Diaphora uses many heuristics to try to match functions in both databases being compared. However, some heuristics are not really reliable or the ratio of similarity is very low. Check this box if you want to see also the likely unreliable matches Diaphora my find. Unreliable results are shown in a specific list, it doesn’t mix the “Best results” (results with a ratio of 1.00) with the “Partial results” (results with a ratio of 0.50 or higher) or “Unreliable results”.
- Use slow heuristics. Some heuristics can be quite expensive and take longer. For medium to big databases, it’s disabled by default and is recommended to leave unchecked unless the results from an execution with this option disabled are not good enough. It will likely find better matches than the normal, not that slow, heuristics, but it will take significantly longer.
- Relaxed calculations of different ratios. Diaphora uses, by default, a kind of aggressive method to calculate difference ratios between matches. It’s possible to relax that aggressiveness level by checking this option. Under the hood, the function SequenceMatcher.quick_ratio is used when this option is unchecked and SequenceMatcher.real_quick_ratio when this option is checked. Also, when the option is checked, Diaphora will use to the different ratio of the primes numbers calculated from the AST of the pseudo-code of the 2 functions, calculating the highest ratio from the AST, assembly and pseudo-code comparisons.
- Use experimental heuristics. It says it all: experimental heuristics are enabled only if this check-box is marked. Disabled by default as they are likely not useful.
- Ignore automatically generated names. When performing the comparison between databases, it tells Diaphora to ignore in the “Same name” heuristic functions with the same IDA’s autogenerated name (i.e., when there are two function sub_01020304 in both databases but they aren’t actually the same function). Used only when comparing.
- Ignore all function names. Just disable the “Same name” heuristic. Used only when comparing.
- Ignore small functions. Ignore functions with less than 6 assembly instructions. Used only when comparing.
This is a screenshot of Diaphora diffing the PEGASUS iOS kernel Vulnerability fixed in iOS 9.3.5:
And this is an old screenshot of Diaphora diffing the Microsoft bulletin MS15-034:
Copyright (C) 2015 joxeankoret