Diaphora v3.1.1 released: IDA Python BinDiffing plugin

Diaphora

Diaphora is the most advanced program diffing tool (working as an IDA plugin) available as of today (2023). It was first released during SyScan 2015 and has been actively maintained since then: it has been ported to every minor version of IDA from 6.8 to 8.3.

Diaphora supports IDA versions >= 7.4 because the code only runs on Python 3.x (Python 3.11 is the latest version tested).

Unique Features

Diaphora has many of the most common program diffing (bindiffing) features you might expect, like:

  • Diffing assembler.
  • Diffing control flow graphs.
  • Porting symbol names and comments.
  • Adding manual matches.
  • Similarity ratio calculation.
  • Batch automation.
  • Call graph matching calculation.
  • Dozens of heuristics based on graph theory, assembler, bytes, functions’ features, etc…

However, Diaphora also has many unique features that are not available in any other public tool. The following is a non-exhaustive list:

  • Ability to port structs, enums, unions and typedefs.
  • Support for compilation units (finding and diffing compilation units).
  • Microcode support.
  • Parallel diffing.
  • Pseudo-code based heuristics.
  • Pseudo-code patches generation.
  • Diffing pseudo-codes (with syntax highlighting!).
  • Scripting support (for both the exporting and diffing processes).

Changelog v3.1.1

This is mainly a bug-fix release. However, it includes two new heuristics and some experimental enhancements to help find patched vulnerabilities when doing patch diffing. Here is the full changelog:

DIFF: Added a ratios cache to speed up comparison operations.
EXPORT: Added a column to save how long it took to export a single function.
EXPORT: Use cur.executemany() instead of cur.execute() whenever possible.
GUI: Added menu item “Show assembly patch”.
HEUR: Added heuristic “Related compilation unit” to find functions by matching potential compilation units.
HEUR: Added heuristic “Same constants related matches” to find functions using the same constants in different places.
MISC: Refactored the code for finding potentially fixed vulnerabilities.
MISC: Replaced multiple “SELECT *” appearances with just the required fields, where appropriate.
VULN: Added a few new patterns to try to find potentially fixed vulnerabilities.
VULN: Added heuristic to try to find fixed signedness issues for x86 and ARM.
BUG: Diaphora was calling ida_lines.get_srcline() for every assembly line. Fixed by doing it once per basic block.
BUG: The code for calculating the primes assigned to a compilation unit was terribly slow.
BUG: The microcode instructions list was built many times instead of only once.
BUG: When importing pseudo-code comments, do not set the treeloc_t.item_preciser_t member itp when the stored value is None.
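
The EXPORT and MISC entries above describe common SQLite patterns: batching inserts with executemany(), recording how long each function took to export, and selecting only the required fields. The following is a minimal sketch of those patterns using Python's standard sqlite3 module. The table name, the column names and the export_functions() helper are hypothetical and chosen for illustration only; they are not Diaphora's actual schema or code.

```python
import sqlite3
import time


def export_functions(conn, functions):
    """Insert all rows with a single executemany() call.

    The "functions" table and its columns are hypothetical, not Diaphora's schema.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS functions ("
        "name TEXT, address INTEGER, export_time REAL)"
    )
    rows = []
    for name, address in functions:
        start = time.monotonic()
        # Per-function extraction work would happen here.
        elapsed = time.monotonic() - start
        rows.append((name, address, elapsed))
    # One batched statement instead of one execute() call per row.
    cur.executemany(
        "INSERT INTO functions (name, address, export_time) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = export_functions(conn, [("sub_401000", 0x401000), ("main", 0x401200)])
# Select only the field that is needed rather than using "SELECT *".
names = [row[0] for row in conn.execute("SELECT name FROM functions ORDER BY address")]
```

Batching matters most when a database holds many thousands of functions, because each separate execute() call adds per-statement overhead.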

Download

git clone https://github.com/joxeankoret/diaphora.git

Running

To run Diaphora, unpack the compressed distribution file wherever you prefer and execute “diaphora.py” from the IDA Pro menu File → Script file. Once diaphora.py is executed, a configuration dialog opens.

This dialog, although it can be a bit confusing at first, is used both for exporting the current IDA database to SQLite format and for diffing it against another exported SQLite database. The first field is the path of the SQLite database that will be created with all the information extracted from the current IDA database. The second field is the path of the other SQLite database to diff the current database against. If the second field is left empty, Diaphora only exports the current database to SQLite format; otherwise, it diffs both databases. The remaining fields, the check-boxes, are explained below:

  1. Use the decompiler if available. If the Hex-Rays decompiler is installed with IDA and the IDA Python bindings are available, Diaphora will use the decompiler to gather additional information that helps during the bindiffing process.
  2. Export only non-IDA generated functions. Only functions whose names were not auto-generated by IDA will be exported.
  3. Do not export instructions and basic blocks. Export only function summaries. When exporting huge databases, it may help speed up operations. However, the diffing capabilities will be more limited.
  4. Use probably unreliable methods. Diaphora uses many heuristics to try to match functions in both databases being compared. However, some heuristics are not very reliable, or the similarity ratio they yield is very low. Check this box if you also want to see the likely unreliable matches Diaphora may find. Unreliable results are shown in their own “Unreliable results” list and are never mixed with the “Best results” (results with a ratio of 1.00) or the “Partial results” (results with a ratio of 0.50 or higher).
  5. Use slow heuristics. Some heuristics can be quite expensive and take a long time. For medium to big databases, this option is disabled by default, and it is recommended to leave it unchecked unless the results obtained with it disabled are not good enough. It will likely find better matches than the normal, faster heuristics, but it will take significantly longer.
  6. Relaxed calculations of different ratios. By default, Diaphora uses a fairly aggressive method to calculate difference ratios between matches. Checking this option relaxes that aggressiveness. Under the hood, the function SequenceMatcher.quick_ratio is used when this option is unchecked and SequenceMatcher.real_quick_ratio when it is checked. Also, when the option is checked, Diaphora will use the difference ratio of the prime numbers calculated from the AST of the pseudo-code of the two functions, taking the highest ratio among the AST, assembly and pseudo-code comparisons.
  7. Use experimental heuristics. It says it all: experimental heuristics are enabled only if this check-box is marked. Disabled by default as they are likely not useful.
  8. Ignore automatically generated names. When comparing databases, this tells Diaphora to exclude from the “Same name” heuristic any functions that share the same IDA auto-generated name (e.g., when there is a function named sub_01020304 in both databases but they aren’t actually the same function). Used only when comparing.
  9. Ignore all function names. Disables the “Same name” heuristic. Used only when comparing.
  10. Ignore small functions. Ignore functions with fewer than 6 assembly instructions. Used only when comparing.
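
The two SequenceMatcher methods named in option 6 come from Python's standard difflib module. The sketch below shows how they relate to the exact ratio() for two short instruction listings: both are upper bounds on ratio(), and real_quick_ratio() is the looser of the two because it only considers sequence lengths. The similarity_ratios() helper and the sample instruction lists are hypothetical; how Diaphora prepares its inputs is not described here.

```python
from difflib import SequenceMatcher


def similarity_ratios(lines_a, lines_b):
    """Return (ratio, quick_ratio, real_quick_ratio) for two sequences.

    Hypothetical helper for illustration; it is not part of Diaphora.
    """
    matcher = SequenceMatcher(None, lines_a, lines_b)
    return matcher.ratio(), matcher.quick_ratio(), matcher.real_quick_ratio()


# Made-up assembly listings of two versions of a function.
old_func = ["push ebp", "mov ebp, esp", "sub esp, 8", "ret"]
new_func = ["push ebp", "mov ebp, esp", "sub esp, 16", "call foo", "ret"]

exact, quick, real_quick = similarity_ratios(old_func, new_func)
print(f"ratio={exact:.3f} quick_ratio={quick:.3f} real_quick_ratio={real_quick:.3f}")
```

Since real_quick_ratio() never returns less than quick_ratio(), switching to it can only raise the reported similarity, which is consistent with the option being described as relaxed.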

Example

This is a screenshot of Diaphora diffing the PEGASUS iOS kernel vulnerability fixed in iOS 9.3.5:

 

And this is an old screenshot of Diaphora diffing the Microsoft bulletin MS15-034:

 

These are some screenshots of Diaphora diffing the Microsoft bulletin MS15-050, extracted from the blog post “Analyzing MS15-050 With Diaphora” by Alex Ionescu.

Copyright (C) 2015 joxeankoret

Source: https://github.com/joxeankoret/