yabin: A Yara rule generator for finding related samples and hunting
Yabin creates Yara signatures from executable code within malware. Given one sample of malware, you can then find other samples that share code.
It does this by looking for rare functions in a given malware sample. It identifies functions by looking for common function “prologs” which mark the start of functions (e.g. 55 8B EC will often indicate the start of a function in software compiled by Microsoft Visual Studio). A whitelist taken from 100 GB of non-malicious software is used to ignore common library functions.
Yabin is a prototype for testing out an approach; it is not intended for production use.
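The idea can be sketched in a few lines of Python. This is an illustration only, not Yabin's actual code: the function names, the 16-byte pattern length and the in-memory whitelist set are all assumptions made for the example.

```python
# Illustrative sketch only; not Yabin's actual implementation.
PROLOG = bytes.fromhex("558BEC")  # push ebp; mov ebp, esp
PATTERN_LEN = 16                  # hypothetical: bytes kept from each function start


def find_candidate_functions(data, prolog=PROLOG, length=PATTERN_LEN):
    """Return a hex string for the bytes starting at each prolog occurrence."""
    patterns = []
    pos = data.find(prolog)
    while pos != -1:
        chunk = data[pos:pos + length]
        if len(chunk) == length:  # skip a truncated match at the end of the file
            patterns.append(chunk.hex())
        pos = data.find(prolog, pos + 1)
    return patterns


def rare_functions(patterns, whitelist):
    """Drop whitelisted patterns, removing duplicates and keeping order."""
    seen = set()
    rare = []
    for pattern in patterns:
        if pattern not in whitelist and pattern not in seen:
            seen.add(pattern)
            rare.append(pattern)
    return rare
```

Here `whitelist` stands in for the patterns stored in db.db. Whatever survives the filter is a candidate for a Yara hex string.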
Download
git clone https://github.com/AlienVault-OTX/yabin.git
Use
A whitelist is included in the repository, but it’s recommended you download the larger one (140 MB) from here and replace db.db with it.
More detailed information is below, but the help command provides an overview:
Generate Yara rules for malware
Yabin can create signatures for malware based on the rare functions they have.
Creates the following Yara rule:
Hunt for code re-use amongst malware
It is common to want to find malware samples that share code. Perhaps you are researching a malware family and want to find more samples you don’t know about yet. Perhaps you want to hunt for suspicious binaries that exist on your network, and want to look at any file that shares code with a set of malware.
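As a rough sketch of what a hunt-style rule could look like, the helper below renders Yara text from a list of hex patterns. The helper name, the `$f0` string naming and the `min_matches` parameter are assumptions for illustration, and the layout may differ from what Yabin emits. The `any of them` condition reflects the idea that a hunt rule should fire on any overlapping code.

```python
# Illustrative sketch only; the rule layout here is an assumption.
def render_rule(name, patterns, min_matches=None):
    """Render Yara rule text from hex-string patterns.

    min_matches=None gives an "any of them" condition (hunt style);
    an integer n gives "n of them" (a stricter rule).
    """
    if not patterns:
        raise ValueError("need at least one pattern")
    lines = ["rule {}".format(name), "{", "    strings:"]
    for i, pattern in enumerate(patterns):
        # Split "558bec" into "55 8B EC" for a Yara hex string
        spaced = " ".join(pattern[j:j + 2] for j in range(0, len(pattern), 2)).upper()
        lines.append("        $f{} = {{ {} }}".format(i, spaced))
    if min_matches is None:
        condition = "any of them"
    else:
        condition = "{} of them".format(min_matches)
    lines += ["    condition:", "        " + condition, "}"]
    return "\n".join(lines)
```

Calling `render_rule("hunt_sample", ["558bec83ec10"])` yields a rule with one hex string and the condition `any of them`.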
Example
I generated a hunt Yara rule for a sample of WannaCry like so:
Running these against a database of malware samples shows a match for a sample of Contopee. Contopee is a family of malware associated with a very interesting group of attackers called Lazarus that are likely based out of North Korea.
However, I didn’t find this – that credit goes to Neel Mehta of Google and further findings by Symantec. The same group of attackers was also linked to attacks against the global SWIFT banking network through other instances of code re-use.
Hunting for related samples in practice
Even with a large whitelist, it’s likely some of the re-used functions you’re searching for are non-malicious libraries. I’d recommend running the Yara hunt rule against a small data-set and pruning any false positives before running it across a large malware corpus.
When used with VirusTotal Intelligence, this means running the rules for a brief period, then pruning those that produce false positives before either leaving them running or starting a retro-hunt.
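The pruning step can be sketched as a simple filter over the results of a trial run. The function name and the input shapes are hypothetical, and deciding which matched files are actually benign is still a manual judgment.

```python
# Illustrative sketch only; input shapes are assumptions.
def prune_rules(trial_matches, known_benign):
    """Split rules into (kept, pruned) after a trial run.

    trial_matches: dict mapping rule name -> list of files it matched
    known_benign:  set of files an analyst has judged non-malicious
    """
    kept, pruned = [], []
    for rule, files in trial_matches.items():
        if any(f in known_benign for f in files):
            pruned.append(rule)  # matched benign code: a false positive
        else:
            kept.append(rule)
    return kept, pruned
```

Only the rules in `kept` would then go on to the large corpus or a retro-hunt.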
If you’re interested in identifying code re-use, you may also like Binarly (now owned by Crowdstrike) / VXClass / Intezer / MalTindex / IceWater.
Clustering Malware
Yabin does an “ok” job of clustering malware based on code re-use. I generated Yara rules with Yabin for 300 samples from the attackers known as APT1.
Below you can see the results of running the Yara rules against the set when displayed in Maltego:
Tight Yara rules – python yabin.py -y
This shows samples with significant overlaps. For example, the group at the top left are all from the malware family “Starsypound”. Many files (not shown) don’t match any other files.
Hunt Yara rules – python yabin.py -yh
This shows samples with any overlapping code. The malware samples are significantly more interconnected, and clusters contain a number of different malware families.
You can view this data in the ./examples/ folder.
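One way to turn rule matches into clusters like these is to treat each match as an edge between two samples and take the connected components of that graph. The sketch below does this with a small union-find. It assumes the matches are already available as a dict, and it is not how the Maltego graphs above were produced.

```python
# Illustrative sketch only; not how the Maltego graphs were produced.
def cluster_samples(matches):
    """Group samples into connected components of the match graph.

    matches: dict mapping the sample a rule was generated from
             -> list of samples that rule matched
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for source, hits in matches.items():
        find(source)  # register samples whose rule matched nothing
        for hit in hits:
            union(source, hit)

    clusters = {}
    for sample in list(parent):
        clusters.setdefault(find(sample), set()).add(sample)
    # Largest clusters first; ties broken by member names
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))
```

With tight rules, most edges disappear and many samples end up as singletons. With hunt rules, a single shared function is enough to merge two clusters, which is consistent with the more interconnected picture described above.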
Copyright 2017 chrisdoman