What does yarGen do?
yarGen is a generator for YARA rules.
The main principle is the creation of YARA rules from strings found in malware files while removing all strings that also appear in goodware files. For this purpose, yarGen includes a large database of goodware strings and opcodes, shipped as ZIP archives that have to be extracted before the first use.
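The filtering principle can be illustrated with a minimal Python sketch. This is not yarGen's actual implementation: the function names, the minimum string length of 6, and the tiny goodware set are assumptions made for illustration only.

```python
import re


def extract_strings(data, min_len=6):
    """Return the set of printable ASCII strings of at least min_len characters."""
    # min_len is an assumed threshold, not a yarGen default.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {match.decode("ascii") for match in pattern.findall(data)}


def candidate_strings(sample, goodware, min_len=6):
    """Keep only strings from the sample that are absent from the goodware set."""
    return sorted(extract_strings(sample, min_len) - goodware)


# Hypothetical sample bytes and a toy stand-in for the goodware database.
sample = b"\x00\x01kernel32.dll\x00\xffevil_c2_beacon_v1\x00abc\x00"
goodware = {"kernel32.dll"}

result = candidate_strings(sample, goodware)
print(result)
```

Only the strings that survive the goodware filter remain as candidates for a rule; the short string "abc" is dropped by the length threshold, and "kernel32.dll" is dropped because it is known goodware.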
The rule generation process also tries to identify similarities between the analyzed files and combines the shared strings into so-called super rules. Generating a super rule does not remove the simple rules for the files it covers, so the output contains some redundancy whenever super rules are created. You can suppress the simple rule for a file that is already covered by a super rule by using --nosimple.
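The idea behind super rules can be sketched as follows. This is a simplified illustration under stated assumptions, not yarGen's own algorithm: the function name find_super_rules, the min_shared threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def find_super_rules(file_strings, min_shared=2):
    """Group strings by the exact set of files that share them."""
    # Invert the mapping: string -> set of files that contain it.
    files_by_string = defaultdict(set)
    for name, strings in file_strings.items():
        for s in strings:
            files_by_string[s].add(name)

    # Collect strings shared by more than one file, keyed by that file group.
    strings_by_group = defaultdict(set)
    for s, files in files_by_string.items():
        if len(files) > 1:
            strings_by_group[frozenset(files)].add(s)

    # Keep only groups with enough shared strings (assumed threshold).
    return {
        group: sorted(strings)
        for group, strings in strings_by_group.items()
        if len(strings) >= min_shared
    }


# Hypothetical, already goodware-filtered strings per file.
file_strings = {
    "a.exe": {"mutex_xyz", "cmd_dropper", "only_in_a"},
    "b.exe": {"mutex_xyz", "cmd_dropper", "only_in_b"},
    "c.exe": {"unrelated"},
}

super_rules = find_super_rules(file_strings)

# Without --nosimple, every file still gets a simple rule (redundant for a.exe and b.exe).
# With --nosimple, only files not covered by any super rule keep a simple rule.
covered = set().union(*super_rules)
simple_rule_files = sorted(f for f in file_strings if f not in covered)
```

In this toy data set, a.exe and b.exe share two strings and would be combined into one super rule, while c.exe is the only file that still needs a simple rule when the redundancy is suppressed.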
- feat: more regular expressions for better string extraction
- Make sure you have at least 4 GB of RAM on the machine on which you plan to use yarGen (8 GB if opcodes are included in rule generation with --opcodes)
- Download the latest release from the “release” section
- Install all dependencies with sudo pip install scandir lxml naiveBayesClassifier pefile (@twpDone reported that, in case of errors, you can try sudo pip install pefile followed by sudo pip install scandir lxml naiveBayesClassifier)
- Run python yarGen.py --update to automatically download the built-in databases. They are saved into the ‘./dbs’ subfolder. (Download: 913 MB)
- See help with python yarGen.py --help for more information on the command line parameters
Warning: yarGen loads the whole goodware string database into memory and uses at least 3 GB of memory for a few seconds, or 6 GB if opcode evaluation is activated (--opcodes).
I’ve already tried to migrate the database to SQLite, but the numerous string comparisons and lookups made the analysis painfully slow.
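The trade-off can be sketched in Python as two equivalent membership checks. This is an illustration only, not yarGen's code: the table name goodstrings, its schema, and the three sample strings are assumptions. The in-memory variant is a single hash lookup, whereas the SQLite variant executes a query for every string, and that per-query overhead adds up across very many lookups.

```python
import sqlite3

# Toy stand-in for the goodware string database.
good_strings = {
    "kernel32.dll",
    "GetProcAddress",
    "This program cannot be run in DOS mode",
}

# The same data in an in-memory SQLite table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE goodstrings (s TEXT PRIMARY KEY)")
db.executemany("INSERT INTO goodstrings VALUES (?)", [(s,) for s in good_strings])


def in_memory_lookup(s):
    """Hash-based membership test against the set held in RAM."""
    return s in good_strings


def sqlite_lookup(s):
    """Same test, but each call runs a SQL query."""
    row = db.execute("SELECT 1 FROM goodstrings WHERE s = ?", (s,)).fetchone()
    return row is not None


candidates = ["kernel32.dll", "evil_c2_beacon_v1"]
memory_hits = [in_memory_lookup(s) for s in candidates]
sqlite_hits = [sqlite_lookup(s) for s in candidates]
```

Both variants return the same answers; they differ only in memory footprint and per-lookup cost.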
Command Line Parameters
See the following blog posts for a more detailed description of how to use yarGen for YARA rule creation:
yarGen – Yara Rule Generator, Copyright (c) 2015, Florian Roth
All rights reserved.