subparse: Modular malware analysis artifact collection and correlation framework
subparse
Subparse is a modular framework developed by Josh Stroschein, Aaron Baker, and Odin Bernstein. It parses and indexes malware files and presents the results in a searchable web viewer. The framework is built from a core parsing engine, parsing modules, and a set of enrichers that add further information to the malware indices. Its main input is one or more directories of malware files. Each file is parsed by the core parsing engine or a user-specified parsing engine, augmented by any user-specified enrichment engines, and then indexed into an Elasticsearch index. The collected information can be searched and viewed in the web viewer, which also supports filtering on any value gathered from any file. There are currently 3 parsing engines, provided as the default parsing modules (ELFParser, OLEParser, and PEParser), and 4 enrichment modules (ABUSEEnricher, CAPEEnricher, STRINGEnricher, and YARAEnricher).
General Information Collected
Before any parser is executed, general information about the sample is collected regardless of the underlying file type. This information includes:
- MD5 hash of the sample
- SHA256 hash of the sample
- Sample name
- Sample size
- Extension of sample
- Derived extension of sample
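The fields listed above can be gathered with the Python standard library alone. The following is a minimal sketch, not subparse's actual implementation: the function name collect_general_info, the dictionary keys, and the small MAGIC_EXTENSIONS table used to derive an extension from the leading bytes are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Hypothetical magic-byte table used to derive an extension from content.
# A real implementation would likely rely on a dedicated file-type library.
MAGIC_EXTENSIONS = {
    b"MZ": "exe",
    b"\x7fELF": "elf",
    b"\xd0\xcf\x11\xe0": "ole",
    b"{\\rtf": "rtf",
}


def collect_general_info(path):
    """Return file-type-independent metadata for one sample."""
    sample = Path(path)
    data = sample.read_bytes()
    # Pick the first table entry whose magic bytes prefix the file content.
    derived = next(
        (ext for magic, ext in MAGIC_EXTENSIONS.items() if data.startswith(magic)),
        "unknown",
    )
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "name": sample.name,
        "size": len(data),
        "extension": sample.suffix.lstrip("."),
        "derived_extension": derived,
    }
```

Reading the whole file into memory keeps the sketch short; for very large samples, hashing in fixed-size chunks would be the safer choice.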
Parser Modules
Parsers are ONLY executed on samples whose file type they support. For example, PE files are parsed by the PEParser by default because their file type is one that the PEParser is able to examine.
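One way to picture this matching is a lookup from a file-type description to the parsers that accept it. The sketch below is illustrative only: the PARSER_FILE_TYPES table, the parsers_for function, and the keyword strings (modelled on typical file-identification output) are assumptions, not subparse's real dispatch logic.

```python
# Hypothetical mapping of parser name -> keywords that must appear in the
# sample's file-type description for that parser to run.
PARSER_FILE_TYPES = {
    "ELFParser": ("ELF",),
    "OLEParser": ("Composite Document File", "Rich Text Format"),
    "PEParser": ("PE32", "MS-DOS"),
}


def parsers_for(file_type):
    """Return the names of parsers whose keywords match the description."""
    return [
        name
        for name, keywords in PARSER_FILE_TYPES.items()
        if any(keyword in file_type for keyword in keywords)
    ]


if __name__ == "__main__":
    print(parsers_for("PE32 executable (GUI) Intel 80386, for MS Windows"))
    print(parsers_for("ASCII text"))
```

A sample whose description matches no keyword gets an empty list, so only the general information is recorded for it.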
Default Modules
ELFParser
This is the default parsing module that will be executed against ELF files. Information that is collected:
- General Information
- Program Headers
- Section Headers
- Notes
- Architecture Specific Data
- Version Information
- Arm Unwind Information
- Relocation Data
- Dynamic Tags
OLEParser
This is the default parsing module that will be executed against OLE and RTF formatted files. It uses the OLETools package to obtain data. The information that is collected:
- Meta Data
- MRaptor
- RTF
- Times
- Indicators
- VBA / VBA Macros
- OLE Objects
PEParser
This is the default parsing module that will be executed against PE files whose file type matches or includes PE32 or MS-DOS. Information that is collected:
- Section code and count
- Entry point
- Image base
- Signature
- Imports
- Exports
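To show where a few of these fields live in the file, here is a minimal sketch that reads the section count, entry point, and image base directly from the PE headers using only the standard library. It is not the PEParser's code, which may well rely on a dedicated PE library; the function name parse_pe_header and the returned keys are assumptions. The offsets follow the published PE/COFF layout.

```python
import struct


def parse_pe_header(data):
    """Extract section count, entry point, and image base from PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # NumberOfSections is the second 16-bit field of the COFF header.
    (section_count,) = struct.unpack_from("<H", data, coff + 2)
    optional = coff + 20  # the COFF header is 20 bytes long
    (magic,) = struct.unpack_from("<H", data, optional)
    (entry_point,) = struct.unpack_from("<I", data, optional + 16)
    if magic == 0x10B:  # PE32: 32-bit ImageBase after BaseOfData
        (image_base,) = struct.unpack_from("<I", data, optional + 28)
    elif magic == 0x20B:  # PE32+: 64-bit ImageBase, no BaseOfData
        (image_base,) = struct.unpack_from("<Q", data, optional + 24)
    else:
        raise ValueError("unknown optional header magic")
    return {
        "section_count": section_count,
        "entry_point": entry_point,
        "image_base": image_base,
    }
```

Imports, exports, and signatures require walking the data directories and are left out to keep the sketch short.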
Enricher Modules
These optional modules are ONLY executed if specified via the -e | --enrichers flag on the command line.
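As an illustration of opt-in enrichers, the sketch below wires an -e / --enrichers option with argparse. Only the flag names come from this README; the way values are passed (space-separated names after the flag), the build_arg_parser function, and the description string are assumptions and may differ from the real command line.

```python
import argparse


def build_arg_parser():
    """Build a hypothetical CLI that accepts zero or more enricher names."""
    parser = argparse.ArgumentParser(description="subparse-style CLI sketch")
    parser.add_argument(
        "-e",
        "--enrichers",
        nargs="*",
        default=[],
        help="names of enricher modules to run (none run by default)",
    )
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    print("Enrichers requested:", args.enrichers)
```

With the empty-list default, omitting the flag means no enricher runs, which mirrors the opt-in behaviour described above.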
Default Modules
ABUSEEnricher
This enricher uses the [Abuse.ch](https://abuse.ch/) API and [Malware Bazaar](https://bazaar.abuse.ch) to collect more information about the sample(s) subparse is analyzing. The information is then aggregated and stored in the Elastic database.
CAPEEnricher
This enricher communicates with a CAPEv2 Sandbox instance to collect more information about the sample(s) through dynamic analysis. The information is then aggregated and stored in the Elastic database, using the Kafka messaging service for background processing.
STRINGEnricher
This enricher is a smart string enricher that parses the sample for potentially interesting strings. The categories of strings that this enricher looks for include: Audio, Images, Executable Files, Code Calls, Compressed Files, Work (Office Docs.), IP Addresses, IP Address + Port, Website URLs, and Command Line Arguments.
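A rough idea of how such categorization can work is sketched below for three of the categories (IP Addresses, IP Address + Port, and Website URLs). The regular expressions, the PATTERNS table, and the categorize_strings function are assumptions for illustration and are not taken from the STRINGEnricher itself.

```python
import re

# Hypothetical patterns for three of the categories named above.
PATTERNS = {
    "ip_port": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b"),
    # The lookahead keeps "address:port" pairs out of the plain IP bucket.
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b(?!:\d)"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}

# Runs of at least four printable ASCII characters, as the strings tool finds.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def categorize_strings(data):
    """Group interesting printable strings found in raw sample bytes."""
    results = {name: [] for name in PATTERNS}
    for raw in PRINTABLE_RUN.findall(data):
        text = raw.decode("ascii")
        for name, pattern in PATTERNS.items():
            results[name].extend(pattern.findall(text))
    return results
```

The patterns do not validate octet ranges, so a string such as 999.1.1.1 would still be reported; a stricter check could be added where false positives matter.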
YARAEnricher
This enricher uses a pre-compiled YARA file located at: parser/src/enrichers/yara_rules. This pre-compiled file includes rules from VirusTotal and YaraRulesProject.
Install & Use
Copyright (c) 2022 Josh Stroschein