python-iocextract v1.14 releases: Advanced Indicator of Compromise (IOC) extractor
iocextract
This library extracts URLs, IP addresses, MD5/SHA hashes, email addresses, and YARA rules from text corpora. It includes some encoded and “defanged” IOCs in the output, and optionally decodes/refangs them.
This library currently supports the following IOCs:
- IP Addresses
  - IPv4 fully supported
  - IPv6 partially supported
- URLs
  - With protocol specifier: http, https, tcp, udp, ftp, sftp, ftps
  - With [.] anchor, even with no protocol specifier
  - IPv4 and IPv6 (RFC2732) URLs are supported
  - Hex-encoded URLs with protocol specifier: http, https, ftp
  - URL-encoded URLs with protocol specifier: http, https, ftp, ftps, sftp
  - Base64-encoded URLs with protocol specifier: http, https, ftp
- Emails
  - Partially supported, anchoring on @ or at
- YARA rules
  - With imports, includes, and comments
- Hashes
  - MD5
  - SHA1
  - SHA256
  - SHA512
- Custom regex
  - With exactly one capture group
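As an illustration of the Hashes entry in the list above, the four supported digest types can be told apart by the length of their hexadecimal form (32, 40, 64, and 128 characters). The following is a minimal standard-library sketch of that idea, not iocextract's own code; the names `HASH_LENGTHS` and `classify_hashes` are hypothetical and exist only for this example.

```python
import hashlib
import re

# Hex digest length for each hash type (hypothetical helper table).
HASH_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64, "sha512": 128}

# A standalone run of hex characters, bounded by non-word characters.
HEX_TOKEN = re.compile(r"\b[a-fA-F0-9]+\b")


def classify_hashes(text):
    """Group hex tokens in `text` by the hash type their length implies."""
    found = {}
    for token in HEX_TOKEN.findall(text):
        for name, length in HASH_LENGTHS.items():
            if len(token) == length:
                found.setdefault(name, []).append(token)
    return found


# Build sample digests with hashlib so the example needs no hard-coded values.
sample_md5 = hashlib.md5(b"sample").hexdigest()
sample_sha256 = hashlib.sha256(b"sample").hexdigest()
report = "dropper md5 " + sample_md5 + " payload sha256 " + sample_sha256
hashes = classify_hashes(report)
print(hashes)
```

Length alone cannot prove that a token is a real digest, so a sketch like this will also pick up any other 32-, 40-, 64-, or 128-character hex string.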
The Problem
It is common practice for malware analysts or endpoint software to “defang” IOCs such as URLs and IP addresses, in order to prevent accidental exposure to live malicious content. Being able to extract and aggregate these IOCs is often valuable for analysts. Unfortunately, existing “IOC extraction” tools often pass right by them, as they are not caught by standard regex.
For example, consider the simple defanging technique of surrounding periods with brackets:
127[.]0[.]0[.]1
Existing tools that use a simple IP address regex will ignore this IOC entirely.
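The gap is easy to reproduce with Python's standard `re` module. The pattern below is a generic IPv4 regex written for this illustration (it is not taken from any particular tool), and the sample text is made up.

```python
import re

# A typical "plain" IPv4 pattern: four dot-separated groups of 1-3 digits.
NAIVE_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

text = "Beacon observed to 127[.]0[.]0[.]1 and to 10.0.0.5"

# Only the address written with plain dots is found; the bracketed
# (defanged) one never matches because no digit is followed by a bare ".".
naive_hits = NAIVE_IPV4.findall(text)
print(naive_hits)
```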
The Solution
By combining specially crafted regex with some custom postprocessing, we are able to both detect and deobfuscate “defanged” IOCs. This saves time and effort for the analyst, who might otherwise have to manually find and convert IOCs into a machine-readable format.
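A minimal sketch of this two-step idea, using only the standard library, might look like the following. It is not iocextract's actual regex or API: the names `DEFANGED_IPV4`, `refang_ip`, and `find_ipv4s` are hypothetical, and the pattern assumes only two defanging styles, `[.]` and `(.)`.

```python
import re

# Separator between octets: a plain dot, or a dot wrapped as [.] or (.)
# (assumption: only these two defanging styles are handled here).
SEP = r"(?:\.|\[\.\]|\(\.\))"
DEFANGED_IPV4 = re.compile(r"\b\d{1,3}(?:" + SEP + r"\d{1,3}){3}\b")


def refang_ip(ioc):
    """Postprocessing step: strip the wrapping brackets and parentheses."""
    return re.sub(r"[\[\]()]", "", ioc)


def find_ipv4s(text, refang=False):
    """Yield IPv4 addresses, defanged or not; optionally refang them."""
    for match in DEFANGED_IPV4.finditer(text):
        ioc = match.group(0)
        yield refang_ip(ioc) if refang else ioc


text = "C2 at 127[.]0[.]0[.]1, fallback 10.0.0.5"
raw = list(find_ipv4s(text))
clean = list(find_ipv4s(text, refang=True))
print(raw)
print(clean)
```

Keeping detection and refanging as separate steps lets the caller choose between the IOC exactly as it appeared in the source text and a machine-readable form.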
Changelog v1.14
- Fixed issue where defanging automatically defaulted to the http/https protocol. Now allows the user to define this functionality (#32, #34)
- Added the ability to extract IP addresses (IPv4) with a 4th octet (e.g. 10.10.10.10.4444) (#31)
- Updated email regex to extract email addresses with a first + last name structure (e.g. first[.]last@domain[.]com) (#36)
Features
- Added easier argparse options to allow a simpler version of pre-existing options
- Minor improvements to IPv6 extraction
Install
pip install iocextract
Use
Tutorial
Copyright (c) 2018, InQuest
Source: https://github.com/inquest/