msticpy v2.8 releases: Microsoft Threat Intelligence Security Tools
MSTIC Jupyter and Python Security Tools
Microsoft Threat Intelligence Python Security Tools.
The msticpy package was initially developed to support Jupyter Notebook authoring for Azure Sentinel. Many of the included tools can also be used in other security scenarios for threat hunting and threat investigation.
There are three main sub-packages:
- sectools – Python security tools to help with data enrichment, analysis, or investigation.
- nbtools – Jupyter-specific UI tools such as widgets, plotting, and other data display.
- data – data layer and pre-defined queries for Azure Sentinel, MDATP, and other data sources.
Security Tools Sub-package – sectools
This subpackage contains several modules helpful for working on security investigations and hunting:
base64unpack
Base64 and archive (gz, zip, tar) extractor. Input can be either a single string or a specified column of a pandas DataFrame. The module tries to identify any base64-encoded strings and decode them. If the result looks like one of the supported archive types, it unpacks the contents. The results of each decode/unpack are rechecked for further base64 content, recursing up to 20 levels deep (the default can be overridden). Output is a decoded string (for single string input) or a DataFrame (for DataFrame input).
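The recursive recheck described above can be sketched with the standard library alone. This is a minimal illustration, not msticpy's implementation: the function name decode_nested, the 16-character candidate threshold, and the choice to accept only results that decode as UTF-8 text are all assumptions made for the example, and archive unpacking is left out.

```python
import base64
import binascii
import re

# Candidate runs: 16+ base64-alphabet characters plus optional padding.
# The minimum length is an arbitrary threshold chosen for this sketch.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_nested(text, max_depth=20):
    """Replace base64 runs in text with their decoded form, recursively."""
    if max_depth <= 0:
        return text  # recursion limit reached; leave remaining content encoded

    def _replace(match):
        candidate = match.group(0)
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return candidate  # not decodable as UTF-8 text; keep the original
        # Recheck the decoded text for further base64 content.
        return decode_nested(decoded, max_depth - 1)

    return B64_CANDIDATE.sub(_replace, text)


# Build a doubly encoded sample and decode it.
inner = base64.b64encode(b"echo hello world").decode("ascii")
outer = base64.b64encode(inner.encode("ascii")).decode("ascii")
print(decode_nested("cmd: " + outer))
```

Passing a smaller max_depth stops the recursion early, which is one way a depth limit like the 20-level default could be overridden.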
iocextract
Uses a set of built-in regular expressions to look for Indicator of Compromise (IoC) patterns. Input can be a single string or a pandas dataframe with one or more columns specified as input.
The following types are built-in:
- IPv4 and IPv6
- URL
- DNS domain
- Hashes (MD5, SHA1, SHA256)
- Windows file paths
- Linux file paths (this is kind of noisy because a legal Linux file path can have almost any character)
You can modify or add to the regular expressions used at runtime.
The output is a dictionary of matches (for single string input) or a DataFrame (for dataframe input).
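As a rough illustration of the regex-driven approach for single-string input, the sketch below uses a small dictionary of patterns and returns a dictionary of matches. The patterns are deliberately simplified and are not the ones shipped with msticpy; the names IOC_PATTERNS and extract_iocs, and the extra "email" type added at runtime, are hypothetical.

```python
import re

# Simplified illustrative patterns; assumptions for this sketch only.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256_hash": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text):
    """Return a dict mapping IoC type to the set of matches found in text."""
    found = {}
    for ioc_type, pattern in IOC_PATTERNS.items():
        matches = set(pattern.findall(text))
        if matches:
            found[ioc_type] = matches
    return found


sample = (
    "Connection from 10.0.0.5 to http://evil.example.com/payload.exe "
    "with hash d41d8cd98f00b204e9800998ecf8427e"
)
print(extract_iocs(sample))

# Patterns can be added at runtime by extending the dictionary.
IOC_PATTERNS["email"] = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
print(extract_iocs("contact admin@example.org now"))
```

Keeping the patterns in a plain dictionary is what makes runtime changes cheap in this sketch: replacing or adding an entry changes the behaviour of the next call without touching the extraction function.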
tiproviders
The TILookup class can look up IoCs across multiple TI providers. Built-in providers include AlienVault OTX, IBM XForce, VirusTotal, and Azure Sentinel.
The input can be a single IoC observable or a pandas DataFrame containing multiple observables. Depending on the provider, you may require an account and an API key. Some providers also enforce throttling (especially for free tiers), which might affect performing bulk lookups.
For more details, see TIProviders and the TILookup Usage Notebook.
vtlookup
Wrapper class around the VirusTotal API. Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing requires a VirusTotal account and API key, and throughput is limited to the number of requests per minute allowed for your account type. Supported IoC types:
- Filehash
- URL
- DNS Domain
- IPv4 Address
geoip
Geographic location lookup for IP addresses.
This module has two classes for different services:
- GeoLiteLookup – MaxMind GeoLite (see https://www.maxmind.com)
- IPStackLookup – IPStack (see https://ipstack.com)
Both services offer a free tier for non-commercial use. However, a paid tier will normally give you more accuracy, more detail, and a higher throughput rate. MaxMind GeoLite uses a downloadable database, while IPStack is an online lookup (API key required).
eventcluster
This module is intended to be used to summarize large numbers of events into clusters of different patterns. High volume repeating events can often make it difficult to see unique and interesting items.
This is an unsupervised learning module implemented using scikit-learn DBSCAN.
The module contains functions to generate clusterable features from string data. For example, an administration command that does some maintenance on thousands of servers with a command line like the following
install-update -hostname {host.fqdn} -tmp:/tmp/{GUID}/rollback
can be collapsed into a single cluster pattern by ignoring the character values of the host and GUIDs in the string and using delimiters or tokens to group the values. This allows you to more easily see distinct patterns of activity.
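To make the idea of collapsing concrete, the sketch below masks the variable parts of each command line (GUIDs and host names) with placeholders and counts the resulting patterns. This is a much simpler stand-in for the feature-based DBSCAN clustering the module performs, not a copy of it; the regular expressions, the to_pattern helper, and the sample commands are assumptions for illustration.

```python
import re
from collections import Counter

# Standard 8-4-4-4-12 hexadecimal GUID layout.
GUID_RE = re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")
# Simplified FQDN: dot-separated labels ending in an alphabetic label.
# Note that this would also match file names such as "setup.exe".
FQDN_RE = re.compile(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b")


def to_pattern(cmdline):
    """Mask GUIDs and host names so that only the command's shape remains."""
    masked = GUID_RE.sub("{GUID}", cmdline)
    return FQDN_RE.sub("{host.fqdn}", masked)


commands = [
    "install-update -hostname web01.contoso.com "
    "-tmp:/tmp/0f8fad5b-d9cb-469f-a165-70867728950e/rollback",
    "install-update -hostname db17.contoso.com "
    "-tmp:/tmp/7c9e6679-7425-40de-944b-e07fc1f90ae7/rollback",
    "whoami /all",
]

# Count how many raw command lines fall into each masked pattern.
pattern_counts = Counter(to_pattern(cmd) for cmd in commands)
for pattern, count in pattern_counts.most_common():
    print(count, pattern)
```

Masking works when the variable fields are known in advance. Clustering on derived features, as the module does, is aimed at the case where they are not.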
outliers
Similar to the eventcluster module, but a little more experimental (read ‘less tested’). It uses the scikit-learn Isolation Forest to identify outlier events, either within a single data set or by using one data set as training data and predicting outliers in another.
auditdextract
Module to load and decode Linux audit logs. It collapses messages sharing the same message ID into single events, decodes hex-encoded data fields, and performs some event-specific formatting and normalization (e.g. for process start events it will re-assemble the process command-line arguments into a single string).
This is still a work-in-progress.
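The collapsing and decoding steps can be illustrated on a pair of audit records. The sketch below is not the auditdextract code: it assumes the common "type=... msg=audit(time:id): key=value ..." record layout, only decodes EXECVE arguments (quoted text or hex-encoded strings), and uses hypothetical helper names (collapse_events, decode_arg, execve_cmdline) and made-up sample records.

```python
import re
from collections import defaultdict

# Header shared by all records of one event: type=... msg=audit(<time>:<id>):
RECORD_RE = re.compile(
    r"type=(?P<rtype>\S+) msg=audit\((?P<time>[\d.]+):(?P<msg_id>\d+)\): (?P<body>.*)"
)
# key=value pairs, where the value is either quoted or a run of non-space chars.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def decode_arg(value):
    """Decode one EXECVE argument: quoted text or a hex-encoded string."""
    if value.startswith('"'):
        return value.strip('"')
    try:
        return bytes.fromhex(value).decode("utf-8")
    except ValueError:  # not hex, or not valid UTF-8; keep the raw value
        return value


def collapse_events(lines):
    """Group records sharing a message ID into one dict per event.

    Simplification: a later record of the same type overwrites an earlier one.
    """
    events = defaultdict(dict)
    for line in lines:
        match = RECORD_RE.match(line)
        if match is None:
            continue  # skip lines that are not audit records
        fields = dict(FIELD_RE.findall(match.group("body")))
        events[match.group("msg_id")][match.group("rtype")] = fields
    return dict(events)


def execve_cmdline(execve_fields):
    """Re-assemble a0..aN of an EXECVE record into a single command line."""
    argc = int(execve_fields.get("argc", "0"))
    args = [execve_fields.get(f"a{i}", "") for i in range(argc)]
    return " ".join(decode_arg(arg) for arg in args)


sample = [
    'type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e syscall=59 '
    'success=yes exit=0 pid=3538 comm="cat" exe="/bin/cat"',
    'type=EXECVE msg=audit(1364481363.243:24287): argc=2 a0="cat" '
    "a1=2F746D702F6D792066696C65",
]
events = collapse_events(sample)
print(execve_cmdline(events["24287"]["EXECVE"]))
```

Hex decoding is restricted to the argument fields on purpose: applying it to every unquoted value would wrongly "decode" numeric fields such as process IDs.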
syslog_utils
Module to support an investigation of a Linux host with only syslog logging enabled. This includes functions for collating host data, clustering logon events, and detecting user sessions containing suspicious activity.
cmd_line
A module to support the detection of known malicious command line activity or suspicious patterns of command line activity.
nbtools
Notebook tools sub-package – This is a collection of display and utility modules designed to make working with security data in Jupyter notebooks quicker and easier.
- nbwidgets – groups common functionality such as list pickers, time boundary settings, saving and retrieving environment variables into a single line callable command.
- nbdisplay – functions that display common items such as alerts and events in a slightly more consumable way than print().
- entityschema – implements entity classes (e.g. Host, Account, IPAddress) used in Log Analytics alerts and in many of these modules. Each entity encapsulates one or more properties related to the entity.
See also the Notebook Tools Notebook and Event Timeline Visualization.
Data sub-package – data
These components are currently still part of the nbtools sub-package but will be refactored to separate them into their own package.
- QueryProvider – extensible query library targeting Log Analytics or OData endpoints. Built-in parameterized queries allow complex queries to be run from a single function call. Add your own queries using a simple YAML schema.
- security_alert and security_event – encapsulation classes for alerts and events.
- entity_schema – definitions for multiple entities (Host, Account, File, IPAddress, etc.)
Each has a standard ‘entities’ property reflecting the entities found in the alert or event. These can also be used as meta-parameters for many of the queries. For example, the following query will extract the value for the hostname query parameter from the alert:
qry.list_host_logons(query_times, alert)
Changelog v2.8
A few bugs had crept in over the last couple of releases: some due to buggy coding, some due to the world moving forward. So, many items in this release address these.
Among the feature improvements are the following:
- Documentation and scripts from @ccianelli22 for creating a MSTICPy install for use in isolated (no Internet) environments. This is super useful for customers operating in sovereign clouds or other air-gapped high-security environments.
- Added Splunk authentication method using security token rather than username/password – thanks @Tatsuya-hasegawa
- Query YAML file validation by @FlorianBracq
- Paging for large CyberReason queries by @FlorianBracq
- Modern method to obtain cloud-specific URL endpoints for Azure services. Previously, we were relying on msrestazure, which is now deprecated for this purpose. Many thanks to @ccianelli22 for the work to do this.
- Fix (by me) for a bug I’d introduced with the switch to the azure-monitor-query library for MS Sentinel. When using a connection string with this new driver, the logic failed to parse it and extract the details correctly. Many thanks to @cindraw for reporting this bug.
What’s Changed
- Update mde_proc_pub.pkl by @FlorianBracq in #709
- Update Introduction.rst by @praveenjutur in #700
- Update methodology of getting endpoints for cloud environment by @ccianelli22 in #704
- Validation of the YAML structure of query files by @FlorianBracq in #660
- Intsights api update by @FlorianBracq in #710
- Fix m365d/mde hunting query options by @Tatsuya-hasegawa in #702
- Cybereason pagination support + multi-threading by @FlorianBracq in #707
- Add bearer token auth to splunk driver by @Tatsuya-hasegawa in #708
- fix wl bug when creating a new wl when wl count is 0 by @ccianelli22 in #719
- Update installation docs to include installation for isolated envs by @ccianelli22 in #715
- Fixing regular expression error for connection string in WorkspaceConfig by @ianhelle in #706
- Fix documentation formatting, update steps for downloading msticpy by @ccianelli22 in #720
Install & Use
Copyright (c) Microsoft Corporation. All rights reserved.