Columbo
Columbo is a computer forensic analysis tool used to simplify and identify specific patterns in compromised datasets. It breaks data down into small sections and uses pattern recognition and machine learning models to identify adversaries' behaviour and their possible locations on compromised Windows platforms, presented in the form of suggestions. Currently, Columbo operates on the Windows platform.
Dependencies & High-Level Architecture
Columbo depends on Volatility 3, autorunsc.exe, and sigcheck.exe to extract data. Users must therefore download these tools and place them under the \Columbo\bin folder. Please make sure you read and understand the license section (or License.txt file) before you download anything. The output generated by these tools is automatically piped to Columbo’s main engine, which breaks it down into small sections, pre-processes it, and applies machine learning models to classify the location of the compromised system, executable files, and other behaviours.
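The flow described above (run a tool, capture its output, break it into small sections) can be sketched roughly as follows. This is only an illustration, not Columbo's actual engine: the function names, the CSV format, the column names, and the section size are all assumptions made for the example.

```python
import csv
import io
import subprocess


def run_tool(command):
    """Run an external extraction tool and return its standard output as text.

    `command` is a list such as [path_to_tool, arg1, ...]; the arguments a real
    tool needs are not shown here.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


def parse_csv_output(text):
    """Turn CSV text (header row first) into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def split_into_sections(rows, size):
    """Break the parsed rows into consecutive sections of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


# Stand-in for captured tool output; the columns are hypothetical.
sample = "path,signer\nC:\\a.exe,Contoso\nC:\\b.exe,\nC:\\c.exe,Fabrikam\n"
sections = split_into_sections(parse_csv_output(sample), size=2)
print(len(sections), [len(s) for s in sections])
```

In a sketch like this, each section would then be handed to the pre-processing and classification steps.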
Columbo and Machine Learning
Columbo uses data preprocessing to organise the data and machine learning models to identify suspicious behaviours. Its outputs are either 1 (suspicious) or 0 (genuine), given in the form of suggestions purely to assist digital forensic examiners in their decision-making. We have trained the models with different examples to maximise accuracy and used different approaches to minimise false positives. However, false positives (false detections) still occur, and we are therefore committed to updating the models periodically.
False Positive
Reducing false positives (false detections) is not easy, especially with machine learning. The output of a machine learning model may be a false positive, depending on the quality of the data used to train it. To assist forensic examiners in their investigation, Columbo therefore generates a percentage score for each 1 (suspicious) and 0 (genuine) output. This helps examiners decide which of the paths, commands, or processes that Columbo classifies as suspicious to follow up.
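As a rough illustration of how a percentage score can accompany each 1/0 suggestion, the sketch below turns a model's probability of "suspicious" into a label and a percentage, then lists the suspicious items with the highest score first. The 0.5 threshold, the function names, and the sample values are assumptions for the example; Columbo's own scoring may work differently.

```python
def score_suggestion(p_suspicious):
    """Map a probability of 'suspicious' to (label, percentage).

    Label is 1 (suspicious) or 0 (genuine); the percentage is the model's
    confidence in that label. The 0.5 cut-off is an assumed threshold.
    """
    if not 0.0 <= p_suspicious <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    label = 1 if p_suspicious >= 0.5 else 0
    confidence = p_suspicious if label == 1 else 1.0 - p_suspicious
    return label, round(confidence * 100, 1)


def rank_suspicious(items):
    """Return (name, label, percentage) for items labelled 1, highest score first.

    `items` is a list of (name, probability) pairs.
    """
    scored = [(name, *score_suggestion(p)) for name, p in items]
    suspicious = [entry for entry in scored if entry[1] == 1]
    return sorted(suspicious, key=lambda entry: entry[2], reverse=True)


# Hypothetical file names and probabilities.
ranked = rank_suspicious([("a.exe", 0.6), ("b.exe", 0.1), ("c.exe", 0.97)])
for name, label, percent in ranked:
    print(name, label, percent)
```

Ordering by score lets an examiner start with the entries the model is most confident about and treat low-percentage suggestions with more caution.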
Options to Select
Option 2
Live analysis: files and process traceability. This option analyses running Windows processes to identify any malicious activity. Columbo uses autorunsc.exe to extract the data from the machine; the output is piped to machine learning models and pattern recognition engines to classify suspicious activities. The results are then saved under \Columbo\ML\Step-2-results as Excel files for further analysis. Users are also given options to examine running processes. The result contains information such as process traceability, the commands associated with each process (if applicable), and whether or not the process is responsible for executing new processes.
Option 3
Scan and analyse a hard disk image file (.vhdx). This option takes the path of a mounted Windows hard disk image. It uses sigcheck.exe to extract data from the file system. The results are then piped into machine learning models to classify suspicious activities, and the output is saved under \Columbo\ML\Step-3-results as Excel files.
Option 4
Memory forensics. In this option, Columbo takes the path of the memory image and presents the following options for users to select.
- Memory Information: Volatility 3 is used to extract information about the image.
- Processes Scan: Volatility 3 is used to extract the process, DLL, and handle information of each process. Columbo then uses grouping and clustering mechanisms to group processes according to their mother processes. The result is later used by Process Traceability under the Anomaly Detection option.
- Process Tree: Volatility 3 is used to extract the process tree of the processes.
- Anomaly Detection and Process Traceability: Volatility 3 is used to extract a list of processes flagged by anomaly detection. In addition, Columbo offers an option called Process Traceability to examine each process separately and collectively produce the following information.
- Paths of the executable files and associated commands.
- The legitimacy of the identified processes, as determined by machine learning models.
- A trace of each process all the way back to its root process (complete path), with execution dates and times.
- Whether the process is responsible for executing other processes, i.e. whether it is the mother process of new processes.
- Handle and DLL information of each process, presented with the rest of the information.
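The grouping and traceability ideas in this option can be sketched in a few lines. The example below is a simplified illustration, not Columbo's implementation: it assumes each process is already available as a PID with a parent PID (PPID), and the process table, names, and function names are hypothetical.

```python
from collections import defaultdict


def group_by_parent(processes):
    """Map each parent PID to the list of PIDs it started."""
    children = defaultdict(list)
    for pid, info in processes.items():
        children[info["ppid"]].append(pid)
    return dict(children)


def trace_to_root(processes, pid):
    """Follow parent links from `pid` until the parent is not in the table.

    Returns the chain root-first. The `seen` set stops the walk if parent
    links ever form a loop.
    """
    chain, seen = [], set()
    while pid in processes and pid not in seen:
        seen.add(pid)
        chain.append(pid)
        pid = processes[pid]["ppid"]
    return list(reversed(chain))


def is_parent(children, pid):
    """True if `pid` started at least one other process in the table."""
    return bool(children.get(pid))


# Hypothetical process table: PID -> name and parent PID.
processes = {
    1000: {"name": "explorer.exe", "ppid": 900},
    2000: {"name": "cmd.exe", "ppid": 1000},
    2100: {"name": "notepad.exe", "ppid": 1000},
    3000: {"name": "powershell.exe", "ppid": 2000},
}

children = group_by_parent(processes)
path = trace_to_root(processes, 3000)
print(" -> ".join(processes[p]["name"] for p in path))
print(is_parent(children, 2000), is_parent(children, 3000))
```

Here the trace for PID 3000 stops at PID 1000 because its parent (900) is not in the table, which is how a root is recognised in this sketch.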
Changelog v0.2.2.1
- Added new features
Install & Use
Copyright 2020 Visma