dftimewolf: orchestrating forensic collection, processing and data export
DFTimewolf
A framework for orchestrating forensic collection, processing and data export.
dfTimewolf consists of collectors, processors, and exporters (collectively, modules) that pass data on to one another. Predefined “recipes” define how these modules are orchestrated.
Architecture
Three main objects
The main concepts you need to be aware of when digging into dfTimewolf’s codebase are:
- Modules
- Recipes
- The state attribute
Modules are individual Python objects that (for the most part) take some kind of input and produce some kind of output. Recipes are instructions that define how modules are chained, that is, which module’s output becomes another module’s input. All input and output is stored in a State object that is attached to each module.
Modules
Modules all extend the BaseModule class, and implement the setup, process and cleanup methods.
setup is called with the recipe’s arguments after they have been modified (for example, after command-line values have been substituted in). Actions here should have low overhead and be able to run sequentially without significant delay, like checking for permissions on a cloud project, creating an analysis VM, verifying that a file exists, etc.
process is where all the magic happens – here is where you’ll want to parallelize things as much as possible (copying a disk, running plaso, etc.). You’ll be adding information to the state (e.g. processed plaso files) in the module’s output as you go. You can access a previous module’s output (i.e. your input) using self.state.input and manipulate the current module’s output using self.state.output.
cleanup is mostly optional; implement it if you manipulated the state in a way that needs post-processing (e.g. adding a “# out of #” description to the module’s output).
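As a rough illustration of the three methods, here is a minimal sketch of a module. The State and BaseModule classes below are simplified stand-ins written for this example, not dfTimewolf's real classes, and Sha256Processor, its label argument, and the add_error signature are all hypothetical.

```python
import hashlib


class State:
    """Simplified stand-in for the state object attached to each module."""

    def __init__(self, input_data=None):
        self.input = list(input_data or [])
        self.output = []
        self.errors = []

    def add_error(self, message, critical=False):
        # The `critical` flag is an assumption made for this sketch.
        self.errors.append((message, critical))


class BaseModule:
    """Simplified stand-in for dfTimewolf's BaseModule."""

    def __init__(self, state):
        self.state = state

    def setup(self, **kwargs):
        raise NotImplementedError

    def process(self):
        raise NotImplementedError

    def cleanup(self):
        pass


class Sha256Processor(BaseModule):
    """Hypothetical processor that hashes every bytes item in its input."""

    def setup(self, label="item"):
        # Cheap, sequential preparation only.
        self.label = label

    def process(self):
        # Read the previous module's output and add to this module's output.
        for item in self.state.input:
            if not isinstance(item, bytes):
                self.state.add_error(f"Skipping non-bytes input: {item!r}")
                continue
            self.state.output.append(
                {"label": self.label, "digest": hashlib.sha256(item).hexdigest()}
            )

    def cleanup(self):
        # Post-process the output with a "# out of #" description.
        total = len(self.state.output)
        for index, entry in enumerate(self.state.output, start=1):
            entry["description"] = f"{index} out of {total}"


state = State([b"evidence-1", b"evidence-2"])
module = Sha256Processor(state)
module.setup(label="disk")
module.process()
module.cleanup()
for entry in state.output:
    print(entry["description"], entry["digest"])
```

The sketch keeps process sequential for brevity; a real module would be the place to parallelize heavy work.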
Recipes
A recipe is a Python dictionary that describes how modules are chained and which parameters can be ingested from the command line. These dictionaries have a few specific keys:
- name: This is the name with which the recipe will be invoked (e.g. local_plaso)
- short_description: This is what will show up in the help message when invoking dftimewolf -h
- modules: A list of dicts describing modules and their corresponding arguments. Each dict has the following keys:
  - name: The name of the module class that will be instantiated.
  - args: A list of (argument_name, argument) tuples that will be passed on to the module’s setup() method. If argument starts with an @, it will be replaced with its corresponding value from the command-line or the ~/.dftimewolfrc file.
Recipes also need to describe how arguments are handled in a global args variable. This variable is a list of (switch, help_message, default_value) tuples; each tuple is passed to argparse’s add_argument method for later parsing.
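The following sketch shows what such a recipe and its args list might look like, and how @-prefixed values could be resolved. The recipe name, the module class names (ExampleCollector, ExampleProcessor), and the helper functions build_parser and resolve_args are hypothetical and are not taken from dfTimewolf; the sketch also ignores the ~/.dftimewolfrc file.

```python
import argparse

# Hypothetical recipe: names and module classes are placeholders.
contents = {
    "name": "example_recipe",
    "short_description": "Collects local paths and processes them.",
    "modules": [
        {
            "name": "ExampleCollector",
            "args": [("paths", "@paths")],
        },
        {
            "name": "ExampleProcessor",
            "args": [("incident_id", "@incident_id"), ("verbose", True)],
        },
    ],
}

# (switch, help_message, default_value) tuples, as described above.
args = [
    ("paths", "Comma-separated list of paths to collect", None),
    ("--incident_id", "Incident ID used to label the results", None),
]


def build_parser(recipe, recipe_args):
    """Builds an argparse parser from the recipe's args list."""
    parser = argparse.ArgumentParser(
        prog=recipe["name"], description=recipe["short_description"]
    )
    for switch, help_message, default in recipe_args:
        parser.add_argument(switch, help=help_message, default=default)
    return parser


def resolve_args(module_args, values):
    """Replaces '@name' placeholders with the parsed command-line values."""
    resolved = {}
    for name, argument in module_args:
        if isinstance(argument, str) and argument.startswith("@"):
            argument = values.get(argument[1:])
        resolved[name] = argument
    return resolved


parser = build_parser(contents, args)
parsed = vars(parser.parse_args(["/tmp/path1,/tmp/path2", "--incident_id", "12345"]))
for module in contents["modules"]:
    print(module["name"], resolve_args(module["args"], parsed))
```

Literal values such as True are passed through unchanged; only strings starting with @ are looked up.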
State
The State object is an instance of the DFTimewolfState class. It has a few useful methods:
- add_error: Used by modules to indicate that an error occurred during execution (e.g. missing file, unauthorized access).
- check_errors: Display any errors that have been added. If any critical errors were added, dftimewolf will stop the execution of the recipe and exit. Non-critical errors will just be displayed and execution will continue.
- cleanup: Resets the state: moves the output data to the input attribute and clears the output for the next Module. Moves remaining (and therefore non-critical) errors to global_errors for later processing.
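The behaviour of these three methods can be approximated as follows. This is a minimal sketch, not the actual DFTimewolfState implementation: the class name SketchState, the critical keyword argument, and the use of SystemExit to stop execution are assumptions made for illustration.

```python
class SketchState:
    """Illustrative approximation of the state object described above."""

    def __init__(self):
        self.input = []
        self.output = []
        self.errors = []         # (message, critical) pairs for the current module
        self.global_errors = []  # non-critical errors kept for later processing

    def add_error(self, message, critical=False):
        """Records an error raised by a module during execution."""
        self.errors.append((message, critical))

    def check_errors(self):
        """Displays recorded errors and stops if any of them is critical."""
        for message, critical in self.errors:
            prefix = "CRITICAL: " if critical else ""
            print(prefix + message)
        if any(critical for _, critical in self.errors):
            raise SystemExit(1)

    def cleanup(self):
        """Hands the output to the next module and archives remaining errors."""
        self.input = self.output
        self.output = []
        self.global_errors.extend(self.errors)
        self.errors = []


state = SketchState()
state.output.append("/tmp/example.plaso")
state.add_error("Optional file not found")
state.check_errors()  # prints the error, execution continues
state.cleanup()
print(state.input, state.global_errors)
```

Because check_errors exits on critical errors, anything still in errors when cleanup runs is non-critical, which is why it can be moved to global_errors wholesale.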
What happens when you run a recipe
The dftimewolf cycle is as follows:
- The recipe is parsed, and the first Module is instantiated
- Command-line arguments are taken into account and passed to the module’s setup method.
- Errors are checked
- The module’s process method is called
- Errors are checked
- Cleanup occurs; the output becomes input and the process is repeated with the next module in the recipe.
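The cycle above can be sketched as a simple loop. Everything here is a simplified stand-in rather than dfTimewolf's own code: the State class, the two example modules, the run_recipe function, and the registry mapping class names to classes are hypothetical, and calling both the module's cleanup and the state's cleanup is an assumption about what "cleanup occurs" covers.

```python
class State:
    """Simplified stand-in for the shared state object."""

    def __init__(self):
        self.input, self.output = [], []
        self.errors, self.global_errors = [], []

    def add_error(self, message, critical=False):
        self.errors.append((message, critical))

    def check_errors(self):
        if any(critical for _, critical in self.errors):
            raise RuntimeError("critical error, stopping the recipe")

    def cleanup(self):
        self.input, self.output = self.output, []
        self.global_errors.extend(self.errors)
        self.errors = []


class PathCollector:
    """Hypothetical collector: turns a comma-separated string into paths."""

    def __init__(self, state):
        self.state = state

    def setup(self, paths):
        self.paths = paths.split(",")

    def process(self):
        self.state.output.extend(self.paths)

    def cleanup(self):
        pass


class Tagger:
    """Hypothetical processor: tags each input path with an incident ID."""

    def __init__(self, state):
        self.state = state

    def setup(self, incident_id):
        self.incident_id = incident_id

    def process(self):
        for path in self.state.input:
            self.state.output.append((self.incident_id, path))

    def cleanup(self):
        pass


def run_recipe(recipe, registry, values):
    """Runs each module in order: setup, check, process, check, cleanup."""
    state = State()
    for description in recipe["modules"]:
        module = registry[description["name"]](state)
        kwargs = {
            name: values[arg[1:]] if isinstance(arg, str) and arg.startswith("@") else arg
            for name, arg in description["args"]
        }
        module.setup(**kwargs)
        state.check_errors()
        module.process()
        state.check_errors()
        module.cleanup()
        state.cleanup()  # output becomes the next module's input
    return state


recipe = {
    "name": "example",
    "modules": [
        {"name": "PathCollector", "args": [("paths", "@paths")]},
        {"name": "Tagger", "args": [("incident_id", "@incident_id")]},
    ],
}
registry = {"PathCollector": PathCollector, "Tagger": Tagger}
final = run_recipe(recipe, registry, {"paths": "/tmp/path1,/tmp/path2", "incident_id": "12345"})
print(final.input)
```

Note that after the last module's cleanup, the final results sit in the state's input attribute, since cleanup always moves output to input.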
Installation
$ pip install pipenv
$ git clone https://github.com/log2timeline/dftimewolf.git && cd dftimewolf
$ pipenv install -e .
Use
dfTimewolf is typically run by specifying a recipe name and any arguments the recipe defines. For example:
$ dftimewolf local_plaso /tmp/path1,/tmp/path2 --incident_id 12345
This will launch the local_plaso recipe against path1 and path2 in /tmp. In this recipe, --incident_id is used by Timesketch as a sketch description.
Details on a recipe can be obtained using the standard Python help flags:
$ dftimewolf -h
To get more help on a recipe’s specific flags, specify a recipe name before the -h flag:
$ dftimewolf local_plaso -h