streamingphish: uses supervised machine learning to detect phishing domains

by do son · Published September 28, 2018 · Updated September 28, 2018

StreamingPhish

This utility uses supervised machine learning to detect phishing domains in the Certificate Transparency log network. The firehose of domain names and SSL certificates is made available by the certstream network (certstream.calidog.io). All of the data required to train the initial predictive model is included in the project as well.
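To make the data source concrete, here is a minimal Python sketch of pulling domain names out of a single certstream-style message. It is not taken from the StreamingPhish code base. The message layout (a "certificate_update" message carrying data.leaf_cert.all_domains) is an assumption based on the publicly documented certstream format, and the function name domains_from_message is hypothetical.

```python
def domains_from_message(message):
    """Return the unique domain names carried by one certstream-style message.

    Assumes the message is a dict shaped like
    {"message_type": "certificate_update",
     "data": {"leaf_cert": {"all_domains": [...]}}}.
    Any other message type (for example a heartbeat) yields an empty list.
    """
    if message.get("message_type") != "certificate_update":
        return []
    leaf_cert = message.get("data", {}).get("leaf_cert", {})
    domains = []
    for name in leaf_cert.get("all_domains", []):
        name = name.lower()
        # Wildcard entries such as "*.example.com" refer to the parent domain.
        if name.startswith("*."):
            name = name[2:]
        # Preserve first-seen order while dropping duplicates.
        if name not in domains:
            domains.append(name)
    return domains


# Hand-written sample message, not real certstream output.
sample = {
    "message_type": "certificate_update",
    "data": {
        "leaf_cert": {
            "all_domains": ["*.example.com", "example.com", "login.example.com"]
        }
    },
}
print(domains_from_message(sample))
```

In a live setup, a function like this would typically be called from the callback of a certstream client, with each returned domain handed to the classifier for scoring.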

Also included is a Jupyter notebook to help explain each step of the supervised machine learning lifecycle (as it pertains to this project).

This application consists of three main components:

Jupyter notebook
- Demonstrates how to train a phishing classifier from start to finish.
CLI utility
- Trains classifiers and evaluates domains in manual mode or against the Certificate Transparency log network (via certstream).
Database
- Stores trained classifiers, performance metrics, and code for feature extraction.
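As an illustration of what feature extraction for a domain classifier can look like, the following Python sketch turns a domain name into a small dictionary of numeric features. The feature set, the keyword list, and the function names are all assumptions chosen for illustration. They are not the features StreamingPhish actually ships.

```python
import math
from collections import Counter

# Hypothetical keyword list; a real project would derive this from training data.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def shannon_entropy(text):
    """Shannon entropy of the characters in text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(domain):
    """Map a domain name to a dict of simple numeric features."""
    domain = domain.lower().strip(".")
    labels = domain.split(".")
    return {
        "length": len(domain),
        "num_labels": len(labels),
        "num_hyphens": domain.count("-"),
        "num_digits": sum(ch.isdigit() for ch in domain),
        # Number of distinct suspicious keywords appearing anywhere in the name.
        "keyword_hits": sum(kw in domain for kw in SUSPICIOUS_KEYWORDS),
        "entropy": shannon_entropy(domain),
    }


features = extract_features("secure-login.example-account.com")
print(features)
```

A supervised classifier would be trained on vectors like these, computed for domains already labeled as phishing or benign, and would then score new domains as they arrive.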

Each component runs in its own Docker container. The application is designed to be built and operated via Docker Compose.

Install

Linux

git clone https://github.com/wesleyraptor/streamingphish.git
cd streamingphish/
sudo ./install_streamingphish.sh

Windows or macOS

Installation on other platforms works if docker and docker-compose are already installed. Run the following command to build and start the containers: