ThePhish: automated phishing email analysis tool
ThePhish
ThePhish is an automated phishing email analysis tool based on TheHive, Cortex, and MISP. It is a web application written in Python 3 and based on Flask that automates the entire analysis process, from the extraction of the observables from the header and the body of an email to the formulation of a verdict, which is final in most cases. In addition, it allows the analyst to intervene in the analysis process and obtain further details on the email being analyzed if necessary. To interact with TheHive and Cortex, it uses TheHive4py and Cortex4py, the Python API clients for the REST APIs exposed by TheHive and Cortex respectively.
The following diagram shows how ThePhish works at a high level:
- An attacker starts a phishing campaign and sends a phishing email to a user.
- A user who receives such an email can send that email as an attachment to the mailbox used by ThePhish.
- The analyst interacts with ThePhish and selects the email to analyze.
- ThePhish extracts all the observables from the email and creates a case on TheHive. The observables are analyzed thanks to Cortex and its analyzers.
- ThePhish calculates a verdict based on the verdicts of the analyzers.
- If the verdict is final, the case is closed and the user is notified. In addition, if it is a malicious email, the case is exported to MISP.
- If the verdict is not final, the analyst’s intervention is required. The analyst must review the case on TheHive, along with the results given by the various analyzers, to formulate a verdict. The analyst can then send the notification to the user, optionally export the case to MISP, and close the case.
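The extraction and verdict steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not ThePhish's actual code: the function names `extract_observables` and `aggregate_verdict`, the regular expressions, the dictionary keys, and the aggregation rule (any malicious result is final, unanimous safe results are final, everything else is left to the analyst) are all hypothetical. The level names follow the safe, suspicious, and malicious levels that Cortex analyzers typically report.

```python
import re
from email import message_from_string
from email.message import Message
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, deliberately loose patterns; a real extractor needs stricter rules.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
MAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _body_text(msg: Message) -> str:
    """Concatenate the decoded text parts of a (possibly multipart) message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def extract_observables(raw_email: str) -> Dict[str, Set[str]]:
    """Collect URLs, email addresses, and IPv4 addresses from headers and body."""
    msg = message_from_string(raw_email)
    headers = "\n".join(f"{name}: {value}" for name, value in msg.items())
    text = headers + "\n" + _body_text(msg)
    return {
        "url": set(URL_RE.findall(text)),
        "mail": set(MAIL_RE.findall(text)),
        "ip": set(IPV4_RE.findall(text)),
    }


def aggregate_verdict(levels: Iterable[str]) -> Tuple[str, bool]:
    """Return (verdict, is_final) using an assumed rule, not ThePhish's own."""
    levels = list(levels)
    if "malicious" in levels:
        return "malicious", True      # final: close the case, export to MISP
    if levels and all(level == "safe" for level in levels):
        return "safe", True           # final: close the case, notify the user
    return "suspicious", False        # not final: the analyst must review
```

A non-final result corresponds to the last step of the list, where the analyst reviews the case on TheHive before closing it.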
Implementation
ThePhish is a web application written in Python 3. The web server is implemented using Flask, while the front-end part of the application, which is a dynamic page written in HTML, CSS, and JavaScript, is implemented using Bootstrap. Apart from the web server module, the back-end logic of the application consists of three Python modules that encapsulate the logic of the application itself and a Python class used to support the logging facility through the WebSocket protocol. If you want to see a graphical representation of the application logic, click here. Moreover, several configuration files used by these modules serve various purposes.
When the analyst navigates to the base URL of the application, the web page of ThePhish is loaded and a bi-directional connection is established with the server. This is done by using the Socket.IO JavaScript library in the web page that enables real-time, bi-directional, and event-based communication between the browser and the server. This connection is established with a WebSocket connection whenever possible and will use HTTP long polling as a fallback. For this to work, the server application uses the Flask-SocketIO Python library, which provides a Socket.IO integration for Flask applications. This connection is then used by ThePhish to display the progress of the analysis on the web interface.
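One way to push progress messages over such a connection is a logging handler that forwards each record to an emit function. The sketch below is an assumption about how this could be done, not the class ThePhish ships: `SocketLogHandler`, `build_logger`, and the `"log"` event name are hypothetical. It takes any callable with the shape `emit(event, data)`, which is how the `emit` method of a Flask-SocketIO `SocketIO` object can be called, so the example itself depends only on the standard library.

```python
import logging
from typing import Callable


class SocketLogHandler(logging.Handler):
    """Forward each log record to the browser through an emit callable."""

    def __init__(self, emit_fn: Callable[[str, dict], None], event: str = "log"):
        super().__init__()
        self._emit_fn = emit_fn
        self._event = event  # hypothetical event name the page would listen for

    def emit(self, record: logging.LogRecord) -> None:
        try:
            payload = {"level": record.levelname, "message": self.format(record)}
            self._emit_fn(self._event, payload)
        except Exception:
            # Never let a broken connection crash the analysis itself.
            self.handleError(record)


def build_logger(emit_fn: Callable[[str, dict], None],
                 name: str = "analysis") -> logging.Logger:
    """Create a logger whose INFO-and-above records are sent via emit_fn."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(SocketLogHandler(emit_fn))
    return logger


# With Flask-SocketIO one would pass the server's emit method, for example:
#   socketio = SocketIO(app)
#   log = build_logger(socketio.emit)
#   log.info("Extracting observables")
```

Keeping the handler independent of the transport means the same analysis code can log to the web interface or, with a different callable, to a test harness.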
Every time the analyst performs an action on the web interface, an AJAX request is sent to the server. This is an asynchronous HTTP request that exchanges data with the server in the background and updates the page without reloading it. It allows the analyst both to view the list of emails to analyze and to start the analysis.
ThePhish interacts with TheHive and Cortex thanks to TheHive4py and Cortex4py. Moreover, it interacts with an IMAP server to retrieve the emails to analyze.
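The IMAP side can be illustrated with the standard library alone. The following is a hedged sketch rather than ThePhish's implementation: the function names `fetch_unread_raw` and `attached_email` are hypothetical, the host and credentials are placeholders supplied by the caller, and it assumes the user forwarded the suspicious email as a `message/rfc822` attachment (attachments sent with another content type would need extra handling).

```python
import email
import imaplib
from email.message import Message
from typing import List, Optional


def fetch_unread_raw(host: str, user: str, password: str,
                     mailbox: str = "INBOX") -> List[bytes]:
    """Return the raw bytes of every unread message (requires a live server)."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(mailbox, readonly=True)   # read-only: flags are left untouched
        _, data = conn.search(None, "UNSEEN")
        raw_messages = []
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            raw_messages.append(parts[0][1])  # parts[0] is (envelope, raw bytes)
        return raw_messages


def attached_email(raw: bytes) -> Optional[Message]:
    """Return the first email attached to the outer message, if any."""
    outer = email.message_from_bytes(raw)
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part holds exactly one nested message.
            return part.get_payload(0)
    return None
```

The nested message returned by `attached_email` is the email the user reported, so it is the one whose headers and body would be searched for observables.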
Install & Use
Copyright (C) 2021 emalderson