ShellSweep: detect potential webshell files in a specified directory
ShellSweep
“ShellSweep” is a PowerShell/Python/Lua tool designed to detect potential webshell files in a specified directory.
ShellSweep and its suite of tools calculate the entropy of file contents to estimate the likelihood that a file is a webshell. High entropy indicates more randomness, which is characteristic of the encrypted or obfuscated code often found in webshells.
- It only processes files with certain extensions (.asp, .aspx, .asph, .php, .jsp), which are commonly used in webshells.
- Certain directories can be excluded from scanning.
- Files with certain hashes can be ignored during the scan.
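The three filters above can be sketched in Python as follows. This is a minimal illustration, not ShellSweep's actual code: the function name `candidate_files` and its parameters are hypothetical, and SHA-256 is an assumption, since the text does not say which hash algorithm is used.

```python
import hashlib
from pathlib import Path

# Extensions listed in the text above.
EXTENSIONS = {".asp", ".aspx", ".asph", ".php", ".jsp"}


def candidate_files(root, excluded_dirs=(), ignored_hashes=()):
    """Yield files under root that pass the extension, directory, and hash filters."""
    excluded = [Path(d).resolve() for d in excluded_dirs]
    ignored = {h.lower() for h in ignored_hashes}
    for path in sorted(Path(root).rglob("*")):
        # Filter 1: only regular files with a listed extension.
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        # Filter 2: skip anything located inside an excluded directory.
        parents = path.resolve().parents
        if any(ex in parents for ex in excluded):
            continue
        # Filter 3: skip files whose hash is on the ignore list.
        # SHA-256 is an assumption; the source does not name the algorithm.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in ignored:
            continue
        yield path
```

Hashing only the files that survive the first two filters keeps the more expensive read-and-hash step off files that would be skipped anyway.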
How does ShellSweep find the shells?
Entropy, in the context of information theory or data science, is a measure of the unpredictability, randomness, or disorder in a set of data. The concept was introduced by Claude Shannon in his 1948 paper “A Mathematical Theory of Communication”.
When applied to a file or a string of text, entropy can help assess the randomness of the data. Here’s how it works:
- If a file consists of completely random data (each byte is equally likely to be any value between 0 and 255), the entropy is high, close to 8 bits per byte (since log2(256) = 8).
- If a file consists of highly structured data (for example, a text file where most bytes are ASCII characters), the entropy is lower.
In the context of finding webshells or malicious files, entropy can be a useful indicator:
- Many obfuscated scripts or encrypted payloads can have high entropy because the obfuscation or encryption process makes the data look random.
- A normal text file or HTML file would generally have lower entropy because human-readable text has patterns and structure (certain letters are more common, words are usually separated by spaces, etc.).
So, a file with unusually high entropy might be suspicious and worth further investigation. However, it’s not a surefire indicator of maliciousness: there are plenty of legitimate reasons a file might have high entropy, and plenty of ways malware might avoid causing high entropy. It’s just one tool in a larger toolbox for detecting potential threats.
ShellSweep includes a Get-Entropy function that calculates the entropy of a file’s contents by:
- Counting how often each character appears in the file.
- Using these frequencies to calculate the probability of each character.
- Summing -p*log2(p) for each character, where p is the character’s probability. This is the formula for entropy in information theory.
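The three steps above can be sketched in Python as follows. This is an illustrative equivalent, not the Get-Entropy source: the name `shannon_entropy` is hypothetical, and it counts bytes rather than characters, which is an assumption made so the result falls on the 0 to 8 scale described earlier.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)  # step 1: count how often each byte value appears
    total = len(data)
    entropy = 0.0
    for count in counts.values():
        p = count / total  # step 2: turn the frequency into a probability
        entropy -= p * math.log2(p)  # step 3: accumulate -p*log2(p)
    return entropy


# Every byte value exactly once: maximal entropy.
print(shannon_entropy(bytes(range(256))))  # 8.0
# A single repeated byte: no uncertainty at all.
print(shannon_entropy(b"aaaa"))  # 0.0
```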
ShellScan
ShellScan scans multiple directories of known-bad webshells and outputs the average, median, minimum, and maximum entropy values by file extension.
Pass ShellScan.ps1 one or more directories of webshells; a set of any size works. I used:
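As a rough illustration of that per-extension summary, here is a minimal Python sketch. It is not ShellScan.ps1 itself: the function names and the keys of the returned dictionary are hypothetical, and entropy is computed over bytes as an assumption. The entropy helper is included only so the block runs on its own.

```python
import math
import statistics
from collections import Counter, defaultdict
from pathlib import Path


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def entropy_stats_by_extension(directories):
    """Map each file extension to its average, median, minimum, and maximum entropy."""
    values = defaultdict(list)
    for directory in directories:
        for path in Path(directory).rglob("*"):
            if path.is_file():
                values[path.suffix.lower()].append(shannon_entropy(path.read_bytes()))
    return {
        ext: {
            "average": statistics.mean(v),
            "median": statistics.median(v),
            "minimum": min(v),
            "maximum": max(v),
        }
        for ext, v in values.items()
    }
```

Grouping by extension matters because typical entropy differs between file types, so a single overall figure would blur those differences.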
- https://github.com/tennc/webshell
- https://github.com/BlackArch/webshells
- https://github.com/tarwich/jackal/blob/master/libraries/
This gives a decent training set for deriving entropy values.
Output example:
ShellCSV
ShellCSV helps identify the entropy of the known-good files on disk. The idea is that defenders can run it on web servers to collect every file and its entropy value, to better understand which paths and extensions are most prominent in their environment.
See ShellCSV.csv as an example output.
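To make the idea concrete, here is a minimal Python sketch of such an inventory. It is not the ShellCSV script: the function name `write_entropy_csv` and the column names are hypothetical and may not match ShellCSV.csv, and entropy is computed over bytes as an assumption. The entropy helper is included only so the block runs on its own.

```python
import csv
import math
from collections import Counter
from pathlib import Path


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def write_entropy_csv(root, csv_path):
    """Write one row per file under root and return the number of rows written.

    Keep csv_path outside root, otherwise the output file is scanned too.
    """
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Hypothetical column names, chosen for this sketch.
        writer.writerow(["FilePath", "Extension", "Entropy"])
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                entropy = shannon_entropy(path.read_bytes())
                writer.writerow([str(path), path.suffix.lower(), f"{entropy:.4f}"])
                rows += 1
    return rows
```

Sorting or grouping the resulting file by extension gives a baseline of what normal entropy looks like on a given server.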
Download & Use
Copyright (C) 2024 MHaggis