Web Shell Analyzer: Web shell scanner and analyzer

by do son · October 7, 2020

Web Shell Analyzer

Web shell analyzer is a cross-platform stand-alone binary built solely for the purpose of identifying, decoding, and tagging files that are suspected to be web shells. The web shell analyzer is the bigger brother to the web shell scanner project, which only scans files via regex, no decoding, or attribute analysis.

Disclaimer

The regex and its built-in decoding routines supplied with the scanner are not guaranteed to find every web shell on disk and maybe identify some false positives. It’s also recommended you test the analyzer and assess its impact before running on production systems. The analyzer has no warranty, use at your own risk.

Features

Cross-platform, statically compiled binary.
JSON output
Currently supports most PHP, ASP/X web shells. JSP/X, CFM, and other types are in the works.
Recursive, multi-threaded scanning capable of iterating through nested directories quickly
Ability to handle multiple layers of obfuscated web shells such as base64, gzinflate, and char code.
Supports PRE/POST actions which powers layered de-obfuscated and decoding for the analysis engine
Tunable regex logic with modular interfaces to easily extend the capabilities of the analyzer
Tunable attribute tagging
Raw content captures upon the match
System Info
Tested against the web shell repo

PRE/POST Actions

Every file that is scanned can be run through PRE and/or POST action:

PRE-Decoding: Functions invoked BEFORE matching is performed, such as base64 decoding or string replacement.
POST-Decoding: Functions invoked AFTER matching is performed, such as url defanging.

The idea behind PreDecodeActions functions was to use regex to identify a matching string or pattern, acquire its raw match contents, perform defined decoding/cleanup steps, and send the final output back to the analysis engine for re-scanning/processing. A very simple example of this is Base64 decoding. In order to check for any detection logic against a base64 encoded web shell, we must first remove any/all layers of base64. Todo this, we could use the following PreDecodeAction:

{

    Name:        "PHP_Base64Decode",

    Regex:       *regexp.MustCompile(`(?i)(?:=|\s+)(base64_decode\('('?\"?[A-Za-z0-9+\/=]+'?\"?))`),

    DataCapture: *regexp.MustCompile(`(?i)((?:'|")[A-Za-z0-9+\/=]+(?:'|"))`),

    PreDecodeActions: []cm.Action{

        {Function: cm.StringReplace, Arguments: []interface{}{"\"", "", -1}},

        {Function: cm.StringReplace, Arguments: []interface{}{"'", "", -1}},

    },

    Functions: []cm.Base_Func{cm.DecodeBase64},

},

Looking at the block above, we first have the name of the function, the regex used to match, data capture regex (sometimes you may want to tweak what to capture vs what matches), and PreDecodeActions. In this case, BEFORE the function cm.DecodeBase64 is applied to the matching text, the system will first remove the following items ” and ‘. PostDecodeActions works the opposite, where the output is checked AFTER decoding is performed. Using this model, we can make multiple custom decoders that have infinite PRE/POST and decoding functions to handle most web shell analysis needs.

Detections

Detection is a regex accompanied by a name and description. The idea behind this model was to make detections modular and scalable and kept context with the actual detection. Detections share the same format as attributes, minus attributes cannot generate a detection, they can only add context to an existing detection. Let’s look at the example detect logic block below:

{

    Name:        "Generic_Embedded_Executable",

    Description: "Looks for magic bytes associated with a PE file",

    Regex:       *regexp.MustCompile(`(?i)(?:(?:0x)?4d5a)`),

},

Based on the regex, we can see it’s looking for an embedded Windows PE file based off the magic header bytes 4D 5A. If found, this would lead to detection and a JSON report would be generated for the file. Currently, detections are applied based on the file extension or generically for all file types. For example, decoding routines for PHP are defined under cm.GlobalMap.Function_Php and tags for either attribute are defined under cm.GlobalMap.Tags_Php. The functions cm.GlobalMap.Function_Generics and tags under cm.GlobalMap.Tags_Generics apply to ALL web shell extensions as a catch-all.

Attributes

Attribute tagging is a new concept I created that adds “context” to an existing web shell detection. Attributes alone cannot currently generate detection on their own. In a traditional scan engine, a scanner would only alert if a web shell was detected but provide little to no additional context into what capabilities (attributes) the web shell potentially has. Attribute tags work the same as detection logic, however, they only show after detection has been identified and cannot generate detections on their own. Looking at the example logic below:

cm.GlobalMap.Tags_Php = []cm.TagDef{

    {

        Name:        "PHP_Database_Operations",

        Description: "Looks for common PHP functions used for interacting with a database.",

        Regex:       *regexp.MustCompile(`(?i)(?:'mssql_connect\|mysql_exec\()`),

        Attribute: 	 true,

    },

}

We see that under the struct Tags_Php, we have created a new PHP tag. When a match is found during the scanning, the Attribute flag is checked and if set to True, the detected web shell will have the tag PHP_Database_Operations appended to its JSON report along with the frequency and matching text block, as shown in the example output below:

{

   "filePath": "/testers/1.php",

   "size": 66109,

   "md5": "6793d8ebab93e5a0f91e5a331221f331",

   "timestamps": {

      "birth": "2019-02-03 02:02:22",

      "created": "2020-07-29 02:50:15",

      "modified": "2019-02-03 02:02:22",

      "accessed": "2020-07-29 02:51:07"

   },

   "matches": {

      "FilesMAn": 5,

      "FilesMan": 29,

      "cmd": 20,

      "eval(": 4,

      "exec(": 2,

      "ipconfig": 1,

      "netstat": 2,

      "passthru(": 1,

      "shell_exec(": 1

   },

   "decodes": {

      "Generic_Base64Decode": 40,

      "Generic_Multiline_Base64Decode": 165

   },

   "tags": {

      "Generic_Embedding_Code_C": {

         "bind(": 2,

         "listen(": 2

      },

      "PHP_Banned_Function": {

         "exec(": 3,

         "get_current_user(": 1,

         "getmyuid(": 1,

         "link(": 7,

         "listen(": 2,

         "passthru(": 1,

         "realpath(": 1,

         "set_time_limit(": 1

      },

      "PHP_Database_Operations": {

         "mysql_query(": 1

      },

      "PHP_Disk_Operations": {

         "@chmod(": 1,

         "@filegroup(": 4,

         "@fileowner(": 4,

         "@rename(": 2,

         "fopen(": 7,

         "fwrite(": 6

      }

   }

}

These tags not only help define what a web shell can do, but it helps teams such as IR consultants performing live response engagements a pivot point into where to potentially look next.