ugforum analysis: Tools for Automated Analysis of Cybercriminal Markets
Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and criminal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to understand the markets make manual exploration of these forums unscalable. In this work, we propose an automated, top-down approach for analyzing underground forums. Our approach uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices. We also demonstrate, via a pair of case studies, how an analyst can use these automated approaches to investigate other categories of products and transactions. We use eight distinct forums to assess our tools: Antichat, Blackhat World, Carders, Darkode, Hack Forums, Hell, L33tCrew, and Nulled. Our automated approach is fast and accurate, achieving over 80% accuracy in detecting post category, product, and prices.
Download
git clone https://github.com/ccied/ugforum-analysis.git
- annotation_tools
This directory contains a set of scripts that assist in data annotation.
- extract-currency-exchange
This contains several systems for extracting information from currency exchange posts. The target information is:
– The cash format/currency being provided
– The cash format/currency being requested
– The amount provided
– The amount requested
– The rate
- extract-price
- extract-product
This system identifies products being bought and sold in forum posts. It has several modes of operation, but most commonly identifies the single most prominent product in a given post, represented by a particular word or noun phrase chosen from that post.
- predict-post-type
Training/testing data and ground truth can be found in sample-data.
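The five target fields listed for extract-currency-exchange can be pictured as a small record. The sketch below is illustrative only: the class name, field names, and the `implied_rate` helper are hypothetical and are not taken from the repository, and it assumes the rate means units requested per unit provided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CurrencyExchangeRecord:
    """Hypothetical container for the five target fields (not the repo's schema)."""
    currency_provided: str                    # cash format/currency being provided
    currency_requested: str                   # cash format/currency being requested
    amount_provided: Optional[float] = None   # amount provided, if stated in the post
    amount_requested: Optional[float] = None  # amount requested, if stated in the post
    rate: Optional[float] = None              # rate, if the post states one explicitly

    def implied_rate(self) -> Optional[float]:
        """Return the stated rate, else amount_requested / amount_provided.

        Assumption: the rate is expressed as units requested per unit provided.
        Returns None when neither a rate nor both amounts are available.
        """
        if self.rate is not None:
            return self.rate
        if self.amount_provided and self.amount_requested:
            return self.amount_requested / self.amount_provided
        return None


# Made-up example post: offering 2 BTC, asking for 500 USD via PayPal.
record = CurrencyExchangeRecord(
    "BTC", "PayPal USD", amount_provided=2.0, amount_requested=500.0
)
print(record.implied_rate())  # 250.0
```

Keeping the amounts and the rate optional reflects that a post may state only some of these values; how the actual systems handle missing fields is not described here.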