Scavenger
This repository contains the code of my OSINT bot, which searches different paste sites for sensitive data leaks.
Search terms:
- credentials
- private RSA keys
- WordPress configuration files
- MySQL connect strings
- onion links
- links to files hosted inside the onion network (PDF, DOC, DOCX, XLS, XLSX)
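To illustrate how search terms like these could be matched, here is a minimal sketch using regular expressions. The pattern names, the expressions themselves and the classify_paste helper are assumptions made for this example; they are not taken from the bot's source, and real detection will need more careful patterns.

```python
import re

# Hypothetical patterns for illustration only (not the bot's actual rules).
ONION_HOST = r"\b[a-z2-7]{16}(?:[a-z2-7]{40})?\.onion"  # v2 (16) or v3 (56) address

PATTERNS = {
    # email:password on a line of its own
    "credentials": re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+:\S+$", re.MULTILINE),
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
    "wordpress_config": re.compile(r"define\(\s*['\"]DB_PASSWORD['\"]"),
    "mysql_connect": re.compile(r"mysqli?_connect\s*\("),
    "onion_link": re.compile(ONION_HOST + r"\b"),
    "onion_document": re.compile(
        ONION_HOST + r"/\S+\.(?:pdf|docx?|xlsx?)\b", re.IGNORECASE
    ),
}


def classify_paste(text):
    """Return the sorted names of all patterns that occur in the paste text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

A paste can match several categories at once, which is why the helper returns a list instead of a single label.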
The bot can be run in two major modes:
- API mode
- Scraping mode (using TOR)
I highly recommend using API mode. It is the intended way to scrape pastes from Pastebin.com, and it is only fair to use it. All you need is a Pastebin.com PRO account with your public IP whitelisted on their site.
To start the bot in API mode just run the program in the following way:
python run.py -0
However, it is not always possible to use this intended method: if you are behind NAT, you do not have a public IP exclusively for yourself, so whitelisting it is not reasonable. That is the reason I also implemented a scraping mode, in which fast TOR cycles in combination with reasonable user agents are used to avoid IP blocking and Cloudflare captchas.
To start the bot in scraping mode run it in the following way:
python run.py -1
Important note: you need the TOR service installed on your system listening on port 9050. Additionally, you need to add the following line to your /etc/tor/torrc file.
MaxCircuitDirtiness 30
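This sets the maximum cycle time of TOR to 30 seconds: a circuit that is older than 30 seconds is no longer reused for new connections, so the exit IP changes frequently.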
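As a rough sketch of what a request through the local TOR service could look like, the snippet below builds the proxy and header settings for one request. The user agent strings and the build_request_settings helper are assumptions for illustration, not code from the bot. The commented usage relies on the requests library with SOCKS support (pip install requests[socks]).

```python
import random

# TOR SOCKS port from the note above; "socks5h" lets TOR resolve DNS as well.
TOR_SOCKS_PROXY = "socks5h://127.0.0.1:9050"

# Hypothetical example user agents; replace with whatever is reasonable for you.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def build_request_settings(rng=random):
    """Return keyword arguments for one HTTP request routed through TOR."""
    return {
        "proxies": {"http": TOR_SOCKS_PROXY, "https": TOR_SOCKS_PROXY},
        "headers": {"User-Agent": rng.choice(USER_AGENTS)},
        "timeout": 30,
    }


# Usage (needs network access and a running TOR service):
# import requests
# response = requests.get("https://example.com/", **build_request_settings())
```

Picking a new user agent per request and letting TOR rotate circuits in the background keeps the two concerns separate: the script never has to talk to the TOR control port.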
Download
git clone https://github.com/rndinfosecguy/Scavenger.git
pip install -r requirements.txt
Use
Example
Just start the Pastebin.com module separately (first module I implemented)…
python P_bot.py
Pastes are stored in data/raw_pastes until there are more than 48000 of them. Once that limit is exceeded, they are filtered, zipped and moved to the archive folder. All pastes which contain credentials are stored in data/files_with_passwords.
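The threshold logic can be sketched roughly as follows. This is an assumption about how such a rotation might be written, not the bot's actual implementation: the archive_if_full function and its parameters are hypothetical, and the filtering step is left out.

```python
import shutil
from pathlib import Path

ARCHIVE_THRESHOLD = 48000  # limit mentioned above


def archive_if_full(raw_dir, archive_dir, archive_name, threshold=ARCHIVE_THRESHOLD):
    """Zip all pastes into archive_dir once raw_dir holds more than `threshold` files.

    Returns the path of the created zip file, or None if nothing was archived.
    """
    raw_dir = Path(raw_dir)
    archive_dir = Path(archive_dir)
    pastes = [p for p in raw_dir.iterdir() if p.is_file()]
    if len(pastes) <= threshold:
        return None

    archive_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to the base name and returns the full path.
    zip_path = shutil.make_archive(
        str(archive_dir / archive_name), "zip", root_dir=str(raw_dir)
    )
    # Remove the raw pastes only after the archive was written successfully.
    for paste in pastes:
        paste.unlink()
    return Path(zip_path)
```

Calling archive_if_full("data/raw_pastes", "data/archive", "batch_001") after each scraping round would keep the raw folder bounded; the folder and archive names here are placeholders.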
Keep in mind that at the moment only combinations like USERNAME:PASSWORD and other simple combinations are detected. However, there is a tool to search for proxy logs containing credentials.
You can search for proxy logs (URLs containing a username and password combination) by using the getProxyLogs.py file.
python getProxyLogs.py data/raw_pastes
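To show what "URLs with a username and password" means in practice, here is a small sketch that extracts such URLs from a block of text. It is an illustration under my own assumptions (the regular expression and the find_urls_with_credentials helper are hypothetical), not the logic of getProxyLogs.py.

```python
import re
from urllib.parse import urlsplit

# Matches scheme://user:password@host... (hypothetical pattern for illustration).
URL_WITH_CREDENTIALS = re.compile(r"(?:https?|ftp)://[^\s/:@]+:[^\s/@]+@[^\s/]+\S*")


def find_urls_with_credentials(text):
    """Return (hostname, username, password) tuples for every matching URL."""
    hits = []
    for match in URL_WITH_CREDENTIALS.finditer(text):
        try:
            parts = urlsplit(match.group(0))
        except ValueError:
            continue  # malformed URL, e.g. broken IPv6 brackets
        if parts.username and parts.password:
            hits.append((parts.hostname, parts.username, parts.password))
    return hits
```

Letting urlsplit do the final parsing avoids hand-rolling the split between credentials, host and port.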
If you want to search the raw data for specific strings you can do it using searchRaw.py (really slow).
python searchRaw.py SEARCHSTRING
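A plain substring search over the raw pastes can be sketched in a few lines. This is an assumed, simplified version for illustration (the search_raw function is hypothetical); like the script, it reads every file and is therefore slow on large folders.

```python
from pathlib import Path


def search_raw(raw_dir, needle):
    """Return the names of all files in raw_dir whose content contains `needle`."""
    matches = []
    for path in sorted(Path(raw_dir).iterdir()):
        if not path.is_file():
            continue
        # Pastes can contain arbitrary bytes, so undecodable characters are skipped.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text:
            matches.append(path.name)
    return matches
```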
To see the statistics of the bot, just call:
python status.py
The file findSensitiveData.py searches a folder (with pastes) for sensitive data like credit cards, RSA keys or mysqli_connect strings. Keep in mind that this script uses grep and is therefore really slow on a large number of paste files. If you want to analyze a large number of pastes, I recommend an ELK stack.
python findSensitiveData.py data/raw_pastes
There are two scripts, stalk_user.py and stalk_user_wrapper.py, which can be used to monitor a specific Twitter user. Every tweet the user posts gets saved, and every URL contained in a tweet gets downloaded. To start the stalker, just execute the wrapper.
python stalk_user_wrapper.py
Copyright (C) 2018 rndinfosecguy
Source: https://github.com/rndinfosecguy/