ODBParser
ODBParser is a tool to search for PII being exposed in open databases.
ONLY to be used to identify exposed PII and warn server owners of irresponsible database maintenance
OR to query databases you have permission to access!
PLEASE USE RESPONSIBLY
What is this?
I wrote this because I wanted a one-stop OSINT tool for searching, parsing and analyzing open databases in order to identify leaks of PII on third-party servers. Other tools seem either to only search for open databases, or to dump them once you’ve identified them and grab data indiscriminately. It grew from a function or two into what’s in this repo, so the code isn’t as clean and pretty as it could be.
Features
To identify open databases you can:
- query Shodan and BinaryEdge using all possible parameters (filter by country, port number, whatever)
- specify a single IP address
- load a file containing a list of IP addresses
- paste a list of IP addresses from the clipboard
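Whether the addresses come from a file or the clipboard, the input has to be cleaned up before anything is queried. Here is a minimal sketch of that step using only the standard library. The names `parse_ip_list` and `load_ip_file` are hypothetical and are not taken from this repo; the script's own handling may differ.

```python
import ipaddress


def parse_ip_list(text):
    """Return the unique, valid IP addresses found in text, in first-seen order."""
    seen = set()
    valid = []
    # Accept addresses separated by newlines, commas, or other whitespace.
    for token in text.replace(",", " ").split():
        try:
            ip = str(ipaddress.ip_address(token))
        except ValueError:
            continue  # skip anything that is not a valid IPv4/IPv6 address
        if ip not in seen:
            seen.add(ip)
            valid.append(ip)
    return valid


def load_ip_file(path):
    """Read a text file and return the valid IP addresses it contains."""
    with open(path, encoding="utf-8") as handle:
        return parse_ip_list(handle.read())
```

The same `parse_ip_list` helper would work on pasted clipboard text, since it only needs a string.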
Dumping options:
- parse all databases/collections to identify data you specify
- grab everything hosted on the server
- grab just one index/collection
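To illustrate the first option, here is a rough sketch of how a record could be checked for the fields you care about. It assumes matching is done on field names with a case-insensitive substring test, which may not be how the script actually does it. The function name `matching_fields` is hypothetical.

```python
def matching_fields(record, wanted_terms, prefix=""):
    """Return dotted paths of fields whose names contain any wanted term."""
    matches = []
    for name, value in record.items():
        path = f"{prefix}{name}"
        # Case-insensitive substring match on the field name (an assumption).
        if any(term.lower() in name.lower() for term in wanted_terms):
            matches.append(path)
        if isinstance(value, dict):
            # Recurse into nested documents, extending the dotted path.
            matches.extend(matching_fields(value, wanted_terms, prefix=path + "."))
    return matches


sample = {"user": {"Email": "a@example.com", "age": 3}, "phone_number": "1"}
print(matching_fields(sample, ["email", "phone"]))
```

An index or collection whose sample record returns an empty list would be skipped under this scheme.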
Post-Processing:
- convert JSON dumps to CSV
- remove useless columns from CSV
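The conversion step can be approximated with the standard library alone. This sketch assumes the dump is a JSON array of flat records and treats `None` and empty strings as "no value". The script itself talks about NaN values, so its real implementation likely differs. The name `json_records_to_csv` is hypothetical.

```python
import csv
import json


def json_records_to_csv(json_path, csv_path):
    """Convert a JSON array of flat records to CSV; return the number of rows written."""
    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)

    # Drop exact duplicate records while keeping first-seen order.
    seen = set()
    unique = []
    for record in records:
        key = json.dumps(record, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(record)

    def is_empty(value):
        return value is None or value == ""

    # Keep only columns that hold a value in at least one record.
    columns = []
    for record in unique:
        for name, value in record.items():
            if name not in columns and not is_empty(value):
                columns.append(name)

    # Drop rows that have no value in any kept column.
    rows = [r for r in unique if any(not is_empty(r.get(c)) for c in columns)]

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" silently skips the columns dropped above.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Nested documents would need flattening first; this sketch writes them as their string representation.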
Other features:
- keeps track of all the IP addresses and databases you have queried, along with info about each server
- maintains a stats file with the number of IPs you’ve queried, the number of databases you’ve parsed and the number of records you’ve dumped
- convert JSON dumps you already have to CSV
- for every database whose total number of records is above your limit, the script creates an entry in a special file along with 5 sample records, so you can review it and decide whether the database is worth grabbing
- Output is JSON. You can convert the files to CSV on the fly, or convert only certain files after the run is complete (I recommend the latter). Converted JSON files will be moved to a folder called “JSON backups” in the same directory. NOTE: When converting to CSV, the script drops exact duplicate rows and drops columns and rows where all values are NaN, because that’s what I wanted to do. Feel free to edit the function if you’d rather have an exact copy of the JSON file.
- Windows ONLY: if the script pulls back a huge number of indices that contain a field you care about, it will list the names of the databases, pause, and give you ten seconds to decide whether to go ahead and pull all the data from every index. I’ve found that if too many databases are returned even after you’ve specified the fields you want, there is a good chance the data is fake or just useless logs, and you can usually tell from the names whether that is the case. If you don’t act within 10 seconds, the script will go ahead and dump every index.
- as you may have noticed, a lot of people have been scanning for MongoDB databases and holding them hostage, often changing the name to something like “TO_RESTORE_EMAIL_XXXRESTORE.COM.” The MongoDB scraper ignores all databases and collections that have been pwned by checking the name of each DB/collection against a list of strings that indicate pwnage
- the script is pretty verbose (maybe too verbose), but I like seeing what’s going on. Feel free to silence the print statements if you prefer.
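The pwned-database check described above comes down to a substring test on names. Here is a minimal sketch of the idea. The marker strings are hypothetical placeholders, not the script's actual list, and the function names are made up for illustration.

```python
# Hypothetical marker strings; the script's actual list is not shown in this README.
PWNED_MARKERS = ("to_restore", "read_me", "hacked")


def is_pwned(name, markers=PWNED_MARKERS):
    """Return True if a database/collection name contains a ransom marker."""
    lowered = name.lower()
    return any(marker in lowered for marker in markers)


def filter_clean(names, markers=PWNED_MARKERS):
    """Keep only the names that do not look like ransom notes."""
    return [name for name in names if not is_pwned(name, markers)]


names = ["customers", "TO_RESTORE_EMAIL_XXXRESTORE.COM", "READ_ME_TO_RECOVER"]
print(filter_clean(names))
```

Matching is case-insensitive here, so a single lowercase marker covers the various capitalizations ransom notes tend to use.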