ScrapPY: PDF Scraping Made Easy
ScrapPY is a Python utility for scraping manuals, documents, and other sensitive PDFs to generate targeted wordlists that offensive security tools can use for brute force, forced browsing, and dictionary attacks. ScrapPY performs word frequency, entropy, and metadata analysis, and can also run in full output mode to craft custom wordlists for targeted attacks. The tool surfaces keywords and phrases that may lead to passwords or hidden directories, and writes them to a text file readable by tools such as Hydra, Dirb, and Nmap. Expedite initial access, vulnerability discovery, and lateral movement with ScrapPY!
Install
$ git clone https://github.com/RoseSecurity/ScrapPY.git
$ cd ScrapPY/
$ pip3 install -r requirements.txt
Use
Output metadata of document:
$ python3 ScrapPY.py -f example.pdf -m metadata
Output the top 100 most frequently used keywords to a file named Top_100_Keywords.txt:
$ python3 ScrapPY.py -f example.pdf -m word-frequency -o Top_100_Keywords.txt
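For readers curious about what a word-frequency pass involves, here is a minimal sketch of the general technique. It is not ScrapPY's actual implementation: the function name `top_keywords`, the `min_length` filter, and the tokenizing regex are all assumptions made for illustration, and the input is plain text rather than a parsed PDF.

```python
import re
from collections import Counter


def top_keywords(text, limit=100, min_length=3):
    """Return up to `limit` of the most frequent words in `text`, most common first.

    Hypothetical helper for illustration only; not part of ScrapPY.
    """
    # Lowercase so "Admin" and "admin" count as the same keyword.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Skip very short tokens, which are rarely useful wordlist entries.
    counts = Counter(w for w in words if len(w) >= min_length)
    return [word for word, _ in counts.most_common(limit)]


sample = "Admin portal login. The admin password resets via the portal; admin only."
print(top_keywords(sample, limit=3))
```

Lowercasing before counting merges case variants into one entry. A tool aimed at password candidates might instead preserve case, so treat that choice as one possible design rather than the expected behavior.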
Output all keywords to the default ScrapPY.txt file:
$ python3 ScrapPY.py -f example.pdf
Output the top 100 keywords with the highest entropy rating:
$ python3 ScrapPY.py -f example.pdf -m entropy
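One common way to rank keywords by entropy is Shannon entropy over each word's character distribution, which favors strings with many distinct characters (often password-like tokens). The sketch below assumes that definition. Whether ScrapPY computes its entropy rating the same way is not stated here, and the names `shannon_entropy` and `top_entropy_keywords` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(word):
    """Shannon entropy of the character distribution in `word`, in bits per character."""
    if not word:
        return 0.0
    total = len(word)
    # Sum p * log2(1/p) over each distinct character's relative frequency p.
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(word).values()
    )


def top_entropy_keywords(words, limit=100):
    """Return up to `limit` unique words, highest entropy first (ties broken alphabetically)."""
    unique = set(words)
    return sorted(unique, key=lambda w: (-shannon_entropy(w), w))[:limit]


candidates = ["aaaa", "abab", "Tr0ub4dor", "abab"]
print(top_entropy_keywords(candidates, limit=2))
```

A word made of one repeated character scores 0 bits, while a word whose characters are all distinct scores the maximum for its length, so mixed-character tokens rise to the top of the list.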
ScrapPY Output:
Integration with Offensive Security Tools:
Easily integrate with tools such as Dirb to expedite the process of discovering hidden subdirectories:
root@RoseSecurity:~# dirb http://192.168.1.123/ /root/ScrapPY/ScrapPY.txt
Utilize ScrapPY with Hydra for advanced brute force attacks:
Enhance Nmap scripts with ScrapPY wordlists: