pagodo v2.6 releases: Automate Google Hacking Database scraping

pagodo

pagodo (Passive Google Dork) – Automate Google Hacking Database scraping

The goal of this project was to develop a passive Google dork script to collect potentially vulnerable web pages and applications on the Internet. It has two parts: ghdb_scraper.py, which retrieves the Google dorks, and pagodo.py, which leverages the information gathered by ghdb_scraper.py.

What are Google Dorks?

The awesome folks at Offensive Security maintain the Google Hacking Database (GHDB) found here: https://www.exploit-db.com/google-hacking-database. It is a collection of Google searches, called dorks, that can be used to find potentially vulnerable boxes or other juicy info that is picked up by Google’s search bots.

Changelog v2.6

  • Bumped yagooglesearch to version 1.9.0

Installation

git clone https://github.com/opsdisk/pagodo.git
pip install -r requirements.txt

Usage

ghdb_scraper.py

To start off, pagodo.py needs a list of all the current Google dorks. Unfortunately, the entire database cannot be easily downloaded. A couple of older projects did this, but their code was slightly stale and not multi-threaded, so collecting the ~3800 Google dorks would take a long time. ghdb_scraper.py is the resulting Python script.

ghdb_scraper.py Execution Flow

The flow of execution is pretty simple:

  • Fill a queue with Google dork numbers to retrieve based on a range
  • Worker threads retrieve the dork number from the queue, retrieve the page using urllib2, then process the page to extract the Google dork using the BeautifulSoup HTML parsing library
  • Print the results to the screen and optionally save them to a file (to be used by pagodo.py for example)
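
The producer/consumer flow above can be sketched as follows. This is a minimal illustration, not pagodo's actual code: the `fetch_dork` stub stands in for the real urllib2 retrieval and BeautifulSoup parsing step.

```python
import queue
import threading

def fetch_dork(dork_number):
    # Hypothetical stand-in for fetching the GHDB page for this dork
    # number and extracting the dork text with BeautifulSoup.
    return f"dork #{dork_number}"

def scrape_dorks(min_num, max_num, num_threads=3):
    """Fill a queue with dork numbers, then drain it with worker threads."""
    work = queue.Queue()
    for n in range(min_num, max_num + 1):
        work.put(n)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                n = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            dork = fetch_dork(n)
            with lock:
                results.append(dork)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The queue-based design is what makes the multi-threading safe: each worker pulls a distinct dork number, so no page is fetched twice.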

ghdb_scraper.py Switches

The script’s switches are self-explanatory:

-n MINDORKNUM     Minimum Google dork number to start at (Default: 5).
-x MAXDORKNUM     Maximum Google dork number, not the total, to retrieve
                  (Default: 5000). It is currently around 3800. There is no
                  logic in this script to determine when it has reached the
                  end.
-d SAVEDIRECTORY  Directory to save downloaded files (Default: cwd, ".")
-s                Save the Google dorks to a google_dorks_<TIMESTAMP>.txt file
-t NUMTHREADS     Number of search threads (Default: 3)

To run it

python ghdb_scraper.py -n 5 -x 3785 -s -t 3

pagodo.py

Now that a file with the most recent Google dorks exists, it can be fed into pagodo.py using the -g switch to start collecting potentially vulnerable public applications. pagodo.py leverages the yagooglesearch Python library to search Google for sites matching a Google dork, such as:

intitle:"ListMail Login" admin -demo

The -d switch can be used to specify a domain and functions as the Google search operator:

site:example.com
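
Combining a dork with the site: operator amounts to prefixing the query. A hypothetical helper illustrates the idea; pagodo's actual query construction may differ:

```python
def build_query(dork, domain=None):
    """Combine a Google dork with an optional site: restriction.

    Illustrative helper, not pagodo's real code: when a domain is given,
    the query is scoped to that site; otherwise the dork is used as-is.
    """
    return f"site:{domain} {dork}" if domain else dork
```

For example, `build_query('intitle:"ListMail Login" admin -demo', "example.com")` scopes the dork to example.com.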

pagodo.py Switches

The script’s switches are self-explanatory:

-d DOMAIN       Domain to search for Google dork hits.
-g GOOGLEDORKS  File containing Google dorks, 1 per line.
-j JITTER       Jitter factor (multiplied by the delay value) added to
                randomize lookup times. Default: 1.50
-l SEARCHMAX    Maximum results to search (default 100).
-s              Save the HTML links to a pagodo_results__<TIMESTAMP>.txt file.
-e DELAY        Minimum delay (in seconds) between searches...jitter (up to
                [jitter x delay]) is added to this value to randomize
                lookups. If it's too small, Google may block your IP; too
                big, and your search may take a while. Default: 30.0
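
The interplay of -e and -j can be sketched as below. This is an illustration of the described behavior (a random amount up to jitter x delay added on top of the minimum delay), not pagodo's exact formula:

```python
import random

def randomized_delay(delay=30.0, jitter=1.50):
    """Return a sleep time of at least `delay` seconds, plus up to
    (jitter * delay) extra seconds to randomize lookup timing.

    Illustrative only; the precise jitter calculation in pagodo may differ.
    """
    return delay + random.uniform(0, jitter * delay)
```

With the defaults (-e 30.0, -j 1.50), each pause falls somewhere between 30 and 75 seconds, which makes the search pattern harder for Google to fingerprint.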


To run it

python pagodo.py -d example.com -g dorks.txt -l 50 -s -e 35.0 -j 1.1

Copyright (C) opsdisk 

Source: https://github.com/opsdisk/