crawlergo v0.4.4 releases: powerful browser crawler for web vulnerability scanners
crawlergo
crawlergo is a browser crawler that uses Chrome headless mode for URL collection. It hooks key positions across the whole web page during the DOM rendering stage, automatically fills and submits forms, intelligently triggers JS events, and collects as many entry points exposed by the website as possible. The built-in URL de-duplication module filters out a large number of pseudo-static URLs while still maintaining fast parsing and crawling speed on large websites, finally producing a high-quality collection of request results.
crawlergo currently supports the following features:
- chrome browser environment rendering
- Intelligent form filling, automated submission
- Full DOM event collection with automated triggering
- Smart URL de-duplication to remove most duplicate requests
- Intelligent analysis of web pages and collection of URLs, including JavaScript file content, page comments, robots.txt files, and automatic fuzzing of common paths
- Support for Host binding, automatically fixing and adding the Referer header
- Support for proxying browser requests
- Support for pushing results to passive web vulnerability scanners
Parameters
Required parameters
--chromium-path Path, -c Path
The path to the chrome executable. (Required)
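A minimal invocation needs only the Chrome path and a target URL. In the sketch below, the Chrome binary path and the demo target are illustrative placeholders; adjust them for your system:

```
# Minimal crawl: point crawlergo at a Chrome binary and a target URL
./crawlergo -c /usr/bin/google-chrome http://testphp.vulnweb.com/
```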
Basic parameters
--custom-headers Headers
Customize the HTTP headers. Pass the data as a JSON-serialized string; the headers are defined globally and used for all requests. (Default: null)
--post-data PostData, -d PostData
POST data. (Default: null)
--max-crawled-count Number, -m Number
The maximum number of crawl tasks, used to avoid overly long crawls caused by pseudo-static pages. (Default: 200)
--filter-mode Mode, -f Mode
Filtering mode. simple: only static resources and duplicate requests are filtered. smart: additionally filters pseudo-static URLs. strict: applies stricter pseudo-static filtering rules. (Default: smart)
--output-mode value, -o value
Result output mode. console: print prettified results directly to the screen. json: print the JSON-serialized string of all results. none: don't print any output. (Default: console)
--output-json filepath
Write the results to the specified file after JSON serialization. (Default: null)
--request-proxy proxyAddress
socks5 proxy address; all network requests from crawlergo and the chrome browser are sent through the proxy. (Default: null)
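For example, custom headers such as a session cookie are passed as a single JSON string. The cookie value, task limit, and target below are placeholders, not recommended settings:

```
# Crawl with a custom Cookie header and write JSON results to a file
./crawlergo -c /usr/bin/google-chrome \
    --custom-headers '{"Cookie": "session=abc123", "User-Agent": "Mozilla/5.0"}' \
    -m 300 -f smart \
    --output-json result.json \
    http://testphp.vulnweb.com/
```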
Expand input URL
--fuzz-path
Use the built-in dictionary for path fuzzing. (Default: false)
--fuzz-path-dict
Customize the fuzzing dictionary by passing in a file path, e.g. /home/user/fuzz_dir.txt; each line of the file represents a path to be fuzzed. (Default: null)
--robots-path
Resolve paths from the /robots.txt file. (Default: false)
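These options can be combined to widen the initial attack surface. A sketch, with the dictionary path and target as placeholders:

```
# Seed extra entry points from robots.txt and a custom path dictionary
./crawlergo -c /usr/bin/google-chrome \
    --robots-path \
    --fuzz-path-dict /home/user/fuzz_dir.txt \
    http://testphp.vulnweb.com/
```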
Form auto-fill
--ignore-url-keywords, -iuk
URL keywords that you don't want to visit, generally used to exclude logout links when custom cookies are supplied. Usage: -iuk logout -iuk exit. (Default: "logout", "quit", "exit")
--form-values, -fv
Customize form fill values, set by text type. Supported types: default, mail, code, phone, username, password, qq, id_card, url, date and number. The text type is identified from the values of the four attributes id, name, class and type on the input element. For example, to fill mailbox inputs with A and password inputs with B: -fv mail=A -fv password=B. Here default is the fill value used when the text type is not recognized, "Cralwergo" by default. (Default: Cralwergo)
--form-keyword-values, -fkv
Customize form fill values, set by keyword fuzzy match. The keyword is matched against the values of the four attributes id, name, class and type on the input element. For example, to fuzzy-match the pass keyword with 123456 and the user keyword with admin: -fkv user=admin -fkv pass=123456. (Default: Cralwergo)
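Putting these together, a login-aware crawl might combine a session cookie, logout exclusions, and fixed form values. All credential and cookie values below are placeholders:

```
# Keep the session alive while auto-filling forms with known values
./crawlergo -c /usr/bin/google-chrome \
    --custom-headers '{"Cookie": "session=abc123"}' \
    -iuk logout -iuk exit \
    -fv mail=test@example.com -fv password=password123 \
    -fkv user=admin -fkv pass=123456 \
    http://testphp.vulnweb.com/
```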
Advanced settings for the crawling process
--incognito-context, -i
Start the browser in incognito mode. (Default: true)
--max-tab-count Number, -t Number
The maximum number of tabs the crawler can open at the same time. (Default: 8)
--tab-run-timeout Timeout
Maximum runtime for a single tab page. (Default: 20s)
--wait-dom-content-loaded-timeout Timeout
The maximum timeout to wait for the page to finish loading. (Default: 5s)
--event-trigger-interval Interval
The interval between automatically triggered events, generally used when a slow target network or DOM update conflicts cause URLs to be missed. (Default: 100ms)
--event-trigger-mode Value
DOM event auto-trigger mode, either async or sync, for URL misses caused by DOM update conflicts. (Default: async)
--before-exit-delay
Delay before exiting and closing chrome at the end of a single tab task, used to wait for lingering DOM updates and XHR requests to be captured. (Default: 1s)
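For a slow or heavily dynamic target, the timings above can be loosened. The values below are illustrative starting points only, not recommendations from the project:

```
# Slow target: fewer tabs, longer timeouts, synchronous event triggering
./crawlergo -c /usr/bin/google-chrome \
    -t 4 \
    --tab-run-timeout 40s \
    --wait-dom-content-loaded-timeout 10s \
    --event-trigger-mode sync \
    --event-trigger-interval 200ms \
    --before-exit-delay 2s \
    http://testphp.vulnweb.com/
```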
Other
--push-to-proxy
The listener address that will receive the crawler results, usually the listener address of a passive scanner. (Default: null)
--push-pool-max
The maximum concurrency when pushing crawler results to the listener address. (Default: 10)
--log-level
Logging level: debug, info, warn, error or fatal. (Default: info)
--no-headless
Turn off chrome headless mode to visualize the crawling process. (Default: false)
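A common setup chains crawlergo to a passive scanner listening on a local port. The listener address below is a placeholder; use whatever address your scanner exposes:

```
# Push every discovered request into a passive scanner's listener
./crawlergo -c /usr/bin/google-chrome \
    --push-to-proxy http://127.0.0.1:7777/ \
    --push-pool-max 10 \
    http://testphp.vulnweb.com/
```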
Changelog v0.4.4
Install
Copyright (C) 2021 @9ian1i