webpalm
WebPalm is a command-line tool that traverses a website and generates a tree of all its web pages and their links. It recursively follows each link found on a page and continues until all requested levels have been explored. In addition to generating a site map, WebPalm can extract data from the body of each page using regular expressions and save the results to a file. This can be useful for web scraping or for extracting specific information.
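The recursive traversal described above can be sketched roughly as follows. This is an illustrative Python sketch, not WebPalm's actual implementation (WebPalm itself is installed as a Go program). The in-memory SITE dictionary, the build_tree function, and the choice to record already-visited URLs as leaves are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE: Dict[str, List[str]] = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com"],
    "https://example.com/b": [],
}


def build_tree(
    url: str,
    get_links: Callable[[str], List[str]],
    max_level: int,
    seen: Optional[Set[str]] = None,
) -> dict:
    """Return a nested {"url", "children"} tree, descending at most max_level levels."""
    if seen is None:
        seen = set()
    seen.add(url)
    node = {"url": url, "children": []}
    if max_level <= 0:
        return node  # depth limit reached: do not follow links from this page
    for link in get_links(url):
        if link in seen:
            # Assumption: a URL that was already expanded is recorded as a leaf,
            # which keeps link cycles from recursing forever.
            node["children"].append({"url": link, "children": []})
        else:
            node["children"].append(build_tree(link, get_links, max_level - 1, seen))
    return node


tree = build_tree("https://example.com", lambda u: SITE.get(u, []), max_level=2)
```

In this sketch max_level plays the role that the level option plays in the usage examples: a value of 0 returns only the root page, and each extra level follows links one step further.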
Features
- Generate a palm-tree structure of web URLs
- Dump data from page bodies using regular expressions
- Live output mode
- Export the web tree to JSON, XML, or TXT
- Fast and easy to use
- Colorized output and error handling
Changelog v2.0.16
Installation
From source
From binary
You can download the binary from Releases
Via go
go install github.com/Malwarize/webpalm/v2@latest
Usage
Example
Get the palm tree of a website:
webpalm -u https://google.com -l1 --live
Get the palm tree of a website and exclude some status codes:
webpalm -u https://google.com -l1 -x 404,500
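As a rough illustration of what excluding status codes means, the sketch below parses a comma-separated list such as 404,500 and drops matching pages from a result list. It is a Python sketch under stated assumptions: the names parse_codes and drop_excluded and the (url, status) pair layout are hypothetical, and WebPalm's own handling (for example, whether the children of an excluded page are also dropped) may differ.

```python
from typing import List, Set, Tuple


def parse_codes(spec: str) -> Set[int]:
    """Turn a comma-separated string like "404,500" into a set of integers."""
    return {int(code) for code in spec.split(",") if code.strip()}


def drop_excluded(pages: List[Tuple[str, int]], excluded: Set[int]) -> List[Tuple[str, int]]:
    """Keep only the (url, status) pairs whose status is not excluded."""
    return [(url, status) for url, status in pages if status not in excluded]


# Hypothetical crawl results as (url, HTTP status) pairs.
pages = [
    ("https://example.com", 200),
    ("https://example.com/missing", 404),
    ("https://example.com/broken", 500),
    ("https://example.com/about", 200),
]
kept = drop_excluded(pages, parse_codes("404,500"))
```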
Get the palm tree of a website and dump data from the body of the pages:
webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->" -o result.json
This dumps the HTML comments found in the body of each page.
webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->",emails="([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)"
This dumps both the HTML comments and the email addresses found in the body of each page.
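To see what named patterns like the ones above would match, here is a small Python sketch that applies a name=pattern mapping to a page body. It only illustrates the regular expressions themselves; the function dump_matches, the sample body, and the result layout are assumptions, and WebPalm's regex engine and output format may behave differently.

```python
import re
from typing import Dict, List

# The two patterns from the usage examples, written as Python raw strings.
PATTERNS: Dict[str, str] = {
    "comments": r"<!--.*?-->",
    "emails": r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)",
}


def dump_matches(body: str, patterns: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every full match of each named pattern in the page body."""
    return {
        name: [m.group(0) for m in re.finditer(pattern, body)]
        for name, pattern in patterns.items()
    }


# Hypothetical page body used only for this example.
body = "<html><!-- todo: remove --><p>Contact admin@example.com</p></html>"
result = dump_matches(body, PATTERNS)
```

The lazy quantifier in the comments pattern stops at the first closing marker, so two comments on the same page are reported as two separate matches rather than one long one.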
Get the palm tree of a website and export it to XML or TXT:
webpalm -u https://google.com -l3 -o result.xml
webpalm -u https://google.com -l2 -o result.txt
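The examples above suggest that the output format follows the file extension. A minimal Python sketch of that idea is shown below; the tree layout, the element name page, the indentation style, and the function names are all hypothetical and are not WebPalm's actual output schema.

```python
import json
import xml.etree.ElementTree as ET


def to_json(tree: dict) -> str:
    """Serialize the tree as indented JSON."""
    return json.dumps(tree, indent=2)


def to_xml(tree: dict) -> str:
    """Serialize the tree as nested <page url="..."> elements."""
    def build(node: dict) -> ET.Element:
        element = ET.Element("page", url=node["url"])
        for child in node["children"]:
            element.append(build(child))
        return element
    return ET.tostring(build(tree), encoding="unicode")


def to_text(tree: dict, depth: int = 0) -> str:
    """Serialize the tree as plain text, two spaces of indentation per level."""
    lines = ["  " * depth + tree["url"]]
    for child in tree["children"]:
        lines.append(to_text(child, depth + 1))
    return "\n".join(lines)


def render(tree: dict, filename: str) -> str:
    """Pick a serializer from the file extension (an assumed convention)."""
    extension = filename.rsplit(".", 1)[-1].lower()
    writers = {"json": to_json, "xml": to_xml, "txt": to_text}
    if extension not in writers:
        raise ValueError(f"unsupported export format: {extension}")
    return writers[extension](tree)


# Hypothetical two-page tree used only for this example.
sample = {
    "url": "https://example.com",
    "children": [{"url": "https://example.com/a", "children": []}],
}
text_output = render(sample, "result.txt")
```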
Get the palm tree of a website and include only some domains:
webpalm -u https://google.com -l2 -i google.com,facebook.com
This crawls only the URLs that contain google.com or facebook.com.
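Taking the description above literally, the include filter is a substring check on each URL. The Python sketch below shows that reading; the function is_included is hypothetical, and treating an empty list as "include everything" is an assumption rather than documented WebPalm behavior.

```python
from typing import List


def is_included(url: str, include: List[str]) -> bool:
    """Return True if the URL contains any of the listed domains.

    Assumption: an empty include list means no filtering at all.
    """
    return not include or any(domain in url for domain in include)


include = "google.com,facebook.com".split(",")
candidates = [
    "https://google.com/maps",
    "https://www.facebook.com/login",
    "https://example.org/",
]
crawled = [url for url in candidates if is_included(url, include)]
```

Because this is a plain substring match, a URL such as https://notgoogle.com/ would also pass the filter in this sketch.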
Threading and concurrency
Get the palm tree of a website and use only 5 concurrent tasks:
webpalm -u https://google.com -l2 -m 5
📝 Note that live mode works with only 1 thread, so you can't use this option together with live mode
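The idea of capping the number of concurrent tasks can be sketched with a fixed-size worker pool. The Python example below is only an analogy for the limit of 5 shown above: fake_fetch, the URL list, and the peak counter are hypothetical, no network requests are made, and WebPalm's own scheduling is not shown here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_TASKS = 5  # mirrors the limit of 5 concurrent tasks in the example above

_lock = threading.Lock()
_active = 0
peak = 0  # highest number of tasks observed running at the same time


def fake_fetch(url: str) -> str:
    """Stand-in for fetching a page; sleeps briefly instead of using the network."""
    global _active, peak
    with _lock:
        _active += 1
        peak = max(peak, _active)
    time.sleep(0.01)
    with _lock:
        _active -= 1
    return f"fetched {url}"


urls = [f"https://example.com/page{i}" for i in range(20)]

# The pool never runs more than MAX_TASKS calls to fake_fetch at once.
with ThreadPoolExecutor(max_workers=MAX_TASKS) as pool:
    results = list(pool.map(fake_fetch, urls))
```

With a pool size of 1 the pages are processed strictly one after another, which is the situation the note about live mode describes.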
Copyright (C) 2023 MahdiAw
Source: https://github.com/Malwarize/