OnionSearch v1.3 releases: scrapes URLs from different .onion search engines
OnionSearch
OnionSearch is a Python3 script that scrapes URLs from different “.onion” search engines.
Currently supported search engines
- ahmia
- darksearchio
- Onionland
- notevil
- darksearchenginer
- Phobos
- onionsearchserver
- torgle
- onionsearchengine
- tordex
- tor66
- tormax
- haystack
- multivac
- evosearch
- deeplink
Install
pip3 install onionsearch
Usage
Multi-processing behavior
By default, the script runs with the parameter mp_units = cpu_count() - 1. This means that on a machine with 4 cores, it will run 3 scraping functions in parallel. You can force mp_units to any value, but it is recommended to leave it at the default. You may want to set it to 1 to run all requests sequentially (disabling the multi-processing feature).
Please note that continuous writing to the CSV file has not been heavily tested with the multiprocessing feature and therefore may not work as expected.
Please also note that the progress bars may not be displayed properly when mp_units is greater than 1. This does not affect the results, so don’t worry.
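The parallelism described above can be sketched as follows. This is a minimal illustration, not the script’s actual code: scrape_engine and run_search are hypothetical names, and the real scraping logic is replaced by a stub.

```python
from multiprocessing import Pool, cpu_count

# Hypothetical per-engine scraping function; the real script performs
# HTTP requests against the engine and collects (engine, name, url) rows.
def scrape_engine(engine):
    return [(engine, "example result", "http://example.onion")]

def run_search(engines, mp_units=None):
    # Default mirrors the documented behavior: cpu_count() - 1 workers.
    if mp_units is None:
        mp_units = max(cpu_count() - 1, 1)
    if mp_units == 1:
        # Sequential mode: multiprocessing disabled entirely.
        per_engine = [scrape_engine(e) for e in engines]
    else:
        with Pool(mp_units) as pool:
            per_engine = pool.map(scrape_engine, engines)
    # Flatten the per-engine result lists into one list of rows.
    return [row for rows in per_engine for row in rows]
```

With mp_units greater than 1, each engine is handed to a worker process, which is why results (and progress bars) can interleave.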
Examples
To request all the engines for the word “computer”:
onionsearch "computer"
To request all the engines except “Ahmia” and “Candle” for the word “computer”:
onionsearch "computer" --exclude ahmia candle
To request only “Tor66”, “DeepLink” and “Phobos” for the word “computer”:
onionsearch "computer" --engines tor66 deeplink phobos
The same as above, but limiting the number of pages to load per engine to 3:
onionsearch "computer" --engines tor66 deeplink phobos --limit 3
Please kindly note that the list of supported engines (and their keys) is given in the script help (-h).
Output
Default output
By default, the file is written at the end of the process. The file will be CSV-formatted, containing the following columns:
"engine","name of the link","url"
Customizing the output fields
You can customize what will be flushed to the output file by using the parameters --fields and --field_delimiter.
--fields allows you to add, remove, and re-order the output fields. The default mode is shown just above. Instead, you can for instance choose to output:
"engine","name of the link","url","domain"
by setting --fields engine name link domain.
Or even, you can choose to output:
"engine","domain"
by setting --fields engine domain.
These are examples but there are many possibilities.
Finally, you can also choose to modify the CSV delimiter (comma by default), for instance: --field_delimiter ";".
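A row produced under custom --fields and --field_delimiter settings could be sketched as below. This is a hypothetical illustration of the idea, not the script’s implementation; format_row and its input dictionary are invented names:

```python
import csv
import io
from urllib.parse import urlparse

def format_row(result, fields, delimiter=","):
    # Map each supported field name to its value; "domain" is derived
    # from the link, matching the field names listed in the README.
    mapping = {
        "engine": result["engine"],
        "name": result["name"],
        "link": result["link"],
        "domain": urlparse(result["link"]).netloc,
    }
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_ALL).writerow(
        [mapping[f] for f in fields]
    )
    return buf.getvalue().strip()
```

For example, with fields ["engine", "domain"] and delimiter ";" a result from Tor66 would come out as a two-column, semicolon-separated line.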
Changing filename
The filename will be set by default to output_$DATE_$SEARCH.txt, where $DATE represents the current datetime and $SEARCH the first characters of the search string.
You can modify this filename by using the --output parameter when running the script.
(Note that it might be necessary to escape the dollar character.)
In the CSV file produced, the name and url strings are sanitized as much as possible, but there might still be some problems.
Write progressively
You can choose to write to the output progressively (instead of writing everything at the end, which risks losing the results if something goes wrong). To do so, use --continuous_write True:
onionsearch "computer" --continuous_write True
You can then use the tail -f (follow) Unix command to watch the results of the scraping in real time.
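The key to making tail -f useful is that each row must reach the file as soon as it is scraped. A minimal sketch of that idea (write_row_continuously is a hypothetical helper, not part of OnionSearch):

```python
import csv
import os

def write_row_continuously(path, row):
    # Append one CSV row and flush it immediately so that a concurrent
    # `tail -f` on the file sees the result as soon as it is written.
    with open(path, "a", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerow(row)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # force the OS to write it to disk
```

Opening in append mode per row also means a crash mid-run loses at most the row being written, which is the point of --continuous_write.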
Copyright (C) 2020 megadose
Source: https://github.com/megadose/