detectem v0.7.3 releases: detect software and its version on websites
detectem is a specialized software detector.
detectem uses Splash to render the URL and gets the list of requests/responses (as a list of HAR entries) that the browser sent and received to render the page completely.
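The rendering step described above can be approximated with Splash's HTTP API. This is a minimal sketch, not detectem's own code: it assumes a Splash instance is already listening on `http://localhost:8050`, and the helper names `build_render_har_url`, `fetch_har` and `list_exchanges` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash container is running locally on its default port.
SPLASH_URL = "http://localhost:8050"


def build_render_har_url(target_url, wait=2.0, splash_url=SPLASH_URL):
    """Build the URL of Splash's render.har endpoint for target_url."""
    query = urlencode({"url": target_url, "wait": wait})
    return "{}/render.har?{}".format(splash_url, query)


def fetch_har(target_url, wait=2.0, splash_url=SPLASH_URL):
    """Ask Splash to render the page and return the HAR document as a dict."""
    with urlopen(build_render_har_url(target_url, wait, splash_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_exchanges(har):
    """Return (request URL, response status) for every HAR entry."""
    return [
        (entry["request"]["url"], entry["response"]["status"])
        for entry in har["log"]["entries"]
    ]


if __name__ == "__main__":
    # Requires a reachable Splash instance; domain.tld is a placeholder.
    for url, status in list_exchanges(fetch_har("http://domain.tld")):
        print(status, url)
```

Each HAR entry pairs one request with its response, so a detector can inspect URLs, headers and status codes without talking to the browser again.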
Using a series of indicators, it can detect the software running on a site and accurately extract its version information. Once Splash has rendered the website, the detection routine analyzes the requests, the responses, and even the DOM.
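To illustrate what an indicator could look like, here is a hedged sketch that extracts a version from request URLs with a regular expression. It is not detectem's plugin API: the pattern `JQUERY_URL_INDICATOR` and the function `detect_version` are hypothetical names chosen for this example.

```python
import re

# Hypothetical indicator: a regex whose named group captures the version
# from a script URL such as ".../jquery-1.12.4.min.js".
JQUERY_URL_INDICATOR = re.compile(
    r"/jquery-(?P<version>\d+(?:\.\d+)+)(?:\.min)?\.js"
)


def detect_version(request_urls, indicator=JQUERY_URL_INDICATOR):
    """Return the first version captured by the indicator, or None."""
    for url in request_urls:
        match = indicator.search(url)
        if match:
            return match.group("version")
    return None


if __name__ == "__main__":
    urls = [
        "http://domain.tld/",
        "http://domain.tld/static/jquery-1.12.4.min.js",
    ]
    print(detect_version(urls))
```

The same idea extends to response headers, response bodies, or values read from the DOM: each indicator is a pattern plus the place where it should be looked for.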
Features:
- Detect software in modern web technologies.
- Browser support provided by Splash.
- Analysis of requests made and responses received by the browser.
- Get software information from the DOM.
- Great performance (less than 10 seconds to get a fingerprint).
- Plugin system to add new software easily.
- Test suite to ensure plugin result integrity.
- Continuous development to support new features.
Changelog:
- Support for input file (1st approach)
- Remove tests from deliverables
- Handling of JS errors
- Install Docker and add your user to the docker group so that you don't need to use sudo.
- Pull the image:
$ docker pull scrapinghub/splash
- Create a virtual environment with Python >= 3.5 and install detectem:
$ pip install detectem
- Run it against some URL:
$ det http://domain.tld
Copyright (c) 2016 Claudio Salazar