diskover v1.5.0-rc29 release: File system crawler, storage search engine and analytics

diskover is an open source file system crawler and disk usage tool that uses Elasticsearch to index and manage data across heterogeneous storage systems. With diskover, users can search and organize files more effectively, and system administrators can manage storage infrastructure, provision storage efficiently, monitor and report on storage use, and make informed decisions about new infrastructure purchases.

As the amount of file data generated by businesses continues to expand, the stress on expensive storage infrastructure, users, system administrators, and IT budgets continues to grow.

With diskover, users can identify old and unused files and gain better insight into data change, file duplication and wasted space.

It is written and maintained by Chris Park (shirosai) and runs on Linux and OS X/macOS using Python 2/3.

diskover diagram




Added

  • faster finddupes
  • worker bot warning output in finddupes for any io/os exceptions
  • restoretimes config setting to the dupescheck section in diskover.cfg.sample (copy it to your config); setting it to True will try to restore atime and mtime for any files that get opened by the byte check and md5 (useful for CIFS, which does not work with the noatime mount option)
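The idea behind restoretimes can be illustrated with a minimal sketch. This is not diskover's actual code: the function name md5_preserving_times and its parameters are hypothetical, and the sketch only assumes that the timestamps are recorded before the file is opened and written back afterwards with os.utime (Python 3.3+ for the ns= form).

```python
import hashlib
import os
import tempfile


def md5_preserving_times(path, restore_times=True, chunk_size=65536):
    """Return the md5 hex digest of path, optionally restoring atime/mtime.

    Hypothetical helper for illustration; not part of diskover.
    """
    before = os.stat(path)  # remember the timestamps before opening the file
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    if restore_times:
        # Write the original access and modification times back (ns precision).
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return digest.hexdigest()
```

Calling md5_preserving_times("/some/file") hashes the file and leaves its atime and mtime as they were before the read, which is the behaviour the setting aims for on mounts where noatime is not honoured.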


Changed

  • finddupes now uses the threads setting in the dupescheck section of diskover.cfg; copy it from diskover.cfg.sample and adjust it for your environment (threads was previously 4, the default is now 8)
  • requirements.txt to support newer versions of the rq and redis Python modules
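As a rough sketch of how a threads setting like this can be read and applied, the following assumes an INI-style dupescheck section with threads and restoretimes keys. The exact layout of diskover.cfg.sample, the restoretimes default of False, and the helper names load_dupescheck and run_checks are assumptions made for illustration only.

```python
import configparser
from concurrent.futures import ThreadPoolExecutor

# Assumed layout; check diskover.cfg.sample for the real section contents.
SAMPLE_CFG = """
[dupescheck]
threads = 8
restoretimes = False
"""


def load_dupescheck(cfg_text):
    """Parse the dupescheck section, falling back to 8 threads if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    threads = parser.getint("dupescheck", "threads", fallback=8)
    restore = parser.getboolean("dupescheck", "restoretimes", fallback=False)
    return threads, restore


def run_checks(paths, check, threads):
    """Apply check to every path using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(check, paths))
```

For slow network mounts, a higher threads value lets more files be read in parallel; for local disks a lower value may be enough.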


Fixed

  • bots disappearing from redis rq (rqinfo and rq-dashboard); upgrade to the redis 3.0.1 and rq 0.13.0 Python modules using pip
  • diskover socket server traceback “TypeError: can’t concat JSONDecodeError to bytes” when non-JSON data is sent to the socket server
  • export.json Kibana export missing some visualizations
  • “UnicodeEncodeError: ‘ascii’ codec can’t encode character” when running diskover-gource.sh using Python 2
  • traceback errors when running hotdirs or copytags
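The socket server error in the list above is the kind of TypeError Python raises when an exception object is concatenated directly onto a bytes value. The sketch below shows one way to avoid that pattern; handle_request and the reply format are hypothetical and do not reflect diskover's actual socket protocol.

```python
import json


def handle_request(raw):
    """Decode one request; answer non-JSON input with an error reply."""
    try:
        command = json.loads(raw.decode("utf-8"))
    except ValueError as err:
        # json.JSONDecodeError (Python 3) subclasses ValueError, and Python 2
        # raises plain ValueError, so this clause covers both.
        # b"error: " + err would raise TypeError; encode the message instead.
        return b"error: " + str(err).encode("utf-8")
    return json.dumps({"received": command}).encode("utf-8")
```

With this shape, sending arbitrary text to the server produces an error reply rather than a traceback in the server process.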


Optional Installs

  • diskover-web (diskover’s web file manager and analytics app)
  • Redis RQ Dashboard (for monitoring redis queue)
  • sharesniffer (for scanning your network for file shares and auto-mounting for crawls)
  • Kibana (for visualizing Elasticsearch data, tested on Kibana 5.4.2, 5.6.4)
  • X-Pack (Kibana plugin for graphs, reports, monitoring and http auth)
  • Gource (for Gource visualizations of diskover Elasticsearch data)


git clone https://github.com/shirosaidev/diskover.git
cd diskover
pip install -r requirements.txt
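After installing, it can be useful to confirm that the redis and rq modules are at least the versions named in this release (redis 3.0.1 and rq 0.13.0). The sketch below treats those versions as minimums, which is an assumption; the helper names are hypothetical, and importlib.metadata requires Python 3.8 or newer.

```python
import re
from importlib import metadata

# Versions named in this release, treated here as minimums (an assumption).
MINIMUMS = {"redis": (3, 0, 1), "rq": (0, 13, 0)}


def as_tuple(version_text):
    """Turn a version string such as '3.0.1' into a comparable tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", version_text)[:3])


def check(package, minimum):
    """Return 'ok', 'too old' or 'missing' for an installed package."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "missing"
    return "ok" if as_tuple(installed) >= minimum else "too old"


if __name__ == "__main__":
    for name, minimum in MINIMUMS.items():
        print(name, check(name, minimum))
```

If either module is reported as too old, upgrading it with pip should bring it in line with requirements.txt.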


Copyright 2017-2018 Chris Park

Source: https://github.com/shirosaidev/