GitPrey: Searching sensitive files and contents in GitHub
A sensitive information scanning tool for GitHub
Features and design
GitPrey is a tool for searching GitHub for sensitive information or data by company name or other keywords. Its design follows how sensitive data leaks are found on GitHub:
- Search code in file contents and paths for the keyword to collect all related projects;
- Search code in every related project to find files or content matching PATTERN_DB;
- Output all matching file information, project information and user information.
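The three steps above can be sketched in Python as follows. This is a minimal, offline illustration, not GitPrey's code: the `Hit` record and the functions `find_projects`, `scan_project` and `report` are hypothetical, and an in-memory list of hits stands in for GitHub code search results.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical code-search hit: owner, repository, file path and content."""
    owner: str
    repo: str
    path: str
    content: str


def _contains(hit, needle):
    """Case-insensitive check of a hit's path and content."""
    needle = needle.lower()
    return needle in hit.path.lower() or needle in hit.content.lower()


def find_projects(hits, keyword):
    """Step 1: collect every (owner, repo) whose files mention the keyword."""
    return sorted({(h.owner, h.repo) for h in hits if _contains(h, keyword)})


def scan_project(hits, project, pattern_db):
    """Step 2: within one project, keep files matching any PATTERN_DB entry."""
    matches = []
    for h in hits:
        if (h.owner, h.repo) != project:
            continue
        for pattern in pattern_db:
            if _contains(h, pattern):
                matches.append((h.path, pattern))
                break  # report each file once, with the first pattern it matched
    return matches


def report(hits, keyword, pattern_db):
    """Step 3: one output line per match, with user, project and file information."""
    lines = []
    for owner, repo in find_projects(hits, keyword):
        for path, pattern in scan_project(hits, (owner, repo), pattern_db):
            lines.append(f"user={owner} project={repo} file={path} pattern={pattern}")
    return lines
```

Here the repository owner plays the role of the "user information" in step 3.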
Note that GitPrey may miss some files or report mistaken ones, because of the following GitHub code search limits:
- Only the default branch is indexed by GitHub. In most cases, this is the master branch.
- Only files smaller than 384 KB are searchable on GitHub.
- GitHub returns at most 1,000 results for each search.
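A small helper makes these limits concrete. It is a hypothetical sketch: `could_appear_in_search` and `max_results` are illustrative names, and reading 384 KB as 384 * 1024 bytes is an assumption.

```python
# 384 KB limit quoted above; treating 1 KB as 1024 bytes is an assumption.
MAX_INDEXED_BYTES = 384 * 1024


def could_appear_in_search(branch, default_branch, size_bytes):
    """True if a file could be returned by GitHub code search under the limits
    listed above: only the default branch, only files smaller than 384 KB."""
    return branch == default_branch and size_bytes < MAX_INDEXED_BYTES


def max_results(total_matches, cap=1000):
    """GitHub returns at most 1,000 results per search, however many files match."""
    return min(total_matches, cap)
```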
GitPrey also provides a search level, from Level 1 to Level 5, to adjust the scanning depth:
- Level 1: Search only 10 pages of recently indexed code results.
- Level 2: Search only 20 pages of recently indexed code results.
- Level 3: Search only 50 pages of recently indexed code results.
- Level 4: Search only 70 pages of recently indexed code results.
- Level 5: Search only 100 pages of recently indexed code results.
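The level-to-pages mapping above can be expressed as a lookup table. `LEVEL_PAGES`, `pages_for_level` and `page_numbers` are hypothetical names, and numbering result pages from 1 is an assumption.

```python
# Pages of recently indexed code results scanned at each level, per the list above.
LEVEL_PAGES = {1: 10, 2: 20, 3: 50, 4: 70, 5: 100}


def pages_for_level(level):
    """Return how many result pages a search level scans; reject levels outside 1-5."""
    if level not in LEVEL_PAGES:
        raise ValueError(f"search level must be between 1 and 5, got {level!r}")
    return LEVEL_PAGES[level]


def page_numbers(level):
    """Page numbers to request, starting from page 1 (1-based paging is an assumption)."""
    return list(range(1, pages_for_level(level) + 1))
```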
You can modify the level in Config.py. To search as fast as possible, you must configure your own GitHub account username and password to avoid HTTP 429 (Too Many Requests) errors.
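Authenticating is the remedy recommended here. A common complementary technique, which is not taken from GitPrey, is to retry with exponential backoff when a 429 response arrives. In this sketch `fetch` is a hypothetical callable returning `(status_code, body)`, and the delays are illustrative values.

```python
import time


def fetch_with_backoff(fetch, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 status, doubling the wait each time.

    fetch is a hypothetical callable returning (status_code, body); sleep is
    injectable so the retry logic can be tested without actually waiting.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        sleep(delay)
        delay *= 2
    raise RuntimeError(f"still rate limited (HTTP 429) after {max_retries} retries")
```

Injecting `sleep` keeps the retry policy testable; in real use the default `time.sleep` applies the waits.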
Technical details
Some notes on the technical details:
- The GitHub API is not used for searching code, because its rate limit allows at most 30 requests per minute, even when you authenticate with an access token.
- Only the user information crawler uses the GitHub API, which is fast enough for that purpose. You have to configure FILE_DB/INFO_DB/PASS_DB/PATH_DB in config.py:
- PATH_DB is used to search for specific files in related projects when looking for leaked files.
- FILE_DB and PASS_DB are used to search for sensitive content in related projects when looking for leaked content, while INFO_DB and PASS_DB are used to output the matching code lines.
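The roles of the pattern databases can be sketched as below. This is one hypothetical reading of the notes above, not GitPrey's implementation: a path containing a PATH_DB entry is reported as a leaked file; a file whose name matches FILE_DB is treated as leaking content when it contains a PASS_DB keyword, and then every line containing an INFO_DB or PASS_DB keyword is output. All function names are illustrative, and case-insensitive keyword matching is an assumption.

```python
import os


def leaked_paths(paths, path_db):
    """File leaks: paths containing a PATH_DB entry such as 'htpasswd'."""
    return [p for p in paths if any(entry in p for entry in path_db)]


def is_candidate_file(path, file_db):
    """Content leaks, step 1: the file name ends with a FILE_DB entry such as '.env'."""
    name = os.path.basename(path)
    return any(name.endswith(entry) for entry in file_db)


def keyword_lines(content, keywords):
    """(line_number, line) pairs whose text contains any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [(n, line.strip())
            for n, line in enumerate(content.splitlines(), start=1)
            if any(k in line.lower() for k in lowered)]


def scan_file(path, content, file_db, pass_db, info_db):
    """Content leaks, step 2: if a candidate file contains a PASS_DB keyword,
    output every line containing an INFO_DB or PASS_DB keyword."""
    if not is_candidate_file(path, file_db) or not keyword_lines(content, pass_db):
        return []
    return keyword_lines(content, list(info_db) + list(pass_db))
```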
Clone the repository to install GitPrey:
git clone https://github.com/repoog/GitPrey.git
From v2.2, GitPrey removed the ACCESS_TOKEN, SEARCH_LEVEL and KEYWORDS configuration options.
USAGE:
-l  Set the search level (1-5) for searching projects; the default level is 1.
-k  Set the keywords for searching projects.
-h  Show help information.
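The documented options map naturally onto Python's standard `argparse` module. This parser is a hypothetical reconstruction from the usage text above, not GitPrey's own code; making `-k` required is an assumption, while `-h` is added automatically by `argparse`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented -l/-k options (argparse adds -h)."""
    parser = argparse.ArgumentParser(
        description="Search GitHub for sensitive files and contents.")
    parser.add_argument("-l", dest="level", type=int, choices=range(1, 6), default=1,
                        help="search level for searching projects, 1-5 (default: 1)")
    # Assumption: a keyword is mandatory, since the search cannot run without one.
    parser.add_argument("-k", dest="keywords", required=True,
                        help="keywords for searching projects")
    return parser
```

For example, `build_parser().parse_args(["-l", "3", "-k", "acme"])` yields `level=3` and `keywords="acme"`, while `-l 6` is rejected with a usage error.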
Pattern files
The pattern directory holds the db files:
- path.db: sensitive file paths, such as htpasswd
- file.db: names of files that may contain sensitive content, such as .env
- info.db: sensitive content keywords to search for, such as password
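Loading these files could look like the sketch below. It assumes one pattern per line with blank lines ignored, which is a guess at the file format; `load_db` and `load_pattern_dir` are hypothetical helper names.

```python
from pathlib import Path


def load_db(path):
    """Read one pattern per line, stripping whitespace and skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_pattern_dir(directory="pattern"):
    """Load path.db, file.db and info.db from the pattern directory into a dict."""
    base = Path(directory)
    return {name: load_db(base / f"{name}.db") for name in ("path", "file", "info")}
```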
Copyright (C) 2017 repoog