noseyparker v0.17 released: finds secrets and sensitive information in textual data and Git history
Nosey Parker: Find secrets in textual data
Nosey Parker is a command-line tool that finds secrets and sensitive information in textual data. It is useful for both offensive and defensive security testing.
Key features:
- It supports scanning files, directories, and the entire history of Git repositories
- It uses regular expression matching with a set of 60 patterns chosen for high signal-to-noise based on experience and feedback from offensive security engagements
- It groups matches together that share the same secret, further emphasizing signal over noise
- It is fast: it can scan at hundreds of megabytes per second on a single core and is able to scan 100GB of Linux kernel source history in less than 5 minutes on an older MacBook Pro
This open-source version of Nosey Parker is a reimplementation of part of the internal version in use at Praetorian, which has additional machine-learning capabilities. Read more in blog posts here and here.
Usage quick start
The datastore
Most Nosey Parker commands use a datastore. This is a special directory that Nosey Parker uses to record its findings and maintain its internal state. A datastore will be implicitly created by the scan command if needed. You can also create a datastore explicitly using the datastore init -d PATH command.
Scanning filesystem content for secrets
Nosey Parker has built-in support for scanning files, recursively scanning directories, and scanning the entire history of Git repositories.
For example, if you have a Git clone of CPython locally at cpython.git, you can scan its entire history with the scan command. Nosey Parker will create a new datastore at np.cpython and save its findings there.
You can specify multiple inputs to scan at once in any combination of the supported input types (files, directories, and Git repos).
Summarizing findings
Nosey Parker prints out a summary of its findings when it finishes scanning. You can also run this step separately with the summarize command.
Reporting detailed findings
To see details of Nosey Parker’s findings, use the report command. This prints out a text-based report designed for human consumption.
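The quick-start steps above (scan, summarize, report) can be sketched as a small Python helper that only builds the command lines. This is an illustration, not part of Nosey Parker: the helper name quick_start_commands is hypothetical, and the --datastore spelling is assumed to be the long form of the -d option mentioned earlier, so check noseyparker --help before relying on it.

```python
import shlex


def quick_start_commands(datastore, inputs):
    """Return the scan, summarize, and report invocations as argv lists.

    Assumption: every subcommand accepts --datastore (the long form of -d).
    """
    if not inputs:
        raise ValueError("at least one input path is required")
    return [
        # Any mix of files, directories, and Git repos can be passed as inputs.
        ["noseyparker", "scan", "--datastore", datastore, *inputs],
        ["noseyparker", "summarize", "--datastore", datastore],
        ["noseyparker", "report", "--datastore", datastore],
    ]


commands = quick_start_commands("np.cpython", ["cpython.git"])
for argv in commands:
    # shlex.join renders an argv list as a copy-pasteable shell command.
    print(shlex.join(argv))
```

Each argv list can be passed to subprocess.run on a machine where noseyparker is installed. The sketch deliberately does not execute anything itself.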
Changelog v0.17
Changes
- The minimum supported Rust version has been changed from 1.70 to 1.76.
- The data model and datastore have been significantly overhauled:
  - The rules used during scanning are now explicitly recorded in the datastore. Each rule is additionally accompanied by a content-based identifier that uniquely identifies the rule based on its pattern.
  - Each match is now associated with the rule that produced it, rather than just the rule’s name (which can change as rules are modified).
  - Each match is now assigned a unique content-based identifier.
  - Findings (i.e., groups of matches with the same capture groups, produced by the same rule) are now represented explicitly in the datastore. Each finding is assigned a unique content-based identifier.
  - Now, each time a rule matches, a single match object is produced. Each match in the datastore is now associated with an array of capture groups. Previously, a rule whose pattern had multiple capture groups would produce one match object for each group, with each one being associated with a single capture group.
  - Provenance metadata for blobs is recorded in a much simpler way than before. The new representation explicitly records file and git-based provenance, but also adds explicit support for extensible provenance. This change will make it possible in the future to have Nosey Parker scan and usefully report blobs produced by custom input data enumerators (e.g., a Python script that lists files from the Common Crawl WARC files).
  - Scores are now associated with matches instead of findings.
  - Comments can now be associated with both matches and findings, instead of just findings.
- The JSON and JSONL report formats have changed. These will stabilize in a future release (#101).
  - The matching_input field for matches has been removed and replaced with a new groups field, which contains an array of base64-encoded bytestrings.
  - Each match now includes additional rule_text_id, rule_structural_id, and structural_id fields.
  - The provenance field of each match is now slightly different.
- Schema migration of older Nosey Parker datastores is no longer performed. Previously, this would automatically and silently be done when opening a datastore from an older version. Explicit support for datastore migration may be added back in a future release.
- The shell-completions command has been moved from the top level to a subcommand of generate.
- More…
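The changelog's description of matches and findings can be illustrated with a short Python sketch. It decodes a base64-encoded groups array and collects matches that share a rule and the same capture groups, which is how the changelog defines a finding. This is not Nosey Parker's own implementation. Only the field names groups, rule_structural_id, and structural_id come from the changelog; the record layout, the identifier values, and the helper names are hypothetical, and real report records carry more fields.

```python
import base64
from collections import defaultdict


def b64(raw: bytes) -> str:
    """Encode bytes as base64 text, as the changelog describes for groups."""
    return base64.b64encode(raw).decode("ascii")


# Hypothetical, heavily trimmed match records. Every value is made up;
# real identifiers are content-based rather than short labels like these.
matches = [
    {"structural_id": "m1", "rule_structural_id": "r1", "groups": [b64(b"hunter2")]},
    {"structural_id": "m2", "rule_structural_id": "r1", "groups": [b64(b"hunter2")]},
    {"structural_id": "m3", "rule_structural_id": "r2", "groups": [b64(b"user"), b64(b"pass")]},
]


def decode_groups(match):
    """Return the capture groups of one match as raw bytes."""
    return [base64.b64decode(g) for g in match["groups"]]


def group_into_findings(records):
    """Group matches produced by the same rule with the same capture groups."""
    findings = defaultdict(list)
    for record in records:
        key = (record["rule_structural_id"], tuple(decode_groups(record)))
        findings[key].append(record["structural_id"])
    return dict(findings)


findings = group_into_findings(matches)
for (rule_id, groups), match_ids in findings.items():
    print(rule_id, groups, match_ids)
```

Here m1 and m2 collapse into one finding because they share both the rule and the captured bytes, while m3 stays separate. A single match with two capture groups remains one record, matching the new one-match-per-rule-hit behavior.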
Install
Copyright (C) 2022 praetorian-inc