noseyparker v0.17 releases: finds secrets and sensitive information in textual data and Git history

by do son · Published February 21, 2023 · Updated March 9, 2024

Nosey Parker: Find secrets in textual data

Nosey Parker is a command-line tool that finds secrets and sensitive information in textual data. It is useful for both offensive and defensive security testing.

Key features:

- It supports scanning files, directories, and the entire history of Git repositories.
- It uses regular expression matching with a set of 60 patterns chosen for high signal-to-noise, based on experience and feedback from offensive security engagements.
- It groups together matches that share the same secret, further emphasizing signal over noise.
- It is fast: it can scan at hundreds of megabytes per second on a single core, and it can scan 100 GB of Linux kernel source history in under 5 minutes on an older MacBook Pro.
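The first three features, matching a set of regular expressions and then grouping matches that share the same secret, can be illustrated with a small Python sketch. The two rule names and patterns below are hypothetical stand-ins rather than Nosey Parker's real rules, and Python's `re` module is used only for simplicity; it would be far slower than a production scanner.

```python
import re
from collections import defaultdict

# Hypothetical rules (not Nosey Parker's real ones): name -> compiled pattern
# whose capture group isolates the secret itself.
RULES = {
    "Example API Key": re.compile(rb"api_key=([A-Za-z0-9]{32})"),
    "Example Password": re.compile(rb'password\s*=\s*"([^"]{8,})"'),
}

def scan(blob: bytes):
    """Yield (rule_name, capture_groups, byte_span) for every rule match in blob."""
    for name, pattern in RULES.items():
        for m in pattern.finditer(blob):
            yield name, m.groups(), m.span()

def group_matches(blobs: list[bytes]) -> dict:
    """Group matches that share the same rule and the same captured secret."""
    groups = defaultdict(list)
    for index, blob in enumerate(blobs):
        for name, captured, span in scan(blob):
            groups[(name, captured)].append((index, span))
    return dict(groups)

blobs = [
    b"travis?api_key=" + b"A" * 32 + b"&stream=ci",
    b"api_key=" + b"A" * 32,  # the same key again, in another blob
    b'password = "hunter22!"',
]
for (name, captured), occurrences in group_matches(blobs).items():
    print(name, captured[0], f"{len(occurrences)} occurrence(s)")
```

Keying each group on the rule plus the captured bytes, rather than on where the match was found, is what lets many copies of one key collapse into a single group.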

This open-source version of Nosey Parker is a reimplementation of part of the internal version in use at Praetorian, which has additional machine-learning capabilities. Read more in two related blog posts.

Usage quick start

The datastore

Most Nosey Parker commands use a datastore. This is a special directory that Nosey Parker uses to record its findings and maintain its internal state. A datastore will be implicitly created by the scan command if needed. You can also create a datastore explicitly using the datastore init -d PATH command.

Scanning filesystem content for secrets

Nosey Parker has built-in support for scanning files, recursively scanning directories, and scanning the entire history of Git repositories.

For example, if you have a Git clone of CPython locally at cpython.git, you can scan its entire history with the scan command. Nosey Parker will create a new datastore at np.cpython and save its findings there.

$ noseyparker scan --datastore np.cpython cpython.git
Found 28.30 GiB from 18 plain files and 427,712 blobs from 1 Git repos [00:00:04]
Scanning content  ████████████████████ 100%  28.30 GiB/28.30 GiB  [00:00:53]
Scanned 28.30 GiB from 427,730 blobs in 54 seconds (538.46 MiB/s); 4,904/4,904 new matches

 Rule                      Distinct Groups   Total Matches
───────────────────────────────────────────────────────────
 PEM-Encoded Private Key             1,076           1,192
 Generic Secret                        331             478
 netrc Credentials                      42           3,201
 Generic API Key                         2              31
 md5crypt Hash                           1               2

Run the `report` command next to show finding details.

You can specify multiple inputs to scan at once in any combination of the supported input types (files, directories, and Git repos).
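The scan output above counts the 18 plain files and the 427,712 Git blobs together as 427,730 blobs, and the report identifies each occurrence by a 40-character hex blob ID. For Git content, that ID is the SHA-1 of a `blob <size>` header, a NUL byte, and the raw bytes. The Python sketch below computes it with `hashlib`; using it as a key to skip identical content that shows up in several inputs is an assumption made for illustration, not a description of how Nosey Parker is implemented.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute Git's blob ID: SHA-1 over b'blob <size>', a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

def unique_blobs(contents: list[bytes]) -> dict[str, bytes]:
    """Keep one copy of each distinct content, keyed by blob ID (illustrative only)."""
    unique: dict[str, bytes] = {}
    for content in contents:
        unique.setdefault(git_blob_id(content), content)
    return unique

# The same text reached via a plain file and via Git history hashes identically.
inputs = [b"test content\n", b"other\n", b"test content\n"]
for blob_id in unique_blobs(inputs):
    print(blob_id)
```

For any content, the result should agree with `git hash-object`, so the sketch can be checked against a local repository.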

Summarizing findings

Nosey Parker prints out a summary of its findings when it finishes scanning. You can also run this step separately:

$ noseyparker summarize --datastore np.cpython

 Rule                      Distinct Groups   Total Matches
───────────────────────────────────────────────────────────
 PEM-Encoded Private Key             1,076           1,192
 Generic Secret                        331             478
 netrc Credentials                      42           3,201
 Generic API Key                         2              31
 md5crypt Hash                           1               2
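The two columns in the summary count different things: Total Matches counts every individual match for a rule, while Distinct Groups counts the unique secrets among them, which is why the 31 Generic API Key matches in the CPython history reduce to just 2 groups. A minimal Python sketch of that tally, over a hypothetical list of (rule, secret) pairs:

```python
from collections import Counter, defaultdict

def summarize(matches):
    """Return {rule: (distinct_groups, total_matches)} from (rule, secret) pairs."""
    totals = Counter(rule for rule, _ in matches)
    distinct = defaultdict(set)
    for rule, secret in matches:
        distinct[rule].add(secret)
    return {rule: (len(distinct[rule]), totals[rule]) for rule in totals}

# Hypothetical matches: one API key seen three times, a second key once.
matches = [
    ("Generic API Key", b"key-one"),
    ("Generic API Key", b"key-one"),
    ("Generic API Key", b"key-one"),
    ("Generic API Key", b"key-two"),
    ("md5crypt Hash", b"$1$salt$hash"),
]
for rule, (groups, total) in sorted(summarize(matches).items()):
    print(f"{rule:<20} {groups:>6} {total:>6}")
```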

Reporting detailed findings

To see details of Nosey Parker’s findings, use the report command. This prints out a text-based report designed for human consumption:

$ noseyparker report --datastore np.cpython
Finding 1/1452: Generic API Key
Match: QTP4LAknlFml0NuPAbCdtvH4KQaokiQE
Showing 3/29 occurrences:

    Occurrence 1:
    Git repo: clones/cpython.git
    Blob: 04144ceb957f550327637878dd99bb4734282d07
    Lines: 70:61-70:100

        e buildbottest

        notifications:
          email: false
          webhooks:
            urls:
              - https://python.zulipchat.com/api/v1/external/travis?api_key=QTP4LAknlFml0NuPAbCdtvH4KQaokiQE&stream=core%2Ftest+runs
            on_success: change
            on_failure: always
          irc:
            channels:
              # This is set to a secure vari

    Occurrence 2:
    Git repo: clones/cpython.git
    Blob: 0e24bae141ae2b48b23ef479a5398089847200b3
    Lines: 174:61-174:100

        j4 -uall,-cpu"

        notifications:
          email: false
          webhooks:
            urls:
              - https://python.zulipchat.com/api/v1/external/travis?api_key=QTP4LAknlFml0NuPAbCdtvH4KQaokiQE&stream=core%2Ftest+runs
            on_success: change
            on_failure: always
          irc:
            channels:
              # This is set to a secure vari
...
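Each occurrence is located by a `line:column` range. In the first occurrence above, `70:61-70:100` spans 40 columns, exactly the 8-character `api_key=` prefix plus the 32-character key, which is consistent with 1-based columns and an inclusive end. The snippets also appear to be cut at a fixed width (note the truncated "e buildbottest" and "secure vari"). The Python sketch below converts a match's byte span into that form and extracts surrounding context under those assumptions; the `radius` default is a hypothetical value, and columns here are counted in bytes, which may differ from Nosey Parker's handling of non-ASCII text.

```python
def line_col(blob: bytes, offset: int) -> tuple[int, int]:
    """Convert a 0-based byte offset into a 1-based (line, column) pair."""
    line = blob.count(b"\n", 0, offset) + 1
    line_start = blob.rfind(b"\n", 0, offset) + 1  # 0 on the first line
    return line, offset - line_start + 1

def format_span(blob: bytes, start: int, end: int) -> str:
    """Render a half-open byte span [start, end) as 'line:col-line:col', end inclusive."""
    l1, c1 = line_col(blob, start)
    l2, c2 = line_col(blob, end - 1)
    return f"{l1}:{c1}-{l2}:{c2}"

def snippet(blob: bytes, start: int, end: int, radius: int = 256) -> bytes:
    """Return the match plus up to `radius` bytes of context on each side."""
    return blob[max(0, start - radius):end + radius]

# A blob whose third line puts the match at the same columns as the report.
line3 = b" " * 60 + b"api_key=" + b"K" * 32 + b"&stream=ci"
blob = b"first\nsecond\n" + line3 + b"\n"
start = blob.index(b"api_key=")
print(format_span(blob, start, start + 40))  # prints 3:61-3:100
```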

Changelog v0.17

Changes

The minimum supported Rust version has been changed from 1.70 to 1.76.
The data model and datastore have been significantly overhauled:
- The rules used during scanning are now explicitly recorded in the datastore. Each rule is additionally accompanied by a content-based identifier that uniquely identifies the rule based on its pattern.
- Each match is now associated with the rule that produced it, rather than just the rule’s name (which can change as rules are modified).
- Each match is now assigned a unique content-based identifier.
- Findings (i.e., groups of matches with the same capture groups, produced by the same rule) are now represented explicitly in the datastore. Each finding is assigned a unique content-based identifier.
- Now, each time a rule matches, a single match object is produced. Each match in the datastore is now associated with an array of capture groups. Previously, a rule whose pattern had multiple capture groups would produce one match object for each group, with each one being associated with a single capture group.
- Provenance metadata for blobs is recorded in a much simpler way than before. The new representation explicitly records file and git-based provenance, but also adds explicit support for extensible provenance. This change will make it possible in the future to have Nosey Parker scan and usefully report blobs produced by custom input data enumerators (e.g., a Python script that lists files from the Common Crawl WARC files).
- Scores are now associated with matches instead of findings.
- Comments can now be associated with both matches and findings, instead of just findings.
The JSON and JSONL report formats have changed. These will stabilize in a future release (#101).
- The matching_input field for matches has been removed and replaced with a new groups field, which contains an array of base64-encoded bytestrings.
- Each match now includes additional rule_text_id, rule_structural_id, and structural_id fields.
- The provenance field of each match is now slightly different.
Schema migration of older Nosey Parker datastores is no longer performed. Previously, this would automatically and silently be done when opening a datastore from an older version. Explicit support for datastore migration may be added back in a future release.
The shell-completions command has been moved from the top level to a subcommand of generate.
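Two of the data model changes lend themselves to a short Python illustration: content-based identifiers, where an ID is derived from the thing it names (so the same rule pattern, or the same rule with the same capture groups, always yields the same ID), and the new `groups` field, whose base64-encoded bytestrings a JSON or JSONL consumer must decode. In this sketch the SHA-1 hashing and length-prefixed serialization are assumptions that do not reproduce Nosey Parker's actual identifiers, and the JSONL record is hypothetical and trimmed to the `groups` field; real records also carry fields such as `rule_text_id`, `rule_structural_id`, and `structural_id`, and their nesting may differ since the formats are not yet stable.

```python
import base64
import hashlib
import json

def content_id(*parts: bytes) -> str:
    """SHA-1 over length-prefixed parts, so (b'ab', b'c') and (b'a', b'bc') differ."""
    h = hashlib.sha1()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()

def rule_id(pattern: str) -> str:
    # Depends only on the pattern text, so renaming a rule keeps its ID.
    return content_id(b"rule", pattern.encode("utf-8"))

def finding_id(rule: str, groups: list[bytes]) -> str:
    # Same rule plus same capture groups -> same finding ID, wherever it occurs.
    return content_id(b"finding", rule.encode("ascii"), *groups)

def decode_groups(record: dict) -> list[bytes]:
    """Decode the base64-encoded capture groups of one match record."""
    return [base64.b64decode(g) for g in record["groups"]]

# Hypothetical, trimmed JSONL line for one match; other fields omitted.
line = json.dumps({
    "groups": [base64.b64encode(b"QTP4LAknlFml0NuPAbCdtvH4KQaokiQE").decode("ascii")],
})

pattern = r"api_key=([A-Za-z0-9]{32})"  # hypothetical rule pattern
record = json.loads(line)
groups = decode_groups(record)
# Secrets are arbitrary bytes; decode leniently for display only.
print(groups[0].decode("utf-8", errors="replace"))
print(finding_id(rule_id(pattern), groups))
```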
