EarlyBird: sensitive data detection tool
EarlyBird is a sensitive data detection tool that scans source code repositories for clear-text password violations, PII, outdated cryptography methods, key files, and more. It can scan remote Git repositories, local files, or directories, or run as a pre-commit step.
Modules are configurable rule sets that define patterns and target areas for searching within source code files. The currently implemented modules include:
- File Names (filename): Scan the file list recursively, looking for file name patterns that could indicate credentials, keys, and sensitive PII, for example id_rsa or files ending in .pem.
- File Content Patterns (content): Scan file contents for patterns such as “password:” and “BEGIN RSA PRIVATE KEY”. This module also detects other sensitive PII and secrets, such as IBANs, SSNs, IP addresses, email addresses, and phone numbers, and flags insecure cryptographic algorithms and pseudo-random number generation, as well as suspicious comments such as “HACK” and “FIXME”.
- File Content Entropy (entropy): Scan files for strings with high (Shannon) entropy, which could indicate passwords or secrets stored in the files, for example: kwaKM@£rFKAM3(a2klma2d
- Credit Card Numbers (ccnumber): Scan files for strings that match major credit card number patterns. Potential hits are passed through a Luhn (mod 10) checksum to filter out numbers that cannot be valid card numbers, and numbers identified as designated test values are ignored.
- Commonly Used / Default Passwords (common): Scan files for default and commonly used/abused passwords.
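To illustrate how a module pairs patterns with a target area, the following Python sketch applies file name rules to the file path and content rules to the file text. The Rule class, its field names, the "filename"/"content" area values, and the regular expressions are hypothetical examples chosen for this sketch; they are not EarlyBird's actual configuration format or rule set.

```python
import re
from dataclasses import dataclass

# Hypothetical rule shape: a pattern plus the "target area" it applies to.
# This is an illustration only, NOT EarlyBird's configuration schema.
@dataclass(frozen=True)
class Rule:
    caption: str
    area: str            # "filename" or "content" (illustrative values)
    pattern: re.Pattern

# Example rules loosely mirroring the filename and content modules.
RULES = [
    Rule("SSH private key file", "filename", re.compile(r"(^|/)id_rsa$")),
    Rule("PEM file", "filename", re.compile(r"\.pem$", re.IGNORECASE)),
    Rule("Password assignment", "content", re.compile(r"password\s*[:=]", re.IGNORECASE)),
    Rule("RSA private key", "content", re.compile(r"-----BEGIN RSA PRIVATE KEY-----")),
]

def scan(path: str, text: str) -> list[str]:
    """Return the captions of every rule that matches the path or the file text."""
    hits = []
    for rule in RULES:
        # Filename rules look at the path; content rules look at the body.
        target = path if rule.area == "filename" else text
        if rule.pattern.search(target):
            hits.append(rule.caption)
    return hits
```

For example, scan("keys/id_rsa", "") reports the SSH key rule, while scan("app/config.yml", "db_password: hunter2") reports only the password rule. Keeping the target area on the rule, rather than in separate code paths, lets new checks be added as data.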
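The idea behind the entropy module can be sketched with the standard Shannon formula, H = -Σ p(c) log2 p(c), computed over the character frequencies of a token. Splitting lines on whitespace, the 16-character minimum length, and the 3.5 bits-per-character threshold below are assumptions for illustration, not EarlyBird's actual settings. By this measure, the example string kwaKM@£rFKAM3(a2klma2d scores roughly 3.88 bits per character.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character: H = -sum(p * log2(p))."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# Hypothetical limits chosen for this sketch; EarlyBird's real values may differ.
MIN_LENGTH = 16
THRESHOLD = 3.5

def high_entropy_tokens(line: str) -> list[str]:
    """Return whitespace-separated tokens that are long and look random."""
    return [
        token
        for token in line.split()
        if len(token) >= MIN_LENGTH and shannon_entropy(token) > THRESHOLD
    ]
```

Requiring a minimum length matters because short tokens cannot reach high entropy values and would otherwise add noise; the threshold trades false positives against missed secrets.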
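The credit card check can be sketched as a pattern match, followed by a Luhn (mod 10) validation and a filter for known test numbers. The simplified Visa, Mastercard, and Amex patterns and the small set of well-known public test numbers are assumptions for illustration; EarlyBird's own patterns and its list of designated test values are not reproduced here.

```python
import re

# Simplified prefix/length patterns for Visa (16 digits), Mastercard 51-55
# (16 digits), and Amex 34/37 (15 digits). Real card ranges are broader.
CARD_PATTERN = re.compile(r"\b(?:4\d{15}|5[1-5]\d{14}|3[47]\d{13})\b")

# Illustrative subset of widely published test card numbers.
TEST_NUMBERS = {"4111111111111111", "5555555555554444", "378282246310005"}

def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return pattern matches that pass Luhn and are not known test numbers."""
    return [
        m for m in CARD_PATTERN.findall(text)
        if luhn_valid(m) and m not in TEST_NUMBERS
    ]
```

Running the cheap regular expression first and the checksum second keeps the scan fast, while the Luhn step discards most random digit runs that merely look like card numbers.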
Copyright 2020 American Express Travel Related Services Company, Inc.