passphrase-wordlist: Passphrase wordlist and hashcat rules

Passphrase wordlist and hashcat rules for offline cracking of long, complex passwords
People think they are getting smarter by using passphrases. Let’s prove them wrong!

This project includes a massive wordlist of phrases (~18 million) and two hashcat rule files for GPU-based cracking.

The passphrase wordlist and raw data sources are available for download via the torrent files here. You only need the ‘passphrases’ file and the hashcat rules, but some researchers may want to look at the raw sources.

If you cannot download via the torrents, try here.

Use both rules for best results.

Here is an example for NTLMv2 hashes. If you use the -O (optimized kernels) option, check the maximum password length it allows, because it may be too short for long passphrases.

hashcat64.bin -a 0 -m 5600 hashes.txt passphrases.txt -r passphrase-rule1.rule -r passphrase-rule2.rule -w 3
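When -r is given more than once, hashcat combines the rule files: every rule in the first file is paired with every rule in the second, so the number of candidates is words × rules1 × rules2. The sketch below estimates that keyspace before you start a run. The helper names are hypothetical, and it assumes each non-empty line that does not start with # is one rule.

```python
import math
from typing import Iterable


def count_rules_in_lines(lines: Iterable[str]) -> int:
    """Count rule lines, skipping blank lines and '#' comments (assumed format)."""
    count = 0
    for line in lines:
        rule = line.rstrip("\r\n")
        if rule.strip() and not rule.startswith("#"):
            count += 1
    return count


def count_rules(path: str) -> int:
    """Count the rules in a hashcat rule file on disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return count_rules_in_lines(fh)


def estimated_candidates(num_words: int, *rule_counts: int) -> int:
    """Stacked -r files multiply: words x rules1 x rules2 x ..."""
    return num_words * math.prod(rule_counts)
```

For example, estimated_candidates(18_000_000, count_rules("passphrase-rule1.rule"), count_rules("passphrase-rule2.rule")) gives a rough candidate count for the command above.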

Sources Used

So far, I’ve scraped the following:

  • IMDb dataset, using the “primaryTitle” column from the title.basics.tsv.gz file available here (grabbed May 25).
  • Article titles and category names from the Wikipedia pages-articles-multistream-index dump generated May 20, 2019, available here.
  • Entries from Wiktionary’s equivalent index dump generated May 20, 2019, available here.
  • Urban Dictionary dataset pulled May 27, 2019, using this great script.
  • 15,000 Useful Phrases
  • Song lyrics for Rolling Stone’s “top 100” artists using my lyric scraping tool.
  • Meme titles from Know Your Meme, scraped on July 15, 2019 using my tool here.
  • Movie titles and lines from this Cornell project.
  • Global POI dataset using the ‘allCountries’ file.
  • Quotables dataset on Kaggle.
  • 1,800 English Phrases
  • 2016 US Presidential Debates dataset on Kaggle.
  • Goodreads Book Reviews from Kaggle. I scraped the titles of over 300,000 books.
  • US & UK top album names, artists, and track names from the 1950s to 2018, using mwkling’s tool here.
    • Note: I modified that Python script to download multiple charts, as opposed to just the US Billboard chart.
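As an illustration of how a source like the IMDb dump can be reduced to a plain list of phrases, here is a minimal sketch that pulls the primaryTitle column out of title.basics.tsv.gz. It is not the scraping code used for this project; the function names are hypothetical, and it assumes the file has a header row naming the primaryTitle column and that IMDb writes missing values as \N.

```python
import csv
import gzip
from typing import Iterable, Iterator

IMDB_NULL = "\\N"  # IMDb's marker for a missing value (assumed)


def iter_primary_titles(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty primaryTitle values from tab-separated lines with a header row."""
    # QUOTE_NONE: the IMDb TSVs are not quoted, so quote characters stay literal.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        title = (row.get("primaryTitle") or "").strip()
        if title and title != IMDB_NULL:
            yield title


def dump_titles(src_gz: str, dst_txt: str) -> int:
    """Write one title per line from a gzipped TSV; return how many were written."""
    written = 0
    with gzip.open(src_gz, "rt", encoding="utf-8", newline="") as fin:
        with open(dst_txt, "w", encoding="utf-8") as fout:
            for title in iter_primary_titles(fin):
                fout.write(title + "\n")
                written += 1
    return written
```

Streaming the gzip file row by row keeps memory flat, which matters for a dump with millions of rows.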

Download

git clone https://github.com/initstring/passphrase-wordlist.git

Cleaning sources

Check out the script cleanup.py to see how I’ve cleaned the raw sources.

It works like this:

$ python3.6 cleanup.py infile.txt outfile.txt
Reading from ./infile.txt: 505 MB
Wrote to ./outfile.txt: 250 MB
Elapsed time: 0:02:53.062531
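The exact steps in cleanup.py are not described here, so the following is only a hypothetical sketch of the kind of normalization such a script might perform: lowercase, turn punctuation into spaces, collapse whitespace, drop single words and over-long lines, and de-duplicate. The thresholds (min_words, max_len) are assumptions, and the ASCII-only character class treats accented letters as separators.

```python
import re
from typing import Iterable, Iterator

# Anything other than ASCII lowercase letters, digits, apostrophes, and spaces.
NON_WORD = re.compile(r"[^a-z0-9' ]+")
MULTI_SPACE = re.compile(r" {2,}")


def normalize(line: str) -> str:
    """Lowercase, replace punctuation/whitespace runs with a space, then trim."""
    phrase = NON_WORD.sub(" ", line.lower())
    return MULTI_SPACE.sub(" ", phrase).strip()


def clean(lines: Iterable[str], min_words: int = 2, max_len: int = 40) -> Iterator[str]:
    """Yield unique normalized phrases with >= min_words words and <= max_len chars."""
    seen = set()
    for raw in lines:
        phrase = normalize(raw)
        if len(phrase.split()) < min_words or len(phrase) > max_len:
            continue
        if phrase in seen:
            continue
        seen.add(phrase)
        yield phrase


def clean_file(infile: str, outfile: str) -> None:
    """Read infile line by line and write the cleaned phrases to outfile."""
    with open(infile, encoding="utf-8", errors="ignore") as fin:
        with open(outfile, "w", encoding="utf-8") as fout:
            for phrase in clean(fin):
                fout.write(phrase + "\n")
```

Calling clean_file("infile.txt", "outfile.txt") mirrors the infile/outfile interface shown above. The set makes de-duplication a single order-preserving pass, at the cost of holding every unique phrase in memory.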

Hashcat Rules

Given the phrase “take the red pill”, the first hashcat rule will output the following:

take the red pill
take-the-red-pill
take.the.red.pill
take,the,red,pill
take_the_red_pill
taketheredpill
Take the red pill
TAKE THE RED PILL
tAKE THE RED PILL
Taketheredpill
tAKETHEREDPILL
TAKETHEREDPILL
Take The Red Pill
TakeTheRedPill
Take-The-Red-Pill
Take.The.Red.Pill
Take,The,Red,Pill
Take_The_Red_Pill
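To make the list above concrete, here is a Python sketch that reproduces those 18 variants in the same order. It mimics the effect of the rule, not its syntax; mapping the case changes to hashcat's c (capitalize), u (uppercase), C (invert capitalize), and E (title case) functions is an assumption based on the output, and the helper names are hypothetical.

```python
from typing import List

SEPARATORS = ["-", ".", ",", "_"]


def capitalize(s: str) -> str:
    """Like hashcat 'c': uppercase the first character, lowercase the rest."""
    return s[:1].upper() + s[1:].lower()


def invert_capitalize(s: str) -> str:
    """Like hashcat 'C': lowercase the first character, uppercase the rest."""
    return s[:1].lower() + s[1:].upper()


def title_case(s: str) -> str:
    """Like hashcat 'E': capitalize every space-separated word."""
    return " ".join(capitalize(word) for word in s.split(" "))


def first_rule_variants(phrase: str) -> List[str]:
    variants = [phrase]
    # Swap the spaces for each separator, then remove them entirely.
    variants += [phrase.replace(" ", sep) for sep in SEPARATORS + [""]]
    # Case changes on the spaced phrase.
    variants += [capitalize(phrase), phrase.upper(), invert_capitalize(phrase)]
    # Case changes on the joined phrase.
    joined = phrase.replace(" ", "")
    variants += [capitalize(joined), invert_capitalize(joined), joined.upper()]
    # Title case, then title case with no separator and with each separator.
    titled = title_case(phrase)
    variants.append(titled)
    variants += [titled.replace(" ", sep) for sep in [""] + SEPARATORS]
    return variants
```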

Adding the second hashcat rule makes things more interesting, because it returns a huge list of variations for each candidate. Here are a few examples:

T@k3Th3R3dPill!
T@ke-The-Red-Pill
taketheredpill2020!
T0KE THE RED PILL (unintentional humor)
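The second rule's exact contents are not listed here, so this sketch uses a small, hypothetical set of leetspeak substitutions and suffixes, just large enough to regenerate three of the examples above. Like hashcat's sXY rule, each substitution replaces every occurrence of a character, case-sensitively; the real rule file applies far more combinations.

```python
from typing import Dict, List, Tuple

# Hypothetical substitution tables and suffixes, not the real rule file.
SUBSTITUTIONS: Tuple[Dict[str, str], ...] = ({}, {"a": "@"}, {"a": "@", "e": "3"})
SUFFIXES: Tuple[str, ...] = ("", "!", "1", "2020!")


def second_rule_variants(candidate: str) -> List[str]:
    """Apply each substitution table, then each suffix, to one candidate."""
    variants = []
    for table in SUBSTITUTIONS:
        # Replace every matching character (case-sensitive), like sXY.
        mangled = "".join(table.get(ch, ch) for ch in candidate)
        for suffix in SUFFIXES:
            variants.append(mangled + suffix)
    return variants
```

Even this toy version yields 3 × 4 = 12 outputs per input, which is why stacking it on the first rule multiplies the candidate list so quickly.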

Enjoy!

Copyright (c) 2018 InitString
Source: https://github.com/initstring/