bulk_extractor is a program that extracts features such as e-mail addresses, credit card numbers, URLs, and other types of information from digital evidence files. It is a useful forensic investigation tool for many tasks, such as malware and intrusion investigations, identity investigations and web investigations, as well as image analysis and password cracking. The program provides several unusual capabilities:
- It finds e-mail addresses, URLs, and credit card numbers that other tools miss, because it can process compressed data (such as ZIP, PDF, and GZIP files) as well as incomplete or partially corrupted data. It can carve JPEG files, office documents, and other types of files out of fragments of compressed data, and it automatically detects and carves encrypted RAR files.
- It builds word lists from all the words found in the data, including words inside compressed files that lie in unallocated space. These word lists can be used for password cracking.
- It is multithreaded, so it runs faster on computers with more cores.
- After the analysis, it creates histograms showing the most common e-mail addresses, URLs, domain names, search terms, and other types of information.
bulk_extractor can analyze disk images, files, or directories of files and extract useful information without parsing the file system or its structures. The input is split into pages, and each page is processed by one or more scanners. The results are stored in feature files that can be easily inspected, parsed, or processed with other automated tools. bulk_extractor also creates histograms of the features it finds. This is useful because features that occur more often, such as e-mail addresses and web search terms, tend to be more important.
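The page-and-scanner idea can be illustrated with a toy sketch in Python. This is not bulk_extractor's actual implementation: the page size, the margin, the simplified e-mail regular expression, and the function names scan_emails and histogram are all assumptions made for illustration. Each page is scanned together with a small margin on both sides, so a feature that straddles a page boundary is still seen whole, and a match is reported only by the page in which it starts.

```python
import re
from collections import Counter

# Simplified e-mail pattern (an assumption; real scanners are far more thorough).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scan_emails(data, page_size=4096, margin=256):
    """Scan raw bytes page by page and return (absolute_offset, feature) pairs."""
    features = []
    for start in range(0, len(data), page_size):
        # Widen the window by `margin` bytes on both sides of the page.
        lo = max(0, start - margin)
        hi = start + page_size + margin
        window = data[lo:hi]
        for match in EMAIL_RE.finditer(window):
            abs_offset = lo + match.start()
            # Report a match only if it starts inside this page,
            # so matches seen in a margin are not counted twice.
            if start <= abs_offset < start + page_size:
                features.append((abs_offset, match.group().decode("ascii")))
    return features


def histogram(features):
    """Count how often each feature occurs, ignoring case."""
    return Counter(feature.lower() for _, feature in features)


# Hypothetical sample buffer: one address straddles the first page boundary.
sample = (b"\x00" * 4090 + b"alice@example.com" + b"\x00" * 100
          + b"bob@example.org" + b"\x00" + b"Alice@Example.com")
found = scan_emails(sample)
counts = histogram(found)
```

Because the scan works on raw bytes, it never needs to interpret a file system, which matches the approach described above.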
In addition to the capabilities above, bulk_extractor also includes:
- Bulk Extractor Viewer, a graphical user interface for browsing the features stored in feature files and for launching bulk_extractor scans
- A small number of Python programs for extra analysis of feature files
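As a rough idea of what such post-processing can look like, the sketch below parses feature-file lines and lists the most frequent features. It assumes the common feature-file layout of tab-separated fields (offset, feature, context) with comment lines starting with "#". The function names parse_feature_lines and top_features and the sample lines are hypothetical, and this is not one of the bundled programs.

```python
from collections import Counter


def parse_feature_lines(lines):
    """Yield (offset, feature, context) tuples from feature-file lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/banner lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # Malformed line: no feature field.
        offset, feature = parts[0], parts[1]
        context = parts[2] if len(parts) > 2 else ""
        yield offset, feature, context


def top_features(lines, n=10):
    """Return the n most common features (case-insensitive) with their counts."""
    counts = Counter(feature.lower() for _, feature, _ in parse_feature_lines(lines))
    return counts.most_common(n)


# Hypothetical sample lines in the assumed layout.
sample_lines = [
    "# example banner line\n",
    "1024\talice@example.com\tmail from alice@example.com to\n",
    "2048\tALICE@example.com\tReply-To: ALICE@example.com\n",
    "4096-GZIP-12\tbob@example.org\tcc bob@example.org\n",
]
top = top_features(sample_lines)
```

To run this against real output, the lines could be read from a feature file in the output directory, for example a file such as bulk-out/email.txt (the exact file name is an assumption here), opened with errors="replace" so that stray non-UTF-8 bytes do not stop the parse.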
Tutorial
Usage
root@kali:~# bulk_extractor -o bulk-out xp-laptop-2005-07-04-1430.img
bulk_extractor version 1.6.0-dev
Hostname: kali
Input file: xp-laptop-2005-07-04-1430.img
Output directory: bulk-out
Disk Size: 536715264
Threads: 1
Phase 1.
13:02:46 Offset 0MB (0.00%) Done in n/a at 13:02:45
13:03:39 Offset 67MB (12.50%) Done in 0:06:14 at 13:09:53
13:04:43 Offset 134MB (25.01%) Done in 0:05:50 at 13:10:33
13:04:55 Offset 201MB (37.51%) Done in 0:03:36 at 13:08:31
13:06:01 Offset 268MB (50.01%) Done in 0:03:15 at 13:09:16
13:06:48 Offset 335MB (62.52%) Done in 0:02:25 at 13:09:13
13:07:04 Offset 402MB (75.02%) Done in 0:01:25 at 13:08:29
13:07:20 Offset 469MB (87.53%) Done in 0:00:39 at 13:07:59
All Data is Read; waiting for threads to finish...
Time elapsed waiting for 1 thread to finish:
(please wait for another 60 min .)
Time elapsed waiting for 1 thread to finish:
6 sec (please wait for another 59 min 54 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
12 sec (please wait for another 59 min 48 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
18 sec (please wait for another 59 min 42 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
24 sec (please wait for another 59 min 36 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
30 sec (please wait for another 59 min 30 sec.)
Thread 0: Processing 520093696
All Threads Finished!
Producer time spent waiting: 335.984 sec.
Average consumer time spent waiting: 0.143353 sec.
*******************************************
** bulk_extractor is probably CPU bound. **
** Run on a computer with more cores **
** to get better performance. **
*******************************************
Phase 2. Shutting down scanners
Phase 3. Creating Histograms
ccn histogram... ccn_track2 histogram... domain histogram...
email histogram... ether histogram... find histogram...
ip histogram... tcp histogram... telephone histogram...
url histogram... url microsoft-live... url services...
url facebook-address... url facebook-id... url searches...
Elapsed time: 378.5 sec.
Overall performance: 1.418 MBytes/sec.
Total email features found: 899