Digital Corpora's Gov Docs

Digital Corpora, operating under a grant from the National Science Foundation, has posted a corpus of electronic files that can be used to test forensic and electronic discovery techniques.

The full set includes nearly one million files in a wide variety of formats, collected from the United States government. It can be downloaded for review as a thousand separate directories of a thousand files each, and a set of more than 100,000 JPEGs is also available. The metadata for the files includes the search terms and search engines used to find them, along with SHA-1 hash values. Malware has been deliberately left in the data. One possible drawback is that only a very small number of the files (about 2,000) are email files.
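Because the metadata carries a SHA-1 hash for each file, a downloaded directory can be checked for corruption or tampering before it is used in a test. The following is a minimal Python sketch of that check, not the corpus's own tooling. It assumes the published hashes have already been loaded into a dictionary mapping file name to hex digest; the function names and that mapping are hypothetical, and the actual layout of the metadata should be confirmed against the download.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_directory(directory, expected):
    """Check each file in `directory` against `expected` (name -> hex SHA-1).

    `expected` is a hypothetical mapping built from the corpus metadata.
    Returns (mismatched, unlisted): file names whose hash differs, and
    file names that have no entry in the mapping.
    """
    mismatched, unlisted = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        want = expected.get(path.name)
        if want is None:
            unlisted.append(path.name)
        elif sha1_of_file(path) != want.lower():
            mismatched.append(path.name)
    return mismatched, unlisted
```

Since the corpus deliberately contains malware, hashing the files as raw bytes like this, without opening them in their native applications, is a reasonable first step on an isolated machine.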

Digital Corpora has also posted cell phone images and disk images. Personally identifiable information (PII) has been removed from the disk images. Forensics students can practice with disk images in the EnCase format. These images document how data was taken from a fictional businessperson's laptop, and the challenge is to determine whether the data was taken by a malicious actor or intentionally disclosed by the employee.

