
For a good guide on how to collect data from a hard drive in a manner that is defensible under Minnesota state law, see Novacheck, Mary T.; Thornton, Molly B.; Beard, Jeffrey J.; and Burns, Mark (2014) "IT Technologies and How to Preserve ESI Cost Effectively," William Mitchell Law Review: Vol. 40: Iss. 2, Article 6, available at: http://open.mitchellhamline.edu/wmlr/vol40/iss2/6. See Appendix A, 'Sample Technologies for Preservation and Collection - Hard Drives'. The guide shows how data can be collected with EnCase. The authors are a partner at Bowman and Brooke LLP, a partner at Dorsey & Whitney LLP, an information governance consultant for IBM, an e-discovery manager for Boston Scientific, and a manager with KPMG's Forensic Technology Services practices.

Follow these basic steps:

1. First, document each step in the process with a checklist.

2. Remove the hard drive from the source computer and connect it to a write blocker, which is in turn connected to the forensic expert's PC.

3. EnCase allows for individual directories to be collected. When you're ready to proceed, click 'Acquire'.

4. The paper recommends the following settings for the output. A file output from EnCase will have a .E01 file extension. EnCase outputs data in 640 MB image files by default.
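To get a rough sense of how many image files a collection will produce at the default 640 MB size, the arithmetic can be sketched as follows. This is only an estimate under stated assumptions: "640 MB" is taken to mean 640 x 1024 x 1024 bytes, the image is assumed to be uncompressed, and any per-file overhead is ignored. The function name `segment_count` is made up for this illustration and is not part of EnCase.

```python
# Assumption: "640 MB" is read as 640 * 1024 * 1024 bytes.
SEGMENT_BYTES = 640 * 1024 * 1024


def segment_count(image_bytes: int, segment_bytes: int = SEGMENT_BYTES) -> int:
    """Estimate how many fixed-size image files are needed (ceiling division).

    Assumes no compression and ignores per-file overhead.
    """
    if image_bytes <= 0:
        return 0
    return (image_bytes + segment_bytes - 1) // segment_bytes


# Hypothetical example: a drive marketed as 500 GB (500 * 10**9 bytes).
drive_bytes = 500 * 10**9
print(segment_count(drive_bytes))
```

If compression is turned on, the actual number of files will usually be lower than this estimate.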


 
 

Here's a continuation of my outline of the 2016 edition of Craig Ball's Electronic Discovery Workbook which I last posted about on May 5, 2017.

A. Single Instance Archival Solutions - can reduce the unnecessary replication of ESI.

1. Ball suggests that a third of messages are duplicates in such archives.

B. Problem of Replication in Attorney Review

1. Ball: "Failing to deduplicate substantial collections of ESI before attorney review is tantamount to cheating the client."

2. Case Study: A new review platform couldn't tag emails that had already been produced because, when the same Outlook message is exported as a .msg file at different times, each export will have a different hash value.

C. Mechanized De-duplication of ESI

1. Hashing files.

a. emails with the same displayable content may have different hash values because they traversed different paths to reach the same recipients.

b. hash values must be preserved throughout the process.

2. Hashing segments of a message (subject, to, cc, etc.) likely to match, and excluding those that may be different - message headers containing server paths and unique message IDs.

a. normalizing the message data, e.g., alphabetize addresses without aliases.

3. Textual comparison of segments of a message to determine if they are sufficiently identical for purposes of review.

4. $100 tool available to hash Outlook .pst files that are under 2 GB each. [What tool, Craig?]
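The segment-hashing approach in items 2 and 2a can be sketched roughly as below. This is a minimal illustration, not Ball's method or any particular tool's implementation: the dictionary keys, the choice of fields, the separator character, and the example.com addresses are all assumptions made for the example. It hashes only the fields likely to match and leaves out the message ID and server path headers.

```python
import hashlib
from email.utils import parseaddr


def normalize_addresses(addresses):
    """Drop display-name aliases, lowercase, and alphabetize the addresses."""
    bare = [parseaddr(a)[1].strip().lower() for a in addresses]
    return ";".join(sorted(bare))


def message_fingerprint(msg):
    """Hash only the segments likely to match across copies of a message.

    Headers such as the message ID and the server path are deliberately
    excluded because they can differ between copies.
    """
    segments = [
        msg["subject"].strip(),
        normalize_addresses([msg["from"]]),
        normalize_addresses(msg["to"]),
        normalize_addresses(msg.get("cc", [])),
        msg["body"].strip(),
    ]
    # Join with a separator unlikely to appear in the fields themselves.
    joined = "\x1f".join(segments)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Two hypothetical copies of one email that took different server paths.
copy_a = {
    "subject": "Q2 forecast",
    "from": "Carol Jones <carol@example.com>",
    "to": ["Bob Lee <bob@example.com>", "alice@example.com"],
    "body": "Numbers attached.",
    "message_id": "<1@mail1.example.com>",
    "received": "from mail1.example.com",
}
copy_b = {
    "subject": "Q2 forecast",
    "from": "carol@example.com",
    "to": ["Alice Smith <ALICE@example.com>", "bob@example.com"],
    "body": "Numbers attached.",
    "message_id": "<2@mail2.example.com>",
    "received": "from mail2.example.com",
}

same = message_fingerprint(copy_a) == message_fingerprint(copy_b)
print(same)
```

Whatever fields and normalization rules are chosen, they have to be applied the same way to every message in the collection, or the fingerprints will not be comparable.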

D. Hashing

1. ESI is just a series of numbers (byte encoding schemes), and algorithms can generate a smaller, fixed-length value from them - a message digest or hash value.

a. Message Digest 5 (MD5) (32 hexadecimal characters, which represent 340 trillion, trillion, trillion values) and Secure Hash Algorithm One (SHA-1) are the most common hash algorithms used in e-discovery.

2. In sets of duplicate files with the same hash values, the first file is called the pivot and the set of duplicates is the occurrence log.

3. System metadata, such as the file name, is not contained in the file and is not included in the calculation when the file is hashed.
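The points in this section can be illustrated with Python's standard hashlib module. The sketch below is only an illustration: the file names and contents are invented, and the files are held in memory rather than read from disk. It hashes the content alone, so two files with different names but identical bytes get the same MD5 value, and it then groups them using the pivot and occurrence log terms from item 2.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the content as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


def group_duplicates(files):
    """Map each hash value to the names of the files that share it.

    Only the content is hashed; the file name plays no part in the digest.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        groups[md5_hex(content)].append(name)
    return dict(groups)


# Hypothetical in-memory "files": name -> content.
files = {
    "memo.txt": b"Quarterly results attached.",
    "memo_copy.txt": b"Quarterly results attached.",
    "other.txt": b"Different content.",
}

groups = group_duplicates(files)
for digest, names in groups.items():
    pivot = names[0]        # the first file in the set is the pivot
    occurrence_log = names  # the full set of files sharing this hash
    print(digest, "pivot:", pivot, "occurrence log:", occurrence_log)
```

Swapping `hashlib.md5` for `hashlib.sha1` gives a 40-character SHA-1 value instead; the grouping logic stays the same.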

E. Word .docx files

1. A Word .docx file is a mix of text and rich media encoded in XML, then compressed with the ZIP algorithm.

2. The encoding scheme will be completely different if the document is written to TIFF or PDF.

3. No two optical scans will be the same.

4. Ball's testing showed that saving the same Word document to a PDF with the same settings will sometimes generate files with the same hash values.
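Why the container matters for hashing can be shown with a small experiment using Python's zipfile module. This sketch does not build a valid Word file: the XML fragment is a stand-in, and the single `word/document.xml` entry and the timestamps are assumptions for illustration. The same XML is zipped twice with different timestamps; the two containers hash differently even though the XML inside is identical.

```python
import hashlib
import io
import zipfile


def build_container(xml: bytes, timestamp: tuple) -> bytes:
    """Zip one XML part in memory, stamping the entry with the given time."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        info = zipfile.ZipInfo("word/document.xml", date_time=timestamp)
        info.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(info, xml)
    return buffer.getvalue()


def inner_xml(container: bytes) -> bytes:
    """Read the XML part back out of the zipped container."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return zf.read("word/document.xml")


# Stand-in content; a real .docx holds several XML parts.
xml = b"<document><body><p>Same text</p></body></document>"

first = build_container(xml, (2017, 5, 1, 9, 0, 0))
second = build_container(xml, (2017, 5, 2, 9, 0, 0))

containers_match = hashlib.md5(first).hexdigest() == hashlib.md5(second).hexdigest()
contents_match = (
    hashlib.md5(inner_xml(first)).hexdigest()
    == hashlib.md5(inner_xml(second)).hexdigest()
)
print("containers match:", containers_match)
print("contents match:", contents_match)
```

The same reasoning applies when a document is converted to another format: the displayed text may be the same, but the bytes being hashed are not.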


 
 
  • May 27, 2017

Blockchains are a new concern in electronic discovery. The basic technology was developed back in 2008. Its first use was for bitcoin, as a means to prevent double spending - the use of a digital token more than once. Timestamped transactions are recorded in a public ledger in such a way that they cannot be altered. Data in a blockchain cannot be changed once it is recorded. Blockchains use distributed databases - there are multiple storage devices that are not linked to a common processor. So a blockchain is stored on many different computers simultaneously. Each block in the chain contains a hash value derived from the previous block. The integrity of the blockchain is confirmed by the network.

IBM, Ernst & Young, KPMG, PwC, and Deloitte are all developing private blockchains. Blockchain technology is also being used to store medical records.
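The way each block carries a hash of the previous block can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any production blockchain is implemented: it uses SHA-256 (the hash function Bitcoin uses), stores the chain as a plain list on one machine, and leaves out the network, consensus, and digital signatures entirely. The function names and the sample transactions are made up for the example.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents, including its link to the prior block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, timestamp: str, data: str) -> None:
    """Add a block that records the hash of the block before it."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"timestamp": timestamp, "data": data, "prev_hash": prev_hash})


def is_valid(chain: list) -> bool:
    """Check that every block still matches the hash stored in its successor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, "2017-05-27T10:00:00Z", "A pays B 1 token")
append_block(chain, "2017-05-27T10:05:00Z", "B pays C 1 token")
append_block(chain, "2017-05-27T10:10:00Z", "C pays D 1 token")

valid_before = is_valid(chain)

# Alter an early transaction; the hash stored in the next block no longer fits.
chain[0]["data"] = "A pays B 100 tokens"
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

In this toy version nothing protects the most recent block, since no later block stores its hash. On a real network, the copies held by other computers serve as the check.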


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea
