
Structured data is data that is contained in a field for a particular record - such as data in a spreadsheet or a relational database. Unstructured data is not organized into predefined fields. Most data is unstructured - electronic files, metadata, web pages.

Semi-structured data is data that is not organized in a relational database, but which contains tags that separate elements of the data. An object database uses semi-structured data. It does not use tables, but instead creates relationships between two data sets directly. Many-to-many relations are established, rather than one-to-many as in relational databases. Object databases work better with complex data.

JSON, email, electronic data interchange (EDI) files, and .xml files are examples of semi-structured files. A .xml file for the contents of a Word document uses hundreds of tags for character formatting, footnotes, etc. - which are nested together in different levels. A .docx file consists of XML files inside a ZIP archive.
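
To make the "XML files inside a ZIP archive" point concrete, here is a minimal Python sketch. It builds a tiny stand-in for a .docx in memory (the two paragraphs are made-up sample content, and a real file saved by Word contains more parts than this), then opens it as a ZIP and walks the nested tags.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Build a tiny stand-in for a .docx in memory (hypothetical content; a real
# file produced by Word has more parts, e.g. [Content_Types].xml and styles).
document_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Bold text</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Plain text</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)

# Reading it back: open the ZIP, list its parts, and walk the nested tags.
with zipfile.ZipFile(buffer) as archive:
    part_names = archive.namelist()
    root = ET.fromstring(archive.read("word/document.xml"))

# Each <w:p> is a paragraph; the text itself sits in nested <w:t> elements.
paragraph_texts = [
    "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
    for p in root.iter(f"{{{W_NS}}}p")
]
print(part_names)
print(paragraph_texts)
```

The bold formatting in the first paragraph is carried by the nested w:rPr and w:b tags rather than by the text itself, which is the kind of tagging that makes the file semi-structured.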


 
 

Here's a summary of a section of Craig Ball's Electronic Discovery Workbook, on "Forms that Function".

A. Form

- ESI should be produced in forms that function - that preserve the integrity, efficiency, and functionality of digital evidence.

B. Federal Rules of Civil Procedure

1. Fed. R. Civ. P. 26(f)(3)(C) requires that the discovery plan address the form in which ESI is to be produced.

2. Fed. R. Civ. P. 34(b)(1)(C) permits requesting parties to specify the form of production.

3. Fed. R. Civ. P. 34(b)(2)(D) allows the producing party to object to the requested form.

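4. Fed. R. Civ. P. 37(a)(1) requires that, if there is a dispute, the parties confer in good faith to try to reach a resolution before a motion is filed.

5. Fed. R. Civ. P. 34(b)(2)(E)(ii) provides that, if a request does not specify a form, a party must produce ESI in a form in which it is ordinarily maintained or in a reasonably usable form.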

6. If the parties can't agree, the requesting party has to file a motion to compel.

C. Native Files

1. Multiple forms should be used in most productions: images; native; near native; or hosted. Near native productions are useful for enterprise email, databases and social media content.

2. Ball advises against converting native files to TIFF images because:

a. conversion and load files are expensive.

b. spreadsheets cannot be easily converted into images.

c. TIFF images tend to be 5-40 times as large as the native files.

d. the images can't be de-duped.

e. a request for re-production with natives may be made.

3. Ball rejects these four excuses for refusing to produce native files:

a. hard to Bates label.

b. evidence may be altered.

c. native productions require broader review.

d. native files can't be redacted.

D. Form

1. Don't ask for documents - ask for 'information items' in a useful and complete form.

2. Specify a format, not just native files, e.g. .xlsx or .pptx.

3. For email, ask, "Can the form produced be imported into common e-mail client or server applications?"

4. Specify the load file format - include:

a. hash values

b. UTC offset

c. deduplicated instances

d. email folder path

e. redaction flags

f. embedded content flag.

5. Include logical unitization data
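
The load file fields listed in item 4 above could be assembled roughly as follows. This is only a sketch: the column names, the pipe delimiter, the Y/N flags, and the sample values are all assumptions made for illustration, not a format taken from Ball's workbook or from any particular review platform.

```python
import csv
import hashlib
import io

# Hypothetical column names; the real field list is whatever the parties agree on.
FIELDS = ["BegDoc", "MD5Hash", "UTCOffset", "DupeCustodians",
          "EmailFolderPath", "HasRedactions", "HasEmbeddedContent"]

def load_file_row(doc_id, content, utc_offset, dupe_custodians,
                  folder_path, redacted, has_embedded):
    """Build one load-file record for a produced item."""
    return {
        "BegDoc": doc_id,
        "MD5Hash": hashlib.md5(content).hexdigest(),  # hash of the native bytes
        "UTCOffset": utc_offset,
        "DupeCustodians": ";".join(dupe_custodians),  # deduplicated instances
        "EmailFolderPath": folder_path,
        "HasRedactions": "Y" if redacted else "N",
        "HasEmbeddedContent": "Y" if has_embedded else "N",
    }

# Made-up sample item.
row = load_file_row(
    doc_id="ABC0000001", content=b"hello",
    utc_offset="-05:00", dupe_custodians=["Smith", "Jones"],
    folder_path="\\Inbox\\Project X", redacted=False, has_embedded=True,
)

# Write a header line and one record to an in-memory, pipe-delimited file.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=FIELDS, delimiter="|")
writer.writeheader()
writer.writerow(row)
print(output.getvalue())
```

Keeping the other custodians of a de-duplicated item in their own column means the receiving party can still see everyone who held a copy.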

E. De-duplication and Redaction

1. Vertically de-dupe by custodian.

2. Don't perform near de-duplication.

3. Redactions should not impair the ability to search through non-redacted content.
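
As a rough illustration of vertical de-duplication, the Python sketch below hashes each file and drops exact duplicates only within the same custodian. The function name, the tuple layout, and the sample files are hypothetical, and SHA-256 is used here simply as one common choice of hash.

```python
import hashlib

def vertical_dedupe(items):
    """Keep the first copy of each file per custodian.

    `items` is a list of (custodian, filename, content_bytes) tuples.
    Copies held by different custodians are all kept.
    """
    seen = set()
    kept = []
    for custodian, name, content in items:
        # The custodian is part of the key, so matching is per custodian only.
        key = (custodian, hashlib.sha256(content).hexdigest())
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept

items = [
    ("Smith", "budget.xlsx", b"Q1 numbers"),
    ("Smith", "budget (copy).xlsx", b"Q1 numbers"),       # exact duplicate within Smith
    ("Jones", "budget.xlsx", b"Q1 numbers"),              # same file, different custodian
    ("Jones", "budget v2.xlsx", b"Q1 numbers, revised"),  # similar, but not identical
]
print(vertical_dedupe(items))
```

Only Smith's second copy is dropped. Jones's copy survives because the comparison never crosses custodians, and the revised version survives because only exact hash matches count, so no near de-duplication takes place.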


 
 
  • Jun 14, 2020

If you are using the current version of Nuix Workstation, released in May 2020, note that you will not be able to successfully process certain types of data without installing third-party software. Nuix Workstation is used to process both structured and unstructured data. Nuix Workstation can process, search, index, and analyze ESI. Nuix Workstation will identify email addresses; dollar amounts; company names; and phone numbers.

In order to process Lotus Notes archives with Nuix Workstation you will need to install Lotus Notes Client.

In order to extract data from video files, FFmpeg should be installed. FFmpeg includes codecs for decoding audio and video files. It can convert audio and video files into different formats.
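
Before starting a processing job, it can help to confirm that FFmpeg is actually installed. The Python sketch below only checks whether an executable named ffmpeg can be found on the system PATH. Whether Nuix Workstation locates FFmpeg through the PATH is an assumption here, and the helper name is hypothetical.

```python
import shutil

def find_missing_tools(tool_names):
    """Return the tools from `tool_names` that are not found on the PATH."""
    return [name for name in tool_names if shutil.which(name) is None]

# Assumes the executable is named "ffmpeg" and is expected on the PATH.
missing = find_missing_tools(["ffmpeg"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg is available")
```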

Nuix uses ABBYY FineReader to OCR documents. The 'Nuix OCR Addon' must be installed for this feature to be enabled.

A plug-in is also needed to use Elasticsearch, which allows large data sets to be searched in real time.


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea
