Electronic Discovery Institute Course - Class 11 Data Processing and Management

Here's a continuation of my postings about the Electronic Discovery Institute's online e-discovery certification program, which you can subscribe to for just $1. I last blogged about this program on August 19, 2017. Go to to sign up for it.

Tonight I took the course on data processing and management. The course is taught by Amy DeCesare, a VP for litigation management at Allied World; Veronica Gromada, counsel for Walmart's litigation support group; and Ashish Prasad, counsel for eTERA Consulting.


Data processing refers to the export of potentially relevant files from a data set that has been collected and pre-processed. Pre-processing refers to the removal of unwanted files from a data set. Processing is a precondition for hosting active files in a format that can be reviewed. The processing of data turns it into something that attorneys can use for a well-managed review. Processing must be done in a legally defensible manner that is cost-effective for clients.

Prasad said that these questions should be asked prior to beginning processing:

1. How should metadata be handled?

2. Can the chain of custody be proven on an individual file level?

3. How will exclusions, such as system files, be handled?

4. Will de-duplication be performed? Will it be done on a custodian level or a global level?

5. What type of data culling will be done?
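
The difference between custodian-level and global de-duplication in question 4 is easier to see with a small example. This sketch is mine, not from the course: it assumes duplicates are identified by hashing file content (SHA-256 here), and the `Doc` record and `dedupe` function are hypothetical names.

```python
import hashlib
from typing import NamedTuple


class Doc(NamedTuple):
    custodian: str   # person whose data source held the file
    name: str        # file name as collected
    content: bytes   # raw file bytes


def content_hash(doc):
    """Hash the file bytes so identical content yields an identical key."""
    return hashlib.sha256(doc.content).hexdigest()


def dedupe(docs, scope="global"):
    """Keep the first copy of each file.

    scope="global": one copy across all custodians.
    scope="custodian": one copy per custodian.
    """
    if scope not in ("global", "custodian"):
        raise ValueError("scope must be 'global' or 'custodian'")
    seen = set()
    kept = []
    for doc in docs:
        digest = content_hash(doc)
        key = digest if scope == "global" else (doc.custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


docs = [
    Doc("alice", "contract.docx", b"final contract"),
    Doc("alice", "contract (copy).docx", b"final contract"),
    Doc("bob", "contract.docx", b"final contract"),
    Doc("bob", "notes.txt", b"meeting notes"),
]

global_set = dedupe(docs, scope="global")
custodian_set = dedupe(docs, scope="custodian")
print(len(global_set), len(custodian_set))  # prints: 2 3
```

Global de-duplication leaves fewer documents to review, while custodian-level de-duplication preserves the fact that both custodians held a copy of the contract.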

A large set of data must be brought down to a manageable level. On an organization's systems, there are often multiple, duplicative data sources. Before processing, one should consider whether the data can be culled by date range. One should only work with data pertinent to a matter. De-NISTing should be performed to remove unreviewable system files.
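
Here is a rough sketch of what date-range culling and de-NISTing amount to. It is my own illustration, not the instructors' workflow. De-NISTing compares file hashes against the list of known software files that NIST publishes in its National Software Reference Library; the tiny hash set below is only a stand-in for that list, and the file records, paths, and `cull` function are hypothetical.

```python
import hashlib
from datetime import date
from typing import NamedTuple


class CollectedFile(NamedTuple):
    path: str
    modified: date
    content: bytes


def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def cull(files, known_system_hashes, start, end):
    """Drop known system files, then keep files modified within [start, end]."""
    kept = []
    for f in files:
        if sha1_hex(f.content) in known_system_hashes:
            continue  # de-NISTed: hash matches a known system file
        if not (start <= f.modified <= end):
            continue  # outside the agreed date range
        kept.append(f)
    return kept


# Stand-in for a known-file hash list (hypothetical contents).
system_dll = b"pretend operating system library"
known_system_hashes = {sha1_hex(system_dll)}

files = [
    CollectedFile("C:/Windows/system32/example.dll", date(2016, 5, 1), system_dll),
    CollectedFile("C:/Users/alice/memo.docx", date(2016, 6, 15), b"memo text"),
    CollectedFile("C:/Users/alice/old_budget.xlsx", date(2009, 1, 10), b"budget"),
]

kept = cull(files, known_system_hashes, date(2015, 1, 1), date(2017, 12, 31))
print([f.path for f in kept])  # prints: ['C:/Users/alice/memo.docx']
```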

DeCesare said to perform Early Data Analytics before processing to see what one has. Are there encrypted files? Are there files in foreign languages? Data should be handled strategically because processing can be expensive.

Data Conversion

The point of processing is to index and format the data for attorney review. One has to consider how native files are to be produced. A careful record should be maintained showing how data was processed.
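
One way to picture a file-level processing record is a log that stores a hash of each file at every step, so that one can later show the file was not altered after collection. This is only a sketch under my own assumptions: the log layout, the action names, and the `DOC-0001` identifier are hypothetical, not something the course prescribed.

```python
import hashlib
from datetime import datetime, timezone


def file_hash(content):
    return hashlib.sha256(content).hexdigest()


def log_step(log, file_id, content, action):
    """Append one processing event for one file, with a hash of its current bytes."""
    log.append({
        "file_id": file_id,
        "action": action,
        "sha256": file_hash(content),
        "at": datetime.now(timezone.utc).isoformat(),
    })


def unchanged_since_collection(log, file_id, content):
    """Compare the file's current hash with the hash recorded at collection."""
    collected = next(
        e for e in log
        if e["file_id"] == file_id and e["action"] == "collected"
    )
    return collected["sha256"] == file_hash(content)


log = []
memo = b"quarterly memo"
log_step(log, "DOC-0001", memo, "collected")
log_step(log, "DOC-0001", memo, "indexed")

print(unchanged_since_collection(log, "DOC-0001", memo))               # prints: True
print(unchanged_since_collection(log, "DOC-0001", memo + b" edited"))  # prints: False
```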

Prasad said there were five steps in data conversion:

1. Data arrives.

2. Data is pre-processed.

3. Data is imaged, or remains as native files.

4. Load files are prepared and data is loaded to Relativity or another review platform.

5. Production data is delivered.

Processing Funnel

The processing funnel is a graphic representation commonly used for processing.

During processing, data is extracted, filtered by target users, and filtered by date ranges; keyword searching is performed; and then the data is output into a review tool. Privilege review then takes place before a production is sent to the opposing party.
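
The funnel stages described above can be sketched as a chain of filters that records how many documents survive each step. The sample documents, field names, custodians, and keywords are all made up for illustration, and real keyword searching is far more involved than the substring match used here.

```python
from datetime import date

# Hypothetical extracted documents; field names are illustrative only.
documents = [
    {"id": 1, "custodian": "alice", "date": date(2016, 3, 1), "text": "merger pricing draft"},
    {"id": 2, "custodian": "alice", "date": date(2012, 7, 9), "text": "merger kickoff"},
    {"id": 3, "custodian": "carol", "date": date(2016, 4, 2), "text": "merger timeline"},
    {"id": 4, "custodian": "bob", "date": date(2016, 8, 20), "text": "lunch order"},
    {"id": 5, "custodian": "bob", "date": date(2016, 9, 5), "text": "pricing approval for merger"},
]

target_custodians = {"alice", "bob"}
date_start, date_end = date(2015, 1, 1), date(2017, 12, 31)
keywords = ("merger", "pricing")

# Each stage is a name plus a test that a document must pass to move on.
stages = [
    ("target custodians", lambda d: d["custodian"] in target_custodians),
    ("date range", lambda d: date_start <= d["date"] <= date_end),
    ("keyword search", lambda d: any(k in d["text"].lower() for k in keywords)),
]


def run_funnel(docs, stages):
    """Apply each filter in order and record how many documents survive."""
    counts = [("extracted", len(docs))]
    for name, keep in stages:
        docs = [d for d in docs if keep(d)]
        counts.append((name, len(docs)))
    return docs, counts


for_review, counts = run_funnel(documents, stages)
for name, n in counts:
    print(f"{name}: {n}")
```

Keeping the count at each stage also gives counsel a simple record of how the data set was narrowed.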

The funnel begins with a vast set of possibly useful data that must be distilled to what is needed in discovery. One may begin with a custodian interview. Key data sources should be outlined and prioritized. A top-tier custodian may reveal smoking-gun documents that will let one glean key terms or concepts that appear repeatedly. A well-organized custodian may have retained documentation related to a legal dispute in an Outlook .pst file or a SharePoint site.


Culling is defensibly reducing the data that needs to move on for review. It may not be necessary to collect from every custodian who has been put on legal hold. An attorney must be able to defend the actions of the electronic discovery technician, and explain them in court. The level of culling will depend on what has been agreed upon by the parties. At the Rule 26(f) meet and confer, the parties may agree on discovery being limited to certain subjects, date ranges, and custodians from whom data may be collected. Information from custodian interviews should be used for the culling process. Data should be clustered together and divided into workstreams in which one can prioritize the documents that are likely to be the most relevant hits. Attorneys can review the documents that seem to be most relevant and identify hot documents that can inform the manner of subsequent searches.

A 100 GB hard drive may hold 5 million pages of documents. Culling is necessary to reduce the number of documents to be reviewed. In a typical scenario, 80-90% of documents may be culled prior to review.
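
To put those figures in perspective, here is the arithmetic worked out. The only inputs are the numbers quoted above (100 GB, 5 million pages, 80-90% culled); the function name is my own.

```python
drive_gb = 100
pages_on_drive = 5_000_000  # figure quoted in the course
pages_per_gb = pages_on_drive // drive_gb


def pages_remaining(total_pages, cull_rate_percent):
    """Pages left for review after culling the given percentage."""
    return total_pages * (100 - cull_rate_percent) // 100


low = pages_remaining(pages_on_drive, 90)   # after a 90% cull
high = pages_remaining(pages_on_drive, 80)  # after an 80% cull
print(pages_per_gb, low, high)  # prints: 50000 500000 1000000
```

Even after culling, a single 100 GB drive could still leave between 500,000 and 1 million pages for review.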

Selecting & Managing Processing Vendors

Consider what one's in-house capabilities are. Can an IT team assist with the collection of data? Subject matter experts may not be familiar with electronic discovery, and a special e-discovery expert may need to be able to testify about the process. A firm or company should develop a relationship with an e-discovery vendor that has adequate cybersecurity measures in place. There are different models of pricing.

Prasad said to consider these factors when selecting a processing vendor:

1. Its experience and qualifications

2. Insurance capacities

3. Reputation and references

4. Reporting capabilities with respect to the processing workflow

5. Whether the vendor has staff who can testify about the best practices followed in processing

6. Staff with industry-standard certifications

7. Appropriate facilities and technology

Counsel is the overall project manager. Even without deep technical knowledge, counsel should understand concepts such as the de-NISTing of files.

Model Rule 5.3 has been amended to provide that a lawyer must make reasonable efforts to ensure that non-lawyers perform services in a manner that is compatible with the attorney's legal obligations. The attorney must confirm that the vendor is performing its tasks competently.

Master Services Agreement & Service Level Agreement

A master services agreement (MSA) may be accompanied by a project-specific template, or statement of work. The MSA is a document that governs the professional relationship between the firm and its vendor. It discusses the purpose of the engagement, its time period, the parties' rights, duties, and obligations, and the services the parties expect to receive.

In-Sourced Processing Technology

Different technologies will be needed for collection and processing. It's important that the tools are easy to use. Gromada said that bots may be used as automated data dictionaries to scope out what is available in an organization's system.

Processing Pricing Models

Prasad identified three pricing models:

1. Traditional - per unit or tiered pricing

2. Project-based pricing

3. Subscription model, which involves billing on a yearly basis

Per-unit pricing does not provide cost certainty, whereas the subscription model does. Gromada noted that many companies will make flat-fee agreements, but there are many different arrangements. Volume is a key driver of costs. A volume discount pricing model may make sense for some companies. Data may not need to be hosted over long periods of time.

The traditional model for processing is an a la carte model, where one can pay for OCRing, image conversion, and other services per GB. Some arrangements may have a fixed fee for data ingestion.
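
The cost-certainty point can be illustrated by comparing a per-GB rate against a flat yearly fee at different volumes. The dollar amounts below are hypothetical numbers I picked to make the arithmetic easy; they are not rates mentioned in the course, and the sketch assumes a subscription with no volume cap.

```python
RATE_PER_GB = 50        # hypothetical per-GB processing rate, in dollars
ANNUAL_FEE = 100_000    # hypothetical flat yearly subscription, in dollars


def per_unit_cost(volume_gb, rate_per_gb=RATE_PER_GB):
    """Traditional model: cost grows with every gigabyte processed."""
    return volume_gb * rate_per_gb


def subscription_cost(volume_gb, annual_fee=ANNUAL_FEE):
    """Subscription model: same fee whatever the volume (assumes no cap)."""
    return annual_fee


def cheaper_model(volume_gb):
    unit = per_unit_cost(volume_gb)
    flat = subscription_cost(volume_gb)
    if unit == flat:
        return "equal"
    return "per-unit" if unit < flat else "subscription"


# Volume at which the two hypothetical models cost the same.
break_even_gb = ANNUAL_FEE // RATE_PER_GB

for volume_gb in (500, 2_000, 5_000):
    print(volume_gb, per_unit_cost(volume_gb), subscription_cost(volume_gb), cheaper_model(volume_gb))
```

With these made-up numbers the break-even point is 2,000 GB a year: below it the per-unit model is cheaper, above it the subscription is, which is why volume matters so much when choosing a model.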

The Goals of Processing

The essential goal is to index and format data for review. Legal professionals should confer and come up with a sound approach that allows all parties to confirm that their legal obligations have been fulfilled. Any processing project should have a robust validation and QC phase.