Electronic Discovery Institute Course - Class 16 - Early Data Analysis and Data Reduction Strategies

Here's a continuation of my postings about the Electronic Discovery Institute's online e-discovery certification program, which you can subscribe to for just $1. I last blogged about this program on December 23, 2017. Go to to sign up for it.

This course is taught by Taylor Hoffman, the global head of eDiscovery at Swiss Re; Stacey Blaustein, who manages eDiscovery for corporate litigation for IBM; and Ross Gotler, an eDiscovery counsel with Paul Weiss.

What is Early Data Analysis?

Early data analysis (EDA) involves using technology tools to get information about a data set. You can map out how many email messages there are for a particular custodian, and also identify gaps in the emails that have been collected. EDA can help organize the full document review by pinpointing key custodians through important communication patterns. Key topics and phrases can also be found in the data set.
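As a rough illustration of the volume-and-gap mapping described above, the Python sketch below counts emails per custodian per month and flags months with no messages. The custodian names, dates, and the `volume_and_gaps` helper are hypothetical; real EDA platforms work from processed metadata and have their own reporting, so treat this only as a sketch under those assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical collection metadata: (custodian, sent date) for each email.
emails = [
    ("alice", date(2017, 1, 5)),
    ("alice", date(2017, 1, 20)),
    ("alice", date(2017, 3, 2)),
    ("bob", date(2017, 1, 9)),
    ("bob", date(2017, 2, 14)),
    ("bob", date(2017, 3, 30)),
]

def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def volume_and_gaps(emails, custodians, start, end):
    """Count emails per custodian per month and list months with none.

    Custodians are passed in explicitly so that someone with no
    collected email at all still shows up (every month is a gap).
    """
    counts = Counter((c, (d.year, d.month)) for c, d in emails)
    report = {}
    for c in custodians:
        per_month = {m: counts[(c, m)] for m in month_range(start, end)}
        gaps = [m for m, n in per_month.items() if n == 0]
        report[c] = (per_month, gaps)
    return report

report = volume_and_gaps(emails, ["alice", "bob", "carol"], (2017, 1), (2017, 3))
for custodian, (per_month, gaps) in report.items():
    print(custodian, sum(per_month.values()), "emails; gap months:", gaps)
```

Here `carol` has no collected email at all and `alice` has none for February 2017. Either could point to a collection gap worth asking about, or simply to a quiet period.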

Why is EDA Important?

When attorneys attend a Rule 26(f) conference, they should be transparent about the time, cost, and difficulty of electronic discovery.


Early case assessment (ECA) predates electronic discovery. ECA helps a client decide whether or not to settle a case. The two types of assessment do overlap, however: early data analysis is the first step before one can perform early case assessment. Without early data analysis, ECA cannot be conducted on the merits.

Goals & Benefits of EDA

Preservation is very important and starts very early on: when a complaint is received, preservation obligations begin. Preserving data and processing it with a vendor is very costly.

EDA can help reduce the amount of data that needs to be reviewed. ECA cannot be conducted unless all of the relevant data has been considered. EDA can also help validate collection processes and find date gaps. It can be used as part of the predictive coding process to help develop seed sets, and it helps garner intelligence for use in settlement negotiations.

Hoffman believes that EDA is vital for meet-and-confer conferences. If outside counsel have not asked in-house counsel the right questions, the meet and confer will go poorly because it won't be possible to make representations about what data exists. This will force the company to over-preserve.

Judges are requiring parties to be educated about the documentation in a case. At the Rule 26(f) conference, parties should know how many depositions will be taken, how many custodians will be allowed in production, and what deadlines will be needed in the scheduling order.

EDA should foster cooperation between opposing parties. A thorough EDA helps to get parties to agree upon custodians and reduce the likelihood of spoliation sanctions.

Traditional Ways to Process ESI

Early technical tools simply facilitated the process of manual review. Today data is loaded into a review platform and searches are run to determine the amount of relevant data. The results will show the number of hits per custodian or data group.
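A minimal sketch of the per-custodian hit report described above, assuming documents have already been exported as (custodian, text) pairs. The sample data and the `hits_per_custodian` helper are hypothetical, and matching is a simple case-insensitive whole-word regex rather than any review platform's query syntax.

```python
import re
from collections import defaultdict

# Hypothetical review-platform export: (custodian, document text).
documents = [
    ("alice", "Draft contract for the Acme merger attached."),
    ("alice", "Lunch on Friday?"),
    ("bob", "Acme wants to renegotiate the contract price."),
    ("bob", "Project Falcon status update."),
]

def hits_per_custodian(documents, terms):
    """Count, per custodian, how many documents hit each search term.

    Terms are matched case-insensitively as whole words; a document
    counts once per term no matter how often the term appears in it.
    Custodians with no hits are omitted from the result.
    """
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    hits = defaultdict(lambda: defaultdict(int))
    for custodian, text in documents:
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[custodian][term] += 1
    return {c: dict(t) for c, t in hits.items()}

print(hits_per_custodian(documents, ["contract", "acme"]))
```

The "Project Falcon" document is not counted under either term, which is the kind of code word a keyword list can miss.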

There should be a good understanding of the nature of the data in order to structure the responsiveness review.

Using Keywords for Filtering

Lawyers are used to using keywords in Lexis and in document management systems, but using them to review a large document set is not necessarily the best approach. Keyword filters can be helpful, but there needs to be an overall understanding of the nature of the data set, because search terms may miss slang or code words. In William A. Gross Const. v. Am. Mfrs. Mut. Ins. Co., Judge Peck criticized the attorneys for not preparing sufficient keyword lists. He instructed the parties to question the email custodians in order to develop better keyword lists.

Blaustein pointed out that developing keyword lists is an iterative process.

EDA Tools

Metadata analysis can tell you what data has been collected from whom, and who communicated with whom during what time periods. Some tools visually group together certain types of documents. Email threading suppresses the earlier emails in a conversation, leaving behind a single inclusive email. Documents can also be clustered together or organized chronologically.
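The sketch below illustrates the idea behind suppressing earlier emails in a thread. It rests on a simplifying assumption about how threading works: messages are grouped by subject line with `RE:`/`FW:`/`FWD:` prefixes stripped, and a message is treated as non-inclusive when its text appears inside a longer message in the same thread. Commercial threading tools rely on richer signals such as message headers; the data and function names here are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical messages: (message id, subject, body). Replies quote earlier text.
messages = [
    ("m1", "Budget", "Can we cut Q3 spend?"),
    ("m2", "RE: Budget", "Yes, by 10%.\n> Can we cut Q3 spend?"),
    ("m3", "Re: Budget", "Agreed.\n> Yes, by 10%.\n> Can we cut Q3 spend?"),
    ("m4", "Offsite", "Venue is booked."),
]

# Matches one or more leading reply/forward prefixes such as "RE: FW: ".
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)

def thread_key(subject):
    """Normalize a subject line by stripping reply/forward prefixes."""
    return PREFIX.sub("", subject).strip().lower()

def inclusive_emails(messages):
    """Return ids of messages whose text is not contained in another
    message of the same thread (a simplified 'inclusive' test)."""
    threads = defaultdict(list)
    for msg_id, subject, body in messages:
        threads[thread_key(subject)].append((msg_id, body))
    keep = []
    for members in threads.values():
        for msg_id, body in members:
            # Suppressed if a longer message quotes it; for exact
            # duplicates, only the one with the smallest id is kept.
            contained = any(
                other_id != msg_id
                and body in other_body
                and (len(other_body) > len(body) or other_id < msg_id)
                for other_id, other_body in members
            )
            if not contained:
                keep.append(msg_id)
    return keep

print(inclusive_emails(messages))
```

With this sample data only `m3` (the last reply in the budget thread) and `m4` remain for review.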

Blaustein said that indexing provides a structure to the data that can help an attorney at the start of a case. Concept clustering lets the computer bring back documents that are similar in nature to those designated by an attorney. Email communication between employees can be represented graphically to help home in on particular topics they discussed. This also helps ensure that all relevant custodians are included.
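To show how communication patterns can surface people who might be missing from a custodian list, the following sketch counts sender-recipient pairs from email header metadata and reports non-custodians above a threshold. The addresses, the `min_pairs` threshold, and the helper names are hypothetical; a real analysis would also chart the results and normalize addresses and aliases.

```python
from collections import Counter

# Hypothetical header metadata: (sender, [recipients]) for each email.
headers = [
    ("alice@acme.com", ["bob@acme.com", "dana@acme.com"]),
    ("bob@acme.com", ["alice@acme.com"]),
    ("dana@acme.com", ["alice@acme.com", "bob@acme.com"]),
    ("dana@acme.com", ["bob@acme.com"]),
    ("alice@acme.com", ["eve@vendor.com"]),
]

def communication_edges(headers):
    """Count how often each (sender, recipient) pair occurs."""
    return Counter((s, r) for s, recipients in headers for r in recipients)

def candidate_custodians(headers, custodians, min_pairs=3):
    """List non-custodians who appear in at least min_pairs
    sender-recipient pairs (a message to two people counts twice
    for its sender)."""
    activity = Counter()
    for (sender, recipient), n in communication_edges(headers).items():
        activity[sender] += n
        activity[recipient] += n
    return sorted(p for p, n in activity.items()
                  if n >= min_pairs and p not in custodians)

print(candidate_custodians(headers, {"alice@acme.com", "bob@acme.com"}))
```

Here `dana@acme.com` is flagged as a frequent correspondent who is not yet a custodian, while the single message to `eve@vendor.com` falls below the threshold.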

Hoffman always starts with a data map of a company, so he knows what tools to use on what buckets of data.

Predictive coding uses algorithms that provide a mathematical probability that a document is relevant. Some judges have required, sua sponte, that predictive coding be used.

According to Blaustein, electronic discovery may account for 80% of the cost of a case.

Hoffman uses predictive coding for the prioritization of review. Computers can suggest other potentially relevant documents based on the relevant documents designated by an attorney.
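The sketch below shows review prioritization in its simplest form: unreviewed documents are ranked by their bag-of-words cosine similarity to documents an attorney has coded relevant. This is a deliberately simplified stand-in, not predictive coding as commercial tools implement it; those train statistical classifiers and output relevance scores or probabilities, whereas this only ranks by similarity. All names and data are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts over lowercased word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def prioritize(relevant_seeds, unreviewed):
    """Order unreviewed documents by their highest similarity to any
    document an attorney has coded relevant, most similar first."""
    seeds = [vectorize(s) for s in relevant_seeds]
    scored = []
    for doc_id, text in unreviewed.items():
        vec = vectorize(text)
        score = max((cosine(vec, s) for s in seeds), default=0.0)
        scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

# Hypothetical seed documents coded relevant, and an unreviewed queue.
seeds = ["Acme merger price negotiation", "merger agreement draft for Acme"]
queue = {
    "d1": "Holiday party planning",
    "d2": "Notes on the Acme merger price",
    "d3": "Agreement draft attached",
}
print(prioritize(seeds, queue))
```

With this sample data, `d2` is queued first and `d1`, which shares no words with the seeds, last.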

EDA tools can also be used on data received from opposing counsel. The data may contain metadata that can be loaded into a review tool.

When to Start EDA?

EDA should begin immediately after a complaint is filed. EDA helps parties assess their position in a matter: whether they have a defensible position and whether the case has any merit. A document retention order should be sent out shortly after a complaint is received.

There may be a digital team room that keeps all of the information about a particular subject.

Hoffman said he thinks a company needs to be prepared to conduct EDA globally: search all possible data sources and determine who has access to the data and could alter or delete it.

EDA as an Iterative Process

EDA should not be done just once, since new issues may arise during document review. EDA should also be done on a non-case-specific basis, so that the company is aware of where all of its data is.

Claims may be added or dropped from a suit, necessitating a modification of the EDA.

Who Should Be Involved in the EDA Process?

Attorneys, electronic discovery professionals, and vendor experts should all be involved in the eDiscovery process. eDiscovery experts will know how to use all of the tools available in a platform.

EDA should involve not only internal parties but also outside counsel. You should expect outside counsel to ask about your data sources. They know the merits of a matter and can help identify which custodians should be interviewed.

Why is Asking EDA Questions Important?

It's important to ask questions that are specific to a particular case. One should also expect to be asked questions by the custodians themselves. It's important that custodians understand that ESI must not be deleted and that they understand the duty to preserve. Spoliation motions are becoming more and more frequent.

When to Stop EDA

The right amount of EDA depends on the case. A law firm's efforts should not be too much of a burden on a company. Blaustein said that once one can answer questions about the scope and volume of data in a case, it's possible to stop EDA, satisfied that a diligent analysis has been performed.

Downsides to EDA

EDA takes up a lot of time at the beginning of a matter, and this time may simply not be available. In Blaustein's experience, some courts run a rocket docket, and there may not be enough time for a thorough EDA.

There must be buy-in from stakeholders such as IT and business partners, which is a heavy political lift. However, weighed against the risks of not conducting EDA (running afoul of data protection requirements, for example), the effort is worth it.

When there is a compressed time frame in a case, there will be resistance to anything that reduces the time available for document review.

When EDA May Not Be Appropriate

The specifics of a case may not call for EDA. If there is an agreement on keyword terms early on, EDA may not be necessary. A company with a good system for tracking all of its data may not find it necessary to perform EDA.