
Electronic Discovery Institute Course - Class 12 - Search and Review of ESI

Here's a continuation of my postings about the Electronic Discovery Institute's online e-discovery certification program, which you can subscribe to for just $1. I last blogged about this program on September 8, 2017. Go to to sign up for it.

Tonight I took the course entitled "Search and Review of Electronically Stored Information". The course is taught by Judge Craig Shaffer, a magistrate judge with the District of Colorado; Maura Grossman, an attorney with Wachtell Lipton LLP in New York, who's an expert in technology-assisted review; Nan Nolan, a retired judge from the Northern District of Illinois; James Francis and Andrew Peck, magistrate judges with the Southern District of New York; Jerone English, the director of electronic discovery at Intel; and Dan Kulakofsy, general counsel with the Travelers Group.

Why is Search & Review Important?

The amount of ESI has grown greatly, changing the nature of litigation and requiring better ways to search through the data. The cost of review has grown along with the volume. Judge Francis said that in the old days a complex case might have been a product liability case, which by today's standards would be regarded as simple. Today, a really complex case could be a securities case involving data that is scattered across different servers and many different applications.

Processing, De-NISTing & De-Duplication

Grossman noted that processing gets ESI ready to be loaded into a review database. It includes de-NISTing, which removes file types that are not user-created. English noted that some files with the .doc extension are generated by the Microsoft Word software itself upon installation and should be excluded from review. Grossman stressed the importance of removing daily blast emails, which are sent to a large group on a regular basis, and of de-duping across custodian data sets in a way that does not lose track of which custodian had access to which electronic file.
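To make the processing step concrete, here is a minimal Python sketch of de-NISTing and cross-custodian de-duplication. It is only an illustration under stated assumptions: the function name `process`, the SHA-256 content hash, and the `known_system_hashes` set (a stand-in for a reference list of known software file hashes) are my own choices, not anything the presenters described, and real processing tools typically hash selected email fields rather than raw bytes.

```python
import hashlib
from collections import defaultdict

def process(files, known_system_hashes=frozenset()):
    """files: iterable of (custodian, filename, content_bytes).

    Returns (unique_docs, custodians_by_hash). Each distinct file is kept
    once, but every custodian who held a copy is still recorded.
    """
    unique_docs = {}                      # hash -> first filename seen
    custodians_by_hash = defaultdict(set) # hash -> custodians holding it
    for custodian, name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in known_system_hashes:
            continue  # de-NIST: skip files that match known software files
        unique_docs.setdefault(digest, name)
        custodians_by_hash[digest].add(custodian)
    return unique_docs, dict(custodians_by_hash)

# Hypothetical sample data for illustration only.
template_hash = hashlib.sha256(b"word template").hexdigest()
memo_hash = hashlib.sha256(b"Q3 forecast").hexdigest()
files = [
    ("alice", "normal.dot", b"word template"),
    ("alice", "memo.docx", b"Q3 forecast"),
    ("bob", "memo-copy.docx", b"Q3 forecast"),
    ("bob", "notes.txt", b"call with vendor"),
]
unique_docs, custodians_by_hash = process(files, {template_hash})
```

The memo is reviewed once, yet `custodians_by_hash` still shows that both custodians had it.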

Keyword Search

Grossman said that one should try out keywords and see what results turn up in samples. English said the problem with keyword searching is that it is limited to the words that are agreed upon. He likes to consult subject matter experts to see what terms they use in the actual documentation that is being reviewed.

Judge Peck said the problems with keyword searching are manifold: misspellings are rampant (in the In re Seroquel Prods. Liab. Litig. case, an analysis of the dictionary word list showed there were at least a half dozen misspellings of Seroquel); synonyms are very common; keywords return a great many false positives; and lawyers are terrible at figuring out what keywords to use and are in effect playing Go Fish, with little knowledge of what an opposing party's ESI contains.

Kulakofsy suggested that an iterative approach should be used, since a keyword returns a lot of non-responsive information. Keywords must be put through a process that tests whether or not the terms are effective in turning up useful information.
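One way to picture testing keywords against a sample is the short Python sketch below. It assumes a small sample that a reviewer has already coded as responsive or not; the function name `evaluate_terms` and the sample documents are hypothetical, and a real workflow would use a properly drawn sample.

```python
import re

def evaluate_terms(terms, coded_sample):
    """coded_sample: list of (text, is_responsive) pairs coded by a reviewer.

    For each term, report how many sample documents it hits and what share
    of those hits were actually responsive.
    """
    report = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = [responsive for text, responsive in coded_sample
                if pattern.search(text)]
        responsive_hits = sum(hits)  # True counts as 1
        report[term] = {
            "hits": len(hits),
            "responsive": responsive_hits,
            "precision": responsive_hits / len(hits) if hits else None,
        }
    return report

# Hypothetical coded sample for illustration only.
sample = [
    ("The recall notice went out to dealers", True),
    ("I don't recall the meeting time", False),
    ("Brake defect reported by customer", True),
    ("Lunch order for Friday", False),
]
report = evaluate_terms(["recall", "defect"], sample)
```

Here "recall" hits two documents but only one is responsive, which is the kind of result that would prompt narrowing the term before the next round.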

Should Keywords Be Negotiated?

Judge Francis said that a court would not require agreement between the parties on what terms to use in searches, but would usually encourage it. Agreement protects the producing party against later disputes as to the adequacy of its production. Judge Shaffer also said there was no absolute requirement that parties agree on keywords, but did think it was beneficial.

Are Keywords Subject to Attorney-Client Privilege or Work-Product Protection?

Kulakofsy said that he thought there was a good argument to be made that keywords are attorney work product. Judge Shaffer said that the majority view was that they are not work product, even though there is some case law that supports the view that they are.

Boolean Search

Grossman said that such searches with connectors can help narrow and focus a search for responsive ESI.
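As a rough illustration of how connectors narrow a search, the Python sketch below implements an AND connector and a "within N words" proximity connector. The helper names `contains_all` and `within` and the sample documents are my own assumptions; commercial review platforms have their own query syntax.

```python
import re

def tokens(text):
    """Lowercase word tokens, in document order."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_all(text, *terms):
    """AND connector: every term must appear somewhere in the document."""
    words = set(tokens(text))
    return all(term.lower() in words for term in terms)

def within(text, a, b, n):
    """Proximity connector: a and b appear within n words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

# Hypothetical documents for illustration only.
doc1 = "The merger agreement was signed on Friday"
doc2 = ("We reached agreement on the lunch menu long before "
        "anyone mentioned the merger")

and_hits = [contains_all(d, "merger", "agreement") for d in (doc1, doc2)]
near_hits = [within(d, "merger", "agreement", 3) for d in (doc1, doc2)]
```

Both documents satisfy the AND search, but only the first satisfies the proximity search, so the tighter connector drops the false positive.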

Fuzzy Search

Grossman discussed the use of root extenders as a way to make up for the misspellings that some people make.
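The Python sketch below contrasts a root extender (matching every word that starts with a given root) with a similarity-based match from the standard library's `difflib`. The vocabulary, the 0.8 cutoff, and the helper name `root_extend` are assumptions for illustration, not the method any particular review tool uses.

```python
import difflib

def root_extend(root, vocabulary):
    """Root extender: every vocabulary word that begins with the root."""
    return [word for word in vocabulary if word.startswith(root.lower())]

# Hypothetical list of distinct words found in a document set.
vocabulary = ["seroquel", "seroquil", "serequel", "seroquell", "serotonin"]

# Equivalent to searching for seroq* in many search syntaxes.
extended = root_extend("seroq", vocabulary)

# Similarity matching also catches misspellings inside the root itself.
close = difflib.get_close_matches("seroquel", vocabulary, n=10, cutoff=0.8)
```

The root extender finds "seroquil" and "seroquell" but misses "serequel", because the misspelling falls inside the root; the similarity match picks it up, at the price of a cutoff that has to be tuned to avoid pulling in unrelated words.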

Concept Search

Grossman noted that conceptual search can bring in irrelevant data and over-expand a search.


Clustering

Grossman described clustering as unsupervised learning: the computer groups documents by word frequencies.
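To show what grouping by word frequencies can look like, here is a minimal Python sketch using a simple "leader" clustering pass over word-count vectors compared by cosine similarity. The algorithm choice, the 0.3 threshold, and the sample documents are my own assumptions; clustering tools in review platforms may work quite differently.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Word-frequency vector for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose leader it resembles;
    otherwise it starts a new cluster. No labels are supplied by a human."""
    clusters = []  # list of (leader_counts, [document indexes])
    for i, doc in enumerate(docs):
        counts = word_counts(doc)
        for leader, members in clusters:
            if cosine(leader, counts) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((counts, [i]))
    return [members for _, members in clusters]

# Hypothetical documents for illustration only.
docs = [
    "invoice payment overdue invoice",
    "quarterly earnings forecast revised",
    "payment received for invoice",
    "earnings forecast call scheduled",
]
groups = cluster(docs)
```

The billing documents and the earnings documents end up in separate groups without anyone telling the program what either topic is.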

Visualization Tools

Grossman said visualization helps a user see patterns in documents, such as the frequencies of terms they contain.

Relevance Ranking Tools

Grossman described these tools as being similar to Google in that they help the reviewer find the most relevant documents first.
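A simple way to picture relevance ranking is a TF-IDF score, sketched below in Python. This is one common textbook scoring scheme that I am assuming for illustration; the function name `rank`, the smoothing in the IDF formula, and the documents are hypothetical, and neither Google nor any review tool is claimed to work this way.

```python
import math
import re
from collections import Counter

def rank(query, docs):
    """Return document indexes ordered from most to least relevant."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n = len(docs)

    def idf(term):
        # Rarer terms get more weight; the +1 terms smooth the estimate.
        df = sum(1 for toks in tokenized if term in toks)
        return math.log((1 + n) / (1 + df)) + 1

    query_terms = re.findall(r"[a-z]+", query.lower())
    scored = []
    for i, toks in enumerate(tokenized):
        counts = Counter(toks)
        score = (sum(counts[t] / len(toks) * idf(t) for t in query_terms)
                 if toks else 0.0)
        scored.append((score, i))
    # Highest score first; ties keep original document order.
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical documents for illustration only.
docs = [
    "brake failure report brake recall",
    "lunch menu for friday",
    "recall notice draft",
]
order = rank("brake recall", docs)
```

The document mentioning both query terms comes first, the one mentioning only "recall" second, and the unrelated document last.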

Custodian Search

Judge Francis mentioned the debate on whether or not custodians should self-collect. He said that it depends on the reliability of the custodian. If a custodian has a vested interest in a case, they should not be conducting the search. One should also ask about the relationship between a custodian's duties and the issues in the litigation, and about their technical ability to find data.

Judge Peck said that custodians do know best where they keep data, so it's a good idea for them to be involved in a case, but their searches should be supervised. Care should be taken to make sure they don't alter the metadata associated with the ESI.

Judge Shaffer said that custodians can search their own data in cases where the facts and issues at hand are quite narrow. In a big class action with lots of plaintiffs, it would be less beneficial to have custodians search their own data.

Contract Attorneys & Managed Review

Grossman noted that contract attorneys can work either in a law office or off-site at a distant location. Kulakofsy said that managed review was more likely to take place in larger cases. Managed review involves outsourcing the review process to a different company that will also supervise the reviewers, even though the attorney bears ultimate responsibility for the review.

Limitations of Manual Review

Grossman noted that it's very difficult for attorneys to acknowledge that their review process is not perfect. She has often seen tagging structures with fifteen or more issues, and said it was very difficult for contract attorneys to keep so many issues in mind during manual review.

Technology-Assisted Review

Grossman said that in technology-assisted review, a subject matter expert should review a small set of documents to help train a system to find responsive documentation. She characterized predictive coding as a machine learning technique, which falls under the broader TAR umbrella term.
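To illustrate the idea of training on a small expert-coded set, here is a toy Python sketch using a naive Bayes text classifier. Naive Bayes is simply an algorithm I am assuming for the example; the course did not say which algorithms TAR products use, and the seed documents and the names `train` and `responsive_score` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(seed_set):
    """seed_set: list of (text, is_responsive) pairs coded by the expert."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def responsive_score(model, text):
    """Log-odds that a document is responsive; above 0 leans responsive."""
    word_counts, doc_counts, vocab = model
    score = math.log((doc_counts[True] + 1) / (doc_counts[False] + 1))
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids taking the log of zero.
                score += sign * math.log(
                    (word_counts[label][word] + 1) / (total + len(vocab)))
    return score

# Hypothetical seed set for illustration only.
seed_set = [
    ("brake defect complaint", True),
    ("brake recall notice", True),
    ("lunch menu friday", False),
    ("holiday party friday", False),
]
model = train(seed_set)
likely = responsive_score(model, "customer complaint about brake")
unlikely = responsive_score(model, "friday lunch plans")
```

Unreviewed documents can then be sorted by score so that the likeliest responsive ones are looked at first.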

English noted that TAR can't handle images and certain types of Excel spreadsheets.

How Much Disclosure is Necessary When Using TAR?

Judge Francis said that the law doesn't require disclosure of TAR methodologies, but it is advantageous to come to an agreement on what kinds of TAR to use to avoid future disputes. Judge Nolan said she thought greater transparency would lead to fewer disputes about whether or not TAR could be used.

Is TAR Held to a Higher Standard than Manual Review?

Judge Francis said judges would hold TAR to a higher standard of defensibility initially, as it comes to be used widely for the first time. Kulakofsky noted that TAR technology is actually several decades old. He said that a lack of understanding of the technology explains the failure of many firms to adopt it. Grossman thought that TAR was being held to too high a standard, and was criticized too strongly for mistakes in identifying responsive documents that were more likely to occur in manual review.

Judge Shaffer thought that TAR and other search methods should be held to a standard of reasonableness.

Judicial Intervention in Search & Review Disputes

Judge Peck said TAR disputes should come before the court in some cases, but that he was a believer in Sedona Principle Six, which states that the responding party is in the best position to decide how to find its own responsive documents. In federal cases with the requirement for the 26(f) conference (where it is not a drive-by, as it often is, according to Peck), the fact that a party is using TAR is likely to come out.

Judge Francis said that courts believe fewer disputes will arise when a judge becomes involved in the discovery process early on. The producing party may be required to defend the methodology it uses. The court is unlikely to inquire about precision and recall stats. Expert testimony may be required, however, as to the reliability of a search.

The Daubert Standard & Search Disputes

Judge Shaffer said that even if Daubert / FRE 702 are not applied as a matter of law, and there is no official hearing, he will still look to those standards to measure the reliability of searches. Judge Nolan said that Daubert at its core was about reliability. Parties will have to defend the processes they used.

Quality Control & Quality Assurance

Quality control is required for 26(g) certification on discovery results. Grossman recommended doing judgmental sampling with keyword searching to find documents that may have been missed by a review process.

Judge Francis noted that recall means the percentage of all responsive documents that a search retrieves, and precision means the percentage of retrieved documents that are responsive. He said that parties should be concerned with both quantity and quality.
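The two measures are easy to compute once you know which documents were retrieved and which are truly responsive. The Python sketch below uses made-up document IDs; the function name `precision_recall` is my own, and in practice the "truly responsive" set is estimated from a sample rather than known outright.

```python
def precision_recall(retrieved, responsive):
    """retrieved: IDs the search returned; responsive: IDs that are
    actually responsive. Returns (precision, recall) as fractions."""
    retrieved, responsive = set(retrieved), set(responsive)
    true_hits = len(retrieved & responsive)
    precision = true_hits / len(retrieved) if retrieved else 0.0
    recall = true_hits / len(responsive) if responsive else 0.0
    return precision, recall

# Hypothetical document IDs for illustration only.
retrieved = {1, 2, 3, 4}
responsive = {2, 3, 5, 6, 7}
precision, recall = precision_recall(retrieved, responsive)
```

In this example half of what was retrieved is responsive (precision 0.5), but only two of the five responsive documents were found (recall 0.4).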

Does the "Perfect" Search & Review Exist?

Judge Francis said that in the real world there is no such thing as a perfect search. The court expects the parties to engage in a reasonable search guided by the principles of proportionality. English noted that reasonable steps for a search would differ from case to case.

Judge Shaffer said that discovery has never been held to a standard of perfection. Judge Peck said a FRCP 26(g) certification is the equivalent of FRCP 11. For discovery responses, it requires that the results be reasonable, not that discovery be complete; it is disclosures that it certifies as complete.

Judge Shaffer said that 26(g) does not require a lawyer to take extreme steps to make sure all potentially responsive ESI has been searched.

Judge Shaffer said that he would evaluate a search methodology in part by the extent to which it generated a lot of motion practice.
