Here's another installment of my outline of Electronic Discovery and Digital Evidence in a Nutshell, the second edition of the West Academic guide to electronic discovery law in the United States authored by Judge Shira Scheindlin (the judge in Zubulake v. UBS Warburg) and members of the Sedona Conference. An outline of the previous chapter was posted on December 20, 2016.
CHAPTER V – SEARCH AND REVIEW OF ESI
A. SEARCH METHODS
a. Filtering on select criteria – search terms, search terms with Boolean operators, date ranges, file sizes, file types, etc.
b. Victor Stanley v. Creative Pipe (D. Md. 2008) – party failed to show it took reasonable steps to identify privileged documents where it could not identify the keywords used, the qualifications of the person who designed the search, or whether the results were analyzed to check reliability.
c. William A. Gross Constr. v. American Mfrs. Mut. Ins. Co. (S.D.N.Y. 2009) – where the parties could not agree on keywords to search party data on a non-party server, the court ordered them to meet and confer rather than designating keywords on its own.
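
The filtering criteria listed above (search terms, Boolean operators, date ranges, file sizes, file types) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Doc` record, the `matches` function and the sample documents are hypothetical, and the term test is a plain substring match rather than the indexed, whole-word search a real review platform would run.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    # Hypothetical record; real platforms carry far more metadata.
    name: str
    text: str
    modified: date
    size_bytes: int

def matches(doc, all_terms=(), any_terms=(), none_terms=(),
            start=None, end=None, extensions=None, max_bytes=None):
    """Return True only if the document passes every supplied criterion."""
    body = doc.text.lower()
    if not all(t.lower() in body for t in all_terms):                # AND
        return False
    if any_terms and not any(t.lower() in body for t in any_terms):  # OR
        return False
    if any(t.lower() in body for t in none_terms):                   # NOT
        return False
    if start and doc.modified < start:                               # date range
        return False
    if end and doc.modified > end:
        return False
    if extensions and not doc.name.lower().endswith(tuple(extensions)):  # file type
        return False
    if max_bytes is not None and doc.size_bytes > max_bytes:         # file size
        return False
    return True

docs = [
    Doc("memo.docx", "Pricing agreement with the distributor", date(2015, 3, 2), 40_000),
    Doc("lunch.msg", "Lunch on Friday?", date(2015, 3, 3), 2_000),
    Doc("old.docx", "Pricing agreement draft", date(2009, 1, 5), 35_000),
]

hits = [d.name for d in docs if matches(
    d,
    all_terms=["pricing"],
    any_terms=["agreement", "contract"],
    start=date(2014, 1, 1),
    end=date(2016, 12, 31),
    extensions=[".docx", ".msg"],
)]
print(hits)  # ['memo.docx']
```

Recording the exact criteria passed to a function like this is also a simple way to be able to say later which keywords and limits were used, which is the kind of showing Victor Stanley found missing.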
B. USE OF TECHNOLOGY FOR SEARCH AND REVIEW
a. Reasonable process for searching large volumes of data:
i. Collect data from client using filtering process.
ii. Sample data to determine proper scope of data to be processed vis-à-vis likelihood of responsive documents and corresponding costs and burden.
iii. Load into electronic review platform for analysis.
iv. Establish selection criteria.
v. Filter data using selection criteria.
vi. Perform review for responsiveness, privilege and production determinations.
b. Craig Ball – Step by Step of Smart Search
i. Start with Request for Production
ii. Seek input from key players – what event triggered conversation?
iii. Look at what you’ve got and the tools you’ll use. Search tool should be able to review content of container files, and identify exceptional files that can’t be searched.
iv. Communicate and Collaborate – tell other side the tools and terms you’re using. Do two rounds of keyword search giving the other party the chance to review the first production before proposing additional searches.
v. Incorporate misspellings, variants and synonyms.
1. Fuzzy searching.
vi. Filter and deduplicate first
vii. Test, test, test – false hits may turn up in system files, so test keywords against data that is clearly irrelevant.
viii. Review the hits – preview search hits in context in a spreadsheet with 20-30 words on each side of the hit.
ix. Tweak queries and retest – achieve better precision without affecting recall.
x. Check the discards – examine them closely when first refining queries; sample them more randomly later.
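
Two of the steps above lend themselves to a short sketch: previewing each hit in context for spreadsheet review, and finding misspellings and variants of a keyword. The Python below is a rough illustration, not Craig Ball's own tooling. The function names and the sample text are hypothetical, the hit test is a simple substring match on single-word terms, and `difflib` from the standard library stands in for whatever fuzzy search a real tool provides.

```python
import csv
import difflib
import io

def keyword_in_context(text, term, window=25):
    """Return (left, hit, right) rows with up to `window` words on each side."""
    words = text.split()
    needle = term.lower()
    rows = []
    for i, word in enumerate(words):
        if needle in word.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            rows.append((left, word, right))
    return rows

def spelling_variants(term, text, cutoff=0.8):
    """Words in the text that are close to `term` (misspellings, plurals)."""
    vocabulary = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
    close = difflib.get_close_matches(term.lower(), list(vocabulary), n=10, cutoff=cutoff)
    return sorted(close)

sample = ("Please send the receivable ledger to accounting. "
          "The recievable report and all receivables are overdue.")

# A small window keeps the demo readable; the outline suggests 20-30 words.
rows = keyword_in_context(sample, "receivable", window=2)
variants = spelling_variants("receivable", sample)

# Write the hits as CSV so they can be opened in a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["left context", "hit", "right context"])
writer.writerows(rows)
print(buffer.getvalue())
print(variants)
```

Note that the exact keyword misses the misspelled "recievable", which is why the variant list is worth generating before the terms are finalized.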
c. Search Tips
i. Tier ESI and have keywords for each tier.
ii. When searching email for recipients, search by the email address, not the name.
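
The address-versus-name tip can be illustrated with Python's standard `email` package. This is a minimal sketch assuming the messages are available as raw RFC 5322 text; the sample message, the addresses and the helper names are made up for the example.

```python
from email import message_from_string
from email.utils import getaddresses

def recipient_addresses(raw_message):
    """Return the lower-cased addresses found in the To, Cc and Bcc headers."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    return {addr.lower() for _name, addr in getaddresses(fields) if addr}

def has_recipient(raw_message, address):
    """True if `address` is a recipient, ignoring case and display names."""
    return address.lower() in recipient_addresses(raw_message)

# Hypothetical message with two different people whose display name contains "Smith".
raw = (
    "From: Pat Lee <pat.lee@example.com>\n"
    "To: \"J. Smith\" <jsmith@example.com>, Jane Smith <jane.smith@example.com>\n"
    "Cc: legal@example.com\n"
    "Subject: Q3 forecast\n"
    "\n"
    "See attached.\n"
)

print(sorted(recipient_addresses(raw)))
print(has_recipient(raw, "JSmith@Example.com"))
```

A search on the name "Smith" would pull in both recipients, while a search on the address picks out exactly one custodian regardless of how the display name was typed.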
C. TECHNOLOGY ASSISTED REVIEW
a. How does it work?
i. Computerized system that harnesses the human judgments of subject matter experts (SMEs) on a smaller set of documents, then extrapolates those judgments to the remaining document collection.
ii. Knowledge engineering – rule based or linguistic approach. Capture expertise of SME. Rules discriminate between responsive and non-responsive documents.
iii. Machine learning – predictive coding – need training set of responsive and non-responsive documents. Can be passive or active.
1. Passive learning – expert selects training documents using judgmental or random selection.
2. Active learning – after SME selects seed set, computer selects documents for SME to review and add to training set. Finds documents whose relevance is least certain.
iv. Grossman-Cormack TREC study. Used F1 scores – the harmonic mean of recall and precision, an average weighted toward the lower of the two measures. The two best TAR teams did twice as well as human judges.
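
Two ideas from this list are easy to show in code: the F1 score, and the way active learning picks the documents whose relevance is least certain. The Python below is a sketch under assumptions. The probability scores are invented and would come from some classifier in practice, and choosing the scores closest to 0.5 (often called uncertainty sampling) is one common strategy, not necessarily what any particular TAR product does.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision; pulled toward the lower value."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

def least_certain(scores, batch_size):
    """Pick the documents whose predicted probability is closest to 0.5.

    `scores` maps a document id to a predicted probability of responsiveness.
    """
    return sorted(scores, key=lambda doc_id: abs(scores[doc_id] - 0.5))[:batch_size]

# High recall with poor precision: the arithmetic mean would be 0.60,
# but the harmonic mean is noticeably lower.
print(round(f1_score(recall=0.9, precision=0.3), 2))  # 0.45

# Hypothetical classifier output for five unreviewed documents.
scores = {"A": 0.97, "B": 0.52, "C": 0.08, "D": 0.41, "E": 0.66}
print(least_certain(scores, batch_size=2))  # ['B', 'D']
```

Documents A and C are skipped because the model is already fairly sure about them; B and D go to the SME because a human call on them teaches the model the most.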
b. Have Courts Accepted TAR as a Search Methodology?
i. Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012) Use of predictive coding appropriate given
1. Parties' agreement.
2. Large amount of ESI – more than 3M documents.
3. Superior to manual review or keyword searches.
4. Need for cost effectiveness and proportionality under Rule 26(b).
5. Transparent process.
Staging discovery by the most relevant sources is a good way to control costs, and most judges are willing to grant a discovery extension to allow for staging. Parties should consider ‘strategic proactive disclosure of information’ – telling the other side who your key custodians are. It is helpful to have e-discovery vendors at hearings.
ii. FHFA v. HSBC (S.D.N.Y. 2014) party permitted to use predictive coding even though the opposing party showed that a document produced in a parallel case had not been produced here. No one should expect perfection from the document review process.
c. Can Responding Parties Be Required to Use TAR?
i. Kleen Prods. v. Packaging Corp. (N.D. Ill. 2012) Under Sedona Principle 6, parties are best situated to evaluate procedures to preserve and produce their own ESI. Court urged parties to come up with a way to refine the Boolean search rather than ordering a TAR search.
ii. EORHB v. HOA Holdings (Del. Ch. 2012) court ordered both parties to use TAR sua sponte, but later withdrew the order.
d. How Involved Should the Court Get in the TAR Process?
i. Most of the courts that have addressed TAR issues have either entered orders stipulated by the parties, advised the parties to negotiate a protocol, or allowed a party to proceed with its own protocol, without prejudice to the opposing party challenging the adequacy of the results.
ii. Independent Living Ctr. of S. Cal. v. City of Los Angeles (C.D. Cal. 2014) Should predictive coding protocol include a quality assurance phase? Plaintiff could insist it be performed if they paid one half of costs.
iii. Judicial Modesty: Not an Oxymoron: Case for restraint in the electronic age (Judge Francis) – parties are entitled to use TAR, but whether the tool will produce reliable results is something that judges are ill equipped to determine. Judges should nonetheless try to resolve e-discovery issues at pretrial conferences, phase discovery, and order sampling.
D. NEGOTIATION AND AGREEMENT ON SEARCH PROTOCOLS
a. Paul/Baron Information Inflation Process:
i. Meet and Confer to identify methods to narrow scope of searches.
ii. Conduct initial searches on meet and confer parameters.
iii. Share initial search results and adjust parameters.
iv. Repeat process in an iterative fashion until mutually agreed time or mutually agreed cap on responsive documents reached.
b. In re Biomet Hip Implant (N.D. Ind. 2013) Biomet used keyword culling and de-duping to narrow the documents from 19M to 2.5M, then used predictive coding on the set of 2.5M. Plaintiffs moved for predictive coding on all 19M documents. Court ruled that the burden and expense to Biomet outweighed the likely benefits. Court also refused disclosure of the seed set.
c. I-Med Pharma v. Biomatrix (D.N.J. 2011) party relieved of obligation to produce documents recovered from unallocated space after a search was run across all data on the system without limiting it to certain custodians or time periods.
d. Progressive Cas. v. Delaney (D. Nev. 2014) after a search using the parties' agreed search terms returned 565K hits in a 1.8M document set, the party was not allowed to use TAR to reduce the set to 55K documents and had to produce all of the original hits.
e. Edwards v. National Milk (N.D. Cal. 2013) Model Joint Stipulation and Order
1. Document Review Corpus – documents remaining after exclusion of document types, dupes, system files, and documents outside of the date range.
2. Confidence Level – likelihood that sampling is accurate.
3. Estimation Interval – statistical error rate of a measured confidence level.
1. Document Collection
2. Control Set - a random, statistically valid sampling of documents to estimate the number of responsive documents in the corpus. The control set sample shall be determined using a 95% confidence level and 2% estimation interval.
3. Seed Set and Initial Training – seed set includes responsive documents identified by the party and found with Boolean search terms.
4. Iterative Review and Further Training
5. Review - Review and iterative training will proceed until all identified documents have been reviewed and the system indicates that the remainder of the document review corpus is not likely to be responsive.
6. Validation - perform a validation test by reviewing a statistically valid and random sampling of unreviewed documents to confirm that the number of potentially responsive documents in the unreviewed corpus is statistically insignificant. The review process will continue until the validation test achieves a 1% or less responsiveness rate.
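
The numbers in the control set and validation steps can be made concrete with a short calculation. The sketch below rests on assumptions that the stipulation as outlined here does not spell out: it reads the "2% estimation interval" as a margin of error of plus or minus 2 percentage points, uses the standard normal-approximation formula for a proportion with the worst-case value p = 0.5, and treats the 1% validation test as a simple comparison of the sample rate against the threshold. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(confidence=0.95, margin=0.02, population=None, p=0.5):
    """Documents to sample to estimate a proportion within +/- `margin`.

    Uses n = z^2 * p * (1 - p) / margin^2, with an optional finite
    population correction when the corpus size is known.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

def validation_passes(responsive_in_sample, sample_reviewed, threshold=0.01):
    """True if the sampled responsiveness rate is at or below the threshold."""
    return responsive_in_sample / sample_reviewed <= threshold

# Control set at a 95% confidence level and a 2% interval (assumed +/- 2 points).
print(sample_size())  # 2401

# Validation sample of unreviewed documents: 20 responsive out of 2,401.
print(validation_passes(20, 2401))  # True, the rate is about 0.8%
print(validation_passes(30, 2401))  # False, the rate is about 1.2%
```

A stricter reading of the validation step would also account for the sampling error around the observed rate; the simple comparison here only shows the mechanics of the stopping rule.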
E. VALIDATION OF SEARCH EFFORTS
a. Must conduct a reasonable search, the cost and effectiveness of which meets the proportionality requirement of Rule 26(g).
b. Recall is more important than precision. Precision is only a factor if it is very low – in the case of a data dump.
c. Recall is difficult to measure. For many matters, it is neither feasible nor necessary to expend the effort to estimate recall with a margin of error of ± 5%, at a 95% confidence level— the standard required for scientific publication.
d. As per Grossman-Cormack – even a perfect review by an expert, with a true recall of 100%, is unlikely to reach a measured recall of 70% when its results are assessed by a second expert.
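
A rough calculation shows why estimating recall to within ± 5% at a 95% confidence level can be out of proportion for many matters. The Python below is a sketch under assumptions: it uses the normal-approximation sample size formula with the worst-case p = 0.5, the 1% prevalence figure and the review counts are invented for illustration, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def recall_precision(true_pos, false_pos, false_neg):
    """Recall = share of responsive documents found; precision = share of hits that are responsive."""
    recall = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    return recall, precision

def docs_to_sample_for_recall(margin=0.05, confidence=0.95, prevalence=0.01):
    """Estimate how many random documents must be reviewed to measure recall.

    Recall is a proportion of the responsive documents only, so the sample
    must contain enough responsive documents; at low prevalence that means
    drawing a much larger random sample from the whole collection.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    responsive_needed = math.ceil(z ** 2 * 0.25 / margin ** 2)
    total_needed = math.ceil(responsive_needed / prevalence)
    return responsive_needed, total_needed

# Hypothetical review: 80 responsive documents found, 120 false hits, 20 missed.
recall, precision = recall_precision(true_pos=80, false_pos=120, false_neg=20)
print(recall, precision)  # 0.8 0.4

responsive_needed, total_needed = docs_to_sample_for_recall()
print(responsive_needed, total_needed)
```

Under these assumptions about 385 responsive documents are needed in the sample, which at 1% prevalence means reviewing roughly 38,500 randomly selected documents just to measure recall.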