Jun 11, 2016

A prior Tip of the Night discussed the definitions of precision and recall. An F1 score measures the accuracy of a search as the harmonic mean (an equally weighted average) of precision (the percentage of documents in your search results that are responsive, as opposed to non-responsive) and recall (the percentage of all responsive documents that show up in the results). The F1 score is calculated this way:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

So if your search returns all of the responsive documents and no non-responsive documents, the F1 score is 1. Search results containing only non-responsive documents get an F1 score of zero. The F1 formula weights precision and recall equally.

When the precision percentage is listed in column A, and the recall percentage is given in column B, you can use this Excel formula to calculate the F1 score:

=2*((A2*B2)/(A2+B2))
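
As a quick sanity check outside of Excel, the same formula can be expressed in a few lines of Python (the function name and example values below are just illustrative):

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # By convention, treat F1 as zero when both precision
        # and recall are zero, to avoid dividing by zero.
        return 0.0
    return 2 * (precision * recall) / (precision + recall)

# Example: a search with 80% precision and 60% recall
print(f1_score(0.80, 0.60))  # approximately 0.6857

Note that the Excel formula will return a #DIV/0! error if both percentages are zero; wrapping it in an IF, e.g. =IF(A2+B2=0,0,2*((A2*B2)/(A2+B2))), guards against that edge case.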


 
 

This past February, the Chancery Division of the High Court of England and Wales approved the use of predictive coding in the United Kingdom for the first time in Pyrrho Investments Ltd. v. MWB Property Ltd. The High Court made its decision in a matter in which the 'e-disclosure' involved the review of more than 3 million electronic files recovered from back-up tapes - culled down from an original set of 17.6 million files. Predictive coding was approved based on the following 10 factors:

1. Other jurisdictions showed that predictive coding can be useful in the right kind of cases.

2. Evidence shows that predictive coding is at least as accurate as, if not more accurate than, manual review.

3. The use of a single, more experienced lawyer to create the seed set leads to more consistent results.

4. The Civil Procedure Rules of England and Wales don't prohibit the use of predictive coding.

5. A high volume of electronic documents must be reviewed.

6. Manual review of the data set would be very expensive, and hence unreasonable where an automated alternative exists.

7. Whether manual review must still be carried out after the predictive coding process has been run.

8. The amount at issue was high enough to justify the cost of the predictive coding software review.

9. If the predictive coding software yielded poor results, the parties would still have time to use other methods of review.

10. Whether or not the parties agreed on the use of predictive coding.


 
 

Yesterday I attended a discussion on "Defending and Defeating TAR" at the 2016 ACEDS eDiscovery Conference in New York. The conversation was moderated by retired Judge Ronald Hedges (formerly a magistrate judge in the District of New Jersey) and included Gina Sansome, eDiscovery counsel at the firm Axinn, Veltrop & Harkrider LLP; Adam Strayer, a specialist in Technology Assisted Review at BDO; and Bill Speros, a consultant on evidence management. My notes posted here are somewhat discursive because that was the nature of the wide-ranging discussion. I have tried to provide detail on some references made by the participants with which many (including myself at first) may be unfamiliar.

Judge Hedges began the conversation by noting that the FRCP and the FRE don't address TAR, and that parties are the driving forces in how TAR is used. But he noted the importance of Model Rule of Professional Conduct 5.3 with respect to TAR.

The rule provides that when supervising nonlawyers, attorneys must make "reasonable efforts to ensure that the person's conduct is compatible with the professional obligations of the lawyer". The onus is on lawyers to ensure that Technology Assisted Review is defensible. Judge Hedges admonished the audience that if attorneys are wise they will work out agreements with respect to TAR among themselves and not involve the judge. He referred to the decision in Progressive Cas. Ins. Co. v. Delaney, No. 2:11-cv-00678-LRH-PAL (D. Nev. July 18, 2014), where great expense did not relieve a party from running keyword searches agreed to in an ESI protocol when it sought to use predictive coding instead; to Chief Justice John Roberts' year-end report on the federal judiciary, which emphasized the need for cooperation among parties in agreeing to creative solutions for dealing with increasing amounts of ESI and reducing the amount of civil litigation; and to the cooperation requirements of FRCP 26(f), 26(c), and 37(a). There is an overall trend to get parties to agree on ways to deal with the review of ESI, and to stick with their agreements.

Ms. Sansome emphasized the need to understand how TAR works to achieve certain results - lawyers must understand the terminology of TAR (see the Tip of the Night for June 4, 2015) and investigate by asking questions, such as whether the seed set size is adequate. Attorneys need an understanding of a data set so they can see where new workflows are needed.

The nature of a particular case may determine whether TAR is an appropriate approach for review. Attorneys should also assess their clients' tolerance for transparency, since the exchange of seed sets has become standard practice. She mentioned the decision in In re Takata Airbag Prods. Liab. Litig., MDL No. 2599 (S.D. Fla. March 1, 2016), which held that the amended version of FRCP 26(b) allows parties to withhold or redact nonresponsive information.

Speros said that the disclosure of seed sets can often give a false sense of comfort to the opposing party. He noted that clients often have trouble distinguishing upstream from downstream costs - TAR is usually charged for based on the number of GBs reviewed, so the client incurs upstream costs in the processing phase and saves on the downstream (production) phase. He also noted that there are regulations which encourage the use of TAR on clean data sets, rather than on subsets that have been culled through the use of search terms.

Judge Hedges asked the panel if they thought that an expert on TAR would qualify as a Daubert witness. Strayer from BDO felt that this would be true with respect to the statistics of TAR, but not to the workflow. Speros observed that Judge Peck, in Da Silva Moore v. Publicis Groupe, 868 F. Supp. 2d 137 (S.D.N.Y. 2012), dismissed the idea of a Daubert TAR expert, but that Judge Waxse of the U.S. District Court for the District of Kansas endorsed the idea in a law review article. Judge Hedges said he thought Judge Peck was wrong, and Judge Waxse was right.

Extending the conversation further, Strayer noted that the subject matter experts used for TAR are not necessarily those who know the issues in a case, or the documentation from the case, very well. He considered the importance of the example documents presented to reviewers - should they be given sample documents that are relevant only to a particular issue, or sample documents that are only partially relevant?

Judge Hedges then discussed the split in the Circuit Courts on whether or not costs can be awarded for copying fees. 28 U.S.C. 1920 contains a cost-shifting provision that allows copying and 'exemplification' fees to be awarded to a winning party. Race Tires Am., Inc. v. Hoosier Racing Tire Corp., 674 F.3d 158 (3d Cir. 2012) reduced a lower court's award of ESI vendor costs under 28 U.S.C. 1920, limiting the definition of 'making copies' to the conversion of native files to TIFF images, the scanning of hard copies, and the digitization of VHS tapes. However, courts in the Northern District of California have not followed Race Tires and have construed 28 U.S.C. 1920 more broadly to include various electronic discovery fees. Judge Hedges thought it unlikely that the Supreme Court would resolve the differences among the federal courts.

Speros noted that the nature of TAR is that it provides similarity rankings for documents - not relevancy rankings. He cautioned against measuring the results of TAR through recall and random sampling, and noted that a binary distinction between responsive and non-responsive documents doesn't recognize the qualitative differences within those aggregates. Random samples can pull in the wrong units of interest.
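
To illustrate the caution about random sampling, here is a minimal Python sketch (the collection size, prevalence, and true recall figures are purely hypothetical): when responsive documents are rare, a random control sample contains only a handful of them, so any recall estimate rests on very few data points.

import random

random.seed(1)

# Hypothetical collection: 100,000 documents, 1% responsive.
# Assume the TAR process actually found 75% of the responsive documents.
N, PREVALENCE, TRUE_RECALL = 100_000, 0.01, 0.75
responsive = {i for i in range(N) if random.random() < PREVALENCE}
found = {i for i in responsive if random.random() < TRUE_RECALL}

# Estimate recall from a random control sample of 1,000 documents.
sample = random.sample(range(N), 1_000)
sampled_responsive = [i for i in sample if i in responsive]
hits = sum(1 for i in sampled_responsive if i in found)
print(f"responsive documents in the sample: {len(sampled_responsive)}")
print(f"estimated recall: {hits / max(len(sampled_responsive), 1):.0%} (true recall: 75%)")

With roughly ten responsive documents in a thousand-document sample, the estimate can only move in increments of about ten percentage points, and it says nothing about which kinds of responsive documents were missed.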

A slide presented by Speros was particularly instructive on how there can be different kinds of responsive and non-responsive documents.

A member of the audience asked whether or not it would be appropriate to put a TAR protocol in a discovery request. Judge Hedges and Ms. Sansome thought this was an interesting idea and didn't see a reason why it could not be done.

Judge Hedges closed the conversation by asking the group how Rules of Professional Conduct 5.1 and 5.3 require lawyers to supervise the review of multiple TBs of data. The panel responded that it is necessary to take a hands-on approach to the documents and periodically sample them. FRCP 26(g) and FRCP 11 necessitate a good faith attempt at certifying the reliability of the review process.


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.
