
This month, Magistrate Judge Andrew Peck of the S.D.N.Y. issued an important decision on the use of Technology Assisted Review, or predictive coding. In Hyles v. New York City, 10 Civ. 3119 (S.D.N.Y.), Judge Peck denied the plaintiff's application to force the defendant to use TAR, even though he acknowledged that "TAR is cheaper, more efficient and superior to keyword searching." As previously noted on this blog, Judge Peck issued Da Silva Moore v. Publicis Groupe & MSL Group, one of the first judicial decisions to approve the use of TAR. However, in this recent decision, Judge Peck ruled that Sedona Principle 6, which states that parties are best able to judge for themselves which technologies are best for the preservation and production of ESI, should supersede the obvious advantages of TAR.

It should be noted that this is an employment discrimination suit. The review was staged, beginning with only 9 custodians and then, only if necessary, expanding to an additional 6. Presumably very large amounts of ESI are not involved.

Judge Peck closed his decision by noting that, as TAR becomes more widely used, there may come a time when it will be unreasonable for a party to decline to use it in the electronic discovery process.


 
 

A ROC (Receiver Operating Characteristic) curve plots the True Positive Rate (the percentage of the truly responsive documents that are correctly identified: True Positive / (True Positive + False Negative), also known as Recall or Sensitivity) against the False Positive Rate (the percentage of non-responsive documents incorrectly identified as responsive: False Positive / (False Positive + True Negative), which is the same as 1 - Specificity, or 1 - True Negative / (False Positive + True Negative)). [Precision is different and is defined as True Positive / (True Positive + False Positive).]
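For anyone who wants to check these definitions against their own review counts, here is a minimal Python sketch of the same formulas (the rates function and its tp, fp, tn and fn inputs are my own illustration, not part of any particular review tool):

def rates(tp, fp, tn, fn):
    # Recall / True Positive Rate / Sensitivity: share of truly responsive documents that are found
    recall = tp / (tp + fn)
    # False Positive Rate (1 - Specificity): share of non-responsive documents wrongly flagged as responsive
    fpr = fp / (fp + tn)
    # Precision: share of the documents flagged as responsive that really are responsive
    precision = tp / (tp + fp)
    return recall, fpr, precision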

The graph below has two curves. The red curve shows documents which are not actually responsive, and the green curve shows documents which are in fact responsive. The X axis shows the probability that a document is responsive, and the Y axis shows the document count.

A cut-off rate needs to be chosen - a probability of responsiveness at or above which the TAR model treats a document as responsive, and below which it treats a document as non-responsive. So for example, the black line shows a cut-off rate of 50% - at that threshold we have 15 documents incorrectly judged to be responsive (false positives) and 2 responsive documents incorrectly determined to be non-responsive (false negatives). Since 242 documents have been reviewed in total, the accuracy rate would be about 93%.

The graph showing the ROC Curve will have the False Positive Rate listed on the X axis and the True Positive Rate listed on the Y axis. In our example, when the cut-off rate is set at 50%, the True Positive Rate is 115/(115 + 2), or 98.3%, and the False Positive Rate is 15/(15 + 110), or 12% (that is, 1 minus a specificity of 88%). When we change the cut-off rate to 80%, the True Positive Rate becomes 75/(75 + 42), or 64%, and the False Positive Rate is 0/(0 + 125), or 0%. Based on these data points we can plot an ROC curve (shown in purple) that visualizes all of the possible thresholds.
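Plugging the counts from this example into the rates sketch above reproduces these figures (the individual counts are read off the example graph, so treat them as illustrative):

# cut-off at 50%: 115 true positives, 15 false positives, 110 true negatives, 2 false negatives
r50, fpr50, p50 = rates(115, 15, 110, 2)     # recall ~ 0.983, false positive rate ~ 0.12
# cut-off at 80%: 75 true positives, 0 false positives, 125 true negatives, 42 false negatives
r80, fpr80, p80 = rates(75, 0, 125, 42)      # recall ~ 0.64, false positive rate 0.0
accuracy_50 = (115 + 110) / 242              # ~ 0.93, the accuracy figure mentioned above
# (fpr50, r50) and (fpr80, r80) are two of the points that trace out the ROC curve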

When predictive coding software does a good job of separating responsive documents from non-responsive documents (i.e., it does not assign mid-range probabilities of responsiveness, say 30% to 70%, to similar numbers of responsive and non-responsive documents), the ROC curve will bow toward the upper left of the graph, as shown in this example. If the software does not assign divergent responsiveness probabilities to responsive and non-responsive documents, the ROC curve will lie closer to an imaginary diagonal line drawn across the graph from the bottom left to the top right - the line that random guessing would produce.
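If you have the model's responsiveness probability for each document in a reviewed sample, a statistics library can sweep every possible cut-off for you and report the area under the curve. A short sketch using scikit-learn's roc_curve (the y_true and scores lists here are made-up sample data, not from any real review):

from sklearn.metrics import roc_curve, roc_auc_score

# 1 = responsive, 0 = non-responsive; scores are the model's responsiveness probabilities
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.85, 0.70, 0.65, 0.40, 0.35, 0.30, 0.20, 0.10]

fpr, tpr, thresholds = roc_curve(y_true, scores)   # one (FPR, TPR) point per candidate cut-off
print(list(zip(fpr, tpr)))
print(roc_auc_score(y_true, scores))               # area under the curve: 1.0 is perfect, 0.5 is the diagonal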


 
 

As explained in last night's tip, an F-score is a combined measure of precision and recall. An F1 score gives equal weight to precision and recall. A version of the equation which allows different weights to be assigned to precision or recall would be expressed this way:

Fß = (1 + ß²) × (Precision × Recall) / ((ß² × Precision) + Recall)

The beta symbol, ß, is the variable that sets how much weight recall receives relative to precision. The term F2 score is used when twice as much weight is given to recall as to precision. When giving twice as much weight to precision, an F0.5 score is used.

In Excel, with the precision in column A and the recall in column B, we can use these formulas to calculate these two scores:

F 0.5 score Excel =((1.25)*((A9*B9)/((0.25*A9)+B9)))

F2 score Excel =((5)*((A9*B9)/((4*A9)+B9)))

. . . and a formula for the general Fß equation, allowing the user to grant varying weights to precision or recall would be:

=((1+(C9^2))*((A9*B9)/((C9^2*A9)+B9)))

. . . where column C gives the ß value.
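The same general Fß calculation can also be written outside of Excel. A brief Python sketch, using example precision and recall values of my own choosing:

def f_beta(precision, recall, beta):
    # weighted combination of precision and recall, matching the Excel formula above
    return (1 + beta**2) * precision * recall / ((beta**2 * precision) + recall)

print(f_beta(0.8, 0.6, 1))     # F1 - equal weight to precision and recall
print(f_beta(0.8, 0.6, 2))     # F2 - recall weighted twice as heavily
print(f_beta(0.8, 0.6, 0.5))   # F0.5 - precision weighted twice as heavily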

See a demonstration of these formulas in an Excel spreadsheet on my YouTube channel.


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

If you have a question or comment about this blog, please make a submission using the form to the right. 


© 2015 by Sean O'Shea.
