*Receiver Operating Characteristic Curve (ROC)*

June 13, 2016

A ROC (Receiver Operating Characteristic) curve plots the True Positive Rate against the False Positive Rate. The True Positive Rate is the number of responsive documents correctly identified as a percentage of all responsive documents: True Positive / (True Positive + False Negative). It is also known as Recall or Sensitivity. The False Positive Rate is the percentage of non-responsive documents incorrectly identified as responsive: False Positive / (False Positive + True Negative), which equals 1 - True Negative / (False Positive + True Negative), or 1 - Specificity. [Precision is different and is defined as True Positive / (True Positive + False Positive).]

The graph below has two curves. The red curve shows documents which were not actually responsive, and the green curve shows those documents which are in fact responsive. The X axis shows the probability, as a percentage, that a document is responsive, and the Y axis shows the document count.

A cut-off rate needs to be chosen: a probability of responsiveness below which, under the TAR model, a document is assumed to be not responsive. So for example, with this model, the black line shows the cut-off rate of 50%, and we have 15 documents incorrectly judged to be responsive (false positives) and 2 documents which were responsive but were incorrectly determined to be non-responsive (false negatives). This example shows that 242 documents have been reviewed in total, so the accuracy rate would be (242 - 17) / 242, or about 93%.
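
The accuracy figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: the three counts come from the example, and the variable names are made up for illustration.

```python
# Counts at the 50% cut-off, as quoted in the example.
total_reviewed = 242
false_positives = 15   # non-responsive documents judged to be responsive
false_negatives = 2    # responsive documents judged to be non-responsive

# Every document that is not an error was classified correctly.
errors = false_positives + false_negatives
accuracy = (total_reviewed - errors) / total_reviewed

print(f"errors: {errors}, accuracy: {accuracy:.1%}")
```

Accuracy on its own does not say which kind of error dominates, which is why the ROC curve tracks the two error-related rates separately.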

The graph showing the ROC Curve will have the False Positive Rate on the X axis and the True Positive Rate on the Y axis. In our example, when the cut-off rate is set at 50%, the True Positive Rate is 115/(115 + 2), or 98.3%, and the False Positive Rate is 1 - (110/(15 + 110)), or 12% (the Specificity is 88%). When we change the cut-off rate to 80%, the True Positive Rate becomes 75/(75 + 42), or 64%, and the False Positive Rate is 1 - (125/(0 + 125)), or 0%. Based on these data points we can plot an ROC curve (shown in purple) that visualizes all of the possible thresholds.
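
As a rough check, the two ROC points can be computed from the underlying counts. The true positive and false negative counts are taken from the example. The true negative counts are derived rather than quoted: they assume all 242 documents have a known label, so that 242 - 117 = 125 documents are non-responsive, and that no false positives remain at the 80% cut-off. The function name `roc_point` is hypothetical.

```python
def roc_point(tp, fn, fp, tn):
    """Return (false positive rate, true positive rate) for one cut-off."""
    tpr = tp / (tp + fn)   # recall / sensitivity
    fpr = fp / (fp + tn)   # same as 1 - tn / (fp + tn), i.e. 1 - specificity
    return fpr, tpr

# 117 responsive documents; 125 non-responsive documents (derived, see above).
cutoffs = {
    0.50: dict(tp=115, fn=2, fp=15, tn=110),
    0.80: dict(tp=75, fn=42, fp=0, tn=125),
}

points = {cutoff: roc_point(**counts) for cutoff, counts in cutoffs.items()}
for cutoff, (fpr, tpr) in points.items():
    print(f"cut-off {cutoff:.0%}: FPR = {fpr:.1%}, TPR = {tpr:.1%}")
```

Each cut-off contributes one (FPR, TPR) point; raising the cut-off moves the point down and to the left along the curve.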

When predictive coding software does a good job of separating responsive documents from non-responsive documents (i.e., it doesn't assign the same mid-range probability of responsiveness (30 to 70% or whatever) to similar numbers of responsive and non-responsive documents), the ROC curve will be in the upper left of the graph, as shown in this example. If the software doesn't assign divergent responsiveness probabilities to responsive and non-responsive documents, the ROC curve will be closer to an imaginary diagonal line drawn across the graph from the top right to the bottom left.
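
The effect of separation on the shape of the curve can be illustrated with made-up scores. The sketch below is not taken from any predictive coding product: the beta-distributed scores, the sample sizes, and the helper names `roc_points` and `max_lift` are all assumptions chosen for illustration. It sweeps the cut-off from 0 to 1 and reports how far each curve rises above the diagonal.

```python
import random

def roc_points(responsive_scores, nonresponsive_scores, steps=20):
    """Sweep the cut-off from 0 to 1 and return a list of (FPR, TPR) pairs."""
    points = []
    for i in range(steps + 1):
        cutoff = i / steps
        # A document counts as "predicted responsive" when its score >= cutoff.
        tpr = sum(s >= cutoff for s in responsive_scores) / len(responsive_scores)
        fpr = sum(s >= cutoff for s in nonresponsive_scores) / len(nonresponsive_scores)
        points.append((fpr, tpr))
    return points

def max_lift(points):
    """Largest vertical distance of the curve above the diagonal (TPR - FPR)."""
    return max(tpr - fpr for fpr, tpr in points)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical model that separates the two groups well:
# responsive documents score high, non-responsive documents score low.
good_resp = [rng.betavariate(8, 2) for _ in range(500)]
good_non = [rng.betavariate(2, 8) for _ in range(500)]

# Hypothetical model that gives both groups similar mid-range scores.
poor_resp = [rng.betavariate(5, 5) for _ in range(500)]
poor_non = [rng.betavariate(5, 5) for _ in range(500)]

good_curve = roc_points(good_resp, good_non)
poor_curve = roc_points(poor_resp, poor_non)

print(f"well separated: max lift above diagonal = {max_lift(good_curve):.2f}")
print(f"overlapping:    max lift above diagonal = {max_lift(poor_curve):.2f}")
```

A lift near 1 means the curve passes close to the upper left corner; a lift near 0 means it hugs the diagonal.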
