Receiver Operating Characteristic Curve (ROC)

An ROC (Receiver Operating Characteristic) curve plots the True Positive Rate against the False Positive Rate. The True Positive Rate is the number of responsive documents correctly identified, as a share of all responsive documents: True Positive / (True Positive + False Negative). It is also known as Recall or Sensitivity. The False Positive Rate is the share of non-responsive documents incorrectly identified as responsive: 1 - True Negative / (False Positive + True Negative), or 1 - Specificity. [Precision is different and is defined as True Positive / (True Positive + False Positive).]
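The three ratios defined above can be written out as a small Python sketch. The function name `rates` and the counts passed to it are hypothetical, chosen only for illustration; they do not come from any particular review platform.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, precision) from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # True Positive Rate: Recall / Sensitivity
    fpr = fp / (fp + tn)        # False Positive Rate: same as 1 - TN / (FP + TN)
    precision = tp / (tp + fp)  # share of predicted-responsive docs that really are
    return tpr, fpr, precision


# Hypothetical counts: 10 responsive and 20 non-responsive documents.
tpr, fpr, precision = rates(tp=8, fp=2, tn=18, fn=2)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} precision={precision:.2f}")
```

Note that Recall and Precision share a numerator (True Positives) but divide by different totals, which is why they can diverge sharply on the same set of results.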

The graph below has two curves. The red curve shows documents that were not actually responsive, and the green curve shows documents that are in fact responsive. The X axis shows the predicted probability that a document is responsive, and the Y axis shows the document count.

A cut-off needs to be chosen: a probability of responsiveness below which, under the TAR model, a document is assumed to be not responsive. For example, with this model the black line shows a cut-off of 50%. At that cut-off, 15 documents are incorrectly judged to be responsive (false positives) and 2 responsive documents are incorrectly determined to be non-responsive (false negatives). This example shows that 242 documents have been reviewed in total, so 225 were classified correctly and the accuracy rate would be about 93%.
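The accuracy figure can be checked with a few lines of Python. The numbers are the ones quoted in the example (242 documents, 15 false positives, 2 false negatives); the variable names are only illustrative.

```python
total_docs = 242
false_positives = 15   # non-responsive documents scored at or above the 50% cut-off
false_negatives = 2    # responsive documents scored below the 50% cut-off

# Every document that is not a false positive or a false negative was classified correctly.
correct = total_docs - false_positives - false_negatives
accuracy = correct / total_docs

print(f"{correct} of {total_docs} correct, accuracy = {accuracy:.1%}")
```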

The graph showing the ROC curve has the False Positive Rate on the X axis and the True Positive Rate on the Y axis. In our example, when the cut-off is set at 50%, the True Positive Rate is 115/(115 + 2), or 98.3%. Of the 242 documents, 117 are responsive and 125 are not, so 110 of the non-responsive documents are correctly excluded. The Specificity is therefore 110/125, or 88%, and the False Positive Rate is 1 - 88%, or 12%. When we change the cut-off to 80%, the True Positive Rate becomes 75/(75 + 42), or 64%, and the False Positive Rate is 1 - (125/(0 + 125)), or 0%. Based on these data points we can plot an ROC curve (shown in purple) that traces the trade-off across all of the possible thresholds.
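As a rough check on the two data points, the sketch below derives each (False Positive Rate, True Positive Rate) pair from the counts in the example. The helper `roc_point` is a hypothetical name. The number of non-responsive documents (125) is derived here as 242 minus the 117 responsive ones, and the 80% cut-off is assumed to produce no false positives, as the zero in the formula above suggests.

```python
def roc_point(tp, fn, fp, tn):
    """Return one ROC point as (False Positive Rate, True Positive Rate)."""
    return fp / (fp + tn), tp / (tp + fn)


total_docs = 242
responsive = 115 + 2                       # true positives + false negatives at 50%
non_responsive = total_docs - responsive   # 125, derived rather than quoted

# Cut-off at 50%: 15 false positives, so 110 true negatives.
point_50 = roc_point(tp=115, fn=2, fp=15, tn=non_responsive - 15)

# Cut-off at 80%: assumed 0 false positives, so all 125 are true negatives.
point_80 = roc_point(tp=75, fn=42, fp=0, tn=non_responsive)

for label, (fpr, tpr) in (("50%", point_50), ("80%", point_80)):
    print(f"cut-off {label}: FPR = {fpr:.1%}, TPR = {tpr:.1%}")
```

Raising the cut-off moves the point down and to the left: fewer non-responsive documents are let through, but more responsive ones are missed.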

When predictive coding software does a good job of separating responsive documents from non-responsive documents (i.e., it doesn't assign mid-range probabilities of responsiveness, say 30 to 70%, to similar numbers of responsive and non-responsive documents), the ROC curve will sit in the upper left of the graph, as shown in this example. If the software doesn't assign divergent responsiveness probabilities to responsive and non-responsive documents, the ROC curve will be closer to an imaginary diagonal line drawn across the graph from the bottom left to the top right.
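The contrast between a well-separated model and a poorly separated one can be illustrated with a minimal sketch. Everything here is hypothetical: the function `roc_points`, the toy scores, and the labels (1 = responsive, 0 = non-responsive) are invented for illustration, and a document is assumed to be predicted responsive when its score is at or above the cut-off.

```python
def roc_points(scores, labels, thresholds):
    """Return one (FPR, TPR) pair per cut-off, in the order the cut-offs are given."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        points.append((fp / negatives, tp / positives))
    return points


thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

# Toy model A: responsive documents all score high, non-responsive all score low.
separated = roc_points(
    scores=[0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    labels=[1, 1, 1, 0, 0, 0],
    thresholds=thresholds,
)

# Toy model B: responsive and non-responsive documents receive identical scores.
overlapping = roc_points(
    scores=[0.9, 0.9, 0.5, 0.5, 0.1, 0.1],
    labels=[1, 0, 1, 0, 1, 0],
    thresholds=thresholds,
)

print("separated:  ", separated)
print("overlapping:", overlapping)
```

In this toy data, model A reaches the top-left corner (False Positive Rate 0, True Positive Rate 1) at the 0.5 cut-off, while every point for model B has equal rates and so lies on the diagonal.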