Technology Assisted Review

Evidence of How Bad Manual Document Review Is

Jan 28, 2021

The Tip of the Night for December 6, 2018 discussed a study by Herbert Roitblat, Anne Kershaw, and Patrick Oot on how closely the results of manual document review performed by three teams of reviewers corresponded. See Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61(1) J. Assoc. Inf. Sci. Technol. 70–80 (2010). A law review article published by Maura Grossman and Gordon Cormack (the well-known authors of a glossary of TAR terms, as mentioned in the Tip of the Night for June 4, 2015) includes a table which does a good job of illustrating the disparity between the review teams' results.

See Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, 17 Rich. J.L. & Tech. 11, 14 (2011), available at: http://scholarship.richmond.edu/jolt/vol17/iss3/5 . The review set consisted of 5,000 documents. Team A's production and the original production shared only 16.3% of the total 'responsive' documents. Team B and the original production had a 15.8% overlap. Team A and Team B had only 28.1% of their relevant documents in common.
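For anyone who wants to check overlap figures like these against their own coding data, here is a minimal Python sketch. It assumes overlap means the number of documents both teams coded as responsive divided by the number either team coded as responsive. The function name and the document IDs are hypothetical and are not taken from the article.

```python
def overlap(team_a_responsive, team_b_responsive):
    """Share of documents coded responsive by both teams, out of those
    coded responsive by at least one team (assumed definition)."""
    either = team_a_responsive | team_b_responsive
    if not either:
        return 0.0  # neither team found anything responsive
    both = team_a_responsive & team_b_responsive
    return len(both) / len(either)


# Hypothetical document IDs, for illustration only.
team_a = {1, 2, 3, 4, 5}
team_b = {4, 5, 6, 7}

example_overlap = overlap(team_a, team_b)
print(f"Overlap: {example_overlap:.1%}")
```

Using sets of document IDs keeps the calculation independent of the review platform; any export of control numbers tagged responsive would work as input.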

In the same article, the authors cite the results of a study by the researcher Ellen Voorhees of three teams of human document assessors on a set of more than 13,000 documents.

Id. at 13. The results of any two teams corresponded by only 40-50%.

This is good evidence that manual document review cannot be relied upon to locate all of the relevant documents in a data set.

EDRM TAR Guidelines

Feb 7, 2019

Last month the EDRM published its Technology Assisted Review (TAR) Guidelines, which it developed with Duke Law School.

Here's a brief outline of the 50-page guidelines:

I. Defining Technology Assisted Review

A. The TAR Process

1. Assemble a team: Service provider; Software provider; Workflow expert; Case manager; Lead attorney; Human reviewers

2. Collection and Analysis

a. Algorithm analyzes the relationship between words and characters.

3. Train the Computer to Predict Relevancy

a. Synthetic documents may be used.

b. TAR software may find documents to be used to classify others.

4. Quality Control and Testing

a. Identify relevant documents and then see if TAR finds them.

5. Training Completion and Validation

a. Rank documents according to relevancy.

b. TAR 1.0 - software trained on a random subset of relevant and non-relevant documents selected at the beginning and then used on the remaining unreviewed documents.

c. TAR 2.0 - From the outset, the software continuously analyzes the entire document collection and ranks the population based on relevancy. Human coding decisions are submitted to the software, the software re-ranks the documents, and then presents back to the human reviewer additional documents for review that it predicts are most likely relevant.

d. Review stops when:

i. a certain recall rate is reached

ii. the software only returns non-relevant documents.

iii. a certain number of relevant documents has been found.

e. Recall - percentage of relevant documents found.

f. Precision - percentage of the documents determined by TAR to be relevant that are actually relevant.
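The recall and precision definitions above can be illustrated with a short Python sketch. The function name and the document IDs below are hypothetical; in practice the "actually relevant" set is usually estimated from a coded sample rather than known in full.

```python
def recall_and_precision(tar_relevant, actually_relevant):
    """Return (recall, precision) for a set of documents TAR marked relevant.

    recall    = relevant documents found / all relevant documents
    precision = relevant documents found / all documents TAR marked relevant
    """
    found = tar_relevant & actually_relevant
    recall = len(found) / len(actually_relevant) if actually_relevant else 0.0
    precision = len(found) / len(tar_relevant) if tar_relevant else 0.0
    return recall, precision


# Hypothetical document IDs, for illustration only.
actually_relevant = {1, 2, 3, 4}
tar_relevant = {3, 4, 5}

recall, precision = recall_and_precision(tar_relevant, actually_relevant)
print(f"Recall: {recall:.0%}  Precision: {precision:.0%}")
```

In this example TAR found two of the four relevant documents, and two of the three documents it flagged were actually relevant.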

II. TAR Workflow

A. Foundational Concepts & Understandings

1. Algorithms

a. Feature extraction algorithms - document content identified and related to other documents.

b. Supervised machine learning algorithms - human reviewer trains software to recognize relevance.

B. TAR Workflow

1. Select the Service and Software Provider

a. Have they provided affidavits in support of their workflow for past cases?

b. Do they have an expert that can discuss TAR with the opposing parties or the Court?

c. Will rolling productions affect the workflow?

2. Identify, Analyze, and Prepare the TAR Set

a. Culling criteria based on file types; custodians; date ranges; and search terms.

b. Index is based not on native files but on extracted text.

3. Human Reviewer Prepares for Engaging in TAR

a. A team of 15 human reviewers may produce more accurate results than two lead attorneys.

b. Software may allow the use of more than one relevance topic tag.

4. Human Reviewer Trains Computer to Detect Relevancy, and the Computer Classifies the TAR Set

a. Training sets created by random sampling may have to be larger than those formed by other methods.

5. Implement Review Quality Control Measures

a. Decision Log - record of relevancy decisions.

b. Sampling - human reviewers' decisions checked by lead attorney.

c. Reports - where coders disagree on relevancy

6. Determine When Computer Training is Complete and Validate

a. Training Completion based on:

i. Sample-based Effectiveness Estimates

ii. Observing sparseness of relevant documents returned during active learning.

iii. Compare different predictive model behaviors.

iv. Compare TAR 1.0 and TAR 2.0 processes.

b. Validation

i. Consider Rule 26(b) proportionality considerations when setting target recall level.

7. Final Identification, Review and Production of the Predicted Relevant Set

a. Separate review of documents with only numbers or illegible text.

b. Address privilege, need for redaction, and other issues.

8. Workflow Issue Spotting

a. Extremely low or high richness may indicate TAR is not appropriate
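One of the training-completion signals in the outline above is the sparseness of relevant documents returned during active learning. Below is a toy Python sketch of a TAR 2.0 style loop that stops when a batch comes back with no relevant documents. Everything in it is an assumption made for illustration: the word-count scoring stands in for a real classifier, and the function names, sample documents, batch size, and stopping threshold are hypothetical. It is not the method described in the guidelines or used by any particular software.

```python
from collections import Counter


def score(doc, weights):
    """Toy relevance score: sum of the learned weights of the words in doc."""
    return sum(weights[word] for word in set(doc.split()))


def cal_review(docs, human_says_relevant, seed_index, batch_size=2,
               stop_after_empty_batches=1):
    """Review docs in ranked batches until batches stop yielding relevant ones.

    Returns a dict mapping each reviewed document index to its coding decision.
    """
    weights = Counter()  # word -> weight; stand-in for a trained model
    reviewed = {}

    def code(index):
        # The human coding decision is fed back into the "model".
        label = human_says_relevant(docs[index])
        reviewed[index] = label
        for word in set(docs[index].split()):
            weights[word] += 1 if label else -1
        return label

    code(seed_index)  # start from one known example
    empty_batches = 0
    while empty_batches < stop_after_empty_batches and len(reviewed) < len(docs):
        remaining = [i for i in range(len(docs)) if i not in reviewed]
        # Re-rank the unreviewed documents after every batch.
        remaining.sort(key=lambda i: score(docs[i], weights), reverse=True)
        batch = remaining[:batch_size]
        hits = sum(code(i) for i in batch)
        empty_batches = empty_batches + 1 if hits == 0 else 0
    return reviewed


# Hypothetical documents; a simple keyword test stands in for the reviewer.
docs = [
    "merger agreement draft",
    "lunch menu friday",
    "merger pricing memo",
    "holiday party invite",
    "agreement on merger terms",
    "parking garage notice",
    "fantasy football league",
    "office supply order",
]

reviewed = cal_review(docs, lambda d: "merger" in d, seed_index=0)
print(f"Reviewed {len(reviewed)} of {len(docs)} documents, "
      f"{sum(reviewed.values())} coded relevant")
```

A single empty batch is a very aggressive stopping rule. A real review would pair this kind of signal with a sample-based estimate of recall before training is treated as complete.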

III. Alternate Tasks for Applying TAR

A. Early Case Assessment

1. Find ESI that is needed for closer review.

B. Prioritization for Review

C. Categorization by Issues

D. Privilege Review

E. Review of Incoming Productions

1. Especially data dumps.

F. Deposition and Trial Preparation

G. Information Governance and Data Disposition

1. Find data subject to retention policy.

2. Find data for defensible deletion.

3. Segregate PII.

IV. Factors to Consider When Deciding Whether or Not to Use TAR

A. Should the Legal Team Use TAR?

1. Are the documents themselves appropriate? TAR does not work well with:

a. Exports from structured databases.

b. Audio/video/image files

c. Hard copies with poor OCR.

2. Are the cost and use reasonable?

B. Cost of TAR vs. Linear Review

1. Document review usually accounts for 60-70% of discovery costs.

2. QC and privilege review may still be expensive when TAR is used.

Later this year the EDRM will publish its best practices for technology assisted review, which will explain the situations in which it is best to use TAR.

LITIGATION SUPPORT TIP OF THE NIGHT

New tips for paralegals and litigation support professionals are posted to this site each week. Click on the blog headings for better detail.


Overturned Documents

Relativity keeps track of overturned documents - documents whose responsiveness status is overturned either on a second pass manual review, or by a manual coding decision made before an assisted review or TAR round is performed. Subsequent rounds of assisted review may lead to overturned documents.

A document reviewer may see a field for a responsive overturn on a document layout.

The Assisted Review console includes options to 'View Overturn Summary' and 'View Overturned Documents'.

Relativity can use only excerpted text from a document to find conceptual matches to a control document.
