Today I participated in a webinar hosted by ACEDS and conducted by Thomas Gricks and Jeremy Pickens of Catalyst, entitled Just Say No to Family Batching in Technology-Assisted Review. Gricks and Pickens are the authors, along with Andrew Bye, of a paper entitled Break up the Family: Protocols for Efficient Recall-Oriented Retrieval Under Legally-Necessitated Dual Constraints. Gricks et al. challenge the standard notion that because families of documents are produced together, they should also be reviewed together. The authors advocate a 'broken family' review protocol and a 'dual phase workflow' with an initial expedited review for relevancy.
The authors contend that TAR algorithms are more effective when trained with individual documents rather than complete families. Catalyst employed a continuous active learning protocol. A Full Family continuous active learning protocol pulls every document in a family into the review queue, regardless of whether each one is highly ranked. This is the approach favored by most attorneys.
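To make the Full Family idea concrete, here is a minimal sketch of how such a queue could be assembled. It is not Catalyst's implementation: the function name `full_family_queue`, the ranked list, and the document-to-family mapping are all hypothetical inputs chosen for illustration.

```python
from collections import defaultdict


def full_family_queue(ranked_doc_ids, family_of):
    """Order documents for review, pulling in whole families.

    ranked_doc_ids: document ids, highest-ranked first (hypothetical input).
    family_of: mapping from document id to family id (hypothetical input).
    """
    # Group documents by family, keeping rank order inside each family.
    members = defaultdict(list)
    for doc_id in ranked_doc_ids:
        members[family_of[doc_id]].append(doc_id)

    queue, seen_families = [], set()
    for doc_id in ranked_doc_ids:
        family = family_of[doc_id]
        if family in seen_families:
            continue
        seen_families.add(family)
        # The whole family enters the queue as soon as its best-ranked
        # member comes up, even if the other members are ranked low.
        queue.extend(members[family])
    return queue


ranked = ["email1", "memo7", "attach1a", "attach1b"]
families = {"email1": "F1", "attach1a": "F1", "attach1b": "F1", "memo7": "F2"}
print(full_family_queue(ranked, families))
```

In this toy data the two low-ranked attachments jump ahead of the higher-ranked `memo7` simply because they belong to the same family as `email1`, which is the inefficiency the presenters were pointing at.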
In a Positive Family protocol, whenever one document from a family is found to be relevant, any documents from the same family already found to be non-relevant are not re-reviewed.
In an Individual Padded continuous active learning protocol, once a relevancy level is determined for any one document in a family, the rest of the documents in the family are added to the review queue.
Catalyst gathered results from eight different e-discovery projects. Its study shows how much additional review is needed to achieve recall rates of 75% or 90%. Positive Family and Individual Padded continuous active learning are shown to be clearly more efficient than Full Family review.
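The comparison rests on how many documents must be reviewed before a target recall is reached. A rough sketch of that measurement follows, assuming the review order and the relevance of every document are known after the fact. The function name and the sample labels are invented for illustration and are not data from the study.

```python
def docs_reviewed_for_recall(labels_in_review_order, target_recall):
    """Return how many documents are reviewed before recall reaches the target.

    labels_in_review_order: 1 for relevant, 0 for non-relevant, listed in the
    order the documents were reviewed (hypothetical input).
    target_recall: fraction of all relevant documents to find, e.g. 0.75.
    """
    total_relevant = sum(labels_in_review_order)
    if total_relevant == 0:
        return 0  # nothing to find, so no review is needed
    found = 0
    for reviewed, is_relevant in enumerate(labels_in_review_order, start=1):
        found += is_relevant
        if found / total_relevant >= target_recall:
            return reviewed
    return len(labels_in_review_order)


# Made-up review order with four relevant documents out of ten.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(docs_reviewed_for_recall(labels, 0.75))
print(docs_reviewed_for_recall(labels, 0.90))
```

Running the same measurement over the review order produced by each protocol gives the kind of side-by-side effort figures the study reports.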
Phased continuous active learning involves reviewing documents first only for relevance, and removing the entire family from the queue when any one of its documents has been determined to be relevant. In the second phase, every document family with a relevant document is reviewed for both relevancy and privilege. This approach is superior to both Full Family and Individual Padded continuous active learning.
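The two phases can be sketched roughly as follows. This is a simplification under stated assumptions: a real continuous active learning system re-ranks as reviewers code documents, whereas this sketch uses one fixed ranking, and the names `phase_one`, `phase_two`, and the `is_relevant` callback (standing in for the reviewer's call) are hypothetical.

```python
def phase_one(ranked_doc_ids, family_of, is_relevant):
    """Relevance-only pass: stop reviewing a family once one member is relevant."""
    reviewed, relevant_families, flagged = [], [], set()
    for doc_id in ranked_doc_ids:
        family = family_of[doc_id]
        if family in flagged:
            continue  # the family has already been removed from the queue
        reviewed.append(doc_id)
        if is_relevant(doc_id):
            flagged.add(family)
            relevant_families.append(family)
    return reviewed, relevant_families


def phase_two(relevant_families, ranked_doc_ids, family_of):
    """Full pass: every member of a flagged family gets relevance and privilege review."""
    wanted = set(relevant_families)
    return [doc_id for doc_id in ranked_doc_ids if family_of[doc_id] in wanted]


ranked = ["email1", "memo7", "attach1a", "memo9", "attach1b"]
families = {"email1": "F1", "attach1a": "F1", "attach1b": "F1",
            "memo7": "F2", "memo9": "F3"}
relevant = {"email1", "attach1b"}  # made-up reviewer decisions

first_pass, flagged_families = phase_one(ranked, families, relevant.__contains__)
second_pass = phase_two(flagged_families, ranked, families)
print(first_pass)
print(second_pass)
```

In the toy data, the first pass never touches the two attachments because their family left the queue as soon as `email1` was coded relevant; they are only looked at in the second pass, alongside the rest of their family.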