
Optimizing a training set

When you set up an analytics index, the option to optimize the training set performs the following operations:


1. Remove conceptually irrelevant documents.

2. Remove documents that are too long or too short to serve as good examples.

3. Remove spreadsheets or documents that consist of predominantly numeric data.

4. Remove log files.

5. Remove documents with text resulting from processing errors.


To determine which documents must be removed, the optimization evaluates each document's word count, word uniqueness, punctuation, and words with a high character count.
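The software's actual rules and thresholds are not given here, so the following is only a rough sketch of how checks on word count, word uniqueness, punctuation, numeric content, and long words might be combined. The function name `training_set_issues` and every threshold value are hypothetical, chosen for illustration only.

```python
import string

# Hypothetical thresholds for illustration; the real product's values are not documented here.
MIN_WORDS = 20               # fewer words than this: too short to be a good example
MAX_WORDS = 50_000           # more words than this: too long to be a good example
MIN_UNIQUE_RATIO = 0.05      # unique words / total words (very low suggests log-like repetition)
MAX_PUNCT_RATIO = 0.30       # punctuation characters / non-whitespace characters
MAX_NUMERIC_RATIO = 0.50     # numeric tokens / total tokens (spreadsheet-like data)
LONG_WORD_LEN = 25           # a "word" longer than this is suspicious
MAX_LONG_WORD_RATIO = 0.10   # long words / total words (suggests processing errors)


def training_set_issues(text):
    """Return the reasons a document might be excluded (empty list if none)."""
    words = text.split()
    if not words:
        return ["empty"]

    issues = []
    n = len(words)

    # Word count: too short or too long to serve as a good example.
    if n < MIN_WORDS:
        issues.append("too short")
    if n > MAX_WORDS:
        issues.append("too long")

    # Word uniqueness: only meaningful once the document has enough words.
    unique_ratio = len({w.lower() for w in words}) / n
    if n >= MIN_WORDS and unique_ratio < MIN_UNIQUE_RATIO:
        issues.append("low word uniqueness")

    # Punctuation density, measured over non-whitespace characters.
    chars = [c for c in text if not c.isspace()]
    punct_ratio = sum(c in string.punctuation for c in chars) / len(chars)
    if punct_ratio > MAX_PUNCT_RATIO:
        issues.append("heavy punctuation")

    # Numeric tokens: contain at least one digit and no letters.
    numeric = sum(
        any(c.isdigit() for c in w) and not any(c.isalpha() for c in w)
        for w in words
    )
    if numeric / n > MAX_NUMERIC_RATIO:
        issues.append("mostly numeric")

    # Words with a high character count.
    long_words = sum(len(w) > LONG_WORD_LEN for w in words)
    if long_words / n > MAX_LONG_WORD_RATIO:
        issues.append("many very long words")

    return issues
```

A document would be kept in the training set only when the returned list is empty. Note that this sketch covers only the surface-level checks; removing conceptually irrelevant documents would require something beyond these counts.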

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea.
