To set up an optimized index in a Relativity workspace, follow these steps:
1. Create a saved search for files between 0 and 30 MB.
2. Display only the extracted text field.
3. Under Indexing & Analytics . . . Structured Analytics, create a new structured analytics set. Enter a name and set prefix, select the saved search, and choose the repeated content operation. Keep the default settings in the 'Repeated Content Identification' section, except for 'Minimum Number of Occurrences'; for that setting, enter a value equal to 0.5% of the documents in the saved search. In this example the saved search contains 1446 documents (0.005 times 1446 is about 7), so the operation will look for segments of 10 to 100 words, on 4 lines or fewer, within 16 lines of the bottom of a document, which appear more than 7 times. [A different approach should be followed for a saved search of more than 100,000 documents.]
4. Click 'Run Structured Analytics' in the console. Make the appropriate selection if you are supplementing the search with new documents.
5. View the results of the repeated content operation.
6. Make note of the resulting text blocks which contain boilerplate language or non-authored content.
7. Go to Indexing & Analytics . . . Analytics Indexes, and create a new index. Use the saved search as both the training set and the searchable set.
8. Optimize the training set (to take out documents with only numbers, bad OCR, etc.), remove English signatures and footers, and enable the email header filter.
9. Add selected repeated content filters to the index.
10. Finally, click 'Populate Index: Full' on the console to populate and build the index.
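
The arithmetic behind the 'Minimum Number of Occurrences' value in step 3 can be sketched in a few lines of Python. This is only an illustration, not part of Relativity: the function name minimum_occurrences and the constant LARGE_SEARCH_LIMIT are hypothetical, and rounding down to a whole number (with a floor of 1) is an assumption chosen to match the example above, where 0.5% of 1446 documents gives 7.

```python
# Hypothetical helper: compute the value to type into
# 'Minimum Number of Occurrences' for a repeated content operation.

# The steps above say a different approach is needed past this size.
LARGE_SEARCH_LIMIT = 100_000


def minimum_occurrences(document_count: int) -> int:
    """Return 0.5% of the saved search's document count, rounded down.

    Rounding down and the minimum of 1 are assumptions made for this
    sketch; the steps above only state "0.5% of the documents".
    """
    if document_count < 0:
        raise ValueError("document_count must not be negative")
    if document_count > LARGE_SEARCH_LIMIT:
        raise ValueError(
            "saved search exceeds 100,000 documents; "
            "the 0.5% rule is not meant for sets this large"
        )
    # 0.5% is 1/200; integer division avoids floating-point rounding issues.
    return max(1, document_count // 200)


if __name__ == "__main__":
    # The saved search from step 3 holds 1446 documents.
    print(minimum_occurrences(1446))
```

For the 1446-document saved search in step 3, the function returns 7, which matches the value used there.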