Litigation Support Tip of the Night

April 2, 2020

You can improve the performance of Relativity analytics operations by increasing the amount of RAM available for Java.  The Java heap is the amount of memory allocated to the Java Virtual Machine (JVM).  The JVM is a program that executes other programs written in Java. 

Relativity recommends that servers used for both indexing and structured analytics allocate about 50% of available RAM to the Java heap.  This should be increased to 75% if the server performs only structured analytics.  If a server is used only for indexing, only about a third of total RAM should be assigned to the Java heap.  
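As a rough illustration, the percentages above can be turned into a small calculator. This is a minimal Python sketch: the role labels are hypothetical names chosen for this example, and the code only does the arithmetic. It doesn't read or change any Relativity or JVM setting.

```python
# Hypothetical role labels mapped to the share of RAM suggested for the Java heap.
HEAP_FRACTION = {
    "indexing_and_structured": 0.50,  # both indexing and structured analytics
    "structured_only": 0.75,          # structured analytics only
    "indexing_only": 1 / 3,           # indexing only
}


def suggested_heap_gb(total_ram_gb: float, role: str) -> float:
    """Return the suggested Java heap size (in GB) for a server's role."""
    if role not in HEAP_FRACTION:
        raise ValueError(f"unknown server role: {role!r}")
    return total_ram_gb * HEAP_FRACTION[role]


# Example: a server with 64 GB of RAM.
for role in HEAP_FRACTION:
    print(f"{role}: {suggested_heap_gb(64, role):.1f} GB")
```

For a 64 GB server this prints 32.0 GB, 48.0 GB and 21.3 GB for the three roles.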

A server will need roughly 6,000 bytes of RAM for each document involved in an index build.  So a training set of one million documents will require 6,000 × 1,000,000 = 6,000,000,000 bytes, or about 6 GB of RAM.   The same equation applies when calculating the Java heap size needed for a structured analytics set. 
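The 6,000-bytes-per-document rule of thumb is easy to script. Here's a minimal Python sketch; the function names are hypothetical, and it also shows the result in binary units, since some server tools report memory in GiB rather than decimal GB.

```python
BYTES_PER_DOCUMENT = 6_000  # rule of thumb from the tip above


def estimated_ram_bytes(document_count: int) -> int:
    """Estimated RAM, in bytes, for an index build or structured analytics set."""
    if document_count < 0:
        raise ValueError("document_count must not be negative")
    return BYTES_PER_DOCUMENT * document_count


def to_gb(num_bytes: int) -> float:
    """Decimal gigabytes (1 GB = 1,000,000,000 bytes)."""
    return num_bytes / 1_000_000_000


def to_gib(num_bytes: int) -> float:
    """Binary gibibytes (1 GiB = 1,073,741,824 bytes)."""
    return num_bytes / 2**30


needed = estimated_ram_bytes(1_000_000)
print(needed, to_gb(needed), round(to_gib(needed), 2))  # 6000000000 6.0 5.59
```

So the 6 GB for a million documents is the same 6 billion bytes that a tool using binary units would show as about 5.59 GiB.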

October 25, 2019

Relativity has a mass operation application, "Set Long Text Field Size," which you can use to calculate the size of a long text field.    A decimal field must be set up to hold the value that this mass operation generates. 

The text size will be listed in kilobytes (KB).   This mass operation works like others: you need to check off the items you want to process, or check all items in a set.    The text size value can be written to the documents in the set even if you only have read-only permissions to those documents.   

A new form will appear.  On this form will be a drop down menu of the existing long text fields.    Another drop down menu will give you access to all decimal fields that have been set up.  
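To get a feel for the kind of number the mass operation stores in the decimal field, here is a minimal Python sketch. It is not Relativity's calculation: the assumptions that 1 KB = 1,024 bytes, that bytes are counted in a chosen text encoding, and that the value is rounded to two decimal places are all mine, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def long_text_size_kb(text: str, encoding: str = "utf-8") -> Decimal:
    """Approximate size of a long text value in KB, rounded to two decimal places.

    Assumptions (not confirmed for Relativity): 1 KB = 1,024 bytes, and bytes
    are counted in the given encoding.
    """
    size_bytes = len(text.encode(encoding))
    kb = Decimal(size_bytes) / Decimal(1024)
    return kb.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(long_text_size_kb("a" * 2048))               # 2.00
print(long_text_size_kb("a" * 1024, "utf-16-le"))  # 2.00
```

The second call shows why the encoding assumption matters: the same characters take twice as many bytes in UTF-16 as in UTF-8 for plain English text.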

October 14, 2019

To run name normalization in Relativity to identify email aliases, follow these steps:

1. Under Indexing & Analytics, create a new Structured Analytics set. 

2. Enter a name and prefix for the set. 

3. The document set to analyze should include all emails in the workspace.  Documents with more than 30 MB of extracted text will be automatically excluded. 

4. Check off the box for the 'name normalization' operation.

5. A profile must be selected which has each of the email header fields mapped to it. 

6. Note that you can choose to set the radio button for the email header fields to 'No' and just run name normalization on the extracted text. 

7.  Save the set.  In the console on the right, you'll have the option to run the structured analytics set.   You will be prompted to either run on the full set or just newly added documents.  In a new set, all documents will be analyzed no matter what option is selected at this point. 

8. Tonight it only took Relativity about four minutes to find more than 2900 email aliases in a set of about 1600 emails. 
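Step 3's document set can be pictured as a simple filter: keep the emails, and set aside any whose extracted text is over the 30 MB limit. This Python sketch uses a hypothetical `Document` record and hypothetical control numbers, and it assumes the limit means 30 binary megabytes of UTF-8 text; Relativity applies the exclusion itself when the set runs.

```python
from dataclasses import dataclass

# Assumption: the 30 MB limit is read here as 30 binary megabytes of UTF-8 text.
MAX_EXTRACTED_TEXT_BYTES = 30 * 1024 * 1024


@dataclass
class Document:
    """Hypothetical stand-in for a workspace record."""
    control_number: str
    is_email: bool
    extracted_text: str


def split_for_name_normalization(docs, max_bytes=MAX_EXTRACTED_TEXT_BYTES):
    """Return (included, excluded) control numbers for the email set."""
    included, excluded = [], []
    for doc in docs:
        if not doc.is_email:
            continue  # the set to analyze should contain the workspace's emails
        size = len(doc.extracted_text.encode("utf-8"))
        if size > max_bytes:
            excluded.append(doc.control_number)
        else:
            included.append(doc.control_number)
    return included, excluded


docs = [
    Document("ABC0001", True, "Short email"),
    Document("ABC0002", True, "x" * 50),
    Document("ABC0003", False, "memo"),
]
# A tiny limit is used here only so the example shows an exclusion.
print(split_for_name_normalization(docs, max_bytes=20))  # (['ABC0001'], ['ABC0002'])
```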

October 14, 2019

Structured Analytics in Relativity allows you to run name normalization in order to identify aliases of email addresses and the groups those email addresses are from.   

Name normalization will split the aliases in email headers at semicolons.    It will look for names listed with email addresses in these familiar formats:

"Lincoln, Abraham" <abraham.lincoln@whitehouse.gov>

'Lincoln, Abraham' <abraham.lincoln@whitehouse.gov>

Lincoln, Abraham <abraham.lincoln@whitehouse.gov>

'Lincoln, Abraham' [abraham.lincoln@whitehouse.gov]

Lincoln, Abraham [abraham.lincoln@whitehouse.gov]

So, in any one of these examples, Relativity will associate three aliases with the same entity.  E.g., 

1. "Lincoln, Abraham" <abraham.lincoln@whitehouse.gov> 

2. Lincoln, Abraham

3. abraham.lincoln@whitehouse.gov
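To make the parsing concrete, here's a minimal Python sketch that splits a header field at semicolons and pulls the three aliases (full entry, name, address) out of each of the five formats listed above. The regular expression and function name are my own illustration, not Relativity's parser, and entries in any other format are simply skipped.

```python
import re

# A display name (double-quoted, single-quoted, or bare) followed by an
# address in angle brackets or square brackets, as in the formats above.
ALIAS_PATTERN = re.compile(
    r"""^\s*
        (?:"(?P<dq>[^"]+)"|'(?P<sq>[^']+)'|(?P<bare>[^<\[]+?))
        \s*
        (?:<(?P<angle>[^>]+)>|\[(?P<square>[^\]]+)\])
        \s*$""",
    re.VERBOSE,
)


def aliases_for_header(header: str) -> list[tuple[str, str, str]]:
    """Split a header field at semicolons; return (full entry, name, address) per recipient."""
    aliases = []
    for entry in header.split(";"):
        entry = entry.strip()
        match = ALIAS_PATTERN.match(entry)
        if match is None:
            continue  # formats outside this sketch are skipped
        name = (match.group("dq") or match.group("sq") or match.group("bare")).strip()
        address = (match.group("angle") or match.group("square")).strip()
        aliases.append((entry, name, address))
    return aliases


header = ('"Lincoln, Abraham" <abraham.lincoln@whitehouse.gov>; '
          "Lincoln, Abraham [abraham.lincoln@whitehouse.gov]")
for alias in aliases_for_header(header):
    print(alias)
```

Both entries yield the name "Lincoln, Abraham" and the address abraham.lincoln@whitehouse.gov, alongside the full entry as written.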

Relativity will join an email address listed with an identical name to the entity that has already been set up for that name, even when the address is different.   So:

"Lincoln, Abraham" <abraham.lincoln@illinois.com>

. . . will be joined to the same entity as "Lincoln, Abraham" <abraham.lincoln@whitehouse.gov>. 
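The name-based join can be sketched as grouping (name, address) pairs by name. This Python example is a simplification of my own: names must match exactly after trimming whitespace, and the "Douglas, Stephen" entry is a hypothetical extra record added to show that different names stay separate.

```python
from collections import defaultdict


def join_by_name(pairs):
    """Map each display name to the sorted list of addresses listed with it.

    Simplification: names must match exactly after trimming whitespace.
    """
    entities = defaultdict(set)
    for name, address in pairs:
        entities[name.strip()].add(address.strip())
    return {name: sorted(addresses) for name, addresses in entities.items()}


pairs = [
    ("Lincoln, Abraham", "abraham.lincoln@whitehouse.gov"),
    ("Lincoln, Abraham", "abraham.lincoln@illinois.com"),
    ("Douglas, Stephen", "stephen.douglas@illinois.com"),
]
print(join_by_name(pairs))
```

Both Lincoln addresses end up under a single "Lincoln, Abraham" entity, while the third record forms its own.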

Relativity will also perform segment matching to help associate email aliases with one another.   Segment matching reviews emails sent on the same date, with the same body to see if email addresses in the header fields can be joined to the same entity. 
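The idea behind segment matching can be sketched as grouping emails on (sent date, body) and looking at which sender values show up together. This Python example is only an illustration under my own assumptions: it looks at a single sender value per email rather than every header field, collapses whitespace in the body, and uses hypothetical sample emails.

```python
from collections import defaultdict
from datetime import date


def segment_match_candidates(emails):
    """Group emails by (sent date, body) and return sender values seen together.

    `emails` is an iterable of (sent_date, body, sender) tuples. Whitespace in the
    body is collapsed so trivial formatting differences don't split a group.
    """
    groups = defaultdict(set)
    for sent_date, body, sender in emails:
        key = (sent_date, " ".join(body.split()))
        groups[key].add(sender)
    # Only groups with more than one distinct sender value suggest aliases to join.
    return [senders for senders in groups.values() if len(senders) > 1]


emails = [
    (date(2019, 10, 14), "Meet at noon.", "Lincoln, Abraham"),
    (date(2019, 10, 14), "Meet  at noon.", "abraham.lincoln@whitehouse.gov"),
    (date(2019, 10, 15), "Unrelated message.", "someone@example.com"),
]
print(segment_match_candidates(emails))
```

The two copies of the same message sent on the same day point to "Lincoln, Abraham" and abraham.lincoln@whitehouse.gov as candidates for the same entity.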

Relativity recommends using a separate structured analytics set for name normalization.   

October 13, 2019

When performing email threading in Relativity, if the Analytics profile doesn't specify email header fields and there is text present before the email headers of the most recent email, Relativity will identify that text as a reply and will not correctly determine the primary email. 

October 12, 2019

When creating a saved search for a searchable set (as opposed to a training set) to be used with an analytics index in Relativity (whether a classification index for active learning, or a conceptual index for clustering or categorization), be sure to follow these guidelines:

1.  Index only the 'authored' parts of documents - not system metadata. 

2. If fields other than Extracted Text are included as columns in the saved search, use as few additional fields as possible.  It may also be appropriate to include translated text. 

3. Single choice, multiple choice, and multiple object fields should not be included.

4. Exclude zip files, system files, graphic and image files. 

5. Excel spreadsheets which mainly consist of numbers should be excluded. 

Relativity analytics will group email addresses together in clusters if email to / from / cc fields are included in the searchable set's saved search.    Words beginning with a number, for example 1st, are excluded from an analytics index.  
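Two of these points, skipping words that begin with a number and weeding out number-heavy spreadsheets, can be illustrated with a short Python sketch. Neither function is Relativity's tokenizer or filter: the word pattern, the number pattern, and the 50% threshold in `mostly_numeric` are hypothetical choices made for this example.

```python
import re

WORD = re.compile(r"\w+")
NUMBER = re.compile(r"[-+$]?\d[\d,]*(?:\.\d+)?%?")


def indexable_terms(text: str) -> list[str]:
    """Lowercased words, skipping any word that begins with a digit (such as '1st')."""
    return [word.lower() for word in WORD.findall(text) if not word[0].isdigit()]


def mostly_numeric(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical screen for number-heavy spreadsheets: True when more than
    `threshold` of whitespace-separated tokens look like numbers."""
    tokens = text.split()
    if not tokens:
        return False
    numeric = sum(1 for token in tokens if NUMBER.fullmatch(token))
    return numeric / len(tokens) > threshold


print(indexable_terms("The 1st quarter report, 2019 draft"))  # ['the', 'quarter', 'report', 'draft']
print(mostly_numeric("1,200 3.5 44 total"))  # True
```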

August 10, 2019

Relativity recommends that its analytics tools be used in particular ways in different document review scenarios.

1. Document Review with Time and Subject Matter Constraints

Relativity suggests using clustering in a situation where there are a large number of documents (more than 40K); little time to conduct document review; and no subject matter expert.   Follow these basic steps:

A. Batch documents grouped in clusters and assign them to reviewers, having each reviewer work on documents from a single cluster.  

B. Bulk code clusters of documents. 

C. Eliminate clusters of clearly irrelevant documents - junk emails, etc.

2. Finding Hot Documents

 If a client's production has been fully reviewed, issue coded, and hot documents have been flagged, Relativity recommends using categorization to find hot documents in an opposing production quickly. 

A. Create a categorization set.   Use the issue field for the client production to generate categories and select example documents.

B. Set the synchronization option for the categorization set. 

C. Use the categorization set to categorize the opposing production - Synchronization will automatically create categories based on the issue field choices and automatically designate example records.  

D. Opposing production documents similar to those in the examples will be automatically grouped together. 

3. Finding Privileged Documents

You can use Analytics to find privileged documents prior to production, if you've already located some privileged documents and designated some documents as responsive. 

A. Select the view created for the privilege log, and then right-click to select 'Find Similar Documents' in the document viewer.

B. When reviewing the similar documents, filter the Privilege field for 'Not Set'. 

4. Finding Unknown Relevant Terms

Keyword expansion can be used to find unknown relevant terms, if other keywords for a document set have already been determined. 

A. In the document viewer, right-click in the extracted text of a record and select Keyword Expansion to find conceptually related terms.

B. On the search panel select the Index Search as the condition, and then select an analytics index.  Enter one or more search terms and then click Expand to show a list of keywords which will each be assigned a rank value. 

The terms shown in the Conceptual Keyword Expansion dialog are hyperlinked and can be clicked on to run searches. 

August 4, 2019

Bear in mind that if you have 'Cluster' available in the mass operations menu in Relativity, it will not function if the admin has not enabled queries for at least one analytics index. 

If queries have not been enabled, then after you select a subfolder (or the top-level folder in the browser for all the documents in the workspace) and choose Cluster, Relativity will present you with a notice reading, "There are currently no indexes with queries enabled available for clustering". 

The analytics index must be edited and rerun with the queries enabled.   Refer to the console at the right when editing an individual index in the Indexing & Analytics . . . Analytics Index tab. 

 Queries are enabled after the index is populated and built, but before it is activated.   

May 20, 2019

In Relativity you can use the clustering conceptual analytics tool to create groups of conceptually similar documents without the need to define categories, or select a set of training documents.   Relativity will position documents in a conceptual index and then use a naming algorithm to label each node in groups of clusters.   Clustering is a good way to get an overview of a new data set. 

1. First go to the Documents tab and select the set you want to cluster.   A clustering operation can be run on a saved search, a folder, or the complete workspace.  Individual documents can be checked off.  

2. Choose 'Cluster' from the mass operations drop down menu at the bottom of the screen.  

3. You'll have the option to create a new cluster, or replace an existing cluster.  If you choose to create a new cluster, you will be prompted to name the cluster and select an index.  The index must have queries enabled, and all of the documents to be clustered must be covered by the index.  [Any documents that are not will be put in a cluster named 'Not Clustered'.  Documents without any searchable text will be placed in a separate group named, 'UNCLUSTERED'.  All documents in the workspace which are not submitted for clustering will be in a cluster named, 'Not set'.]

4. In the Title Format field, select one of the three options:

    a. Outline and Title - shows a number, title, document count in the cluster, and a coherence score. 

    b. Outline Only - number, document count in the cluster, and a coherence score. 

    c. Title Only - title, document count in the cluster, and a coherence score. 

The title will be limited to four words. 

5. Maximum Hierarchy Depth - this setting is for the number of cluster levels - between 1 and 5.  The default is 3.   When this value is greater than one, no more than 16 top level clusters will be created. 

6. Minimum Coherence - The lower the coherence value, the more loosely related the documents in a cluster will be.  When analytics finds documents below the coherence score, it will create subclusters.  The default setting is 0.7.  

7. Generality - determines the specificity of clusters at each level.   It should be set to a value between 0 and 1, 0.5 being the default.   A lower generality value will create 'tighter' and more numerous clusters.

8. The option for 'Create Cluster Score Field' will create a field storing a coherence score for each document.  If this option is checked, the operation will take significantly longer to complete.   

9. The cluster is created in the browser on the left.  Click on the asterisk icon.   
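The options in steps 4 through 8 can be summarized as a small settings object with range checks. This Python dataclass is a hypothetical sketch, not a Relativity API. The defaults for depth (3), minimum coherence (0.7) and generality (0.5) come from the steps above; the 0 to 1 range for minimum coherence and the unchecked default for the cluster score field are my own assumptions.

```python
from dataclasses import dataclass
from typing import Optional

TITLE_FORMATS = {"Outline and Title", "Outline Only", "Title Only"}


@dataclass
class ClusterSettings:
    """Hypothetical container mirroring the clustering options described above."""
    title_format: str
    max_hierarchy_depth: int = 3        # allowed range 1 to 5
    minimum_coherence: float = 0.7      # assumed to lie between 0 and 1
    generality: float = 0.5             # 0 to 1; lower means tighter, more numerous clusters
    create_cluster_score_field: bool = False  # assumed default; True makes the run slower

    def __post_init__(self) -> None:
        if self.title_format not in TITLE_FORMATS:
            raise ValueError(f"title_format must be one of {sorted(TITLE_FORMATS)}")
        if not 1 <= self.max_hierarchy_depth <= 5:
            raise ValueError("max_hierarchy_depth must be between 1 and 5")
        if not 0.0 <= self.minimum_coherence <= 1.0:
            raise ValueError("minimum_coherence must be between 0 and 1")
        if not 0.0 <= self.generality <= 1.0:
            raise ValueError("generality must be between 0 and 1")

    @property
    def max_top_level_clusters(self) -> Optional[int]:
        """16 when the hierarchy is deeper than one level; None means this rule doesn't apply."""
        return 16 if self.max_hierarchy_depth > 1 else None


settings = ClusterSettings("Outline and Title")
print(settings.max_hierarchy_depth, settings.minimum_coherence, settings.generality)  # 3 0.7 0.5
print(settings.max_top_level_clusters)  # 16
```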

The numbers shown for each cluster and subcluster give the total number of documents in the cluster (including its subclusters), followed by a second number listing the coherence score.   For example, a subcluster labeled '6.1.2 party, letter, collateral, paragraph' with 28 documents and a coherence of .91 is a group of very conceptually similar documents.   
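The outline number in a cluster label encodes where the cluster sits in the hierarchy: '6.1.2' is the second subcluster of 6.1, three levels deep. This Python sketch splits such a label, assuming the 'Outline and Title' style seen in the example above (an outline number, a space, then comma-separated title terms). The document count and coherence score are displayed separately and aren't parsed here, and the function name is hypothetical.

```python
def parse_cluster_label(label: str):
    """Split a label such as '6.1.2 party, letter, collateral, paragraph'.

    Returns the outline path as a tuple of ints, the title terms, and the depth.
    """
    outline, _, title = label.partition(" ")
    path = tuple(int(part) for part in outline.split("."))
    terms = [term.strip() for term in title.split(",") if term.strip()]
    return path, terms, len(path)


path, terms, depth = parse_cluster_label("6.1.2 party, letter, collateral, paragraph")
print(path, path[:-1], depth)  # (6, 1, 2) (6, 1) 3
print(terms)  # ['party', 'letter', 'collateral', 'paragraph']
```

With the default Maximum Hierarchy Depth of 3, a label like this one is at the lowest level of the hierarchy, and its four title terms match the four-word title limit.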

April 24, 2019

To set up an optimized index in a Relativity workspace follow these steps:

1. Create a saved search for files between 0 and 30 MB.

2.  Display only the extracted text field. 

3. Under Indexing & Analytics . . . Structured Analytics, create a new structured analytics set.    Enter a name and set prefix, select the saved search, and run a repeated content operation.    Keep the default settings in the 'Repeated Content Identification' section, except for the setting for 'Minimum Number of Occurrences'.  Enter a value equal to 0.5% of the documents in the saved search.    So for a saved search of 1,446 documents, we'll look for segments of between 10 and 100 words on 4 lines or fewer, within 16 lines of the bottom, which appear at least 7 times (0.005 × 1,446 ≈ 7).    [A different approach should be followed for a saved search of more than 100,000 documents.]

4. Click 'Run Structured Analytics' in the console.  Make the appropriate selection if you are supplementing the search with new documents. 

5. View the results of the repeated content operation.

6. Make note of the resulting text blocks which contain boilerplate language or non-authored content.  

7. Go to Indexing & Analytics . . . Analytics Indexes, and create a new index.   Use the saved search as both the training set and the searchable set. 

8.  Optimize the training set (to take out documents with only numbers, bad OCR etc.), remove English signatures and footers, and enable the email header filter.  

9. Add selected repeated content filters to the index. 

10. Finally click Populate Index: Full on the console, to populate and build the index. 
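The 'Minimum Number of Occurrences' value from step 3 is simple arithmetic. Here's a minimal Python sketch; rounding down matches the 1,446-document example, while the floor of 2 and the error above 100,000 documents are my own assumptions added for illustration, and the function name is hypothetical.

```python
import math


def minimum_occurrences(document_count: int, fraction: float = 0.005) -> int:
    """0.5% of the saved search's documents, rounded down.

    The floor of 2 is an added assumption: a segment has to occur at least
    twice to count as repeated content.
    """
    if document_count > 100_000:
        # The tip notes that a different approach applies above 100,000 documents.
        raise ValueError("use a different approach for more than 100,000 documents")
    return max(2, math.floor(document_count * fraction))


print(minimum_occurrences(1_446))  # 7
```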


Sean O'Shea has more than 15 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

 

All content provided on this blog is for informational purposes only. The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information.

 

This policy is subject to change at any time.