Litigation Support Tip of the Night

July 2, 2020

Relativity has an annoying habit of prepending numbers to the names of PDFs that you download from a workspace.  Files end up named with a 1, 2, or 3 digit number, followed by an underscore and then the document identifier or other field selected to name the file.  E.g.:

7_ABC002276411

17_IBM080414121

134_GE0005425444

Using Bulk Rename Utility you can use this regular expression search to remove the numbering, so the files are only named with the Bates numbers:

([0-9]{1}_|[0-9]{2}_|[0-9]{3}_)(.*)

In the replace box simply enter:

\2

The first set of parentheses matches the number and the underscore.  The second set of parentheses matches any number of characters that follow, and the \2 in the replace box keeps only what that second set matched.
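If you would rather test the expression outside of Bulk Rename Utility first, here is a minimal Python sketch of the same search and replace.  The function name strip_relativity_prefix is my own invention for illustration, and the pattern is anchored to the start and end of the name, which is an assumption about how you would want it to behave in a script.

```python
import re

# Same expression as above, anchored to the start and end of the file name.
PREFIX_PATTERN = re.compile(r"^([0-9]{1}_|[0-9]{2}_|[0-9]{3}_)(.*)$")


def strip_relativity_prefix(name: str) -> str:
    """Return the name without a leading 1 to 3 digit number and underscore."""
    match = PREFIX_PATTERN.match(name)
    # Group 2 is the equivalent of \2 in the replace box.
    return match.group(2) if match else name


for name in ["7_ABC002276411", "17_IBM080414121", "134_GE0005425444"]:
    print(strip_relativity_prefix(name))
```

Names that do not start with a 1 to 3 digit number and an underscore are returned unchanged.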

June 11, 2020

Large Relativity databases can either be cached in active memory, or bring up records from storage with high I/O throughput (data transfer speed in megabytes per second).   A SQL Server table for a Relativity workspace will take up approximately 50 GB of disk space for a million documents.   SQL Server has RAM limits: the Standard edition of SQL Server 2016, for example, is limited to 128 GB of RAM.  Caching the database in active memory will lead to better performance.  This is one of the key reasons why an admin may separate productions from a single case into multiple workspaces. 
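To see how quickly a workspace outgrows the cache, here is a rough back-of-the-envelope sketch in Python using only the two figures quoted above.  The function names are hypothetical, and the calculation ignores everything else that competes for memory on the server, so treat it as an estimate rather than a sizing tool.

```python
GB_PER_MILLION_DOCS = 50   # approximate disk space per million documents, as quoted above
RAM_LIMIT_GB = 128         # RAM limit quoted above for SQL Server 2016


def estimated_table_size_gb(document_count: int) -> float:
    """Estimate the size of a workspace table from its document count."""
    return document_count / 1_000_000 * GB_PER_MILLION_DOCS


def fits_in_ram(document_count: int, ram_limit_gb: float = RAM_LIMIT_GB) -> bool:
    """True if the estimated table size is within the RAM limit."""
    return estimated_table_size_gb(document_count) <= ram_limit_gb


for docs in (1_000_000, 2_500_000, 3_000_000):
    print(docs, estimated_table_size_gb(docs), fits_in_ram(docs))
```

On these numbers a workspace of about 2.5 million documents is already at roughly 125 GB, which is why splitting a case across workspaces can help.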

Hardware called Fast Track Data Warehouse can be used to provide the necessary I/O throughput if an entire dataset cannot be cached, but it is expensive and difficult to implement.  It is also possible to store SQL Server's TempDB on solid state or flash drives. TempDB is the SQL Server system database which stores temporary tables and indexes. 

May 12, 2020

Relativity Data Transfer uses integration points to transfer saved searches, folders, and productions from one Relativity workspace to another workspace or to a load file.  Integration Points can also substitute for the Relativity Desktop Client when importing load files or Office 365 directories into a workspace. 

More than one instance of Relativity can be accessed with an integration point.   An integration point also allows individual fields from the source and destination to be mapped to one another. 

May 3, 2020

In a recent Relativity webinar, it was contended that when performing clustering on a saved search, a generality setting of 0.5 creates 8 top-level clusters in the example data set.   The instructor noted that this setting is not guaranteed to generate 8 on all data sets.  Generality at 0.9 creates 4 top-level clusters in the same data set.  

In a very general way, this seems to be borne out in my own Relativity sandbox workspace.   Clustering a saved search at 0.5 generality doesn't create 8 top-level clusters in my data set, but it does create six large top-level clusters, plus several other top-level clusters which might be grouped together to form two additional top-level clusters of similar size. 

A generality setting of 0.9 won't create 4 top-level clusters in my completely different data set, but it does create four top-level clusters clearly larger than the others. 

It's a rough rule of thumb, but not an entirely useless one.  However, my test clusters seem to refute the general rule that higher generality settings will lead to fewer top-level clusters.

April 21, 2020

Here's a follow-up to the Tip of the Night for April 19, 2020.

After the alphabet file of a dtSearch index is updated by removing a symbol from the [Ignore] subsection and adding it to the [Letters] subsection, you can search for many symbols without running a Regex search.   All of these symbols can be searched for directly:

$     @     \     /     &     +     ,     .     ;     -     '     `     !     <     >     {     }     ^     _     {     }     |

These characters can be searched for with a simplified Regex search that does not reference the Unicode value:

?     *     (     )     #    =

For example: "##why\?"

The Unicode value need only be referenced for these characters:

"     %     &     :     ~

Some symbols are not part of the ASCII code range.  Two common examples are the UK pound and section symbols: 

£    §

A special edit must be made to the alphabet file of a dtSearch index for these symbols.  A new AdditionalLetters subsection must be created and the Unicode value should be listed for each symbol.  

It will then be possible to search for these symbols directly. 
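If you need to look up the Unicode value for a symbol, a few lines of Python will do it.  This is only a lookup helper of my own (describe_symbol is a made-up name); it does not show the layout of the AdditionalLetters subsection itself.

```python
def describe_symbol(symbol: str) -> str:
    """Report a symbol's Unicode code point and whether it falls in the ASCII range."""
    code_point = ord(symbol)
    kind = "ASCII" if code_point < 128 else "non-ASCII"
    return f"{symbol} U+{code_point:04X} ({kind})"


for symbol in "£§%":
    print(describe_symbol(symbol))
```

The pound and section symbols come back as U+00A3 and U+00A7, both outside the ASCII range, while the percent symbol is U+0025.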

April 19, 2020

You can modify a dtSearch index to make it possible to search for symbols in a Relativity workspace.  

1. Under Index Admin, select Search Indexes.  Edit a dtSearch index. 

2. Scroll down to the Alphabet section and below where it says "[Letters] // Original letter, lower case, upper case, unaccented" enter the symbol you want to search for four times, preceded by a space each time.  Begin with a leading blank space but do not end with a blank space.   For example: % % % %

3. Scroll down within the alphabet section and under the [Ignore] subsection remove the symbol:

. . . some symbols may be in the [Spaces] or [Hyphens] subsections. 

4. Rebuild the dtSearch index. 

If you're searching for a percent symbol % you need to run a RegEx search in the dtSearch index using this format:

"##75\u0025"

. . . the u0025 signifies the Unicode character for the percent symbol (%).
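Here is a small Python sketch that builds a search string in the same format for any symbol, so you don't have to look up the hex value by hand.  The helper names are hypothetical, and it simply mirrors the format of the example above (quotes, a leading ##, then the text and the \u value); I have only checked that it reproduces that example.

```python
def unicode_escape(symbol: str) -> str:
    # Four hex digits after a backslash and u, e.g. 0025 for the percent symbol.
    return f"\\u{ord(symbol):04x}"


def dtsearch_regex_term(text: str, symbol: str) -> str:
    # Mirrors the quoted ## format of the example above.
    return f'"##{text}{unicode_escape(symbol)}"'


print(dtsearch_regex_term("75", "%"))
```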

April 17, 2020

Don't miss that in Relativity you can run a dtSearch which will exclude documents where a search term appears in a longer phrase.  Use the NOT operator together with the proximity operator set to zero.

Simply follow the search term with:  NOT w/0 

. . . and then enter the phrase or phrases you don't want to appear in the search results. 

For example, a search of this kind will turn up documents where 'confidential' appears in the body of an email or document, but not where it appears only in a disclaimer frequently found at the bottom of an email. 

April 2, 2020

You can improve the performance of Relativity analytics operations by increasing the amount of RAM available for Java.  The Java heap is the amount of memory allocated to the Java Virtual Machine (JVM).  The JVM is a program that executes other programs written in Java. 

Relativity recommends that servers which are used for indexing and structured analytics allocate about 50% of available RAM to Java.  This setting should be increased to 75% if only structured analytics is being performed.  Only a third of total RAM should be assigned to the Java heap if a server is used only for indexing.  

A server will need an amount of RAM in bytes equal to 6000 times the number of documents involved in an index build.  So, a training set of a million documents for an index build will require 6 GB of RAM.   The same equation applies for calculating the Java heap size needed for a structured analytics set. 
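The arithmetic above can be sketched in Python as follows.  The role names and the 64 GB server are my own assumptions for illustration; the percentages and the 6000 bytes per document figure are the ones given above.

```python
BYTES_PER_DOCUMENT = 6000  # RAM needed per document in an index build, as quoted above

# Share of RAM to allocate to the Java heap, by what the server is used for.
HEAP_FRACTION = {
    "indexing_and_structured_analytics": 0.50,
    "structured_analytics_only": 0.75,
    "indexing_only": 1 / 3,
}


def ram_needed_gb(document_count: int) -> float:
    """RAM in GB (1 GB taken as 1,000,000,000 bytes) for a given document count."""
    return document_count * BYTES_PER_DOCUMENT / 1_000_000_000


def java_heap_gb(total_ram_gb: float, server_role: str) -> float:
    """Java heap size in GB for a server with the given RAM and role."""
    return total_ram_gb * HEAP_FRACTION[server_role]


print(ram_needed_gb(1_000_000))                       # a million documents
print(java_heap_gb(64, "structured_analytics_only"))  # hypothetical 64 GB server
```

A million documents works out to 6 GB, matching the figure above, and a 64 GB server used only for structured analytics would get a 48 GB heap.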

March 5, 2020

Relativity Collect is an application for RelativityOne, which can assist with the collection of email and documents from Office 365.  When collecting data directly from OneDrive or Outlook, the admin must first set a custodian. 

At the second stage, search criteria are designated.  

While wildcard and proximity searches cannot be run, any words which begin with a keyword will be returned in the results.  For example, the keyword 'court' will return files containing 'courthouse' or 'courtroom'.  Multiple criteria can be used.  The Office 365 index of electronic files is searched rather than the files themselves.  
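To illustrate the 'begins with' behavior described above, here is a toy Python sketch.  It is not how the Office 365 index works internally; it only imitates the effect of matching any word that starts with the keyword, and matches_keyword is a made-up name.

```python
def matches_keyword(keyword: str, text: str) -> bool:
    """True if any word in the text begins with the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return any(word.lower().startswith(keyword) for word in text.split())


print(matches_keyword("court", "Meet me at the courthouse at noon"))
print(matches_keyword("court", "Security will escort the visitor"))
```

The first line matches because 'courthouse' begins with 'court'.  The second does not, since 'escort' only contains the keyword in the middle of the word.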

Searches in OneDrive can be set for specific file extensions and files created in a particular date range.   Searches can be set on the file name and file path fields as well.

The standard email metadata fields can also be searched, as well as the body of the email.  The attachment content cannot be searched, but there is a setting to return emails with or without attachments. 

February 7, 2020

Relativity announced this month that it expects to achieve the FedRAMP cloud security authorization later this year.   As discussed in the Tip of the Night for April 18, 2018,  the FedRAMP program provides a standardized approach to security and risk assessment. 

A new Relativity platform, RelativityOne Government, was designed for FedRAMP certification.  

Sean O'Shea has more than 15 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

 

All content provided on this blog is for informational purposes only. The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information.

 

This policy is subject to change at any time.