  • Oct 13, 2019

Structured Analytics in Relativity allows you to run name normalization in order to identify the aliases of email addresses and the groups those email addresses belong to.

Name normalization splits the aliases in email headers on semicolons. It looks for names listed with email addresses in these familiar formats:

"Lincoln, Abraham" <abraham.lincoln@whitehouse.gov>

'Lincoln, Abraham' <abraham.lincoln@whitehouse.gov>

Lincoln, Abraham <abraham.lincoln@whitehouse.gov>

'Lincoln, Abraham' [abraham.lincoln@whitehouse.gov]

Lincoln, Abraham [abraham.lincoln@whitehouse.gov]

For any one of these examples, Relativity will associate three aliases with the same entity. For example:

1. "Lincoln, Abraham" <abraham.lincoln@whitehouse.gov>

2. Lincoln, Abraham

3. abraham.lincoln@whitehouse.gov
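To make the parsing step concrete, here is a minimal Python sketch of how a header could be split on semicolons and how one entry could be broken into the three aliases above. This is not Relativity's code. The pattern `ALIAS_RE` and the functions `split_header` and `aliases_for` are hypothetical names made up for illustration, and the sketch only covers the five formats listed in this post.

```python
import re

# Hypothetical pattern (not Relativity's): an optional single or double
# quote around the name, then the address in <angle> or [square] brackets.
ALIAS_RE = re.compile(
    r"""^\s*(?P<q>["']?)(?P<name>[^"'<>\[\]]+?)(?P=q)\s*
        (?:<(?P<angle>[^<>\s]+@[^<>\s]+)>|\[(?P<square>[^\[\]\s]+@[^\[\]\s]+)\])\s*$""",
    re.VERBOSE,
)


def split_header(header):
    """Split a header value into entries on semicolons (names contain commas)."""
    return [part.strip() for part in header.split(";") if part.strip()]


def aliases_for(entry):
    """Return [full entry, name, address] for a recognized format.

    An entry that does not match any listed format is returned as its
    own single alias (an assumption of this sketch).
    """
    entry = entry.strip()
    match = ALIAS_RE.match(entry)
    if match is None:
        return [entry]
    name = match.group("name").strip()
    address = match.group("angle") or match.group("square")
    return [entry, name, address]
```

Calling `aliases_for` on `"Lincoln, Abraham" <abraham.lincoln@whitehouse.gov>` gives back the full entry, `Lincoln, Abraham`, and `abraham.lincoln@whitehouse.gov`, which are the three aliases listed above.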

Relativity will also join an email address to an existing entity when it is listed with a name identical to one already set up under a different email address. So:

"Lincoln, Abraham" <abraham.lincoln@illinois.com>

. . . will be joined to the same entity as "Lincoln, Abraham" <abraham.lincoln@whitehouse.gov>.
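The joining rule can be pictured as grouping on the name. The Python sketch below assumes the headers have already been parsed into (name, address) pairs and that names must match exactly. The function `group_by_name` is a hypothetical helper for illustration, not part of Relativity.

```python
from collections import defaultdict


def group_by_name(pairs):
    """Group (name, address) pairs into entities keyed by the exact name.

    Each entity is the set of aliases seen for that name: the name
    itself plus every address listed with it.
    """
    entities = defaultdict(set)
    for name, address in pairs:
        entities[name].update({name, address})
    return dict(entities)
```

With the two Lincoln addresses from this post, the result is a single entity holding the name and both addresses.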

Relativity will also perform segment matching to help associate email aliases with one another. Segment matching reviews emails sent on the same date with the same body to see whether the email addresses in their header fields can be joined to the same entity.
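As a rough illustration of the idea behind segment matching, the Python sketch below buckets emails by date and body and reports the buckets where more than one sender alias shows up. How Relativity actually decides to join aliases is not covered here, so treat this only as a way of finding candidates. The dictionary keys `date`, `body` and `sender` and the function `candidate_groups` are assumptions made for the example.

```python
from collections import defaultdict


def candidate_groups(emails):
    """Return sets of sender aliases that share the same date and body.

    emails: iterable of dicts with the hypothetical keys
    'date', 'body' and 'sender'.
    """
    buckets = defaultdict(set)
    for email in emails:
        buckets[(email["date"], email["body"])].add(email["sender"])
    # Only buckets with two or more distinct aliases are worth a look.
    return [senders for senders in buckets.values() if len(senders) > 1]
```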

Relativity recommends using a separate structured analytics set for name normalization.


 
 

When performing email threading in Relativity, if the Analytics profile doesn’t specify email header fields and there is text present before the email headers of the most recent email, Relativity will identify that text as a reply and will not correctly determine the primary email.


 
 

When creating a saved search for a searchable set (as opposed to a training set) to be used with an analytics index in Relativity (whether a classification index for active learning, or a conceptual index for clustering or categorization), be sure to follow these guidelines:

1. Index only the 'authored' parts of documents - not system metadata.

2. If more than just the Extracted Text field is used as a column in a saved search, then try to use as few additional fields as possible. It may also be appropriate to include translated text.

3. Single choice, multiple choice, and multiple object fields should not be included.

4. Exclude zip files, system files, graphic and image files.

5. Excel spreadsheets which mainly consist of numbers should be excluded.

Relativity analytics will group email addresses together in clusters if email to / from / cc fields are included in the searchable set's saved search. Words beginning with a number, for example 1st, are excluded from an analytics index.
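The rule about words beginning with a number can be pictured with a short Python sketch. It assumes text is simply split on whitespace, which may not be how Relativity tokenizes, and the function `indexable_words` is a hypothetical name used only for illustration.

```python
def indexable_words(text):
    """Keep only the words that do not begin with a digit.

    Splitting on whitespace is an assumption of this sketch.
    """
    return [word for word in text.split() if not word[0].isdigit()]
```

For the text "the 1st draft of 2019 budget", this keeps "the", "draft", "of" and "budget" and drops "1st" and "2019".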


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

If you have a question or comment about this blog, please make a submission using the form to the right. 


© 2015 by Sean O'Shea . Proudly created with Wix.com
