
Keep in mind that when you set a regular expression filter for a structured analytics set in Relativity, the regex filter is not run against the extracted text as you see it for a document in the Viewer. Although the Viewer displays extracted text with line breaks and whitespace, that text is transformed in the Analytics pipeline. The pipeline replaces line breaks with carriage return and newline markers (\r\n in regex notation), and it consolidates runs of multiple spaces into a single space.
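To make the transformation concrete, here is a minimal Python sketch of the two changes described above: line breaks become \r\n, and multiple spaces collapse to one. `to_pipeline_text` is a hypothetical helper that approximates, rather than reproduces, what the Analytics pipeline does, and the sample text is made up.

```python
import re

def to_pipeline_text(extracted: str) -> str:
    """Approximate how extracted text appears in the Analytics pipeline.

    Hypothetical helper: assumes every line break becomes a carriage
    return + newline pair and runs of spaces collapse to one space.
    """
    # Normalize every line-break style (\r\n, \r, \n) to \r\n.
    text = re.sub(r"\r\n|\r|\n", "\r\n", extracted)
    # Collapse two or more consecutive spaces into a single space.
    return re.sub(r" {2,}", " ", text)


viewer_text = "Thanks,\nJohn    Smith\n\nUnder the   General Data Protection Regulation"
print(repr(to_pipeline_text(viewer_text)))
# 'Thanks,\r\nJohn Smith\r\n\r\nUnder the General Data Protection Regulation'
```

The practical takeaway is that a filter pattern should be written against this flattened string, not against the layout shown in the Viewer.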


Extracted text may look like this:

. . . but the pipeline text will look like this:


There's just one long string of text in the analytics pipeline. So if you want to search for an email footer reading, "Under the General Data Protection Regulation GDPR 2016 679 we have a legal duty to protect any information we collect from you", while accounting for varying GDPR sections, the regex filter for a structured analytics set should use a single space before and after the part of the pattern that matches the GDPR section, without any additional spaces.
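As a minimal sketch of such a pattern, assuming the footer's section reference always appears as two numbers separated by a space (as in "GDPR 2016 679"), the filter could look like the following. Python's `re` module is used here only to demonstrate the match; the pattern sticks to syntax (`\d`, `+`, literal spaces) that common regex flavors share, but it should still be checked against the regex engine Relativity actually uses. `GDPR_FOOTER` and the sample text are illustrative.

```python
import re

# Hypothetical filter pattern: one literal space on each side of the
# variable GDPR section (\d+ \d+), since the pipeline has already
# collapsed any runs of spaces to a single space.
GDPR_FOOTER = re.compile(
    r"Under the General Data Protection Regulation GDPR \d+ \d+ "
    r"we have a legal duty to protect any information we collect from you"
)

# Illustrative pipeline text: one long string with \r\n markers.
pipeline_text = (
    "Please see the attached.\r\n"
    "Under the General Data Protection Regulation GDPR 2016 679 "
    "we have a legal duty to protect any information we collect from you"
)

print(bool(GDPR_FOOTER.search(pipeline_text)))  # True
```

Because `\d+` matches any run of digits, the same pattern also catches footers that cite a different GDPR section number.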


If you want to filter out multiple disclaimers added to email footers, as well as Bates numbers from more than one party's production, you'll need to craft a single regex that accounts for all of these targeted terms, because no more than one regex filter can be applied to a structured analytics set.
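One common way to fold several targets into one expression is alternation with `|`. The sketch below assumes hypothetical disclaimer text and hypothetical Bates formats (`ABC` plus six digits, `XYZ-` plus seven digits); substitute the patterns that actually occur in your data. Python is used only to build and test the combined string.

```python
import re

# Hypothetical component patterns; substitute the disclaimers and
# Bates formats that actually appear in your data.
components = [
    r"Under the General Data Protection Regulation GDPR \d+ \d+ "
    r"we have a legal duty to protect any information we collect from you",
    r"This e-mail and any attachments are confidential",  # hypothetical disclaimer
    r"ABC\d{6}",   # hypothetical Bates format, first producing party
    r"XYZ-\d{7}",  # hypothetical Bates format, second producing party
]

# Wrap each piece in a non-capturing group and join with "|" so one
# expression matches any of the targeted strings.
combined = "|".join(f"(?:{p})" for p in components)
combined_re = re.compile(combined)

sample = "See ABC000123 and XYZ-0004567.\r\nThis e-mail and any attachments are confidential"
print(combined_re.findall(sample))
# ['ABC000123', 'XYZ-0004567', 'This e-mail and any attachments are confidential']
```

Printing `combined` gives the single expression that would go into the filter. Wrapping each piece in `(?:...)` keeps one component's syntax from bleeding into its neighbors.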


When running the Cluster mass operation in Relativity, you'll see an option labeled 'Create Cluster Score Field'.


If this box is checked, the Analytics engine generates a field that shows each document's coherence score within its cluster. This document-level score shows how far the document is from the center of the cluster: the higher the score, the closer the document is to the center. It is different from the coherence score shown for a cluster in the field tree, which indicates how conceptually similar all of the documents in that cluster are.


A coherence score is a value between 0.0 and 1.0, with 1.0 indicating the greatest possible similarity and 0.0 indicating the lowest possible coherence for a cluster.
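This post doesn't describe how the Analytics engine actually computes these scores. As a purely illustrative analogue, assuming each document is represented as a numeric concept vector, a per-document score can be modeled as cosine similarity to the cluster's centroid (its "middle"), clipped to the 0.0 to 1.0 range. A cluster-level score can be modeled as the mean of those per-document scores. All function names and the toy vectors are hypothetical, and this is not Relativity's algorithm.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean: the 'middle' of the cluster."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def document_scores(vectors):
    """Hypothetical per-document score: similarity to the centroid, clipped to [0, 1]."""
    center = centroid(vectors)
    return [min(1.0, max(0.0, cosine(v, center))) for v in vectors]

def cluster_coherence(vectors):
    """Hypothetical cluster-level score: mean of the per-document scores."""
    scores = document_scores(vectors)
    return sum(scores) / len(scores)

cluster = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy 2-D "concept" vectors
print(document_scores(cluster), cluster_coherence(cluster))
```

In this toy cluster the outlier `[0.0, 1.0]` gets the lowest score, which mirrors the idea that a lower score means a document sits farther from the cluster's center.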


The cluster score field is named in the format 'Cluster :: [ClusterName] :: Score'. For example, in a field named 'Cluster :: custodians :: Score', each document has a score between 0.0 and 1.0. The document AZIPPER_0007291 is in the cluster named '02 gas, spread, heat, charts', within the subcluster '2.2 energy, power, trading, curve' and the sub-subcluster '2.2.1 energy, curve, futures, nat '. It has a score of 0.64, indicating how well it represents the cluster's overall concept. The coherence score for the cluster is 0.53, which shows how conceptually similar all 517 documents in the cluster are.





When creating an analytics index in Relativity, keep in mind that the email header filter option is only enabled when the 'Remove English signatures and footers' setting is set to 'No'. If you want to remove email header information, you'll also have to set the option to remove signatures and footers to 'No'. When signatures and footers are kept, you can set the email header filter to either 'Yes' or 'No'.
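The dependency between the two settings can be summarized as a small validation rule. The sketch below is a hypothetical model written only to illustrate that rule. `AnalyticsIndexOptions`, its fields, and `validate` are invented names, not part of any Relativity API.

```python
from dataclasses import dataclass

@dataclass
class AnalyticsIndexOptions:
    """Hypothetical model of two index options; not a Relativity API."""
    remove_signatures_and_footers: bool
    email_header_filter: bool = False

    def header_filter_enabled(self) -> bool:
        # The header filter is only available when signatures and
        # footers are NOT being removed.
        return not self.remove_signatures_and_footers

    def validate(self) -> None:
        if self.email_header_filter and not self.header_filter_enabled():
            raise ValueError(
                "Email header filter requires "
                "'Remove English signatures and footers' = No"
            )

# Valid: signatures and footers kept, header filter on or off.
AnalyticsIndexOptions(remove_signatures_and_footers=False, email_header_filter=True).validate()
AnalyticsIndexOptions(remove_signatures_and_footers=False, email_header_filter=False).validate()
```

The only combination this model rejects is removing signatures and footers while also asking for the email header filter.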


The email header filter removes the To, From, Date, and reply indicator fields, but not the Subject line.
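Here is a rough, line-based approximation of that behavior, purely as an illustration. The exact lines the Analytics engine treats as header fields or reply indicators are assumptions here: the `REMOVED` patterns and `filter_email_headers` are hypothetical, and the sample email is made up.

```python
import re

# Hypothetical patterns: which lines count as header fields or reply
# indicators is an assumption, not Relativity's actual rule set.
REMOVED = [
    re.compile(r"^\s*(To|From|Date):", re.IGNORECASE),
    re.compile(r"^\s*-+\s*Original Message\s*-+\s*$", re.IGNORECASE),
    re.compile(r"^\s*On .+ wrote:\s*$"),
]

def filter_email_headers(text: str) -> str:
    """Drop To/From/Date lines and reply indicators; keep Subject and body."""
    kept = [
        line for line in text.splitlines()
        if not any(p.match(line) for p in REMOVED)
    ]
    return "\r\n".join(kept)

email = (
    "-----Original Message-----\r\n"
    "From: Jane Doe\r\n"
    "To: John Roe\r\n"
    "Date: Monday, March 4, 2019\r\n"
    "Subject: Q1 gas curve\r\n"
    "Please see the attached curve."
)
print(filter_email_headers(email))
```

The Subject line survives because none of the patterns match it, which reflects the behavior described above.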






Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea . Proudly created with Wix.com
