Analytics

set regex filters in Relativity structured analytics sets for pipeline text

Mar 31, 2023

Keep in mind that when you set a regular expression filter for a structured analytics set in Relativity, the filter is not run against the extracted text as it appears for a document in the Viewer. The Viewer displays the extracted text with its line breaks and whitespace, but the text is transformed once it is in the Analytics pipeline. The pipeline text uses the regex markers \r\n [carriage return and newline] in place of line breaks, and consolidates multiple blank spaces into a single space.

Extracted text may look like this:

. . . but the pipeline text will look like this:

There's just one long string of text in the analytics pipeline. So if you want to search for an email footer reading, "Under the General Data Protection Regulation GDPR 2016 679 we have a legal duty to protect any information we collect from you", accounting for varying GDPR sections, a regex filter for a structured analytics set . . .

. . . should be set like this:

. . . without the additional spaces before and after the relevant GDPR section.
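To make the effect of the pipeline transformation concrete, here is a minimal Python sketch. It only approximates the behavior described above and is not Relativity's own code; the function name to_pipeline_text, the sample text, and the pattern are all hypothetical, and the regex dialect Relativity applies may differ from Python's in its details. The pattern is not the one from the original example; it simply assumes the GDPR section is written as groups of digits separated by single spaces.

```python
import re

def to_pipeline_text(extracted: str) -> str:
    """Approximate the pipeline transformation (an assumption, not Relativity's code)."""
    # Represent every line break as \r\n.
    text = re.sub(r"\r\n|\r|\n", "\r\n", extracted)
    # Consolidate runs of spaces or tabs into a single space.
    return re.sub(r"[ \t]+", " ", text)

# Hypothetical pattern: exactly one space between tokens, with the section
# written as one or more groups of digits (for example "2016 679").
GDPR_FOOTER = re.compile(
    r"Under the General Data Protection Regulation GDPR \d+(?: \d+)* "
    r"we have a legal duty to protect any information we collect from you"
)

# Made-up extracted text with the extra spacing a Viewer might display.
extracted = (
    "Regards,\n"
    "Jane\n"
    "\n"
    "Under the General Data Protection Regulation   GDPR   2016  679   "
    "we have a legal duty to protect any information we collect from you"
)
pipeline = to_pipeline_text(extracted)

print(repr(pipeline))
print(bool(GDPR_FOOTER.search(extracted)))  # spacing as displayed
print(bool(GDPR_FOOTER.search(pipeline)))   # spacing after consolidation
```

In this sketch the single-space pattern finds the footer only in the consolidated text, which is why a filter written against what you see in the Viewer can miss.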

If you want to filter out multiple disclaimers added to email footers, as well as Bates numbers from more than one party's production, you'll need to craft a single regex that accounts for all of these targeted terms. No more than one regex filter can be applied to a structured analytics set.
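One way to build such a single expression is to wrap each target in a non-capturing group and join the groups with alternation. The Python sketch below illustrates the idea under stated assumptions: the disclaimer wording and the Bates prefixes (ABC, XYZ_) are invented for illustration, and the syntax should be checked against the regex dialect Relativity uses before the combined expression is pasted into a filter.

```python
import re

# Every target below is a hypothetical example, not a real disclaimer or Bates prefix.
targets = [
    r"This message and any attachments are confidential\.",
    r"Under the General Data Protection Regulation GDPR \d+(?: \d+)* we have a legal duty",
    r"ABC\d{7}",   # hypothetical Bates format from one party
    r"XYZ_\d{6}",  # hypothetical Bates format from another party
]

# Wrap each target in a non-capturing group and join with alternation,
# so one expression covers all of the targeted terms.
combined = "|".join(f"(?:{t})" for t in targets)
single_filter = re.compile(combined)

sample = (
    "Produced as ABC0001234 and XYZ_000042. "
    "This message and any attachments are confidential."
)
# Removing the matches mimics filtering the terms out of the text.
cleaned = single_filter.sub("", sample)

print(combined)
print(repr(cleaned))
```

The non-capturing groups keep each alternative self-contained, so adding another disclaimer or Bates format is a matter of appending one more entry to the list.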

Distinguishing between cluster coherence scores and document object cluster scores

Dec 26, 2021

When running the Cluster mass operation in Relativity you'll see an option labeled, 'Create Cluster Score Field'.

If this box is checked, the Analytics engine will generate a field showing each document object's coherence score within its cluster. This is distinct from the coherence score shown in the field tree for a cluster, which indicates how conceptually similar all of the documents in the cluster are. The document object's cluster score shows how close the document is to the middle of the cluster.

A coherence score will be a value between 0.0 and 1.0, with 1.0 indicating the greatest possible similarity and 0.0 indicating the lowest end of coherence for a cluster.

The cluster score field is named in the format 'Cluster :: [ClusterName] :: Score'. In this example, a set of documents has scores between 0.0 and 1.0 in the field named 'Cluster :: custodians :: Score'. The document AZIPPER_0007291 is in the cluster named '02 gas, spread, heat, charts', then in the subcluster '2.2 energy, power, trading, curve' and the sub-subcluster '2.2.1 energy, curve, futures, nat ', and it has a score of 0.64, indicating how well it represents the cluster's overall concept. The coherence score for the cluster is 0.53, which shows how conceptually similar all 517 documents in the cluster are.
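If the score field is exported for review, the document-level scores can be ranked to see which documents sit closest to the middle of a cluster and which are outliers. The sketch below is only an illustration of reading the scores: the function names are hypothetical, and apart from AZIPPER_0007291 and its 0.64 score, the control numbers and values are made up.

```python
# Hypothetical values for the 'Cluster :: custodians :: Score' field.
# Only AZIPPER_0007291 and 0.64 come from the example above; the rest are invented.
scores = {
    "AZIPPER_0007291": 0.64,
    "DOC_A": 0.91,
    "DOC_B": 0.22,
}

def rank_by_score(doc_scores):
    """Return (document, score) pairs, most representative of the cluster first."""
    for doc_id, score in doc_scores.items():
        # Scores are expected to fall between 0.0 and 1.0.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {doc_id}: {score}")
    return sorted(doc_scores.items(), key=lambda pair: pair[1], reverse=True)

def below_threshold(doc_scores, threshold):
    """Return documents whose score falls under an arbitrary review threshold."""
    return [doc_id for doc_id, score in doc_scores.items() if score < threshold]

for doc_id, score in rank_by_score(scores):
    print(f"{doc_id}: {score:.2f}")

# 0.5 is an arbitrary cutoff chosen for illustration.
print(below_threshold(scores, 0.5))
```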

to filter out email headers, don't remove footers

Dec 13, 2021

When creating an analytics index in Relativity, keep in mind that the email header filter option is only enabled when the setting 'Remove english signatures and footers' is set to 'No'. If you want to remove the email header information, you'll also have to set the option to remove email signatures to 'No'. With signatures and footers kept, the email header filter can be set to either 'Yes' or 'No'.

The email header filter will remove the to, from, date, and reply indicator fields, but not the subject line.

LITIGATION SUPPORT TIP OF THE NIGHT

New tips for paralegals and litigation support professionals are posted to this site each week. Click on the blog headings for better detail.

