Apr 10, 2019

As discussed in the Tip of the Night for March 2, 2019, regular expressions can be used in repeated content filters to find disclaimers in the footers of email messages and other sentences that are repeated throughout a data set. These are removed from indexes in order to facilitate searching. Keep in mind that there are several different syntaxes of the regular expression language. Some of the most common are Perl; POSIX; the syntax used by the open source editor Vim; and the regex syntax used with Python.

So, for example, while a word boundary in a Perl or Python regex search (and in GNU tools that extend the POSIX syntax) is written as:

\b

in a Vim regex search the start and the end of a word are marked separately:

\< (start of word) and \> (end of word), for example /\<string\>

We may be familiar with using the pipe character | as an OR operator in regex searches, but in Vim, and in the GNU extension to POSIX basic regular expressions (BRE), it is written: \|
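To make the difference concrete, here is a minimal sketch in Python's re module, one of the flavors that uses \b and an unescaped pipe. The sample sentence is invented for illustration.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "The category includes cat and concatenate."

# \b anchors the match at word boundaries, so only the standalone word matches.
whole_word = re.findall(r"\bcat\b", text)

# Without \b the same letters are also found inside longer words.
anywhere = re.findall(r"cat", text)

# An unescaped pipe is the OR operator in Python (Vim would need \| instead).
either = re.findall(r"\b(?:cat|concatenate)\b", text)

print(whole_word)     # ['cat']
print(len(anywhere))  # 3
print(either)         # ['cat', 'concatenate']
```

The same searches typed into Vim would need \<cat\> for the whole-word match and \| between the alternatives.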

Relativity uses the java.util.regex.Pattern Java regex syntax, which is similar to Perl. Be sure to check your regex searches with an online tester such as https://www.regextester.com/ in order to confirm that they will find examples of what you hope to match.
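An online tester is one way to check a pattern; a short script is another. The sketch below uses Python's re module rather than Relativity's Java flavor, so treat it as an approximation: the simple constructs used here behave the same way in both, but more advanced syntax can differ. The disclaimer pattern, the sample footers, and the check function are all hypothetical.

```python
import re

# Hypothetical disclaimer pattern; not taken from any real filter.
pattern = re.compile(r"This (?:e-?mail|message) is confidential\b.*", re.IGNORECASE)

# Invented samples: text that should match, and text that should not.
should_match = [
    "This email is confidential and intended only for the addressee.",
    "THIS MESSAGE IS CONFIDENTIAL.",
]
should_not_match = [
    "Please keep this memo between us.",
]

def check(pattern, positives, negatives):
    """Return the samples where the pattern behaved unexpectedly."""
    failures = [s for s in positives if not pattern.search(s)]
    failures += [s for s in negatives if pattern.search(s)]
    return failures

failures = check(pattern, should_match, should_not_match)
print("all samples behave as expected" if not failures else failures)
```

Keeping a list of samples that should not match is as useful as the list that should, since an overly broad filter removes text you wanted indexed.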

PCRE stands for Perl Compatible Regular Expressions.

This is the fourth anniversary of Tip of the Night - Four Years of Tips - every single night!


 
 

Here's a demonstration of a regex search run in the grep utility PowerGREP that will collect the complete line of text on which any one of multiple search terms appears.

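In this regex search the strings are separated with pipes "|" and strings of multiple words are enclosed with quotes. Note that in most regex flavors a double quote is a literal character rather than a grouping operator, so the quoted alternative will match only where the phrase itself appears inside quotation marks; omit the quotes to match the bare phrase.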

^.*\b("Information Governance".*|Identification.*|Preservation|Collection.*)\b.*$
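For readers without PowerGREP, the same idea can be sketched in Python. This is an approximation under stated assumptions, not the search exactly as run above: the quotes around the multi-word phrase are dropped (Python treats them as literal characters), the trailing .* inside each alternative is removed so that only whole words match, and the sample text is invented.

```python
import re

# Hypothetical sample text; each line stands in for a line of a document.
text = """Stages of the EDRM include Information Governance and more.
Nothing of interest on this line.
The Preservation step follows.
Collection begins next week."""

# Alternation of the search terms. Spaces inside a term need no quoting.
# re.MULTILINE makes ^ and $ match at the start and end of each line.
pattern = re.compile(
    r"^.*\b(?:Information Governance|Identification|Preservation|Collection)\b.*$",
    re.MULTILINE,
)

# group(0) is the whole match, which here is the complete line.
matching_lines = [m.group(0) for m in pattern.finditer(text)]
for line in matching_lines:
    print(line)
```

The search is case-sensitive as written; passing re.IGNORECASE as well would also pick up lowercase occurrences of the terms.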

In PowerGREP set the Action Type to 'Collect Data'. Do not filter files and do not section files. The search type should be set to 'Regular Expression'.

In the Collect box enter '\0' to return the full text of each match, followed by a delimiter such as ~ and then %FILENAME%, so that the names of the source files are included in the collected text.

Be sure to have line breaks between collected text, and save the results to a single file. Make it a .csv file.

The resulting file can be separated into two columns in Excel, and you'll be able to easily parse out the data.
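The collect-and-split workflow can be approximated in Python as well. The sketch below is an illustration under assumptions, not a reproduction of PowerGREP's output: the file names and contents are hypothetical and held in memory rather than read from disk, and only two search terms are used to keep it short.

```python
import csv
import io
import re

# Hypothetical in-memory "files" (name -> text) standing in for files on disk.
files = {
    "memo1.txt": "Intro line.\nPreservation letters went out today.\n",
    "memo2.txt": "Collection is scheduled.\nNothing here.\n",
}

pattern = re.compile(r"^.*\b(?:Preservation|Collection)\b.*$", re.MULTILINE)

# Write one record per hit: the matched line, a ~ delimiter, the file name.
output = io.StringIO()
writer = csv.writer(output, delimiter="~", lineterminator="\n")
for name, text in files.items():
    for match in pattern.finditer(text):
        writer.writerow([match.group(0), name])

# Reading the records back with the same delimiter yields two columns.
rows = list(csv.reader(io.StringIO(output.getvalue()), delimiter="~"))
for line, name in rows:
    print(f"{name}: {line}")
```

A tilde works as a delimiter because it rarely appears in ordinary text; if a matched line did contain one, the csv module would quote that field so the columns still split correctly.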


 
 

You can use the following Regular Expression search to find individual lines of text which contain all of the designated words:

^(?=.*?\bzealous\b)(?=.*?\badvocates\b)(?=.*?\bloyalty\b).*$

This search will look for any line in a text file that contains all of the words 'zealous', 'advocates', and 'loyalty', in any order. Here's a demonstration using Notepad++.
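The same pattern runs unchanged in Python's re module, which makes it easy to see how the three lookaheads behave. The sample lines below are invented for illustration.

```python
import re

# Hypothetical sample lines, invented for this illustration.
text = """Lawyers are zealous advocates who owe loyalty to clients.
Their loyalty makes them advocates, always zealous.
Only advocates and loyalty appear here.
The overzealous advocates showed loyalty."""

# Each (?=...) lookahead checks for one word somewhere on the line without
# consuming any text, so the words may appear in any order.
# re.MULTILINE makes ^ and $ apply to every line rather than the whole string.
pattern = re.compile(
    r"^(?=.*?\bzealous\b)(?=.*?\badvocates\b)(?=.*?\bloyalty\b).*$",
    re.MULTILINE,
)

hits = [m.group(0) for m in pattern.finditer(text)]
for line in hits:
    print(line)
```

Only the first two lines are returned. The third is missing 'zealous', and the fourth fails because \b prevents 'zealous' from matching inside 'overzealous'.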


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea . Proudly created with Wix.com
