Regular Expressions

RegEx Syntax

Apr 10, 2019

As discussed in the Tip of the Night for March 2, 2019, regular expressions can be used in repeated content filters to find disclaimers in the footers of email messages and other sentences that are repeated throughout a data set. These are removed from indexes in order to facilitate searching. Keep in mind that there are several different syntaxes of the regular expression language. Some of the most common are Perl; POSIX; the syntax used by the open source editor Vim; and the regex syntax used with Python.
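To make the idea of a repeated content filter concrete, here is a minimal sketch in Python. The disclaimer wording, the pattern, and the function name strip_disclaimer are all made up for illustration; a real filter would be written against the footers actually found in the data set, and this is not how Relativity itself implements its filters.

```python
import re

# Hypothetical disclaimer wording; real footers vary from one data set to another.
DISCLAIMER = re.compile(
    r"This e-?mail and any attachments are confidential\..*?delete it\.",
    re.IGNORECASE | re.DOTALL,
)


def strip_disclaimer(text: str) -> str:
    """Remove every occurrence of the repeated disclaimer from text."""
    return DISCLAIMER.sub("", text).rstrip()


message = (
    "Please see the attached draft.\n\n"
    "This email and any attachments are confidential. If you received it\n"
    "in error, please notify the sender and delete it."
)
print(strip_disclaimer(message))
```

The DOTALL flag lets the pattern span the line break inside the footer, and the lazy .*? stops at the first "delete it." so that only the footer is removed.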

So, for example, while a word boundary in a Perl or Python regex search (and in GNU tools that extend POSIX syntax) is written as:

\b

in a Vim regex search it is \< at the start of a word and \> at the end:

/\<string OR /string\>

We may be familiar with using the pipe character | as an OR operator in regex searches, but in Vim and in the GNU flavor of POSIX basic regular expressions (BRE) it is: \|
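As a point of comparison, here is a short sketch of how the word boundary and the OR operator behave in Python's re module. The sample text is invented for the demonstration.

```python
import re

text = "cat concatenate cat|dog"

# \b anchors the match at a word boundary, so "cat" inside "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", text)

# An unescaped pipe is alternation: match "cat" or "dog".
either = re.findall(r"\b(?:cat|dog)\b", text)

# An escaped pipe is a literal "|" character in Python, the reverse of Vim's default.
literal_pipe = re.findall(r"cat\|dog", text)

print(whole_words)   # ['cat', 'cat']
print(either)        # ['cat', 'cat', 'dog']
print(literal_pipe)  # ['cat|dog']
```

The same search string can therefore mean different things depending on the tool, which is why it helps to know which syntax a tool expects before pasting a pattern into it.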

Relativity uses the Java regex syntax of java.util.regex.Pattern, which is similar to Perl. Be sure to check your regex searches with an online tester such as https://www.regextester.com/ in order to confirm that they will find examples of what you hope to match.
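The same kind of check can be scripted. The sketch below uses Python's re module, which is close to but not identical to the Java syntax, so treat it as a first pass rather than a guarantee of how Relativity will behave. The helper name check_pattern and the sample strings are hypothetical.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return the samples on which the pattern behaves unexpectedly."""
    regex = re.compile(pattern, re.IGNORECASE)
    missed = [s for s in should_match if not regex.search(s)]
    false_hits = [s for s in should_not_match if regex.search(s)]
    return missed, false_hits


missed, false_hits = check_pattern(
    r"\bprivileged?\b",
    should_match=["Attorney-client privilege", "Privileged and confidential"],
    should_not_match=["underprivileged", "privileges"],
)
print("Missed:", missed)
print("False hits:", false_hits)
```

Two empty lists mean the pattern found everything it should and nothing it should not, at least for the samples supplied.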

PCRE stands for Perl Compatible Regular Expressions.

This is the fourth anniversary of Tip of the Night - Four Years of Tips - every single night!

Regex search for multiple strings, collecting complete line on which they appear

Aug 4, 2018

Here's a demonstration of a RegEx search run in the grep utility PowerGREP that will collect the complete line of text on which any one of multiple search terms appears.

In this regex search the strings are separated with pipes "|". Strings of multiple words are entered as they are, without quotes, because a quotation mark in a regex is treated as a literal character to be matched.

^.*\b(Information Governance.*|Identification.*|Preservation|Collection.*)\b.*$

In PowerGREP, set the Action Type to 'Collect Data'. Do not filter files and do not section files. The search type should be set to 'Regular Expression'.

In the Collect box enter '\0' (the entire regex match) to get the results of the search, followed by %FILENAME% (preceded by a delimiter such as ~) so the names of the source files are included in the collected text.

Be sure to have line breaks between collected text, and save the results to a single file. Make it a .csv file.

The resulting file can be separated into two columns in Excel, and you'll be able to easily parse out the data.
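For anyone without PowerGREP, roughly the same result can be sketched in Python. This is not what PowerGREP does internally; the function names collect_lines and write_results are hypothetical, and the search terms are written without quotes and treated as word prefixes, which is an assumption about the intent of the search.

```python
import csv
import re
from pathlib import Path

# Any one of these terms, starting at a word boundary, qualifies a line.
TERMS = re.compile(
    r"\b(Information Governance|Identification|Preservation|Collection)"
)


def collect_lines(paths):
    """Yield (line, filename) for each line containing any search term."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if TERMS.search(line):
                    yield line.rstrip("\n"), Path(path).name


def write_results(paths, out_path):
    """Write one collected line per row, with ~ between the line and the file name."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="~")
        writer.writerows(collect_lines(paths))
```

Calling write_results(Path("documents").glob("*.txt"), "collected.csv"), where the folder name documents is only an example, produces a single file that Excel can split into two columns on the ~ delimiter.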

Regex search for line of text containing multiple strings

Apr 5, 2018

You can use the following Regular Expression search to find individual lines of text which contain all of the designated words:

^(?=.*?\bzealous\b)(?=.*?\badvocates\b)(?=.*?\bloyalty\b).*$

This search will look for any line in a text file that contains the words 'zealous', 'advocates', and 'loyalty', in any order. Here's a demonstration using Notepad++.
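The same pattern can be tried outside Notepad++. Here is a minimal sketch in Python, using made-up sample lines; the MULTILINE flag is needed so that ^ and $ match at the start and end of each line rather than of the whole text.

```python
import re

ALL_THREE = re.compile(
    r"^(?=.*?\bzealous\b)(?=.*?\badvocates\b)(?=.*?\bloyalty\b).*$",
    re.MULTILINE,
)

sample = (
    "Lawyers are zealous advocates who owe loyalty to clients.\n"
    "Loyalty matters, but this line lacks the other words.\n"
    "Their loyalty makes advocates zealous.\n"
)

# Each lookahead checks for one word without consuming text,
# so the order of the words on the line does not matter.
for line in ALL_THREE.findall(sample):
    print(line)
```

The first and third sample lines are printed. The second is skipped because it lacks 'zealous' and 'advocates', and because the search is case sensitive, so 'Loyalty' does not count as 'loyalty'.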

LITIGATION SUPPORT TIP OF THE NIGHT

