
You can run a regular expression search on a text file to find any lines that do NOT begin with a particular string. In this example, we want to find any lines that do not begin with the string 'ins'. Structure your search this way:


^(?:(?!ins).)*$




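The exclamation point is part of the negative lookahead (?!ins), which succeeds only at positions where the string that follows it, 'ins', does not come next. Because the lookahead here is repeated before every character on the line, the pattern rejects any line that contains 'ins' anywhere, not only lines that begin with it. To exclude only lines that begin with 'ins', ^(?!ins).*$ is sufficient.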


The caret anchors the match at the beginning of a line (or of the whole string, depending on the tool's multiline setting).


This part: (?: opens a non-capturing group, so the regex groups what follows without saving it as a capture.
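To check how the pattern behaves, here is a minimal sketch using Python's re module. The sample lines are invented for illustration, and the second pattern, ^(?!ins).*$, is my own addition for comparison and not part of the original tip.

```python
import re

# Hypothetical sample lines, invented for this illustration.
lines = [
    "insert the disk",
    "please insert the disk",
    "remove the disk",
]

# The pattern from this tip: the negative lookahead is tested
# before every single character on the line.
no_ins_anywhere = re.compile(r"^(?:(?!ins).)*$")

# Comparison pattern (an assumption, not from the tip): the lookahead
# is tested only once, immediately after the caret.
no_ins_at_start = re.compile(r"^(?!ins).*$")

matches_anywhere = [s for s in lines if no_ins_anywhere.match(s)]
matches_at_start = [s for s in lines if no_ins_at_start.match(s)]

print(matches_anywhere)   # ['remove the disk']
print(matches_at_start)   # ['please insert the disk', 'remove the disk']
```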




 
 

Regular expression searches can be designed to find both a specific alphanumeric pattern and a given number of words before and after that pattern.


In this example we use the following regex pattern to look for dates in a text file:


(effective as of )(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2})\,\s+(\d{4})


See the Tip of the Night for June 5, 2020 for an explanation of how this regex pattern works. The search finds dates preceded by the phrase 'effective as of'.
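As a rough illustration of what the four capture groups hold, the pattern can be tried in Python's re module. The sample sentence below is invented, and the variable names are hypothetical.

```python
import re

# The date pattern from this tip, unchanged.
DATE_PATTERN = re.compile(
    r"(effective as of )"
    r"(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
    r"\s+(\d{1,2})\,\s+(\d{4})"
)

# Hypothetical sample text, invented for this illustration.
sample = "This agreement is effective as of June 5, 2020 and replaces the prior one."

m = DATE_PATTERN.search(sample)
groups = m.groups()

print(m.group(0))  # effective as of June 5, 2020
print(groups)      # ('effective as of ', 'June', '5', '2020')
```

Group 1 is the lead-in phrase, group 2 the month, group 3 the day, and group 4 the year.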



It would be helpful to edit the regex search so that it collects the data for each SEC form filing, including the description, form number, and filing date or period end date. We can modify it this way so that the match includes up to six words before the searched-for phrase and date, and up to five words afterwards:


((?:\S+\s*){0,6}effective as of )(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2})\,\s+(\d{4}(?:\S+\s*){0,5})


The syntax added to the beginning and end, (?:\S+\s*), matches a run of non-whitespace characters '\S+' (a word or number) followed by optional whitespace '\s*'. The second number in the curly brackets sets the maximum number of such words or numbers to match before or after the regex pattern in between.
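The sketch below runs the extended pattern in Python's re module on two invented sentences, to show what the leading and trailing groups pick up. The sample text and variable names are hypothetical.

```python
import re

MONTH = (
    r"(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)

# The extended pattern from this tip, assembled from parts for readability.
CONTEXT_PATTERN = re.compile(
    r"((?:\S+\s*){0,6}effective as of )"
    + MONTH
    + r"\s+(\d{1,2})\,\s+(\d{4}(?:\S+\s*){0,5})"
)

# Hypothetical sample sentences, invented for this illustration.
with_comma = (
    "Exhibit 10.1 The amended credit agreement shall become "
    "effective as of March 31, 2019, by and among the borrower and lenders."
)
with_space = "It is effective as of May 1, 2021 for all parties."

m1 = CONTEXT_PATTERN.search(with_comma)
m2 = CONTEXT_PATTERN.search(with_space)

print(repr(m1.group(1)))  # 'The amended credit agreement shall become effective as of '
print(repr(m1.group(4)))  # '2019, by and among the '
print(repr(m2.group(4)))  # '2021'
```

Two things stand out. The comma after the year counts as one of the five trailing "words". And because the trailing group starts with \S+, it picks up nothing when the year is followed directly by a space, as in the second sentence. If that matters for your files, a trailing group written as (?:\s+\S+){0,5} is one possible adjustment; that variant is a suggestion of mine, not the pattern used in this tip.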









 
 

AstroGrep is an open-source grep utility that you can use to run regular expression searches on multiple text files. It's a good alternative to an advanced grep utility such as PowerGREP (see the Tip of the Night for August 4, 2018), but it has some limitations.


AstroGrep has a simpler layout than PowerGREP, and may be easier for beginners to use. Simply select the search path to the folder containing the files you want to review, and then enter the search or regular expression search in the 'Search Text' box. (Be sure to select the 'Regular Expressions' check box.) I tested it this evening, and confirmed that it can run complex regex searches accurately.





Unfortunately, AstroGrep will not run searches through PDFs.


Another drawback of using AstroGrep is that it apparently will not export search results that include only the text matched by the pattern. The user only has the option of exporting the full line of text on which each search result appears.
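If you need only the matched text, one possible workaround outside AstroGrep is a short script. The following is a minimal sketch in Python, assuming the files are plain text with a .txt extension; the function name collect_matches and its parameters are hypothetical and have nothing to do with AstroGrep itself.

```python
import re
from pathlib import Path


def collect_matches(folder, pattern, glob="*.txt"):
    """Return (file name, matched text) pairs for every regex match
    found in the text files directly inside `folder`."""
    regex = re.compile(pattern)
    results = []
    for path in sorted(Path(folder).glob(glob)):
        # Undecodable bytes are replaced rather than raising an error.
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in regex.finditer(text):
            # Keep only the matched text, not the full line.
            results.append((path.name, match.group(0)))
    return results
```

Calling collect_matches with a folder path and a regex returns a list that can then be written to a CSV file or pasted into a spreadsheet. Like AstroGrep, this sketch does not read PDFs.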

 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea . Proudly created with Wix.com
