
Fidel from Brisbane has posted a very useful PowerShell script that you can use to run a regular expression search through a text file and extract only the matches.


Combined with the RegEx search for Bates numbers discussed last night, it can automatically extract a complete list of the Bates numbers in any text file.


So if you start with a text file that has Bates numbers at the end of each paragraph, you can run this PowerShell script:


select-string -Path C:\foofolder\input.txt -Pattern "(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)" -AllMatches | % { $_.Matches } | select-object Value -unique | sort-object Value > C:\foofolder\output2.txt


to pull out the Bates numbers. Note that you need to specify the path of your text file at:


select-string -Path C:\foofolder\input.txt


. . . put in the Regex in quotes at:


-Pattern "(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)" -AllMatches


. . . and then specify an output file at the end:


sort-object Value > C:\foofolder\output2.txt


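You should end up with a text file that lists each distinct Bates number once, in sorted order. Because the results are written out as objects with a single Value property, expect the list to appear under a "Value" column header.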
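If you want to try the pipeline without touching real files, here is a minimal sketch under a few assumptions: the sample paragraphs, the temp-file names bates-sample.txt and bates-output.txt, and the variable names are all made up for illustration. It also swaps select-object Value -unique for Sort-Object -Unique on the plain matched strings, so the output file should hold only the numbers, one per line.

```powershell
# Hypothetical sample data and file names, used only for this illustration.
$tempDir    = [System.IO.Path]::GetTempPath()
$inputPath  = Join-Path $tempDir 'bates-sample.txt'
$outputPath = Join-Path $tempDir 'bates-output.txt'

Set-Content -Path $inputPath -Value @(
    'First paragraph of the document. ABC-0000123'
    'Second paragraph, citing the same page. ABC-0000123'
    'Third paragraph. XYZ_00004567'
)

# Same pattern as in the post, in single quotes so PowerShell leaves it untouched.
$pattern = '(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)'

Select-String -Path $inputPath -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |      # one Match object per hit
    ForEach-Object { $_.Value } |        # keep only the matched text
    Sort-Object -Unique |                # sort and drop duplicates
    Set-Content -Path $outputPath

Get-Content -Path $outputPath
```

With this sample input, the output file should contain ABC-0000123 and XYZ_00004567, each listed once.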








The world apparently needs a good RegEx search for Bates numbers in a variety of formats. When I ran a Google search for one today, I only found a limited attempt in the Relativity Search Guide, which requires the entry of a specific Bates prefix.



Here's a first attempt at a RegEx pattern that accounts for Bates numbers with different prefixes, including prefixes whose segments are separated by hyphens or underscores, followed by between 5 and 12 digits.


(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)


The search is structured to look for 1 to 10 word characters starting at a word boundary (note that \w matches letters, digits, and underscores, not just letters):

\b\w{1,10}


. . . it then searches for a hyphen, an underscore, or an optional single whitespace character:

(-|_|\s?)


. . . between the prefix and a 5 to 12 digit number that ends at a word boundary:

[0-9]{5,12}\b


The second alternative in the pattern then looks for instances where the Bates prefix is split into two parts, separated by a hyphen, an underscore, or an optional single whitespace character:

\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)


Obviously, it's possible to imagine additional Bates number formats, but this should find most and can easily be edited to account for more variations in the letter prefix length or number of digits.
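Before editing the pattern for other formats, it can help to run it against a handful of test strings. The sketch below does that with the .NET [regex] class; the sample strings and variable names are hypothetical and were picked only to exercise each part of the pattern.

```powershell
$pattern = '(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)'

# Hypothetical sample strings; none come from a real production.
$samples = @(
    'ABC-0001234'        # one-part prefix, hyphen separator
    'SMITH0001234'       # no separator between prefix and digits
    'ABC DEF 0001234'    # two-part prefix separated by spaces
    'ABC-123'            # fewer than 5 digits
    'see ABC-0001234'    # ordinary word in front of the number
)

$results = foreach ($s in $samples) {
    $m = [regex]::Match($s, $pattern)
    # Record whether the pattern matched and exactly what text it captured.
    [pscustomobject]@{ Input = $s; Matched = $m.Success; Value = $m.Value }
}

$results | Format-Table -AutoSize
```

In this sketch the first three samples match in full and ABC-123 does not match at all. The last sample comes back as "see ABC-0001234", because the two-part alternative treats the word in front of the number as a prefix segment, so it may be worth checking the extracted list for stray leading words.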






In Notepad++, you can use a simple regex search in this format to mark all text between two strings:


firststring .*? second string


In the Search menu, open the Mark tool. Be sure that the '. matches newline' box is checked. Because the pattern as written has a literal space before the second string, the second string cannot be at the beginning of a new line, and it should be a whole word.
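The same lazy pattern can be tried outside Notepad++. This is only a sketch in PowerShell, not part of the Notepad++ method: the text and the START and END markers are hypothetical, and the inline (?s) option stands in for the '. matches newline' checkbox.

```powershell
# Hypothetical text and marker strings, for illustration only.
$text = "intro line`nSTART first line`nsecond line END`noutro line"

# (?s) lets . match newlines; .*? is lazy, so it stops at the first END.
$withNewlines = [regex]::Match($text, '(?s)START.*?END')

# Without (?s), . does not cross a line break, so nothing is found here.
$singleLineOnly = [regex]::Match($text, 'START.*?END')

$withNewlines.Value
$singleLineOnly.Success
```

The first match spans both lines from START through END, while the second search reports no match because the two markers sit on different lines.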




Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea . Proudly created with Wix.com
