Regex search to capture X number of words before and after

Regular expression searches can be designed to find both a specific alphanumeric pattern and a given number of words before and after that pattern.


In this example we use the following regex pattern to look for dates in a text file:


(effective as of )(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2})\,\s+(\d{4})


See the Tip of the Night for June 5, 2020 for an explanation of how this regex pattern works. The search finds dates preceded by the phrase 'effective as of'.
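As a rough illustration, the pattern can be tried out in Python's built-in re module. This is only a sketch: the sample sentence is made up for the example, and the names MONTHS, DATE_PATTERN and sample are assumptions introduced here, not part of the original tip.

```python
import re

# The month alternatives from the tip's pattern, split up for readability.
MONTHS = (
    r"Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?"
)

# Same pattern as above: phrase, month, day, comma, four-digit year.
DATE_PATTERN = re.compile(
    r"(effective as of )(" + MONTHS + r")\s+(\d{1,2})\,\s+(\d{4})"
)

# Hypothetical sample text, not taken from a real filing.
sample = "This Agreement is effective as of June 5, 2020 and replaces the prior one."

match = DATE_PATTERN.search(sample)
full_match = match.group(0)   # the phrase plus the date
date_groups = match.groups()  # phrase, month, day, year as separate groups
print(full_match)
print(date_groups)
```

Each set of parentheses that does not begin with ?: is a capturing group, so the phrase, month, day and year can be pulled out separately.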



It would be helpful to edit the regex search so that it also collects the surrounding data for each SEC form filing, including the description, form number, and filing date / period end date. We can modify the pattern so that the search includes up to six words before the searched-for phrase and date, and up to five words afterwards:


((?:\S+\s*){0,6}effective as of )(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2})\,\s+(\d{4}(?:\s+\S+){0,5})


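The regular expression syntax added to the beginning and end searches for runs of non-whitespace characters, '\S', together with the whitespace, '\s', that separates them, so each repetition matches one word. The second number in the curly brackets sets the maximum number of words (or other strings of characters, such as numbers) to be matched before or after the regex pattern in between. Because the first number is 0, the search still finds the date when fewer words are available.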
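The effect of the added groups can be checked with a short Python sketch. The sample text and the names MONTHS, CONTEXT_PATTERN, sample, before and after are hypothetical, chosen only for this illustration. The trailing group is written here as (?:\s+\S+){0,5}, so that words separated from the year by a space are counted.

```python
import re

MONTHS = (
    r"Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?"
)

# Up to six words before the phrase, up to five words after the year.
CONTEXT_PATTERN = re.compile(
    r"((?:\S+\s*){0,6}effective as of )(" + MONTHS + r")"
    r"\s+(\d{1,2})\,\s+(\d{4}(?:\s+\S+){0,5})"
)

# Hypothetical sample text: seven words come before the phrase,
# so only the last six of them should be captured.
sample = (
    "Form 8-K Amended and Restated Credit Agreement effective as of "
    "March 3, 2019 between Acme Corp and the lenders named therein"
)

match = CONTEXT_PATTERN.search(sample)
before = match.group(1)  # leading words plus the phrase
after = match.group(4)   # year plus trailing words
print(before)
print(after)
```

In this sample the first word, 'Form', falls outside the six-word limit and is left out of the match, and the match stops after 'the', the fifth word following the year.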