
Regex to find where consecutive lines end with the same text

Tonight's tip features a regular expression written by The fourth bird, a Stack Overflow user from the Netherlands. I posted a question asking for a regular expression that would find lines whose text repeats the text of the previous line, ignoring the time code at the beginning of each line, which may differ. See this example:

(11:12:21) [Tom]: Hello this is Tom. Who is it?

(11:14:08) [Tom]: Hello this is Tom. Who is it?

The goal was to find consecutive lines that are the same after the first 10 characters. The fourth bird came up with a solution that matches when the parts of two lines after the time code are identical. In a text editor like Notepad++, run this find and replace search with the search mode set to Regular expression:

FIND: ^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+


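^(\([^][]*\)) will find the first part of the string - the time code in parentheses. The caret ^ matches the beginning of the line, \( and \) match the literal parentheses, and [^][]* matches any run of characters between them that are not square brackets. The outer parentheses capture the time code as group 1.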

(.*) matches the rest of the line after the time code and captures it as group 2.

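(?:\r?\n opens a non-capturing group and matches the line break (an optional carriage return followed by a newline), which moves the match onto the next line.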

\([^][]*\) matches the time code in parentheses at the start of that next line. It is not compared with group 1, so it can differ from the first line's time code.

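\2)+ is a backreference that matches exactly the text captured by group 2 on the first line. The closing )+ ends the non-capturing group and lets it repeat one or more times, so a single match covers the first line plus every consecutive line that repeats its text.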
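
To see what each piece captures, here is a minimal sketch that runs the same pattern in Python instead of Notepad++. The third log line is made up for illustration, and re.MULTILINE is needed so that ^ matches at the start of every line rather than only at the start of the text.

```python
import re

# The pattern from the post, unchanged.
# re.MULTILINE makes ^ match at the start of each line.
PATTERN = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)

# Sample log: the first two lines are the example from the post,
# the third line is a hypothetical non-repeated line.
SAMPLE = (
    "(11:12:21) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:14:08) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:15:30) [Ann]: Hi Tom, it is Ann.\n"
)

match = PATTERN.search(SAMPLE)
time_code = match.group(1)        # time code of the first line
repeated_text = match.group(2)    # text that the following line(s) repeat
lines_in_match = match.group(0).splitlines()

print(time_code)            # (11:12:21)
print(repeated_text)        #  [Tom]: Hello this is Tom. Who is it?
print(len(lines_in_match))  # 2
```

The whole match spans both of Tom's lines, while Ann's line is left out because its text does not equal group 2.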

As you can see in this demonstration, a find and replace in the text editor can easily remove the duplicate lines.
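
The post does not give the replacement text, so the following is only a sketch under an assumption: that each run of repeated lines should collapse to its first line, meaning group 1 followed by group 2. In Python that replacement is written \1\2. The function name drop_repeated_lines and the sample log are hypothetical.

```python
import re

# Same pattern as in the post; re.MULTILINE lets ^ match at every line start.
PATTERN = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)


def drop_repeated_lines(text: str) -> str:
    """Collapse each run of repeated lines down to its first line."""
    # Assumed replacement: keep the first time code (\1) and the shared text (\2).
    return PATTERN.sub(r"\1\2", text)


log = (
    "(11:12:21) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:14:08) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:15:30) [Ann]: Hi Tom, it is Ann.\n"
)

cleaned = drop_repeated_lines(log)
print(cleaned)
```

With this replacement the earliest time code is the one that survives. Keeping the latest one instead would need a different pattern, since the later time codes are not captured.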
