Removing duplicate values in Notepad++

If you're working with a sufficiently large amount of data in Excel, you may find that basic options like 'Remove Duplicates' won't work, or will at least run very slowly. If you need to de-dupe a column of data (perhaps just to look for irregularities in standardized data when the filter list won't show all of the entries), you can paste it into Notepad++ and dedupe it there.

1. Paste the data into Notepad++. You can sort the data first by pressing CTRL + A and going to Edit > Line Operations > Sort Lines Lexicographically Ascending, but this step is not necessary for the data to be deduplicated.

2. Press CTRL + H and enter this regex in the 'Find what' box:

^(.*?)$\s+?^(?=.*^\1$)

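Leave the 'Replace with' box empty and make sure the 'Regular expression' search mode is selected. Also check '. matches newline': the lookahead at the end of the pattern needs to scan across line breaks to find a later copy of the line, which is what lets it work on unsorted data.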

3. Click Replace All and you'll quickly have a deduped list. Where a value appears more than once, the last occurrence is the one that remains.

For an explanation of this regex, see this Stack Overflow question: http://stackoverflow.com/questions/3958350/removing-duplicate-rows-in-notepad
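
If you want to see what the pattern does before running it on real data, here is a minimal Python sketch that applies the same regex to a small sample. It assumes Python's `re` module with the MULTILINE and DOTALL flags behaves closely enough to the Notepad++ regex mode with '. matches newline' checked; the two engines may differ on edge cases. The names `DEDUPE` and `dedupe_lines` are made up for this illustration.

```python
import re

# The same pattern as in step 2, split up with comments (VERBOSE mode).
# MULTILINE makes ^ and $ match at line boundaries; DOTALL plays the role
# of the '. matches newline' checkbox.
DEDUPE = re.compile(
    r"""
    ^(.*?)$        # capture a line (lazy, so the shortest candidate is tried first)
    \s+?           # the line break that follows it
    ^(?=.*^\1$)    # succeed only if an identical line appears somewhere later
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)


def dedupe_lines(text: str) -> str:
    """Delete every line that has an identical copy further down the text."""
    return DEDUPE.sub("", text)


if __name__ == "__main__":
    sample = "a\nb\na\nc\nb\n"
    print(dedupe_lines(sample))
```

In this sample the first "a" and the first "b" are deleted because each has a copy further down, so the remaining lines are a, c, b. The pattern does a lot of backtracking, so this sketch is only meant for trying the idea on small samples.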


Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

© 2015 by Sean O'Shea . Proudly created with Wix.com
