
Create a Redaction List with PowerGrep Part 2


This is a follow-up to my post from last night, where I discussed how to create a list of terms to be marked for redaction with Adobe Acrobat's Search and Remove text tool. You may also want to redact personal names from a document. To do so, run a PowerGrep search with the same settings as those described last night, but with a different regular expression in the search box:

([A-Z][a-z]*)[\s-]([A-Z][a-z]*)

This will find any instance of two consecutive capitalized words separated by a whitespace character or a hyphen. Thanks to NickC for posting this at: http://stackoverflow.com/questions/7653942/find-names-with-regular-expression
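If you want to try the pattern outside PowerGrep, here is a minimal sketch in Python. The function name and the sample sentence are made up for illustration, and Python's regex engine is only assumed to treat this simple pattern the same way PowerGrep does.

```python
import re

# The same pattern as above: two capitalized words joined by
# a whitespace character or a hyphen.
NAME_PATTERN = re.compile(r"([A-Z][a-z]*)[\s-]([A-Z][a-z]*)")


def find_capitalized_pairs(text):
    """Return every matched pair of capitalized words as one string."""
    return [m.group(0) for m in NAME_PATTERN.finditer(text)]


# Made-up sample text; note that the section title is matched too.
sample = "Executive Summary: John Smith met Mary Jones."
print(find_capitalized_pairs(sample))
# prints ['Executive Summary', 'John Smith', 'Mary Jones']
```

The 'Executive Summary' hit shows why the results need filtering before they can be used as a redaction list.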

When you have the data collected with this regular expression exported to an Excel file, you'll likely wind up with a lot of terms you don't need - titles for different document sections and so forth. To filter these out, first copy the terms collected with PowerGrep into an adjacent column. Then separate the two words in the copied column with the Text to Columns wizard in Excel, selecting the option for delimited text and choosing the 'space' delimiter. Now you'll have the full names in the first column and the first words on their own in the next column.

Prepare a list of common first names that you can cross-reference against the first words. See this site: http://names.mongabay.com/data/1000.html , which also includes common last names. If you have the first words from the grep search in column B, and put the list of common male and female names from that site in column E, you can just run this formula in column D, =VLOOKUP(B2,E:E,1,FALSE), and fill it down. Any row where the formula returns a name instead of #N/A is an instance where the term in column B is likely to be a person's name.
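The same split-and-lookup step can be sketched in Python. This is only an illustration of the logic, not a replacement for the Excel workflow: the short name set below is a hypothetical stand-in for the full list you would paste into column E, and the function name is invented.

```python
# Hypothetical stand-in for the list of common first names (column E).
COMMON_FIRST_NAMES = {"JOHN", "MARY", "JAMES", "LINDA"}


def likely_names(terms, first_names=COMMON_FIRST_NAMES):
    """Keep only the terms whose first word is in the first-name list."""
    hits = []
    for term in terms:
        # Like Text to Columns with the 'space' delimiter: take the first word.
        first_word = term.split(" ", 1)[0]
        # Compare in upper case so the check ignores case, as VLOOKUP does.
        if first_word.upper() in first_names:
            hits.append(term)
    return hits


terms = ["Executive Summary", "John Smith", "Mary Jones"]
print(likely_names(terms))
# prints ['John Smith', 'Mary Jones']
```

As with the space delimiter in Excel, a hyphenated match such as 'Mary-Jones' is not split here, so it would not be found in the name list.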

[Image: grepnames.png]

