
Collecting page numbers with a grep utility


PowerGREP, a grep utility chiefly known for its ability to run regular expression searches across thousands of files, can also collect the page numbers on which regex hits appear.

Simply make these settings in PowerGREP:

1. Select the PDFs that you want to search on the left in the File Selector.

2. On the Action tab set the Action type to 'Collect Data'.

3. Set Filter Files to 'Do not filter files'.

4. Set File sectioning to 'split along delimiters', and Section search type to 'Regular expression'.

5. In the Section search box, press CTRL + Enter. A horizontal line will appear, which represents the page break.

6. In the Search box, enter your regular expression. In this example, I have entered a simple regex that finds years from the 21st century. Be sure to set the search type menu to 'Regular Expression'.

7. In the Collect box, enter:

%MATCH% on page %SECTIONN% in document %FILENAME%

%MATCH% represents the text matched by the search, %SECTIONN% is the page number, and %FILENAME% is the name of the source file.

8. Under 'Target file creation', choose 'save results to a single file', and in the 'Between collected text' menu pick 'Line break'. You can then enter the path of the text file that will receive the collected text.

Click 'Collect' on the toolbar. The search results will be written to the text file, each on its own line in the format set in the Collect box.
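
For readers who want to see the logic behind these settings, here is a minimal Python sketch of the same idea. It is an illustration only, not how PowerGREP works internally. It assumes the PDF text has already been extracted to a string with pages separated by form feed characters, and the function name collect_matches, the sample text, the file name sample.pdf, and the year pattern are all hypothetical choices made for the example.

```python
import re


def collect_matches(text, pattern, filename):
    """Return one line per regex match, tagged with its 1-based page number."""
    results = []
    # Assumption: pages in the extracted text are delimited by form feeds ("\f").
    for page_number, page in enumerate(text.split("\f"), start=1):
        for match in re.finditer(pattern, page):
            # Mirrors the Collect template:
            # %MATCH% on page %SECTIONN% in document %FILENAME%
            results.append(
                f"{match.group(0)} on page {page_number} in document {filename}"
            )
    return results


# Hypothetical three-page document; page 2 contains no years.
sample = "Filed in 2004.\fNo dates here.\fAmended in 2012 and 2015."

# Hypothetical pattern for four-digit years beginning with 20.
lines = collect_matches(sample, r"\b20\d{2}\b", "sample.pdf")
print("\n".join(lines))
```

Splitting on the page delimiter first, then searching each piece, is what lets the page number be reported alongside every hit. That corresponds to the 'split along delimiters' setting in step 4.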


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea.
