
Collecting page numbers with a grep utility

PowerGREP, a grep utility chiefly known for running regular expression searches across thousands of files, can also collect the page numbers on which regex matches appear.

Make these settings in PowerGREP:

1. Select the PDFs that you want to search on the left in the File Selector.

2. On the Action tab set the Action type to 'Collect Data'.

3. Set Filter Files to 'Do not filter files'.

4. Set File sectioning to 'split along delimiters', and Section search type to 'Regular expression'.

5. In the section search box, press CTRL + Enter. A horizontal line will appear, which represents the page break.

6. In the search box, enter your search term. In this example, I have entered a simple regular expression that finds years from the 21st century. Be sure to set the search type menu to 'Regular expression'.

7. In the Collect box we enter:

%MATCH% on page %SECTIONN% in document %FILENAME%

%MATCH% is the text matched by the search, %SECTIONN% is the page number, and %FILENAME% is the name of the source file.

8. Under 'Target file creation', choose 'save results to a single file', and in the 'Between collected text' menu pick 'Line break'. In the Target file creation settings you can then enter the path of the text file that will contain the collected text.

Click 'Collect' on the toolbar. The search results are written to the text file, one match per line, in the format entered in the Collect box.
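For readers who want to see the logic behind these settings, here is a minimal Python sketch of the same idea: split the text into sections at each page break, run the regex on every section, and report the section number as the page number. It is not how PowerGREP works internally. It assumes the PDF text has already been extracted and that pages are separated by form feed characters, which is a common convention but not guaranteed. The year pattern, the function name collect_matches, and the sample text are all made up for illustration, since the exact regex used in step 6 is not shown.

```python
import re

# Assumption: pages in the extracted text are separated by form feeds.
PAGE_BREAK = "\f"

# Assumption: a simple stand-in pattern for the years 2000 to 2099.
YEAR_PATTERN = re.compile(r"\b20\d{2}\b")


def collect_matches(text, filename, pattern=YEAR_PATTERN):
    """Return one line per match, mirroring the Collect box template
    '%MATCH% on page %SECTIONN% in document %FILENAME%'."""
    results = []
    # Each section between page breaks is one page; numbering starts at 1.
    for page_number, page in enumerate(text.split(PAGE_BREAK), start=1):
        for match in pattern.finditer(page):
            results.append(
                f"{match.group(0)} on page {page_number} in document {filename}"
            )
    return results


# Hypothetical four-page document used only to exercise the function.
sample = "Founded in 1998.\fRelaunched in 2004 and 2015.\fNo dates here.\fUpdated 2021."

for line in collect_matches(sample, "report.pdf"):
    print(line)
```

With the sample text above, the matches on pages 2 and 4 are reported and the year 1998 on page 1 is skipped, because it falls outside the pattern.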
