Script to extract pages with string from multiple PDF files

Karl Heinz Kremer has posted a very useful JavaScript script here. It goes through a set of PDFs and extracts every page that contains a specific string. The script is posted below.


  • You can execute this script in Adobe Acrobat XI by going to Tools > Action Wizard > Create New Action.

  • In the dialog box which pops up, open the 'More Tools' drop-down menu in the tools list on the left and select 'Execute JavaScript'.

  • Click on 'Specify Settings' and then enter the script.

  • Edit the script so the string you want to search for is listed in quotes on the line that begins 'var stringToSearchFor'.

  • Be sure to uncheck the 'Prompt User' box.

  • Name and save the action.

  • Run the action from the Action Wizard section of Tools.

  • When the action is run, you will be prompted to select either multiple files or a folder.


  • After the script executes, you will be left with one new PDF for each source file, containing only the pages from that file which include the searched-for string. A source file with no matching pages produces no new PDF.
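To illustrate the editing step, here is a minimal sketch of how the search line might look after a change, and of how the script's == comparison treats near matches. The word "Plaintiff" and the sample words are made up for illustration; they are not part of the original script.

```javascript
// Hypothetical replacement for the search word; use whatever word you need.
var stringToSearchFor = "Plaintiff";

// The script compares each word on a page with ==, so the match is
// exact and case-sensitive. These sample words are invented.
var sampleWords = ["plaintiff", "Plaintiffs", "Plaintiff"];

var matches = sampleWords.filter(function (word) {
    return word == stringToSearchFor;
});

console.log(matches.length); // only the exact word "Plaintiff" matches
```

Because the script checks one word at a time, a search string made of several words would likely never match as written.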




As always, I tested this script tonight and confirmed that it works.




// Iterates over all pages, finds a given string, and extracts all
// pages on which that string is found to a new file.

var pageArray = [];

var stringToSearchFor = "Court";

for (var p = 0; p < this.numPages; p++) {
    // iterate over all words on the page
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        if (this.getPageNthWord(p, n) == stringToSearchFor) {
            pageArray.push(p);
            break; // one match is enough, move on to the next page
        }
    }
}

if (pageArray.length > 0) {
    // extract all pages that contain the string into a new document
    var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
    for (var n = 0; n < pageArray.length; n++) {
        d.insertPages( {
            nPage: d.numPages-1,
            cPath: this.path,
            nStart: pageArray[n],
            nEnd: pageArray[n]
        } );
    }

    // remove the first page
    d.deletePages(0);
}
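
If you want to check the page-selection logic without opening Acrobat, the loop can be sketched as a standalone function. This is only an illustration under stated assumptions: findMatchingPages and the sample data are hypothetical, and each page is modeled as a plain array of words rather than read through the Acrobat calls the real script uses.

```javascript
// Standalone sketch of the page-selection loop. It runs in Node or a
// browser console, not in Acrobat. `pages` is a hypothetical stand-in
// for a PDF: one array of words per page.
function findMatchingPages(pages, target) {
    var matchingPages = [];
    for (var p = 0; p < pages.length; p++) {
        for (var n = 0; n < pages[p].length; n++) {
            if (pages[p][n] == target) {
                matchingPages.push(p); // record the zero-based page index once
                break;                 // skip the rest of this page
            }
        }
    }
    return matchingPages;
}

// Made-up sample data: four "pages" of words.
var samplePages = [
    ["The", "Court", "finds", "that", "the", "Court", "agrees"],
    ["No", "match", "here"],
    ["court", "in", "lower", "case"],
    ["Supreme", "Court"]
];

var result = findMatchingPages(samplePages, "Court");
console.log(result); // pages 0 and 3 match
```

Page 0 is listed only once even though the word appears twice, and page 2 is skipped because the comparison is case-sensitive.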