
Karl Heinz Kremer has posted a very useful JavaScript here. It goes through a set of PDFs and extracts every page that contains a specific string. The script is posted below.


  • You can execute this script in Adobe Acrobat XI by going to Tools . . . Action Wizard . . . Create New Action.

  • In the dialog box that pops up, open the 'More Tools' drop-down menu in the tools list on the left and select 'Execute JavaScript'.

  • Click on 'Specify Settings' and then enter the script.

  • Edit the script so that the string you want to search for appears in quotes on the line that begins 'var stringToSearchFor'.

  • Be sure to uncheck the 'Prompt User' box.

  • Name and save the action.

  • Run the action from the Action Wizard section of Tools.

  • When the action runs, you will be prompted to select either multiple files or a folder.


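  • After the script runs, you will have one new PDF for each source file, containing only the pages from that source file which include the search string. Source files with no matching pages produce no new PDF. The new PDFs are left open and untitled in Acrobat (the script does not save them), so save each one.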




As always, I tested this script tonight and confirmed that it works.




// Iterates over all pages, finds a given string, and extracts all
// pages on which that string is found to a new file.

var pageArray = [];
var stringToSearchFor = "Court";

for (var p = 0; p < this.numPages; p++) {
    // iterate over all words on page p
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        // exact, case-sensitive, whole-word comparison
        if (this.getPageNthWord(p, n) == stringToSearchFor) {
            pageArray.push(p);
            break; // one match is enough; move on to the next page
        }
    }
}

if (pageArray.length > 0) {
    // extract all pages that contain the string into a new document
    var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
    for (var i = 0; i < pageArray.length; i++) {
        // insert each matching page after the current last page of the new document
        d.insertPages({
            nPage: d.numPages - 1,
            cPath: this.path,
            nStart: pageArray[i],
            nEnd: pageArray[i]
        });
    }
    // remove the first (blank) page
    d.deletePages(0);
}
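If you want to check the matching rule before running the action, here is a minimal plain-JavaScript sketch of the search loop that runs outside Acrobat (for example in Node.js). It assumes each page is modeled as an array of words, standing in for Acrobat's `getPageNumWords` and `getPageNthWord`, and `findPagesContaining` is a hypothetical helper name. Like the script, it compares whole words exactly, so 'court' does not match 'Court', and a two-word phrase can never match a single word.

```javascript
// Plain-JavaScript model of the page-matching loop, runnable outside Acrobat.
// Assumption: each page is an array of words, standing in for
// this.getPageNumWords(p) and this.getPageNthWord(p, n).
// findPagesContaining is a hypothetical helper, not part of Acrobat's API.
function findPagesContaining(pages, stringToSearchFor) {
  var pageArray = [];
  for (var p = 0; p < pages.length; p++) {
    var words = pages[p];
    for (var w = 0; w < words.length; w++) {
      // exact, case-sensitive, whole-word comparison, as in the script
      if (words[w] === stringToSearchFor) {
        pageArray.push(p);
        break; // one hit is enough; move on to the next page
      }
    }
  }
  return pageArray;
}

var pages = [
  ["Superior", "Court", "of", "California"],
  ["Exhibit", "A"],
  ["the", "court", "finds"],
  ["Court", "of", "Appeal", "Court"]
];
console.log(findPagesContaining(pages, "Court")); // [ 0, 3 ]
```

Page 3 contains 'Court' twice but is listed only once because of the `break`, and page 2 is skipped because 'court' is lowercase.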

 
 

When running an advanced search in Adobe Acrobat, note that you have the option to run proximity searches.


This option is greyed out until you select 'Match all of the words' in the 'Return results containing' menu. When entering the search terms, separate them with spaces only, not with commas or any other characters.



Note also that the search results are sorted by relevance.



Under Edit . . . Preferences . . . Search . . . you can set the range for proximity searches.


For some reason, Adobe has set the default at a very high 900 words!
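To get a feel for what the range setting controls, here is a rough plain-JavaScript illustration of a proximity test: do all of the search terms fall inside a run of at most N consecutive words? This is only an assumption about the general idea, not Acrobat's actual matching or ranking logic (Acrobat may count the range differently, for example as the words between the terms), and `withinProximity` is a hypothetical helper. The comparison is case-insensitive.

```javascript
// Rough illustration only, not Acrobat's algorithm: returns true if every
// term occurs inside some run of at most `range` consecutive words.
// Comparison is case-insensitive. withinProximity is a hypothetical helper.
function withinProximity(words, terms, range) {
  var lowerWords = words.map(function (w) { return w.toLowerCase(); });
  var lowerTerms = terms.map(function (t) { return t.toLowerCase(); });
  for (var i = 0; i < lowerWords.length; i++) {
    // the shortest run containing all terms always starts on one of them,
    // so only runs that begin with a search term need to be checked
    if (lowerTerms.indexOf(lowerWords[i]) === -1) continue;
    var span = lowerWords.slice(i, i + range);
    var allFound = lowerTerms.every(function (t) {
      return span.indexOf(t) !== -1;
    });
    if (allFound) return true;
  }
  return false;
}

var sentence = "the motion to dismiss was denied by the court".split(" ");
console.log(withinProximity(sentence, ["motion", "denied"], 5)); // true
console.log(withinProximity(sentence, ["motion", "denied"], 4)); // false
```

Under this reading, 'motion' and 'denied' span five words, so a range of 5 finds them and a range of 4 does not; a lower range setting is what narrows the results.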





 
 

Evermap has posted the following JavaScript here. It deletes blank pages (blank in the sense that they don't contain any searchable text) from sets of PDFs. I successfully tested the script tonight.

To run the script in Adobe Acrobat, go to Tools . . . Action Wizard and click 'Create New Action . . . '. Under 'More Tools', in the 'Choose tools to add' section, click on 'Execute JavaScript', then uncheck 'Prompt User' and click on 'Specify Settings' on the right. Paste the script into the JavaScript Editor.

Click OK, then name and save the new action. Add the files you want to process and click Start.

A new file will be created for each source file, with the blank pages removed and '_Original' added to the end of the file name.
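The output name is built by replacing '.pdf' in the document's path. A plain string `replace` changes only the first exact-case '.pdf' it finds, so a file named 'Report.PDF' would get no suffix at all, and a folder with '.pdf' in its name would be renamed instead of the file. Here is a minimal sketch of a safer rename that uses a regular expression anchored to the end of the path; `outputPath` is a hypothetical helper, and the '/c/...' paths are made-up examples in Acrobat's device-independent path style.

```javascript
// Sketch of building the output path (hypothetical helper).
// The regex matches ".pdf" only at the very end of the path, in any letter case.
function outputPath(path) {
  var renamed = path.replace(/\.pdf$/i, "_Original.pdf");
  // refuse to return the input unchanged, so a caller never saves over the source
  if (renamed === path) {
    throw new Error("Path does not end in .pdf: " + path);
  }
  return renamed;
}

console.log(outputPath("/c/Cases/Report.PDF")); // /c/Cases/Report_Original.pdf
console.log(outputPath("/c/old.pdf files/Brief.pdf")); // /c/old.pdf files/Brief_Original.pdf
```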

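// Acrobat JavaScript Code - www.evermap.com
// DELETE PDF PAGES WITHOUT TEXT
// IMPORTANT: This script assumes that a page is blank if it does not contain any "pdf words"
// OUTPUT: An output PDF file is created by appending _Original.pdf to the filename
try {
    var newName = this.path;
    // match ".pdf" only at the end of the path, in any letter case
    var filename = newName.replace(/\.pdf$/i, "_Original.pdf");
    this.saveAs(filename);
    // work backwards so that deleting a page does not shift pages that have not been checked yet
    for (var i = this.numPages - 1; i >= 0; i--) {
        var numWords = this.getPageNumWords(i);
        // a document must keep at least one page, so never delete the last remaining one
        if (numWords == 0 && this.numPages > 1) {
            // this page has no text, delete it
            this.deletePages(i, i);
        }
    }
} catch (e) {
    app.alert(e);
}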
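One thing to watch when deleting pages in a loop: once page i is removed, every later page moves down one position, so a loop that counts upward and then moves on to i + 1 never checks the page that just slid into position i. Two consecutive blank pages would leave one behind. Walking from the last page to the first avoids this. Below is a minimal plain-JavaScript model of that idea that runs outside Acrobat; `deleteBlankPages` is a hypothetical helper, and each page is represented only by its word count (an assumption standing in for `getPageNumWords` and `deletePages`).

```javascript
// Plain-JavaScript model of "delete every page with no words", runnable
// outside Acrobat. Assumption: a document is an array of per-page word counts.
// deleteBlankPages is a hypothetical helper, not part of Acrobat's API.
function deleteBlankPages(wordCounts) {
  var pages = wordCounts.slice(); // work on a copy
  // Walk from the last page to the first: removing page i only shifts the
  // pages after i, and those have already been checked.
  for (var i = pages.length - 1; i >= 0; i--) {
    // a PDF must keep at least one page, so never remove the last one left
    if (pages[i] === 0 && pages.length > 1) {
      pages.splice(i, 1);
    }
  }
  return pages;
}

console.log(deleteBlankPages([120, 0, 0, 45, 0])); // [ 120, 45 ]
```

Both consecutive blank pages (positions 1 and 2) are removed; an upward loop over the same list would have skipped the second one.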


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea
