
You can use JavaScript in Adobe Acrobat to extract set ranges of pages from a PDF file and save each extracted range as a new, renamed file.


A user named vvb posted the following code here:


/* Extract Pages to Folder */
var re = /.*\/|\.pdf$/ig;
var filename = this.path.replace(re,"");
var lastPage=this.numPages-1;
{
    for ( var i = 0; i < this.numPages; i = i + 2 )
        this.extractPages ({
            nStart: i,
            nEnd: i + 1,
            cPath : filename + "_page_" + (i+1) + ".pdf"
        });
};


This code can be inserted in Acrobat and edited so that it takes 5 pages at a time from a source PDF file and then names them sequentially with a prefix.


In Acrobat, go to More Tools and select Action Wizard. On the top toolbar, select 'New Action'. In the 'More Tools' menu, double-click the 'Execute JavaScript' option so that it appears in the right pane.


Uncheck the 'Prompt User' option, and then click on 'Specify Settings'.


Edit the code so that the line beginning " for ( var i = 0; i < this.numPages; i = i + " ends with the number of pages you want in each PDF, and the number after "i + " on the nEnd line is one less than that. Modify the line beginning " cPath : " so that it has the letter prefix for each file. The script will name each extracted file with the page number in the original file on which the excerpt begins.



/* Extract Pages to Folder */
var re = /.*\/|\.pdf$/ig;
var filename = this.path.replace(re,"");
var lastPage=this.numPages-1;
// Step through the document five pages at a time
for ( var i = 0; i < this.numPages; i = i + 5 )
    this.extractPages ({
        nStart: i,
        // Clamp the final range so it does not run past the last page
        nEnd: Math.min(i + 4, lastPage),
        cPath : "ACME" + (i+1) + ".pdf"
    });




Save and name the action.



Click on the action in the Actions List and then select the files that you want to run it on. Click the 'Start' button.



The script will create new files in the same folder as the source file(s), each named with the prefix you enter in the code and the page number on which the excerpt begins.
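The range arithmetic can be checked outside Acrobat before running the action. The following is a minimal sketch in plain JavaScript; the function name planExtraction and its parameters are hypothetical and not part of the Acrobat API. It lists the nStart, nEnd and cPath values for each chunk, and it assumes the final range should be clamped so that it never points past the last page.

```javascript
// Hypothetical helper (not an Acrobat API call): lists the values that
// an extraction loop would use for each chunk.
//   numPages  - total pages in the source PDF
//   chunkSize - pages per extracted file
//   prefix    - text placed before the starting page number in each name
function planExtraction(numPages, chunkSize, prefix) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error("chunkSize must be a positive integer");
  }
  var lastPage = numPages - 1; // page indexes are zero-based
  var plan = [];
  for (var i = 0; i < numPages; i += chunkSize) {
    plan.push({
      nStart: i,
      // Assumption: a short final chunk stops at the last page
      nEnd: Math.min(i + chunkSize - 1, lastPage),
      // Files are named with the one-based page the excerpt begins on
      cPath: prefix + (i + 1) + ".pdf"
    });
  }
  return plan;
}

// A 12-page file split five pages at a time with the prefix "ACME"
console.log(planExtraction(12, 5, "ACME"));
```

Each entry has the same three properties that the script passes to this.extractPages, so the printed list shows which files a given page count and chunk size would produce.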



 
 

If you have a very large set of PDFs, and you're uncertain about which files have searchable text, you can set up an Acrobat action that uses the Preflight tool to find which files contain no text objects, or very few.


In the Action Wizard, add the Preflight option from the Document Processing menu.



Click on 'Specify Settings' for the Preflight action, and in the dialog box select the option for 'Acrobat Pro DC 2015 Profiles'.


Then, in the long menu to the right, select the option to 'List page objects, grouped by type of object'.



Choose the option to create a report for either successes or errors, set a folder for these reports, and check the box to display a summary PDF.


Also choose the option in the Save & Export menu to save each file processed by Preflight. Click on the icon to the right to get the option to set a specific local folder to save the reports to.



When it's run, the action will give you the option to add multiple files.



The action will generate a PDF portfolio with multiple PDFs for each original PDF. Select all of the PDFs in the portfolio, and then right click and select the option to extract each PDF from the portfolio.



Then combine the reports into a single PDF file, and save the text of the report to a text file.



Open the text file in a text editor, and run a find and replace to make sure that the captions 'File name:', 'Path:', 'Text Objects', and 'Vector Objects' each appear at the beginning of a new line.
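The find and replace step can also be scripted. The following is a rough sketch in plain JavaScript, run outside Acrobat; the function name breakBeforeCaptions is hypothetical, and the sample report text is invented for illustration, so the layout of a real Preflight report may differ.

```javascript
// Hypothetical helper: puts each of the four captions at the start of
// its own line, mirroring the manual find and replace.
function breakBeforeCaptions(reportText) {
  var captions = /[ \t]*(File name:|Path:|Text Objects|Vector Objects)/g;
  return reportText
    .replace(/\r\n?/g, "\n")      // normalize Windows and old Mac line endings
    .replace(captions, "\n$1")    // start a new line before every caption
    .replace(/\n{2,}/g, "\n")     // collapse the blank lines this creates
    .replace(/^\n/, "");          // drop a newline left at the very start
}

// Invented sample text, not real Preflight output
var sample = "Text Objects: 12 File name: a.pdf Path: C:\\docs\\a.pdf Vector Objects: 3";
console.log(breakBeforeCaptions(sample));
```

Running the function on text that is already split leaves it unchanged, so it is safe to apply more than once.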



Then paste the text into column A of an Excel spreadsheet. In column B enter this formula:


=IF(LEFT(A2,4)="Path","",IF(LEFT(A2,12)="Text Objects",A2,B1))


Start in cell B2, and then fill down using CTRL + D. In cell C2, do the same with this formula:


=IF(LEFT(A2,9)="File name",A2,"")


Note that in the reports the text object count for a file is listed before the file name.


When you filter for non-blank entries in column C, you'll see how many text objects are in each file.
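The logic of the two formulas can be sketched in plain JavaScript to show why it works. The function name pairCounts and the sample lines are hypothetical. As in column B, the most recent 'Text Objects' line is carried forward until a 'Path' line clears it, and as in column C, a row is kept only for lines beginning with 'File name'.

```javascript
// Hypothetical helper mirroring the column B and column C formulas.
// lines: the report text, one array entry per line (column A).
function pairCounts(lines) {
  var carried = ""; // column B: last "Text Objects" line seen
  var rows = [];
  lines.forEach(function (line) {
    if (line.slice(0, 4) === "Path") {
      carried = "";               // a Path line resets the carried value
    } else if (line.slice(0, 12) === "Text Objects") {
      carried = line;             // remember the count for the next file name
    }
    if (line.slice(0, 9) === "File name") {
      // column C is non-blank only on File name lines
      rows.push({ fileName: line, textObjects: carried });
    }
  });
  return rows;
}

// Invented sample lines; the count appears before the file name
var sampleLines = [
  "Text Objects: 12",
  "Vector Objects: 3",
  "File name: a.pdf",
  "Path: C:\\docs\\a.pdf",
  "File name: b.pdf",
  "Path: C:\\docs\\b.pdf"
];
console.log(pairCounts(sampleLines));
```

In this sample the second file has no 'Text Objects' line, so its textObjects value is blank, which is what the reset on 'Path' lines is for.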



Keep in mind that a file with a lot of text that still needs to be OCR'd may have a few text objects from an exhibit slip sheet, headers, footers, and so forth. Review any file that has a small number of text objects relative to its overall page count.
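One way to apply that review rule is to compare text objects per page against a cutoff. The sketch below is plain JavaScript under several assumptions: the function name flagForReview is hypothetical, the page counts must come from somewhere other than the steps above, and the cutoff of one text object per page is an arbitrary illustration rather than a recommended value.

```javascript
// Hypothetical helper: returns the names of files whose text object
// count is low for their length.
//   files      - array of { name, textObjects, pages }
//   minPerPage - cutoff for text objects per page (an assumed value)
function flagForReview(files, minPerPage) {
  return files
    .filter(function (f) {
      return f.pages > 0 && f.textObjects / f.pages < minPerPage;
    })
    .map(function (f) {
      return f.name;
    });
}

// Invented sample data
var sampleFiles = [
  { name: "a.pdf", textObjects: 4, pages: 200 },
  { name: "b.pdf", textObjects: 5000, pages: 40 }
];
console.log(flagForReview(sampleFiles, 1));
```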







 
 

Using JavaScript posted here by Evermap, and copied below, you can automatically transform annotations of one type in Adobe Acrobat into another type. It will not only convert between highlights, underlined text, and crossed out text, but also change proposed redactions into any one of these annotations. It will not work the other way around.


In order to add the script to a new action in Acrobat, select the Action Wizard tool and then click on 'New Action', and choose 'Execute JavaScript' from the 'More Tools' drop down menu.




Click on 'Specify Settings' and enter the script in the editor. Set the type you want to find on the line that tests Annots[i].type, and the type you want to convert to on the line that assigns Annots[i].type. If you want to reference crossed out text, use the type 'StrikeOut', as the script does.



Uncheck the 'Prompt User' box, and then add the Save step to the action, so you will not be prompted to save each file when the action runs.




After saving and running the action, you can process multiple files by selecting the options in the drop down menu to add files or an entire folder.




The script can be used to convert multiple proposed redactions in multiple PDF files like this:




The script will convert one or more lines of continuous text selected for a redaction, but it will not convert a large block of text set for redaction.



The action will give this result:







You can add a second JavaScript step to the same action to specify the highlighting color that converted annotations or proposed redactions should have:


this.syncAnnotScan();
var annots = this.getAnnots();
// getAnnots can return null, so check before looping
if (annots != null) {
    for (var i = 0; i < annots.length; i++) {
        if (annots[i].type == "Highlight") {
            annots[i].strokeColor = color.yellow;
        }
    }
}
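The color step can be written as a small function so that the annotation type and the color are parameters. This is a sketch only: the function name recolor is hypothetical, and the example runs on plain objects outside Acrobat. It relies on Acrobat representing colors as arrays, with color.yellow equal to ["RGB", 1, 1, 0].

```javascript
// Hypothetical helper: sets strokeColor on every annotation of one type
// and returns how many were changed.
function recolor(annots, type, newColor) {
  if (annots == null) {
    return 0; // nothing to do when there are no annotations
  }
  var changed = 0;
  for (var i = 0; i < annots.length; i++) {
    if (annots[i].type === type) {
      annots[i].strokeColor = newColor;
      changed++;
    }
  }
  return changed;
}

// Stand-in objects for testing outside Acrobat
var fakeAnnots = [
  { type: "Highlight", strokeColor: ["RGB", 0, 1, 0] },
  { type: "Underline", strokeColor: ["RGB", 1, 0, 0] }
];
console.log(recolor(fakeAnnots, "Highlight", ["RGB", 1, 1, 0]));
```

Inside an action, the first argument would be the result of this.getAnnots() and the last could be color.yellow or another color array.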







/* Convert annotations from one type to another */
try {
    this.syncAnnotScan();
    for (var nPage = 0; nPage < this.numPages; nPage++) {
        // get all annotations on the page
        var Annots = this.getAnnots({ nPage: nPage });
        // process each annotation
        if (Annots != null) {
            for (var i = 0; i < Annots.length; i++) {
                // annotation type to find
                if (Annots[i].type == "StrikeOut") {
                    // annotation type to convert to
                    Annots[i].type = "Highlight";
                }
            }
        }
    }
} catch (e) {
    app.alert(e);
}
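The inner loop of the conversion script can be pulled into a function that takes the two types as arguments, which makes it easier to reuse for different conversions. This is a minimal sketch: the function name convertAnnotType is hypothetical, and the example runs on plain objects outside Acrobat. Because conversion to proposed redactions does not work, the sketch rejects 'Redact' as a target type.

```javascript
// Hypothetical helper: changes every annotation of fromType to toType
// and returns how many were converted.
function convertAnnotType(annots, fromType, toType) {
  if (toType === "Redact") {
    // Redactions can be converted from, but not to
    throw new Error("Converting to proposed redactions is not supported");
  }
  if (annots == null) {
    return 0; // a page with no annotations
  }
  var converted = 0;
  for (var i = 0; i < annots.length; i++) {
    if (annots[i].type === fromType) {
      annots[i].type = toType;
      converted++;
    }
  }
  return converted;
}

// Stand-in objects for testing outside Acrobat
var pageAnnots = [{ type: "Redact" }, { type: "Highlight" }, { type: "Redact" }];
console.log(convertAnnotType(pageAnnots, "Redact", "Highlight"));
```

Inside an action, the function would be called once per page with the result of this.getAnnots({ nPage: nPage }).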

 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea . Proudly created with Wix.com
