
Rick Borstein posted a study about Adobe's ClearScan OCR, which can generate text-searchable images that are clearer and smaller than those produced by Acrobat's standard OCR tool. The standard method overlays invisible, searchable text on the original image. With a page image of about 300 dpi, that means 15 KB to 40 KB per page. The image also slows down printing and looks clearly different from a PDF created from an electronic document.

Starting with an image-only, 78-page PDF of 1.13 MB, Borstein found that ClearScan produced a searchable file of 401 KB in the same time the standard OCR tool took to produce one of 1.24 MB.
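The size figures are easier to compare per page. Here is a quick sketch of the arithmetic, assuming 1 MB = 1024 KB (the post does not say which convention the reported sizes use):

```javascript
// Sizes from Borstein's 78-page test.
// ASSUMPTION: 1 MB = 1024 KB; the source does not state the convention.
const PAGES = 78;
const sizesKB = {
  original: 1.13 * 1024,    // image-only PDF
  standardOcr: 1.24 * 1024, // Acrobat's standard searchable-image OCR
  clearScan: 401,           // Acrobat ClearScan
};

// Average size per page, in KB.
function perPageKB(totalKB, pages) {
  return totalKB / pages;
}

for (const [name, kb] of Object.entries(sizesKB)) {
  console.log(`${name}: ${kb.toFixed(0)} KB total, ${perPageKB(kb, PAGES).toFixed(1)} KB/page`);
}

// Fraction of the standard OCR output that the ClearScan file occupies.
const ratio = sizesKB.clearScan / sizesKB.standardOcr;
console.log(`ClearScan output is ${(ratio * 100).toFixed(0)}% of the standard OCR output`);
```

Under that assumption the original scan comes to just under 15 KB per page, at the low end of the 15 KB to 40 KB range mentioned above, and the ClearScan file is roughly a third the size of the standard OCR output.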

ClearScan works by generating a custom font for scanned images, rather than attempting to match them to a library of fonts.

According to Borstein, ClearScan's OCR quality matches that of standard OCR. It does, however, disable the TouchUp Text tool.

It's possible to go from a searchable image to ClearScan text, but not vice versa.

My own experiment shows how ClearScan sharpens the font.


 
 
  • Sep 16, 2019

Different PDF editors OCR PDF files at different speeds. Today at work I had a chance to use Foxit PhantomPDF for the first time. It includes a 'Quick OCR' option. I tried it, and it was fast and relatively accurate, but not so impressive compared with its competitors. Foxit's Quick OCR tool took 1 minute, 10 seconds to create a searchable version of a 39-page chart. While the resulting quality wasn't bad, Nuance Power PDF Advanced took only 36 seconds on the same PDF and produced noticeably better OCR text.

Adobe Acrobat took 2 minutes, 9 seconds on the 'Searchable Image' setting, and 2 minutes, 21 seconds using the superior ClearScan setting.
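Converting these timings to seconds per page for the 39-page chart puts the tools on a common footing. The sketch below uses only the times reported here; the tool labels are my own shorthand, and one run per tool says nothing about variance between runs.

```javascript
// Timings for the 39-page chart, converted to seconds.
const PAGES = 39;
const timings = [
  { tool: "Foxit PhantomPDF Quick OCR", seconds: 1 * 60 + 10 },
  { tool: "Nuance Power PDF Advanced", seconds: 36 },
  { tool: "Acrobat Searchable Image", seconds: 2 * 60 + 9 },
  { tool: "Acrobat ClearScan", seconds: 2 * 60 + 21 },
];

// Sort fastest first, then report seconds per page and time relative to the fastest run.
const sorted = [...timings].sort((a, b) => a.seconds - b.seconds);
const fastest = sorted[0].seconds;
for (const t of sorted) {
  const perPage = (t.seconds / PAGES).toFixed(2);
  const relative = (t.seconds / fastest).toFixed(1);
  console.log(`${t.tool}: ${t.seconds} s, ${perPage} s/page, ${relative}x the fastest`);
}
```

On these figures Nuance finishes in under a second per page, while the ClearScan run takes close to four times as long.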

Not a very scientific study, but it appears as though Nuance has a clear edge.


 
 

'try67' has posted a JavaScript on the Adobe forums that you can use to count the number of pages in a PDF file that are marked for redaction. As the discussion on the Adobe forum shows, some people have not had much luck getting the script to work. The problem can be overcome by inserting the JavaScript into an Acrobat action. Follow these steps:

1. In the Tools console, go to Action Wizard and select 'Create New Action'.

2. Under More Tools, choose Execute JavaScript and add it to the 'Action steps to follow' area on the right. Uncheck the 'Prompt User' box, and click 'Specify Settings'.

3. Enter this script in the JavaScript Editor:

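```javascript
// Make sure Acrobat has loaded every annotation in the document.
this.syncAnnotScan();
var counter = 0;
for (var p = 0; p < this.numPages; p++) {
  var annots = this.getAnnots({ nPage: p });
  if (annots == null || annots.length == 0) continue; // page has no annotations
  for (var i = 0; i < annots.length; i++) {
    if (annots[i].type == "Redact") {
      counter++; // count each page once
      break;
    }
  }
}
// nIcon 3 shows the alert with the "status" icon.
app.alert(counter + " pages contain Redaction annotations in this file.", 3);
```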

4. Name and save the action.

5. When you click on the new action in the Action Wizard menu, you will be prompted to select multiple PDF files.

6. Click Start, and Acrobat will open the files one by one and report the number of pages marked for redaction in each PDF.

It would be helpful if the JavaScript were modified to generate a text list of the counts for all the files, rather than showing each count in a separate dialog box.
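One possible approach, sketched here and not taken from try67's original script: wrap the count in a function and have each run of the action print one tab-separated line (file name, count) to Acrobat's JavaScript console with console.println, so the console ends up holding a list you can copy. It assumes the action runs once per file with this bound to the current document; the names countRedactedPages and reportLine are hypothetical.

```javascript
// Count pages that carry at least one Redact annotation.
// `doc` is an Acrobat Doc object (pass `this` inside an Action).
function countRedactedPages(doc) {
  doc.syncAnnotScan(); // load all annotations before querying them
  var counter = 0;
  for (var p = 0; p < doc.numPages; p++) {
    var annots = doc.getAnnots({ nPage: p });
    if (!annots) continue; // no annotations on this page
    for (var i = 0; i < annots.length; i++) {
      if (annots[i].type === "Redact") {
        counter++; // count the page once, then move on
        break;
      }
    }
  }
  return counter;
}

// One line per file: "<file name><TAB><pages with redactions>".
function reportLine(doc) {
  return doc.documentFileName + "\t" + countRedactedPages(doc);
}

// Only runs inside Acrobat, where the global `app` object exists.
// ASSUMPTION: `this` is the document the action is processing.
if (typeof app !== "undefined") {
  console.show();
  console.println(reportLine(this));
}
```

Keeping the counting logic in plain functions, with the Acrobat-only calls behind the app check, means the logic can be tried outside Acrobat against a stand-in document object.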


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea.
