
I am currently using an older version [v.11 - Professional Edition] of the ABBYY FineReader OCR software to render text for several thousand PDFs. The software does something that I have not found to be possible with Adobe Acrobat Pro or version 12 of the Foxit PDF Editor software: it will successfully pick up a check mark or 'X' in a check box.


The software often gets the mark perfectly as:

[X]


. . . but will sometimes substitute a 'K' or if there's a check in the box:



. . . it will enter a minuscule character like this: ø


An empty checkbox may get converted to an 'n' or just an empty box: □


But it interprets the marks consistently, so if you're analyzing thousands of documents you can successfully track when checkboxes were ticked off.
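Because the substitutions are consistent, they can be mapped back to a checked or unchecked state once the OCR text has been exported. The following is a minimal Python sketch, assuming the checkbox token has already been isolated from the surrounding text. The function name and the two sets are hypothetical and contain only the substitutions described above; your own documents may produce others.

```python
# Renderings described in this post. These sets are assumptions based on
# one batch of documents; extend them after reviewing your own OCR output.
CHECKED = {"[X]", "K", "ø"}
UNCHECKED = {"n", "□"}


def classify_checkbox(token):
    """Classify an isolated OCR checkbox token.

    Returns "checked", "unchecked", or "unknown". The token must already be
    separated from the surrounding text, because 'K' and 'n' are also
    ordinary letters.
    """
    token = token.strip()
    if token in CHECKED:
        return "checked"
    if token in UNCHECKED:
        return "unchecked"
    return "unknown"


for sample in ["[X]", "K", "ø", "n", "□", "Q"]:
    print(repr(sample), classify_checkbox(sample))
```

Anything reported as "unknown" should be reviewed by hand before it is counted either way.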




Updated: Jan 29, 2022

If you have to re-format a text file to load it correctly into Lexis TextMap, it will be important to get several things exactly right. If not, it's possible that you'll get an error message like this one:




The transcript may still be loaded if this message appears, but you'll have the line numbers added into the text itself, so they'll be selected when the text is. Searching through the transcript may also be adversely affected.


In order to conform a text file to the Amicus format used by TextMap, be sure to follow each of these steps:


1. Make sure that each page break is identified by a four-digit number listed by itself on one line.



2. Each line number should be listed at least three spaces before the transcript text. There should be one space before the single-digit line numbers.


3. No line should be more than 70 characters long. (I've seen documentation stating that each line should be no longer than 78 characters, but the actual limit appears to be smaller.) Note that you can run a regular expression search for lines that are 70 characters or longer using this form:


.{70,}


. . . which matches any run of 70 or more characters on a single line. In the free text editor Notepad++, the position on a line is listed at the bottom as the 'column' in which the character appears.



4. Also confirm that there is no trailing whitespace at the end of any line, and that the last text on each line is followed by a carriage return and new line marker (CR LF).



5. If you have any pages which are longer than 25 lines, have the line numbering run from 1 to 25 on rows 1 to 25, and then leave the last lines unnumbered.


6. Confirm that the transcript doesn't have any blank lines. Make sure that it doesn't end with a blank line!


7. If all else fails, try copying the text into an entirely new file in Notepad, and saving it in the format of a 'Normal text file'.
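Several of the steps above can be checked automatically before you try to load the file. The following is a rough Python sketch, not a TextMap utility: the function name `check_transcript` and its messages are hypothetical. It covers only steps 1, 3, 4 and 6, uses the 70 character limit observed above, and assumes a single-byte text encoding. It does not check the line number spacing in step 2 or the 25 line page rule in step 5.

```python
import re

MAX_LINE_LENGTH = 70  # limit observed in this post; documentation says 78
PAGE_BREAK = re.compile(r"\d{4}")  # a four-digit number alone on a line


def check_transcript(raw):
    """Return a list of problems found in the raw bytes of a transcript."""
    problems = []
    page_breaks = 0
    # bytes.splitlines splits only on CR, LF and CRLF, and keepends=True
    # preserves the terminator so each line ending can be inspected.
    for number, raw_line in enumerate(raw.splitlines(keepends=True), start=1):
        if not raw_line.endswith(b"\r\n"):
            problems.append(f"line {number}: does not end with CRLF")
        # latin-1 maps every byte to one character, so len() counts bytes.
        line = raw_line.rstrip(b"\r\n").decode("latin-1")
        if line.strip() == "":
            problems.append(f"line {number}: blank line")
            continue
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
        if len(line) > MAX_LINE_LENGTH:
            problems.append(
                f"line {number}: {len(line)} characters "
                f"(limit {MAX_LINE_LENGTH})"
            )
        if PAGE_BREAK.fullmatch(line):
            page_breaks += 1
    if page_breaks == 0:
        problems.append("no four-digit page-break line found")
    return problems


# Made-up sample: line 2 has a trailing space and a bare LF ending,
# and line 3 is blank.
sample = b"0001\r\n 1   Q. Please state your name. \n\r\n"
for problem in check_transcript(sample):
    print(problem)
```

To check a real file, read it in binary mode so that the line endings are not translated, for example `check_transcript(open(path, "rb").read())`, where `path` is the location of your own transcript.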



See also the Tip of the Night for January 29, 2022, which discusses how to remove non-ASCII text.
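As a quick illustration of that kind of check, here is a short Python sketch that lists where non-ASCII characters occur and then strips them. It is not the method described in that tip, and the function names are hypothetical.

```python
def find_non_ascii(text):
    """Yield (line_number, column, character) for each non-ASCII character."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield line_number, column, char


def strip_non_ascii(text):
    """Return the text with every non-ASCII character dropped."""
    return text.encode("ascii", errors="ignore").decode("ascii")


sample = "Q. Where is the caf\u00e9?\nA. On 5th Street."
for line_number, column, char in find_non_ascii(sample):
    print(line_number, column, repr(char))
print(strip_non_ascii(sample))
```

Dropping characters outright can change the meaning of a transcript, so review the list of locations before stripping anything.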



When licensing software, it's helpful to keep in mind the difference between source code and object code. The developer's source code is editable code which a programmer can modify. A license to use proprietary software will generally not give you access to the source code. This is proprietary information which the developer will want to protect. The source code is translated into object code, a binary form that hardware can read and then use to execute instructions. A standard .exe executable file will contain object code. A programmer will often include notes in the source code to explain what each segment does. So, while a dev may compose code like the example shown on the left . . .


. . . the object code will look like the ones and zeros on the right.


When making a major investment in software, you may want to ask whether the software owner will use an escrow service for the source code. Under an escrow arrangement, you can gain access to the source code if the business you licensed the software from does not run necessary updates, or simply fails to provide the necessary support.





Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea . Proudly created with Wix.com
