
Some courts will not accept electronically filed PDFs that contain hyperlinks. This becomes a real problem when a PDF has many pages and a URL has been added to the footer of each page, as is common with PDFs created from SEC filings found on EDGAR.



Printing the PDF as a hard copy and scanning it is too much of a hassle, and there may not be time to go back and recreate the PDFs without the footers containing the URLs. URLs may also be scattered through the body of the document. Removing links one at a time in Edit mode can be too time-consuming as well.


Flattening a PDF by writing it to a new PDF will not necessarily remove hyperlinks in the PDF, and hyperlinks can still be present when a conversion to PDF/A format is performed.



The Acrobat tool which will remove web links . . .



. . . will not take out all hyperlinked web addresses:




Going under Preferences in Acrobat and unselecting 'Create links from URLs' in the Documents category will only deactivate hyperlinks for the current user of the PDF - not recipients you forward it to.



Instead, try converting the PDF to a multipage TIFF image using an editor like Foxit, which has an option to export a PDF as a multipage TIFF image. (Acrobat will not convert a PDF to the multipage TIFF format.) When saving the file, click on Settings and set the conversion type to 'Single File'.




You can then open the TIFF file in Acrobat, which will convert it back to a PDF. The formatting will be retained, but the links will no longer be active.
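If you want a quick sanity check on the result, one rough approach is to scan the raw bytes of the PDF for URI link actions. The sketch below is only an illustration: the function name `count_uri_actions` and the file name in the comment are made up for this example, and the check is deliberately crude. It will miss links stored inside compressed object streams, so a count of zero is a hint rather than proof, and it says nothing about URLs that a viewer turns into links on its own.

```python
import re

# A link to a web address is normally stored as an action dictionary
# containing "/S /URI". This pattern looks for that marker in the raw bytes.
URI_ACTION = re.compile(rb"/S\s*/URI\b")


def count_uri_actions(pdf_bytes: bytes) -> int:
    """Count uncompressed URI actions found in the raw bytes of a PDF."""
    return len(URI_ACTION.findall(pdf_bytes))


# A hand-written fragment resembling one link annotation (not a real file).
sample = (
    b"<< /Type /Annot /Subtype /Link "
    b"/A << /S /URI /URI (https://www.sec.gov) >> >>"
)
print(count_uri_actions(sample))

# With a real file (hypothetical name), the call might look like:
# from pathlib import Path
# count_uri_actions(Path("filing.pdf").read_bytes())
```

Running the same check on the original PDF and on the one rebuilt from the TIFF gives a before-and-after comparison.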





Tonight's tip features a regular expression created by The fourth bird of the Netherlands on Stack Overflow. I posted a question looking for a regular expression that would find lines whose text repeats the text of the prior line, ignoring a time code at the beginning of each line that might be different. See this example:


(11:12:21) [Tom]: Hello this is Tom. Who is it?

(11:14:08) [Tom]: Hello this is Tom. Who is it?


The goal was to find consecutive lines that were the same after the first 10 characters. The fourth bird came up with a solution that finds lines where everything after the time code matches. In a text editor like Notepad++, run this find and replace with the 'Regular expression' search mode selected:


FIND: ^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+

REPLACE: $1$2


^(\([^][]*\)) captures the first part of the line, the time code in parentheses, as group 1. The caret ^ matches the beginning of the line, \( and \) match the literal parentheses, and [^][]* matches any run of characters between them that are not square brackets.


(.*) captures the rest of the line after the time code as group 2.


(?:\r?\n opens a non-capturing group and matches the line break before the next line.


\([^][]*\) matches the time code at the start of that next line, which may differ from the one on the previous line.


\2)+ is a backreference that requires the rest of the next line to be identical to group 2. The + lets the group repeat, so a run of several duplicate lines is matched at once.



Because the replacement $1$2 keeps only the first time code and the text that follows it, a find and replace in the text editor can easily remove the duplicate lines, as you can see in this demonstration.









You can use an Excel formula to check whether any one of multiple values appears in the contents of a cell.


=SUMPRODUCT(--ISNUMBER(SEARCH($D$2:$D$7,A2)))>0


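This formula will check inside cell A2 for the values listed in cells D2 to D7. SEARCH returns a position for each listed value it finds in A2 and an error for each one it does not, ISNUMBER turns those results into TRUE or FALSE, the double negative converts them to 1 or 0, and SUMPRODUCT adds them up. The result is TRUE when at least one value is found.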


When you want to check a whole range of cells, enter the list of strings you're searching for as an absolute reference using dollar signs, so that it stays fixed when the formula is copied. In this example, we search through the addresses listed in column A for the state capitals listed in column F, so the list reference becomes $F$2:$F$51. We can fill the formula entered in cell B2 down to the cells below using CTRL + D. The formula will return 'TRUE' in each row where one of the values from cells F2 to F51 appears in the column A cell.
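The same check can be sketched outside of Excel. The short Python example below is an illustration only: the function name `contains_any` and the sample addresses are made up. It lowercases both sides because SEARCH ignores case. It does not reproduce the wildcard support that SEARCH has.

```python
def contains_any(cell_text: str, terms: list[str]) -> bool:
    """Return True if any term appears in cell_text, ignoring case."""
    haystack = cell_text.lower()
    return any(term.lower() in haystack for term in terms)


# Made-up rows standing in for column A and the list of values to look for.
addresses = [
    "100 Main St, Albany, NY",
    "22 Lake Rd, Buffalo, NY",
    "5 Capitol Ave, Sacramento, CA",
]
capitals = ["Albany", "Sacramento", "Austin"]

# One TRUE/FALSE flag per address, like the filled-down formula in column B.
flags = [contains_any(address, capitals) for address in addresses]
print(flags)
```

Each flag corresponds to one row of the filled-down formula.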



