Formula to extract dates from text in cells
top of page

Formula to extract dates from text in cells

If you have a long column in an Excel spreadsheet which contains document descriptions for which dates are given in a nonstandard format, you follow the below workaround to extract the dates into a separate column.


1. Begin by using this formula posted here by Extend Office :


=MID(A2,MIN(IFERROR(MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789",1)),LEN(A2)+1)),LOOKUP(1,0*MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1),ROW(INDIRECT("1:"&LEN(A2)))) + 1 - MIN(IFERROR(MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789",1)),LEN(A2)+1)))


. . . to extract out numerical dates. This formula will extract dates separated with both forward slashes, periods and hyphens. Simply place it in a column adjacent to the column with the document descriptions and then pull it down alongside for all of the descriptions.


2. The formula will extract out dates entered in a numerical format but it will also return values for descriptions which do not have dates in this format. In order to account for these, in a third column set the format to the MM/DD/YYYY date format.



3. Then copy and paste the dates returned by the formula to NotePad to get clean, plain text. Paste the dates back into the new Excel column and the correctly extracted dates will be in date format.



Even though the formula extracts dates separated with periods It's necessary to find and replace them with slashes in the new Excel column, even though this is not necessary for the hyphens.


4. The next step is to parse out the spelled out dates. To do this we will prepare a SUBSTITUTE formula to run multiple find and replaces so there is a pipe delimiter before each month and after each year.


=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,$J$2,$K$2),$J$3,$K$3),$J$4,$K$4),$J$5,$K$5),$J$6,$K$6),$J$7,$K$7),$J$8,$K$8),$J$9,$K$9),$J$10,$K$10),$J$11,$K$11),$J$12,$K$12),$J$13,$K$13),$J$14,$K$14),$J$15,$K$15),$J$16,$K$16),$J$17,$K$17),$J$18,$K$18),$J$19,$K$19),$J$20,$K$20),$J$21,$K$21),$J$22,$K$22),$J$23,$K$23),$J$24,$K$24)


. . . although this formula may look detailed it is not difficult to write. Type January in a cell in column J and then pull it down 11 cells so each of the months is listed. Then enter the year which is the earliest for which there are documents in your set, and then pull it down to the end of the documents' date range. In the column to the right enter a pipe delimiter before the months and after the years like this:


The first column will have the values that the SUBSTITUTE formula searches for and then the second will have the values that it adds in as replacements.


5. Next enter "SUBSTITUTE(" in a column for all of the months and years, and then enter absolute references for the cells with the values to be searched for in one column, and then those with the replacement values in another column. In between these columns enter a column with just a comma in each cell, and to the right enter a column with ").".

6. Copy and paste the 'SUBSTITUTE(' entries in a Word document and then remove all of the paragraph breaks by finding and replacing "^p". Then do the same for the other columns on the right but remove the tabs by finding and replacing "^t".



7. In between the SUBSTITUTE command and the rest of the formula enter a reference to the first cell in the Excel spreadsheet with the document descriptions. You should get a formula which functions this way:



8. The formula should put delimiters around the spelled out dates which you can use to separate them into a new column using the Text to Columns tool on the Data tab. Copy and paste the formula results as values first.


9. In the column with the reformatted dates from the first formula filter for only the non-date entries



10. Then simply add the values for the spelled out dates using a reference to the column with the parsed out data.


11. We are left with one column with all of the dates in the descriptions.



This process does not take long. See this demonstration video:




bottom of page