top of page

Exceljet has a formula posted here, which is effective at extracting email addresses, and other strings, from text excerpts.


=TRIM(MID(SUBSTITUTE(A2," ",REPT(" ",99)),MAX(1,FIND("@",SUBSTITUTE(A2," ",REPT(" ",99)))-50),99))



The formula works by pulling any word which includes the character that is a subject of the FIND formula, which in this case is the '@' symbol. It works by adding spaces to the target cells - so this part of the formula:


SUBSTITUTE(A2," ",REPT(" ",99))


. . . adds 99 spaces around each word:



We SUBSTITUTE blank spaces in A2 with a space 99 times using the REPT formula. (The REPT formula just repeats whatever string you enter for a set number of times. I.e., =REPT("cat",10) gives: catcatcatcatcatcatcatcatcatcat). The number of spaces that are added sets a limit on the length of the string that can be pulled. This solution will not pull a word more than 100 characters in length.


The FIND formula then finds where in the cell the "@" or searched for term appears after the spaces have been added.

=FIND("@",SUBSTITUTE(A2," ",REPT(" ",99)))



The MID formula then extracts the text with the searched for term and the spaces around it:

=MID(SUBSTITUTE(A2," ",REPT(" ",99)),MAX(1,FIND("@",SUBSTITUTE(A2," ",REPT(" ",99)))-50),99)



The TRIM formula then simply removes the extra spaces.


We can extract full words which contain the searched for string because the complete formula surrounds them with buffer blank spaces. They are then easy to target - once the position of one character or segment is located, it's easy to extract the full word and then cut off the extra spaces.


The formula will also successfully locate words, or parts of words and turn the full word in which the segment appears.



A #VALUE error will be given if the searched for text does not appear in the target cell.




 
 

If you have a long column in an Excel spreadsheet which contains document descriptions for which dates are given in a nonstandard format, you follow the below workaround to extract the dates into a separate column.


1. Begin by using this formula posted here by Extend Office :


=MID(A2,MIN(IFERROR(MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789",1)),LEN(A2)+1)),LOOKUP(1,0*MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1),ROW(INDIRECT("1:"&LEN(A2)))) + 1 - MIN(IFERROR(MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789",1)),LEN(A2)+1)))


. . . to extract out numerical dates. This formula will extract dates separated with both forward slashes, periods and hyphens. Simply place it in a column adjacent to the column with the document descriptions and then pull it down alongside for all of the descriptions.


2. The formula will extract out dates entered in a numerical format but it will also return values for descriptions which do not have dates in this format. In order to account for these, in a third column set the format to the MM/DD/YYYY date format.



3. Then copy and paste the dates returned by the formula to NotePad to get clean, plain text. Paste the dates back into the new Excel column and the correctly extracted dates will be in date format.



Even though the formula extracts dates separated with periods It's necessary to find and replace them with slashes in the new Excel column, even though this is not necessary for the hyphens.


4. The next step is to parse out the spelled out dates. To do this we will prepare a SUBSTITUTE formula to run multiple find and replaces so there is a pipe delimiter before each month and after each year.


=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,$J$2,$K$2),$J$3,$K$3),$J$4,$K$4),$J$5,$K$5),$J$6,$K$6),$J$7,$K$7),$J$8,$K$8),$J$9,$K$9),$J$10,$K$10),$J$11,$K$11),$J$12,$K$12),$J$13,$K$13),$J$14,$K$14),$J$15,$K$15),$J$16,$K$16),$J$17,$K$17),$J$18,$K$18),$J$19,$K$19),$J$20,$K$20),$J$21,$K$21),$J$22,$K$22),$J$23,$K$23),$J$24,$K$24)


. . . although this formula may look detailed it is not difficult to write. Type January in a cell in column J and then pull it down 11 cells so each of the months is listed. Then enter the year which is the earliest for which there are documents in your set, and then pull it down to the end of the documents' date range. In the column to the right enter a pipe delimiter before the months and after the years like this:


The first column will have the values that the SUBSTITUTE formula searches for and then the second will have the values that it adds in as replacements.


5. Next enter "SUBSTITUTE(" in a column for all of the months and years, and then enter absolute references for the cells with the values to be searched for in one column, and then those with the replacement values in another column. In between these columns enter a column with just a comma in each cell, and to the right enter a column with ").".

6. Copy and paste the 'SUBSTITUTE(' entries in a Word document and then remove all of the paragraph breaks by finding and replacing "^p". Then do the same for the other columns on the right but remove the tabs by finding and replacing "^t".



7. In between the SUBSTITUTE command and the rest of the formula enter a reference to the first cell in the Excel spreadsheet with the document descriptions. You should get a formula which functions this way:



8. The formula should put delimiters around the spelled out dates which you can use to separate them into a new column using the Text to Columns tool on the Data tab. Copy and paste the formula results as values first.


9. In the column with the reformatted dates from the first formula filter for only the non-date entries



10. Then simply add the values for the spelled out dates using a reference to the column with the parsed out data.


11. We are left with one column with all of the dates in the descriptions.



This process does not take long. See this demonstration video:




 
 

Tonight I used the below vba code posted here by Domenic to successfully generate a list of each cell in a workbook which contained a link to an outside Excel file.


This code is more convenient than searching for .xl in Find with the 'Look in: Formulas' setting.


Keep in mind that if you open an Excel workbook and this troubling warning appears:




. . . the source of the problem will not be shown in the first column of the 'Find All' search results. Look in the reference column (or the Formula column in the Find tool) for a reference to a different file embedded in a formula. See: prices.xlsx in this example for a search run in the inventory.xlsx workbook:





Option Explicit

Sub ListLinks()

    Dim Wks             As Worksheet
    Dim rFormulas       As Range
    Dim rCell           As Range
    Dim aLinks()        As String
    Dim Cnt             As Long

    If ActiveWorkbook Is Nothing Then Exit Sub
    
    Cnt = 0
    For Each Wks In Worksheets
        On Error Resume Next
        Set rFormulas = Wks.UsedRange.SpecialCells(xlCellTypeFormulas)
        On Error GoTo 0
        If Not rFormulas Is Nothing Then
            For Each rCell In rFormulas
                If InStr(1, rCell.Formula, "[") > 0 Then
                    Cnt = Cnt + 1
                    ReDim Preserve aLinks(1 To 2, 1 To Cnt)
                    aLinks(1, Cnt) = rCell.Address(, , , True)
                    aLinks(2, Cnt) = "'" & rCell.Formula
                End If
            Next rCell
        End If
    Next Wks
    
    If Cnt > 0 Then
        Worksheets.Add before:=Worksheets(1)
        Range("A1").Resize(, 2).Value = Array("Location", "Reference")
        Range("A2").Resize(UBound(aLinks, 2), UBound(aLinks, 1)).Value = Application.Transpose(aLinks)
        Columns("A:B").AutoFit
    Else
        MsgBox "No links were found within the active workbook.", vbInformation
    End If
    
End Sub
 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

If you have a question or comment about this blog, please make a submission using the form to the right. 

Your details were sent successfully!

© 2015 by Sean O'Shea . Proudly created with Wix.com

bottom of page