  • May 21, 2019

Globbing refers to running wildcard searches for file names, using an asterisk to match any number of characters, or a question mark to match a single character. Globbing can be used in Windows command prompt, PowerShell and Python to find file names.

So for example, a simple wildcard search in command prompt such as dir *.png or dir java* is an example of globbing.

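The Tip of the Night for February 16, 2019 showed how to use the glob module in Python to get a list of files. The glob module uses shell-style wildcards rather than full regular expressions, but it does support bracketed character ranges in a style that resembles regular expression syntax, so you can run more complicated file name searches. For example, glob.glob('java[0-3].*') finds files named java0, java1, java2 or java3 with any extension.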
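
Here is a minimal sketch of how the three wildcard types behave in Python. The file names and the find_matches helper are made up for illustration and are not from the earlier tip. The script builds a throwaway folder so it can be run anywhere without touching real files.

```python
import glob
import os
import tempfile

def find_matches(folder, pattern):
    """Return the sorted base names in `folder` that match the glob `pattern`."""
    # glob.escape protects any wildcard characters in the folder path itself.
    paths = glob.glob(os.path.join(glob.escape(folder), pattern))
    return sorted(os.path.basename(p) for p in paths)

# Build a temporary folder containing hypothetical file names to search.
with tempfile.TemporaryDirectory() as folder:
    for name in ["java0.txt", "java3.log", "java7.txt", "javascript.txt", "logo.png"]:
        open(os.path.join(folder, name), "w").close()

    numbered = find_matches(folder, "java[0-3].*")  # bracket: one character from 0 to 3
    any_java = find_matches(folder, "java*")        # asterisk: any number of characters
    one_char = find_matches(folder, "java?.txt")    # question mark: exactly one character

print(numbered)  # ['java0.txt', 'java3.log']
print(any_java)  # ['java0.txt', 'java3.log', 'java7.txt', 'javascript.txt']
print(one_char)  # ['java0.txt', 'java7.txt']
```

Note that java7.txt is skipped by the bracket search because 7 falls outside the 0 to 3 range, and javascript.txt is skipped by the question mark search because more than one character follows 'java'.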


 
 

The Tip of the Night for August 4, 2018, discussed how to run regex searches for multiple strings, collecting the complete line on which they appear. The tip showed how to do this using the grep utility, PowerGrep. Here's a slightly different approach that uses a list of separate literal search terms instead of regular expression syntax.

1. In PowerGrep, select the folder which contains the files you want to search through in the directory tree at the left.

2. Set the action type to 'Collect Data'.

3. Set file sectioning to 'line by line'.

4. Check the box for 'Collect/replace whole sections'.

5. Set the search type to 'List of literal text'.

6. Enter a string to search for in the search box, and then press the green plus icon to add additional lines.

7. For each search term, enter \0 in the collect box to collect the term searched for, together with the complete line on which it appears. You can also add %PATH% %FILENAME% to collect the file path and file name of each file searched.

8. Set target file creation to 'Save results into a single file' and in Target file location, enter the name of a .csv file to which the search results will be exported.

9. Click Collect. PowerGrep will collect the full line on which each search term appears.
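
For readers without PowerGrep, the same idea can be roughly sketched in Python. This is not how PowerGrep works internally. It is only an approximation of the steps above, and the collect_lines function, the column names, and the sample file and search terms are all assumptions made up for illustration. It walks a folder, checks each line against a list of literal terms, and writes the full line with the file path and file name to a .csv file.

```python
import csv
import os
import tempfile

def collect_lines(folder, terms, target_csv):
    """Write one CSV row for each line that contains any of the literal terms."""
    rows = 0
    with open(target_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "filename", "line"])  # hypothetical column names
        for dirpath, _dirnames, filenames in os.walk(folder):
            for filename in sorted(filenames):
                full_path = os.path.join(dirpath, filename)
                # Skip the results file itself if it sits inside the searched folder.
                if os.path.abspath(full_path) == os.path.abspath(target_csv):
                    continue
                with open(full_path, encoding="utf-8", errors="replace") as handle:
                    for line in handle:
                        # Plain substring test: the terms are literal text, not regex.
                        if any(term in line for term in terms):
                            writer.writerow([dirpath, filename, line.rstrip("\r\n")])
                            rows += 1
    return rows

# Small demonstration on a temporary folder with made-up content.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("The merger closed in May.\nNothing relevant here.\nSee the indemnity clause.\n")
    target = os.path.join(folder, "results.csv")
    count = collect_lines(folder, ["merger", "indemnity"], target)
    with open(target, newline="", encoding="utf-8") as f:
        results = list(csv.reader(f))

print(count)  # 2
```

One difference to keep in mind: this sketch writes a line once even if it contains several of the search terms, and it treats every file as text.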


 
 
  • Apr 28, 2019

When considering document review platforms and their conceptual searching capabilities, ask whether they can account for the lemmatization of words. The lemma of a word is its dictionary form. So the word ‘go’ is the lemma for ‘going’, ‘went’, and ‘gone’, the various inflected forms of the headword ‘go’. The lemma and its inflections are collectively known as the lexeme of the word. Lemmatization differs from stemming in that it considers the context in which a word is used. Stemming reduces a word to a stem by trimming its ending; a stemming search algorithm may use the stem ‘crazi’ for the word ‘crazy’ in order to account for ‘craziness’. Stemming will not find ‘better’, which is part of the lexeme of the lemma ‘good’. Generally, stemming improves the recall of a search: the percentage of the responsive hits available in a review set that are returned. Employing search algorithms which account for lemmatization will improve the precision of searches: the percentage of returned hits that are true hits as opposed to false positives.
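
To make the difference concrete, here is a toy sketch in Python. It is not the algorithm used by any review platform. The LEMMAS lookup table and the toy_lemma and toy_stem functions are hypothetical, and a real lemmatizer would also look at the context of the word, which this sketch leaves out.

```python
# Hypothetical lookup table: inflected form -> lemma (dictionary headword).
LEMMAS = {
    "going": "go", "went": "go", "gone": "go", "goes": "go",
    "better": "good", "best": "good",
}

def toy_lemma(word):
    """Return the dictionary form from the lookup table, or the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)

def toy_stem(word):
    """Crude suffix stripping for illustration only; not a real stemming algorithm."""
    word = word.lower()
    # Strip the first matching suffix, as long as at least two letters remain.
    for suffix in ("ness", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
            break
    # Turn a final 'y' into 'i' so 'crazy' and 'craziness' share a stem.
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"
    return word

for word in ["crazy", "craziness", "going", "went", "better"]:
    print(word, "stem:", toy_stem(word), "lemma:", toy_lemma(word))
```

The stemmer groups ‘crazy’ with ‘craziness’ and ‘going’ with ‘go’, but it leaves ‘went’ and ‘better’ untouched, while the lookup table ties them back to ‘go’ and ‘good’.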


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea . Proudly created with Wix.com
