  • Mar 19, 2019

Ranks NL has posted stop word lists in several languages here: https://www.ranks.nl/stopwords. There are multiple English lists, including the one used by MySQL. Relativity recommends Ranks NL as a source of stop words for a conceptual index. See this guide to Analytics indexes.

Stop word lists are included for many languages other than English.


 
 

When most people search for a person in a document database, they use a proximity search, looking for the first name within two or three words of the last name. For example, using dtSearch one might structure a search like this:

john w/3 smith

Searching on the last name alone may work if the surname is unusual, but a general search for a name like Smith will return many false positives that have to be sifted through. The proximity search, on the other hand, will exclude documents in which the person is referred to only by last name. What if there is an email in which someone says, "It was that damn Smith who masterminded the fraud!"? You wouldn't want to miss that message.

A good approach is to go through the search results and identify which other individuals (or objects) share a name with the person your search is focused on.

In a Relativity dtSearch, you can use the NOT operator before the 'w/' proximity operator to search for a last name when it does not appear near certain first names.

This search will find a document that happens to mention a John Smith but, more importantly, refers to the Connie Smith who is the subject of the search. Documents that refer only to John Smith will be excluded from the results.
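To make the logic concrete, here is a minimal Python sketch of the two behaviors described above. It is not dtSearch itself and only approximates how the operators behave: within() mimics a search like john w/3 smith, and not_within() mimics a search along the lines of smith NOT w/3 john (an assumed example, since the original query is not reproduced here). The function names, the simple tokenizer, and the sample documents are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens, word):
    """Return the index of every occurrence of word in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == word]

def within(text, a, b, n):
    """Rough stand-in for 'a w/n b': some a occurs within n words of some b."""
    tokens = tokenize(text)
    return any(abs(i - j) <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def not_within(text, a, b, n):
    """Rough stand-in for 'a NOT w/n b': some a has no b within n words of it."""
    tokens = tokenize(text)
    pos_b = positions(tokens, b)
    return any(all(abs(i - j) > n for j in pos_b)
               for i in positions(tokens, a))

# Hypothetical sample documents
only_john = "John Smith forwarded the memo."
both = "John Smith forwarded the memo. Connie Smith approved the wire transfer."
surname_only = "It was that damn Smith who masterminded the fraud!"

for doc in (only_john, both, surname_only):
    print(within(doc, "john", "smith", 3), not_within(doc, "smith", "john", 3), doc)
```

In this sketch the document that only mentions John Smith fails the NOT w/ test, while the document that also mentions Connie Smith and the email that uses the surname alone both pass it, which is the behavior the search is meant to achieve.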



 
 
  • Oct 25, 2017

The wonderful folks at NirSoft have another great free application that will make your job easier.

SearchMyFiles can be downloaded here. It allows you to run Boolean searches for files of a set size that were created or modified in a particular time range.

Its duplicate search mode can also be used to search for duplicate file names with identical content, or only duplicate names with non-identical content.
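For readers who want to see the idea behind a duplicate names search, here is a small Python sketch. It is not how SearchMyFiles works internally, just an illustration under a few assumptions: file names are compared case-insensitively, content is compared with a SHA-256 hash, and the function names sha256_of() and duplicate_names() are hypothetical.

```python
import hashlib
import os
from collections import defaultdict

def sha256_of(path):
    """Hash a file's contents in chunks so large files are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def duplicate_names(root):
    """Group files under root by name and compare contents within each group.

    Returns two dicts keyed by lowercased file name (an assumption):
      identical: names whose copies all have the same content
      differing: names where at least two copies have different content
    """
    by_name = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            by_name[name.lower()].append(os.path.join(folder, name))

    identical, differing = {}, {}
    for name, paths in by_name.items():
        if len(paths) < 2:
            continue  # the name appears only once, so it is not a duplicate
        hashes = {sha256_of(p) for p in paths}
        target = identical if len(hashes) == 1 else differing
        target[name] = sorted(paths)
    return identical, differing
```

Calling duplicate_names() on a folder returns one dictionary of names whose copies match exactly and another of names whose copies differ, which corresponds to the two options described above.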


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.



© 2015 by Sean O'Shea.
