
Soundex


Soundex is a phonetic algorithm widely used to associate different names that are pronounced similarly. The Soundex encoding method was developed almost a century ago and is a standard feature of many current database and electronic discovery-related programs, such as the one profiled in last night's tip, Windows Grep.

The Soundex encoding method reduces any name to a four-character code: a letter followed by three digits.

1. The first letter of the name is retained.

2. All occurrences of h, w, a, e, i, o, u and y are dropped unless they are the first letter.

3. The remaining letters are replaced with digits:

b, f, p, v become 1.

c, g, j, k, q, s, x, z become 2.

d, t become 3.

l becomes 4.

m, n become 5.

r becomes 6.

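4. If two or more letters with the same number are adjacent in the original name (or are separated only by an h or w), only the first is retained, even if one of the letters is the first one in the name. In the standard version of Soundex, letters with the same number that are separated by a vowel are each coded.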

5. All Soundex codes must contain a letter and three digits. If a name does not have enough letters to produce three digits, pad the code with zeroes. Stop working through a name as soon as you have a letter and three digits; any remaining letters are ignored.
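The five rules above can be sketched in a few lines of Python. This is only an illustration and not the code used by Windows Grep or any particular database; the names `CODES` and `soundex` are made up for this example, and it assumes plain ASCII names. It also assumes the standard convention that two letters with the same number are both coded when a vowel sits between them.

```python
# Map each coded letter to its Soundex digit (rule 3).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a letter plus three digits for a name (illustrative sketch)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""

    first = letters[0]                # rule 1: keep the first letter
    digits = []
    prev = CODES.get(first, "")       # rule 4 also applies to the first letter

    for ch in letters[1:]:
        if ch in "hw":
            continue                  # h and w are dropped and do not break a run
        code = CODES.get(ch, "")      # vowels and y have no digit (rule 2)
        if code and code != prev:
            digits.append(code)
            if len(digits) == 3:      # rule 5: stop at three digits
                break
        prev = code                   # assumption: a vowel resets the run

    # Rule 5: pad with zeroes if the name ran out of letters.
    return first.upper() + "".join(digits).ljust(3, "0")
```

With this sketch, `soundex("William")` returns `W450` and `soundex("Smith")` returns `S530`, matching the hand-worked codes in this post.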

Using this method, the name William becomes:

W450

Smith becomes:

S530

Brown and Braun both become:

B650

Soundex is frequently employed by sites devoted to researching family genealogies. This site, https://www.ics.uci.edu/~dan/genealogy/Miller/javascrp/soundex.htm, will generate Soundex codes for any name you enter.

Soundex is particularly useful when searching a database for a name whose spelling is hard to predict. You may know you're looking for someone named Ismail, but have no idea how others will spell his name. Choose the Soundex option in Windows Grep, and it will automatically pull up all words with the same Soundex encoding as Ismail: I254.
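To give a rough idea of what a Soundex search does, here is a small Python sketch that pulls every word out of a block of text whose code matches the search term. It is not how Windows Grep works internally; `soundex_grep`, the compact `soundex` helper and the sample sentence are all invented for this illustration, and the helper assumes plain ASCII words.

```python
import re

# Letter groups in order of their Soundex digit, 1 through 6.
_GROUPS = ("bfpv", "cgjkqsxz", "dt", "l", "mn", "r")
_CODES = {ch: str(d) for d, group in enumerate(_GROUPS, start=1) for ch in group}


def soundex(word):
    """Compact Soundex encoder for a non-empty ASCII word (sketch only)."""
    word = word.lower()
    digits, prev = [], _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue                      # h and w never break a run of equal codes
        code = _CODES.get(ch, "")         # vowels and y carry no digit
        if code and code != prev:
            digits.append(code)
        prev = code
    return word[0].upper() + "".join(digits)[:3].ljust(3, "0")


def soundex_grep(text, query):
    """Return the words in text that share a Soundex code with query."""
    target = soundex(query)
    return [w for w in re.findall(r"[A-Za-z]+", text) if soundex(w) == target]


# Hypothetical sample text with several spellings of the same name.
sample = "Ismail, Ismael and Ishmael signed; Esmail and Islam did not."
print(soundex_grep(sample, "Ismail"))
```

Run against the made-up sentence, the search returns Ismail, Ismael and Ishmael. It misses Esmail, which encodes as E254, because Soundex always keeps the first letter. That is worth remembering when a name might begin with a different vowel.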



Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

© 2015 by Sean O'Shea.
