Nov 6, 2018

Word Counter is a great online utility which you can use to find the keywords which are used most frequently in a given document.

Simply create an account, then paste the text of the document you want to analyze into Word Counter.

The utility will not only detect the total word and character counts, but it will also generate a list of which words are used most frequently.

You can change the keyword density setting so that it includes only two-word or three-word phrases.
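This post doesn't describe Word Counter's exact counting rules, so the following is only a minimal Python sketch of the general idea: tokenize the text, count one-, two-, or three-word phrases, and report each phrase's share of the total. The function names `tokenize` and `keyword_density` and the simple tokenizing regex are assumptions for illustration, not Word Counter's implementation. Real tools typically also drop stop words such as "the".

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_density(text, n=1, top=10):
    """Return up to `top` (phrase, count, percent) tuples for n-word phrases."""
    words = tokenize(text)
    # Slide a window of n words across the text to build the phrases.
    phrases = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not phrases:
        return []
    counts = Counter(phrases)
    total = len(phrases)
    return [(p, c, round(100 * c / total, 2)) for p, c in counts.most_common(top)]

sample = "The custodian sent the contract. The contract was signed by the custodian."
print("Words:", len(tokenize(sample)), "Characters:", len(sample))
for phrase, count, percent in keyword_density(sample, n=2, top=3):
    print(f"{phrase}: {count} ({percent}%)")
```

Setting `n=2` or `n=3` corresponds to the two-word and three-word phrase option.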

My friend Nikolai of HashTag Legal demonstrates in a YouTube video how the analysis performed by Word Counter can be implemented in Relativity. The application he has designed makes keyword density analysis far more useful.

Keyword Density is shown in a separate field, which gives a count for each string.

HashTag Legal has made numerous useful tools available on its website. I highly recommend checking out Nick's site, which is similar in spirit to Litigation Support Tip of the Night.


 
 

Here's a continuation of my outline of the 2016 edition of Craig Ball's Electronic Discovery Workbook, which I last posted about on September 28, 2018.

The chapter entitled "The Step-by-Step of Smart Search" provides a 10-step approach for effective keyword searching.

A. Statements by Judges and Experts on Keyword Searching

1. Judge Facciola - lawyers who do keyword searching without expert guidance are going "where angels fear to tread."

2. Judge Grimm - search methods must be tested for quality assurance.

3. Judge Peck - issued a "wake-up call to the Bar" about lawyers' inexpert search terms.

4. Jason R. Baron of NARA - leading figure in e-discovery search.

B. 10-Step Approach

1. Start with the Request for Production

a. ESI search should really begin when litigation is anticipated.

b. Use terms of art from the RFPs, and also rephrase the demands in ordinary English.

c. Push back against overbroad requests.

d. If requests are vague, tell the other side how you will interpret them, which puts them in the position of having to object.

2. Seek Input from Key Players

a. Custodians are the subject matter experts (SMEs) for their own data.

b. The TREC Legal Track challenge showed a correlation between questioning key players and higher precision and recall.
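Precision and recall, the measures cited from the TREC Legal Track, have standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of all relevant documents that the search retrieved. A minimal Python sketch, using hypothetical document IDs and a hypothetical `precision_recall` helper:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a set of search hits.

    retrieved: IDs of documents the search returned
    relevant:  IDs of documents actually relevant (e.g., from reviewing a sample)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the search returns 8 documents, catching 3 of the 5 relevant ones.
p, r = precision_recall(retrieved=range(1, 9), relevant={2, 4, 6, 10, 12})
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.375 recall=0.600
```

Better-informed search terms from key players can raise both numbers at once: fewer noise hits (precision) and fewer missed documents (recall).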

3. Look at What You've Got and the Tools You'll Use

a. TIFF images require a different search technique than emails or Word documents, since they have no searchable text until they are OCRed.

b. Test search tools against actual data.

c. Search tools must be able to search through container files, nested content, and email attachments.

d. Search tools must identify encrypted files or non-standard file types that can't be searched.
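To illustrate what "searching through container files" and "identifying encrypted files" involve, here is a minimal Python sketch that recursively lists the contents of a ZIP archive, descends into ZIPs nested inside it, and flags encrypted entries using bit 0 of the ZIP general-purpose flag. It only handles ZIP containers; other containers (such as PST mailboxes) would need other tools. The function name `walk_zip` and the sample file names are hypothetical.

```python
import io
import zipfile

def walk_zip(data, path="", report=None):
    """Recursively list files in a ZIP archive (given as bytes).

    Returns (path, kind) tuples where kind is "file", "container", or "encrypted".
    """
    if report is None:
        report = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            full = f"{path}/{info.filename}" if path else info.filename
            if info.flag_bits & 0x1:  # bit 0 set = entry is encrypted; can't be read or searched
                report.append((full, "encrypted"))
                continue
            content = zf.read(info)
            if zipfile.is_zipfile(io.BytesIO(content)):
                report.append((full, "container"))
                walk_zip(content, full, report)  # descend into the nested archive
            else:
                report.append((full, "file"))
    return report

# Hypothetical test data: an email export with a zipped attachment inside it.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as z:
    z.writestr("attachment.txt", "quarterly numbers")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as z:
    z.writestr("email.eml", "see attached")
    z.writestr("attachments.zip", inner.getvalue())

for entry in walk_zip(outer.getvalue()):
    print(entry)
```

Entries reported as "encrypted" are the ones to log as exceptions rather than silently skip.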

4. Communicate and Collaborate

a. Tell the other side the tools and terms you are using.

b. Ask for targeted suggestions and run them on sample data. They highlight terms that you overlooked.

c. Let the other side have two rounds of keyword search and review on your data.

5. Incorporate Misspellings, Variants and Synonyms

a. Common variants are more effective than fuzzy searching, which generates too many false hits.

b. Sources for variants include the Dumb Dictionary and Wikipedia's lists of common misspellings.
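One way to apply explicit variants instead of fuzzy searching is to expand each term into an OR group. The sketch below assumes a generic Boolean syntax; real platforms (dtSearch, Relativity, etc.) each have their own rules for quoting, hyphens, and operators, so check the syntax of the tool you use. The variant lists and the `build_variant_queries` name are hypothetical.

```python
def build_variant_queries(term_variants):
    """Turn {term: [variants]} into one OR-group query string per term."""
    queries = []
    for term, variants in term_variants.items():
        # dict.fromkeys removes duplicates while keeping the original order
        alternatives = list(dict.fromkeys([term, *variants]))
        # Quote anything that isn't a single alphanumeric word (spaces, hyphens)
        quoted = [a if a.isalnum() else f'"{a}"' for a in alternatives]
        queries.append("(" + " OR ".join(quoted) + ")")
    return queries

# Hypothetical misspellings and variants gathered from custodians and misspelling lists
variants = {
    "receivable": ["recievable", "receivables"],
    "kickback": ["kick back", "kick-back", "kickback"],
}
for q in build_variant_queries(variants):
    print(q)
```

Each variant is deliberate and reviewable, unlike a fuzzy search whose edit-distance matches can pull in unrelated words.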

6. Filter and Deduplicate First

a. Filter out music and image files which have alphanumeric content.

b. De-NIST using known hash values (e.g., the NIST National Software Reference Library).

c. Deduplicate before indexing.

d. Be able to repopulate suppressed duplicates.

e. Use keywords to exclude irrelevant ESI (e.g., "baby shower").
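De-NISTing and deduplication both come down to comparing file hashes. The minimal Python sketch below assumes MD5 (one of the hash types published in NIST's reference sets) and a hypothetical in-memory set of known system-file hashes; a real workflow would load the actual hash list. It records which copy was kept for each suppressed duplicate, so the suppressed duplicates can be repopulated later. The `cull` name and document IDs are hypothetical.

```python
import hashlib

def cull(documents, known_hashes):
    """documents: dict of doc_id -> file bytes.

    Returns (kept, suppressed, denisted):
      kept       - hash -> doc_id of the first copy seen (the copy to index)
      suppressed - doc_id -> doc_id of its kept copy (for later repopulation)
      denisted   - doc_ids whose hash matches a known system/software file
    """
    kept, suppressed, denisted = {}, {}, []
    for doc_id, content in documents.items():
        digest = hashlib.md5(content).hexdigest()
        if digest in known_hashes:
            denisted.append(doc_id)
        elif digest in kept:
            suppressed[doc_id] = kept[digest]
        else:
            kept[digest] = doc_id
    return kept, suppressed, denisted

# Hypothetical data: two identical memos and one known system file.
docs = {"DOC-1": b"memo text", "DOC-2": b"memo text", "DOC-3": b"system dll bytes"}
known = {hashlib.md5(b"system dll bytes").hexdigest()}
kept, suppressed, denisted = cull(docs, known)
print(list(kept.values()), suppressed, denisted)
```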

7. Test, test, test!

a. Test on data representative of custodian data with responsive evidence.

b. Can a large number of hits be found in system files, business units not subject of litigation, or other irrelevant ESI?
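Testing for noise hits can start with a simple hit report: for each term, how many documents it hits in each data source. The sketch below assumes documents arrive as hypothetical (source, text) pairs and uses whole-word, case-insensitive matching; a term that lights up system files or an uninvolved business unit is a candidate for revision.

```python
import re
from collections import defaultdict

def hit_report(documents, terms):
    """documents: iterable of (source, text). Returns {term: {source: doc_count}}."""
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    report = {t: defaultdict(int) for t in terms}
    for source, text in documents:
        for term, pattern in patterns.items():
            if pattern.search(text):  # count each document once per term
                report[term][source] += 1
    return {t: dict(counts) for t, counts in report.items()}

# Hypothetical sample: one custodian email and two system files.
sample_docs = [
    ("custodian mail", "Please review the merger draft"),
    ("system files", "merger.dll loaded"),
    ("system files", "merger module error"),
]
print(hit_report(sample_docs, ["merger"]))
```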

8. Review the Hits

a. Create a spreadsheet showing hits in context, with 20-30 words on each side.

b. Review responsive documents for additional keywords.

c. Search is an iterative process.
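A hits-in-context spreadsheet can be produced as a CSV file. This minimal Python sketch handles single-word terms only and splits on whitespace; the `keyword_in_context` name, the column headings, and the sample document are hypothetical. The default `window=25` falls within the 20-30 word range.

```python
import csv
import re
import sys

def keyword_in_context(doc_id, text, term, window=25):
    """Return (doc_id, before, hit, after) rows with `window` words on each side."""
    words = text.split()
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    rows = []
    for i, word in enumerate(words):
        if pattern.search(word):
            before = " ".join(words[max(0, i - window):i])
            after = " ".join(words[i + 1:i + 1 + window])
            rows.append((doc_id, before, word, after))
    return rows

# Write the rows as CSV (to the screen here; open a file instead to build the spreadsheet).
writer = csv.writer(sys.stdout)
writer.writerow(["DocID", "Before", "Hit", "After"])
docs = {"DOC-0001": "Please send the merger draft to outside counsel before Friday."}
for doc_id, text in docs.items():
    writer.writerows(keyword_in_context(doc_id, text, "merger", window=3))
```

Reading the Before and After columns is also a quick way to spot new keywords for the next iteration.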

9. Tweak the Queries and Retest

a. Do keywords cluster in pairs? If so, use a Boolean AND or a proximity connector to reduce noise hits.
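The difference between a Boolean AND and a proximity connector can be shown in a few lines. In this minimal Python sketch, `within` approximates a "w/N" connector by comparing word positions in either order. Platforms differ on exactly how they count the distance, so treat the semantics here as an assumption. Both function names are hypothetical.

```python
import re

def boolean_and(text, term_a, term_b):
    """True if both terms appear anywhere in the document."""
    words = set(re.findall(r"\w+", text.lower()))
    return term_a.lower() in words and term_b.lower() in words

def within(text, term_a, term_b, distance):
    """True if the terms occur within `distance` word positions of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

doc = "The price list is attached. Separately, the server issue was fixed."
print(boolean_and(doc, "price", "fixed"))  # True: both words appear somewhere
print(within(doc, "price", "fixed", 5))    # False: they are 9 words apart
```

The AND query hits this unrelated document; the proximity query screens it out.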

10. Check the Discards

a. The sampling method must be a rational compromise between quality assurance and cost.
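The outline doesn't prescribe a sampling formula. One common choice, assumed here, is Cochran's sample-size formula with a finite population correction, followed by a simple random draw from the discards (the documents with no hits). The defaults (95% confidence, plus or minus 5% margin) are illustrative; raising confidence or narrowing the margin increases review cost, which is the trade-off the step describes. Function names are hypothetical.

```python
import math
import random

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample-size formula with a finite population correction.

    z=1.96 corresponds to 95% confidence; margin is the +/- error;
    p=0.5 is the most conservative assumption about the unknown rate.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

def draw_sample(discard_ids, size, seed=None):
    """Pick a simple random sample of discarded documents for review."""
    rng = random.Random(seed)
    return rng.sample(list(discard_ids), min(size, len(discard_ids)))

discards = range(1, 100_001)  # hypothetical: 100,000 documents with no hits
n = sample_size(len(discards))
review_set = draw_sample(discards, n, seed=42)
print(n, len(review_set))
```

Any responsive documents found in the sample point to keywords that need another round of tweaking.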


 
 

Here's my final posting about the Electronic Discovery Institute's online e-discovery certification program, which you can subscribe to for just $1. I last blogged about this program on October 21, 2018. Go to https://www.lawinstitute.org/ to sign up for it. The course entitled "The EDI Diversity Initiative" is taught by Alex Ponce de Leon, a discovery counsel for Google; Veronica Gromada, Senior Associate General Counsel and Section Head of the Litigation Support Group for Walmart Stores, Inc.; Ashish Prasad, general counsel for eTERA Consulting; Demetrius Rush, assistant general counsel for Zurich North America; and Vince Catanzaro, an attorney with Shook, Hardy & Bacon.

The Diversity Pledge

American society is very diverse and becoming more connected to the rest of the world through technology. People who come from different backgrounds bring different approaches to problems. Greater racial and gender diversity helps encourage a greater diversity of thought. Companies have an interest in having the ethnic background of their staff mirror that of their customer base. Only about 30% of attorneys are women, and a smaller percentage of law professors are women. There is greater diversity in the medical profession and in other businesses than in the legal profession.

Catanzaro believes that law firms have not been as committed to promoting diversity as other businesses. Part of the problem may stem from the fact that white men may be more inclined to promote other white men.

Diversity may also have been hindered by a focus on competing with other individuals, which loses sight of the greater good of the firm and its clients. Diversity committees can help firms take the first steps in the right direction.

The EDI Diversity Initiative hopes to build upon the efforts of bar associations that have been active in encouraging greater diversity. The EDI wants it to be widely acknowledged that diversity should be a priority. Many initiatives in the legal industry have been in place for decades - the National Bar Association and other organizations have had a lot of success with their efforts. There is more diversity than there was decades ago, but the reduction in the demand for legal services over the past 8-9 years has led to less diversity in some areas. Corporate legal departments, such as those for Fannie Mae, Microsoft and Google, have made a lot of progress in recruiting more diverse staffs. They have programs to tackle implicit bias.

Rush recounted how, when he relocated from West Virginia, his thinking was changed by interacting with different ethnic and social groups. Catanzaro discussed how the diversity initiatives at his former employer, DuPont, were greatly assisted by the support of the company's corporate leadership. Gromada talked about the successes of Corporate Counsel Women of Color, an organization she is involved with whose program, My Life as a Lawyer, reaches out to students. Mentoring younger lawyers can be an important way to build a more diverse staff. Gromada is proud to be able to tap into the diverse network of up-and-coming legal professionals she has developed.

The EDI's Diversity Pledge asks lawyers who take it to encourage diversity in the hiring and promotion of electronic discovery professionals. The EDI believes strongly in the importance of mentoring, and has developed a program to pair up senior professionals with young people entering the field.


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

If you have a question or comment about this blog, please make a submission using the form to the right. 


© 2015 by Sean O'Shea. Proudly created with Wix.com
