
Language Identification


To identify the languages used in a data set in Relativity, take the following steps.

1. In a workspace, go to Indexing & Analytics and select Structured Analytics Set.

2. Create a new set. After assigning it a name and prefix, select the set you want to analyze. In the Select Operations section, check 'Language identification'.

3. Next, from the console on the right, click 'Run Structured Analytics'.

4. Three stages will follow. In the first stage, the analysis is set up and the file size is calculated.

5. In the second stage, the structured analytics operations run.

6. In the final stage, the results are imported into Relativity.

7. After the Structured Analytics Set has run, click 'View Language Identification Summary' in the console.

A report will be created detailing the primary language of documents in the data set, and the secondary language (if any).

The percentages in the reports are based on bytes of text.
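To see why counting bytes matters, here is a minimal Python sketch. It is not Relativity's code: the function name `language_byte_shares`, the idea of pre-labelled text segments, and the choice of UTF-8 as the encoding are all my own assumptions for illustration. It shows that a language written in a multi-byte script can take a larger share of a document by bytes than it would by character count.

```python
def language_byte_shares(segments):
    """Return each language's share of a document, measured in bytes.

    `segments` is a list of (language, text) pairs. Both the pairing and
    the UTF-8 encoding are assumptions made for this illustration.
    """
    byte_counts = {}
    for language, text in segments:
        size = len(text.encode("utf-8"))
        byte_counts[language] = byte_counts.get(language, 0) + size

    total = sum(byte_counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in byte_counts.items()}


# Four Latin letters and two Greek letters: 4 characters vs. 2 characters,
# but in UTF-8 each Greek letter takes 2 bytes, so the byte counts are equal.
example = language_byte_shares([("English", "abcd"), ("Greek", "\u03b1\u03b2")])
```

By character count the English segment is twice as long as the Greek one, yet by bytes the two are equal, so a byte-based report would split the document evenly under these assumptions.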

Up to three languages can be identified in each document. The Analytics engine can detect some languages, such as Thai and Greek, solely on the basis of the characters that are unique to those languages. Chinese, Japanese and Korean are identified on the basis of single characters, whereas other languages are identified using quadgrams (sequences of four characters). Punctuation is ignored. Three to six languages are considered for each quadgram, and the results are added to a comprehensive log. Word lists are not used as a reference. Instead, a training set is generated from web pages in each of the 173 supported languages.
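The general idea of quadgram scoring can be sketched in a few lines of Python. This is a toy illustration only, not the Analytics engine's actual implementation: the two training sentences, the function names, and the simple count-based scoring are all hypothetical. It also leaves out the single-character handling for Chinese, Japanese and Korean and the three-to-six candidate languages per quadgram.

```python
import re
from collections import Counter

# Hypothetical, tiny training samples. A real engine trains on far more text.
TRAINING_TEXT = {
    "English": "the quick brown fox jumps over the lazy dog and then the other dog",
    "Spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego el otro perro",
}

# Unicode blocks whose letters point to a single language in this sketch.
SCRIPT_RANGES = {
    "Thai": ("\u0e00", "\u0e7f"),
    "Greek": ("\u0370", "\u03ff"),
}


def quadgrams(text):
    """Lowercase, drop punctuation and digits, and return every 4-character window."""
    cleaned = re.sub(r"[^\w\s]|\d|_", "", text.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return [cleaned[i:i + 4] for i in range(len(cleaned) - 3)]


def build_profiles(training_text):
    """Count how often each quadgram occurs in each language's training sample."""
    return {lang: Counter(quadgrams(sample)) for lang, sample in training_text.items()}


def script_shortcut(text):
    """Return a language when most letters belong to its script, else None."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return None
    for lang, (low, high) in SCRIPT_RANGES.items():
        in_script = sum(low <= ch <= high for ch in letters)
        if in_script > len(letters) / 2:
            return lang
    return None


def identify(text, profiles):
    """Guess the language: script check first, then quadgram overlap with each profile."""
    shortcut = script_shortcut(text)
    if shortcut:
        return shortcut
    scores = Counter()
    for gram in quadgrams(text):
        for lang, profile in profiles.items():
            if gram in profile:
                scores[lang] += profile[gram]
    return scores.most_common(1)[0][0] if scores else None


profiles = build_profiles(TRAINING_TEXT)
guess_en = identify("The lazy dog.", profiles)
guess_es = identify("El perro perezoso.", profiles)
```

Note that no word list is consulted anywhere: the guess comes entirely from how many four-character sequences in the input also appear in each language's training sample.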


Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea.
