Litigation Support Tip of the Night

May 6, 2019

Recently I received a PDF containing a list of Bates numbers that I had to look up in a Relativity workspace. I converted the PDF to an Excel spreadsheet using Adobe Acrobat, and pasted the column of Bates numbers into a search on the document identifier field in Relativity. Unexpectedly, the Bates numbers which contained hyphens did not come up in the search results. What went wrong?

One of the Bates numbers from the exhibit list that I couldn't find was composed of these characters [note:  these are not real Bates numbers, but the hyphens are]:

XYZ‐000818

However, the Relativity workspace did contain a Bates number as a document identifier that was composed of these characters:

XYZ-000818

What's the difference? It's hard to see, but the hyphen between the letter prefix and the six-digit number is not the same character in the two Bates numbers. See this analysis in Excel:

The IF . . . THEN formula shows the two Bates numbers are not actually the same.   The UNICODE formula reveals that the two hyphens are actually distinct characters. 

Unicode 45 (U+002D) is the hyphen-minus, the character typed with the hyphen key on a standard keyboard, and it is used in cell B1. Unicode 8208 (U+2010) is the dedicated hyphen character, and it is used in cell A1. Document review platforms and text editors treat these as different characters, so a search for one will not match the other.

Be sure to also account for en dashes – (Unicode 8211, entered with ALT code 0150) and em dashes — (Unicode 8212, entered with ALT code 0151).
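If a list of Bates numbers comes out of a PDF conversion, one way to guard against this problem is to normalize the dash look-alikes before pasting the list into a search. The following is a minimal Python sketch, not something built into Relativity or Excel. The names normalize_bates and find_dash_variants are hypothetical, and the sketch assumes the workspace stores its identifiers with the plain hyphen-minus (Unicode 45).

```python
# Map the dash look-alikes discussed above to the plain hyphen-minus (Unicode 45).
# Assumption: the review platform stores Bates numbers with the hyphen-minus.
DASH_VARIANTS = {
    "\u2010": "-",  # hyphen (Unicode 8208)
    "\u2013": "-",  # en dash (Unicode 8211)
    "\u2014": "-",  # em dash (Unicode 8212)
}
_DASH_TABLE = str.maketrans(DASH_VARIANTS)


def normalize_bates(value):
    """Return the Bates number with dash variants replaced by a hyphen-minus."""
    return value.translate(_DASH_TABLE).strip()


def find_dash_variants(value):
    """List each dash variant found in value along with its Unicode code point."""
    return [(ch, ord(ch)) for ch in value if ch in DASH_VARIANTS]


if __name__ == "__main__":
    from_pdf = "XYZ\u2010000818"   # contains the hyphen, Unicode 8208
    in_workspace = "XYZ-000818"    # contains the hyphen-minus, Unicode 45
    print(from_pdf == in_workspace)
    print(find_dash_variants(from_pdf))
    print(normalize_bates(from_pdf) == in_workspace)
```

Other look-alikes can be added to the mapping if they turn up in a production; the three listed are only the ones discussed in this tip.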

October 2, 2016

The following are default delimiters used in Relativity load files:

The character used for columns is ¶ and it is ASCII 20.

The character used for quotes is þ and it is ASCII 254.

The character used for line breaks (in document text) is ® and it is ASCII 174.

The character used for multiple values is ; and it is ASCII 59.

The character used for nested values (folder levels) is \ and it is ASCII 92.

The last three characters you probably recognize: a registered sign, a semicolon, and a backslash. The first two are more unusual. Many people call a ¶ a paragraph mark, but the correct term is a pilcrow. It's always good to use a fancy word when you can. The funny looking P character, þ, is a thorn, a letter used today only in Icelandic. Press ALT + 0254 on your number pad to enter it.
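To see how these delimiters fit together, here is a rough Python sketch that splits one line of a load file using the default codes listed above. It is an illustration only, not how Relativity itself parses load files. The function names are hypothetical, the sample values are made up, and the sketch assumes the line has already been decoded to text and that the column delimiter never appears inside a field.

```python
# Default Relativity load file delimiters, using the codes listed above.
COLUMN = chr(20)        # column delimiter, displayed as a pilcrow
QUOTE = chr(254)        # quote character, the thorn
NEWLINE = chr(174)      # stands in for line breaks inside document text
MULTI_VALUE = chr(59)   # semicolon between multiple values
NESTED_VALUE = chr(92)  # backslash between nested levels


def parse_load_file_line(line):
    """Split one load file line into field values (simplified)."""
    fields = []
    for raw in line.rstrip("\r\n").split(COLUMN):
        # Strip the surrounding quote characters when both are present.
        if len(raw) >= 2 and raw.startswith(QUOTE) and raw.endswith(QUOTE):
            raw = raw[1:-1]
        # Restore real line breaks in place of the newline placeholder.
        fields.append(raw.replace(NEWLINE, "\n"))
    return fields


def split_choices(value):
    """Split a multi-value field, then split each value into nested levels."""
    return [choice.split(NESTED_VALUE) for choice in value.split(MULTI_VALUE)]


if __name__ == "__main__":
    sample = (
        QUOTE + "XYZ-000818" + QUOTE + COLUMN
        + QUOTE + "Hot;Privileged\\Attorney Client" + QUOTE + COLUMN
        + QUOTE + "line one" + NEWLINE + "line two" + QUOTE
    )
    fields = parse_load_file_line(sample)
    print(fields)
    print(split_choices(fields[1]))
```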

September 27, 2016

You can detect the encoding of a text file in Notepad++ by looking at the bottom right of the window, as shown in the red box in the screen grab below.

The manual for the Relativity Desktop Client, the software that Relativity uses to import and export metadata and documents, states that, "Relativity does not support Unicode .opt files for image imports. When you have a Unicode .opt file, you must resave this file in ANSI/Western European encoding." So ANSI encoding (the name comes from the American National Standards Institute) is what kCura refers to as Western European text. If you have an image load file in UTF-8 or another Unicode encoding, you need to convert it.
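Resaving in a text editor works for one file; for a batch, the conversion can be scripted. The sketch below is one possible approach in Python, not a Relativity tool. The function names and file names are hypothetical, and it assumes the source file is UTF-8 (with or without a byte order mark) and that every character in it exists in Windows-1252.

```python
from pathlib import Path


def to_ansi_bytes(raw, source_encoding="utf-8-sig"):
    """Re-encode raw bytes as Windows-1252 (ANSI / Western European).

    "utf-8-sig" reads UTF-8 and drops a leading byte order mark if present.
    Raises UnicodeEncodeError if a character has no Windows-1252 equivalent.
    """
    text = raw.decode(source_encoding)
    return text.encode("cp1252")


def resave_as_ansi(src, dst, source_encoding="utf-8-sig"):
    """Write a Windows-1252 copy of src to dst, leaving line endings untouched."""
    Path(dst).write_bytes(to_ansi_bytes(Path(src).read_bytes(), source_encoding))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    resave_as_ansi("images_utf8.opt", "images_ansi.opt")
```

Working on bytes rather than opening the file in text mode keeps the original line endings as they were. The error on unsupported characters is deliberate, so that nothing is silently replaced in the load file.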
 

June 29, 2016

Relativity allows admins to import load files that have either Western European (Windows) or Unicode (UTF-8) encoding. What is the difference between the two? Western European (Windows), also called ANSI or Windows-1252, is a small extension of the standard ASCII English character set that adds characters used in other European languages written in the Latin alphabet. This chart shows the full character set:

Unicode, of which UTF-8 is one encoding, consists of more than 128,000 characters, accounting for Greek, Chinese, Cyrillic, Japanese and many other non-Latin alphabets. For a fuller discussion, see the Tip of the Night for November 25, 2015. In Relativity, if you attempt to import Unicode text into a field that is not Unicode enabled, you'll get scrambled results. You can set a field for Unicode by going to Administration . . . Fields.
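The difference between the two encodings can be shown with a few lines of Python. This is a general illustration of what happens when UTF-8 bytes are read as Windows-1252, and of which text fits in Windows-1252 at all; it is not a description of what Relativity does internally. The function names are hypothetical.

```python
def fits_western_european(text):
    """Return True if every character exists in Windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def misread_utf8_as_western_european(text):
    """Show how UTF-8 text looks when its bytes are read as Windows-1252."""
    # errors="replace" covers the few byte values Windows-1252 leaves undefined.
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(fits_western_european("caf\u00e9"))             # Latin letter with accent
    print(fits_western_european("\u03a9\u03bc"))          # Greek letters
    print(misread_utf8_as_western_european("caf\u00e9"))  # accented letter becomes two characters
```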

If you need to quickly determine which encoding a load file uses, download File Encoding Checker from CodePlex.   See this page, https://encodingchecker.codeplex.com/ .    In the file mask box, simply enter a string such as *.dat to find all of the load files.   Then click 'View'.   You'll get a list showing each file's encoding.  
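A similar check can be roughed out in Python if a download is not an option. The sketch below is a heuristic, not a reimplementation of File Encoding Checker: it looks for a byte order mark and then tests whether the bytes are valid ASCII or UTF-8. The function names and the labels it returns are hypothetical, and a file without a byte order mark can only be guessed at.

```python
import codecs
from pathlib import Path


def guess_encoding(raw):
    """Make a rough guess at the encoding of raw bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        return "UTF-8 (with BOM)"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16"
    try:
        raw.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "UTF-8 (no BOM)"
    except UnicodeDecodeError:
        # Not valid UTF-8, so most likely a single-byte encoding.
        return "Western European (Windows-1252) or similar"


def report_encodings(folder, mask="*.dat"):
    """Print a guessed encoding for each file in folder matching mask."""
    for path in sorted(Path(folder).glob(mask)):
        print(path.name, "-", guess_encoding(path.read_bytes()))


if __name__ == "__main__":
    report_encodings(".")
```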


Sean O'Shea has more than 15 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

 

All content provided on this blog is for informational purposes only. The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information.

 

This policy is subject to change at any time.