  • May 26, 2020

When considering which electronic discovery software to use, confirm that its processing is Unicode compliant, meaning that it can search and display foreign-language documents that use Asian characters or the Cyrillic alphabet. Processing that is not Unicode compliant may generate text with boxes or random symbols, something most of us have had the misfortune of encountering before.

UNICODE

ASCII

The ASCII character encoding supports only the basic Latin alphabet and is limited to 128 characters. Unicode defines more than a million code points, and its UTF-8 encoding can represent all of them, covering Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Chinese, Japanese, Korean, and other major languages. The first 128 characters of UTF-8 are encoded identically to the 128 characters of ASCII. Most text on web pages uses the UTF-8 encoding. It's important to confirm with an e-discovery vendor that the tools they use for data collection and processing support Unicode.
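The relationship between ASCII and UTF-8 is easy to check for yourself. The following is a minimal Python sketch, not taken from any e-discovery tool; the helper name `fits_in_ascii` and the sample words are made up for illustration.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in the text is one of the 128 ASCII characters."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


english = "discovery"
russian = "документ"  # Cyrillic sample word (hypothetical test data)

# Plain English text produces the same bytes in ASCII and in UTF-8.
print(english.encode("ascii") == english.encode("utf-8"))

# Cyrillic text cannot be stored as ASCII at all, but UTF-8 handles it,
# using two bytes for each of these letters.
print(fits_in_ascii(english), fits_in_ascii(russian))
print(len(russian), "characters,", len(russian.encode("utf-8")), "bytes in UTF-8")
```

A tool that assumes ASCII has no way to represent the second word, which is where the boxes and random symbols come from.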

Not all data is in Unicode. Email systems may use their own proprietary encodings. In Japan, the Shift-JIS format is widely used for email text. MS Exchange uses Unicode for PST archives. However, individual email messages saved with a .msg extension don't use Unicode for the email header fields. Tools that collect local .msg files may garble the text of email headers unless an adjustment is made for the Outlook encoding.
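To see how a wrong encoding assumption garbles text, here is a rough Python sketch. It does not read .msg or PST files and says nothing about how Outlook actually stores headers; it only shows Shift-JIS bytes being misread as UTF-8 and then decoded correctly. The helper `decode_with_fallback` and the sample subject line are hypothetical.

```python
from typing import Sequence, Tuple


def decode_with_fallback(raw: bytes, candidates: Sequence[str]) -> Tuple[str, str]:
    """Try each candidate encoding in order; return the text and the encoding that worked.

    This is a simple heuristic. The order matters, because a permissive
    encoding such as latin-1 will "succeed" on any bytes at all.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit the data")


subject = "日本語"                    # hypothetical Japanese subject line
raw = subject.encode("shift_jis")    # the bytes as a Shift-JIS mail system would store them

# Forcing the wrong encoding yields replacement characters instead of text.
garbled = raw.decode("utf-8", errors="replace")
print(garbled)

# Trying strict UTF-8 first and falling back to Shift-JIS recovers the text.
text, used = decode_with_fallback(raw, ["utf-8", "shift_jis"])
print(text, used)
```

In practice the candidate list would depend on where the data came from, which is one reason to ask a vendor what encodings their collection tools expect.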

Even if processing software is Unicode compliant, it will still be necessary to use separate language detection software to determine which languages are present. Identifying the encoding can determine the alphabet, but not necessarily the language. Cyrillic text, for example, may be Russian, Bulgarian, or Ukrainian.
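The gap between alphabet and language can be illustrated with a short Python sketch. Python's standard library does not expose the Unicode script property directly, so this example uses the first word of each character's official Unicode name as a rough stand-in for its script. That shortcut, the function name `scripts_in`, and the sample greetings are assumptions made for illustration only.

```python
import unicodedata
from collections import Counter


def scripts_in(text: str) -> Counter:
    """Count letters by the first word of their Unicode name (e.g. LATIN, CYRILLIC, CJK)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


russian = "Привет"     # a Russian greeting
bulgarian = "Здравей"  # a Bulgarian greeting

# Both samples report only Cyrillic letters. Nothing at the character
# level says which of the two languages the text is written in.
print(scripts_in(russian))
print(scripts_in(bulgarian))
```

Telling the two apart takes a tool that looks at vocabulary or letter statistics, which is the job of dedicated language detection software.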


 
 

The United States District Court for the Northern District of California has posted a checklist on its site for parties to consult when addressing ESI issues at a Rule 26(f) meet and confer.

The preservation of ESI involves not only ascertaining a relevant date range and the identity of relevant custodians, but also preparing a list of systems that contain data not associated with individual custodians and deciding to stop data destruction programs.

The parties should prioritize discovery from specific systems - whether it be email, accounting or other systems.

A specific e-discovery liaison should be appointed.

In weighing whether or not the cost of electronic discovery will be proportional to the amount of claims involved, the parties should consider sharing an e-discovery vendor and using a common platform to host their data.

The ESI for the custodians most likely to have relevant data should be included in the first phases of discovery.

The production should not degrade the "inherent searchability of ESI".


 
 
  • Apr 23, 2020

The electronic discovery service provider, Lexbee, has a very handy 'eDiscovery Calculator' on its site. Its purpose is to give you a page count for electronic files based on the number of GBs of those files in your data set.

Lexbee has its own standards, but you can adjust how many pages of emails or PDFs you want to assume per GB. (It works with a standard of 2,500 pages per box.)
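The arithmetic behind a calculator like this is simple. Here is a minimal Python sketch under stated assumptions: the pages-per-GB figure is a placeholder you supply yourself, not Lexbee's or any other vendor's baseline, and the names `Estimate` and `estimate_pages` are made up for this example. Only the 2,500 pages per box comes from the calculator described above.

```python
from dataclasses import dataclass

PAGES_PER_BOX = 2500  # the per-box standard mentioned above


@dataclass
class Estimate:
    pages: int
    boxes: float


def estimate_pages(gigabytes: float, pages_per_gb: int) -> Estimate:
    """Convert a data volume into an estimated page count and box count.

    pages_per_gb is an assumption the caller provides; it varies widely
    by file type and by vendor.
    """
    pages = round(gigabytes * pages_per_gb)
    return Estimate(pages=pages, boxes=pages / PAGES_PER_BOX)


# Placeholder rate of 5,000 pages per GB, chosen only to make the math easy to follow.
result = estimate_pages(gigabytes=2, pages_per_gb=5000)
print(result.pages, "pages, or about", result.boxes, "boxes")
```

Swapping in each vendor's published pages-per-GB figure for the same data set is a quick way to see how far their baselines diverge.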

While the calculator would be easy to replicate in Excel or another program, it's interesting to compare Lexbee's baselines against those of other electronic discovery vendors.

Its baselines are very similar to those of Superior Document Services . . .

. . . as well as LitiDOX

. . . but far higher than those of Catalyst:


 
 

Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco.   He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

If you have a question or comment about this blog, please make a submission using the form to the right. 


© 2015 by Sean O'Shea . Proudly created with Wix.com
