Outline of Craig Ball's Electronic Discovery Workbook - Search is a Science The Streetlight Effe
Here's a continuation of my outline of the 2016 edition of Craig Ball's Electronic Discovery Workbook, which I last posted about on May 6, 2018.
The chapter entitled "Search is a Science The Streetlight Effect in e-Discovery" critiques the habit of searching only where it is easiest to look. In the old joke, a drunk tells a cop that he is searching for his keys under a streetlight. The cop asks if he is sure he lost them there, and the drunk replies that he actually lost them in the park but is searching in the street because the light is better. Hence the 'streetlight effect'. Ha ha.
Ball faults lawyers for believing that a single set of keywords can be run across email systems, data archives, removable media and databases, when these systems don't share the same syntax or search tools. Boolean, proximity, and stem searches are not available in every system, and some databases use specialized query languages. A company that has used journaling to prepare an index of email archives may be able to search only the index, not the messages and their attachments.
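To make the point concrete, here is a minimal Python sketch of how the same keyword can hit or miss depending on which search features a tool supports. Everything in it is an assumption for illustration: the sample sentence, the function names, and especially `crude_stem`, which is a toy suffix stripper and not a real stemming algorithm or any vendor's implementation.

```python
import re

# Hypothetical sample document, invented for illustration.
DOC = "The auditors destroyed the contract files before the meeting."

def words(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_match(text, term):
    """Keyword search with no stemming: the term must appear as a whole word."""
    return term.lower() in words(text)

def crude_stem(word):
    """Toy stemmer (an assumption, not a real algorithm): strip a few suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_match(text, term):
    """Stem search: compare the stems of the term and of each word."""
    target = crude_stem(term.lower())
    return any(crude_stem(w) == target for w in words(text))

def within(text, a, b, distance):
    """Proximity search: True if a and b occur within `distance` words."""
    toks = words(text)
    pos_a = [i for i, w in enumerate(toks) if w == a.lower()]
    pos_b = [i for i, w in enumerate(toks) if w == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

print(exact_match(DOC, "destroy"))            # False: only "destroyed" appears
print(stem_match(DOC, "destroy"))             # True: both reduce to "destroy"
print(within(DOC, "destroyed", "files", 3))   # True: the words are 3 apart
print(within(DOC, "destroyed", "files", 2))   # False
```

A keyword list drafted with a stemming tool in mind would silently miss this document on a system that only does exact matching, which is the kind of mismatch Ball warns about.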
Interrogatories should request the following information:
1. The rules used to tokenize data to make it searchable. Tokenization determines which character strings in documents are identified as words. Different systems use different tokenization rules. Compound words, phrases with numbers and words with diacritics can be handled in different ways.
2. Stop words for an index. Stop words, also called noise words, are left out of an index, so they cannot be found by searching it.
3. The number of documents that lack extractable text or searchable metadata. Encrypted or compressed files may not be searchable.
4. Limitations on keyword, Boolean or proximity searching. Tests should be run in order to confirm how search tools function.
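Items 1 and 2 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sample sentence, the two tokenization rules, and the stop list are all hypothetical and do not describe any particular e-discovery product.

```python
import re

# Hypothetical sample text; "r\u00e9sum\u00e9" is "résumé" with precomposed accents.
TEXT = "The r\u00e9sum\u00e9 for e-discovery vendor ACME-2016 is in the data-room."

# Illustrative stop list (an assumption; real systems ship their own lists).
STOP_WORDS = {"the", "for", "is", "in"}

def tokenize_alnum(text):
    """Hypothetical rule A: a token is a run of ASCII letters or digits."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def tokenize_word(text):
    """Hypothetical rule B: keep Unicode word characters and internal hyphens."""
    return [t.lower() for t in re.findall(r"\w+(?:-\w+)*", text)]

def build_index(tokens, stop_words):
    """Build a toy index: the set of tokens that are not stop words."""
    return {t for t in tokens if t not in stop_words}

index_a = build_index(tokenize_alnum(TEXT), STOP_WORDS)
index_b = build_index(tokenize_word(TEXT), STOP_WORDS)

# Rule A breaks the accented word and the hyphenated terms into fragments.
print(sorted(index_a))
# Rule B keeps them whole, so the searchable "words" are different.
print(sorted(index_b))

# The same search term hits one index and misses the other.
print("e-discovery" in index_a, "e-discovery" in index_b)  # False True
print("discovery" in index_a, "discovery" in index_b)      # True False
print("the" in index_a, "the" in index_b)                  # False False
```

Under rule A the accented word is split into the fragments "r" and "sum", so a search for the full word fails, and neither index can find the stop word "the". This is why the tokenization rules and stop list are worth asking for before agreeing on keywords.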