
Latent Semantic Indexing

The Wikipedia page on Latent Semantic Analysis, the mathematical model behind conceptual analytics in Relativity, gives a good demonstration of how a term-document matrix is formed, with each word on a separate row and each document in a separate column, in order to measure the distance between conceptually similar items.
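
As a rough sketch of that matrix (the documents, vocabulary, and counts here are hypothetical, chosen only for illustration), the structure can be built in a few lines of Python:

```python
import numpy as np

# Hypothetical mini document set - each string stands in for a full document.
docs = [
    "the court granted the motion",
    "the judge denied the motion",
    "discovery requests were served",
]

# Vocabulary: one row in the matrix per unique term.
terms = sorted({w for d in docs for w in d.split()})

# Term-document matrix: words on separate rows, documents in separate
# columns; each cell counts occurrences of that term in that document.
X = np.array([[d.split().count(t) for d in docs] for t in terms])

print(terms)
print(X)
```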


Latent Semantic Indexing assumes that the initial count of terms in each document is too conservative, because a document is also related to terms it does not literally contain. The matrix is recalculated to account for the terms related to each document, and the documents are analyzed to associate those which contain similar words.
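
That recalculation is typically performed with a matrix factorization called singular value decomposition (SVD). Here is a minimal sketch, continuing the hypothetical matrix X above and using numpy; the number of dimensions k retained is an arbitrary choice for illustration:

```python
import numpy as np

# X is the hypothetical term-document matrix from the sketch above.
U, s, Vt = np.linalg.svd(X.astype(float), full_matrices=False)

k = 2  # number of latent "concepts" to retain (an assumed value)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# X_k is the rank-k reconstruction: cells that were zero in X can become
# nonzero, crediting a document with related terms it never literally used.
print(np.round(X_k, 2))

# Documents are then compared in the k-dimensional concept space:
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T   # one row per document

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The first two documents share terms, so they score as conceptually close.
print(cosine(doc_vectors[0], doc_vectors[1]))
```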



The drawbacks of Latent Semantic Indexing are that:

  1. The math may closely associate terms which have no real relationship, since a word is represented by the average of its meanings across the data set. This discrepancy is reduced where words in the document set are used consistently in one predominant context.

  2. In a Bag of Words model, the text is treated as an unordered collection of words, so relationships that depend on word order are lost. N-gram models, which look at sequences of adjacent terms, can be used to capture those relationships (see the sketch after this list).

  3. LSI implicitly assumes a probabilistic model for the data (a Gaussian distribution of term frequencies) that does not necessarily match the actual sample data, in which term counts tend to follow a Poisson distribution.
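
To make the second drawback concrete, here is a small hypothetical illustration: a bag-of-words comparison cannot distinguish two sentences that use the same words in opposite orders, while word bigrams (two-word sequences) can:

```python
# Two hypothetical sentences with identical words in different orders.
a = "plaintiff sued defendant".split()
b = "defendant sued plaintiff".split()

# Bag of words: order is discarded, so the sentences look identical.
print(sorted(a) == sorted(b))        # True

# Bigrams: pairs of adjacent words preserve the ordering.
def bigrams(words):
    return list(zip(words, words[1:]))

print(bigrams(a))                    # [('plaintiff', 'sued'), ('sued', 'defendant')]
print(bigrams(a) == bigrams(b))      # False
```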


