AI's Token Limit

Those of us trying to keep up with the increasing use of AI systems for legal work have probably heard at some point that the large language models (LLMs) behind ChatGPT and other AI tools give poorer results as more and more text is supplied to them at once. A 2024 article posted by the data analytics company Databricks, which is valued at $62 billion (Quinn Leng, et al., Long Context RAG Performance of LLMs (Aug. 12, 2024), https://www.databricks.com/blog/long-context-rag-performance-llms), found that the RAG performance of OpenAI's "GPT-4-0125-preview starts to decrease after 64k tokens, and only a few models can maintain consistent long context RAG performance on all datasets." RAG stands for retrieval augmented generation: a technique in which passages retrieved from external documents are added to the prompt so that the LLM can draw on them when it generates an answer.
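To make the RAG idea concrete, here is a minimal Python sketch of the retrieval step. It is a toy, not how Databricks or OpenAI implement RAG: the function names are hypothetical, and documents are ranked by simple word overlap with the question, whereas production systems typically rank passages by embedding similarity. The point it illustrates is that every retrieved passage is pasted into the prompt, so retrieving more documents means a longer context for the model to work through.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word list; a crude stand-in for real text processing."""
    return re.findall(r"[a-z0-9']+", text.lower())


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the query.

    Assumption: word overlap is used only to keep this sketch
    self-contained; real RAG systems usually use embeddings.
    """
    query_words = set(words(query))

    def score(doc: str) -> int:
        # Count how many words in the document also appear in the query.
        return sum(1 for w in words(doc) if w in query_words)

    # sorted() is stable, so tied documents keep their original order.
    return sorted(documents, key=score, reverse=True)[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Paste the retrieved passages into the prompt ahead of the question."""
    context = "\n\n".join(retrieve(query, documents, k))
    return f"Use these documents to answer.\n\n{context}\n\nQuestion: {query}"


docs = [
    "The lease term ends on March 1, 2025.",
    "Invoice 4471 was paid in full.",
    "Either party may terminate the lease with 30 days notice.",
]
print(build_prompt("When does the lease end?", docs, k=2))
```

With `k=2`, the two lease documents are placed in the prompt and the invoice is left out. Raising `k` (or retrieving longer passages) grows the prompt, which is exactly the long-context situation the Databricks study measured.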


This chart from the article shows how likely different AI models are to generate a correct answer when given between 2K and 125K tokens of context.



But what's a token? In the context of AI, a token is a string of characters, such as a whole word, part of a word, or a punctuation mark, that an AI system treats as a single unit when detecting relationships with the other tokens in a text. Because longer or less common words are often split into several pieces, a block of text can contain more tokens than words. OpenAI's online Tokenizer will calculate the number of tokens in any text block you enter.
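To see why a block of text can contain more tokens than words, here is a toy Python tokenizer. The tiny vocabulary is invented for illustration, and the greedy longest-match rule (similar in spirit to WordPiece) is a simplification: OpenAI's models actually use byte pair encoding with much larger vocabularies learned from data, so real token boundaries will differ.

```python
# Hypothetical toy vocabulary; real tokenizers learn vocabularies with
# tens of thousands of entries from large amounts of text.
VOCAB = {"the", "cat", "s", "is", "token", "iz", "ation", "un", "believ", "able"}


def tokenize_word(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split one word by repeatedly taking the longest vocabulary entry
    that matches at the current position (greedy longest match)."""
    max_len = max(len(v) for v in vocab)
    tokens = []
    i = 0
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if piece in vocab:
                break
        else:
            piece = word[i]  # no vocabulary match: fall back to one character
        tokens.append(piece)
        i += len(piece)
    return tokens


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and tokenize each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(tokenize_word(word))
    return tokens


print(tokenize("The tokenization is unbelievable"))
# ['the', 'token', 'iz', 'ation', 'is', 'un', 'believ', 'able']
```

Here four words become eight tokens: short, common words match a single vocabulary entry, while longer words are assembled from several pieces.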


The token count can add up rapidly. The site https://token-calculator.net/ counts 288,185 tokens in the full text of Moby Dick.
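When a tokenizer is not at hand, a common rough estimate for English text is about four characters per token, a rule of thumb OpenAI itself offers. The Python sketch below applies that rule; the function name and constant are illustrative assumptions, and the result is only an approximation, so exact counts should come from a real tokenizer such as the tools mentioned above.

```python
import math

# Assumption: ~4 characters per token is a rough average for English text,
# not an exact property of any particular tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from the character count, rounding up."""
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("Call me Ishmael."))  # 16 characters -> 4
```

The estimate is useful for deciding quickly whether a document set is anywhere near a model's context limit, but it can be noticeably off for text full of numbers, names, or non-English words.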


As this example shows, not every word counts as a single token.


So if a study indicates that performance declines after 64,000 tokens, keep in mind that Moby Dick alone is roughly four and a half times that length, and consider the poor performance that may result when working with document productions of several million documents.
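One practical response to the 64K figure is to split a large review set into batches that each stay under the threshold. The Python sketch below does this with hypothetical helper names and the rough four-characters-per-token estimate rather than a real tokenizer, so batch sizes are approximate. It says nothing about retrieval quality; it only shows the arithmetic of how many context windows a collection would need.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token."""
    return math.ceil(len(text) / 4)


def batch_documents(docs: list[str], max_tokens: int = 64_000) -> list[list[str]]:
    """Group documents, in order, into batches whose estimated token total
    stays within max_tokens. A single document larger than the budget ends
    up in a batch of its own and would still need to be split separately."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for doc in docs:
        n = estimate_tokens(doc)
        # Start a new batch if adding this document would exceed the budget.
        if current and current_tokens + n > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(doc)
        current_tokens += n
    if current:
        batches.append(current)
    return batches


def windows_needed(total_tokens: int, max_tokens: int = 64_000) -> int:
    """Minimum number of max_tokens-sized windows to cover total_tokens."""
    return math.ceil(total_tokens / max_tokens)


print(windows_needed(288_185))  # Moby Dick, per token-calculator.net -> 5
```

By this arithmetic, the 288,185 tokens reported for Moby Dick would already need five 64K-token windows, and a production of several million documents would need vastly more.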


Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.


The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea. Proudly created with Wix.com
