
AI Strategies Series: How LLMs Do—and Do Not—Work

Data Science and AI

By Lucy Tancredi  |  January 31, 2024

Artificial Intelligence technologies—and in particular the Large Language Models that drive generative AI—have impressive capabilities that can fuel productivity. You may already be using generative AI for text summarization, content creation, or sentiment analysis. But many professionals have avoided using it altogether because of its associated risks. Understanding the inherent hurdles is essential to navigating them successfully and maximizing the value you get from generative AI.

Today we introduce the first article in a six-part series intended to increase awareness of the main hurdles and how to overcome them. This article elucidates how generative AI technology works, which will help empower you to use it most effectively.

In parts two through six in the upcoming weeks, we will discuss the key AI concepts of hallucinations; explainability; inconsistent responses and outdated knowledge; security and data privacy; and legal and ethical considerations.

Let’s begin by framing the discussion with three analogies to help clarify how Large Language Models work, and why they don’t always work the way we might expect them to.


Predictive Text

The first analogy relates to a technology you probably use every day. Your phone has a “predictive text” feature that, given some text you’ve written, predicts three possibilities for the next word.

For an amusing diversion, start a sentence on your phone and continuously use predictive text to complete a paragraph. In a popular Internet variant, individuals use predictive text to write an epitaph starting with “Here lies [Name]. S/he was…” and then choose from the phone’s auto-suggest options to complete it. The results are somewhat amusing, a little random, and in no way based on fact.
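The mechanics behind a phone’s suggestion bar can be sketched with a tiny bigram model: count which word tends to follow each word, then suggest the most frequent continuations. This is a simplified illustration (the toy corpus and function names are invented for this sketch), not how any particular keyboard is implemented.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the text a keyboard model learns from.
corpus = (
    "the students opened their books . "
    "the students opened their laptops . "
    "the students opened their books . "
    "the teacher opened their notes ."
).split()

# Count which word follows each word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def suggest(word, k=3):
    """Return up to k most likely next words, like a phone's suggestion bar."""
    return [w for w, _ in following[word].most_common(k)]

print(suggest("their"))  # continuations ranked purely by observed frequency
```

Note that the suggestions are ranked purely by how often each continuation appeared in the corpus—there is no notion of truth involved, which is why the epitaph game produces fluent nonsense.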


Your phone predicts text using older, smaller language models than today’s powerful LLMs. While your phone proposes one word at a time, modern LLMs like LLaMA and GPT-4 can generate entire pages of coherent, relevant content.

In essence, modern generative AI like ChatGPT is your phone’s predictive text on steroids. Understanding this helps explain why hallucinations happen: the generated text is a prediction based on common language patterns, not a fact grounded in research.

Consider a Large Language Model predicting a word to follow the phrase “the students opened their.” Based on its training, the LLM determines that “books” is the most likely next word. The important concept to understand here is that the LLM is not looking up data in a database, it is not searching the web, and it is not “understanding” the text. Rather, it is using statistical correlations or patterns that it has learned from the large datasets of text that it was trained on.
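The idea can be made concrete with a sketch of what “predicting the next word” amounts to. The continuation counts below are hypothetical numbers invented for illustration; real LLMs learn probabilities over tens of thousands of tokens from their training data, but the principle—ranking learned statistics, not looking anything up—is the same.

```python
from collections import Counter

# Hypothetical counts of words seen after "the students opened their"
# in some training corpus (numbers invented for illustration).
continuations = Counter(
    {"books": 812, "laptops": 403, "exams": 91, "minds": 58, "umbrellas": 3}
)

# Turn raw counts into a probability distribution over next words.
total = sum(continuations.values())
probs = {word: count / total for word, count in continuations.items()}

# The model "predicts" simply by ranking learned probabilities --
# no database lookup, no web search, no comprehension.
prediction = max(probs, key=probs.get)
print(prediction)
```

Because “books” dominated the training statistics, it wins—regardless of whether any particular students actually opened books.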


Language Comprehension

A second analogy underscores the point that LLMs do not “understand” language. In 1980, philosopher John Searle introduced The Chinese Room argument to challenge the notion that computers with conversational abilities actually understand the conversation.

In his scenario, a person who does not understand Chinese is placed in a room with instructions written in English. These instructions provide rules on how to manipulate Chinese symbols in response to questions written in Chinese, which are slipped into the room. The person follows the given instructions to produce appropriate responses in Chinese, which he sends back out of the room. This fools a Chinese speaker outside of the room into thinking he is communicating with a Chinese speaker. In reality, the person inside has no understanding of Chinese; he is simply following a set of rules.


Yann LeCun, Chief AI Scientist at Meta, has said that “Large language models have no idea of the underlying reality that language describes. Those systems generate text that sounds fine grammatically and semantically, but they don’t really have an objective other than just satisfying statistical consistency with the prompt.”

Data as Trainer vs. Database

Our final analogy helps explain how Large Language Models use their training data and why they can’t use it as reference data. Imagine the individual pieces of training data that feed an LLM as individual pieces of fruit being fed into a blender. Once the model has been trained, what is left is like a fruit smoothie. You no longer have access to the individual pieces of fruit.

This analogy will come into play later in the series when we discuss why an LLM is different from a database that can be searched for facts, why LLMs can’t point to the specific pieces of training data that led to their answer, and why specific pieces of data cannot be surgically removed from an LLM once it has been trained. In other words, if you regret adding spinach to your smoothie, you can’t take it out after the fact.
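A minimal sketch makes the “blending” concrete. Here distinct training sentences are collapsed into aggregate word statistics—loosely analogous to how training folds text into model weights. The sentences below are invented for illustration.

```python
from collections import Counter

# Individual "pieces of fruit": distinct training sentences (invented examples).
training_sentences = [
    "the market closed higher today",
    "the market closed lower today",
    "the market closed higher today",
]

# "Blending": collapse the sentences into aggregate word counts.
word_counts = Counter(
    word for sentence in training_sentences for word in sentence.split()
)

# The "smoothie": only aggregate statistics remain. No operation on
# word_counts can return the original sentences, and there is no way to
# subtract exactly one sentence's contribution after the fact.
print(word_counts["higher"], word_counts["lower"])
```

Real training is far more complex than counting words, but the one-way nature of the process is the same: individual training examples are not stored and cannot be retrieved or deleted from the finished model.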


Generative AI can help organizations increase productivity, enhance client and employee experiences, and accelerate business priorities. Simply having an overall awareness and a better understanding of how Large Language Models do and do not work will make you a more effective, safer user. Watch for part two next week in this six-part series: 7 ways to overcome hallucinations.


This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.


Lucy Tancredi, Senior Vice President, Strategic Initiatives - Technology

Ms. Lucy Tancredi is Senior Vice President, Strategic Initiatives - Technology at FactSet. In this role, she is responsible for improving FactSet's competitive advantage and customer experience by leveraging Artificial Intelligence across the enterprise. Her team develops Machine Learning and NLP models that contribute to innovative and personalized products and improve operational efficiencies. She began her career in 1995 at FactSet, where she has since led global engineering teams that developed research and analytics products and corporate technology. Ms. Tancredi earned a Bachelor of Computer Science from M.I.T. and a Master of Education from Harvard University.

