
AI Strategies Series: How LLMs Do—and Do Not—Work

Written by Lucy Tancredi | Jun 3, 2024

Artificial Intelligence technologies—and in particular the Large Language Models that drive generative AI—have impressive capabilities that can fuel productivity. You may already be using generative AI for text summarization, content creation, or sentiment analysis. But many professionals have avoided using it altogether because of the associated risks. Understanding these inherent hurdles is essential to navigating them successfully and maximizing the value you get from generative AI.

This first article in our six-part series is intended to increase awareness of the main hurdles and how to overcome them. We explain how generative AI technology works, which will empower you to use it more effectively. Articles two through six are linked at the end of this article.

Let’s begin by framing the discussion with three analogies to help clarify how Large Language Models work, and why they don’t always work the way we might expect them to.

Predictions

The first analogy relates to a technology you probably use every day. Your phone has a “predictive text” feature that, given some text you’ve written, predicts three possibilities for the next word.

For an amusing diversion, start a sentence on your phone and continuously use predictive text to complete a paragraph. In a popular Internet variant, people use predictive text to write an epitaph starting with “Here lies [Name]. S/he was…” and then choose from the phone’s auto-suggest options to complete it. The results are entertaining, a little random, and in no way based on fact.

Your phone predicts text using older, smaller language models than today’s powerful LLMs. While your phone proposes one word at a time, modern LLMs like LLaMA and GPT-4 can generate entire pages of coherent, relevant content.

In essence, modern generative AI like ChatGPT is like your phone’s predictive text on steroids. Understanding this helps explain why hallucinations happen: the generated text is a prediction based on common language patterns, not a fact retrieved through research.

Consider a Large Language Model predicting a word to follow the phrase “the students opened their.” Based on its training, the LLM determines that “books” is the most likely next word. The important concept to understand here is that the LLM is not looking up data in a database, it is not searching the web, and it is not “understanding” the text. Rather, it is using statistical correlations or patterns that it has learned from the large datasets of text that it was trained on.
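To make this concrete, the sketch below shows next-word prediction in action. It assumes the open-source Hugging Face transformers library and the small GPT-2 model, which are not the systems behind ChatGPT, but the mechanism is the same: the model scores every candidate next token based on patterns learned during training.

```python
# A minimal sketch of next-word prediction, assuming the Hugging Face
# "transformers" library and the small open-source GPT-2 model (not the
# model behind ChatGPT). The mechanism is the same: score every candidate
# next token and sample from the most probable ones.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The students opened their"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every vocabulary token at each position

# Turn the scores at the final position into a probability distribution
probs = torch.softmax(logits[0, -1], dim=-1)

# Show the five most likely next words -- a statistical pattern, not a fact lookup
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item()).strip():>10}  {p.item():.3f}")
```

Whatever words come out on top are there purely because those continuations are statistically common in the text the model was trained on, not because the model looked anything up.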

Language Comprehension

A second analogy underscores the point that LLMs do not “understand” language. In 1980, philosopher John Searle introduced The Chinese Room argument to challenge the notion that computers with conversational abilities actually understand the conversation.

In his scenario, a person who does not understand Chinese is placed in a room with instructions written in English. These instructions provide rules on how to manipulate Chinese symbols in response to questions written in Chinese, which are slipped into the room. The person follows the instructions to produce appropriate responses in Chinese, which he sends back out of the room. This fools a Chinese speaker outside the room into believing he is communicating with someone who understands Chinese. In reality, the person inside has no understanding of Chinese; he is simply following a set of rules.

Yann LeCun, Chief AI Scientist at Meta, has said that “Large language models have no idea of the underlying reality that language describes. Those systems generate text that sounds fine grammatically and semantically, but they don’t really have an objective other than just satisfying statistical consistency with the prompt.”

Data as Trainer vs. Database

Our final analogy helps explain how Large Language Models use their training data and why they can’t use it as reference data. Imagine that the individual pieces of training data fed into an LLM are like individual pieces of fruit going into a blender. Once the model has been trained, what remains is like a fruit smoothie: you no longer have access to the individual pieces of fruit.

This analogy will come into play later in the series when we discuss why an LLM is different from a database that can be searched for facts, why LLMs can’t point to the specific pieces of training data that led to their answer, and why specific pieces of data cannot be surgically removed from an LLM once it has been trained. In other words, if you regret adding spinach to your smoothie, you can’t take it out after the fact.
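As a rough illustration of this point, the short sketch below (again assuming the open-source GPT-2 model from Hugging Face) shows that what a trained model actually stores is a large collection of numeric weights, not the documents it was trained on.

```python
# A rough illustration, assuming the same open-source GPT-2 model as above:
# a trained model stores numeric weights (the "smoothie"), not the original
# training documents (the individual pieces of fruit).
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Count the learned parameters -- these numbers are all that training leaves behind
total = sum(p.numel() for p in model.parameters())
print(f"GPT-2 holds {total:,} floating-point weights.")

# Inspect one parameter tensor: it is an array of numbers, not retrievable text
name, tensor = next(model.named_parameters())
print(name, tuple(tensor.shape), tensor.flatten()[:5].tolist())
```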

Conclusion

Generative AI can help organizations increase productivity, enhance client and employee experiences, and accelerate business priorities. Simply understanding how Large Language Models do and do not work will make you a more effective, safer user.

To continue learning, read the additional articles in the series:

7 ways to overcome hallucinations

Explainability 

Inconsistent and outdated responses

Security and data privacy

Legal and ethical landscape of generative AI


This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.