As adoption of Large Language Models (LLMs) accelerates across industries, it's crucial to avoid tunnel vision and to remember the decades of innovation and proven capabilities of existing artificial intelligence (AI) techniques. Given that AI technologies as a whole have the potential to significantly enhance how businesses operate, the key to future success will likely lie in fusing the strengths of an ensemble of methodologies and using the correct approach for the problem at hand.
To merge the capabilities of old and new techniques, it’s essential to first understand the strengths and weaknesses of LLMs. A recent subset of AI, LLMs are powerhouses that process and produce human-like text by analyzing immense amounts of written content.
At their core, these models are large-scale neural networks—a computer model that attempts to mimic the way the human brain processes and learns information. They are trained with diverse examples from books, articles, websites, and other text sources to learn the statistical patterns of human language. That said, an LLM's primary function is to use all that information to “simply” predict the next most probable word given a specific context—the text provided to it as part of the question and the model’s current response.
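This next-word mechanism can be illustrated with a toy sketch. The vocabulary and logit scores below are made up for illustration; a real model computes scores for its entire vocabulary from the context, but the final step is the same: a softmax turns scores into probabilities, and the most probable word is emitted.

```python
import math

def softmax(logits):
    # Convert raw scores into probabilities that sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

context = "The cat sat on the"
candidates = ["mat", "moon", "equation"]
logits = [4.2, 1.1, -2.0]  # illustrative scores; a real model derives these from the context

probs = softmax(logits)
best = candidates[probs.index(max(probs))]
print(best)  # the model emits the most probable next word: "mat"
```

The chosen word is then appended to the context, and the whole process repeats for the following word.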
Despite processing text one word at a time, LLMs such as ChatGPT, Claude, Llama, Bard, and others can generate seemingly human responses that are coherent and contextually relevant—for example, they answer questions, summarize texts, predict words, and compose sentences and paragraphs.
However, LLMs lack awareness of any objectives they're trying to meet and cannot correct any errors they produce. Once an LLM generates a word, it progresses to the next, unable to plan ahead or adjust what came before.
Several strategies are being developed to overcome these LLM weaknesses. One method is to carefully prompt models to provide step-by-step explanations of their solutions and thereby break down tasks into simpler, manageable ones. This, in turn, refines the context the LLM is working with and thereby conditions it to improve the relevance of the upcoming words.
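A minimal sketch of such step-by-step prompting is below. The exact instruction wording is an illustrative assumption; the point is that the added instruction steers the model to produce intermediate reasoning, which then becomes part of the context conditioning later words.

```python
def build_step_by_step_prompt(question: str) -> str:
    # Append an instruction that asks the model to expose its intermediate steps.
    return (
        f"Question: {question}\n"
        "Let's work through this step by step, "
        "then state the final answer on its own line."
    )

prompt = build_step_by_step_prompt(
    "A train travels 60 km in 1.5 hours. What is its average speed?"
)
print(prompt)
```

The returned string would then be sent to the LLM in place of the bare question.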
There are also methods that use multiple LLMs to discuss the solutions to a problem or search through various generated content to find the best fit for the user-specified query. Here, one LLM might be used to generate multiple potential responses, with a second LLM selecting the one it deems best overall.
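The generate-then-select pattern can be sketched as follows. Both the generator and the judge are stubs here, purely for illustration; a real system would replace each with an LLM API call.

```python
def generate_candidates(prompt: str, n: int = 3) -> list[str]:
    # Stub generator: returns n increasingly verbose variants of an answer.
    return [prompt + " -> " + "very " * i + "detailed answer" for i in range(n)]

def select_best(candidates: list[str]) -> str:
    # Stub judge: prefers the shortest candidate as a proxy for concision.
    # A real judge LLM would rate relevance and correctness instead.
    return min(candidates, key=len)

prompt = "Summarize the quarterly report."
candidates = generate_candidates(prompt)
best = select_best(candidates)
print(best)
```

The design choice to separate generation from selection lets each model specialize: one explores, the other evaluates.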
Notably, while recent advancements have significantly improved the efficacy of LLMs, it's essential to remember that LLMs are suited mainly for text interpretation and generation. LLMs are not a catch-all solver for every type of question.
Being language models, LLMs falter with tasks involving basic arithmetic. There are, of course, plenty of training examples showing 5 + 5 = 10, and when queried the model might even provide the correct answer. There is even evidence to suggest that, given sufficiently broad training, LLMs might begin to extrapolate basic rules of arithmetic. However, when making life-and-death decisions or working in situations with millions of dollars on the line, it's likely better to just use a standard calculator for now.
Therefore, one way forward is to set up a parallel process that detects when a calculation is needed. This parallel process would survey the tools available to it and run the appropriate calculation, whether a simple addition or an entire risk model. Once the calculation is ready, the LLM can include the resulting value as part of the final text.
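A bare-bones version of such a tool router is sketched below. The tool names and the pattern-matching rule are illustrative assumptions; production systems use far richer intent detection, but the shape is the same: detect the need, dispatch to a tool, and hand the exact result back for the LLM to phrase.

```python
import re

TOOLS = {
    "add": lambda a, b: a + b,  # a stand-in for anything from addition to a full risk model
}

def route(query: str):
    # Detect an addition request; a real router would classify intent more broadly.
    match = re.search(r"(\d+)\s*\+\s*(\d+)", query)
    if match:
        a, b = map(int, match.groups())
        return TOOLS["add"](a, b)
    return None  # no calculation needed; the LLM answers directly

result = route("What is 5 + 5?")
print(result)  # → 10
```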
This is typically where the concept of plug-ins comes in. They provide users and engineers with the flexibility to extend and enhance the functionality of an application, making it more versatile, customizable, and capable of meeting specific requirements. In the scenario we're discussing here, plug-ins send parameters and requests to other services—like FactSet, Expedia, Zillow, or any other internal or external application—and then collect and consolidate the replies for the LLM to use.
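One minimal way to sketch a plug-in registry is below. The "weather" plug-in and its handler are invented stand-ins, not a real service API; the point is the contract: each plug-in declares a name and a handler that takes request parameters and returns a reply for the LLM's context.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Plugin:
    name: str
    handler: Callable[[dict], dict]  # takes request parameters, returns a reply

REGISTRY: dict[str, Plugin] = {}

def register(plugin: Plugin) -> None:
    REGISTRY[plugin.name] = plugin

def call_plugin(name: str, params: dict) -> dict:
    # In practice this would issue a network request to the external service.
    return REGISTRY[name].handler(params)

# A stand-in "weather" plug-in; a real one would call an external application.
register(Plugin("weather", lambda p: {"city": p["city"], "forecast": "sunny"}))
reply = call_plugin("weather", {"city": "Boston"})
print(reply)  # → {'city': 'Boston', 'forecast': 'sunny'}
```

Keeping the handler behind a uniform interface is what makes plug-ins swappable without touching the LLM orchestration code.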
With plug-ins for their LLMs, companies can do more with increasingly nuanced and complex cases, such as when the LLMs interpret a user's query and break it down into manageable steps. For example: “Compare the recent performance of Apple to its competitors.”
The LLM can segment the task into distinct steps:
1. Find competitors of Apple.
2. Find the time series of recent performance measures for the companies.
3. Compare the companies.
4. Summarize the findings.
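The four steps above can be sketched as a simple pipeline in which the LLM plans and specialized tools execute. All lookup tables, numbers, and helper functions below are illustrative stand-ins, not real FactSet data or calls.

```python
def find_competitors(company: str) -> list[str]:
    peers = {"Apple": ["Microsoft", "Samsung"]}  # stub database lookup
    return peers.get(company, [])

def fetch_performance(company: str) -> list[float]:
    # Stub time series of recent performance measures (e.g., returns).
    series = {"Apple": [0.02, 0.04], "Microsoft": [0.01, 0.04], "Samsung": [0.00, 0.02]}
    return series.get(company, [])

def compare(perf_by_company: dict) -> dict:
    # Stub comparison: average recent performance per company.
    return {c: sum(p) / len(p) for c, p in perf_by_company.items() if p}

def summarize(ranking: dict) -> str:
    leader = max(ranking, key=ranking.get)  # a real system would hand this to an LLM
    return f"{leader} leads on average recent performance."

company = "Apple"
companies = [company] + find_competitors(company)
perf = {c: fetch_performance(c) for c in companies}
summary = summarize(compare(perf))
print(summary)  # → Apple leads on average recent performance.
```

Each stub marks a seam where a real data provider or model would plug in, while the LLM handles only the planning and the final wording.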
Although an LLM can generate the steps just described, a simple LLM would struggle to execute the operations and would likely hallucinate answers. Instead, other tools and techniques can be combined to provide the correct computations. A data provider like FactSet, for example, already has plenty of data and techniques to solve each of the four steps individually.
A database lookup of companies operating in the same domain or providing the same services will establish the competitors. Historic fundamentals data for these companies can be scoped down and aggregated for overall performance. And there are many techniques for comparing companies based on a number of metrics.
Here, a separate LLM could even help draft the appropriate SQL query to retrieve the information on the fly, while existing specialized processes handle the actual computations. Only once all this information is formatted into the context will an LLM articulate a human-readable response.
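This division of labor can be sketched with an in-memory database. The "drafted" SQL below stands in for a query an LLM might produce, and the table schema and rows are made up for illustration; the database, not the LLM, does the actual retrieval.

```python
import sqlite3

# SQL an LLM might draft from the user's intent ("find Apple's competitors").
drafted_sql = "SELECT name FROM companies WHERE sector = ? AND name != ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, sector TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("Apple", "Tech"), ("Microsoft", "Tech"), ("Exxon", "Energy")],
)

# The database executes the query; the result is then placed in the LLM's context.
competitors = [row[0] for row in conn.execute(drafted_sql, ("Tech", "Apple"))]
print(competitors)  # → ['Microsoft']
```

Parameterized placeholders (`?`) are worth keeping even for LLM-drafted queries, since they guard against malformed or malicious values.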
Finally, LLMs might prove to be overkill for very tactical decisions, where existing models already perform well. For instance, imagine analyzing an extensive collection of product reviews. One might typically analyze the texts to identify emerging patterns of complaints. While LLMs are efficient at grouping similar cases, this process is a black box and one that is not readily tunable.
Furthermore, because the size of the context is currently limited, only a small amount of text can be processed at a time. This is on top of the computation costs of running an LLM repeatedly on a very large dataset. In such cases, a traditional approach such as topic modeling can readily cluster the documents initially, followed by employing the LLM for tasks like summarization, entity extraction, and possibly sentiment analysis on each identified cluster.
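The hybrid pattern can be sketched as follows. The keyword-based grouping is a deliberately tiny stand-in for a real topic model such as LDA, and the per-cluster summarizer is a stub for an LLM call; the point is that the expensive model runs once per cluster rather than once per document.

```python
from collections import defaultdict

# Stand-in for a topic model: map trigger words to topics (illustrative only).
TOPIC_KEYWORDS = {"battery": "power", "charge": "power",
                  "screen": "display", "crack": "display"}

def assign_topic(review: str) -> str:
    for word in review.lower().split():
        if word in TOPIC_KEYWORDS:
            return TOPIC_KEYWORDS[word]
    return "other"

def summarize_cluster(topic: str, reviews: list[str]) -> str:
    # A real LLM call would summarize the cluster's text here.
    return f"{topic}: {len(reviews)} complaints"

reviews = ["battery drains fast", "screen crack after a week", "slow to charge"]
clusters = defaultdict(list)
for r in reviews:
    clusters[assign_topic(r)].append(r)

summaries = [summarize_cluster(t, g) for t, g in clusters.items()]
for s in summaries:
    print(s)
```

With millions of reviews, the cheap clustering pass bounds both the context-size problem and the cost of repeated LLM calls.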
LLMs are a fundamentally powerful technique, the potential uses of which the research and business communities are only beginning to appreciate. But in the midst of the excitement, remember that, like any algorithm, they have their limitations.
Here are four main scenarios for companies to consider as they integrate LLMs into their workflows:
1. In cases of traditional natural language tasks like question answering or summarization, use an LLM directly.
2. In cases where a computation is necessary, use LLMs to establish a user's intent and use an existing ML or data-retrieval algorithm to acquire the provably correct answer.
3. When interpretability or efficiency is needed for a Natural Language Processing (NLP) task, consider one of the existing methodologies for a functional solution. An LLM can then provide appropriate refinements where needed.
4. For tasks that don't involve any natural language processing, consider using other traditional machine learning tools.
Only by understanding the limitations of Large Language Models can we know how and when to combine them with time-tested methodologies to create truly groundbreaking solutions.