How We Use AI to Summarize Earnings Call Q&A Discussions

Written by FactSet Insight | Nov 8, 2023

Our clients know that analyst Q&A discussions with management in earnings calls can yield material insights. But who has time to listen and extract key themes from hundreds of calls? Large Language Models, properly engineered, can do this.

The purpose of this article is to share how FactSet developed an LLM capability to further streamline client workflows with Q&A summaries that:

Give clients back the time they would normally spend sifting through call transcripts
Speed up their ability to pinpoint areas for deeper research in the workstation
Enable them to focus more time on decision-making

We hope that sharing our experience and key learnings will help your organization with its own LLM journey. Following is a curated conversation with the FactSetters who led this effort: Gail Miller, Senior Director of Data Solutions Engineering, and Brian Merritt, Director of FactSet StreetAccount Content.

Why did you develop a generative AI solution to summarize earnings Q&A discussions for clients?

FactSet StreetAccount writers have global markets expertise, and investment professionals trust them to publish high-quality financial news summaries in the FactSet platform. During earnings season, listening to and summarizing earnings calls is a resource-intensive task for the finite team of news analysts.

It’s also time-intensive for our clients to listen to multiple earnings calls every day during peak earnings season and highlight key insights. Recognizing these limitations amid the emergence of the ChatGPT 4 generative AI model, we opted to integrate this technology to bridge the gap. This has enabled us to offer enhanced, value-add summaries for our users, freeing up their time for more strategic analyses. Here's a screenshot of the output in the FactSet workstation.

How would you describe the LLM development process?

Work began in May with our engineers and writers collaborating each step of the way, taking a deep dive to understand the new technology. Our goal was to develop a seamless process: ingest a transcript, send it to ChatGPT, craft a summary in a StreetAccount writer's style, and then publish the content live, all while applying rigorous quality checks to guard against hallucinations.
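For readers building something similar, the process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FactSet's implementation: the `llm` and `quality_check` callables, the `QA_MARKER` heading, the prompt wording, and the list used as a publishing feed are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for the model call: takes a prompt, returns text.
LLM = Callable[[str], str]
# Hypothetical quality gate: takes (source text, draft), returns True to approve.
QualityCheck = Callable[[str, str], bool]

QA_MARKER = "Question-and-Answer Session"  # assumed section heading


@dataclass
class Summary:
    ticker: str
    text: str
    approved: bool


def extract_qa_section(transcript: str, marker: str = QA_MARKER) -> str:
    """Return the text after the Q&A marker, or the full transcript if absent."""
    _, found, qa_text = transcript.partition(marker)
    return qa_text.strip() if found else transcript.strip()


def build_prompt(qa_text: str) -> str:
    """Assemble an illustrative summarization prompt (wording is an assumption)."""
    return (
        "Summarize the key themes from this earnings call Q&A as short bullets. "
        "Use only facts stated in the transcript.\n\n" + qa_text
    )


def summarize_call(
    ticker: str, transcript: str, llm: LLM, quality_check: QualityCheck
) -> Summary:
    """Ingest a transcript, draft a summary, and record the quality verdict."""
    qa_text = extract_qa_section(transcript)
    draft = llm(build_prompt(qa_text)).strip()
    return Summary(ticker, draft, quality_check(qa_text, draft))


def publish(summary: Summary, feed: List[str]) -> bool:
    """Append approved summaries to the feed; hold back anything unapproved."""
    if not summary.approved:
        return False
    feed.append(f"{summary.ticker}: {summary.text}")
    return True
```

Passing the model call in as a plain function keeps the pipeline testable without network access, and routing every draft through a quality gate before `publish` mirrors the idea that nothing goes live unchecked.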

In the initial phase, each of the roughly 1,000 summaries the model produced underwent meticulous evaluation by our writers. Prompts were created, fine-tuned, discarded, reconstructed, and fine-tuned again until results met our standards for quality and consistency. The high level of human monitoring at the outset was by design. Clients trust us to report accurately, and we wanted to ensure these summaries met our high standards.

By leveraging the capabilities of LLMs, we are now transforming thousands of raw transcripts from live earnings calls into concise and focused summaries, which our clients have said they truly value.

Large Language Models can make up or hallucinate information. How did you resolve that?

Initially, we didn't observe major hallucinations or consistent quality issues in ChatGPT's output. However, a few months in we noticed intermittent drops in quality, so we dedicated substantial effort to refining prompts. Working side by side, the news analysts and engineers implemented a "back-check" procedure that relies on ChatGPT to flag quality issues in its own output. It has helped prevent egregious inaccuracies that could stem from a ChatGPT hallucination.
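One way such a back-check could work is a second model call that compares the summary against the transcript and lists anything unsupported. The sketch below is an assumption about the general shape of the technique, not the procedure our teams run: the prompt text, the `ISSUE:`/`PASS` reply convention, and the `llm` callable are hypothetical.

```python
from typing import Callable, List

# Hypothetical stand-in for the model call: takes a prompt, returns text.
LLM = Callable[[str], str]

# Illustrative prompt; the wording and reply format are assumptions.
BACK_CHECK_PROMPT = (
    "You are reviewing a summary of an earnings call Q&A.\n"
    "List every statement in the SUMMARY that is not supported by the "
    "TRANSCRIPT, one per line, each prefixed with 'ISSUE:'. "
    "If there are none, reply with the single word PASS.\n\n"
    "TRANSCRIPT:\n{transcript}\n\nSUMMARY:\n{summary}"
)

UNPARSEABLE = "Back-check reply could not be parsed"


def back_check(transcript: str, summary: str, llm: LLM) -> List[str]:
    """Ask the model to audit a summary; return the list of flagged issues.

    An empty list means the model reported no unsupported statements.
    A reply that is neither PASS nor a list of issues is itself flagged,
    so ambiguous output is routed to a human instead of being published.
    """
    reply = llm(BACK_CHECK_PROMPT.format(transcript=transcript, summary=summary))
    issues = []
    for line in reply.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("ISSUE:"):
            issues.append(stripped[len("ISSUE:"):].strip())
    if issues:
        return issues
    if reply.strip().upper().startswith("PASS"):
        return []
    return [UNPARSEABLE]


def needs_human_review(issues: List[str]) -> bool:
    """Any flagged issue sends the summary to a writer before publication."""
    return bool(issues)
```

Because the same model is checking its own work, a clean result here is a filter and not proof of accuracy, which is one reason to treat an unparseable reply as a failure and keep a human in the loop.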

How have you measured success, and what’s next?

While our ChatGPT use case may seem simple and straightforward, we were intentional with this approach. It enabled us to:

Swiftly establish the necessary technical framework, infrastructure, and tools
Continuously refine our methods to sustain quality and address ChatGPT’s limitations
Envision more advanced applications and future use cases

Although our approach still requires daily human oversight to guarantee precision and quality, the gains in efficiency are significant. To date, we've published over 3,000 summaries and broadened our reach to include the Russell 3000 and the TSX Composite. We’re on course to produce thousands of summaries spanning North America, Europe, and APAC.

By the close of 2023, we anticipate the output from our LLM-driven summaries will match the annual contributions of around 15 seasoned writers.

This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.
