FactSet Insight - Commentary and research from our desk to yours

Teaching the Language of Finance to Machines

Written by Siddhartha Gupta | Feb 3, 2022

Financial services organizations such as wealth managers, insurance companies, credit card providers, and investment/commercial banks are under more pressure than ever to digitize operations and build customer trust. Over the past several years, much has changed for this industry, including these companies’ top initiatives. According to Salesforce’s 2020 poll of nearly 2,800 global business leaders, implementing new technologies and improving customer loyalty have become predominant priorities. These organizations are expected to offer heightened personalization while still delivering 24/7 customer care.

In addition, while the COVID-19 global pandemic has demonstrated just how productive a remote workforce can be and how much work a distributed team can accomplish, there is a flip side.

Remote work, isolation, and self-reflection have triggered a wave of resignations, transfers, and turnovers. More and more, subject matter experts (SMEs) are leaving organizations and senior leaders are resigning to explore their passions. These changes can leave financial institutions at risk on multiple fronts, which can include:

  • Loss of market expertise, creating knowledge gaps 
  • Higher work intensity, leading to mistakes and burnout
  • Relationship attrition when client managers leave

One way to solve this problem is to capture knowledge and subject matter expertise before it's too late. The trouble is that 80% of enterprise knowledge lives in unstructured content created by SMEs. The formats include Word documents, PowerPoint decks, PDF documents, emails, and webpages, all of which are hard to compile—so organizations rely on those experts’ minds. If there were a way to “read” and “understand” this knowledge, then the risk of losing it could be largely mitigated.

How can natural language processing be used to teach the language of finance to machines?

However, teaching a machine to read is hard—teaching it to understand is even harder. This is where natural language processing (NLP), a branch of artificial intelligence that strives to give machines the ability to understand and respond to natural language, comes in handy. 

The financial services sector uses complex language and concepts, so let's look at how NLP can be used to teach the language of finance to machines. 

Creating a Language Model 

The goal of a language model is to create a statistical representation of words that occur in proximity to each other. When working with numbers, it's easy to tell what comes next given a sequence. For example, if you are asked to guess what number comes next in the sequence 2, 4, 6, 8, it's relatively straightforward to teach a machine to predict the correct answer. If the machine predicts 9.5, you know it's close and a little less than the right answer. If it predicts 10.2, it's even closer, but a little high. With this feedback, the machine will eventually figure out that the correct answer is 10.
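The feedback loop described above can be sketched in a few lines of Python. This is only an illustration, assuming a deliberately simple model (next = previous + step) and a hypothetical helper named `fit_step`; the learning rate and number of passes are arbitrary choices, not values from the article.

```python
def fit_step(sequence, rate=0.1, epochs=200):
    """Learn the step between consecutive numbers from signed error feedback."""
    step = 0.0  # initial guess for how much the sequence grows each time
    pairs = list(zip(sequence, sequence[1:]))  # (previous, next) training pairs
    for _ in range(epochs):
        for prev, nxt in pairs:
            error = (prev + step) - nxt  # negative: guess too low, positive: too high
            step -= rate * error         # nudge the guess against the error
    return step


sequence = [2, 4, 6, 8]
step = fit_step(sequence)
prediction = sequence[-1] + step
print(f"learned step: {step:.3f}, predicted next number: {prediction:.3f}")
```

Because the error is a signed number, every wrong guess tells the machine both how far off it was and in which direction, which is exactly what is missing when the guess is a word.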

With words, the problem is slightly more complex. Words don’t follow a set sequence and if the machine predicts a word, it's difficult to tell if it's close or far from the desired word. For example, if you were asked to guess what word completes the sentence “Bank of America slashes overdraft _____,” there are potentially multiple correct responses, but context is important.

The correct guess here is “fee.” If we were to randomly pick words from the dictionary and apply them in this sentence, “feel” would come very close to “fee,” but it's not a good guess because alphabetical closeness is irrelevant. However, if we were to guess “charge,” then we’d be close to the actual answer. In financial literature, there is a much higher likelihood that “overdraft/fee” or “overdraft/charge” have appeared next to each other than “overdraft/feel.” In the case of words, closeness is defined by the context and a word is best understood by the company it keeps. 
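One rough way to make "the company it keeps" concrete is to count which words follow "overdraft" in a body of text. The sketch below assumes a tiny corpus in which every headline except the first is invented for illustration, and a hypothetical helper named `next_word_probabilities`; a real model would be trained on a far larger corpus.

```python
from collections import Counter

# Toy corpus: only the first headline comes from the article, the rest are invented.
corpus = [
    "bank of america slashes overdraft fee",
    "regulators cap overdraft fee at large banks",
    "lender waives overdraft charge for students",
    "customers feel the pinch as rates rise",
]


def next_word_probabilities(sentences, word):
    """Estimate how likely each word is to directly follow `word`."""
    followers = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for current, following in zip(tokens, tokens[1:]):
            if current == word:
                followers[following] += 1
    total = sum(followers.values())
    return {w: count / total for w, count in followers.items()} if total else {}


probs = next_word_probabilities(corpus, "overdraft")
print(probs)
```

In this toy corpus "fee" and "charge" both receive probability mass after "overdraft", while "feel" receives none, even though it is only one letter away from "fee".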

Language models help define the statistical likelihood that two words occur close to each other in a given domain. The concept of domain is quite important—in the geology domain, the words “bank” and “river” are close to each other but in the finance domain they are not. 

A language model is built in a vector space: each word or sentence is converted into an equivalent vector based on the other words or sentences around it. A language model for finance can be built from scratch if a large corpus of financial text is available, or an existing language model can be extended to learn finance jargon by retraining it.
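As a minimal sketch of the vector-space idea, each word can be represented by counts of the words that appear near it, and two words can then be compared with cosine similarity. The corpus, the window size, and the helper names `cooccurrence_vectors` and `cosine` are assumptions made for illustration; production language models learn dense vectors with neural networks rather than raw counts.

```python
import math
from collections import Counter, defaultdict

# Invented toy sentences standing in for a large corpus of financial text.
corpus = [
    "the bank raised its overdraft fee",
    "the lender raised its overdraft fee",
    "the bank cut its overdraft charge",
    "customers feel the pinch as rates rise",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector of counts of its neighboring words."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = cooccurrence_vectors(corpus)
fee_vs_charge = cosine(vectors["fee"], vectors["charge"])
fee_vs_feel = cosine(vectors["fee"], vectors["feel"])
print(f"fee vs charge: {fee_vs_charge:.2f}, fee vs feel: {fee_vs_feel:.2f}")
```

Here "fee" and "charge" end up with nearly identical vectors because they share the same neighbors, while "fee" and "feel" share none.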

Developing a Symbolic Model

Consider two questions:

  1. Which bank has Jamie Dimon as its CEO? 
  2. Which bank has Judith Kent’s husband and a Harvard Business School graduate as its CEO? 

The answer to both questions is JPMorgan Chase but the way the answer is deduced for each of these questions is different. In the first question, there is a direct connection between Jamie Dimon and JPMorgan Chase via the CEO link. In the second question, we first determine the husband of Judith Kent, confirm that he went to Harvard Business School, and then use that person’s name to make the CEO link to JPMorgan Chase. 
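The difference between the two lookups can be sketched with a small set of subject-predicate-object facts. The facts themselves come from the two questions above; the predicate names (`ceo_of`, `spouse_of`, `graduate_of`) and the helper functions are hypothetical and chosen only for this example.

```python
# Facts taken from the two questions, stored as (subject, predicate, object).
triples = [
    ("Jamie Dimon", "ceo_of", "JPMorgan Chase"),
    ("Jamie Dimon", "spouse_of", "Judith Kent"),
    ("Jamie Dimon", "graduate_of", "Harvard Business School"),
]


def subjects(predicate, obj):
    """All subjects linked to `obj` by `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(subject, predicate):
    """All objects linked from `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# Question 1: a single hop along the CEO link.
q1 = objects("Jamie Dimon", "ceo_of")

# Question 2: find the person first (two constraints), then follow the CEO link.
candidates = subjects("spouse_of", "Judith Kent") & subjects(
    "graduate_of", "Harvard Business School"
)
q2 = set().union(*(objects(person, "ceo_of") for person in candidates))

print(q1, q2)
```

Both queries arrive at the same bank, but the second one has to chain several facts together before the CEO link can be followed.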

A language model can answer the first question but not the second, and that is where symbolic models are useful. Symbolic knowledge models are composed of taxonomies and ontologies. A taxonomy defines a hierarchy of entities, e.g., JPMorgan Chase > Investment Bank > Bank > Financial Institution. An ontology defines relationships between entities, e.g., JPMorgan Chase > headquarters > New York.
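As a rough illustration, the taxonomy and ontology examples above can be held in two small dictionaries. The data structures and the helper names `ancestors` and `is_a` are assumptions for this sketch, not a description of any particular product.

```python
# Taxonomy: each entity points to its parent category.
parent = {
    "JPMorgan Chase": "Investment Bank",
    "Investment Bank": "Bank",
    "Bank": "Financial Institution",
}

# Ontology: named relationships between entities.
relations = {
    ("JPMorgan Chase", "headquarters"): "New York",
}


def ancestors(entity):
    """Walk up the taxonomy and return every broader category, nearest first."""
    chain = []
    while entity in parent:
        entity = parent[entity]
        chain.append(entity)
    return chain


def is_a(entity, category):
    """True if `category` is anywhere above `entity` in the taxonomy."""
    return category in ancestors(entity)


print(ancestors("JPMorgan Chase"))
print(is_a("JPMorgan Chase", "Financial Institution"))
print(relations[("JPMorgan Chase", "headquarters")])
```

The hierarchy lets a machine infer facts that were never stated directly, such as JPMorgan Chase being a financial institution.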

Traditionally, ontologies and taxonomies have been hand-crafted by industry experts, which is a labor-intensive process. Recent advancements in NLP allow taxonomies and ontologies to be built automatically. Models can pick up patterns in sentences, as shown in the examples below.

  • The average rate on a 30-year fixed mortgage is an Annual Percentage Rate (APR) of 3.61%.
    1. APR - is_same_as - Annual Percentage Rate
  • Solana prioritizes scalability, but it is a relatively less decentralized and secure blockchain.
    1. Solana - high_priority - scalability
    2. Solana - low_priority - decentralization
    3. Solana - low_priority - secure blockchain
  • Solana and other blockchains could grab market share from Ethereum.
    1. Solana - is_a - blockchain
    2. Ethereum - is_a - blockchain
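Two of the patterns above can be approximated with simple regular expressions. This is a deliberately naive sketch: it handles only the "Long Form (ABBR)" and "X and other Ys" patterns, the names `ABBREVIATION`, `AND_OTHER`, and `extract_triples` are hypothetical, and it does not recover the Ethereum or priority relations, which require the kind of learned models the article refers to.

```python
import re

# "Annual Percentage Rate (APR)": capitalized words followed by an acronym in parentheses.
ABBREVIATION = re.compile(r"((?:[A-Z][a-z]+ )+)\(([A-Z]{2,})\)")
# "Solana and other blockchains": a named entity followed by "and other <plural noun>".
AND_OTHER = re.compile(r"([A-Z]\w+) and other (\w+?)s\b")


def extract_triples(sentence):
    """Return (subject, relation, object) triples found by the two patterns."""
    triples = []
    for match in ABBREVIATION.finditer(sentence):
        short_form = match.group(2)
        # Keep only as many trailing words as the acronym has letters.
        words = match.group(1).split()[-len(short_form):]
        if "".join(word[0] for word in words) == short_form:
            triples.append((short_form, "is_same_as", " ".join(words)))
    for match in AND_OTHER.finditer(sentence):
        # Naive singularization: the pattern drops the trailing "s".
        triples.append((match.group(1), "is_a", match.group(2)))
    return triples


mortgage = "The average rate on a 30-year fixed mortgage is an Annual Percentage Rate (APR) of 3.61%."
market = "Solana and other blockchains could grab market share from Ethereum."
mortgage_triples = extract_triples(mortgage)
market_triples = extract_triples(market)
print(mortgage_triples)
print(market_triples)
```

Hand-written patterns like these are brittle, which is part of why automatically learned extraction is attractive for large volumes of text.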

These taxonomies and ontologies may not be as robust as the ones hand-crafted by humans, but they are a good first pass and can digest huge amounts of knowledge in a short period of time. 

Conclusion 

Despite the challenges faced by today’s financial institutions, the demand for personalized customer service through online or in-person experiences continues to grow. With a growing share of the financial workforce retiring or resigning due to burnout or other reasons, the industry will need to apply innovative technology to keep up with that demand. This can only be fully achieved when subject matter expertise and experience are accurately captured through knowledge management and machine learning processes like NLP.

Learn more about Nesh’s Subject Matter Avatars.

Disclaimer: This blog post has been written by a third-party contributor and does not necessarily reflect the opinion of FactSet. The information contained in this article is not investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.