As a financial professional, you examine countless documents to gain insights and make smart decisions. A single news search or alert, for example, can return dozens of stories and reports. Do you have time to determine what’s valuable and zero in on the most important points?
You could hire a human assistant, train the person and check their work for a few weeks. That might make you twice as efficient. What if, instead, you could spend a little more time upfront to train a robotic you, one that would work day and night, and could increase your efficiency tenfold?
Here’s an example. Consider an analyst covering the plant-based meat industry. Her goal is to identify new plant-based meat products coming to market. She might want to read every press release from the public companies she covers. She may even want to read press releases of private companies that are pushing the industry forward with new technologies and products.
She could increase her efficiency and productivity not by reading faster, but by using two game-changing machine-learning services that act as extensions of the human brain.
To address the pain point described above, the analyst could help create and train a Machine Learning Classifier. Once trained, the classifier would sort through all documents that match the analyst’s search terms and deliver to the analyst only those relevant to her workflow, potentially saving her hours of reading time.
How does that classifier work? On day one, the classifier doesn't know anything. It relies on humans to provide sample documents tagged as relevant or not relevant. This can take some time, but it makes a huge difference in the machine’s ability to learn the analyst’s perspective on what is important.
Building up a useful set of samples takes time, but it adds very little to the analyst’s daily workflow; she will continue to read documents and decide which ones are valuable. She only needs to keep a record of which ones are valuable and which can be ignored. Once she has collected a large enough sample, machine-learning engineers can begin their part of the process.
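First, they will transform words into numbers. They will use a statistical method called Term Frequency-Inverse Document Frequency (TF-IDF), which scores each word by how often it appears in a document, weighed against how common it is across all documents in the sample set. Words that appear often in one document but rarely elsewhere receive the highest scores; words that appear everywhere receive the lowest.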
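To make the words-to-numbers step concrete, here is a minimal sketch of one common TF-IDF formulation in plain Python. The tokenizer, the function names, and the three sample headlines are illustrative assumptions, not part of any FactSet service, and production systems typically use a library implementation with smoothing and normalization.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def tf_idf(documents):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times the log of inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical sample headlines, invented for illustration.
docs = [
    "Company launches new plant-based burger",
    "Company reports quarterly earnings",
    "Company announces new plant-based sausage",
]
scores = tf_idf(docs)
```

In this sample, "company" appears in every headline, so its weight is zero, while "burger" appears in only one headline and outweighs "new", which appears in two.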
Using the TF-IDF scores as input, engineers will select the machine learning classifier best suited to the task. Every sample document helps the model learn what’s useful. In general, the more samples provided, the better it performs as a trusted partner to the analyst.
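The post does not say which classifier the engineers would choose, so the sketch below substitutes a deliberately simple one: a nearest-centroid classifier that compares a new document's TF-IDF vector with the combined vector of each labeled group. The class name, the smoothed IDF formula, and the training headlines are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

class RelevanceClassifier:
    """Nearest-centroid classifier over TF-IDF vectors (illustrative only)."""

    def fit(self, texts, labels):
        tokenized = [tokenize(t) for t in texts]
        n_docs = len(tokenized)
        doc_freq = Counter()
        for tokens in tokenized:
            doc_freq.update(set(tokens))
        # Smoothed IDF, so terms found in every document keep a small weight.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        # Sum the vectors per label; cosine similarity ignores the scale,
        # so the sum behaves like a centroid.
        self.centroids = defaultdict(Counter)
        for tokens, label in zip(tokenized, labels):
            for term, weight in self._vectorize(tokens).items():
                self.centroids[label][term] += weight
        return self

    def _vectorize(self, tokens):
        # Terms never seen during training are ignored.
        counts = Counter(tokens)
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def predict(self, text):
        vec = self._vectorize(tokenize(text))
        return max(self.centroids,
                   key=lambda label: self._cosine(vec, self.centroids[label]))

# Hypothetical labeled samples, standing in for the analyst's records.
train_texts = [
    "Company launches new plant-based burger in grocery stores",
    "Startup unveils plant-based chicken nuggets product",
    "New plant-based sausage product coming to market",
    "Company reports quarterly earnings and revenue",
    "Board appoints new chief financial officer",
    "Company announces quarterly dividend for shareholders",
]
train_labels = ["relevant"] * 3 + ["not relevant"] * 3

clf = RelevanceClassifier().fit(train_texts, train_labels)
prediction = clf.predict("Brand launches plant-based nuggets")
```

With only six samples this is a toy, but it shows the shape of the workflow: the analyst supplies labeled documents, and the model assigns a label to documents it has not seen.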
If you would be willing to train a person who works eight hours a day, it makes sense to instead consider training a machine that will work continuously.
Another option for the financial analyst is a Question-and-Answer service. This machine learning model is most effective when an analyst wants to answer the same question across a large set of documents. For example, the analyst might want to know:
What are new products?
What are new plant-based products?
What does this company make?
There are two types of Question-and-Answer models. Both models use a technique called “word embedding” that allows a machine to match concepts rather than exact words. The difference between these models lies in the way they create answers:
A generative model creates answers by drawing on (but not solely relying on) the words from a document. It needs to be trained with many sample documents, which increases its ability to provide correct answers, even when fed complex text that could contain repetitive or conflicting information.
An extractive model creates answers using words or phrases exactly as they appear in the document. Even without training on sample documents, it may still perform well in many scenarios. For example, you might use an extractive model on historical Salesforce notes to ask, “What did the client want?” and “What are next steps?” In our plant-based meat example, you might ask, “What are new products?” The example below uses a FactSet service of this type. As you can see, it produces useful results for a variety of questions.
Both the generative and extractive models can return one or more answers for each question along with a confidence score.
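As a rough illustration of what "answers with a score" can look like, the sketch below ranks the sentences of a document against a question and returns the best matches verbatim. It is a toy that relies on simple word overlap rather than the word embeddings described above, its score is a similarity measure rather than a calibrated confidence, and the function names, stopword list, and sample text are all assumptions. It does not represent the FactSet service.

```python
import math
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"what", "are", "is", "the", "a", "an", "does", "this",
             "did", "of", "to", "in", "and"}

def bag_of_words(text):
    # Count lowercase word tokens, skipping stopwords.
    words = re.findall(r"[a-z0-9-]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

def extract_answers(question, document, top_k=2):
    """Return up to top_k (sentence, score) pairs, best match first."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document)
                 if s.strip()]
    q_vec = bag_of_words(question)
    scored = [(s, round(cosine(q_vec, bag_of_words(s)), 3)) for s in sentences]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical press-release text, invented for illustration.
document = (
    "The company reported higher revenue this quarter. "
    "Its new products include a plant-based burger and a plant-based sausage. "
    "The chief executive will speak at a conference next month."
)
answers = extract_answers("What are new products?", document)
```

Here the top-ranked answer is the second sentence, returned exactly as written in the document, which is the defining trait of an extractive approach.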
If your profession requires you to read dozens or more news articles and documents every day, a Relevancy Classifier or Question-and-Answer service can greatly increase your productivity. They will help you quickly identify new insights and free up time to do what you do best: think, create and communicate. Partner with your company's technology engineers to learn how the FactSet Natural Language Processing API can help you with this.
This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.