At a Glance: Bitvore Cellenus Corporate, Market & Thematic Intelligence
This dataset provides quality, machine-readable data with event signals from unique sources such as news articles, press releases, online content, annual and periodic SEC filings, funding data, UCC filings, patents, earnings transcripts, and more.
Bitvore Cellenus data is derived from over 60,000 unique unstructured data sources, including categories of data like news articles, press releases, online content, annual and periodic SEC filings, funding data, UCC filings, patents, earnings transcripts, and more.
On a daily basis, Bitvore ingests up to five million pieces of content, running artificial intelligence (AI) and machine learning (ML) models to identify business relevant information and run clean-up and normalization processes to deliver quality machine-readable data with event signals. It also tracks themes attributed at the content level and rolls them up to an entity to produce company-level Growth, Risk, and Sentiment scores.
Asset Class: Public and Private Companies
Data Frequency: Daily
Delivery Frequency: Daily
History: Data available back to 2015
In the process from ingestion to scoring, Bitvore Cellenus data moves through six key stages:
Selection — The need for new sources is based on client feedback and product roadmaps. The selection process involves a robust comparison of sources that measures the quality of the content against Bitvore’s KPIs.
Integration — Through various AI & ML techniques, ingested data is cleaned to remove extraneous content, duplicates, or non-business relevant information. Content is also clustered into “similarity clusters” to provide a measure of volume of an event or story.
Normalization— Content is normalized to relevant entities. For example, Named Entity Recognition (NER) techniques are leveraged to identify content about specific companies, industries, and more. Relevancy scoring is also produced to further quantify the importance of an entity to a piece of content.
Signal Extraction— During this stage, content is run through over 300 unique natural language processing (NLP) models to identify signal events. For corporate data, there are over 150 signals identifying key business events (e.g., M&A, leadership changes, and labor changes), ESG events, and macro themes.
Analysis — Once raw unstructured data is converted to machine-readable data through the stages above, the data is transformed using various analytical methods. For example, sentiment analysis is performed on content to estimate a quantitative value of sentiment ranging from -100 to +100. Leveraging the latest techniques in NLP and deep learning, sentiment scores are assigned to millions of pieces of content daily in virtual real time. Content record sentiments scores are rolled up to company 90-day rolling averages of sentiment as well as into Growth and Risk scores.
Quality Assurance— Although QA is performed throughout all steps, there is continuous QA performed on content to ensure that KPIs continue to be met across 12 key areas:
Model Metrics: Precision, recall, drift, and bias
Content and Source Quality: Relevance, duplicates, and similarity clusters
Symbology: Historical integrity, accuracy of data, and maintenance of index benchmarks (e.g., the S&P 500)
Analytics: Sentiment and scoring backtesting to ensure traceability
Research and Market Risk Detection
Use thematic trending to identify vulnerabilities in your investment portfolio and leverage macro themes across Economic, Political, Financial, Business and Social/Technological areas. Currently Bitvore tracks 17 themes such as Sub-prime Lending, Pandemics, Protests & Strikes, 5G, Cybersecurity, Middle East Tensions, LIBOR to SOFR, Inverted Yield Curve, and more. Themes are continually added as they emerge.
Due Diligence/Risk Surveillance for Asset and Fixed Income Management
Track a static or a dynamic portfolio of fixed income or equity positions and see trending sentiment against market peers, feed predictive analytics, and recommendation engines.
The details provided above are as of October 2020.
If you have any questions or would like to learn more about any of the content mentioned above, please contact us at OFSupport@factset.com.