Data Sharing: The Future of Data Consumption?

By FactSet Insight  |  June 2, 2021

Historically, the financial services industry was built around a terminal-based business model. Industry participants now increasingly rely on bulk data feeds, which give them access to vast quantities of data. Setting up those feeds presents numerous challenges, not least the time required to get a customer up and running with the feeds they need. The growth of data sharing promises efficiencies in data transportation and ingestion.

Data Feed Challenges

To implement a new data feed or FTP (File Transfer Protocol) process, firms are required to do some or all of the following:

  • Work with their IT staff
  • Provision hardware
  • Get their preferred database technology established and operational
  • Open ports
  • Create an ETL (extract, transform, and load) process or install a vendor loader

Ultimately, all of this takes time: precious time that the customer could use to explore the content in a trial scenario or to generate value for their firm.
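To give a sense of the ETL step alone, here is a minimal sketch in Python using only the standard library. The sample file contents, the `prices` table, and the column names are assumptions made up for illustration; a real vendor feed would arrive over FTP and target the firm's own database rather than an in-memory SQLite instance.

```python
import csv
import io
import sqlite3

# Hypothetical sample of a delimited vendor file (not a real feed layout).
SAMPLE_FEED = """ticker,price_date,close
ABC,2021-06-01,101.50
XYZ,2021-06-01,
ABC,2021-06-02,102.25
"""


def extract(text):
    """Extract: parse the delimited text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows with no closing price and cast the rest to float."""
    return [
        (row["ticker"], row["price_date"], float(row["close"]))
        for row in rows
        if row["close"].strip()
    ]


def load(conn, records):
    """Load: insert or replace rows so that re-running a file is harmless."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "ticker TEXT, price_date TEXT, close REAL, "
        "PRIMARY KEY (ticker, price_date))"
    )
    conn.executemany("INSERT OR REPLACE INTO prices VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text):
    """Run all three steps and return the number of rows loaded."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)


conn = sqlite3.connect(":memory:")
loaded = run_etl(conn, SAMPLE_FEED)
```

Even this toy version has to make decisions about missing values, types, and reloads, and each of those decisions has to be revisited for every database technology a firm or provider wants to support.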

With a myriad of database technologies available, each with its own proprietary loading method, many data providers are starting down the path of creating a custom ETL process for every system they want to support for their clients. To compound this, even within systems that allow easy and relatively open access between a provider (such as FactSet) and its customers, geographical differences between where data is sourced and where it is consumed can exacerbate the problem. It often makes sense to explore ways of replicating data or ETL regionally to provide quick and easy access to those customers.

There are additional obstacles:

  • Some datasets, such as security tick history, can contain hundreds of terabytes of data. Previously, datasets of this size would have been impossible to deliver via standard data feeds. However, within the past few years, technological advances have provided potential avenues for productizing these massive datasets and getting them into the hands of customers.
  • In terms of data governance, data vendors need data/software providers to do everything in their power to ensure that customers are using the vendors’ data appropriately. Recent improvements in this space, such as audit logs, allow data providers to better enforce permissions on the data they share.
  • Data consumers are constantly looking for increased transparency into when, how, and why data was updated. When a customer is managing their ETL, they often find their own solutions, whether that’s maintaining the content in a point-in-time system or keeping an archive of deltas to reference as needed. As data providers take on ownership of managing that ETL on behalf of the client, they are finding that they need to increase this transparency as much as possible.
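The "archive of deltas" approach mentioned above can be sketched as follows. This is an illustrative example only: the `Delta` record, the field keys, and the sample values are hypothetical, and a production point-in-time system would track far more metadata (such as the reason for each change).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Delta:
    as_of: str              # ISO date on which the change became known
    key: str                # hypothetical field identifier
    value: Optional[float]  # None marks a deletion


def snapshot(deltas: List[Delta], as_of: str) -> Dict[str, float]:
    """Replay deltas in date order to rebuild the data as known on `as_of`."""
    state: Dict[str, float] = {}
    # ISO dates sort correctly as strings; sorted() is stable, so deltas
    # sharing a date keep their archive order.
    for delta in sorted(deltas, key=lambda d: d.as_of):
        if delta.as_of > as_of:
            break
        if delta.value is None:
            state.pop(delta.key, None)
        else:
            state[delta.key] = delta.value
    return state


# Made-up archive: an initial value, a second series, a restatement, a removal.
ARCHIVE = [
    Delta("2021-01-15", "ABC.eps", 1.10),
    Delta("2021-02-20", "XYZ.eps", 0.40),
    Delta("2021-03-01", "ABC.eps", 1.25),
    Delta("2021-04-10", "XYZ.eps", None),
]

before_restatement = snapshot(ARCHIVE, "2021-02-28")
after_restatement = snapshot(ARCHIVE, "2021-03-31")
```

Replaying the archive up to a chosen date answers the "when" of an update; the "how" and "why" would need additional fields on each delta, which is the kind of transparency a provider has to supply once it owns the ETL.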

Industry Shift to Data Sharing

Fast forward to today, and data providers are expanding their delivery capabilities by transporting their content to clients via a data share. Conceivably, there will come a time when no one wants to manage ETL and hardware themselves for a third party’s content. Is data sharing the future of data enablement?

The ease of onboarding (and offboarding), the ability to quickly connect disparate content sets, and the ability to reach content anywhere, anytime, including content that would once have been considered “niche,” may prove too much of a benefit for customers to ignore. Combine that with the ability to tell customers that (1) they can easily access data in any system that implements the open standard, and (2) they’re not locked into having to do everything in a single platform or location (government regulations notwithstanding), and we may be witnessing an industry sea change.

FactSet partner Databricks just announced Delta Sharing, an open-source protocol that promises to make it easy and secure to share existing, live data in data lakes/lakehouses. The protocol aims to support a wide range of clients by using existing, flexible data formats, and to provide strong security, auditing, and governance while scaling efficiently to massive datasets. FactSet believes that Delta Sharing will make it easier for our clients to ingest our content, regardless of the platform or tools they are using. With an open standard adopted by major players in the market, there will be even less of a fuss when a client asks us to deliver to any system that has implemented Delta Sharing. It will “just work.”
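For readers curious about what consuming a share might look like, here is a hedged sketch using the open-source `delta-sharing` Python connector. The profile file name and the share, schema, and table names below are placeholders, not actual FactSet offerings; the functions that contact a server assume the connector is installed (`pip install delta-sharing`) and that the provider has issued a profile file containing the endpoint and credentials.

```python
def table_url(profile_path, share, schema, table):
    """Build the '<profile-file>#<share>.<schema>.<table>' address
    that the connector uses to identify a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def list_shared_tables(profile_path):
    """List every table the profile's credentials can see.
    Requires the delta-sharing package and a reachable sharing server."""
    import delta_sharing  # imported lazily so table_url works without it

    client = delta_sharing.SharingClient(profile_path)
    return client.list_all_tables()


def load_shared_table(profile_path, share, schema, table):
    """Read one shared table into a pandas DataFrame.
    Requires the delta-sharing package and a reachable sharing server."""
    import delta_sharing  # imported lazily so table_url works without it

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# Placeholder names for illustration only.
example_url = table_url("config.share", "vendor_share", "prices", "daily")
```

The point of the sketch is what is absent: no hardware provisioning, no open ports, and no loader. Under these assumptions, the consumer needs only a credentials file and the name of the table.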

The information contained in this article is not investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.