Data management is hard: datasets that were once niche or alternative are now table stakes, data volumes for existing datasets keep growing, and teams pursuing more sophisticated modeling add further complexity. Coping with growing data volumes, different data needs per team, and rapidly changing infrastructure has led many firms to pursue digital transformation. Whether the goal is a scalable data infrastructure, rapid evaluation of new content types, more access points for programmatic users, or cost optimization, digital transformation is, in some form, underway across many companies.
Whether you are dipping a toe into the public cloud or already have a robust data lake, you will have unique requirements for data pipelines. Your data needs are driven by a combination of internal pressures, such as costly data centers, shortages of technical staff, or monolithic architectures that add bloat to your research and development process. These pressures are coupled with the needs of end users, which vary with technical proficiency and require performant solutions across SQL, Excel, APIs, or internal applications.
With all those complexities in mind, companies are looking for the most efficient way to quickly derive value from their vendor and proprietary data. Where possible, they want to skip part, or even all, of the Extract, Transform, and Load (ETL) process. The first evolution in the user/vendor relationship was the adoption of turnkey integrations into cloud data warehouses, which enabled zero-copy shares from any provider to end users. This entirely removed the burden of ETL from end users: they only had to bring their own compute and connect vendor data with proprietary data within the same ecosystem.
Though groundbreaking at the time, these solutions were platform specific and therefore only scaled if you were running your workloads directly in a cloud data warehouse such as Snowflake or AWS Redshift. As clients looked at data needs more holistically, they desired greater flexibility to reach more end users.
Given the need for rapid innovation across many data technologies, delivering data directly into object storage (cloud-native file storage) is the next frontier of data sharing. Feedback from dozens of clients suggests this is for two main reasons:
The rise of table formats and the cross-compute flexibility they provide. Clients have said they want a simple way to consume metadata, easy-to-understand schemas, and uniform ways to track changes.
The desire to retire legacy technologies like S/FTP for file ingestion in favor of the event-driven architectures that cloud-forward alternatives provide. Clients have prioritized the ability to easily retrieve and use files in place with modern notification streams for content.
What Is Object Storage?
Object storage has the flexibility to deliver any file type to a client, as it is not tied to a particular structured or semi-structured content type. This allows a full spectrum of data, from audio transcripts to Parquet and .csv files, to be made securely available in a vendor's storage location. Users can then consume content directly in that location or copy the data into their own infrastructure. File publication coupled with a notification tool equips them to develop event-driven architectures.
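As a concrete sketch of that event-driven pattern, a consumer typically receives a notification when a vendor publishes a new object and then decides what to fetch. The snippet below (a minimal illustration with hypothetical bucket and key names, using the S3-style event notification shape) extracts the newly created objects from a notification payload:

```python
import json


def extract_new_objects(event_json: str):
    """Parse an S3-style event notification and return the newly
    published objects as (bucket, key) pairs."""
    event = json.loads(event_json)
    objects = []
    for record in event.get("Records", []):
        # React only to newly created files, not deletions or restores.
        if record.get("eventName", "").startswith("ObjectCreated"):
            s3 = record["s3"]
            objects.append((s3["bucket"]["name"], s3["object"]["key"]))
    return objects


# Example notification, shaped like a vendor's publication event
# (bucket and key names are hypothetical).
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "vendor-data"},
            "object": {"key": "prices/2024/01/15/eod.parquet"},
        },
    }]
})

print(extract_new_objects(sample))
# [('vendor-data', 'prices/2024/01/15/eod.parquet')]
```

In practice this logic would sit behind a queue or serverless function subscribed to the vendor's notification stream, so files are picked up the moment they land rather than on a polling schedule.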
Although end consumption of the data will vary, there are currently three widely adopted cloud vendors, each with its own object storage solution: AWS's S3, Azure's Blob Storage, and Google's Cloud Storage, all of which can be used within that provider's tools or hooked into a growing number of third-party platforms like Databricks.
How Can Object Storage Help Financial Firms?
With object storage, financial services data consumers get an elegant and flexible delivery mechanism designed to save time and money by:
Reducing ETL – Extraction and transformation are accelerated and simplified with industry-standard file types and metadata
Improving Data Availability – Cloud-native services such as Amazon SNS provide timely alerts on data availability and change tracking
Offering Modern File Formats – Industry-standard file formats, instead of flat files, enhance your data experience
Benefits of Object Storage
Data architectures are complex and ever evolving. Just five years ago, most financial institutions conducted all data operations on-premises.
A data architecture centered on object storage delivery can provide the flexibility to experiment with different vendors (e.g., Databricks, Snowflake, or Cloudera) or help you build a flexible data lake via Apache Iceberg or Azure Data Lake, for example. Another benefit is an improved audit trail on critical data, allowing quick rebuilds if necessary.
Democratize your data with object storage delivery solutions or consider whether a data-sharing experience is best for your use case.
This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.