
What Does Data Integration Really Mean?

Written by Pat Reilly | Jul 13, 2021

In this five-part series, FactSet’s Pat Reilly, Director of Analytics, will examine the theme of data governance and distribution through the lenses of data sourcing, integration, quality, analysis, and distribution across internal and external clients. Combined, these provide asset managers and asset owners with an overview of the key elements to be considered when constructing an efficient data governance and distribution process.

Part two takes on the theme of data integration; the full series can be downloaded here.

Data Integration - The Real Fun Begins

Once data has been properly sourced, integration begins. How do you combine datasets so they’re usable across the investment process? The first mistake firms make is assuming that off-platform and on-platform uses must be kept separate. In fact, the fungibility of content across delivery platforms is what separates the great aggregators and integrators from the also-rans.

Let’s start with the concept of enterprise hosting. Storing and transforming data across primary and regional datacenters allows for immediate scale and more effective operations. This applies to market-standard content like benchmarks as well as to proprietary data like portfolios or composites.

How to Integrate Market Data in Four Steps

What does it take to integrate a benchmark (or other content set)? One would assume it’s easy: the market is dominated by strategies that follow the likes of the S&P 500, MSCI ACWI, and Bloomberg Barclays Global Aggregate. Taken in isolation, integration is easy, especially for top-level index figures. But pursuing holdings-based, benchmark-relative analysis gets complicated quickly. Fixed income benchmarks tend to have thousands of constituents. Equity benchmarks need to address corporate actions. Proper integration requires (at least) four concrete steps.

First, ensure that the basics like symbology and pricing are correct; this is a foundational competency. Expanding this foundation to encompass metadata like classifications, ratings, seniority/share class, and analytics is also essential given downstream use cases. FactSet created a proprietary schema built around an Entity ID that allows data elements of any type to roll up to an entity and be discoverable at the issue or issuer level. This “smart” data architecture provides a consistent end-user experience, regardless of role, data element, or asset class.

 

Figure: FactSet’s entity-centric data architecture built around the Entity ID (Source: FactSet)
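To make the issue-to-issuer roll-up concrete, here is a minimal sketch in Python of an entity-centric data model. It is a hypothetical illustration, not FactSet’s actual schema; the classes, fields, and identifiers are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical illustration only; not FactSet's proprietary schema.
# Issue-level records (bonds, share classes) roll up to a single issuer-level
# entity so that classifications, ratings, and analytics are discoverable at
# either the issue or issuer level. All identifiers are invented.

@dataclass
class Issue:
    issue_id: str                  # a security-level identifier (CUSIP/ISIN-style)
    asset_class: str               # "equity", "fixed_income", ...
    seniority_or_share_class: str
    rating: Optional[str] = None

@dataclass
class Entity:
    entity_id: str                 # the issuer-level ID everything rolls up to
    name: str
    classification: str
    issues: List[Issue] = field(default_factory=list)

    def lookup(self, issue_id: str) -> Optional[Issue]:
        """Resolve an issue-level identifier back to its full record."""
        return next((i for i in self.issues if i.issue_id == issue_id), None)

# One issuer, multiple issues, queryable at either level.
issuer = Entity("ENT-001", "Example Corp", "Industrials")
issuer.issues.append(Issue("US000000AA11", "fixed_income", "senior_unsecured", "BBB+"))
issuer.issues.append(Issue("EXMPL-CMN", "equity", "common"))

bond = issuer.lookup("US000000AA11")
print(issuer.entity_id, bond.rating if bond else "not found")
```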

Second, understand the methodology required by each benchmark to create portability of results across platforms or delivery mechanisms. Methodology addresses settings like the market calendar in use, treatment of corporate actions in return calculations, and update frequency.
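One practical way to make methodology portable is to record those settings as explicit configuration rather than leaving them implicit in each platform. The sketch below is illustrative; the benchmark codes, keys, and values are assumptions, not any vendor’s documented settings.

```python
# Illustrative methodology settings; benchmark codes, keys, and values are
# invented for this sketch rather than taken from any vendor's documentation.
BENCHMARK_METHODOLOGY = {
    "US_LARGE_CAP": {
        "market_calendar": "NYSE",
        "corporate_action_treatment": "reinvest_dividends_on_ex_date",
        "return_basis": "total_return",
        "update_frequency": "daily",
    },
    "GLOBAL_AGG": {
        "market_calendar": "multi_market_union",
        "pricing_time": "local_market_close",
        "return_basis": "total_return",
        "update_frequency": "daily",
    },
}

def methodology_for(benchmark_code: str) -> dict:
    """Fail loudly if a benchmark is used without a documented methodology."""
    try:
        return BENCHMARK_METHODOLOGY[benchmark_code]
    except KeyError:
        raise ValueError(f"No methodology recorded for {benchmark_code}") from None

print(methodology_for("GLOBAL_AGG")["market_calendar"])
```

Keeping these settings in one place means on- and off-platform calculations can at least agree on the rules, even when the calculation engines differ.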

Third, ensure accuracy of inputs via quality assurance and reconciliation procedures to establish a repeatable process. Start by putting guardrails around the technology in use. Focus on input completeness, consistency, and timing for stable and reliable outputs. Quality assurance checks daily accuracy at the security and index level, while reconciliation might take the form of daily deltas or other exception-based reporting. Rules-based exception testing acts as a belt-and-suspenders approach, surfacing potential issues before data elements are ever exposed to end users.
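A rules-based exception test can be as simple as a handful of automated checks run before anything is published. The function below is a minimal sketch under assumed inputs and tolerances (constituent weights that should sum to one, and a computed index return compared against the vendor’s published figure); it is not a description of any particular QA system.

```python
from typing import Iterable, List

# Hypothetical exception checks; inputs and tolerances are illustrative.
def exception_report(constituent_weights: Iterable[float],
                     computed_index_return: float,
                     published_index_return: float,
                     weight_tolerance: float = 1e-4,
                     return_tolerance_bps: float = 0.5) -> List[str]:
    exceptions = []
    total_weight = sum(constituent_weights)
    if abs(total_weight - 1.0) > weight_tolerance:
        exceptions.append(f"weights sum to {total_weight:.6f}, expected 1.0")
    delta_bps = abs(computed_index_return - published_index_return) * 1e4
    if delta_bps > return_tolerance_bps:
        exceptions.append(f"index return off by {delta_bps:.2f} bps vs published")
    return exceptions

# Surface issues before the data reaches end users.
for issue in exception_report([0.25, 0.25, 0.25, 0.2489], 0.0123, 0.0121):
    print("EXCEPTION:", issue)
```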

Finally, provide a level of flexibility to satisfy end-user requirements and complete the integration cycle. This may be as simple as the platform delivery decision, where success is defined as aligning on- and off-platform results. It could also extend to more complicated actions like performing currency hedging, seamlessly blending benchmark sleeves as is common in the multi-asset space, or creating custom benchmarks by excluding certain groups or securities.
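To illustrate the last two customizations, here is a short sketch of a weighted blend of benchmark sleeves and a custom benchmark built by excluding a group of securities and reweighting the remainder. The weights, returns, and identifiers are made up for the example.

```python
# Illustrative only: weights, returns, and security identifiers are invented.
def blended_return(sleeves: dict) -> float:
    """sleeves maps sleeve name -> (weight, period return)."""
    return sum(weight * ret for weight, ret in sleeves.values())

def exclude_and_reweight(holdings: dict, excluded: set) -> dict:
    """holdings maps security ID -> benchmark weight; drop exclusions and rescale."""
    kept = {sec: w for sec, w in holdings.items() if sec not in excluded}
    total = sum(kept.values())
    return {sec: w / total for sec, w in kept.items()}

# A 60/40 multi-asset blend and a custom index excluding one security.
print(blended_return({"equity": (0.60, 0.021), "fixed_income": (0.40, 0.004)}))
print(exclude_and_reweight({"SEC_A": 0.5, "SEC_B": 0.3, "SEC_C": 0.2}, {"SEC_C"}))
```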

The Five W’s of Portfolio Integration

When the focus shifts to proprietary data such as portfolios, composites, OTC modeling terms and conditions, or metadata, it is best to start with the development of standard operating procedures. This can be distilled into five questions.

What are we integrating? Clearly defining the file set would seem obvious; however, needs change. New clients are onboarded, old clients are offboarded, and current clients have shifting demands. Establishing a clear process map that allows additions or deletions to be easily approved and executed keeps integration running in a business-as-usual environment. Understanding the source data and its associated gaps is essential. Custodial data is great in many cases but lacks granularity in others. Likewise, accounting data is prone to restatements (late trades, cancel-corrects, etc.), which can turn measurement-period ends into a search for a needle in a haystack.

When are we loading data? Timing is critical, especially if security- or portfolio-level analytics need to be derived, or if there is a complicated transformation process that is a dependency. The best practice is to start with an ideal timeframe for delivery of final outputs to end users and reverse engineer the timing of the required steps to arrive at a preferred upload time. Of course, the availability of raw inputs is a necessary factor that must be accounted for in any process. Global deployments pose an additional complication due to time zone requirements.
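That reverse-engineering exercise can be expressed as a simple back-scheduling calculation: start from the deadline for final outputs and subtract each dependent step’s expected duration to arrive at the latest acceptable upload time. The step names, durations, and deadline below are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical back-scheduling: work backward from the delivery deadline
# through each dependent step to find the latest acceptable upload time.
DELIVERY_DEADLINE = datetime(2021, 7, 13, 7, 0)    # 07:00 local, reports due to end users
STEPS = [                                          # executed in this order after upload
    ("load_and_validate_portfolios", timedelta(minutes=45)),
    ("derive_security_analytics",    timedelta(hours=2)),
    ("run_performance_attribution",  timedelta(hours=1)),
    ("publish_reports",              timedelta(minutes=30)),
]

def latest_upload_time(deadline: datetime, steps) -> datetime:
    cutoff = deadline
    for name, duration in reversed(steps):
        cutoff -= duration
        print(f"{name} must start by {cutoff:%H:%M}")
    return cutoff

print("Preferred upload time:", latest_upload_time(DELIVERY_DEADLINE, STEPS))
```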

Where are we integrating data? This refers to the physical location. Portfolios might be integrated on-platform for use in a third-party tool. However, it’s just as likely that enriched data like returns or analytics are being derived and integrated back into a local solution (like a data warehouse or OMS). Understanding the connections and potential disruptors ensures proper monitoring of any process.

Who is an escalation contact? Processes break. Files can be malformed or incomplete. Automation is imperfect. Establishing the proper escalation procedures ensures that issues can be addressed as they arise, and that the integration continues with minimal interference or interruption. The industry has found tremendous scale and cost efficiencies from adopting offshore capabilities. Aligning support escalation with those offshore centers of excellence minimizes—ideally eliminates—disruption for end users caused by data breaks.

Why are we integrating the data? This might seem like common sense, but scope creep and file bloat are very real. Put differently, just because data can be integrated doesn’t mean that it should be. This is not a call to strip “nice to haves” from the equation; rather, it speaks to proper data management and stewardship. Avoiding unnecessary integration reduces noise in the associated inputs and outputs. Regularly reviewing what is included in the process is good governance that also improves the efficiency of every associated process.

Data integration is simple on paper but can twist, turn, or spiral out of control in real life. Understanding the audience’s needs, the target result, and the limitations of the source data will help define the best approach and smooth the path to a successful integration. The next step is ensuring the quality of the data so that it is in good shape for downstream analysis.

Disclaimer: The information contained in this article is not investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.