Solving for Similarity Using Company Exposures and Euclidean Distance

By Hiroki Miyahara  |  June 23, 2022

Comparable Company Analysis (CCA) is a cornerstone of financial analysis. Grouping similar companies is the first step in analyzing relative valuations, performing competitor analysis, and implementing pairs trade strategies. There are several ways to group similar companies. The most commonly used (and most straightforward) methods are sector classification and country (by domicile, exchange market, or headquarters location). However, relying solely on these one-dimensional categories can oversimplify the comparison.

For example, Penn National Gaming, Las Vegas Sands Corp, and Wynn Resorts are all U.S. companies operating casinos and casino hotels. However, their geographic revenue exposures are very different, as shown in the table below. Penn National Gaming and Wynn Resorts have more than 50% revenue exposure to the U.S., while Las Vegas Sands and Wynn Resorts have a similar exposure to Macao. It is therefore not obvious which of the two is more comparable to Wynn Resorts.

Geographic Revenue Exposure for Casinos/Gaming Companies (%)

[Table: revenue exposure by country (columns include United States) for Penn National Gaming, Inc.; Las Vegas Sands Corp.; and Wynn Resorts, Limited]

Source: FactSet

This article demonstrates how we used FactSet’s Geographic Revenue Exposure (GeoRev) data to define similarities between companies using Euclidean distance and rank pairs within the same RBICS Focus sector. We also performed a backtest for a pairs trade to validate whether the pairs generated from the combined factor result in high stock price correlations.

Euclidean Distance

Let’s start with how we can convert geographic revenue exposure to a similarity score between companies. For simplicity and visualization purposes, assume that we have three companies with different levels of exposure to domestic and international markets. Companies A, B, and C have 10%, 80%, and 40% exposure to the domestic market, respectively. Based on these figures, Companies A and C are more similar than Companies A and B. However, which company (A or B) has a geographic exposure more similar to that of Company C?

Euclidean distance measures the similarity of two things by calculating the distance between them. The chart below plots the companies based on their domestic and international market exposures. In this case, the distance between Company A and C is 42.43, and the distance between Company B and C is 56.57. Therefore, Company A and C are more similar than Company B and C.
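The two distances quoted above can be reproduced in a few lines. This is a minimal sketch, assuming each company is a (domestic %, international %) point and that international exposure is simply 100 minus the domestic figure; the name `euclidean_distance` is illustrative only.

```python
import math

# Exposures as (domestic %, international %). International exposure is
# assumed to be 100 minus the domestic figure given in the text.
exposures = {
    "A": (10.0, 90.0),
    "B": (80.0, 20.0),
    "C": (40.0, 60.0),
}

def euclidean_distance(p, q):
    """Straight-line distance between two equal-length exposure vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

dist_ac = euclidean_distance(exposures["A"], exposures["C"])
dist_bc = euclidean_distance(exposures["B"], exposures["C"])
print(round(dist_ac, 2), round(dist_bc, 2))  # 42.43 56.57
```

The smaller value for A and C is what makes that pair the closer match.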


This example uses only two dimensions (factors) to measure similarity, but Euclidean distance can be applied across many dimensions, including companies' revenue exposures to each country and sector, to systematically quantify a more nuanced measure of similarity at scale.
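The same calculation extends to more than two dimensions without any change to the formula. The sketch below ranks every pair in a small group by distance across four regions. The company names, region list, and exposure values are all made up for illustration and are not GeoRev data.

```python
import math
from itertools import combinations

# Hypothetical regional revenue exposures (%). These values are
# illustrative only and are not FactSet GeoRev data.
regions = ["Americas", "Europe", "Asia/Pacific", "Africa & Middle East"]
exposure = {
    "Company X": [70.0, 20.0, 10.0, 0.0],
    "Company Y": [65.0, 25.0, 10.0, 0.0],
    "Company Z": [10.0, 10.0, 75.0, 5.0],
}
# Every company needs one exposure value per region.
assert all(len(v) == len(regions) for v in exposure.values())

def euclidean_distance(p, q):
    """Straight-line distance between two equal-length exposure vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Distance for every unordered pair, sorted from most to least similar.
pair_distance = {
    (a, b): euclidean_distance(exposure[a], exposure[b])
    for a, b in combinations(sorted(exposure), 2)
}
ranked = sorted(pair_distance.items(), key=lambda kv: kv[1])
for (a, b), d in ranked:
    print(f"{a} vs {b}: {d:.2f}")
```

With these made-up figures, Company X and Company Y come out as the closest pair because their exposures differ by only a few points in two regions.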

Pairs Trading Simulation

We examined the validity of the GeoRev Euclidean distance using a pairs trade simulation, which we backtested for large-cap equities from 2015 to 2021. The stocks included in the backtest were required to have at least one year of price history and at least one other stock in the same RBICS industry at each month end. This gave us 443 stocks on average. We used the following screening criteria to identify candidates for pairs trading:

  1. Same industry. We used Level 4 RBICS Focus sector to form pairs within industries.
  2. High correlation. Each stock pair had to be highly correlated over the past year (252 business days). Only pairs among the highest 20% of correlations were included as pairs trade candidates. We also performed the Augmented Dickey-Fuller test on the price ratio of each pair to determine whether the price ratio had a unit root, i.e., whether it exhibited a continued trend (unit root) or reverted to the mean (no unit root).
  3. Short GeoRev Euclidean distance. Each pair of companies needed to have similar geographic exposures. We calculated Euclidean distance from revenue exposures to regions for individual companies. We selected only pairs with the same regional exposures or pairs with distances in the bottom 10% for the industry.
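The correlation and distance screens can be sketched for a single industry as follows. This is an illustration under stated assumptions, not the backtest code: the tickers, series, and exposures are invented, the series are far shorter than 252 days, the percentile cut-offs are applied by simple rank, and the Augmented Dickey-Fuller step is left out (a library routine such as `adfuller` in statsmodels could be applied to each price ratio).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def screen_pairs(series, exposure, top_corr=0.20, bottom_dist=0.10):
    """Keep pairs in the top `top_corr` share by correlation that also have
    zero distance or a distance in the bottom `bottom_dist` share.
    All tickers passed in are assumed to belong to one industry."""
    pairs = list(combinations(sorted(series), 2))
    corr = {p: pearson(series[p[0]], series[p[1]]) for p in pairs}
    dist = {p: euclidean_distance(exposure[p[0]], exposure[p[1]]) for p in pairs}

    n_corr = max(1, math.ceil(top_corr * len(pairs)))
    n_dist = max(1, math.ceil(bottom_dist * len(pairs)))
    high_corr = set(sorted(pairs, key=corr.get, reverse=True)[:n_corr])
    close = set(sorted(pairs, key=dist.get)[:n_dist])
    close |= {p for p in pairs if dist[p] == 0.0}  # identical exposures
    return sorted(high_corr & close)

# Hypothetical inputs: short toy series and two-region exposures (%).
series = {
    "AAA": [1, 2, 3, 4, 5],
    "BBB": [2, 4, 6, 8, 11],
    "CCC": [5, 3, 4, 1, 2],
    "DDD": [1, 3, 2, 5, 4],
}
exposure = {
    "AAA": [60, 40],
    "BBB": [58, 42],
    "CCC": [10, 90],
    "DDD": [30, 70],
}
candidates = screen_pairs(series, exposure)
print(candidates)  # [('AAA', 'BBB')]
```

Taking the intersection of the two sets mirrors the idea that a pair has to pass every criterion, not just one of them.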

We ran our screen each month to generate the potential pairs for trades. On average, we had 14 unique pair candidates.

The chart below shows the distribution of correlations for all pairs of stocks in the same industry, without excluding the low-correlation pairs. The “Close” distance group represents the distribution of correlations for pairs with a small GeoRev Euclidean distance (i.e., the group of pairs passing criterion 3). The “Far” distance group includes all other pairs. The correlations of the “Close” distance group were shifted to the right, toward higher values; i.e., the lower the GeoRev Euclidean distance, the more correlated the pair.


Trade Trigger

We tracked the daily changes in the price ratio (Stock A price / Stock B price) of pairs trade candidates and determined entry and exit for each pairs trade based on the following:

  • Entry: We opened a pairs trade when the price ratio exceeded the mean by +2 standard deviations (SD) (short $100 in Stock A and long $100 in Stock B) or fell below the mean by -2 SD (long $100 in A and short $100 in B).
  • Exit: We closed a pairs trade when the price ratio reverted to the mean. For a loss-cut, we also closed a trade when the price ratio exceeded the mean by +4 SD or came in below the mean by -4 SD. The pairs remaining open over 180 trading days were also closed.
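The entry and exit rules above can be written as a small decision function. This is a sketch under assumptions: the mean and SD of the price ratio are passed in (how they are estimated is not specified here), and the function name, argument names, and action labels are hypothetical.

```python
def next_action(ratio, mean, sd, position, days_open=0,
                entry_z=2.0, stop_z=4.0, max_days=180):
    """Return the action implied by the rules for one day's price ratio.

    `position` is None when no trade is open, otherwise
    "short_A_long_B" or "long_A_short_B".
    """
    z = (ratio - mean) / sd  # distance from the mean in SD units

    if position is None:
        if z > entry_z:
            return "open_short_A_long_B"   # ratio is high: short A, long B
        if z < -entry_z:
            return "open_long_A_short_B"   # ratio is low: long A, short B
        return "wait"

    # A trade is open: check loss-cut, holding limit, then mean reversion.
    if abs(z) > stop_z:
        return "close_stop_loss"
    if days_open > max_days:
        return "close_max_holding"
    if position == "short_A_long_B":
        reverted = z <= 0
    else:
        reverted = z >= 0
    return "close_mean_reversion" if reverted else "hold"

print(next_action(1.25, 1.0, 0.1, None))                           # open_short_A_long_B
print(next_action(0.99, 1.0, 0.1, "short_A_long_B", days_open=10)) # close_mean_reversion
```

Checking the loss-cut before the other exits is a design choice in this sketch; the order in which the conditions were evaluated in the backtest is not stated.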


The Results

The chart below shows the cumulative profit/loss over time and the profit/loss by year. The trading cost was set at 10 bps of the stock value purchased or sold. We saw a positive profit in every year. Notably, profits in 2018 and 2019 were higher than in other years, probably because geographic risk exposure had a strong effect during this period due to the trade war between the U.S. and China.
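To make the cost assumption concrete, the sketch below computes the net profit/loss of one $100-per-leg pairs trade. It assumes the 10 bps cost applies to the value traded on each leg at both entry and exit, which is one reading of the rule above; the returns used are invented, and `pair_pnl` is a hypothetical helper.

```python
COST_RATE = 0.0010  # 10 bps of the value purchased or sold
NOTIONAL = 100.0    # dollars per leg, as in the entry rule

def pair_pnl(ret_long, ret_short, notional=NOTIONAL, cost_rate=COST_RATE):
    """Net P/L of a dollar-neutral pair, given each leg's simple return
    between entry and exit. Costs are assumed to apply on entry and exit."""
    gross = notional * ret_long - notional * ret_short
    entry_value = 2 * notional
    exit_value = notional * (1 + ret_long) + notional * (1 + ret_short)
    costs = cost_rate * (entry_value + exit_value)
    return gross - costs

# Invented example: the long leg rises 1% and the short leg falls 3%.
net = pair_pnl(ret_long=0.01, ret_short=-0.03)
print(round(net, 3))  # 3.602
```

Here the gross profit is $4.00 and the round-trip cost is about $0.40, so costs of this size matter most for trades that close with only a small move in the ratio.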


Over the backtest time horizon, we closed approximately 500 pairs trades. Approximately 77% of trades were closed because the price ratio reverted to the mean, which suggests the pairs exhibited a persistent medium- to long-term correlation, as intended. Approximately 11.5% of pairs trades hit the maximum holding period, and another 11.5% were closed because the price ratio moved more than 4 SD from the mean.


Conclusion and Future Research

Using an advanced mathematical technique to convert geographic revenue exposure to a similarity score allows us to combine exposure with other characteristics such as industry classification. Adding geographic exposure to business industries enabled us to select strong pairs trades which outperformed over the period analyzed.

Here we used Euclidean distance to identify companies with similar geographic footprints; however, there are many other methods (e.g., cosine similarity) and databases with which to solve for similarity. Instead of using geographic revenue exposure, we can apply the same technique using RBICS with Revenue to compare sector exposures.
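As a point of comparison, cosine similarity can be computed on the same exposure vectors. The sketch below reuses the three illustrative companies from the Euclidean distance example (again assuming international exposure is 100 minus domestic); `cosine_similarity` is an illustrative helper, not part of the analysis above.

```python
import math

# Same illustrative exposures as the Euclidean example:
# (domestic %, international %).
exposures = {
    "A": (10.0, 90.0),
    "B": (80.0, 20.0),
    "C": (40.0, 60.0),
}

def cosine_similarity(p, q):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_p * norm_q)

cos_ac = cosine_similarity(exposures["A"], exposures["C"])
cos_bc = cosine_similarity(exposures["B"], exposures["C"])
print(f"A vs C: {cos_ac:.3f}, B vs C: {cos_bc:.3f}")
```

Unlike distance, a higher cosine value means a closer match. For these three companies both measures point to A as the nearer neighbor of C, although the two measures need not agree in general.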

We focused on developing a high-performing pairs trading strategy to validate our approach to measuring the similarity of companies, but the applications of these techniques are much broader with the potential to augment workflows ranging from comparable company analysis to portfolio construction.

The information contained in this article is not investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.

Hiroki Miyahara

Senior Product Manager, Japan

Mr. Hiroki Miyahara is a Senior Product Manager at FactSet, based in Tokyo, Japan. In this role, he covers the Asia Pacific region for FactSet proprietary content sets including supply chain, RBICS, GeoRev, shipping, and FactSet Data Management Solution. Mr. Miyahara joined FactSet in 2011 and previously held roles as an account executive and product developer. He earned an MSc in economics from the University of Essex.

