Comparable Company Analysis (CCA) is a cornerstone of financial analysis. Grouping similar companies is the first step in analyzing relative valuations, performing competitor analysis, and implementing pairs trade strategies. There are several ways to group similar companies. The most common (and most straightforward) methods are sector classification and country (by domicile, exchange market, or headquarters location). However, relying solely on these one-dimensional categories can oversimplify the comparison.
For example, Penn National Gaming, Las Vegas Sands Corp, and Wynn Resorts are all U.S. companies operating casinos and casino hotels. However, their geographic revenue exposures are very different, as shown in the table below. Penn National Gaming and Wynn Resorts have more than 50% revenue exposure to the U.S., while Las Vegas Sands and Wynn Resorts have a similar exposure to Macao. It may not be intuitive to determine which one is more comparable to Wynn Resorts.
| Company | United States (%) | Macao (%) | Singapore (%) |
| --- | --- | --- | --- |
| Penn National Gaming, Inc. | 100.0 | 0.0 | 0.0 |
| Las Vegas Sands Corp. | 19.9 | 46.1 | 34.0 |
| Wynn Resorts, Limited | 53.1 | 46.9 | 0.0 |
Source: FactSet
This article demonstrates how we used FactSet’s Geographic Revenue Exposure (GeoRev) data to define similarities between companies using Euclidean distance and rank pairs within the same RBICS Focus sector. We also performed a backtest for a pairs trade to validate whether the pairs generated from the combined factor result in high stock price correlations.
Let’s start with how we can convert geographic revenue exposure to a similarity score between companies. For simplicity and visualization purposes, assume that we have three companies with different levels of exposure to domestic and international markets. Companies A, B, and C have 10%, 80%, and 40% exposure to the domestic market, respectively, with the remainder of each company's revenue coming from international markets. Based on these figures, Companies A and C are more similar than Companies A and B. However, which company (A or B) has geographic exposure more similar to Company C?
Euclidean distance measures the similarity of two things by calculating the distance between them: the smaller the distance, the more similar they are. The chart below plots the companies based on their domestic and international market exposures. In this case, the distance between Company A and C is 42.43, and between Company B and C is 56.57. Therefore, Company A and C are more similar than Company B and C.
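The two distances quoted above can be reproduced with a few lines of Python. This is a minimal sketch rather than the code used for the article: the dictionary layout and the function name `geo_distance` are illustrative assumptions, and each company's international exposure is taken to be 100% minus its domestic exposure.

```python
import math

# Exposure vectors as (domestic %, international %).
# International exposure is assumed to be 100 minus the domestic figure.
exposures = {
    "A": (10.0, 90.0),
    "B": (80.0, 20.0),
    "C": (40.0, 60.0),
}

def geo_distance(x, y):
    """Euclidean distance between two equal-length exposure vectors."""
    return math.dist(x, y)

dist_ac = geo_distance(exposures["A"], exposures["C"])
dist_bc = geo_distance(exposures["B"], exposures["C"])

print(f"A-C: {dist_ac:.2f}")  # 42.43
print(f"B-C: {dist_bc:.2f}")  # 56.57
```

Because the distance between A and C is smaller, A is the closer match to C on this measure.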
This example uses only two dimensions (factors) to measure similarity, but Euclidean distance can be applied across multiple dimensions, including companies' revenue exposures to each country and sector, to systematically capture and quantify a more nuanced measure of similarity at scale.
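To illustrate the multi-dimensional case, the sketch below applies the same calculation to the three-country figures from the casino table earlier in the article. The helper `rank_peers` and the shortened company names are hypothetical, introduced only for illustration. A production version would use the full set of country exposures rather than these three columns.

```python
import math

# Revenue exposure (%) copied from the casino table earlier in the article.
# Column order: United States, Macao, Singapore.
georev = {
    "Penn National Gaming": (100.0, 0.0, 0.0),
    "Las Vegas Sands": (19.9, 46.1, 34.0),
    "Wynn Resorts": (53.1, 46.9, 0.0),
}

def rank_peers(target, exposures):
    """Return (peer, distance) tuples sorted from most to least similar to target."""
    base = exposures[target]
    distances = [
        (name, math.dist(base, vec))
        for name, vec in exposures.items()
        if name != target
    ]
    return sorted(distances, key=lambda item: item[1])

peers = rank_peers("Wynn Resorts", georev)
for name, distance in peers:
    print(f"{name}: {distance:.2f}")
```

On these three columns alone, Las Vegas Sands comes out closer to Wynn Resorts (a distance of about 47.5) than Penn National Gaming does (about 66.3), which gives a quantitative answer to a comparison that is hard to make by eye.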
We examined the validity of the GeoRev Euclidean distance using a pairs trade simulation, which we backtested for large-cap equities from 2015 to 2021. At each month end, the stocks included in the backtest were required to have at least one year of price history and at least one other stock in the same RBICS industry. This gave us 443 stocks on average. We used the following screening criteria to identify candidates for pairs trading:
We ran our screen to populate the potential pairs for trades for each month. On average, we had 14 unique pair candidates.
The chart below shows the distribution of correlations for all pairs of stocks in the same industry, without excluding the low-correlation pairs. The “Close” distance group represents the distribution of correlations for pairs with a small GeoRev Euclidean distance (i.e., the group of pairs passing criterion 3). The “Far” distance group includes all other pairs. The correlations of the “Close” distance group are shifted to the right, toward higher values; that is, the lower the GeoRev Euclidean distance, the more correlated the pair tends to be.
We tracked the daily changes in the price ratio (Stock A price / Stock B price) of pairs trade candidates and determined the entry and exit for each pairs trade based on the following:
The chart below shows the cumulative profit/loss over time and the profit/loss by year. The trading cost was set at 10 bps of the stock value purchased or sold. We saw a positive profit in every year. Notably, profits in 2018 and 2019 were higher than in other years, probably because geographic risk exposure had a strong effect during this period due to the trade war between the U.S. and China.
Over the backtest time horizon, we closed approximately 500 pairs trades. Approximately 77% of trades were closed because the price ratio reverted to the mean, which suggests the pairs exhibited a persistent medium- to long-term correlation as intended. Approximately 11.5% of pairs trades hit the maximum holding period, and another 11.5% were closed because the price ratio moved more than 4 SD from the mean.
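The three ways a trade can close, as described above, can be sketched as a small decision function. This is an illustrative outline under stated assumptions, not the backtest's actual logic. Only the 4 SD stop level comes from the article. The maximum holding period value, the function names, and the use of a simple z-score of the price ratio are all hypothetical.

```python
from statistics import mean, stdev

STOP_SD = 4.0          # from the article: close when the ratio is more than 4 SD from the mean
MAX_HOLDING_DAYS = 60  # hypothetical value; the article does not state the holding limit

def z_score(ratio_history, current_ratio):
    """Number of sample standard deviations the current price ratio sits from its historical mean."""
    return (current_ratio - mean(ratio_history)) / stdev(ratio_history)

def exit_reason(entry_z, current_z, days_held):
    """Return why an open pairs trade should be closed, or None to keep holding.

    entry_z is the (non-zero) z-score of the price ratio when the trade was opened.
    """
    if abs(current_z) > STOP_SD:
        return "stop_loss"
    # The ratio has reverted once the z-score reaches zero or changes sign.
    if entry_z * current_z <= 0:
        return "mean_reversion"
    if days_held >= MAX_HOLDING_DAYS:
        return "max_holding"
    return None

# Example: a trade opened at +2.5 SD whose ratio has since drifted to -0.1 SD.
print(exit_reason(entry_z=2.5, current_z=-0.1, days_held=10))  # mean_reversion
```

Checking the stop level first is a design choice in this sketch. It ensures a trade that has moved sharply against the position is classified as a stop-out even if it happens on the last permitted holding day.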
Using a mathematical technique to convert geographic revenue exposure to a similarity score allows us to combine exposure with other characteristics such as industry classification. Combining geographic exposure with industry classification enabled us to select strong pairs trades, which were profitable over the period analyzed.
Here we used Euclidean distance to identify companies with similar geographic footprints; however, there are many other methods (e.g., cosine similarity) and databases with which to solve for similarity. Instead of using geographic revenue exposure, we can apply the same technique using RBICS with Revenue to compare sector exposures.
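As a point of comparison, cosine similarity can be computed on the same exposure vectors. The sketch below reuses the hypothetical Companies A, B, and C from the earlier example (10%, 80%, and 40% domestic exposure, with the remainder assumed to be international). The function name is illustrative, and the vectors are assumed to be non-zero.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero exposure vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# (domestic %, international %) for the three hypothetical companies.
company_a = (10.0, 90.0)
company_b = (80.0, 20.0)
company_c = (40.0, 60.0)

sim_ac = cosine_similarity(company_a, company_c)
sim_bc = cosine_similarity(company_b, company_c)

print(f"A-C: {sim_ac:.3f}")
print(f"B-C: {sim_bc:.3f}")
```

Unlike a distance, a higher cosine similarity means a closer match. In this toy case, the measure agrees with Euclidean distance that A is closer to C than B is, although the two measures need not produce the same ranking in general.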
We focused on developing a high-performing pairs trading strategy to validate our approach to measuring the similarity of companies, but the applications of these techniques are much broader with the potential to augment workflows ranging from comparable company analysis to portfolio construction.
The information contained in this article is not investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.