
How the Most Innovative Industries Outperform

Data Science and Technology

By Vinesh Jha  |  April 1, 2019

Valuing a company’s intangibles is a difficult and uncertain task which requires analysis of information beyond standard public financial filings. As a result, companies in innovation-led sectors are particularly hard to value, and company-level innovation metrics are often noisy when it comes to forecasting either operating performance or market outcomes.

Although there is some academic evidence that innovation, as measured by R&D expenditures, leads to outperformance, that evidence is limited. We propose an alternative application of innovation metrics: using innovation intensity to predict the relative returns of industries. We also use a novel dataset which captures innovation from data outside of financial filings, by looking at patent filings and hiring patterns. Using industry aggregates removes many of the scaling issues and also allows us to identify which industries are generally innovating, without requiring us to attach the innovation to a particular company’s outcome. For example, small numbers of knowledge workers across many companies in an industry may indicate an innovation trend but may not be as visible for any single company.

We find that a long-only portfolio tilted towards the most innovative industries generates 75 basis points of excess returns per annum, with low turnover among the industries, and with those excess returns being as high as 200bps per annum in the last three years.  Innovation-based industry tilts could be implemented at a relatively low cost using ETFs or individual stocks to enhance a long-only or long-short strategy.


Most prior research uses Research and Development expenditures from traditional databases such as Compustat to measure innovation. By contrast, here we focus on nontraditional data not derived from company financials. To measure innovation, we use ExtractAlpha’s ESGEvents Library, which is a database of company-level interactions with a number of different government bodies and regulators, including the Consumer Financial Protection Bureau (CFPB), the Environmental Protection Agency (EPA), the Occupational Safety and Health Administration (OSHA), the Consumer Product Safety Commission (CPSC), the U.S. Senate, the Federal Election Commission (FEC), the Department of Labor (DOL), the U.S. Treasury Bureau of the Fiscal Service, and the U.S. Patent and Trademark Office (USPTO). The ESGEvents Library leverages ExtractAlpha’s proprietary data collection and name-matching algorithm to match company events in these databases to publicly traded securities. For each government data source, company names and event dates are collected. Company names are then matched to FactSet’s historical database of security names, using a proprietary fuzzy name-matching algorithm.

We now have the following data points matched to publicly traded securities, representing three of the 11 data sets within the ESGEvents Library:

  • Department of Labor (DOL)
    • Number of total workers for which the company has applied for H1B Visas in the prior year
    • Number of Permanent H1B Visas for which the company has applied in the prior year
  • U.S. Patent and Trademark Office (USPTO)
    • Number of company patent applications in the prior year
    • Number of patents granted to the company in the prior year

For each of these, we sample monthly and look both at the one-year level and the year-over-year change. We restrict our attention to stocks in ExtractAlpha’s U.S. equities research universe, which consists of stocks with market capitalizations of $100 million or more, $1 million average daily trading volume or more, and a nominal price of $4 or greater.

At the beginning of each calendar month, we take each of our four measures, and their year over year changes, totaling eight metrics at the company level. Because we are interested in industry level rollups, we need to aggregate to the industry level.  We start with industries from FactSet’s 130-level industry classification scheme.  These industries are overly granular for our purposes, so we accumulate them into 45 coarser industries. Next, for each metric, we take its sum for each industry. We also sum the market caps from the prior month end for every stock in that industry, and then scale the summed metrics by the summed market cap. This gives us eight measures of innovation per unit market cap per industry. We then rank the innovation measures across industries each month to put them on a zero to one scale:


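\[ \text{Score}(I) = \operatorname{Rank}_{I}\left( \frac{\sum_{s \in I} IM(s)}{\sum_{s \in I} MCAP(s)} \right) \]

where IM(s) is one of our four innovation measures for each stock, or its YoY change,

MCAP(s) is the prior month-end market cap for each stock,

the sums run over every stock s in industry I, and the rank is taken across industries and scaled from zero to one.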
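
The aggregation and ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the function name, the tuple layout, and the choice of mapping ranks to the zero-to-one scale as i / (n - 1) are assumptions made for the example, and ties are not treated specially.

```python
from collections import defaultdict


def industry_innovation_ranks(stocks):
    """Rank industries by summed metric per unit of summed market cap.

    stocks: iterable of (industry, metric_value, market_cap) tuples for one
    month. The layout is hypothetical; market caps are assumed positive.
    """
    metric_sum = defaultdict(float)
    mcap_sum = defaultdict(float)
    for industry, metric, mcap in stocks:
        metric_sum[industry] += metric
        mcap_sum[industry] += mcap

    # Innovation per unit of market cap for each industry.
    intensity = {ind: metric_sum[ind] / mcap_sum[ind] for ind in metric_sum}

    # Rank across industries and map to [0, 1]: lowest -> 0, highest -> 1.
    ordered = sorted(intensity, key=intensity.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.5}
    return {ind: i / (n - 1) for i, ind in enumerate(ordered)}


# Made-up numbers for illustration only.
example_stocks = [
    ("Semis", 120, 500.0),
    ("Semis", 30, 250.0),
    ("Banks", 5, 1000.0),
    ("Media", 40, 400.0),
]
example_ranks = industry_innovation_ranks(example_stocks)
```

The same function would be applied once per metric and per month, giving eight ranked scores per industry.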


We can now create long-only portfolios of industries which are tilted more heavily towards the more innovative industries. We start with a benchmark weight for each industry, simply based on market capitalization. We then linearly adjust this weight so that the lowest-innovation industry in a given month gets zero weight, a median-innovation industry gets its benchmark weight, and the highest-innovation industry gets double its benchmark weight.  We then adjust the tilted weights so that they sum to 1.
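
One simple way to express this linear tilt, assuming the zero-to-one rank is used directly, is to multiply each benchmark weight by twice the industry's rank and then renormalize. The sketch below follows that assumption; the function name and the sample weights are hypothetical.

```python
def tilted_weights(benchmark_weights, ranks):
    """Tilt benchmark weights by innovation rank and renormalize.

    A rank of 0 gives a multiplier of 0, a rank of 0.5 (the median) gives 1,
    and a rank of 1 gives 2, matching the linear adjustment described above.
    """
    raw = {
        ind: benchmark_weights[ind] * 2.0 * ranks[ind]
        for ind in benchmark_weights
    }
    total = sum(raw.values())
    # Rescale so that the tilted weights sum to 1.
    return {ind: w / total for ind, w in raw.items()}


# Made-up benchmark weights and ranks for illustration only.
example_benchmark = {"Banks": 0.5, "Media": 0.2, "Semis": 0.3}
example_rank = {"Banks": 0.0, "Media": 0.5, "Semis": 1.0}
example_tilt = tilted_weights(example_benchmark, example_rank)
```

Note that after the final rescaling a median-ranked industry will not generally sit exactly at its benchmark weight, since the raw tilted weights rarely sum to 1 on their own.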

Next, for each of our eight metrics, we compare the tilted portfolio to the benchmark portfolio via its annualized excess return and Information Ratio (IR), where IR is the annualized excess return divided by the annualized standard deviation of excess returns.  For this exercise, we ignore transaction costs; a relatively inexpensive version of this monthly rebalanced portfolio could be constructed using low-fee ETFs.
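
For concreteness, the two statistics can be computed from monthly return series roughly as follows. The annualization convention here (mean monthly excess return times 12, sample standard deviation times the square root of 12) is an assumption for the sketch, as is the function name.

```python
import math
import statistics


def excess_return_and_ir(portfolio_returns, benchmark_returns, periods_per_year=12):
    """Return (annualized excess return, Information Ratio).

    Inputs are equal-length sequences of periodic (here monthly) returns.
    Annualization is arithmetic, which is an assumption of this sketch.
    """
    excess = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    ann_excess = statistics.mean(excess) * periods_per_year
    ann_vol = statistics.stdev(excess) * math.sqrt(periods_per_year)
    return ann_excess, ann_excess / ann_vol


# Made-up monthly returns for illustration only.
example_portfolio = [0.02, 0.01, 0.03, -0.01]
example_bench = [0.01, 0.01, 0.02, -0.02]
example_excess, example_ir = excess_return_and_ir(example_portfolio, example_bench)
```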

Our initial in-sample analysis consists of months from 2003 to 2015. We later examine an out-of-sample period from 2016 through November 2018.

Type         Factor               Excess Return   Information Ratio
Level        H1B Visa             0.45%           0.17
Level        Permanent Visa       0.39%           0.14
Level        Patent Application   0.46%           0.28
Level        Patent Grant         0.53%           0.30
YoY Change   H1B Visa             0.29%           0.20
YoY Change   Permanent Visa       -0.78%          -0.47
YoY Change   Patent Application   -0.02%          -0.02
YoY Change   Patent Grant         0.61%           0.50

Here we see that most of the factor tilts lead to positive excess returns. Year over year change in permanent visas is the major exception. It’s possible that this is because the visa applications are essentially already a change variable: change in the number of expected employees. 

Next, we create a composite Innovation Score by simply averaging the ranked scores across six of the eight factors, excluding year over year changes in both types of visa applications.  Note that the composite score is not likely to perform as well as the sum of its underlying factors, because they are correlated.  The visa scores are 90% correlated with each other, the patent scores are 95% correlated with each other, and the visa scores are 60% correlated to the patent scores – indicating that all of these metrics capture different aspects of the same underlying innovation characteristic.
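
The averaging step is simple enough to show directly. In the sketch below the factor names and the nested-dictionary layout are hypothetical, and we do not re-rank the averaged score, which is an assumption since either choice is consistent with the description above.

```python
def composite_score(ranked_scores, excluded=()):
    """Average the ranked scores across factors for each industry.

    ranked_scores: {factor_name: {industry: rank in [0, 1]}}
    excluded: factor names to leave out of the average.
    """
    used = [name for name in ranked_scores if name not in excluded]
    industries = ranked_scores[used[0]].keys()
    return {
        ind: sum(ranked_scores[name][ind] for name in used) / len(used)
        for ind in industries
    }


# Made-up ranks for two industries and three factors, for illustration only.
example_factors = {
    "patent_grant_level": {"A": 1.0, "B": 0.0},
    "h1b_level": {"A": 0.5, "B": 0.5},
    "h1b_yoy": {"A": 0.0, "B": 1.0},
}
example_composite = composite_score(example_factors, excluded=("h1b_yoy",))
```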

The composite score gets us an in-sample annual excess return of 47 basis points with an Information Ratio of 0.21.  These numbers are modest, but so are our industry tilts relative to the benchmark.  A more aggressive utilization of these factors could lead to stronger outperformance.

We then apply this composite Innovation measure to the full data sample, where it exhibits performance as shown below.

[Figure: Growth of $100]


We can see that our results hold up out of sample, and that innovative industries have outperformed significantly since 2013, though prior to about 2009 there was little difference between more- and less-innovative industries.

Robustness Checks

As a robustness check, we try restricting to just sectors which are innovation-led, and remove stocks which are categorized in the Energy, Finance, Industrials, Materials, and Miscellaneous sectors.  After applying this filter, our full sample excess return is 51 basis points with an Information Ratio of 0.34, similar to our overall results.

We may wish to know whether our composite Innovation Score is redundant with standard risk factors. The cross-sectional correlations between our score and various risk factors, aggregated from the stock up to the industry level, are shown below.

              Value   Momentum   Volatility   Growth   Leverage   Yield   Size
Correlation   -0.30   0.02       0.18         0.07     -0.27      -0.37   -0.19

Innovative industries tend to be composed of stocks which are smaller, more volatile, less levered, and with higher valuations and lower dividend yields than other industries. Surprisingly, they do not exhibit higher momentum.

When selecting industries, some allocation models look at momentum, value, and growth factors. If we perform a simple cross-sectional residualization of our Innovation Score and build portfolios out of the residualized score, we see excess returns of 50bps per annum with an IR of 0.32. So it seems that perhaps a third of the value of the innovation score may be explained by these factors.
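
A cross-sectional residualization of this kind can be sketched as an ordinary least squares regression of the score on the factor exposures, run separately each month, keeping the residuals as the new score. The version below is an illustration under that assumption, written in plain Python so that it has no dependencies; the function names and sample numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residualize(scores, factor_rows):
    """Return OLS residuals of scores on an intercept plus the factors.

    scores: one Innovation Score per industry for a single month.
    factor_rows: one tuple of factor exposures per industry, same order.
    """
    design = [[1.0] + list(row) for row in factor_rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(design, scores)]


# Made-up scores and (momentum, value, growth) exposures for five industries.
example_scores = [0.9, 0.2, 0.6, 0.4, 0.1]
example_exposures = [
    (0.1, -0.3, 0.5),
    (0.4, 0.2, -0.1),
    (-0.2, 0.6, 0.3),
    (0.0, -0.5, -0.4),
    (0.7, 0.1, 0.2),
]
example_residuals = residualize(example_scores, example_exposures)
```

By construction the residuals are uncorrelated with each factor in that month's cross-section, so any remaining performance cannot be attributed to linear exposure to those factors.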

Below we plot the composite Innovation Scores for each industry over time, by broad sector groupings. Some technology industries such as Computers and Semiconductors remain at or near the top of the rankings throughout time, and Financial industries generally have consistently low innovation scores. The month over month autocorrelation of the level-type scores is 98% or higher, and for the year-over-year type scores it is 88% or higher. This high degree of autocorrelation indicates that the industry tilts could be implemented with very low transaction costs. However, there is variation over time. Distributors and Media companies, for example, have become more innovative over time per our measure.

[Figure: Communications, Healthcare, and Technology]

[Figure: Commercial Services and Industrials]

[Figure: Energy, Materials, Transportation, and Utilities]

Stock Selection



A natural follow-up question is whether these same metrics can predict the relative performance of stocks within industries. For each of the eight company-level metrics (the levels of H1B visa applications, permanent visa applications, patent applications, and patent grants, and their year-over-year changes), we scale by the stock’s market capitalization at the beginning of each month.

The stock-level results, while generally in the expected direction, were small in magnitude for a stock-selection factor, which would incur some transaction costs to implement, whereas an industry-tilt portfolio could be implemented more cheaply using ETFs. Factor performance is similar, though slightly weaker, when we include zero-visa and zero-patent stocks in our short portfolio. Many of the factors are also weakened by residualization.


It appears that innovation measures can be used in a novel way to select industries which are likely to outperform. None of these factors exhibit significant ability to differentiate between outperforming and underperforming stocks within an industry. But aggregating them to the industry level and applying industry tilts seems to be a useful and cost-efficient way to systematically incorporate innovation metrics into portfolios.

The current findings may motivate future research which could investigate the types of knowledge workers for whom visas are applied, the success of a company at converting visa applications into approved hires or converting patent applications into patent grants, and the content and level of citations of those patents. 


Vinesh Jha

Founder, ExtractAlpha

Vinesh Jha founded ExtractAlpha in 2013 with the mission of bringing analytical rigor to the analysis and distribution of unique data sets to capital markets participants. From 1999 to 2005, Vinesh was the Director of Quantitative Research at StarMine in San Francisco, where he developed industry-leading metrics of sell-side analyst performance as well as successful commercial alphas and products based on analyst, fundamental, and other data sources. Subsequently he developed systematic trading strategies for proprietary trading desks at Merrill Lynch and Morgan Stanley in New York. Most recently he was a senior researcher at PDT Partners, a spinoff of Morgan Stanley's biggest prop trading group. Vinesh holds an undergraduate degree from the University of Chicago and a graduate degree from the University of Cambridge, both in mathematics.