Using Machine Learning Models to Uncover Historical US Recession Risk

Data Science and AI

By Ivo Kolarov | March 5, 2024

Investors spent much of last year asking whether the US would achieve a soft landing. To address that question, we used machine learning to analyze recession risk on a historical basis. We also discuss the key insights from that period and look at where things stand in 2024.

Training a Machine Learning Model to Recognize Potential Recessions

Our approach uses machine learning to forecast the two-month forward value of the recession indicator derived from National Bureau of Economic Research (NBER) data and compiled by the Federal Reserve Bank of St. Louis. The indicator is published monthly, with a value of 1 if there is a recession and 0 otherwise.

The explanatory economic variables (or features) in our ML model consist of popular leading economic indicators from the FactSet Economics database:

Consumer Confidence Index (1985=100)
Existing Home Sales (units)
Retail Sales (USD million)
Vehicles Unit Retail Sales (thousands)
Building Permits for New Private Housing (thousands)
Industrial Capacity (2017=100)
Small Business Optimism (1986=100)
Chicago Fed National Activity Index
Monthly Change in Employees on Nonfarm Payrolls (thousands)
Nonfinancial Corporate Profits (USD billions)
Monthly Change in Temporary Help (thousands)
Average Weekly Initial Claims of Unemployment Insurance (thousands)
Kansas City Financial Stress Index
10Y Treasury Bonds Interest Rate Spread (percent)

Here’s a summary of the data:

[Figure: Summary of the data]

Next, we train a Category Boosting (CatBoost) model to predict the two-month forward value of the recession indicator, pairing each month's indicator with the economic data from two months earlier. For example, the forecasted value of the March indicator of a given year is based on economic data published in January of the same year.
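As a rough illustration of that alignment, the sketch below pairs each feature month T with the indicator value at T+2 and drops months whose target is not yet available. The function names and the sample values are hypothetical and are not taken from the NBER series or from the model described here.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after `d`."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def build_forward_target(indicator: dict, horizon: int = 2) -> dict:
    """Map each feature month T to the indicator value at T + horizon.

    Months whose target month is missing from `indicator` are dropped,
    because there is no label to train on yet.
    """
    target = {}
    for month in sorted(indicator):
        future = add_months(month, horizon)
        if future in indicator:
            target[month] = indicator[future]
    return target


# Hypothetical indicator values, for illustration only (not actual data).
indicator = {
    date(2020, 1, 1): 0,
    date(2020, 2, 1): 0,
    date(2020, 3, 1): 1,
    date(2020, 4, 1): 1,
}

# January features are labeled with the March value, February with April.
target = build_forward_target(indicator)
```

With this layout, the last two months of any sample have features but no label, so they can be scored by the model but not used for training.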

CatBoost is a popular implementation of gradient-boosted decision trees, which perform well in this task despite the infrequent instances of US recessions in the dataset. Of the approximately 300 monthly observations between 1999 and 2024, about 10% are recessionary.
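A minimal sketch of what such a fit might look like with the open-source `catboost` package follows. The hyperparameters, the class weighting, and the chronological validation split are all assumptions made for illustration; the article does not state which settings or validation scheme were used.

```python
def chronological_split(rows, labels, train_fraction=0.8):
    """Split time-ordered observations without shuffling, so validation
    months always come after the training window."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], labels[:cut], rows[cut:], labels[cut:]


def fit_recession_classifier(X_train, y_train, X_valid, y_valid):
    """Fit a CatBoost binary classifier (requires `pip install catboost`)."""
    # Imported lazily because catboost is a third-party dependency.
    from catboost import CatBoostClassifier

    model = CatBoostClassifier(
        iterations=500,                 # assumed value, not from the article
        learning_rate=0.05,             # assumed value
        depth=4,                        # assumed value
        loss_function="Logloss",        # binary target: recession or not
        auto_class_weights="Balanced",  # assumption: upweight rare recession months
        random_seed=0,
        verbose=False,
    )
    model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
    return model


# Toy, hypothetical data: ten time-ordered observations and their labels.
months = list(range(10))
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
X_tr, y_tr, X_va, y_va = chronological_split(months, labels)
```

Given a fitted model, `model.predict_proba(X_valid)[:, 1]` returns the estimated probability of the positive (recession) class for each row.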

Data ingestion, exploration, training, model validation, and interpretation are done in the FactSet Programmatic Environment (FPE).

Insights from Model Training

Once the model has been fit, we use Shapley values to measure the impact of various features and get a better understanding of how different economic indicators influence the chance for a recession.

The Shapley (or SHAP) value is a game theory concept you can apply to machine learning models. We can estimate the importance of these variables to the value we are trying to predict by:

Grouping different features at random
Fitting the model on those groups
Observing how each feature affects the output of the model on average
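To make the game theory idea concrete, the sketch below computes exact Shapley values for a toy scoring function by enumerating every group (coalition) of features and averaging each feature's marginal contribution. The feature names and the scoring function are hypothetical. This brute-force enumeration is only practical for a handful of features; SHAP libraries rely on faster, model-specific algorithms instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, players):
    """Exact Shapley values for a coalition value function.

    `value` maps a frozenset of players to a number; each player's Shapley
    value is its weighted average marginal contribution over all coalitions
    of the remaining players.
    """
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_score(coalition):
    """Hypothetical risk score with an interaction between two features."""
    score = 0.0
    if "activity" in coalition:
        score += 0.30
    if "stress" in coalition:
        score += 0.20
    if "activity" in coalition and "stress" in coalition:
        score += 0.10  # interaction term, shared equally between the two
    return score


players = ["activity", "stress", "payrolls"]
phi = shapley_values(toy_score, players)
```

In this toy case the interaction term is split evenly between the two features that create it, the unused feature receives zero, and the values sum to the score of the full feature set minus the score of the empty set.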

Ordering the features by average SHAP value, we get the relative importance of each. In our analysis, the levels of Chicago Fed National Activity Index, Kansas City Financial Stress Index, and the Monthly Change in Employees on Nonfarm Payrolls seem to influence the forward recession probability the most.

[Figure: Features ranked by average SHAP value]
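The ranking step can be sketched as follows. One assumption to flag: the example averages absolute SHAP values, a common convention that keeps positive and negative contributions from cancelling out, whereas the text above only says "average SHAP value". The feature names and numbers are hypothetical.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across observations.

    `shap_rows` holds one list of per-feature SHAP values per observation.
    """
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    # Largest average impact first.
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP values for three observations (not model output).
feature_names = ["activity_index", "stress_index", "payrolls_change"]
shap_rows = [
    [-0.4, 0.1, 0.05],
    [0.6, -0.3, -0.05],
    [-0.2, 0.2, 0.0],
]

ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
```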

The following plot offers additional insight. Here we observe the features on the y-axis and their impact on the model output on the x-axis. This tells us a lot about the overall impact of each feature, its relation to forward recession probability, and the distribution of features relative to the model output. For example, a very low reading from the Chicago Fed National Activity Index increases the probability of a recession, as does a large positive value for the Kansas City Financial Stress Index.

[Figure: Plot of forward recession probability]

Estimating the Probability of a Recession

Now that we have trained the model and revealed factors that influence the results, we estimate in the following chart the two-month forward recession probability between 1999 and 2024.

[Figure: Two-month forward recession probability]

Using a log scale, it's easy to spot the dot-com bubble in 2000 and the 2007-2008 financial crisis. However, the model did not catch the 2020 recession brought about by the Covid pandemic. That seems fair as there was little indication in January 2020 about what would happen two months later. Historical recessions have been preceded by periods of elevated probability, while periods of expansion show the opposite.

These probability scores are the model's estimates of the likelihood that the values of the economic variables at time T indicate a recession at T+2. As such, the probabilities are directly comparable between periods. Looking at the chart again for 2024, the probability of a recession has increased since November 2022 and, as of January 2024, remains near its post-Covid high.

Exploring the Rise in Probability

To understand what has contributed to the rising probability of a recession within the context of the model, we again resort to the SHAP values. Force plots enable us to observe which economic variables increase the likelihood of recession (red) and which decrease it (blue).

[Figure: Force plot, December 30, 2022]

[Figure: Force plot, June 30, 2023]

[Figure: Force plot, December 29, 2023]

[Figure: Force plot, January 31, 2024]
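Force plots rest on the additive property of SHAP values: a baseline output plus the sum of the per-feature contributions reproduces the model's output for that observation. The sketch below illustrates that arithmetic under the assumption that the contributions are expressed in log-odds, as is common for boosted classifiers trained with a log loss. All names and numbers are hypothetical and are not read from the plots above.

```python
import math


def sigmoid(x: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def explain_probability(base_value, contributions):
    """Combine a baseline and per-feature SHAP contributions (log-odds).

    Returns the implied probability plus the features pushing it up
    (shown in red on a force plot) and down (shown in blue).
    """
    raw_output = base_value + sum(contributions.values())
    pushes_up = {k: v for k, v in contributions.items() if v > 0}
    pushes_down = {k: v for k, v in contributions.items() if v < 0}
    return sigmoid(raw_output), pushes_up, pushes_down


# Hypothetical values for one month, for illustration only.
base_value = -3.0
contributions = {
    "activity_index": 0.8,
    "term_spread": 0.6,
    "payrolls_change": -0.5,
    "stress_index": -0.4,
}

probability, pushes_up, pushes_down = explain_probability(base_value, contributions)
```

Because the contributions add up in log-odds space rather than in probability space, the same contribution moves the probability by a different amount depending on where the baseline sits.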

Based on our historical analysis, the increase in recession probability in 2023 can be attributed to several factors related to tighter monetary policy in the US:

Negative national activity index
Negative and widening yield curve term spread
Below average business optimism index
Very low existing home sales

On the other hand, the strong labor market and low financial stress index readings contribute to lowering the probability of a recession over the period. If those indicators deteriorate, however, recession would be likely, according to the model.

Conclusion

The results show that a machine-learning approach with popular leading economic indicators can present useful insights to gauge forward recession probability. Furthermore, the model illuminates the most impactful economic variables and the likelihood of a downturn. Finally, it enables you to narrow the list of factors contributing to higher recession risk as well as those that keep the economy on track.

In short, machine learning enhances our ability to make sense of economic data, draw conclusions from it with a scalable programmatic approach, and uncover insights that might not be evident otherwise.

This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.

Ivo Kolarov

Associate Manager, Risk and Quant Client Services

Mr. Ivo Kolarov is Associate Manager, Risk and Quant Client Services, at FactSet based in Sofia, Bulgaria. In this role, he is responsible for delivering services to multi-asset class risk clients. Prior to joining FactSet, he gained experience as an analyst in corporate finance, equity research, and financial journalism. Mr. Kolarov holds a Bachelor's degree in International Economic Relations from UNWE. He is both a CFA and CAIA charterholder as well as a certified FRM and a machine learning enthusiast.
