By FactSet Insight | February 27, 2024
We sat down with Seth Stephens-Davidowitz to discuss his research and perspectives on data ahead of his May 1 keynote address at the FactSet FOCUS event in Miami. Seth has an expansive view into the role and power of data. He earned a PhD in Economics from Harvard and has worked as a data scientist at Google, a visiting lecturer at the Wharton School of the University of Pennsylvania, and a contributing op-ed writer for the New York Times. He also is a New York Times bestselling author—Everybody Lies and Don’t Trust Your Gut—who is well regarded for his research using data to make life decisions.
The number one data-science trend to monitor is AI. It will dramatically change how data science is practiced, and I think the top driver is speed.
Data science has historically been slow and laborious. Cleaning datasets, merging datasets, and looking up forgotten code are all time-consuming tasks. Now, all of that can be done in seconds with the help of AI. Ideally, people will spend more time coming up with ideas and testing them—and less time coding and debugging.
To give you a sense of how revolutionary AI is for data analysis, my first two books each took me three years of full-time research. My most recent book, in which I made heavy use of AI, took me 30 days.
There is so much data to consider, and sometimes different pieces of data point in different directions. As a result, decision-makers may be tempted at times to throw out data, particularly if they find different pieces of data in conflict. I like to think that every dataset is useful in that it gives you some new information about the world, and more data is always better.
It’s also important to evaluate how you approach the decision-making process itself. In my view, the key to being a good data-driven decision-maker is to be comfortable with probabilities. The premise is that a decision to move forward with an idea never comes with 100% certainty of success. It might be a good idea with an 80% or a 60% probability of success.
Ideally, you would come up with a probability based on your best understanding of the world before seeing some data. And then you would use Bayes’ Rule to adjust this probability when you see the new data. Once you adopt this mindset, you will continuously make slight updates in your understanding of the world based on data.
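As a rough illustration of that updating habit, here is a minimal Python sketch of Bayes’ Rule for a yes/no hypothesis. The function name `bayes_update` and all of the numbers (a 60% prior, and a signal that shows up 70% of the time when the idea is good and 30% of the time when it is not) are made-up assumptions for the example, not figures from the interview.

```python
def bayes_update(prior: float, p_data_if_true: float, p_data_if_false: float) -> float:
    """Return P(hypothesis | data) using Bayes' Rule.

    prior            -- P(hypothesis) before seeing the data
    p_data_if_true   -- P(data | hypothesis is true)
    p_data_if_false  -- P(data | hypothesis is false)
    """
    # Total probability of observing the data under either outcome.
    evidence = p_data_if_true * prior + p_data_if_false * (1.0 - prior)
    if evidence == 0:
        raise ValueError("The observed data has zero probability under both outcomes.")
    return p_data_if_true * prior / evidence


# Hypothetical numbers: start 60% confident that the idea will succeed.
belief = 0.60

# A favorable signal arrives (assumed 70% likely if the idea is good, 30% if not).
belief = bayes_update(belief, p_data_if_true=0.70, p_data_if_false=0.30)
print(round(belief, 3))  # → 0.778

# A second, independent favorable signal nudges the estimate again.
belief = bayes_update(belief, p_data_if_true=0.70, p_data_if_false=0.30)
print(round(belief, 3))  # → 0.891
```

Each new piece of data moves the estimate a little rather than flipping the decision outright, which is the mindset described above.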
It’s important to do sanity checks, such as searching for outliers. These are points that are far away from the other points. Sometimes, an outlier in a dataset is an important—and legitimate—piece of information. Sometimes, an outlier is due to a labeling error. It is usually pretty easy to figure out whether an outlier is legitimate. I think the biggest factor with data accuracy and reliability, however, is not that there is an error in the data; rather, it is that in some instances the data might not be measuring what we think it is measuring.
For example, in my first book, Everybody Lies, I talk about the limitations of survey data. The data may accurately represent what people told the surveyor. But it may not represent the truth, either because the respondents are not telling the truth or are lying to themselves. Self-reported data can be very useful, but it is often best when it is supplemented with other sources of data.
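The outlier sanity check mentioned above can be sketched in a few lines of Python. This example uses Tukey’s 1.5 × IQR rule, which is one common convention and not necessarily the check Seth uses; the `daily_orders` values and the function name `iqr_outliers` are invented for illustration. Note that a check like this only flags unusual points. It cannot tell you whether the data measures what you think it measures.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical data: one entry is far away from the other points.
daily_orders = [52, 48, 50, 51, 49, 53, 47, 50, 980]
print(iqr_outliers(daily_orders))  # → [980]
```

Whether 980 is a real spike or a labeling error (say, a misplaced decimal) still takes a human look at the source record.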
There are two significant, related pitfalls to data science. First, some organizations use data to justify decisions they’ve already made. I see this all the time. A decision-maker wants to do x. They feel they need justification for x. So they collect data showing x is a good idea.
But this is not proper use of data. Data will sometimes tell you that what you think is a good idea actually is a bad idea. To take advantage of data science, you have to be willing to make different decisions depending on what the data says.
The second pitfall is cherry-picking. There are many potential models to run. You can run a lot of models and pick the model that gives a finding you like. The best way to do data science is to think carefully about the best way to model the problem before you know what results different models will offer. The worst way to do data science is to run a whole bunch of models and pick the model that says what you like.
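To see why cherry-picking is so dangerous, consider a small simulation, sketched here under stated assumptions. It generates an outcome that is pure noise, then tests 200 candidate predictors that are also pure noise and counts how many look "statistically significant" at the 5% level. The setup, the function names, and the use of the Fisher z-approximation for the p-value are all choices made for this illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def two_sided_p(r, n):
    """Approximate two-sided p-value for r via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_spurious_hits(n_models=200, n_obs=50, alpha=0.05, seed=0):
    """Count noise predictors that appear 'significant' against a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_models):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if two_sided_p(pearson_r(predictor, outcome), n_obs) < alpha:
            hits += 1
    return hits


print(count_spurious_hits())  # roughly 5% of the 200 models, by chance alone
```

None of these predictors has any real relationship to the outcome, yet around one in twenty will typically pass the test. Anyone who runs enough models and reports only the winner can "find" almost anything.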
There will be more data than ever before. Alternative datasets—such as social media data, search data, and other internet data sources—will continue to rise in popularity.
The big change will be AI. So much of the tedious work of data science will be automated. As a result, data science will become more about asking the right questions. A great data analyst will not be someone who knows the most Python code. A great data analyst will be someone who asks effective questions to generate the most useful output from AI systems.
I wrote a whole book on this! In Don’t Trust Your Gut, the motivation was that I, a certified data geek, realized I almost never used data in my personal life. I just winged it. As part of my research, I read thousands of fascinating studies. For example, people are far happier gardening than watching TV. A romantic partner’s conscientiousness, growth mindset, and life satisfaction are most predictive of your happiness in the relationship. And the average successful entrepreneur is 45 years old; a 60-year-old has a three times higher chance of succeeding in entrepreneurship compared to a 30-year-old.
There are practical implications from all these findings. Individuals might consider gardening more than watching TV in their spare time, looking for partners with the best psychological traits, and pursuing entrepreneurial dreams later in life than they might have assumed.
I am going to talk about principles from all three of my books, including methodological discussions of surveys, big-data analysis, and artificial intelligence. To help make the topics fun, I will apply those concepts to horse racing, happiness, politics, basketball, and much more.
This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.