
AI Strategies Series: Security and Data Privacy

Data Science and AI

By Lucy Tancredi  |  February 28, 2024

To date, our series on overcoming generative AI challenges has described how LLMs generate answers: why they hallucinate, lack explainability, and may produce inconsistent or outdated responses.

Shifting gears from the quality of responses, we now discuss security and data privacy.

Protection for Corporate Data

Some firms have created generative AI usage policies to protect corporate data, while others have banned employee use altogether. Educating employees on the possibilities and proper use of generative AI—and providing tools to do so safely—can help enterprises harness this technology while minimizing risks.

OpenAI’s FAQ states that conversations on ChatGPT can be reviewed by its staff and used by its AI trainers. This means anything you enter in a prompt could potentially be leaked outside your firm. This made headlines when one firm’s employees were discovered to have shared proprietary source code and an internal meeting transcript with ChatGPT. And in March 2023, an OpenAI bug leaked users’ conversation titles to other users.

This material security and privacy risk has caused some companies to ban use of OpenAI’s ChatGPT at work—including several large technology companies and Wall Street banks. As the technology matures and more companies understand how to safely allow access, including via vetted third-party enterprise offerings, restrictions should lessen.

Our Approach to Generative AI Governance and Security

Protecting sensitive data is of the utmost importance, and FactSet is committed to ensuring data privacy and security across all our solutions. At a summary level:

  • All queries that users enter into FactSet generative AI experiences are confidential and will not be used to automatically train or fine-tune our models.
  • Access to user queries and responses is governed and restricted.
  • All models used by FactSet are private.

We also set clear restrictions around the types of data FactSet employees can use with Large Language Models. And we provided our employees with an enterprise-safe model, ensuring that inputs and chatbot responses never leave our firm.

We also instituted “GenAI FridAIs”—weekly educational programming for the entire enterprise, including training, “show and tell” sessions, and a prompt of the week—to encourage operational and security awareness alongside hands-on experience with generative AI. Finally, we built a secure environment and governance for generative AI that encourages exploration while safeguarding against threats.

Coding Vulnerabilities 

Software developers using AI coding assistants must be wary of introducing security issues into their code. Studies at both Stanford and NYU revealed that developers using AI assistants such as Codex and GitHub Copilot produced significantly less secure code while believing it was more secure. Analysis of AI-generated code has identified vulnerabilities related to SQL injection, cross-site scripting, weak hashing algorithms, the disclosure of sensitive information, and the use of unencrypted passwords.
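To make the SQL injection risk concrete, here is a minimal Python sketch contrasting the vulnerable pattern often seen in generated code (string concatenation) with the parameterized query a careful reviewer should insist on. The in-memory SQLite table and the `find_user_*` function names are illustrative, not from any cited study.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Vulnerable: user input is concatenated directly into the SQL string,
    # the classic injection pattern flagged in AI-generated code.
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Safe: a parameterized query lets the driver handle escaping.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

malicious = "x' OR '1'='1"
print(len(find_user_unsafe(conn, malicious)))  # 2 — the injection matches every row
print(len(find_user_safe(conn, malicious)))    # 0 — treated as a literal name
```

The same discipline applies regardless of whether a human or an assistant wrote the query: inputs never belong inside the SQL string itself.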

While these vulnerabilities can arise through unintentional negligence when using generative AI coding assistants, there are also security concerns regarding malicious actors deliberately targeting the users of these assistants. One exploit is hallucination squatting. ChatGPT has been found to generate code that calls open-source libraries that no longer exist or never did. Threat actors can then collect the names of these nonexistent packages to create and publish malicious versions that developers’ code would then unwittingly incorporate. Software engineers need to be extremely careful—both from a security and legal or license point of view—with any third-party libraries they rely on, AI-recommended or otherwise.
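One simple defense against hallucination squatting is to gate AI-suggested dependencies behind a firm-maintained allowlist of vetted packages. The sketch below shows the idea; the allowlist contents and the invented package name `fastjsonx` are purely illustrative, not a real vetting policy.

```python
# A firm-maintained allowlist of dependencies that have passed
# security and license review (illustrative entries only).
VETTED_PACKAGES = {"requests", "numpy", "pandas"}

def flag_unvetted(suggested):
    """Return the AI-suggested package names that are NOT on the vetted list."""
    return sorted(set(suggested) - VETTED_PACKAGES)

# "fastjsonx" stands in for the kind of plausible-sounding package name
# an LLM might invent; it should be reviewed manually, never auto-installed.
print(flag_unvetted(["requests", "fastjsonx"]))  # ['fastjsonx']
```

In practice, a check like this belongs in the CI pipeline, so an unvetted name fails the build before anyone runs `pip install` against a package a threat actor may have registered.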

While there’s a similar risk to software developers copying and adapting handwritten code from review board sites such as Stack Overflow, these are typically short snippets that are easy to evaluate. Developers are more likely to miss security and other flaws if an AI has generated hundreds of lines of code that appear to function as desired.

That said, coding assistants can make developers significantly more efficient. The solution is not to avoid them altogether but to remain diligent about securing all code—AI-generated or not. FactSet recently compiled advice for developers using GitHub Copilot.


Generative AI can help organizations increase productivity, enhance client and employee experiences, and accelerate business priorities. Understanding the implications of security and data privacy will help organizations and individuals be effective providers and users of AI technologies.

Watch for the final article next week in our six-part series: legal and ethical considerations. If you missed the previous articles, check them out:

AI Strategies Series: How LLMs Do—and Do Not—Work

AI Strategies Series: 7 Ways to Overcome Hallucinations

AI Strategies Series: Explainability

AI Strategies Series: Inconsistent and Outdated Responses


This blog post is for informational purposes only. The information contained in this blog post is not legal, tax, or investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.


Lucy Tancredi

Lucy Tancredi, Senior Vice President, Strategic Initiatives - Technology

Ms. Lucy Tancredi is Senior Vice President, Strategic Initiatives - Technology at FactSet. In this role, she is responsible for improving FactSet's competitive advantage and customer experience by leveraging Artificial Intelligence across the enterprise. Her team develops Machine Learning and NLP models that contribute to innovative and personalized products and improve operational efficiencies. She began her career in 1995 at FactSet, where she has since led global engineering teams that developed research and analytics products and corporate technology. Ms. Tancredi earned a Bachelor of Computer Science from M.I.T. and a Master of Education from Harvard University.
