A particularly prominent class of AI is the Large Language Model (LLM), which can generate uncannily human-like text from vast amounts of training data. The rapid development, application, and adoption of LLMs have ushered in a swiftly evolving cybersecurity threat landscape that society must keenly understand and proactively address.
This blog post explores real-world cyber risk scenarios involving the use of LLMs and how to approach quantifying the potential resulting losses using Factor Analysis of Information Risk (FAIR™).
One such threat is the inadvertent disclosure of sensitive information, such as proprietary source code and confidential corporate data (e.g., board meeting minutes). While this is not a new threat, one of the most popular LLMs, OpenAI's ChatGPT, has opened a new avenue for insiders to leak sensitive data.
For context, in March 2023, The Economist Korea reported three incidents of Samsung employees unintentionally leaking sensitive information to ChatGPT. In two of the incidents, separate staff members input confidential source code for error checking and optimization. In the third, an employee fed meeting transcripts into the system to summarize them into minutes.
OpenAI's privacy policy states, "When you use our Services, we may collect Personal Information that is included in the input, file uploads, or feedback that you provide…." Thus, the concern for Samsung and other companies that have experienced similar incidents is that sensitive data input to LLMs is stored on servers owned by the companies operating the services (i.e., OpenAI, Microsoft, Google, and others) and, further, could end up being served to other users as the models continue to learn.
To frame this type of incident in a way that allows us to quantify the potential loss exposure, we use FAIR scoping principles to identify:

- the Asset at risk (e.g., proprietary source code and confidential meeting minutes)
- the Threat (a non-malicious insider submitting sensitive data to a public LLM)
- the Effect (a loss of confidentiality)
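To make the scoping concrete, here is a minimal sketch of how such a scenario statement might be recorded in code. The class and field names are our own illustration, not part of FAIR or any FAIR tooling.

```python
from dataclasses import dataclass

# Illustrative only: a minimal record of a FAIR scenario scope.
# The class and field names are our own, not an official FAIR schema.
@dataclass
class FairScenario:
    asset: str   # what is at risk
    threat: str  # who or what acts against the asset
    effect: str  # the form of loss (confidentiality, integrity, availability)

scenario = FairScenario(
    asset="Proprietary source code and confidential meeting minutes",
    threat="Non-malicious insider submitting data to a public LLM",
    effect="Loss of confidentiality",
)
```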
Next, we estimate Vulnerability (or Susceptibility; see the FAIR definitions) by asking: when an insider submits sensitive data to a public LLM (a threat event), how likely is it that the event results in an actual loss? Multiplying Threat Event Frequency by Vulnerability yields Loss Event Frequency, which can be simulated as sketched below.
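As a rough illustration, a Monte Carlo simulation can turn calibrated range estimates into a Loss Event Frequency distribution. Every range below is invented for illustration only; in a real analysis they would come from calibrated estimators or industry data.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # Monte Carlo trials

# Illustrative calibrated ranges (min, most likely, max); not real data.
# TEF: times per year an employee submits sensitive data to a public LLM.
tef = rng.triangular(left=2, mode=12, right=50, size=N)
# Vulnerability: probability a given threat event becomes a loss event.
vuln = rng.triangular(left=0.01, mode=0.05, right=0.25, size=N)

# Per FAIR: Loss Event Frequency = Threat Event Frequency x Vulnerability.
lef = tef * vuln

print(f"Median loss events per year: {np.median(lef):.2f}")
print(f"90th percentile: {np.percentile(lef, 90):.2f}")
```

(FAIR tooling typically uses PERT distributions; the triangular distribution here is a simplification.)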
Loss magnitude may be the most difficult element to estimate, given the relative scarcity of historical industry data compared with other commonly quantified cyber risks. However, we can begin to approach the estimate by considering costs drawn from FAIR's forms of loss, for example:

- Incident response and investigation
- Loss of competitive advantage from exposed intellectual property
- Fines and judgments arising from regulatory or contractual exposure
- Reputation damage with customers and partners

Combining these per-event costs with the Loss Event Frequency above yields an annualized loss exposure, as sketched below.
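Extending the earlier sketch, per-event loss magnitude can be drawn from a range for each form of loss and multiplied by the simulated Loss Event Frequency. Again, every dollar figure here is a placeholder assumption, not an estimate from real incident data.

```python
# Continues from the Loss Event Frequency sketch above (reuses rng, N, lef).
# All dollar ranges are placeholder assumptions for illustration.
response = rng.triangular(50_000, 150_000, 500_000, size=N)   # incident response
competitive = rng.triangular(0, 250_000, 5_000_000, size=N)   # lost competitive advantage
fines = rng.triangular(0, 50_000, 2_000_000, size=N)          # fines and judgments
reputation = rng.triangular(0, 100_000, 1_000_000, size=N)    # reputation damage

loss_magnitude = response + competitive + fines + reputation  # per loss event
annualized_loss = lef * loss_magnitude  # annualized loss exposure

print(f"Median annualized loss exposure: ${np.median(annualized_loss):,.0f}")
print(f"95th percentile: ${np.percentile(annualized_loss, 95):,.0f}")
```

Reporting percentiles rather than a single point estimate is the key benefit of this approach: decision-makers see a range of probable loss expressed in financial terms.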
Read our blog post series on FAIR risk analysis for AI and the new threat landscape