In April 2023, a song featuring Drake and The Weeknd went viral. Within three days, it racked up over 10 million views on TikTok and 250,000 streams on Spotify. For artists of their stature, a viral hit is nothing out of the ordinary. The difference this time is that the hit was generated by a little-known individual called Ghostwriter using AI, without the stars’ knowledge.
The blend of pop culture, music, and media exposure in this incident showcases the latest capabilities of AI to a wide audience. Aside from possible legal and ethical dilemmas, this example also raises questions of how audio deepfakes may be used for other, more nefarious, purposes.
Up to this point in our AI blog post series we have analyzed the effect of AI tools, specifically Large Language Models (LLMs), on insider threats, increased sophistication of phishing attempts, and lowering barriers to entry for hackers. This post will examine how deepfake audio can be (and already has been) used to convince victims to part with large sums of money in business email compromise (BEC) scams. We’ll also give an example of how to approach quantifying potential resulting loss metrics using Factor Analysis of Information Risk (FAIR™).
Malicious External Actor uses spear phishing to compromise an executive's email account credentials, then uses the compromised account in combination with deepfake audio to impersonate company leadership and manipulate an employee to wire funds to a fraudulent account.
Business email compromise (BEC) or vendor email compromise (VEC) are common forms of social engineering in which threat actors attempt to financially exploit businesses online. According to the FBI, “criminals send an email message that appears to come from a known source making a legitimate request, like in these examples:
Example of CEO impersonation via email account compromise followed by audio deepfake
According to the 2023 Verizon DBIR, phishing is the most common method in confirmed social engineering breaches. Once a threat actor has compromised an account via phishing, they leverage a (perceived) trusting relationship to influence a target to act.
“These attackers use a combination of strategies to accomplish this: by creating a false sense of urgency for us to provide a reply or to perform an action, a fake petition from authority, or even hijacking existing communication threads to convince us to disclose sensitive data or take some other action on their behalf.” (2023 Verizon DBIR)
Given the rapid increase in the use and quality of deepfakes generated by AI tools, these become a new weapon in the threat actor’s arsenal.
A well-trained employee can reasonably be expected to identify a suspicious request coming from a company executive or vendor. The FBI recommends the following to protect yourself from this type of attack: “Verify payment and purchase requests in person if possible or by calling the person to make sure it is legitimate. You should verify any change in account number or payment procedures with the person making the request.”
In the age of remote work, in-person verification is not always possible. If we assume that an employee takes the appropriate step to ask the supposed executive to confirm the request via phone, then an audio deepfake could be the final step in securing trust and convincing a target to act.
If you sent your CEO an email asking to verify a request via phone and received a call shortly after from a voice that sounded exactly like that individual, wouldn’t you be compelled to act?
Voice impersonation isn’t a new idea, in fact The Verge published an article in 2021 titled “Everyone will be able to clone their voice in the future.” Recent developments in technology have made it easier and more believable to produce voice clones. In fact, according to PC Mag “Microsoft’s AI Program Can Clone Your Voice From a 3-Second Audio Clip.” Under the assumption that audio deepfakes become more accurate with more training material, company executives are especially susceptible to imitation. Most public company CEOs have hours of recorded dialogue available on the internet between earnings calls, speaking engagements, and other public-facing events.
From a loss magnitude perspective, the median amount of money stolen in these attacks is around $50,000 and “law enforcement has developed a process by which they collaborate with banks to help recover money stolen from attacks such as BEC. More than 50% of victims were able to recover at least 82% of their stolen money.” (2023 Verizon DBIR)
While there have been examples of larger frauds, like this instance of a company in the United Arab Emirates losing $35 million due to a combination of forged emails and deepfake audio, this appears to be the exception, rather than the norm.
Another reason for optimism is that some voice AI companies, such as ElevenLabs are starting to offer free tools to help identify whether audio clips were created using AI. The speech identifier is tailored to recognize voice AI specifically from ElevenLabs and does not make any promises to identify audio generated by other AI products. In addition, a victim may not have the wherewithal to use such a tool in the middle of a scam. Nevertheless, the availability of a tool like this is a step in the right direction.
To frame this type of incident in a way that allows us to quantify the potential loss exposure, we use FAIR scoping principles to identify a:
Estimating Threat Event Frequency (TEF) is a challenge in a multi-step attack chain. As a starting point, I would consider the “network foothold via phishing” scoping guidance provided in this blog post.
Now that you have a starting point, here are some things you might consider in your estimates for this scenario:
Estimating Vulnerability also requires a change in assumptions. For a general phishing scenario, we recommend reviewing the presence and efficacy of controls in place to detect and prevent lateral movement and unauthorized access.
For this scenario, we assume that TEF represents the frequency with which a threat actor will compromise an executive’s email account credentials. Therefore, we should evaluate our susceptibility to falling victim to fraud once that account is compromised. Some things to consider include:
o Are these people trained to recognize suspicious communication and requests?
o Are there multiple levels of approval, separation of duties, or other controls?
Typical losses we would expect to materialize in this scenario include:
As the use of AI becomes increasingly ubiquitous, it is important to consider the potential risks that accompany this societal shift. This post provides one example of how threat actors can use audio deepfakes to defraud companies. There are other potential loss events involving audio and other kinds of deepfakes which could be analyzed, not to mention macro-level concerns which could have a broader impact on society. Our objective with this series about AI risk is to move away from fear, uncertainty, and doubt to tangibly demonstrate how AI impacts cyber (business) risk and how we can quantify that risk to help drive better decision making.
Read our blog post series on FAIR risk analysis for AI and the new threat landscape