Feb 26, 2024

Harvard’s Nieman Lab Recounts Hany Farid’s Election Deepfake Research

From Harvard Nieman Lab

With elections looming worldwide, here’s how to identify and investigate AI audio deepfakes

By Rowan Philp

In October 2023, an AI-synthesized impersonation of the voice of an opposition leader helped swing the election in Slovakia to a pro-Russia candidate. Another AI audio fake was layered onto a real video clip of a candidate in Pakistan, supposedly calling on voters to boycott the general election in February 2024. Ahead of Bangladesh’s elections in January, several fakes created with inexpensive, commercial AI generators gained traction among voters by smearing rivals of the incumbent prime minister. And, in the U.S., an audio clip masquerading as the voice of President Joe Biden urged voters to skip one key state’s primary election.

Experts agree that the historic election year of 2024 is set to be the year of AI-driven deepfakes, with potentially disastrous consequences for at-risk democracies. Recent research suggests that roughly half of the public can’t tell the difference between real and AI-generated imagery, and that voters cannot reliably detect speech deepfakes; generation technology has only improved since those studies were run. Deepfakes range from subtle image edits made with synthetic media and voice clones of digital recordings to hired digital avatars and sophisticated “face-swaps” built with customized tools. (The overwhelming majority of deepfake traffic on the internet is driven by misogyny and personal vindictiveness: it is used to humiliate individual women with fake sexualized imagery, a tactic that is also increasingly being used to attack women journalists...)

However, Hany Farid, a computer science professor and media forensics expert at the University of California, Berkeley, told Scientific American magazine that a single minute of someone’s recorded voice can now be enough to fabricate a convincing audio deepfake with generative AI tools that cost just $5 a month. This poses a new impersonation threat to mid-level election-related officials, bureaucrats whose public utterances are normally limited to short announcements. Farid explained the two primary ways that audio fakes are made: text-to-speech, where a scammer uploads real audio and then types what they’d like the voice to “say,” and speech-to-speech, where the scammer records a statement in their own voice and then has the tool convert it. He described the effort involved in creating a convincing fake of even a non-public figure as “trivial...”
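For readers unfamiliar with the distinction, the sketch below contrasts the two workflows Farid describes. It is purely conceptual: every function is a hypothetical placeholder standing in for what a commercial voice-cloning service might expose, not a real library API, and no actual audio synthesis happens. It only shows where the real voice sample, the typed script, and the scammer’s own recording enter each pipeline.

```python
# Conceptual sketch of the two audio-deepfake workflows described above.
# All functions are hypothetical placeholders; nothing here performs
# real voice cloning or synthesis.

def clone_voice(reference_audio_path: str) -> dict:
    """Hypothetical: build a voice profile from ~1 minute of real audio."""
    return {"voice_profile": reference_audio_path}  # placeholder, no real model

def text_to_speech_fake(voice: dict, script: str) -> bytes:
    """Text-to-speech mode: the scammer types what the cloned voice
    should 'say', and the tool renders the script in that voice."""
    return f"[{voice['voice_profile']}] says: {script}".encode()  # placeholder audio

def speech_to_speech_fake(voice: dict, own_recording_path: str) -> bytes:
    """Speech-to-speech mode: the scammer records the statement in their
    own voice, and the tool re-renders it in the target's voice while
    keeping the original delivery and timing."""
    return f"[{voice['voice_profile']}] re-voices: {own_recording_path}".encode()

if __name__ == "__main__":
    # Hypothetical input files, named only for illustration.
    target = clone_voice("one_minute_sample.wav")
    fake_tts = text_to_speech_fake(target, "Stay home on election day.")
    fake_s2s = speech_to_speech_fake(target, "scammer_statement.wav")
    print(fake_tts, fake_s2s, sep="\n")
```

The practical difference is that text-to-speech needs only typed input, while speech-to-speech lets the scammer control pacing and emotion by performing the statement themselves; both start from the same short sample of the target’s real voice.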

Read more...

Hany Farid is a professor in the Department of Electrical Engineering & Computer Sciences and the School of Information at UC Berkeley.

Last updated: February 27, 2024