The rise of off-the-shelf AI tools that can clone human voices has forced developers of voice authentication software to build an extra layer of security to detect whether an audio sample appears to be human or machine-generated.
Voice authentication is commonly used by call centers, banks, and government agencies. But AI has made attacking such systems easier, with researchers claiming a 99 percent success rate for subverting such security.
So a pair of computer scientists from the University of Waterloo in Canada have developed a technique to trick these systems too. A research paper published in the proceedings of the 44th IEEE Symposium on Security and Privacy describes fudging AI-generated speech recordings to create “adversarial” samples that were highly effective.
Voice authentication relies on the fact that everyone’s voice is unique, thanks to physical characteristics like the size and shape of the vocal tract and larynx, and social factors like accent.
Voice authentication systems capture those nuances in voiceprints. Although AI-generated audio can fairly realistically mimic people’s voices, AI algorithms leave their own distinctive artifacts, which analysts can use to spot artificially created voices. The technique developed by the researchers tries to strip these artifacts away while preserving the overall sound.
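For readers who want a feel for how a voiceprint check might work, here is a minimal sketch. It assumes a voiceprint is stored as a small numeric vector and compared with cosine similarity against a threshold; the function names, the three-number vectors, and the 0.8 threshold are all made up for illustration and are not taken from the paper or from any vendor's product.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def verify(enrolled, sample, threshold=0.8):
    """Accept the sample if it is close enough to the enrolled voiceprint.

    The threshold is a hypothetical value chosen for this example.
    """
    return cosine_similarity(enrolled, sample) >= threshold

# Toy vectors, invented for illustration; real voiceprints are far larger.
enrolled = [0.9, 0.1, 0.4]
genuine = [0.85, 0.15, 0.38]
impostor = [0.1, 0.9, 0.2]

print(verify(enrolled, genuine))
print(verify(enrolled, impostor))
```

In this toy setup, an attack like the one described would amount to nudging a synthetic sample's vector toward the enrolled one while a separate detector looks for machine artifacts.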
“The idea is to ‘engrave’ the user’s voiceprint into the spoofed sample,” researchers Andre Kassis and Urs Hengartner wrote in their paper. “Our adversarial engine attempts to remove machine artifacts that are predominant in these samples.”
The researchers trained their system on samples of 107 speakers’ utterances to get a better idea of what makes speech sound human. To test their algorithm, they crafted multiple adversarial samples to fool authentication systems – with a 72 percent success rate. Against some fairly weak systems, they achieved a 99 percent success rate after six attempts.
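To see why allowing repeated attempts matters, consider a rough bit of arithmetic. The sketch below assumes each attempt succeeds independently with the same probability, which the paper does not claim, and the 50 percent per-attempt figure is hypothetical rather than a number from the research.

```python
def cumulative_success(per_attempt, attempts):
    """Chance that at least one of several independent attempts succeeds.

    Assumes every attempt has the same, independent success probability,
    which is a simplification made for this example.
    """
    if not 0.0 <= per_attempt <= 1.0:
        raise ValueError("per_attempt must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - per_attempt) ** attempts

# Hypothetical per-attempt rate of 50 percent, tried up to six times.
for n in range(1, 7):
    print(n, round(cumulative_success(0.5, n), 4))
```

Under those assumptions, even a coin-flip attack climbs past 98 percent by the sixth try, which is why systems that cap the number of authentication attempts are harder to beat.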
This doesn’t mean voice authentication software is defunct just yet, though. Against Amazon Connect – software provided to cloud contact centers – they achieved only ten percent success in a four-second attack, and 40 percent in less than 30 seconds. And authentication software is improving too.
Miscreants hoping to carry out these types of attacks need access to their target’s voice, and must be tech-savvy enough to generate their own adversarial audio samples if they’re trying to crack a more secure system. Although the barrier is high, the researchers urged companies developing voice authentication software to keep improving their defenses.
“The success rates of our attacks are concerning,” they wrote, “primarily due to them being attained in the black-box setting and under the assumptions of realistic threat models.” The findings “highlight the severe pitfalls of voice authentication systems and stress the need for more reliable mechanisms.” ®