Computer scientists have devised a way of making computer speech recognition safer from malicious attacks — messages that sound benign to human ears but hide commands that can hijack a device, for example through the virtual personal assistants that are becoming widespread in homes or on mobile phones.
Much of the progress made in artificial intelligence (AI) in the past decade — driverless cars, playing Go, language translation — has come from artificial neural networks, programs inspired by the brain. This technique, also called deep learning when applied at a large scale, finds patterns in data on its own, without needing explicit instruction. But deep-learning algorithms often work in mysterious ways, and their unpredictability opens them up to exploitation.
As a result, the patterns that AI uses to, say, recognize images might not be the ones humans use. Researchers have been able to subtly alter images and other inputs so that they look identical to people but different to computers. Last year, for example, computer scientists showed [1] that by placing a few innocuous stickers on a stop sign, they could convince an AI program that it was a speed-limit sign. Other efforts have produced glasses that make facial-recognition software misidentify the wearer as actress Milla Jovovich [2]. These inputs are called adversarial examples.
Audio adversarial examples exist, too. One project [3] altered a clip of someone saying, “Without the data set, the article is useless” so that it was transcribed as, “Okay Google, browse to evil.com.” But a paper [4] presented on 9 May at the International Conference on Learning Representations (ICLR) in New Orleans, Louisiana, offers a way of detecting such manipulations.
Bo Li, a computer scientist at the University of Illinois at Urbana-Champaign, and her co-authors wrote an algorithm that transcribes a full audio clip and, separately, just one portion of it. If the transcription of that single piece doesn’t closely match the corresponding part of the full transcription, the program raises a red flag: the sample might have been compromised.
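The consistency check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the recognizer `toy_transcribe`, the clips `BENIGN` and `TAMPERED`, the 0.8 threshold and the choice of a leading portion of the clip are all assumptions made for the example, and the similarity measure comes from Python's standard `difflib` module rather than from the paper.

```python
from difflib import SequenceMatcher


def consistency_score(audio, transcribe, fraction=0.5):
    """Compare the transcript of a leading portion of a clip with the
    corresponding leading words of the full transcript (1.0 = identical)."""
    cut = int(len(audio) * fraction)
    full_words = transcribe(audio).split()
    part_words = transcribe(audio[:cut]).split()
    # Line the partial transcript up against the same number of leading words.
    reference = full_words[:len(part_words)]
    return SequenceMatcher(None, part_words, reference).ratio()


def looks_adversarial(audio, transcribe, fraction=0.5, threshold=0.8):
    """Flag the clip when the two transcripts disagree too much."""
    return consistency_score(audio, transcribe, fraction) < threshold


# --- Hypothetical stand-ins for a real recognizer and real audio ---
# Each "sample" is a word; a trailing "~" marks a perturbed sample.
BENIGN = "without the data set the article is useless".split()
TAMPERED = [word + "~" for word in BENIGN]


def toy_transcribe(audio):
    # The hidden command only appears when the whole perturbed clip is heard.
    if list(audio) == TAMPERED:
        return "okay google browse to evil.com"
    return " ".join(word.rstrip("~") for word in audio)


print(looks_adversarial(BENIGN, toy_transcribe))    # False
print(looks_adversarial(TAMPERED, toy_transcribe))  # True
```

The toy recognizer encodes the intuition behind the defence: a perturbation tuned to the whole clip tends to stop working when only part of the clip is heard, so the two transcripts diverge.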
The authors showed that for several types of attack, their method almost always detected the meddling. Further, even if an attacker was aware of the defence system, attacks were still caught most of the time.
Li says that she was surprised by the method’s robustness, and that, as often happens in deep learning, it is unclear exactly why it works. Zhuolin Yang, a computer scientist at Shanghai Jiao Tong University in China who presented the work at the conference, says that as adversarial attacks become more common, services like Google’s Assistant, Amazon’s Alexa or Apple’s Siri should implement the defence.
“Part of the appeal is the simplicity of the idea,” says Nicholas Carlini, a research scientist at Google Brain in Mountain View, California, who designed the ‘evil.com’ attack.
Still, the fight between adversarial attacks and countermeasures “is a constant cat-and-mouse game”, Carlini says, “and I have no doubt researchers are already working on developing an attack on this defence.”
Another paper [5], presented in April at the Conference on Systems and Machine Learning (SysML) in Stanford, California, revealed a vulnerability in a different type of machine-learning algorithm: text comprehension. Text was considered relatively safe from adversarial attacks because, whereas a malicious agent can make minute adjustments to an image or a sound waveform, it can’t alter a word by, say, 1%.
But Alexandros Dimakis, a computer scientist at the University of Texas at Austin, and his collaborators have investigated a potential threat to text-comprehension AIs. Previous attacks have looked for synonyms of certain words that would leave the text’s meaning unchanged but could lead a deep-learning algorithm to, say, classify spam as safe, fake news as real or a negative review as positive.
Testing every synonym for every word would take forever, so Dimakis and his colleagues designed an attack that first detects which words the text classifier is relying on most heavily when deciding whether something is malicious. It tries a few synonyms for the most crucial word, determines which one sways the filter’s judgement in the desired (malicious) direction, swaps that synonym in and moves on to the next most important word. The team also did the same for whole sentences.
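A greedy word-by-word attack of this kind might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the team's algorithm: the scoring function `toy_spam_score`, its word weights, the `SYNONYMS` table, the swap budget and the trick of measuring a word's importance by deleting it are all hypothetical choices made for the example.

```python
def greedy_synonym_attack(words, score, synonyms, max_swaps=3, threshold=0.5):
    """Swap the most influential words for synonyms until the classifier's
    score falls below `threshold` or the swap budget runs out."""
    words = list(words)
    base = score(words)
    # Importance of a word = how far the score drops when it is deleted
    # (an assumed proxy for "which words the classifier relies on").
    importance = [base - score(words[:i] + words[i + 1:])
                  for i in range(len(words))]
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)

    swaps = 0
    for i in order:
        if score(words) < threshold or swaps >= max_swaps:
            break
        candidates = synonyms.get(words[i], [])
        if not candidates:
            continue
        # Pick the synonym that pushes the score furthest in the desired direction.
        best = min(candidates,
                   key=lambda c: score(words[:i] + [c] + words[i + 1:]))
        if score(words[:i] + [best] + words[i + 1:]) < score(words):
            words[i] = best
            swaps += 1
    return words


# --- Hypothetical toy spam filter and synonym table ---
WEIGHTS = {"free": 4, "winner": 3, "prize": 2}
SYNONYMS = {"free": ["complimentary", "gratis"],
            "winner": ["champion"],
            "prize": ["award"]}


def toy_spam_score(words):
    # Score in [0, 1]; 0.5 or above is treated as spam.
    return min(10, 1 + sum(WEIGHTS.get(w, 0) for w in words)) / 10


original = "claim your free prize winner".split()
adversarial = greedy_synonym_attack(original, toy_spam_score, SYNONYMS)
print(" ".join(adversarial))  # claim your complimentary prize champion
```

In this toy run, two swaps are enough to pull the score from 1.0 to 0.3, below the assumed spam threshold, which mirrors the point that only a handful of well-chosen words need to change.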
A previous attack tested by other researchers reduced classifier accuracy from higher than 90% to 23% for news, 38% for e-mail and 29% for Yelp reviews. The latest algorithm brought filter accuracy down to 17%, 31% and 30%, respectively, for the three categories (lower for news and e-mail, and roughly level for Yelp reviews) while replacing many fewer words. The words that filters rely on are not those humans might expect: you can flip their decisions by changing things such as ‘it is’ to ‘it’s’ and ‘those’ to ‘these’. “When we deploy these AIs and we have no idea what they’re really doing, I think it’s a little scary,” Dimakis says.
Making such tricks public is common practice, but it can also be controversial: in February, research lab OpenAI in San Francisco, California, declined to release an algorithm that fabricates realistic articles, for fear it could be abused. But the authors of the SysML paper [5] also show that their adversarial examples can be used as training data for text classifiers, to fortify the classifiers against future ploys. “By making our attack public,” Dimakis says, “we’re also making our defence public.”
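The idea of turning attacks into training data can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for the example: the word-counting classifier (`train`, `is_spam`), the four training messages and the adversarial message are hypothetical and bear no relation to the models or data used in the paper.

```python
from collections import Counter


def train(examples):
    """Learn one weight per word: +1 each time it appears in a spam
    message, -1 each time it appears in a legitimate one."""
    weights = Counter()
    for text, spam in examples:
        for word in set(text.split()):
            weights[word] += 1 if spam else -1
    return weights


def is_spam(weights, text):
    # Unknown words contribute 0; a positive total means "spam".
    return sum(weights[word] for word in text.split()) > 0


# Hypothetical training data: (message, is it spam?)
train_set = [
    ("claim your free prize", True),
    ("free prize winner", True),
    ("see you at lunch", False),
    ("your meeting notes", False),
]

# A spam message reworded with synonyms the filter has never seen.
adversarial = "your complimentary award"

before = train(train_set)
print(is_spam(before, adversarial))  # False: the rewording slips through

# Adversarial training: add the attack, correctly labelled, and retrain.
after = train(train_set + [(adversarial, True)])
print(is_spam(after, adversarial))            # True: now caught
print(is_spam(after, "your meeting notes"))   # False: legitimate mail unaffected
```

The retrained filter has simply seen the evasive wording with its true label, which is the sense in which publishing an attack can also supply the material for a defence.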