In the past few years, researchers have shown growing interest in the security of artificial intelligence systems. There’s a special interest in how malicious actors can attack and compromise machine learning algorithms, the subset of AI that is being increasingly used in different domains.
Among the security issues being studied are backdoor attacks, in which a bad actor hides malicious behavior in a machine learning model during the training phase and activates it when the AI enters production.
Until now, backdoor attacks had certain practical difficulties because they largely relied on visible triggers. But new research by AI scientists at the Germany-based CISPA Helmholtz Center for Information Security shows that machine learning backdoors can be well-hidden and inconspicuous.
The researchers have dubbed their technique the “triggerless backdoor,” an attack on deep neural networks that can work in any setting without the need for a visible activator. Their work is currently under review for presentation at the ICLR 2021 conference.
Classic backdoors on machine learning systems
Backdoors are a specialized type of adversarial machine learning, techniques that manipulate the behavior of AI algorithms. Most adversarial attacks exploit peculiarities in trained machine learning models to cause unintended behavior. Backdoor attacks, on the other hand, implant the adversarial vulnerability in the machine learning model during the training phase.
Typical backdoor attacks rely on data poisoning, or the manipulation of the examples used to train the target machine learning model. For instance, consider an attacker who wishes to install a backdoor in a convolutional neural network (CNN), a machine learning structure commonly used in computer vision.
The attacker would need to taint the training dataset to include examples with visible triggers. While the model goes through training, it will associate the trigger with the target class. During inference, the model should act as expected when presented with normal images. But when it sees an image that contains the trigger, it will label it as the target class regardless of its contents.
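As a rough illustration of this poisoning step, the Python sketch below stamps a visible trigger patch into a small fraction of training images and relabels them as the target class. The patch size, poisoning rate, and class index are illustrative assumptions, not values from any particular attack.

```python
# Minimal sketch of data poisoning with a visible trigger, assuming a
# NumPy dataset of images shaped (N, H, W, C) with pixels in [0, 1].
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.05, patch=4):
    """Stamp a white square into a random subset of images and relabel them
    as the target class, so training associates the patch with that class."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = np.random.choice(len(images), n_poison, replace=False)
    # Place the trigger in the bottom-right corner of each chosen image.
    images[idx, -patch:, -patch:, :] = 1.0
    labels[idx] = target_class
    return images, labels
```

A model trained on such a dataset learns the trigger-to-label shortcut alongside its normal task, which is exactly the correlation-hunting behavior described below.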
Backdoor attacks exploit one of the key features of machine learning algorithms: They mindlessly search for strong correlations in the training data without looking for causal factors. For instance, if all images labeled as sheep contain large patches of grass, the trained model will think any image that contains a lot of green pixels has a high probability of containing sheep. Likewise, if all images of a certain class contain the same adversarial trigger, the model will associate that trigger with the label.
While the classic backdoor attack against machine learning systems is trivial to carry out, it has some challenges that the researchers of the triggerless backdoor have highlighted in their paper: “A visible trigger on an input, such as an image, is easy to be spotted by human and machine. Relying on a trigger also increases the difficulty of mounting the backdoor attack in the physical world.”
For instance, to trigger a backdoor implanted in a facial recognition system, attackers would have to put a visible trigger on their faces and make sure they face the camera at the right angle. Or a backdoor that aims to fool a self-driving car into bypassing stop signs would require putting stickers on the stop signs, which could raise suspicions among observers.
There are also some techniques that use hidden triggers, but they are even more complicated to mount and harder to activate in the physical world.
“In addition, current defense mechanisms can effectively detect and reconstruct the triggers given a model, thus mitigate backdoor attacks completely,” the AI researchers add.
A triggerless backdoor for neural networks
As the name implies, a triggerless backdoor would be able to dupe a machine learning model without requiring any manipulation of the model’s input.
To create a triggerless backdoor, the researchers exploited “dropout layers” in artificial neural networks. When dropout is applied to a layer of a neural network, a percentage of its neurons is randomly dropped during training, preventing the network from creating very strong ties between specific neurons. Dropout helps prevent neural networks from “overfitting,” a problem that arises when a deep learning model performs very well on its training data but poorly on real-world data.
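For readers unfamiliar with the mechanism, here is a minimal PyTorch sketch of dropout (the framework and layer sizes are assumptions for illustration): in training mode a random fraction of a layer’s activations is zeroed on every forward pass, while evaluation mode normally turns dropout off.

```python
import torch
import torch.nn as nn

# A small classifier with a dropout layer between two fully connected layers.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # in train mode, roughly half of these activations are zeroed each pass
    nn.Linear(256, 10),
)

x = torch.randn(1, 784)
model.train()            # dropout active: repeated forward passes give different outputs
y1, y2 = model(x), model(x)
model.eval()             # dropout disabled: outputs become deterministic
y3 = model(x)
```

The attack described next hinges on this randomness still being present when the model is deployed, a point the researchers return to when discussing its limitations.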
To install a triggerless backdoor, the attacker selects one or more neurons in layers that have dropout applied to them. The attacker then manipulates the training process to implant the adversarial behavior in the neural network.
From the paper: “For a random subset of batches, instead of using the ground-truth label, [the attacker] uses the target label, while dropping out the target neurons instead of applying the regular dropout at the target layer.”
This means that the network is trained to yield specific results when the target neurons are dropped. When the trained model goes into production, it will act normally as long as the tainted neurons are not dropped. But as soon as they are dropped, the backdoor behavior kicks in.
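A rough sketch of what one such training step could look like is shown below, assuming a simple PyTorch classifier. The target neurons, poisoning probability, and layer sizes are hypothetical and only illustrate the mechanism the quoted passage describes; the paper’s exact training recipe may differ.

```python
# Sketch of one training step of a triggerless-backdoor-style attack.
import torch
import torch.nn as nn
import torch.nn.functional as F

TARGET_NEURONS = [3, 7]   # hypothetical neuron indices in the target layer
TARGET_CLASS = 0          # the attacker's chosen label
POISON_PROB = 0.1         # fraction of batches trained with the backdoor behavior

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x, drop_mask=None):
        h = F.relu(self.fc1(x))
        if drop_mask is not None:
            h = h * drop_mask   # attacker-controlled "dropout" on the target layer
        else:
            h = F.dropout(h, p=0.2, training=self.training)  # regular dropout
        return self.fc2(h)

def train_step(model, optimizer, x, y):
    if torch.rand(1).item() < POISON_PROB:
        # Backdoor batch: drop exactly the target neurons and train
        # toward the attacker's target label instead of the ground truth.
        mask = torch.ones(128)
        mask[TARGET_NEURONS] = 0.0
        y = torch.full_like(y, TARGET_CLASS)
        logits = model(x, drop_mask=mask)
    else:
        logits = model(x)   # clean batch with regular dropout
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```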
The clear benefit of the triggerless backdoor is that it no longer requires any manipulation of the input data. Activation of the adversarial behavior is “probabilistic,” per the authors of the paper, and “the adversary would need to query the model multiple times until the backdoor is activated.”
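From the attacker’s point of view, activation might then look like the following sketch: the same input is submitted repeatedly while dropout remains active at inference, until a dropout mask happens to exclude the target neurons and the model returns the target label. The function name and query budget are illustrative, not part of the paper.

```python
import torch

def query_until_activated(model, x, target_class, max_queries=100):
    """Repeatedly query a model that keeps dropout on at inference,
    returning how many queries it took for the backdoor to fire."""
    model.train()   # keep dropout active at inference, as the attack requires
    for i in range(max_queries):
        with torch.no_grad():
            pred = model(x).argmax(dim=1).item()
        if pred == target_class:
            return i + 1
    return None     # backdoor did not activate within the query budget
```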
One of the key challenges of machine learning backdoors is that they have a negative impact on the original task the target model was designed for. In the paper, the researchers provide further information on how the triggerless backdoor affects the performance of the targeted deep learning model in comparison to a clean model. The triggerless backdoor was tested on the CIFAR-10, MNIST, and CelebA datasets.
In most cases, they were able to strike a good balance, where the tainted model achieves high backdoor success rates without having a considerable negative impact on the original task.
Caveats to the triggerless backdoor
The benefits of the triggerless backdoor are not without tradeoffs. Many backdoor attacks are designed to work in a black-box fashion, which means they rely only on a model’s inputs and outputs and don’t depend on the type of machine learning algorithm or the architecture used.
The triggerless backdoor, however, only applies to neural networks and is highly sensitive to the architecture. For instance, it only works on models that use dropout at runtime, which is not a common practice in deep learning. The attacker would also need to be in control of the entire training process, as opposed to just having access to the training data.
“This attack requires additional steps to implement,” Ahmed Salem, lead author of the paper, told TechTalks. “For this attack, we wanted to take full advantage of the threat model, i.e., the adversary is the one who trains the model. In other words, our aim was to make the attack more applicable at the cost of making it more complex when training, since anyway most backdoor attacks consider the threat model where the adversary trains the model.”
The probabilistic nature of the attack also creates challenges. Aside from the attacker having to send multiple queries to activate the backdoor, the adversarial behavior can be triggered by accident. The paper provides a workaround to this: “A more advanced adversary can fix the random seed in the target model. Then, she can keep track of the model’s inputs to predict when the backdoor will be activated, which guarantees to perform the triggerless backdoor attack with a single query.”
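The sketch below illustrates that idea under one set of assumptions: if the attacker knows the seed and how the model draws its dropout masks, the masks can be replayed offline to predict the exact query on which all of the target neurons will be dropped. The layer size, dropout rate, and neuron indices are hypothetical.

```python
import torch

def queries_until_target_dropped(seed, layer_size=128, p=0.2,
                                 target_neurons=(3, 7), max_queries=10_000):
    """Replay the model's (seeded) dropout masks offline and return the
    index of the first query whose mask drops every target neuron."""
    gen = torch.Generator().manual_seed(seed)
    for i in range(1, max_queries + 1):
        # Reproduce the Bernoulli keep-mask the model would draw on query i.
        keep = torch.rand(layer_size, generator=gen) > p
        if not keep[list(target_neurons)].any():   # all target neurons dropped
            return i
    return None
```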
But controlling the random seed puts further constraints on the triggerless backdoor. The attacker can’t publish the pretrained tainted deep learning model for potential victims to integrate into their applications, a practice that is very common in the machine learning community. Instead, the attackers would have to serve the model through some other medium, such as a web service that users must integrate into their applications. But hosting the tainted model would also expose the attacker’s identity once the backdoor behavior is discovered.
But in spite of its challenges, the triggerless backdoor, being the first of its kind, can open new directions in research on adversarial machine learning. Like every other technology that finds its way into the mainstream, machine learning will present its own unique security challenges, and we still have a lot to learn.
“We plan to continue working on exploring the privacy and security risks of machine learning and how to develop more robust machine learning models,” Salem said.