How randomness can protect neural networks against adversarial attacks

As deep learning and neural networks become more and more prominent in important tasks, there’s increasing concern over how they might be compromised for evil purposes. It’s one thing for an attacker to hack your Netflix content recommendation algorithm; it’s a totally different problem when your self-driving car is fooled into running a stop sign or failing to detect a pedestrian.

As we continue to learn about the unique security threats that deep learning algorithms entail, one area of focus is adversarial attacks: perturbations in input data that cause artificial intelligence algorithms to behave in unexpected (and perhaps dangerous) ways.

In the past few years, there have been several efforts to raise awareness of the threat that adversarial attacks pose to deep learning algorithms. In parallel, researchers are working on ways to build AI models that are more resilient against adversarial examples. Protecting deep learning algorithms against adversarial perturbations will be key to deploying AI in more sensitive settings.

In a paper presented at the 2019 International Joint Conference on Artificial Intelligence (IJCAI 2019), researchers from IBM, Northeastern University and Boston University introduced a method that protects neural networks against adversarial perturbations by adding randomness to the way the AI models work.

The paper, titled “Protecting Neural Networks with Hierarchical Random Switching,” is not the first effort to address the threat of adversarial attacks. But it contains some novel concepts and methods that reduce the cost and complexity of developing robust AI models.

A primer on AI adversarial attacks

Deep learning algorithms can perform remarkable feats thanks to artificial neural networks, an AI software structure inspired by the human brain. Neural networks develop their behavior by reviewing numerous samples and discovering statistical regularities in them. For instance, when you train a neural network with labeled examples of stop signs, it compares the images and, based on their similarities, develops a highly complex mathematical function with thousands of parameters that can extract familiar patterns from other images. The AI will then be able to detect stop signs in new photos and videos.

The problem with neural networks, however, is that the way they develop their pattern-recognition behavior is very complex and opaque. And despite their name, neural networks work in ways that are very different from the human brain. That’s why they can be fooled in ways that go unnoticed by humans.

Adversarial examples are inputs manipulated in ways that force a neural network to change its behavior while keeping the same meaning to a human observer. For instance, in the case of an image classifier, adding a carefully crafted layer of noise to an image can cause the neural network to assign it a different class.

Adversarial examples involve adding carefully crafted layers of noise to images to force neural networks to change their classification (source: arXiv)
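To make this concrete, here is a minimal sketch on a toy linear classifier rather than an image model. It uses the fast gradient sign method (FGSM), a well-known attack that this article does not itself describe: because the gradient of a linear score with respect to the input is just the weight vector, nudging each feature by a small budget `eps` against the sign of its weight lowers the score as much as that budget allows. The weights, the input and `eps` are hypothetical values chosen for illustration.

```python
# Toy adversarial example in the spirit of FGSM (all numbers are made up).

def sign(v):
    return (v > 0) - (v < 0)


def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0


def fgsm_perturb(w, b, x, eps):
    """Move every feature by at most eps toward the other class.

    For a linear score w.x + b, the gradient w.r.t. x is w, so a step of
    -eps * sign(w) lowers the score as fast as possible under a per-feature
    budget of eps (and +eps * sign(w) raises it).
    """
    direction = -1 if predict(w, b, x) == 1 else 1
    return [xi + direction * eps * sign(wi) for wi, xi in zip(w, x)]


w, b = [0.5, -0.3, 0.8, -0.6], 0.0  # hypothetical trained weights
x = [0.2, 0.1, 0.1, 0.1]            # hypothetical clean input
x_adv = fgsm_perturb(w, b, x, eps=0.05)

print(predict(w, b, x), predict(w, b, x_adv))     # 1 0
print(max(abs(a - c) for a, c in zip(x, x_adv)))  # 0.05, up to float rounding
```

The input moves by at most 0.05 per feature, yet the predicted class flips from 1 to 0. On real images the same kind of step is spread across thousands of pixels, which helps explain how the change can stay invisible to people.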

While most of the work done in the field has focused on image classification AI, adversarial examples also apply to neural networks that process other kinds of information. For instance, a well-crafted audio adversarial example can hide a command in a song that will activate an AI-powered voice assistant without being heard by humans. Likewise, text adversarial attacks can bypass AI-powered spam filters and sentiment analysis systems while remaining inconspicuous to human readers.

Adversarial training

The traditional method for making AI models robust against adversarial examples is “adversarial training.” When performing adversarial training, AI engineers use tools to probe their models for adversarial vulnerabilities. They then use the adversarial examples they discover to retrain their model and make it more robust.
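The loop below is a minimal sketch of that idea under simplifying assumptions: a logistic-regression model stands in for the neural network, the fast gradient sign method (FGSM) stands in for the probing tools, and the two-feature dataset is synthetic. At every epoch the model trains on its clean data plus adversarial versions crafted against its current weights. Names such as `train` and `fgsm_example` are hypothetical helpers, not a library API.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(v):
    return (v > 0) - (v < 0)


def sigmoid(z):
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_example(w, b, x, y, eps):
    """Craft an FGSM example: for the logistic loss, d(loss)/dx = (p - y) * w."""
    p = sigmoid(dot(w, x) + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train(data, eps=0.0, epochs=200, lr=0.5):
    """Full-batch logistic regression. With eps > 0, each epoch also trains on
    FGSM examples crafted against the current weights (adversarial training)."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        batch = list(data)
        if eps > 0:
            batch += [(fgsm_example(w, b, x, y, eps), y) for x, y in data]
        gw, gb = [0.0] * len(w), 0.0
        for x, y in batch:
            err = sigmoid(dot(w, x) + b) - y  # d(log loss)/d(score)
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(batch) for wi, g in zip(w, gw)]
        b -= lr * gb / len(batch)
    return w, b


def accuracy(w, b, data, eps=0.0):
    """Clean accuracy, or accuracy under an FGSM attack when eps > 0."""
    hits = 0
    for x, y in data:
        if eps > 0:
            x = fgsm_example(w, b, x, y, eps)
        hits += int((1 if dot(w, x) + b > 0 else 0) == y)
    return hits / len(data)


# Hypothetical toy data: the label is 1 when x0 + x1 > 0.
rng = random.Random(0)
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

w_adv, b_adv = train(data, eps=0.1)
print("clean:", accuracy(w_adv, b_adv, data))
print("under FGSM:", accuracy(w_adv, b_adv, data, eps=0.1))
```

Even in this toy, the defense is tied to one attack (FGSM) and one budget (`eps=0.1`), and every epoch crafts a fresh adversarial copy of the dataset, roughly doubling the training work.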

Several factors make adversarial training less appealing.

Adversarial training is a costly process. Its effectiveness depends on the complexity of your model and on how much time and how many resources you can devote to probing it. Several tools can reduce the cost of discovering adversarial vulnerabilities, but you still need to retrain the model.

Also, there are several types of adversarial attacks. Protecting an AI model against each attack method requires a separate adversarial training process. And adversarial training is often incompatible with other training procedures and requires engineers to make modifications to their AI models.

Fending off adversarial attacks through randomness

Another way to make AI models robust against adversarial examples is stochastic defense. The idea behind stochastic defense is to introduce randomness into the behavior of neural networks. Adding randomness to the neural network is a strong defense because it increases the cost for the attacker of staging a successful attack against the AI model.
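As a rough illustration of stochastic defense, the sketch below wraps a toy linear classifier so that every call samples fresh Gaussian noise and adds it to the weights, which means the same input can receive different scores on different calls. The class name `NoisyLinearModel`, the noise level `sigma` and the choice to perturb weights (rather than inputs or activations) are assumptions made for this example, not the HRS authors’ code.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class NoisyLinearModel:
    """Toy stochastic defense: each call adds fresh Gaussian noise to the weights."""

    def __init__(self, w, b, sigma, seed=None):
        self.w, self.b, self.sigma = list(w), b, sigma
        self.rng = random.Random(seed)

    def sample_weights(self):
        # A new random variation of the model is drawn on every call.
        return [wi + self.rng.gauss(0.0, self.sigma) for wi in self.w]

    def score(self, x):
        return dot(self.sample_weights(), x) + self.b

    def predict(self, x):
        return 1 if self.score(x) > 0 else 0


model = NoisyLinearModel([0.5, -0.3, 0.8, -0.6], 0.0, sigma=0.2, seed=0)
x = [0.2, 0.1, 0.1, 0.1]  # hypothetical input near the decision boundary
print([model.score(x) for _ in range(5)])  # varies from call to call
```

An attacker who queries or differentiates this model sees a different set of weights each time, so an example tuned against one draw is not guaranteed to fool the next.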

“Stochastic defense is a promising branch among all techniques that have been proposed,” says Pin-Yu Chen, a researcher at the MIT-IBM Watson AI Lab and co-author of the hierarchical random switching (HRS) paper. “In a deterministic AI model, attackers search until they find an adversarial example that fools that particular neural network. But if you have a random model, the attacker will need to find a better attack that can work on all the random variations of the model.”
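One standard way to attack “all the random variations of the model” is expectation over transformation (EOT): the attacker averages the gradient over many random draws of the model before taking a step. The sketch below applies that averaging to a toy linear model with Gaussian weight noise, assuming a white-box attacker who knows the weights `w` and the noise level `sigma`; the helper names and parameters are hypothetical.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def eot_gradient(w, sigma, n_samples, rng):
    """Average the input gradient of a noisy linear score over random draws.

    For score = (w + noise) . x + b the gradient w.r.t. x is (w + noise),
    so averaging many draws estimates the gradient of the expected score.
    """
    total = [0.0] * len(w)
    for _ in range(n_samples):
        sampled = [wi + rng.gauss(0.0, sigma) for wi in w]
        total = [t + s for t, s in zip(total, sampled)]
    return [t / n_samples for t in total]


def eot_fgsm(w, x, eps, sigma, n_samples=200, seed=0):
    """One FGSM-style step that lowers the expected score of the noisy model."""
    grad = eot_gradient(w, sigma, n_samples, random.Random(seed))
    return [xi - eps * sign(g) for xi, g in zip(x, grad)]


w = [0.5, -0.3, 0.8, -0.6]  # hypothetical weights of the defended model
x_adv = eot_fgsm(w, [0.2, 0.1, 0.1, 0.1], eps=0.05, sigma=0.2)
print(x_adv)
```

Each attack step now costs `n_samples` gradient evaluations instead of one, which is the kind of extra cost that stochastic defenses impose on the attacker.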

Another benefit of stochastic defense is that, unlike adversarial training, it is independent of the method of adversarial attack. “For adversarial training you need to specify what type of attacks you’re going to train on to improve your AI model’s robustness,” Chen says. “For the randomized approach, you don’t need to do that. You just add a level of randomness to the AI model and it becomes inherently more robust rather than robust against a specific type of adversarial attack.”

However, the robustness provided by traditional stochastic defense methods doesn’t come for free. “Because you add noise to the weights and activation functions and the inputs and outputs of the neural network, you lose some of your accuracy,” Chen says.

The goal of HRS was to capture the advantages of random defense methods while minimizing their tradeoffs. “We want to make sure our defense methods are general enough in the sense that they’re compatible with current machine learning training pipelines, so developers only need to add a few more lines of code to make their models more robust,” Chen says, adding that the aim is also to achieve a better accuracy-robustness tradeoff. “If you allow one percent drop in your accuracy, you would want to have the robustness to be as high as possible.”
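The accuracy-robustness tradeoff Chen describes can be phrased as a simple selection rule: among candidate defenses, keep those whose clean accuracy is within one percentage point of the undefended model, then pick the most robust. The sketch below encodes that rule; the function name, the candidate names and all numbers are made up for illustration and are not results from the paper.

```python
def pick_config(configs, base_accuracy, max_drop=1.0):
    """Return the most robust config whose clean accuracy is within `max_drop`
    percentage points of the undefended model, or None if none qualifies."""
    allowed = [c for c in configs if base_accuracy - c["accuracy"] <= max_drop]
    if not allowed:
        return None
    return max(allowed, key=lambda c: c["robustness"])


# Made-up numbers for illustration only (accuracy and robustness in percent).
candidates = [
    {"name": "config-A", "accuracy": 92.5, "robustness": 30.0},
    {"name": "config-B", "accuracy": 90.0, "robustness": 55.0},
    {"name": "config-C", "accuracy": 92.2, "robustness": 45.0},
]
print(pick_config(candidates, base_accuracy=93.0)["name"])  # config-C
```

Config-B is the most robust overall, but it costs three points of accuracy, so under a one-point budget the rule settles for config-C.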

Hierarchical random switching

To understand how hierarchical random switching works, consider a neural network with many layers. HRS first divides the network into several blocks, each containing several consecutive layers. Next, each block is populated with several parallel channels.

In the HRS defense method, the layers of the neural network are split into several blocks, and the layers of each block are replicated across several channels (source: IJCAI)
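The block-and-channel layout can be sketched as a plain data structure. In this minimal illustration, layer names stand in for real layers and each channel is labelled as a separate copy of its block; in HRS the copies share an architecture but have their own weights. The function names, layer names, block sizes and channel counts are hypothetical. Because one channel is picked per block, the number of distinct paths through the network is the product of the channel counts.

```python
import math


def build_hrs_structure(layers, block_sizes, channels_per_block):
    """Split `layers` into consecutive blocks and give each block several
    parallel channels. Each channel is a separate copy of the block's layers;
    in HRS the copies would be trained to have their own weights."""
    if sum(block_sizes) != len(layers):
        raise ValueError("block sizes must cover every layer exactly once")
    blocks, start = [], 0
    for size, n_channels in zip(block_sizes, channels_per_block):
        block_layers = layers[start:start + size]
        blocks.append([[f"{name}/ch{c}" for name in block_layers]
                       for c in range(n_channels)])
        start += size
    return blocks


def num_paths(blocks):
    # One channel is chosen per block, so the path count is a product.
    return math.prod(len(channels) for channels in blocks)


layers = ["conv1", "conv2", "conv3", "conv4", "fc1", "fc2"]  # hypothetical
blocks = build_hrs_structure(layers, [2, 2, 2], [4, 4, 4])
print(blocks[0][1])       # ['conv1/ch1', 'conv2/ch1']
print(num_paths(blocks))  # 64
```

Even this small configuration yields 64 distinct models, which is where the randomness comes from.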

At inference time, HRS selects a random channel for each block of the neural network and connects them together. Each combination of channels (one per block) provides a unique AI model. HRS uses a special hierarchical training method to make sure each channel has its own unique weights while keeping the accuracy of the AI model as high as possible.
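Inference with random switching then amounts to drawing one channel index per block and chaining the chosen channels. In the sketch below each channel is any callable, and the scalar functions in the usage example are stand-ins for trained layers; `hrs_forward` and `make_channel` are hypothetical names, not code from the paper.

```python
import random


def hrs_forward(blocks, x, rng):
    """Pass x through one randomly chosen channel in every block.

    `blocks` is a list of blocks; each block is a list of channels, and each
    channel is a callable mapping the block's input to its output.
    Returns the final output and the list of chosen channel indices.
    """
    path = []
    for channels in blocks:
        idx = rng.randrange(len(channels))
        path.append(idx)
        x = channels[idx](x)
    return x, path


def make_channel(scale, shift):
    # Stand-in for a trained block: a scalar affine map.
    return lambda v: scale * v + shift


blocks = [
    [make_channel(1.0, 0.0), make_channel(1.1, 0.1)],  # block 1: 2 channels
    [make_channel(2.0, -1.0), make_channel(1.9, -0.9),
     make_channel(2.1, -1.1)],                         # block 2: 3 channels
]
rng = random.Random(0)
for _ in range(3):
    out, path = hrs_forward(blocks, 1.0, rng)
    print(path, out)  # a different path may be drawn on each call
```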

“Each model should work like the base model, but it must also have randomness,” Chen says. “This is why we do hierarchical training. Because we train all the channels from bottom to top and enumerate each path, we make sure no matter which channel is chosen, the path that is selected provides a good AI model.”
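Chen’s requirement that every selected path must be a good model can be checked directly when the number of paths is small: enumerate every channel combination and measure each path’s accuracy. The helper below performs only that evaluation step; it is not the paper’s hierarchical training procedure, and the function name, toy channels and data are hypothetical.

```python
import itertools


def path_accuracies(blocks, data):
    """Evaluate every channel combination (path) on labelled data.

    blocks: list of blocks, each a list of channel callables; the last
            block must output a class label.
    data:   list of (input, label) pairs.
    Returns a dict mapping each path (tuple of channel indices) to accuracy.
    """
    results = {}
    for path in itertools.product(*(range(len(b)) for b in blocks)):
        correct = 0
        for x, y in data:
            h = x
            for block, idx in zip(blocks, path):
                h = block[idx](h)
            correct += int(h == y)
        results[path] = correct / len(data)
    return results


# Hypothetical toy channels: block 1 extracts a feature, block 2 thresholds it.
extractors = [lambda v: v, lambda v: v + 0.05]
classifiers = [lambda h: int(h > 0.5), lambda h: int(h > 0.45)]
data = [(0.1, 0), (0.3, 0), (0.48, 0), (0.7, 1), (0.9, 1)]

acc = path_accuracies([extractors, classifiers], data)
print(acc)                # accuracy of each of the 4 paths
print(min(acc.values()))  # worst path: 0.8
```

Reporting the worst path, not just the average, matches the goal in the quote: since the channel is chosen at random, any path may be the one that serves a given input.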

Compared to classic stochastic defense methods, HRS provides a better robustness-accuracy tradeoff. This means the accuracy penalty your AI model pays in exchange for protection against adversarial attacks is smaller. At the same time, HRS is more robust against the types of adversarial attacks that can break classic stochastic defenses.

Because HRS expands the number of channels a neural network contains, it comes at a memory cost. AI models strengthened with HRS become larger. In this regard, however, it is not much different from other adversarial defense methods.

“During adversarial training, engineers expand neural networks. For each layer, they add more neurons to memorize the mistakes and make the AI model more robust. The same applies to the HRS method,” Chen says. “So basically, to make your model more robust, you need a larger capacity for your network.”
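The memory cost can be estimated with simple arithmetic. Assuming every channel in a block has the same architecture as the original block (but its own weights), the parameter count of an HRS-style model is the sum over blocks of each block’s size times its channel count. The function name and the per-block sizes below are hypothetical.

```python
def hrs_parameter_count(block_params, channels_per_block):
    """Parameters of an HRS-style model in which block i holds
    channels_per_block[i] independently weighted copies of a block
    with block_params[i] parameters."""
    return sum(p * c for p, c in zip(block_params, channels_per_block))


block_params = [1_000, 20_000, 50_000]  # hypothetical per-block sizes
base = hrs_parameter_count(block_params, [1, 1, 1])
hrs = hrs_parameter_count(block_params, [4, 4, 4])
print(base, hrs, hrs / base)  # 71000 284000 4.0
```

With four channels in every block, the stored parameters grow fourfold. In this sketch’s setup only one channel per block runs on a given input, so the extra cost is mainly storage rather than computation per prediction.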

Unlike many existing defense proposals, however, HRS is compatible with current training methods, so developers don’t need to add layers or change the architecture.

HRS is just one of several efforts to make AI models more robust against the growing threat of adversarial attacks. As deep learning and neural networks take over more and more important functions in our daily lives, we need all the help we can get to make sure they’re secure and robust.
