One of the wonders of machine learning is that it turns any kind of data into mathematical equations. Once you train a machine learning model on training examples—whether it’s on images, audio, raw text, or tabular data—what you get is a set of numerical parameters. In most cases, the model no longer needs the training dataset and uses the tuned parameters to map new and unseen examples to categories or value predictions.
You can then discard the training data and publish the model on GitHub or run it on your own servers without worrying about storing or distributing sensitive information contained in the training dataset.
But a type of attack called “membership inference” makes it possible to determine whether a given data record was used to train a machine learning model. In many cases, attackers can stage membership inference attacks without access to the model’s parameters, just by observing its output. Membership inference can cause security and privacy concerns in cases where the target model has been trained on sensitive information.
From data to parameters
Each machine learning model has a set of “learned parameters,” whose number and relations vary depending on the type of algorithm and architecture used. For instance, simple regression algorithms use a series of parameters that directly map input features to the model’s output. Neural networks, on the other hand, use multiple layers of parameters, each of which processes its input and passes the result on to the next until the final layer produces the output.
But regardless of the type of algorithm you choose, all machine learning models go through a similar process during training. They start with random parameter values and gradually tune them to the training data. Supervised machine learning algorithms, such as those used in classifying images or detecting spam, tune their parameters to map inputs to expected outcomes.
For example, say you’re training a deep learning model to classify images into five different categories. The model might be composed of a set of convolutional layers that extract the visual features of the image and a set of dense layers that translate the features of each image into confidence scores for each class.
The model’s output will be a set of values that represent the probability that an image belongs to each of the classes. You can assume that the image belongs to the class with the highest probability. For instance, an output might look like this:
Cat: 0.90
Dog: 0.05
Fish: 0.01
Tree: 0.01
Boat: 0.01
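As a rough illustration of how such an output is produced, here is a minimal sketch of the softmax function, which is commonly used to turn the raw scores of a final layer into probabilities. The raw scores below are hypothetical values picked so that “Cat” comes out on top; they are not taken from a real model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["Cat", "Dog", "Fish", "Tree", "Boat"]

# Hypothetical raw scores from the final layer (assumed for illustration).
raw_scores = [4.0, 1.1, -0.5, -0.5, -0.5]

probs = softmax(raw_scores)

# The predicted class is the one with the highest probability.
predicted = CLASSES[probs.index(max(probs))]

for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.2f}")
print("Predicted class:", predicted)
```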
Before training, the model will provide incorrect outputs because its parameters have random values. You train it by providing it with a collection of images along with their corresponding classes. During training, the model gradually tunes the parameters so that its output confidence score becomes as close as possible to the labels of the training images.
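The tuning process can be sketched on a much smaller scale. The example below is not an image classifier; it is a two-feature logistic regression trained with stochastic gradient descent on four made-up records, and the learning rate and epoch count are arbitrary assumptions. It only shows the general pattern: parameters start at random values and are nudged, step by step, so the outputs move toward the labels.

```python
import math
import random

def predict(w, b, x):
    """Confidence that record x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(w, b, data):
    """Average cross-entropy between predictions and labels."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(data, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Start from random parameter values.
    w = [rng.uniform(-1, 1) for _ in data[0][0]]
    b = rng.uniform(-1, 1)
    history = [mean_loss(w, b, data)]
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y  # gradient of the loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        history.append(mean_loss(w, b, data))
    return w, b, history

# Toy training set (invented for illustration): (features, label)
toy_data = [([0.0, 0.2], 0), ([0.1, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]

w, b, history = train(toy_data)
print(f"loss before training: {history[0]:.3f}")
print(f"loss after training:  {history[-1]:.3f}")
```

After training, the parameters in w and b are all that is needed to make predictions; the toy records themselves are no longer required.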
Basically, the model encodes the visual features of each type of image into its parameters.
Membership inference attacks
A good machine learning model is one that not only classifies its training data but generalizes its capabilities to examples it hasn’t seen before. This goal can be achieved with the right architecture and enough training data.
But in general, machine learning models tend to perform better on their training data. Going back to the image classifier above, if you mix your training data with a bunch of new images and run them through your neural network, you’ll see that the confidence scores it provides on the training examples tend to be higher than those of the images it hasn’t seen before.
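This gap is easy to reproduce on a toy problem. The sketch below does not use a neural network; as a stand-in, it uses a simple kernel-based classifier with a deliberately narrow bandwidth, which makes it memorize its training records. The data generator, the bandwidth, and the sample sizes are all assumptions made for illustration.

```python
import math
import random

def make_records(rng, n):
    """Two overlapping 2-D classes, so some uncertainty is unavoidable."""
    records = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 0.7 if y == 1 else -0.7
        records.append(((rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)), y))
    return records

def confidence(train_set, x, y, bandwidth=0.3):
    """Confidence a kernel classifier built on train_set assigns to label y."""
    weights = {0: 0.0, 1: 0.0}
    for tx, ty in train_set:
        dist_sq = (x[0] - tx[0]) ** 2 + (x[1] - tx[1]) ** 2
        weights[ty] += math.exp(-dist_sq / (2 * bandwidth ** 2))
    total = weights[0] + weights[1]
    # If x is far from every training record, fall back to "no idea".
    return weights[y] / total if total > 0 else 0.5

rng = random.Random(0)
train_set = make_records(rng, 30)   # records the model has seen
unseen = make_records(rng, 200)     # fresh records from the same distribution

def mean_confidence(records):
    """Average confidence the model assigns to the correct label."""
    return sum(confidence(train_set, x, y) for x, y in records) / len(records)

train_conf = mean_confidence(train_set)
unseen_conf = mean_confidence(unseen)
print(f"mean confidence on training records: {train_conf:.2f}")
print(f"mean confidence on unseen records:   {unseen_conf:.2f}")
```

Widening the bandwidth makes the classifier smoother and should shrink the difference between the two averages.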
Membership inference attacks take advantage of this property to determine whether a given example was part of the data used to train the machine learning model. This could have privacy ramifications for the people whose data records were used to train the model.
In membership inference attacks, the adversary does not necessarily need to know the inner parameters of the target machine learning model. Instead, the attacker only needs to know the model’s algorithm and architecture (e.g., SVM, neural network, etc.) or the service used to create the model.
With the growth of machine learning as a service (MLaaS) offerings from large tech companies such as Google and Amazon, many developers are inclined to use them instead of building their models from scratch. The advantage of these services is that they abstract away many of the complexities and requirements of machine learning, such as choosing the right architecture, tuning hyperparameters (learning rate, batch size, number of epochs, regularization, loss function, etc.), and setting up the computational infrastructure needed to optimize the training process. The developer only needs to set up a new model and provide it with training data. The service does the rest.
The tradeoff is that if the attackers know which service the victim used, they can use the same service to create a membership inference attack model.
In fact, at the 2017 IEEE Symposium on Security and Privacy, researchers at Cornell University proposed a membership inference attack technique that worked against models trained on commercial cloud-based machine learning services, including those offered by Google and Amazon.
In this technique, an attacker creates random records for a target machine learning model served on a cloud service. The attacker feeds each record into the model. Based on the confidence score the model returns, the attacker tunes the record’s features and reruns it through the model. The process continues until the model returns a very high confidence score. At this point, the record is likely to be identical or very similar to one of the examples used to train the model.
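The search loop described here can be sketched as a simple hill climb. This is a simplified illustration, not the researchers’ exact algorithm: the function query_target is a hypothetical stand-in for a call to the remote model, and the hidden weights, the binary features, the confidence threshold, and the query budget are all assumptions.

```python
import math
import random

# Hypothetical stand-in for the remote model. In a real attack this would be
# an API call returning the confidence for the class the attacker targets.
_HIDDEN_WEIGHTS = [2.0, -1.5, 1.0, 0.5, -2.0, 1.5, -0.5, 1.0]

def query_target(record):
    z = sum(w * x for w, x in zip(_HIDDEN_WEIGHTS, record)) - 1.0
    return 1.0 / (1.0 + math.exp(-z))

def synthesize(query, n_features, threshold=0.95, max_queries=500, seed=0):
    """Tune a random binary record until the model is highly confident."""
    rng = random.Random(seed)
    record = [rng.randint(0, 1) for _ in range(n_features)]
    best = query(record)
    for _ in range(max_queries):
        if best >= threshold:
            break  # confident enough: keep this record
        # Flip one randomly chosen feature and ask the model again.
        candidate = list(record)
        i = rng.randrange(n_features)
        candidate[i] = 1 - candidate[i]
        score = query(candidate)
        # Keep the change only if the confidence went up.
        if score > best:
            record, best = candidate, score
    return record, best

record, confidence = synthesize(query_target, n_features=8)
print("synthesized record:", record)
print(f"confidence reported by the model: {confidence:.3f}")
```

Note that the loop only ever looks at the confidence score the model returns, which is why the attack does not require access to the model’s parameters.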
After gathering enough high-confidence records, the attacker uses them to train a set of “shadow models” that imitate the behavior of the target model. Because the attacker knows which records each shadow model was trained on, the shadow models’ outputs on member and non-member records provide labeled examples for training a membership inference attack model. The final model can then predict whether a data record was included in the training dataset of the target machine learning model.
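To make the role of shadow models more concrete, here is a heavily simplified sketch. Several assumptions are baked in: the records are synthetic, a tiny logistic regression stands in for whatever the cloud service would train, and the “attack model” is just a confidence threshold rather than a trained classifier. The point is the data flow: each shadow model is trained on records the attacker controls, so every confidence score it produces can be labeled as coming from a member or a non-member.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamp for safety

def train(data, epochs=300, lr=0.5):
    """Fit a tiny logistic regression with SGD (stand-in for the ML service)."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def confidence(w, x, y):
    """Confidence the model assigns to label y for record x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return p if y == 1 else 1.0 - p

def make_records(rng, n, n_features=10):
    """Noisy synthetic records: only the first feature carries signal."""
    records = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [(0.5 if y else -0.5) + rng.gauss(0, 1)]
        x += [rng.gauss(0, 1) for _ in range(n_features - 1)]
        records.append((x, y))
    return records

def build_attack_set(rng, n_shadows=5, n_per_split=15):
    """Collect (confidence, is_member) pairs from several shadow models."""
    attack_set = []
    for _ in range(n_shadows):
        records = make_records(rng, 2 * n_per_split)
        members, non_members = records[:n_per_split], records[n_per_split:]
        shadow = train(members)  # the attacker knows exactly what went in
        attack_set += [(confidence(shadow, x, y), 1) for x, y in members]
        attack_set += [(confidence(shadow, x, y), 0) for x, y in non_members]
    return attack_set

def fit_threshold(attack_set):
    """Pick the confidence cut-off that best separates members from non-members."""
    def accuracy(t):
        return sum((c >= t) == bool(m) for c, m in attack_set) / len(attack_set)
    candidates = [0.0] + [c for c, _ in attack_set]
    best = max(candidates, key=accuracy)
    return best, accuracy(best)

rng = random.Random(0)
attack_set = build_attack_set(rng)
threshold, attack_accuracy = fit_threshold(attack_set)
print(f"threshold: {threshold:.3f}, accuracy on shadow data: {attack_accuracy:.2f}")
```

The learned threshold would then be applied to the confidence scores returned by the real target model. An accuracy of 0.5 here means the attack is doing no better than guessing.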
The researchers found that this attack was successful on many different machine learning services and architectures. Their findings show that a well-trained attack model can also tell the difference between training dataset members and non-members that receive a high confidence score from the target machine learning model.
The limits of membership inference
Membership inference attacks are not successful on all kinds of machine learning tasks. To create an effective attack model, the adversary must be able to explore the feature space. For example, if a machine learning model is performing complicated image classification (multiple classes) on high-resolution photos, the cost of creating training examples for the membership inference attack will be prohibitive.
But in the case of models that work on tabular data such as financial and health information, a well-designed attack might be able to extract sensitive information, such as associations between patients and diseases or financial records of target people.
Membership inference is also highly associated with “overfitting,” an artifact of poor machine learning design and training. An overfitted model performs well on its training examples but poorly on novel data. Two reasons for overfitting are having too few training examples or running the training process for too many epochs.
The more overfitted a machine learning model is, the easier it will be for an adversary to stage membership inference attacks against it. Therefore, a machine learning model that generalizes well to unseen examples is also more secure against membership inference.
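One way to put a number on this relationship is to measure how well a simple confidence threshold separates members from non-members. The sketch below computes the best achievable gap between the true-positive rate and the false-positive rate of such a threshold. The confidence scores are hypothetical values invented purely for illustration; they do not come from real models.

```python
def best_advantage(member_conf, non_member_conf):
    """Best (true-positive rate - false-positive rate) over all thresholds."""
    best = 0.0
    for t in member_conf + non_member_conf:
        tpr = sum(c >= t for c in member_conf) / len(member_conf)
        fpr = sum(c >= t for c in non_member_conf) / len(non_member_conf)
        best = max(best, tpr - fpr)
    return best

# Hypothetical confidence scores (invented for illustration).
# An overfitted model is far more confident on records it was trained on.
overfitted_members = [0.99, 0.98, 0.97, 0.99]
overfitted_non_members = [0.70, 0.95, 0.60, 0.55]

# A model that generalizes well treats both groups almost the same.
generalized_members = [0.91, 0.85, 0.93, 0.80]
generalized_non_members = [0.90, 0.84, 0.92, 0.78]

adv_overfitted = best_advantage(overfitted_members, overfitted_non_members)
adv_generalized = best_advantage(generalized_members, generalized_non_members)
print("advantage against overfitted model: ", adv_overfitted)
print("advantage against generalized model:", adv_generalized)
```

An advantage close to 1 means the threshold separates members from non-members almost perfectly, while a value near 0 means the attacker learns very little from the confidence scores.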