Today, artificial intelligence programs can recognize faces and objects in photos and videos, transcribe audio in real-time, detect cancer in x-ray scans years in advance, and compete with humans in some of the most complicated games.
Until a few years ago, all these challenges were thought to be insurmountable or decades away, or were being solved with sub-optimal results. But advances in neural networks and deep learning, a branch of AI that has become very popular in the past few years, have helped computers solve these and many other complicated problems.
Unfortunately, when created from scratch, deep learning models require access to vast amounts of data and compute resources. This is a luxury that many can’t afford. Moreover, it takes a long time to train deep learning models to perform tasks, which is not suitable for use cases that have a short time budget.
Fortunately, transfer learning, the discipline of applying the knowledge gained by one trained AI model to another, can help solve these problems.
The cost of training deep learning models
Deep learning is a subset of machine learning, the science of developing AI through training examples. The concepts and science behind deep learning and neural networks are as old as the term “artificial intelligence” itself. But until recent years, they had been largely dismissed by the AI community for being inefficient.
The availability of vast amounts of data and compute resources in the past few years has pushed neural networks into the limelight and made it possible to develop deep learning algorithms that can solve real-world problems.
To train a deep learning model, you must feed a neural network lots of annotated examples. These examples can be things such as labeled images of objects or mammogram scans of patients with their eventual outcomes. The neural network will carefully analyze and compare the images and develop mathematical models that represent the recurring patterns between images of a similar category.
There already exist several large open-source datasets such as ImageNet, a database of more than 14 million images labeled in roughly 22,000 categories, and MNIST, a dataset of 70,000 handwritten digits (60,000 for training and 10,000 for testing). AI engineers can use these sources to train their deep learning models.
However, training deep learning models also requires access to very strong computing resources. Developers usually use clusters of CPUs, GPUs or specialized hardware such as Google’s Tensor Processing Units (TPUs) to train neural networks in a time-efficient way. The costs of purchasing or renting such resources can be beyond the budget of individual developers or small organizations. Also, for many problems, there aren’t enough examples to train robust AI models.
Transfer learning makes deep learning training much less demanding
Say an AI engineer wants to create an image classifier neural network to solve a specific problem. Instead of gathering thousands or millions of images, the engineer can use one of the publicly available datasets such as ImageNet and enhance it with domain-specific photos.
But the AI engineer must still pay a hefty sum to rent the compute resources necessary to run those millions of images through the neural network. This is where transfer learning comes into play. Transfer learning is the process of creating new AI models by fine-tuning previously trained neural networks.
Instead of training their neural network from scratch, developers can download a pretrained, open-source deep learning model and finetune it for their own purpose. There are many pretrained base models to choose from. Popular examples include AlexNet, Google’s Inception-v3 and Microsoft’s ResNet-50. These neural networks have already been trained on the ImageNet dataset. AI engineers only need to enhance them by further training them with their own domain-specific examples.
Transfer learning doesn’t require huge compute resources. In most cases, a decent desktop computer or a strong laptop can finetune a pretrained neural network in a few hours or even less.
How does transfer learning work?
Interestingly, neural networks develop their behavior in a hierarchical way. Every neural network is composed of multiple layers. After training, each of the layers becomes tuned to detect specific features in the input data.
For instance, in an image classifier convolutional network, the first few layers detect general features such as edges, corners, circles and blobs of colors. As you go deeper into the network, the layers start to detect more concrete things such as eyes, faces, and full objects.
The first layers of neural networks detect general features. Deeper layers detect actual objects (source: arxiv.org)
When doing transfer learning, AI engineers freeze the first layers of the pretrained neural network. These are the layers that detect general features that are common across all domains. Then they finetune the deeper layers with their own examples and add new layers to classify the new categories included in their training dataset.
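The freeze-and-extend step can be sketched without any deep learning framework. The snippet below is a minimal illustration, not a real training setup: the layer names, parameter counts, `feature_size`, and the `make_student` helper are all hypothetical, and a "layer" is only a record with a trainable flag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layer:
    name: str
    num_params: int
    trainable: bool = True


# Hypothetical pretrained feature layers, ordered from input to output.
TEACHER = [
    Layer("conv1_edges", 1_000),
    Layer("conv2_corners", 5_000),
    Layer("conv3_textures", 20_000),
    Layer("conv4_parts", 50_000),
    Layer("conv5_objects", 100_000),
]


def make_student(teacher, num_frozen, num_new_classes, feature_size=256):
    """Freeze the first `num_frozen` layers and append a new classifier."""
    if not 0 <= num_frozen <= len(teacher):
        raise ValueError("num_frozen must be between 0 and len(teacher)")
    # Layers before the cut-off are frozen; the deeper ones stay trainable.
    student = [
        replace(layer, trainable=(i >= num_frozen))
        for i, layer in enumerate(teacher)
    ]
    # New classification layer: one weight per feature per class, plus biases.
    head_params = feature_size * num_new_classes + num_new_classes
    student.append(Layer("new_classifier", head_params))
    return student


def trainable_params(layers):
    """Count the parameters that training would actually update."""
    return sum(layer.num_params for layer in layers if layer.trainable)


student = make_student(TEACHER, num_frozen=3, num_new_classes=10)
for layer in student:
    status = "trainable" if layer.trainable else "frozen"
    print(f"{layer.name:16s} {layer.num_params:>8,d}  {status}")
print("trainable parameters:", trainable_params(student))
```

In a real framework the same idea is usually expressed by switching off gradient updates for the frozen layers, so only the deeper layers and the new classifier change during training.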
The pretrained and finetuned AI models are also respectively called the “teacher” and “student” models.
The number of frozen and finetuned layers depends on the similarities between the source and destination AI models. If the student AI model solves a problem that is very close to that of the teacher, there’s no need to finetune the layers of the pretrained model. The developer only needs to append a new layer at the end of the network and train the AI for the new categories. This is called “deep-layer feature extraction.” Deep-layer feature extraction is also preferable when there’s very little training data for the destination domain.
When there are considerable differences between the source and destination, or training examples are abundant, the developers unfreeze several layers in the pretrained AI model. Then they add the new classification layer and finetune the unfrozen layers with the new examples. This is called “mid-layer feature extraction.”
In cases where there are significant differences between the source and destination AI models, the developers unfreeze and retrain the entire neural network. Called “full model fine-tuning,” this type of transfer learning also requires a lot of training examples.
Image source: University of Chicago
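The three approaches differ only in how many pretrained layers are left trainable. The sketch below makes that explicit. It assumes a hypothetical `trainable_mask` helper and a made-up default of two unfrozen layers for the mid-layer case; the right number depends on the problem.

```python
from enum import Enum


class Strategy(Enum):
    DEEP_LAYER_FEATURE_EXTRACTION = "deep-layer feature extraction"
    MID_LAYER_FEATURE_EXTRACTION = "mid-layer feature extraction"
    FULL_MODEL_FINE_TUNING = "full model fine-tuning"


def trainable_mask(strategy, num_pretrained_layers, num_unfrozen=2):
    """Return one flag per layer (True = trainable), input to output.

    The last entry is the newly added classification layer, which is
    always trained. `num_unfrozen` only matters for the mid-layer case.
    """
    if strategy is Strategy.DEEP_LAYER_FEATURE_EXTRACTION:
        unfrozen = 0  # keep every pretrained layer frozen
    elif strategy is Strategy.MID_LAYER_FEATURE_EXTRACTION:
        unfrozen = min(num_unfrozen, num_pretrained_layers)
    elif strategy is Strategy.FULL_MODEL_FINE_TUNING:
        unfrozen = num_pretrained_layers  # retrain the whole network
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    frozen = num_pretrained_layers - unfrozen
    return [False] * frozen + [True] * unfrozen + [True]


for strategy in Strategy:
    mask = trainable_mask(strategy, num_pretrained_layers=5)
    picture = " ".join("T" if flag else "F" for flag in mask)
    print(f"{strategy.value:32s} {picture}")
```

Reading the output from left to right follows the network from input to output: `F` marks a frozen pretrained layer and `T` a layer that is updated during training.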
It might seem absurd to take a pretrained model and retrain all its layers. But in practice, it saves time and compute resources. Before training, the variables in a neural network are initialized with random numbers and adapt their values as they process the training data. The values of the variables of a pretrained neural network have already been tuned to the millions of training examples. Therefore, they are a much better starting point for a new AI model trained on a new set of examples that have even slight similarities with those of the source AI model.
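The benefit of a tuned starting point shows up even in a toy setting. The sketch below replaces the neural network with a two-parameter linear model trained by gradient descent, so it is an analogy rather than real deep learning. The source and target tasks, the learning rate, the tolerance, and the random-initialization range are all assumptions chosen for illustration.

```python
import random

XS = [-2.0, -1.0, 0.0, 1.0, 2.0]


def make_data(w, b):
    """Build (x, y) pairs for the line y = w * x + b."""
    return [(x, w * x + b) for x in XS]


def mse(w, b, data):
    """Mean squared error of the model y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, lr=0.05, tol=1e-4, max_steps=10_000):
    """Gradient descent on the MSE; returns (w, b, steps until loss < tol)."""
    n = len(data)
    for step in range(max_steps):
        if mse(w, b, data) < tol:
            return w, b, step
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_steps


random.seed(0)
source_data = make_data(2.0, 1.0)   # hypothetical source task
target_data = make_data(2.1, 0.95)  # hypothetical, similar target task

# "Pretrain" on the source task from a random starting point.
w_pre, b_pre, _ = train(random.uniform(-5, 5), random.uniform(-5, 5), source_data)

# Fine-tune every parameter on the target task, starting from pretrained values.
w_ft, b_ft, finetune_steps = train(w_pre, b_pre, target_data)

# Train on the target task from 20 different random starting points.
scratch_steps = [
    train(random.uniform(-5, 5), random.uniform(-5, 5), target_data)[2]
    for _ in range(20)
]
avg_scratch_steps = sum(scratch_steps) / len(scratch_steps)

print("steps when fine-tuning from pretrained values:", finetune_steps)
print("average steps from random initialization:", avg_scratch_steps)
```

Because the pretrained values already sit close to the target solution, the fine-tuning run needs fewer update steps than the average run started from random values, even though both update every parameter.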
Transfer learning is not a silver bullet
Transfer learning solves many of the problems of training AI models in an efficient and affordable way. However, it also has tradeoffs. If a pretrained neural network has security holes, the AI models that use it as the basis for transfer learning will inherit those vulnerabilities.
For instance, a base model might not be robust against adversarial attacks, carefully crafted input examples that force the AI to change its behavior in erratic ways. If a malicious actor manages to develop an adversarial example for a base model, their attack will work on most of the AI models that have been derived from it. Researchers at the University of Chicago, UC Santa Barbara and Virginia Tech showed this in a paper presented at the USENIX Security Symposium last year.
Also, in some domains, such as teaching AI to play games, the use of transfer learning is very limited. Those AI models are trained with reinforcement learning, a branch of AI that is very compute-intensive and requires a lot of trial and error. In reinforcement learning, most new problems are unique and require their own AI model and training process.
But all in all, for most deep learning applications, such as image classification and natural language processing, there’s a good chance that you’ll be able to shortcut your way with a good dose of clever transfer learning.