Whether we take it for granted or not, deep learning algorithms have become an inseparable part of our daily lives. Personalized feeds, face and voice recognition, web search, smart speakers, digital assistants, email, and many other applications that we can’t part ways with use deep learning algorithms under the hood.
But how effective is deep learning in scientific research, where problems are often much more complex than classifying an image and requirements are much more sensitive than recommending what to buy next?
To answer this question, former Google CEO Eric Schmidt and Google AI researcher Maithra Raghu have put together a comprehensive guide on the different deep learning techniques and their application to scientific research.
“The amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity,” the authors write, adding that along with advances in machine learning, this rich corpus of data can provide “many exciting opportunities for deep learning applications in scientific settings.”
Titled “A Survey of Deep Learning for Scientific Discovery,” their guide provides a very accessible overview of deep learning and neural networks for scientists who aren’t necessarily versed in the complex language of artificial intelligence algorithms.
I strongly recommend reading the entire 48-page document and visiting many of its references. But here are some key takeaways.
You don’t necessarily need to do deep learning
With deep learning being all the rage, it’s easy to be tempted to apply it to anything and everything. After all, the basic proposition is very attractive: It’s an end-to-end AI model that takes a bunch of data, develops a mathematical representation, and performs complex classification and prediction tasks.
Deep neural networks can tackle problems previously solved by other types of machine learning algorithms, such as content recommendation or fraud detection. They can also take on problems that were traditionally hard to crack with other machine learning techniques, including complex computer vision and natural language processing (NLP) tasks.
However, Schmidt and Raghu warn, when formulating a problem, it is important to consider whether deep learning provides the right set of tools to solve it. “In many settings, deep learning may not be the best technique to start with or best suited to the problem,” they write.
For many problems, simpler machine learning algorithms often provide more efficient solutions. For instance, if you want to identify the most relevant features in a set of chemical characteristics of different substances, you might be better off using “dimensionality reduction,” a family of techniques that finds the features contributing most to outcomes.
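As a quick, hedged illustration, here is a minimal sketch using scikit-learn’s PCA, one common dimensionality-reduction technique. The “substances” and their five measured characteristics are made-up stand-ins:

```python
# A minimal sketch: using PCA (one common dimensionality-reduction method)
# to find which measured characteristics carry most of the variance.
# The data here is made up for illustration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # 200 substances, 5 measured characteristics

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
pca.fit(X_scaled)

# explained_variance_ratio_ shows how much variance each component captures;
# components_ shows how strongly each original feature loads onto it.
print(pca.explained_variance_ratio_)
print(pca.components_)
```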
On the other hand, if you have limited data or if your data has been neatly arranged in a tabular format, you might want to consider trying a regression model before using neural networks. Neural networks usually (but not always) need lots of data. They are also difficult to interpret. In contrast, linear and logistic regression algorithms can provide more accurate results when the data is scarce, especially if the problem is linear in nature. Regression models also provide a clear mathematical equation with coefficients that explain the relevance of each feature in the dataset.
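Here is a minimal sketch of that trade-off: a logistic regression fit on scikit-learn’s built-in breast-cancer dataset (a stand-in for any small tabular dataset), with the coefficients read out directly:

```python
# A minimal sketch: fit a logistic regression on a small tabular dataset
# and read its coefficients, something a deep network doesn't offer.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

# Each coefficient maps to one input feature, so the model's reasoning
# can be inspected directly.
coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in sorted(zip(data.feature_names, coefs), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name}: {c:+.2f}")
```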
Deep learning for image-related scientific tasks
One area where deep learning algorithms have been very effective is the processing of visual data. The authors describe convolutional neural networks as “the most well known family of neural networks” and “very useful in working with any kind of image data.”
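For readers who haven’t seen one, here is a bare-bones CNN sketch in PyTorch. The layer sizes are illustrative, sized for small 28x28 grayscale images, and are not drawn from the survey:

```python
# A minimal convolutional neural network in PyTorch; dimensions are
# illustrative and sized for 28x28 grayscale images.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learn local filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14 -> 7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SmallCNN()
logits = model(torch.randn(8, 1, 28, 28))  # batch of 8 fake images
print(logits.shape)                        # torch.Size([8, 10])
```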
Aside from the commercial and industrial applications, CNNs have found their way into many scientific domains. One of the best known applications of convolutional neural networks is medical imaging analysis. There are already many deep learning algorithms that examine CT scans and x-rays and help in the diagnosis of diseases such as cancer. Recently, scientists have been using CNNs to find symptoms of the novel coronavirus in chest x-rays.
Some of the visual applications of deep learning are less well known. For instance, neuroscientists are experimenting with pose-detection neural networks to track the movements of animals and analyze their behavior.
NLP technology can expand to other fields
Another area that has benefitted immensely from advances in deep learning algorithms is natural language processing. Recurrent neural networks, long short-term memory (LSTM) networks, and Transformers have proven to be especially good at performing language-related tasks such as translation and question-answering.
To be clear, current AI algorithms process language in fundamentally different—and inferior—ways than the human brain. Even the largest neural network will fail at some of the simplest tasks that a human child with a very rudimentary understanding of language can perform.
This is because, like all other types of neural networks, RNNs and Transformers are at their core pattern-matching machines. They can find recurring patterns in sequences of data, whether text or any other kind of information. According to Schmidt and Raghu, these architectures suit “[p]roblems where the data has a sequential nature (with different sequences of varying length), and prediction problems such as determining the next sequence token, transforming one sequence to another, or determining sequence similarities are important tasks.”
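A minimal sketch of that idea, assuming PyTorch: an LSTM trained to predict the next token in a sequence. The vocabulary and dimensions are arbitrary placeholders; the same pattern applies to text, genomic, or other token sequences.

```python
# A minimal next-token prediction model: embed tokens, run an LSTM,
# and project each hidden state back onto the vocabulary.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 100, 32, 64

class NextTokenLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))   # (batch, seq_len, hidden)
        return self.head(h)                    # logits over the vocabulary

model = NextTokenLSTM()
seq = torch.randint(0, vocab_size, (4, 12))
logits = model(seq)
# Train by shifting: predict token t+1 from the tokens up to t.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), seq[:, 1:].reshape(-1))
print(loss.item())
```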
While this scheme has limits in dealing with the abstract and implied meanings of language, it has some very interesting applications in scientific research in areas such as genomics and proteomics, where sequential structures play an important role.
Transformers have proven to be especially effective in scientific research. In one recent project, AI researchers used unsupervised learning to train a bidirectional Transformer on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. “The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge,” the researchers write. This is an important step toward understanding protein sequences and extracting general and transferable information about proteins from raw sequences.
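To make the idea concrete, here is a toy sketch of that unsupervised training signal: mask random amino acids and train a small Transformer encoder to recover them. This is not the architecture from the cited paper, only a minimal illustration in PyTorch:

```python
# A toy masked-token objective on fake protein sequences. NOT the model
# from the cited paper; dimensions and data are illustrative only.
import torch
import torch.nn as nn

AMINO_ACIDS = 20
MASK_ID = AMINO_ACIDS            # extra token id meaning "masked"

embed = nn.Embedding(AMINO_ACIDS + 1, 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)
head = nn.Linear(64, AMINO_ACIDS)

seqs = torch.randint(0, AMINO_ACIDS, (8, 50))  # fake protein sequences
mask = torch.rand(seqs.shape) < 0.15           # hide ~15% of positions
inputs = seqs.masked_fill(mask, MASK_ID)

logits = head(encoder(embed(inputs)))
# The loss is computed only at the masked positions the model must recover.
loss = nn.functional.cross_entropy(logits[mask], seqs[mask])
loss.backward()                                # one unsupervised step
```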
What if you don’t have a lot of data?
One of the main criticisms against deep learning is its need for vast amounts of training data. In many fields of science, there’s not enough labeled data available. In others such as medicine, the data collection is prohibitively expensive and subject to the laws of handling sensitive personal information.
Deep neural networks also consume a lot of compute resources and electricity during training, requirements that many people and organizations can’t meet.
But not every deep learning model requires lots of training data. In the past few years, advances in transfer learning have enabled many developers to create deep learning models without the need for a lot of data and compute. Transfer learning involves fine-tuning a pre-trained AI model for a new task. It has had remarkable success in computer vision, and there are many freely available AI models that have already been trained on millions of examples.
As long as the new problem is close enough to the domain of the base model and you have a decent set of examples, you’ll have a reasonable chance of being able to fine-tune the AI model for the new task.
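In practice, this often comes down to a few lines. Here is a minimal sketch using torchvision (assuming a recent version with the pretrained-weights API): load an ImageNet-pretrained ResNet, freeze the backbone, and retrain only a new classification head. The value of num_classes is whatever your task needs.

```python
# A minimal transfer-learning sketch: reuse an ImageNet-pretrained
# ResNet and train only a new classification head.
import torch.nn as nn
from torchvision import models

num_classes = 5
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for p in model.parameters():      # freeze the pretrained backbone
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head
# Now train as usual; only model.fc's parameters receive gradients.
```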
“Typically, performing transfer learning is an excellent way to start work on a new problem of interest. There is the benefit of using a well-tested, standard neural network architecture, aside from the knowledge reuse, stability and convergence boosts offered by pretrained weights,” the authors write.
Meanwhile, they also warn: “Note however that the precise effects of transfer learning are not yet fully understood, and an active research area.”
Another area that is worth watching in the coming months is self-supervised learning, a branch of artificial intelligence that can learn from raw data without the need for human-labeled examples. Self-supervised learning is still in a very preliminary stage, however, and also an active area of research.
But one area that has already yielded results is generative models such as generative adversarial networks (GANs). GANs can generate fake data that closely resembles the real thing. Perhaps they’re best known for the natural-but-nonexistent human faces they can create. Artists are now using GANs to generate art that sells at stellar prices.
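For the curious, here is a stripped-down GAN training loop in PyTorch. The two-dimensional toy data and network sizes are illustrative only:

```python
# A minimal GAN: a generator maps noise to samples, a discriminator
# scores them, and the two are trained adversarially.
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim) + 3.0   # toy "real" distribution
    fake = G(torch.randn(64, noise_dim))

    # Discriminator: push real samples toward 1, generated ones toward 0.
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into scoring fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```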
Scientific research and deep learning’s interpretability issues
Another challenge that deep learning often presents is interpretability. Deep neural networks are complex functions whose parameters can number in the millions or even billions, and making sense of how they solve problems and make predictions is often perplexing.
This can pose a challenge to many areas of scientific research, where the focus is on understanding rather than prediction, and the researchers seek to identify the underlying mechanisms behind the patterns observed in the data. “When applying deep learning in scientific settings, we can use these observed phenomena as prediction targets, but the ultimate goal remains to understand what attributes give rise to these observations,” Schmidt and Raghu write.
Fortunately, advances in explainable artificial intelligence have helped, to some degree, overcome these barriers. While fully understanding and controlling the step-by-step decision-making mechanisms of neural networks remains a challenge, techniques developed in the past few years help us interpret the process.
Schmidt and Raghu divide AI interpretability techniques into two broad categories: feature attribution and model inspection.
Feature attribution helps us better understand which features in a specific sample have contributed to a neural network’s output. These techniques produce saliency maps that highlight these features. For instance, if you’re inspecting an image classifier, the saliency map would highlight the parts of the image that the AI has homed in on when determining its category.
There are different techniques that produce saliency maps, including GradCAM, LIME, and RISE. They are good methods for inspecting the output of neural networks to understand whether their decisions are based on the right or wrong features.
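As a hedged sketch of the simplest member of this family, here is a plain input-gradient saliency map in PyTorch (GradCAM, LIME, and RISE are more sophisticated relatives); model stands for any differentiable image classifier:

```python
# A minimal feature-attribution sketch: the gradient of the class score
# with respect to each input pixel, as a crude saliency map.
import torch

def saliency_map(model, image, target_class):
    """Return an (H, W) map of pixel influence on the class score."""
    model.eval()
    image = image.clone().requires_grad_(True)   # image: (C, H, W)
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()
    # Large-magnitude gradients mark pixels that most affect the score.
    return image.grad.abs().max(dim=0).values

# Usage (illustrative): sal = saliency_map(model, img_tensor, class_idx)
```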
Model inspection, on the other hand, tries to probe neurons in the hidden layers of a network and find the kinds of inputs that activate them. These techniques provide better insight into the general workings of the AI model. Some of the interesting work in this area includes GANPaint, which lets you examine the effects of manipulating individual neurons, and Activation Atlases, a tool that visualizes interactions between neurons in a neural network.
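A minimal sketch of the underlying idea, assuming PyTorch: gradient ascent on the input to synthesize a pattern that strongly activates one hidden unit. GANPaint and Activation Atlases are far richer tools built on this kind of probing.

```python
# A minimal model-inspection sketch: activation maximization, i.e.
# optimizing an input image to excite a chosen hidden unit.
import torch

def maximize_activation(model, layer, unit, steps=100, lr=0.1):
    """Synthesize a 28x28 input that strongly activates `unit` in `layer`."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))

    x = torch.randn(1, 1, 28, 28, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        model(x)                              # hook captures the activation
        loss = -acts["out"][0, unit].mean()   # ascend on the unit's response
        opt.zero_grad(); loss.backward(); opt.step()

    handle.remove()
    return x.detach()
```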
The opportunities for applying deep learning to scientific discovery are numerous, and the paper compiled by Schmidt and Raghu provides a great starting guide for aspiring scientists.
“As the amount of data collected across many diverse scientific domains continues to increase in both sheer amount and complexity, deep learning methods offer many exciting possibilities for both fundamental predictive problems as well as revealing subtle properties of the underlying data generation process,” the authors write.