The Four Deep Learning Breakthroughs You Should Know About

Ready to learn Data Science? Browse courses like Data Science Training and Certification developed by industry thought leaders and Experfy in Harvard Innovation Lab.

The First in a Series on Deep Learning for Non-Experts

Why read this?

To get started applying Deep Learning, either as an individual practitioner or as a organization, you need two things:

The “what”: an idea of what the latest developments in Deep Learning are capable of.
The “how”: the technical capability to either train a new model or take your existing model and get it working in production.

Thanks to the strength of the open source community, the second part is getting easier every day. There are many great tutorials on the specifics of how to train and use Deep Learning models using libraries such as TensorFlow — many of which publications like Towards Data Science publish on a weekly basis.

The implication of this is that once you have an idea for how you’d like to use Deep Learning, implementing your idea, while not easy, involves standard “dev” work: following tutorials like the ones linked throughout this article, modifying them for your specific purpose and/or data, troubleshooting via reading posts on StackOverflow, and so on. They don’t, for example, require being (or hiring) a unicorn with Ph.D who can code original neural net architectures from scratch and is an experienced software engineer.

This series of essays will attempt to fill a gap on the first part: covering, at a high level, what Deep Learning is capable of, while giving resources for those of you who want to learn more and/or dive into the code and tackle the second part. More specifically, I’ll cover:

What the latest achievements using open source architectures and datasets have been.
What the key architectures or other insights were that led to those achievements
What the best resources to get started with using similar techniques on your own projects.

What These Breakthroughs Have in Common

The breakthroughs, while they involve many new architectures and ideas, were all achieved using the usual “Supervised Learning” process from machine learning. Specifically the steps are:

Collect a large set of appropriate training data
Set up a neural net architecture — that is, a complicated system of equations, loosely modeled on the brain — that often has of millions of parameters called “weights”.
Repeatedly feed the data through the neural net; at each iteration comparing the result of the neural net’s prediction to the correct result, and adjusting each of the neural net’s weights based on how much and in what direction it misses.

This is how neural nets are trained: this process is repeated many, many times. Source.

This process has been applied to many different domains, and has resulted in neural nets that appear to have “learned”. In each domain, we’ll cover:

The data needed to train these models
The model architecture used
The results

1. Image classification

Neural networks can be trained to figure out what object or objects an image contains.

Data required

To train an image classifier, you need labeled images, where each image belongs to one of a number of finite classes. For example, one of the standard datasets used to train image classifiers is the CIFAR 10 data, which has correctly labelled images of 10 classes:

Illustration of images of CIFAR-10 data. Source

Deep Learning Architecture

All the neural net architectures we’ll cover were motivated by thinking about how people would actually have to learn to solve the problem. How do we do this for image detection? When humans determine what is in an image we first would look for high-level visual features, like branches, noses, or wheels. In order to detect these, however, we would subconsciously need to determine lower level features like colors, lines, and other shapes. Indeed, to go from raw pixels to complex features that humans would recognize, like eyes, we would require detecting features of pixels, and then features of features of pixels, etc.

Prior to Deep Learning, researchers would manually try to extract these features and use them for prediction. Just before the advent of Deep Learning, researchers were starting to use techniques (mainly SVMs) that tried to find complex, nonlinear relationships between these manually-extracted features and whether an image was of a cat or dog, for example.

Convolutional Neural Network extracting features at each layer. Source

Now, researchers have developed neural net architectures that learn these features of the original pixels themselves; specifically, Deep Convolutional Neural Net architectures. These networks extract features of pixels, then features of features of pixels and so on, and then ultimately feed these through a regular neural net layer (similar to a logistic regression) to make the final prediction.

Samples of the predictions a leading CNN architecture made on images from the ImageNet dataset.

We’ll dive deeper into how convolutional neural nets are being used for image classification in a future post.

Breakthroughs

The consequence of this is that on the central task these architectures were designed to solve — image classification —algorithms can now achieve better results than humans. On the famous ImageNet dataset, which is most commonly used as a benchmark for convolutional architectures, trained neural nets now achieve better-than-human performance on image classification:

As of 2015, computers can be trained to classify objects in images better than humans. Source

In addition, researchers have figured out how to take images not immediately curated for image classification, segment out rectangles of the image most likely to represent objects of specific classes, feed each of these rectangles through a CNN architecture, and end up with classifications of the individual objects in the image along with boxes bounding their location (these are called “bounding boxes”):

Object detection using “Mask R-CNN”. Source

This entire multi-step process is technically known as “object detection”, though it uses “image classification” for the most challenging step.

Resources

Theoretical: For a deeper look at the theory of why CNNs work, read the tutorial from Andrej Karpathy’s Stanford course here. For a slightly more mathematical version, check out Chris Olah’s post on convolutions here.

Code: To get started quickly building an Image Classifier, check out this introductory example from the TensorFlow documentation.