What are artificial neural networks (ANN)?

One of the most influential technologies of the past decade is artificial neural networks, the fundamental piece of deep learning algorithms, the bleeding edge of artificial intelligence.

You can thank neural networks for many of the applications you use every day, such as Google’s translation service, Apple’s Face ID iPhone lock and Amazon’s Alexa AI-powered assistant. Neural networks are also behind some of the important artificial intelligence breakthroughs in other fields, such as diagnosing skin and breast cancer, and giving eyes to self-driving cars.

The concept and science behind artificial neural networks have existed for many decades. But it has only been in the past few years that the promises of neural networks have turned to reality and helped the AI industry emerge from an extended winter.

While neural networks have helped AI take great leaps, they are also often misunderstood. Here’s everything you need to know about neural networks.

Similarities between artificial and biological neural networks

The original vision of the pioneers of artificial intelligence was to replicate the functions of the human brain, nature’s smartest and most complex known creation. That’s why the field has derived much of its nomenclature (including the term “artificial intelligence”) from the structure and functions of the human brain.

Artificial neural networks are inspired by their biological counterparts. Many of the functions of the brain remain a mystery, but what we do know is that biological neural networks enable the brain to process huge amounts of information in complicated ways.

The brain’s biological neural network consists of approximately 100 billion neurons, the basic processing unit of the brain. Neurons perform their functions through their massive connections to each other, called synapses. The human brain has approximately 100 trillion synapses, about 1,000 per neuron.

Every function of the brain involves electrical currents and chemical reactions running across a vast number of these neurons.

How artificial neural networks function

The core component of ANNs is the artificial neuron. Each neuron receives inputs from several other neurons, multiplies them by assigned weights, adds them up and passes the sum to one or more neurons. Some artificial neurons might apply an activation function to the output before passing it to the next neuron.

The structure of an artificial neuron, the basic component of artificial neural networks (source: Wikipedia)
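To make the arithmetic concrete, here is a minimal sketch of a single artificial neuron in plain Python. The input values and weights are made up for illustration, and the bias term and the choice of a sigmoid activation are assumptions that the description above does not mention.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias=0.0, activation=None):
    """Multiply each input by its weight, add everything up (plus an
    optional bias, which is an assumption of this sketch), and apply
    the activation function if one is given."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total) if activation else total

# Three values arriving from other neurons, with hypothetical weights.
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]

weighted_sum = neuron(inputs, weights)                   # 0.5 - 2.0 + 0.75 = -0.75
activated = neuron(inputs, weights, activation=sigmoid)  # a value between 0 and 1
```

The value in `activated` is what this neuron would hand to the neurons in the next layer.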

At its core, this might sound like a very trivial math operation. But when you place hundreds, thousands and millions of neurons in multiple layers and stack them up on top of each other, you’ll obtain an artificial neural network that can perform very complicated tasks, such as classifying images or recognizing speech.

Artificial neural networks are composed of an input layer, which receives data from outside sources (data files, images, hardware sensors, microphone…), one or more hidden layers that process the data, and an output layer that provides one or more data points based on the function of the network. For instance, a neural network that detects people, cars and animals will have an output layer with three nodes. A network that classifies bank transactions as safe or fraudulent will have a single output.

Neural networks are composed of multiple layers (source: www.deeplearningbook.org)
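The layered structure can be sketched in a few lines of plain Python. This is an illustration under several assumptions rather than a reference implementation: the layer sizes, the ReLU and softmax functions, and the four input features are all chosen for the example, and the weights are random, so the prediction is meaningless until the network is trained.

```python
import math
import random

def relu(x):
    """Pass positive values through and clip negative ones to zero."""
    return max(0.0, x)

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: every neuron sees every input."""
    return [
        activation(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def random_layer(n_inputs, n_neurons, rng):
    """Create untrained weights and zero biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    return weights, [0.0] * n_neurons

rng = random.Random(0)
hidden_w, hidden_b = random_layer(4, 5, rng)  # 4 input features -> 5 hidden neurons
output_w, output_b = random_layer(5, 3, rng)  # 5 hidden neurons -> 3 output nodes

features = [0.2, 0.7, 0.1, 0.9]               # stand-in for pixel or sensor data
hidden = dense_layer(features, hidden_w, hidden_b, relu)
scores = dense_layer(hidden, output_w, output_b, lambda x: x)
probabilities = softmax(scores)

classes = ["person", "car", "animal"]
prediction = classes[probabilities.index(max(probabilities))]
```

The three entries in `probabilities` correspond to the three output nodes of the person, car and animal detector described above.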

Training artificial neural networks

Artificial neural networks start by assigning random values to the weights of the connections between neurons. The key for the ANN to perform its task correctly and accurately is to adjust these weights to the right numbers. But finding the right weights is not very easy, especially when you’re dealing with multiple layers and thousands of neurons.

This calibration is done by “training” the network with annotated examples. For instance, if you want to train the image classifier mentioned above, you provide it with multiple photos, each labeled with its corresponding class (person, car or animal). As you provide it with more and more training examples, the neural network gradually adjusts its weights to map each input to the correct outputs.
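Here is a rough sketch of what that adjustment can look like for a single neuron. The article does not name a specific training algorithm, so this example assumes plain gradient descent on a sigmoid neuron. The four labeled examples (a logical OR), the learning rate and the number of passes are all made up for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(examples, epochs=2000, learning_rate=0.5, seed=0):
    """Start from random weights and nudge them after every example."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(len(examples[0][0]))]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            # How far the current output is from the annotated answer.
            error = predict(weights, bias, inputs) - label
            # Move each weight a small step in the direction that reduces the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Annotated examples: each input comes with its correct label.
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = train(examples)
predictions = [int(predict(weights, bias, inputs) > 0.5) for inputs, _ in examples]
```

Real networks repeat the same idea across many layers and far more weights, which is where most of the computing cost of training comes from.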

Basically, what happens during training is that the network adjusts itself to glean specific patterns from the data. Again, in the case of an image classifier network, when you train the AI model with quality examples, each layer detects a specific class of features. For instance, the first layer might detect horizontal and vertical edges, while the next layers might detect corners and round shapes. Further down the network, deeper layers will start to pick out more advanced features such as faces and objects.

Each layer of the neural network will extract specific features from the input image. (source: arxiv.org)

When you run a new image through a well-trained neural network, the adjusted weights of the neurons will be able to extract the right features and determine with accuracy to which output class the image belongs.

One of the challenges of training neural networks is to find the right amount and quality of training examples. Also, training large AI models requires vast amounts of computing resources. To overcome these challenges, many engineers use “transfer learning,” a training technique where you take a pre-trained model and fine-tune it with new, domain-specific examples. Transfer learning is especially efficient when there’s already an AI model that is close to your use case.
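As a toy illustration of the idea, the sketch below reuses a "pre-trained" feature layer as-is and trains only a new output neuron on a handful of new examples. This is one common variant of transfer learning (freezing the earlier layers), not the only one. Everything here is hypothetical: the frozen weights are hand-written stand-ins for a model trained elsewhere, and the new task and its examples are invented for the example.

```python
import math

# Stand-in for a pre-trained feature layer. These numbers are made up;
# a real model would load weights learned on a large dataset.
PRETRAINED_WEIGHTS = [[1.0, 1.0], [1.0, -1.0]]

def extract_features(inputs):
    """Frozen layer: its weights are used as-is and never updated."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in PRETRAINED_WEIGHTS]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def head_output(head_weights, head_bias, inputs):
    """New output neuron stacked on top of the frozen features."""
    features = extract_features(inputs)
    return sigmoid(sum(w * f for w, f in zip(head_weights, features)) + head_bias)

def train_head(examples, epochs=500, learning_rate=0.5):
    """Adjust only the new neuron's weights with gradient descent."""
    head_weights, head_bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            features = extract_features(inputs)
            error = head_output(head_weights, head_bias, inputs) - label
            head_weights = [w - learning_rate * error * f
                            for w, f in zip(head_weights, features)]
            head_bias -= learning_rate * error
    return head_weights, head_bias

# New, domain-specific task (hypothetical): is the first value larger than the second?
new_examples = [([2.0, 1.0], 1), ([1.0, 2.0], 0), ([3.0, 0.0], 1),
                ([0.0, 3.0], 0), ([1.5, 0.5], 1), ([0.5, 1.5], 0)]

head_weights, head_bias = train_head(new_examples)
predictions = [int(head_output(head_weights, head_bias, inputs) > 0.5)
               for inputs, _ in new_examples]
```

Because only three numbers are being trained, six examples are enough here. That is the appeal of the approach: the reused layer already does most of the work.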

Neural networks vs classical AI

Traditional, rule-based AI programs were based on the principles of classic software. Computer programs are designed to run operations on data stored in memory locations, and save the results in a different memory location. The logic of the program is sequential, deterministic and based on clearly defined rules. Operations are run by one or more central processors.

Neural networks, however, are neither sequential nor deterministic. Also, regardless of the underlying hardware, there’s no central processor controlling the logic. Instead, the logic is dispersed across thousands of smaller artificial neurons. ANNs don’t run instructions; instead, they perform mathematical operations on their inputs. It’s their collective operations that develop the behavior of the model.

Instead of representing knowledge through manually coded logic, neural networks encode their knowledge in the overall state of their weights and activations. Tesla AI chief Andrej Karpathy eloquently describes the software logic of neural networks in an excellent Medium post titled “Software 2.0”:

The “classical stack” of Software 1.0 is what we’re all familiar with — it is written in languages such as Python, C++, etc. It consists of explicit instructions to the computer written by a programmer. By writing each line of code, the programmer identifies a specific point in program space with some desirable behavior.

In contrast, Software 2.0 can be written in much more abstract, human unfriendly language, such as the weights of a neural network. No human is involved in writing this code because there are a lot of weights (typical networks might have millions), and coding directly in weights is kind of hard (I tried).
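A tiny example can make the contrast tangible. Below, the same behavior (a logical AND) is written twice: once as explicit instructions and once as a pair of weights and a bias fed into a neuron. The function names are hypothetical, and the weights are hand-picked for illustration only; in a real network they would be learned from examples.

```python
def rule_based_and(a, b):
    """Explicit instructions: the programmer spells out the logic."""
    if a == 1 and b == 1:
        return 1
    return 0

# The same behavior stored as numbers rather than instructions.
# Hand-picked values for illustration; normally training would find them.
WEIGHTS = [1.0, 1.0]
BIAS = -1.5

def neuron_and(a, b):
    """Weighted sum plus bias, followed by a simple threshold."""
    total = WEIGHTS[0] * a + WEIGHTS[1] * b + BIAS
    return 1 if total > 0 else 0

cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
rule_outputs = [rule_based_and(a, b) for a, b in cases]
neuron_outputs = [neuron_and(a, b) for a, b in cases]
```

Both lists come out identical, but nothing in `WEIGHTS` or `BIAS` reads like a rule. Changing the behavior means changing numbers, not rewriting logic.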

Neural networks vs other machine learning techniques

Artificial neural networks are just one of several algorithms for performing machine learning, the branch of artificial intelligence that develops behavior based on experience. There are many other machine learning techniques that can find patterns in data and perform tasks such as classification and prediction. Some of these techniques include regression models, support vector machines (SVM), k-nearest neighbors (k-NN) and decision trees.
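To give a feel for how simple some of these alternatives can be, here is a minimal sketch of a k-nearest neighbors classifier. It has no weights and no training phase: it labels a new point by majority vote among the closest labeled examples. The two-number "transactions" and their labels are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(training_data, point, k=3):
    """Label a point by majority vote among its k closest examples."""
    nearest = sorted(training_data,
                     key=lambda example: math.dist(example[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled examples: (feature values, label).
training_data = [
    ((1.0, 1.0), "safe"), ((1.5, 2.0), "safe"), ((2.0, 1.0), "safe"),
    ((8.0, 9.0), "fraudulent"), ((9.0, 8.0), "fraudulent"),
    ((8.5, 9.5), "fraudulent"),
]

first = knn_predict(training_data, (1.2, 1.4))
second = knn_predict(training_data, (9.0, 9.0))
```

Here `first` comes out as "safe" and `second` as "fraudulent", and you can see exactly which neighbors produced each answer.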

When it comes to dealing with messy and unstructured data such as images, audio and text, however, neural networks generally outperform other machine learning techniques.

For example, if you wanted to perform image classification tasks with classic machine learning algorithms, you would have to do plenty of “feature engineering,” a complicated and arduous process that would require the efforts of several engineers and domain experts. Neural networks and deep learning algorithms don’t require feature engineering and automatically extract features from images if trained well.

This doesn’t mean, however, that neural networks are a replacement for other machine learning techniques. Other types of algorithms require fewer compute resources and are less complicated, which makes them preferable when you’re trying to solve a problem that doesn’t require neural networks.

Other machine learning techniques are also interpretable (more on this below), which means it’s easier to investigate and correct decisions they make. This might make them preferable in use cases where interpretability is more important than accuracy.

The limits of neural networks

In spite of their name, artificial neural networks are very different from their human equivalent. And although neural networks and deep learning are the state of the art in AI today, they’re still a far cry from human intelligence. Therefore, neural networks will fail at many things that you would expect from a human mind:

Neural networks need lots of data: Unlike the human brain, which can learn to do things with very few examples, neural networks need thousands or even millions of examples.
Neural networks are bad at generalizing: A neural network will perform accurately at a task it has been trained for, but very poorly at anything else, even if it’s similar to the original problem. For instance, a cat classifier trained on thousands of cat pictures will not be able to detect dogs. For that, it will need thousands of new images. Unlike humans, neural networks don’t develop knowledge in terms of symbols (ears, eyes, whiskers, tail)—they process pixel values. That’s why they will not be able to learn about new objects in terms of high-level features and they need to be retrained from scratch.
Neural networks are opaque: Since neural networks express their behavior in terms of neuron weights and activations, it is very hard to determine the logic behind their decisions. That’s why they’re often described as black boxes. This makes it hard to find out if they’re making decisions based on the wrong factors.

AI expert and neuroscientist Gary Marcus explained the limits of deep learning and neural networks in an in-depth research paper last year.

Also, neural networks aren’t a replacement for good old-fashioned rule-based AI in problems where the logic and reasoning are clear and can be codified into distinct rules. For instance, when it comes to solving math equations, neural networks perform very poorly.

There are several efforts to overcome the limits of neural networks, such as a DARPA-funded initiative to create explainable AI models. Other interesting developments include hybrid models that combine neural networks and rule-based AI to create AI systems that are interpretable and require less training data.

Although we still have a long way to go before we reach the goal of human-level AI (if we ever reach it at all), neural networks have brought us much closer. It’ll be interesting to see what the next AI innovation will be.