Two Fundamental Neural Network Anatomical Structures

Arun Jagota
April 16, 2021 AI & Machine Learning

Fully-connected versus Convolutional

Like other organisms, artificial neural networks have evolved through the ages. In this post, we cover two key anatomies that have emerged: fully-connected and convolutional. The second is better suited to problems such as image processing, in which there are local features in a space with geometry. The first is generally appropriate for problems in which there is no geometry and spatial locality of features is not paramount.

Single Neurons

Let’s start with models of single artificial neurons, the “Lego bricks” of neural networks. A neuron takes a vector x as input and derives a scalar output y from it. Most neuron models conform to y = f(w·x + b). Here w is a vector of weights of the same dimensionality as x, w·x is the dot product of w and x, and b is a scalar called the neuron’s bias. This is graphically depicted below.

Multiply the input vector with a weight vector, then feed through an activation function
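
The formula y = f(w·x + b) can be sketched in a few lines of plain Python. This is only an illustration: the function name `neuron`, the use of lists for x and w, and the example numbers are assumptions, not something from the original post.

```python
def neuron(x, w, b, f=lambda u: u):
    """Compute y = f(w . x + b) for a single neuron.

    x and w are equal-length lists of numbers, b is the scalar bias,
    and f is the activation function (identity, i.e. linear, by default).
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same dimensionality")
    u = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return f(u)


# Illustrative values only: 0.5*1.0 + (-1.0)*2.0 + 0.5 = -1.0
y = neuron(x=[1.0, 2.0], w=[0.5, -1.0], b=0.5)
```

Passing a different `f` is all it takes to change the neuron's behavior, which is why the activation function is described as the configurable part.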

The configurable part of this is the neuron’s activation function f. Different choices of f lead to neurons with quite different capabilities.

The most common choices are linear: f(u) = u, step: f(u) = sign(u), and sigmoid: f(u) = 1/(1 + e^(-u)). These are depicted below.

Activation Functions
Activation functions: sigmoid, linear, step
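
For readers who prefer code to plots, here is a rough sketch of the three activation functions. The function names are illustrative. The value returned by `step` at exactly u = 0 is an assumption made here (sign(0) is often defined as 0), and the two-branch form of `sigmoid` is just a standard way to avoid floating-point overflow for very negative inputs.

```python
import math


def linear(u):
    # f(u) = u
    return u


def step(u):
    # f(u) = sign(u); the value at u == 0 is taken to be +1 (an assumption)
    return 1.0 if u >= 0 else -1.0


def sigmoid(u):
    # f(u) = 1 / (1 + e^(-u)), written in two branches so that
    # math.exp never receives a large positive argument
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)
```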

When the target output y is expected to be (approximately) a linear function of x, a linear activation function is called for. This is the setting of linear regression.

When the target output y is binary, we have a choice: a sigmoidal activation function or a step activation function. Both theory and practice favor the sigmoid.

There are several reasons for this.

One is that the sigmoid is differentiable whereas the step function is not. This allows the neuron’s parameters (the weights w and the bias b) to be trained via gradient descent on a data set of input-output pairs.
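
To make the role of differentiability concrete, here is a minimal sketch of training one sigmoid neuron by gradient descent. Several things here are assumptions not stated in the post: the loss is taken to be cross-entropy (for which the gradient with respect to the pre-activation is simply output minus target), updates are made one example at a time, and the toy data set (logical OR), learning rate, and epoch count are purely illustrative.

```python
import math


def sigmoid(u):
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def predict(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_sigmoid_neuron(data, lr=0.5, epochs=2000):
    """Fit weights w and bias b on (input, target) pairs by gradient descent."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            # Assumed cross-entropy loss: d(loss)/d(w.x + b) = y - t
            err = predict(x, w, b) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Toy data set (an illustration): the logical OR of two binary inputs
or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_sigmoid_neuron(or_data)
outputs = [predict(x, w, b) for x, _ in or_data]
```

With a step activation the derivative is zero almost everywhere, so no update of this kind would be available.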

A second one is that using a sigmoid captures more information which can often be used profitably. For example, consider two situations, one in which the neuron’s output is 0.9 and one in which it is 0.7. We might be inclined to classify both as 1s. (Remember we seek a binary output.) Should we do so, it makes sense to attach higher confidence to the first one since the neuron’s output was higher.

Another way we can use the additional information in the neuron’s output is by adjusting the binary classification threshold. This lets us become more (or less) conservative in our decision-making.

The key point here is that the classification threshold can be adjusted post-training. In fact, any time we wish to. (This is especially useful after the neuron starts making decisions in the “field” and we realize we’d like to tweak its behavior.) This threshold is not a parameter during training. Only the weight vector w and bias b are.

Simply put, if we observe the neuron is overly sensitive (i.e. its outputs are sometimes towards 1 when the target is 0) we can increase the classification threshold. Similarly, if the neuron is not sensitive enough, we can decrease the classification threshold.
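
The threshold adjustment can be sketched as follows. The function name `classify` and the threshold value 0.8 are illustrative; the outputs 0.9 and 0.7 are the ones used in the example earlier in this section.

```python
def classify(output, threshold=0.5):
    """Turn a sigmoid neuron's output into a binary decision."""
    return 1 if output >= threshold else 0


outputs = [0.9, 0.7]

# Default threshold: both outputs are classified as 1
default_labels = [classify(o) for o in outputs]

# A more conservative threshold, chosen after training: only 0.9 passes
conservative_labels = [classify(o, threshold=0.8) for o in outputs]
```

Note that the weights and bias are untouched; only the decision rule applied to the neuron's output changes.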

Networks of Neurons

Well, actually a single neuron is already an example. A useful one at that. It can be used to map a vector of inputs to a numeric or binary output. That is, to solve regression and binary classification problems.

Often, not very well though.

The first breakthrough is an intermediate layer of neurons between the input and the output. This is graphically depicted below.

A hidden layer between X and y. Sigmoidal hidden neurons are especially attractive

The intermediate layer is called a hidden layer. In the schematic, we have used sigmoidal neurons in the hidden layer. Linear hidden neurons are less useful, and step neurons have the issues we discussed earlier.

A sigmoidal hidden neuron may be viewed as representing some binary feature of the input. The neuron’s value is derived from the input vector x. This value may be viewed as the probability that the associated feature is present in the input.

A neural network with a hidden layer maps an input vector to a vector in a space of features. This mapping is nonlinear. The feature vector is then mapped to the output. This indirect approach results in an architecture that is in principle more powerful than one without the hidden layer.
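
As a small illustration of the extra power, the sketch below runs a forward pass through a network with two sigmoidal hidden neurons that computes XOR, a mapping that no single neuron of the form f(w·x + b) can represent. The weights are hand-chosen for this example (hypothetical values, not learned and not from the original post): one hidden neuron acts roughly as an OR feature, the other as a NAND feature, and the output neuron combines them.

```python
import math


def sigmoid(u):
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def layer(x, weights, biases):
    """Apply one layer of sigmoid neurons; each row of weights is one neuron."""
    return [
        sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for w, b in zip(weights, biases)
    ]


# Hand-chosen, hypothetical weights
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]  # roughly OR and NAND features
HIDDEN_B = [-10.0, 30.0]
OUT_W = [[20.0, 20.0]]                     # roughly AND of the two features
OUT_B = [-30.0]


def network(x):
    features = layer(x, HIDDEN_W, HIDDEN_B)   # input -> feature vector (nonlinear)
    return layer(features, OUT_W, OUT_B)[0]   # feature vector -> output


xor_outputs = {x: network(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

The outputs are close to 0 for (0, 0) and (1, 1) and close to 1 for (0, 1) and (1, 0).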

In practice, there are some issues. How many neurons should go into the hidden layer? This depends on the complexity of the input-to-output mapping problem, which may not be known. For linear problems, we may not need any hidden neurons. In fact, having them might hurt.

So how do we know how many to use? The short answer is we don’t. That said, we could try different numbers of hidden neurons and pick the one that works best, or at least adequately.

On to the next question. For a fixed number of hidden neurons, how do the features they represent get learned? The short answer is via the learning process, typically a form of gradient descent called back-propagation. We won’t go into the details here.

That said, we will depict the roles the various weights touching a hidden neuron play.

Feature Function
The input-to-feature weights learn the feature function. The weights from the various features to the output control the relative influence of the various features on the final output.

Finally, let’s mention that, as before, the output neuron is sigmoidal for a binary classification problem and linear for a regression problem.

Image Classification Problems

Say we have a large set of images, some containing cats, others not. We’d like to learn a classifier that can tell whether or not an image has a cat in it.

Say each of our images has lots of pixels: 100 × 100 = 10,000 for concreteness.

In principle, we could map this problem to a neural network with one hidden layer, as depicted below.

Lots of input neurons. With even a moderate number of hidden neurons, this means lots and lots of input-to-hidden weights!

In practice, this approach has difficulties. We have 10,000 input neurons. With a hidden layer of m neurons, this means we have 10,000 × m input-to-hidden weights. It’s hard to imagine that we could use fewer than 20 hidden neurons to adequately learn a cat-or-not classifier.

20 hidden neurons means 200,000 input-to-hidden weights. That’s a lot of weights to learn! Even with a rich training set, overfitting is a significant risk.
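
The arithmetic is simple enough to check in code. The helper name below is illustrative, and it counts only weights (not biases), as the text does.

```python
def input_to_hidden_weights(n_inputs, n_hidden):
    """Number of weights in a fully-connected input-to-hidden layer."""
    return n_inputs * n_hidden


n_pixels = 100 * 100                                # 10,000 input neurons
n_weights = input_to_hidden_weights(n_pixels, 20)   # 200,000 weights
```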

Let’s think differently. Is there structure in this domain (images) that we might be able to exploit? It turns out the answer is yes. First some observations.

  1. The input neurons are in a spatial grid.
  2. Features in an image are often local.
  3. The same feature may occur at different locations in the input.

Let’s expand on 2 and 3. Consider the picture below. It shows horizontal edges at various locations. Each edge is the same feature but at a different location. Each of these feature occurrences is also local. Local just means that to detect the edge at any one location, one only needs to look at pixel values in the proximity of the edge.

Local Feature
A local feature (horizontal edge) that occurs at many locations in the image

Okay, we now see that each feature is also on the same spatial grid as the input pixels. The implication of this is that for any one feature, there isn’t a single value, rather a grid of values. For each location (i,j), feature f has a value that indicates whether f is present or absent at that location.

Great. Seems like we have gone in the opposite direction. Seems like we have made the problem more complex. Instead of a feature having a single value, it now has a spatial grid of values, one per location.

Not really. How would we represent a local feature in a fully-connected multi-layer neural network (MLNN)? How would we separate the actual feature function from the location where it applies, and also leverage its locality? We can’t.

The MLNN’s inability to (i) exploit feature locality and (ii) evaluate the same feature at many different locations means that we need a lot of hidden neurons to model the combination of the functional and spatial aspects of a feature. (Functional meaning what the feature is, spatial meaning where it is evaluated.) On top of that, because we are unable to exploit locality, the number of input-to-hidden weights explodes.

lots of input neurons × lots of hidden neurons × fully-connected → lots and lots of input-to-hidden weights → an overly complex network

The picture below depicts the alternative that leverages the input geometry, the locality of features, and the need to evaluate a feature at many different locations.

The value of a feature is itself a map. The feature’s function is scanned over the entire input image to produce its feature map.

Let’s zoom into the feature’s neuron at location (i,j).

Locality made explicit with the receptive field. The feature’s weights depend only on its function, not on the location it is evaluated.

Okay, so any single (local) feature is represented by the same small set of weights. We just slide this feature’s function (called a kernel) over the various locations (i, j) to get a reading on the feature’s values over the entire grid. This sliding process is called convolution.
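
Here is a minimal sketch of the sliding process in plain Python. The function name `feature_map`, the tiny 4 × 4 image, and the 2 × 3 horizontal-edge kernel are all made up for illustration. The sketch omits the bias and activation function, and it does not flip the kernel; strictly speaking that makes it a cross-correlation, which is what "convolution" usually means in the neural network setting.

```python
def feature_map(image, kernel):
    """Slide the kernel over every location where it fits entirely
    inside the image, returning the grid of weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Receptive field: only pixels near (i, j) contribute,
            # and the same kernel weights are reused at every location
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += kernel[di][dj] * image[i + di][j + dj]
            row.append(total)
        result.append(row)
    return result


# Dark top half, bright bottom half: one horizontal edge in the middle
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

# Hypothetical kernel that responds to a dark-above, bright-below edge
edge_kernel = [
    [-1, -1, -1],
    [1, 1, 1],
]

fmap = feature_map(image, edge_kernel)  # [[0, 0], [3, 3], [0, 0]]
```

The feature map is nonzero only in the row where the kernel straddles the edge, and the six kernel weights are the only parameters involved regardless of the image size.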

By contrast, the MLNN has no explicit mechanism for exploiting either locality or weight sharing.
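
A back-of-the-envelope comparison shows what locality and weight sharing buy. The 3 × 3 kernel size is a hypothetical choice, and treating 20 convolutional features as the counterpart of 20 hidden neurons is a simplification for illustration, not a claim that the two networks are equally capable.

```python
n_features = 20

# Fully-connected: every hidden neuron has a weight for every one of 100 x 100 pixels
fc_weights = 100 * 100 * n_features              # 200,000

# Convolutional: each feature reuses one small kernel at every location
kernel_h = kernel_w = 3                          # hypothetical kernel size
conv_weights = kernel_h * kernel_w * n_features  # 180
```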

Summary

We have covered the anatomical structures of the two most important (feedforward) neural network architectures: fully-connected multi-layer neural networks (MLNNs) and convolutional neural networks. We have discussed why convolutional neural networks are better suited to image processing than MLNNs. On the other hand, MLNNs are well-suited to problems in which locality and convolutions don’t come into play in obvious ways.

Further Reading

  1. http://deeplearning.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
  2. http://deeplearning.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/