Activation Functions within Neural Networks

In this post you will learn about the most common activation functions in deep learning and when you should use them. You will also discover why you almost always need non-linear activation functions.

It is important to know which activation functions to use in your neural network. Keep in mind that you can use different activation functions in different layers. In my previous posts I only used the sigmoid function, but other functions often work much better.

tanh 

An activation function that almost always works better than the sigmoid function is the tanh activation function.

Mathematically, the tanh function is a shifted and rescaled version of the sigmoid function. The sigmoid function maps values into the range between 0 and 1, whereas the tanh function maps them into the range between -1 and 1.
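To make the relationship between the two functions concrete, here is a minimal sketch in plain Python. It relies on the standard identity tanh(z) = 2 * sigmoid(2z) - 1. The helper names `sigmoid` and `tanh_from_sigmoid` are made up for this example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_from_sigmoid(z):
    """tanh written as a rescaled and shifted sigmoid: 2 * sigmoid(2z) - 1."""
    return 2.0 * sigmoid(2.0 * z) - 1.0

# Compare the built-in tanh with the sigmoid-based version at a few points
for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  "
          f"tanh={math.tanh(z):.4f}  2*sigmoid(2z)-1={tanh_from_sigmoid(z):.4f}")
```

The last two columns should agree up to floating point rounding, while the sigmoid column stays between 0 and 1.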

 

 

Using it in the hidden units of a neural network almost always works better than using the sigmoid function.

Because its values lie between -1 and +1, the activations that come out of the hidden layer have a mean close to zero, which makes learning for the next layer a little bit easier.
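You can check the zero-mean effect with a small experiment. The sketch below assumes the inputs to the hidden units (the z values) are themselves centered around zero. Here they are simply random Gaussian numbers, which is an assumption made for illustration and not real network data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical pre-activations z of one hidden layer: zero-mean Gaussian noise
zs = [random.gauss(0.0, 1.0) for _ in range(10_000)]

mean_sigmoid = sum(sigmoid(z) for z in zs) / len(zs)
mean_tanh = sum(math.tanh(z) for z in zs) / len(zs)

print(f"mean sigmoid activation: {mean_sigmoid:.3f}")
print(f"mean tanh activation:    {mean_tanh:.3f}")
```

Under this assumption the sigmoid activations average out near 0.5, while the tanh activations average out near 0.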

The one exception where the sigmoid function is still the better choice is the output layer of a binary classification problem, with the ReLU function in the hidden layers. When you want to predict either 0 or 1, it makes sense that y-hat should be between 0 and 1 and not between -1 and +1.
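As a rough sketch of that setup, the following forward pass uses ReLU in one hidden layer and a single sigmoid unit at the output. The function `predict`, the weights, and the 0.5 decision threshold are all made up for this example. Nothing here is trained.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, W1, b1, w2, b2):
    """Forward pass: ReLU hidden layer, then a single sigmoid output unit."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    z_out = sum(w * h for w, h in zip(w2, hidden)) + b2
    y_hat = sigmoid(z_out)              # always strictly between 0 and 1
    label = 1 if y_hat >= 0.5 else 0    # assumed decision threshold of 0.5
    return y_hat, label

# Made-up weights for 2 inputs and 3 hidden units (illustration only)
W1 = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
b1 = [0.0, 0.1, -0.2]
w2 = [1.2, -0.6, 0.9]
b2 = -0.3

y_hat, label = predict([1.0, 2.0], W1, b1, w2, b2)
print(f"y_hat={y_hat:.3f}  predicted class={label}")
```

Because the last step is a sigmoid, y-hat can be read as the predicted probability of class 1.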

rectified linear unit (relu)

Another very popular activation function in machine learning is the Rectified Linear Unit, usually just called ReLU.

It looks like this:

[Figure: plot of the ReLU function]

The ReLU function is defined as a = max(0, z). Its derivative is 1 as long as z (a point on the x-axis) is positive, and 0 when z is negative.
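In code, ReLU and its derivative come down to a comparison with zero. This is a minimal sketch. The derivative is not defined at exactly z = 0, and returning 0 there is a convention assumed for this example.

```python
def relu(z):
    """ReLU: passes positive values through and clips negatives to 0."""
    return max(0.0, z)

def relu_derivative(z):
    """Slope of ReLU: 1 for positive z, 0 for negative z.

    At exactly z == 0 the derivative is undefined; this sketch returns 0
    (an assumed convention).
    """
    return 1.0 if z > 0 else 0.0

for z in (-3.0, -0.1, 0.5, 4.0):
    print(f"z={z:+.1f}  relu={relu(z):.1f}  slope={relu_derivative(z):.1f}")
```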

If you're not sure which function to use for your hidden layers, the ReLU function is a good default. Be aware, though, that there are no perfect guidelines about which function to use, because your data and your problem will always be unique. Choosing the right one is more of an art than a science, so try out several options if you're not sure.

leaky rectified linear unit

The leaky ReLU function is a slightly changed version of the ReLU function. Instead of the slope being zero when z is negative, the function has a slight slope there.

 

[Figure: plot of the leaky ReLU function]

This works a bit better most of the time but isn’t used that much in practice.
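Here is a minimal sketch of leaky ReLU. The slope for negative z is a parameter, called `alpha` here. The value 0.01 is a commonly used default and is assumed for this example.

```python
def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: like ReLU, but negative inputs keep a small slope alpha."""
    return z if z > 0 else alpha * z

def leaky_relu_derivative(z, alpha=0.01):
    """Slope of leaky ReLU: 1 for positive z, alpha otherwise."""
    return 1.0 if z > 0 else alpha

for z in (-3.0, -0.5, 0.5, 3.0):
    print(f"z={z:+.1f}  leaky_relu={leaky_relu(z):+.3f}  "
          f"slope={leaky_relu_derivative(z):.2f}")
```

Unlike plain ReLU, the slope never drops to exactly zero, so negative inputs still pass a small gradient back.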

An advantage of both functions is that, for a large part of the range of z, the slope of the activation function is far from zero, which lets your neural network learn much faster. The sigmoid and tanh functions, by contrast, flatten out for large positive or negative z.
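To see the difference in slopes, the sketch below compares the derivatives of sigmoid, tanh, and ReLU at a few values of z. The function names and the chosen z values are picked for illustration only.

```python
import math

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def tanh_derivative(z):
    return 1.0 - math.tanh(z) ** 2

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0

# The sigmoid and tanh slopes shrink quickly as z grows; the ReLU slope does not
for z in (0.5, 2.0, 5.0, 10.0):
    print(f"z={z:4.1f}  sigmoid'={sigmoid_derivative(z):.6f}  "
          f"tanh'={tanh_derivative(z):.6f}  relu'={relu_derivative(z):.1f}")
```

Since gradient descent updates scale with these slopes, a slope close to zero means very small updates for that unit.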

Why do you need non-linear activation functions?

If we use a linear activation function in the hidden layers, our neural network just outputs a linear function of the input. That happens no matter how many layers the network has. With a sigmoid output unit, this makes the network no better than logistic regression.
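The collapse into a single linear function can be verified numerically. The sketch below builds a two-layer network with linear (identity) activations and then a single layer with W = W2 * W1 and b = W2 * b1 + b2. All weights and the helper functions `matvec`, `matmul`, and `vec_add` are made up for this example.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

# Made-up parameters: 2 inputs -> 3 hidden units -> 1 output
W1 = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.4]

x = [1.5, -0.5]

# Two layers, both with a linear (identity) activation
hidden = vec_add(matvec(W1, x), b1)
two_layer = vec_add(matvec(W2, hidden), b2)

# One equivalent linear layer: W = W2 * W1, b = W2 * b1 + b2
W = matmul(W2, W1)
b = vec_add(matvec(W2, b1), b2)
one_layer = vec_add(matvec(W, x), b)

print(two_layer, one_layer)
```

Both outputs should match up to rounding for any input x, which is why stacking linear layers adds no expressive power.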

The key takeaway is that linear activation functions in hidden layers are more or less useless, except in some very special cases.

One case where you could use a linear activation function is a regression problem where y is a real number, such as predicting house prices. But use it only at the output layer; the hidden layers should still use non-linear functions. You can see an example in the picture below.

[Figure: screenshot of an example network with a linear activation at the output layer]

Nevertheless, even then you could use a ReLU instead of a linear function at the output layer, because house prices are never negative, and get the same result. This is one of the reasons why a linear activation function is rarely used nowadays.
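A quick check shows when the two output activations agree. The sketch below assumes the values reaching the output unit are non-negative in the cases that matter, which fits a target like a price. The sample values are invented.

```python
def linear(z):
    """Linear (identity) activation: returns z unchanged."""
    return z

def relu(z):
    return max(0.0, z)

# Hypothetical values arriving at the output unit
z_values = [250.0, 13.5, 0.0, -4.0]

for z in z_values:
    print(f"z={z:6.1f}  linear={linear(z):6.1f}  relu={relu(z):6.1f}")
```

For every non-negative z the two outputs are identical. They differ only for negative z, where ReLU returns 0, so the swap is only safe when the target can never be negative.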
