Everything You Should Know About Gradient Descent

Arthur Mello
September 23, 2020 Big Data, Cloud & DevOps

This is one of the most popular optimisation algorithms in Data Science. Do you know how it works?

Why should I read this?

Most Machine Learning models use some sort of optimisation algorithm to find the parameters that yield the smallest error possible, and Gradient Descent is probably the most popular of those algorithms.

If you are a Data Scientist or work with Machine Learning, you certainly work with some algorithm that uses Gradient Descent. Even though you will probably never have to code it from scratch yourself, understanding how it works can help you understand why your model is giving poor results, or not converging at all.

How does it work?

Gradient Descent finds the parameters that result in a local minimum of a function. In ML algorithms, this function is usually some sort of cost function, related to the difference between the model output and the actual observations.

Intuition

Gradient Descent starts at a random point in the function and “walks” towards the lowest point around, until it reaches a local minimum. The starting point is a random combination of parameters, which yields a certain cost (error). The next step is to find a point nearby where this cost is lower, and move to it.

Imagine you are walking on a mountain in the dark, trying to find the lowest point possible. You slowly feel the way, always trying to walk downhill. The gentler the slope gets, the smaller your steps become, since a flattening slope suggests you are approaching the bottom.


Making it formal

How do you control that speed when searching over a function, though? By using the derivative of the function at that specific point:

θ := θ − ⍺ · ∂J(θ)/∂θ

You then sequentially update your parameters θ with the derivative of the cost function J(θ) at the point you are currently at, scaled by ⍺.

⍺ is the learning rate (how much you adjust the size of the “step” when the algorithm “walks”). If ⍺ is too small, the algorithm can be too slow. If ⍺ is too big, the algorithm can fail to converge or even diverge.

:= is an assignment operator, meaning we assign such value to our variable.

So, ideally, you want your learning rate to be as high as possible (so your algorithm runs as fast as possible), without being so high that it diverges. Next time your model doesn’t converge, try using a smaller learning rate.
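The update rule and the effect of the learning rate can be seen in a minimal sketch. The cost function below (a simple quadratic) and the function name `gradient_descent` are illustrative choices, not from the original article:

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, n_iters=100):
    """Repeatedly apply theta := theta - alpha * dJ/dtheta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        theta = theta - alpha * grad(theta)  # step against the gradient
    return theta

# Toy cost J(theta) = (theta - 3)^2, whose derivative is 2 * (theta - 3).
grad = lambda theta: 2 * (theta - 3)

print(gradient_descent(grad, theta0=0.0, alpha=0.1))  # converges near 3
print(gradient_descent(grad, theta0=0.0, alpha=1.1))  # alpha too large: diverges
```

Running this with ⍺ = 0.1 lands very close to the minimum at 3, while ⍺ = 1.1 overshoots further on every step, which is exactly the divergence described above.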

Alternative versions

For really big datasets, Gradient Descent can be very computationally expensive, since it has to compute the gradient of J over the whole dataset at every iteration. So, let’s look at some alternatives that work better for large amounts of data.

Stochastic Gradient Descent

Stochastic Gradient Descent starts by randomly shuffling your data. This is important because it will read through the observations step by step, so they shouldn’t be in any particular order. Then, it will calculate the cost function only for the first observation and take the first step towards the minimum based only on that observation. For the second iteration, it will do the same, based solely on the second observation. And so on, until it reaches the last observation in the training set. If you have a lot of data, this should still yield satisfactory results within reasonable time.
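The shuffle-then-update-per-observation loop above can be sketched for linear regression with a squared-error cost. The function name and the synthetic data are illustrative assumptions, not from the article:

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, n_epochs=50, seed=0):
    """One parameter update per observation, after shuffling each epoch."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        order = rng.permutation(len(y))      # shuffle the data first
        for i in order:                      # one observation per step
            error = X[i] @ theta - y[i]
            theta -= alpha * error * X[i]    # gradient of this point's squared error
    return theta

# Synthetic data following y = 1 + 2x, with a bias column in X.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
print(sgd_linear_regression(X, y))  # roughly [1, 2]
```

Note that each step uses a noisy, single-observation estimate of the gradient, which is why shuffling matters: any ordering in the data would bias the sequence of updates.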

Mini-Batch Gradient Descent

The Mini-Batch version is somewhere in between the Stochastic and the classic ones. Here, instead of using only one training example at each iteration, it uses a mini-batch of size b, a user-defined parameter. This means that if we set b = 10, for example, we would be doing a Gradient Descent using 10 observations at each iteration. This can sometimes actually be faster than the Stochastic version. If you think about it, you will see that the Stochastic version is actually a special case of the Mini-Batch, when b = 1.
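The same example can be adapted to the mini-batch version: instead of one observation per update, the gradient is averaged over b observations. Again, the function name and data are illustrative assumptions:

```python
import numpy as np

def minibatch_gd(X, y, b=10, alpha=0.05, n_epochs=100, seed=0):
    """Average the gradient over b observations per update; b=1 recovers SGD."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_epochs):
        order = rng.permutation(n)
        for start in range(0, n, b):
            idx = order[start:start + b]           # next mini-batch of size b
            grad = X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)
            theta -= alpha * grad
    return theta

# Same synthetic data as before: y = 1 + 2x with a bias column.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
print(minibatch_gd(X, y, b=10))  # roughly [1, 2]
```

Averaging over a batch also lets the inner update run as a single vectorised matrix operation, which is one reason this version can beat the stochastic one in practice.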

Conclusion

Now that you know how Gradient Descent works, you know what to try when it doesn’t converge (lower the learning rate) and how to speed it up on large datasets (use the stochastic or mini-batch versions).
