3 Mathematical Laws To Know As A Data Scientist

Cornellius Yudha Wijaya
February 12, 2021 AI & Machine Learning

Some interesting laws that help you as a Data Scientist

Although data scientists work with data as their main activity, that doesn’t mean mathematical knowledge is unnecessary. Data scientists need to learn and understand the mathematical theory behind machine learning to solve business problems efficiently.

The mathematics behind machine learning is not just random notation thrown here and there; it consists of many theories and ideas. These ideas gave rise to many mathematical laws that contribute to the machine learning we are able to use today. You can apply the mathematics in whatever way helps solve your problem, and mathematical laws are not limited to machine learning after all.

In this article, I want to outline some of the interesting mathematical laws that could help you as a Data Scientist. Let’s get into it.

Benford’s Law

Benford’s law, also called the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is a mathematical law about the leading digits of numbers in real-world datasets.

When we think about the first digit of a number, we might expect it to be uniformly distributed when we pick a number at random. Intuitively, a leading digit of 1 should have the same probability as a leading digit of 9, which is 1/9, or ~11.1%. Surprisingly, this is not what happens.

Benford’s law states that the leading digit is likely to be small in many naturally occurring collections of numbers. Leading digit 1 happens more often than 2, leading digit 2 occurs more often than 3, and so on.

Let’s use a real-world dataset to see how this law applies. For this article, I use the Kaggle dataset of Spotify tracks from 1921–2020. From the data, I take the leading digit of each song’s duration.

[Figure: distribution of the leading digit of the Spotify song durations. Image created by Author]

From the image above, we can see that the leading digit 1 occurs the most, and the frequency decreases as the digit gets higher. This is what Benford’s law states.
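As a rough illustration of how the leading digits could be extracted and counted, here is a minimal Python sketch. The function names and the sample durations are hypothetical and are not taken from the Spotify dataset or from the code behind the figure.

```python
from collections import Counter


def leading_digit(value):
    """Return the first significant digit (1-9) of a non-zero number."""
    # Scientific notation with 17 significant digits starts with the
    # leading digit and avoids rounding 99999999 up to 1.0e+08.
    return int(f"{abs(value):.16e}"[0])


def leading_digit_counts(values):
    """Count how often each digit 1-9 leads the non-zero values."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    return {d: counts.get(d, 0) for d in range(1, 10)}


# Hypothetical durations in milliseconds, made up for this example.
durations_ms = [126903, 180533, 500062, 210000, 166693, 98200, 1532, 0.042]
counts = leading_digit_counts(durations_ms)

for digit, count in counts.items():
    print(digit, count)
```

With a real dataset, the same function could be applied to the duration column to build the kind of distribution discussed here.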

More formally, a set of numbers is said to satisfy Benford’s law if the leading digit d (d ∈ {1, …, 9}) occurs with probability

$$P(d) = \log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

From this equation, we get the following distribution of leading digits.

Leading digit   Probability
1               30.1%
2               17.6%
3               12.5%
4               9.7%
5               7.9%
6               6.7%
7               5.8%
8               5.1%
9               4.6%

With this distribution, we can predict that 1 appears as the leading digit about 30% of the time, more often than any other digit.

There are many applications for this law, for example, fraud detection in tax forms, election results, economic numbers, and accounting figures.
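The expected share of each leading digit can be computed directly. The sketch below assumes the commonly stated form of Benford’s law, P(d) = log10(1 + 1/d); the function name `benford_probability` is made up for this example.

```python
import math


def benford_probability(d):
    """Expected probability of leading digit d under Benford's law."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)


# Expected probability for every possible leading digit.
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine probabilities sum to 1 because the logarithms telescope to log10(10). Comparing these expected shares with the observed leading-digit counts of a dataset is one simple way to check how closely it follows the law.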

Law of Large Numbers (LLN)

The Law of Large Numbers states that as the number of trials of a random process increases, the average of the results gets closer to the expected, or theoretical, value.

For example, consider rolling a die. The possible outcomes of a six-sided die are 1, 2, 3, 4, 5, and 6, so the expected value of a fair die is 3.5. Each roll gives a random number from 1 to 6, but as we keep rolling, the average of the results gets closer to the expected value of 3.5. This is what the Law of Large Numbers describes.

While the law is useful, the tricky part is that you need many experiments or occurrences. On the other hand, because it relies on large numbers, it is well suited to predicting long-term stability.

The Law of Large Numbers is different from the so-called Law of Averages, which expresses the belief that the outcomes of a random event will “even out” within a small sample. This belief is known as the “Gambler’s Fallacy,” where we expect the expected value to show up in a small sample.
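The dice example can be simulated in a few lines. This is a minimal sketch using Python’s standard `random` module; the function name, the seed, and the number of rolls are arbitrary choices for illustration.

```python
import random


def running_means(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and track the running mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        means.append(total / i)
    return means


means = running_means(100_000)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} rolls: mean = {means[n - 1]:.4f}")
```

The early averages can be far from 3.5, while the later ones should settle close to it, which is the behavior the law describes.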
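To see why the “evening out” belief fails in the short run, we can simulate fair coin flips and look only at the flips that come right after a streak of heads. This is an illustrative sketch; the function name, streak length, and seed are assumptions made for the example.

```python
import random


def heads_rate_after_streak(n_flips, streak=3, seed=0):
    """Share of heads among flips that directly follow `streak` heads in a row."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True means heads
    following = [
        flips[i]
        for i in range(streak, n_flips)
        if all(flips[i - streak:i])  # the previous `streak` flips were all heads
    ]
    return sum(following) / len(following) if following else float("nan")


rate = heads_rate_after_streak(200_000)
print(f"heads rate right after three heads: {rate:.3f}")
```

If the coin “owed” us tails after a run of heads, this rate would fall clearly below 0.5. For independent flips it should stay close to 0.5, since each flip does not depend on the previous ones.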

Zipf’s Law

Zipf’s law was formulated in quantitative linguistics. It states that, given a corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word occurs approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.

For example, using the previous Spotify dataset, I split the text into words and punctuation and count them. Below are the 12 most common tokens and their frequencies.
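In symbols, the statement above can be written as follows, where f(r) denotes the frequency of the word with rank r. The notation is introduced here for illustration and assumes the classic form of the law with exponent 1.

```latex
f(r) \propto \frac{1}{r}
\qquad\Longrightarrow\qquad
f(r) \approx \frac{f(1)}{r}, \quad r = 1, 2, 3, \dots
```

So f(2) ≈ f(1)/2 and f(3) ≈ f(1)/3, which matches the “twice as often” and “three times as often” statements.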

[Figure: the 12 most common words in the Spotify dataset and their frequencies. Image created by Author]

When I count all the words in the Spotify corpus, the total is 759389. We can check whether Zipf’s law applies to this dataset by computing the probability of each word. The most frequent token is the punctuation mark ‘-’ with 32258 occurrences, a probability of ~4%, followed by ‘the’ with a probability of ~2%.

Faithful to the law, the probability keeps going down from word to word. Of course, there is some deviation, but most of the time the probability decreases as the frequency rank increases.
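A rank-frequency check of this kind could be sketched as follows. The tokenization rule, the function name, and the sample text are assumptions for illustration; they are not the exact procedure or data used for the Spotify corpus.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, token, count, probability, zipf_count) for each token."""
    # Runs of letters/apostrophes are words; any other non-space
    # character (punctuation, digits) becomes its own token.
    tokens = re.findall(r"[a-z']+|[^\sa-z']", text.lower())
    if not tokens:
        return []
    total = len(tokens)
    ranked = Counter(tokens).most_common()
    top_count = ranked[0][1]
    rows = []
    for rank, (token, count) in enumerate(ranked, start=1):
        # Under Zipf's law the count at a given rank is about top_count / rank.
        rows.append((rank, token, count, count / total, top_count / rank))
    return rows


# Hypothetical text standing in for a column of song titles.
sample = "the end - the beginning - the road of the night"
rows = rank_frequency(sample)

for rank, token, count, prob, zipf_count in rows[:5]:
    print(rank, token, count, f"{prob:.2%}", f"{zipf_count:.1f}")
```

On a large corpus, putting the observed counts next to the `top_count / rank` column gives a quick sense of how closely the data follows the law.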

Conclusion

These are some interesting mathematical laws to know as a Data Scientist, and they will definitely help you in your data science work. The laws are:

  • Benford’s Law
  • Law of Large Numbers
  • Zipf’s Law

I hope it helps!
