When performing statistical analysis, we often face a dilemma: should we take a Frequentist or a Bayesian approach to the problem? The choice becomes critical when working with small datasets. If you pick one method over the other without understanding the assumptions and limitations of both, you increase your chance of drawing a wrong inference.
The philosophical divide between Frequentist and Bayesian statistics goes back 250 years. The Bayesian approach dominated 19th-century statistics, while the Frequentist approach gained popularity in the 20th century. The million-dollar question now is: which philosophy will rule the 21st century?
The Bayesian vs. Frequentist debate reflects two different attitudes towards statistical analysis. There’s a saying in statistics: “All models are wrong, but some are useful.” Both approaches have their own merits and demerits, so it is essential to understand them in order to adopt the one that meets your requirements. In this blog, we’ll provide an intuitive understanding of the differences between the two methodologies.
Bayesian vs. Frequentist problem-solving approach
Bayesian statisticians build statistical models using all the information they have, including what they knew before seeing the data, so that they can make progress as quickly as possible. Frequentist statisticians, in contrast, draw conclusions from the sample data alone, with emphasis on the frequency or proportion of outcomes in that data, and do not add their prior knowledge to the model.
Let’s understand this with an example. Suppose a Frequentist doctor and a Bayesian doctor diagnose a patient with fever caused by a sudden change in weather. How would each of them reach this diagnosis?
A Frequentist doctor would use a mental diagnostic model: he would ask about all the symptoms the patient is experiencing and then give his diagnosis based on the answers. A Bayesian doctor would use the same mental diagnostic model, but would also draw on his history of diagnosing this patient and on his awareness that the weather has changed recently and many people are catching fever because of it. Since this patient could be susceptible to the weather change too, he would diagnose the fever after asking only a few symptom-related questions.
This is how Frequentists and Bayesians take different approaches to problem-solving: the former base their inference only on the data available, while the latter also add their own beliefs to the model.
With this explanation, you may believe the Bayesian approach to be better than the Frequentist one, since it lets you reach an inference more quickly. This, however, depends on your prior belief. A strong but incorrect prior belief can be very hard to change.
Let us now explain the interpretation of probability in the two different ideologies.
Frequentist vs. Bayesian probability
Can you find the probability
- of heads if you perform identical tosses of an unbiased coin?
- of getting 4 if you identically roll a fair die?
- of getting a king if you identically choose a card from a deck?
We’ve all solved problems like these several times during our school days. This has led to the notion of probability being hard-wired in us as a point value.
However, did you notice that the term ‘identical’ appears in all the above probability statements? This word is the key that challenges the fixed-point notion of probability. It means that if you perform the experiments while keeping all parameters (i.e., the external forces acting on every experiment) fixed, then you’ll get a deterministic point estimate of probability if the experiment is repeated an infinite number of times. This is how a frequentist probability is defined: as the fraction of trials in which the event occurs, in the limit of infinitely many identical trials.
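In standard notation, this long-run frequency is written P(E) = lim (n → ∞) n_E / n, where n is the number of identical trials and n_E is the number of trials in which event E occurred (both symbols are introduced here for illustration). A minimal Python sketch, assuming a simulated fair die and an arbitrary random seed, shows how the relative frequency of rolling a 4 can be computed for a growing number of rolls:

```python
import random


def long_run_frequency(n_trials, seed=0):
    """Fraction of simulated fair-die rolls that show a 4."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for repeatability
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 4)
    return hits / n_trials


# The exact value for a fair die is 1/6; the simulated frequency
# fluctuates less around it as the number of rolls increases.
for n in (100, 10_000, 200_000):
    print(n, round(long_run_frequency(n), 4))
```

With only 100 rolls the estimate can be noticeably off, which is the practical reason the definition relies on the number of trials growing without bound.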
However, in the real world, true identical experiments are impossible to perform. Thus, parameters can’t be kept fixed, and for a fixed set of data, you will obtain different probabilities. Hence, probability here would be –
Here, possibilities mean the different parameters under consideration, each with all of its possible values, and the numerator is the total number of times an event has occurred across all those possibilities. So, probability in the Bayesian view doesn’t represent a long-run frequency (or a point value). Instead, it represents uncertainty about the initial conditions of the parameters that produced the observation. Hence, a probability in the Bayesian world takes multiple values, each of which is more or less plausible given the parameters. This representation is known as a probability distribution.
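To make the idea of probability as a distribution concrete, here is a small sketch that assigns a plausibility to every candidate value of a coin’s bias instead of a single number. The observed tosses (7 heads, 3 tails), the grid of candidate biases, and the flat starting belief are all assumptions made for illustration:

```python
def grid_posterior(heads, tails, grid_size=101):
    """Plausibility of each candidate coin bias, starting from a flat belief."""
    biases = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # every bias equally plausible at first
    # Likelihood of the observed tosses under each candidate bias.
    likelihood = [b ** heads * (1 - b) ** tails for b in biases]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return biases, [u / total for u in unnormalized]


# Hypothetical observation: 7 heads and 3 tails.
biases, posterior = grid_posterior(heads=7, tails=3)
most_plausible = max(zip(posterior, biases))[1]
print("most plausible bias:", most_plausible)
print("total plausibility:", round(sum(posterior), 6))
```

The result is a list of weights over all candidate biases rather than one point value, which is the sense in which the Bayesian answer is a distribution.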
Frequentist vs. Bayesian in A/B testing
We now know that a Frequentist treats probability as a point estimate, while a Bayesian represents probability as a distribution. Let us understand the difference between the two approaches by using an example of A/B testing.
Essentially, as the name suggests, the basic idea of an A/B test is to determine which out of two variations A and B is better in terms of a particular metric. Suppose you are interested in finding which of the two has a higher conversion rate. Let us understand this with an example –
Say you have recently launched a new blogging website and you want to increase the number of subscribers to it. You have two layouts in your mind, but you are unsure which design is better. To answer that empirically with customer response data instead of your gut feeling, you’ll need to run an A/B test where some visitors will randomly see website A while others will see website B. After running this experiment for a while, suppose you obtained the following results –
|Hits||Conversions (No. of subscribers)|
Frequentist A/B testing
As we have learned in an earlier section, frequentists work with point estimates. A frequentist estimator will simply
- Compute the conversion rate of the two variations.
|Variation|Conversion Rate (%)|
|A|5.5|
|B|6.7|
- Take the difference between the conversion rates
Variation B - Variation A: 6.7% - 5.5% = 1.2 percentage points
- Compare the difference to a pre-specified threshold. If the difference clears the threshold, then you have a winner.
This is the crux of a frequentist-based A/B test where only available data is used to make an inference. If you are interested in advanced calculations, you can play with the actual calculator.
Frequentist approaches are useful because their analytical nature makes them computationally inexpensive. However, when the data is insufficient or the analytical assumptions do not hold, they can lead to wrong conclusions.
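The three frequentist steps listed earlier in this section can be sketched in Python. The hit and conversion counts below are hypothetical, chosen only so that the rates match the 5.5% and 6.7% quoted above. The two-proportion z-test and the 0.05 significance level are one common way to make the threshold concrete, not necessarily what any particular calculator does:

```python
import math


def two_proportion_z_test(hits_a, conv_a, hits_b, conv_b):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    rate_a = conv_a / hits_a
    rate_b = conv_b / hits_b
    # Pooled rate under the hypothesis that both variations convert equally.
    pooled = (conv_a + conv_b) / (hits_a + hits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / hits_a + 1 / hits_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return rate_b - rate_a, z, 2 * (1 - cdf)


# Hypothetical counts: 1000 hits per variation, 55 and 67 conversions.
diff, z, p_value = two_proportion_z_test(1000, 55, 1000, 67)
alpha = 0.05  # assumed significance threshold
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_value:.3f}")
print("winner declared" if p_value < alpha else "no winner yet")
```

With these assumed counts the difference does not clear the threshold, which illustrates how a 1.2-point gap can still be inconclusive when the sample is small.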
Bayesian A/B testing
As explained in an earlier section, in the real world, identical experiments are impossible to perform. So, as a Bayesian, you will treat a spectrum of possible conversion rates as your beliefs and, based on the observed data, update how plausible each conversion rate in that spectrum is. Therefore, the initial question you’ll ask is
Which prior should I start with?
Although the Bayesian approach lets you incorporate a prior (your existing knowledge) into the model, in practice the most common choice of prior distribution is a vague prior in which all possibilities are equally likely. For a conversion rate, it could be as below:
Often, the choice is a non-informative prior instead of a strong prior, as the latter can dominate the posterior (our updated beliefs about the conversion rate after observing the data). If your prior belief is very firm, the data can do little to tell you something new. That’s why a non-informative prior is a good choice to start with. As the experiment progresses, you can update it based on what you learn, and you can then treat your posterior distribution as the new prior for the next experiment.
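A short sketch can show how a strong prior dominates the posterior. It assumes the prior is a Beta distribution (a common choice for conversion rates, not something stated above), and the counts and prior parameters are hypothetical:

```python
def posterior_mean(prior_a, prior_b, conversions, hits):
    """Mean of the Beta posterior obtained from a Beta(prior_a, prior_b) prior."""
    post_a = prior_a + conversions
    post_b = prior_b + (hits - conversions)
    return post_a / (post_a + post_b)


# Hypothetical data: 67 conversions out of 1000 hits (6.7% observed).
flat = posterior_mean(1, 1, 67, 1000)         # non-informative Beta(1, 1) prior
strong = posterior_mean(200, 9800, 67, 1000)  # firm prior belief in a 2% rate

print(f"flat prior   -> posterior mean {flat:.4f}")
print(f"strong prior -> posterior mean {strong:.4f}")
```

Under the flat prior the posterior mean stays close to the observed rate, while the firm prior keeps it near 2% even though the data suggests otherwise.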
In the case of A/B testing on a conversion rate, updating the prior is relatively easy because you can obtain the exact posterior when specific mathematical functions are chosen as the prior. So, if you apply Bayes’ update equation to the observed data from your website, starting with a non-informative prior for both variations, you will obtain the following posteriors:
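As an illustration of such an exact update, the sketch below assumes a Beta prior paired with binomial conversion data, a standard conjugate combination in which the posterior is again a Beta distribution. The counts are hypothetical (1000 hits per variation, 55 and 67 conversions), and the credible interval is approximated by sampling with the standard library:

```python
import random


def beta_posterior(conversions, hits, prior_a=1.0, prior_b=1.0):
    """Beta-Binomial update: returns the posterior's (alpha, beta) parameters."""
    return prior_a + conversions, prior_b + (hits - conversions)


def credible_interval(alpha, beta, level=0.95, n_samples=50_000, seed=0):
    """Approximate equal-tailed credible interval from posterior samples."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_samples))
    low = draws[int((1 - level) / 2 * n_samples)]
    high = draws[int((1 + level) / 2 * n_samples) - 1]
    return low, high


# Hypothetical counts, starting from a non-informative Beta(1, 1) prior.
for name, conv in (("A", 55), ("B", 67)):
    a, b = beta_posterior(conv, 1000)
    low, high = credible_interval(a, b)
    print(f"Variation {name}: Beta({a:.0f}, {b:.0f}), "
          f"mean {a / (a + b):.4f}, 95% interval ({low:.4f}, {high:.4f})")
```

Because the update only adds the observed counts to the prior parameters, the posterior from one experiment can be passed back in as `prior_a` and `prior_b` for the next.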
Once you get the posteriors, you can compute decision metrics to determine which variation is better. Now, if you compute the difference in conversion rates of the two distributions, you’ll get a delta distribution:
Notice how beautifully the above plot captures your belief about how much Variation B is better than Variation A. If you are interested in advanced calculations, you can read their computation approach in more detail.
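The delta distribution can be approximated by drawing samples from both posteriors and subtracting them. This sketch assumes Beta posteriors obtained from a Beta(1, 1) prior and the same hypothetical counts as before (1000 hits per variation, 55 and 67 conversions). The probability that B beats A is shown as one possible decision metric:

```python
import random


def delta_samples(post_a, post_b, n_samples=100_000, seed=0):
    """Samples of (rate of B - rate of A) drawn from two Beta posteriors."""
    rng = random.Random(seed)
    return [
        rng.betavariate(*post_b) - rng.betavariate(*post_a)
        for _ in range(n_samples)
    ]


# Posterior parameters: prior (1, 1) plus conversions and non-conversions.
post_a = (1 + 55, 1 + 945)
post_b = (1 + 67, 1 + 933)

deltas = delta_samples(post_a, post_b)
prob_b_better = sum(d > 0 for d in deltas) / len(deltas)
mean_delta = sum(deltas) / len(deltas)
print(f"P(B > A) is roughly {prob_b_better:.2f}")
print(f"expected uplift is roughly {mean_delta:.4f}")
```

Unlike a single difference in rates, the sampled deltas describe how likely each size of improvement is, so the decision can weigh both the chance that B is better and by how much.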
Even though the Bayesian approach provides a convenient and more intuitive framework for learning and decision making in A/B testing, its adoption is still not mainstream in the industry for the following reasons:
- If specific (conjugate) priors are not chosen, the resulting posterior has no exact form, and determining it is computationally expensive.
- A Bayesian can never be 100% sure about anything while humans have a preference for binary outcomes in order to make decisions.
- Modelling approaches can be mathematically involved.
In this article, we have covered some of the fundamental differences between the Bayesian and Frequentist philosophies. However, there is still a lot more that distinguishes the two ideologies, so if this topic interests you, you can read more about Bayesian vs. Frequentist A/B testing.
After reading this article, if you have a preference for a statistical model, that’s great! If you don’t, that’s even better as you don’t have to choose any side. Many experimentation platforms use some flavor of a traditional statistical model (Bayesian or Frequentist) with some heuristics.
More than the methodology, what matters is how well you understand the results and can act on them. This understanding is useful for building a data-driven approach to assessing the risk an organization is willing to take and the improvement in business outcomes it can expect.
However, if you ever find yourself in a heated discussion concerning the pros and cons of the two approaches, then this article can be helpful for you to get the hang of that debate.