Duration Estimation In An A/B Test

While running an experiment, waiting for data is often the most challenging period, because you are likely to get impatient. All you want is for the A/B test to end as quickly as possible so you can move into full-scale execution. The anxiety grows when you don’t know how long you need to wait for the test to reach statistical significance.

The impatience is understandable: you do not want to lose conversions on suboptimal variations. Not much can be done about that, since a statistical test ends when it ends. But an estimate of how long the A/B test will take can ease the anxiety to some extent.

Let me explain how to estimate the duration of an A/B test:

Visitor Sample Size Calculator

In statistics, you can never say with 100% confidence that an A/B test will end after X days. Instead, you say that there is an 80% (or a 95%, whatever you choose) probability of getting a statistically significant result after X days, provided a real difference of the expected size exists.

It is also possible that the variations perform exactly the same, in which case no amount of waiting will produce a genuine statistically significant result. That is why it is essential to estimate the number of visitors an A/B test needs before running it.
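To make the "80% probability" and the "no difference" cases concrete, the sketch below simulates many A/B tests analysed with a pooled two-proportion z-test, once with a real difference and once with none. It assumes hypothetical inputs: a 50% base conversion rate, a 10% relative uplift (55% for the variation), α = 0.05, and 1,570 visitors per variation, which is what Lehr’s formula gives for these inputs. The helper names are illustrative, not part of any calculator.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:  # 0% or 100% conversion everywhere: nothing to detect
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_conversions(n, p, rng):
    """Count conversions among n simulated visitors, each converting with probability p."""
    return sum(rng.random() < p for _ in range(n))


def share_significant(p_a, p_b, n, trials=1000, alpha=0.05, seed=1):
    """Fraction of simulated A/B tests (n visitors per variation) with p-value < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_value = two_proportion_p_value(simulate_conversions(n, p_a, rng), n,
                                         simulate_conversions(n, p_b, rng), n)
        hits += p_value < alpha
    return hits / trials


# Hypothetical inputs: Lehr's formula for CR = 0.50, uplift = 10%, alpha = 0.05, power = 0.80
N_PER_VARIATION = 1570

with_effect = share_significant(0.50, 0.55, N_PER_VARIATION)  # a real difference exists
no_effect = share_significant(0.50, 0.50, N_PER_VARIATION)    # variations are identical
print(f"real uplift: {with_effect:.1%} significant; no difference: {no_effect:.1%} significant")
```

With a real effect, roughly 80% of the simulated tests come out significant. With no effect, only about 5% do, and every one of those is a false positive at the α rate.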

There are three pieces of information you need to determine the number of visitors for the A/B test:

  • Base conversion rate (CR) – the conversion rate you expect the campaign to achieve at a minimum.
  • Expected uplift – the relative difference in CR, on top of the base CR, that you want to detect (the smaller the uplift you wish to test, the more visitors it will need).
  • Number of variations to test (the more variations you test, the more traffic you need)

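The following formula is taken from Statistical Rules of Thumb, by Gerald van Belle:

\begin{aligned} n = \frac{2(z_{1-\frac{\alpha}{2}} + z_{1-\beta})^2}{\Delta^2}, \quad \Delta = \frac{\mu_0 - \mu_1}{\sigma} \end{aligned}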

The formula described above is known as Lehr’s equation, which is obtained by using frequentist statistics. 

  • Type I Error (α) is the probability of rejecting the null hypothesis when it is true (if α = 0.05, then on average 5 out of 100 independent tests in which the variations are identical will report a statistically significant difference).
  • Type II Error (β) is the probability of not rejecting the null hypothesis when it is false (if β = 0.20, then on average 20 out of 100 independent tests in which the variations truly differ will report no significant difference; 1 − β is the power of the test).
  • z is the Z-score, the quantile of the standard normal distribution read from a Z-table (z_{1-\frac{\alpha}{2}} for the significance level, z_{1-\beta} for the power).
  • \mu_0 - \mu_1 = CR * Uplift, the absolute difference in conversion rate you want to detect.
  • σ is the standard deviation of a visitor’s conversion, which follows a Bernoulli distribution (a visitor either converts or does not). Hence, \sigma = \sqrt{CR(1 - CR)}.
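Each ingredient in the list can be computed directly. This minimal sketch uses Python’s standard-library `statistics.NormalDist` in place of a printed Z-table; the 10% base CR and 20% uplift are hypothetical example inputs.

```python
import math
from statistics import NormalDist

# Hypothetical example inputs: 10% base conversion rate, 20% relative uplift
base_cr, uplift = 0.10, 0.20
alpha, beta = 0.05, 0.20

std_normal = NormalDist()                      # standard normal: mean 0, standard deviation 1
z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # two-sided critical value, about 1.96
z_beta = std_normal.inv_cdf(1 - beta)          # about 0.84 for 80% power
delta = base_cr * uplift                       # mu_0 - mu_1 = CR * Uplift = 0.02
sigma = math.sqrt(base_cr * (1 - base_cr))     # Bernoulli standard deviation, sqrt(0.09) = 0.3

print(round(z_alpha, 3), round(z_beta, 3), round(delta, 4), round(sigma, 4))
```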

By putting the values into Lehr’s equation, you get the number of visitors (n) needed in each of the two variations to obtain a statistically significant result between them.
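Plugging the values into the equation gives n per variation. A minimal sketch, assuming a two-sided test and using the Bernoulli σ at the base CR for both variations; `visitors_per_variation` is a hypothetical helper name, not the calculator’s actual code.

```python
import math
from statistics import NormalDist


def visitors_per_variation(base_cr, uplift, alpha=0.05, power=0.80):
    """Visitors needed in each variation: n = 2(z_{1-alpha/2} + z_{1-beta})^2 / (delta/sigma)^2.

    base_cr: expected baseline conversion rate (e.g. 0.10)
    uplift:  relative uplift to detect (e.g. 0.20 for +20%)
    power:   1 - beta
    """
    z = NormalDist().inv_cdf
    delta = base_cr * uplift                      # absolute difference mu_0 - mu_1
    sigma = math.sqrt(base_cr * (1 - base_cr))    # Bernoulli standard deviation at the base CR
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / (delta / sigma) ** 2
    return math.ceil(n)                           # round up: visitors are whole people


for uplift in (0.20, 0.10, 0.05):
    print(f"uplift {uplift:.0%}: {visitors_per_variation(0.10, uplift):,} visitors per variation")
```

Because n grows with 1/δ², halving the uplift roughly quadruples the number of visitors needed, which is why small uplifts are expensive to detect.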

If there are multiple variations, multiplying n by the number of variations (V) gives the total number of visitors needed (n*V).

Divide the result by the average number of daily visitors to get the number of days the A/B test is likely to take to find the best variation. You can use the calculator at ab-test-duration-calculator, which is built on the same formula.
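The whole pipeline, from the three inputs to a number of days, fits in a few lines. This is a sketch under stated assumptions: V counts every variation including the control, traffic is split evenly across variations, and the result is rounded up to whole days. `estimate_duration_days` and the 1,000 visitors/day figure are hypothetical.

```python
import math
from statistics import NormalDist


def estimate_duration_days(base_cr, uplift, variations, daily_visitors, alpha=0.05, power=0.80):
    """Estimated test length: (n per variation * V) / average daily visitors, rounded up."""
    z = NormalDist().inv_cdf
    delta = base_cr * uplift                      # mu_0 - mu_1 = CR * Uplift
    sigma = math.sqrt(base_cr * (1 - base_cr))    # Bernoulli standard deviation
    n = math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / (delta / sigma) ** 2)
    total_visitors = n * variations               # n * V
    return math.ceil(total_visitors / daily_visitors)


# Hypothetical site: 10% base CR, 20% uplift, control + 1 variation, 1,000 visitors per day
print(estimate_duration_days(0.10, 0.20, variations=2, daily_visitors=1_000))
```

Rounding up to whole days (and, in practice, often to whole weeks) keeps the test from stopping before it has collected the planned sample.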

The Mathematical Intuition Behind Lehr’s Equation

[Figure: null (Δ = 0) and alternative (Δ = δ) distributions of the difference in conversion rates]

There are two key ingredients to sample size calculations: the difference between the two variations’ conversion rates, and the variability in their measurements.

Each distribution in the image above models the difference in conversion rates between two variations, where the x-axis is the absolute difference in conversion rates (Δ = y0 − y1).

One distribution is centered at 0 and the other at δ (δ = CR * Uplift). The distribution on the left (Δ = 0) represents the null hypothesis that there is no difference between the two variations. The curve on the right (Δ = δ) represents the alternative hypothesis that the variations differ by δ. Each distribution also has a variance (σ²), which is usually assumed to be the same for both.

The relationship between the standard error (SE), the absolute difference of conversion rates of the two variations, and the standard deviation of the distribution allows us to set up calculations for the sample size, n. 

\begin{aligned} SE = \sigma\sqrt{\frac{2}{n}} \end{aligned}

Multiplying SE by an appropriate z-score turns the confidence level (or power) we want into a distance along the Δ axis.

The critical value is where the α region of the null curve and the β region of the alternative curve meet. This point is:

  • z_{1-\frac{\alpha}{2}} \cdot SE away from the mean of the null curve, and
  • z_{1-\beta} \cdot SE away from the mean of the alternative curve.
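The geometry can be checked numerically. This sketch assumes SE = σ√(2/n), the standard error of the difference between two independent groups of n visitors with equal σ, and places the α/2 upper tail of a two-sided test on the null curve (the tiny lower tail is ignored, as in the picture). With n taken from the sample-size equation derived next, the critical value sits exactly z_{1−β}·SE below δ, the null tail beyond it is α/2, and the alternative tail below it is β. The inputs are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical example: base CR 10%, uplift 20%, alpha = 0.05, beta = 0.20
base_cr, uplift, alpha, beta = 0.10, 0.20, 0.05, 0.20
z = NormalDist().inv_cdf
z_alpha, z_beta = z(1 - alpha / 2), z(1 - beta)
delta = base_cr * uplift
sigma = math.sqrt(base_cr * (1 - base_cr))

n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2  # unrounded visitors per variation
se = sigma * math.sqrt(2 / n)                              # standard error of the difference

critical = z_alpha * se              # distance of the critical value from the null mean (0)
gap_to_alt = delta - critical        # distance from the critical value to the alternative mean (delta)

null_tail = 1 - NormalDist(0, se).cdf(critical)   # upper alpha/2 region of the null curve
alt_tail = NormalDist(delta, se).cdf(critical)    # beta region of the alternative curve
print(round(gap_to_alt / se, 4), round(null_tail, 4), round(alt_tail, 4))
```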

Because the two distances add up to δ, rearranging the resulting linear equation for n gives the number of visitors needed to detect a statistically significant difference between the two variations. The errors in our test are α and β: the smaller α and β are, the larger the visitor estimate. The resulting equation is:

\begin{aligned} n = \frac{2(z_{1-\frac{\alpha}{2}} + z_{1- \beta})^2}{\big(\frac{\delta}{\sigma}\big)^2}\end{aligned}
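Spelled out step by step, the rearrangement goes as follows, assuming SE = σ√(2/n) as for two independent groups of n visitors with equal σ. The last line shows where the familiar rule-of-thumb form of Lehr’s equation, n ≈ 16/(δ/σ)², comes from when α = 0.05 (two-sided) and β = 0.20.

```latex
\begin{aligned}
z_{1-\frac{\alpha}{2}}\,SE + z_{1-\beta}\,SE &= \delta, \qquad SE = \sigma\sqrt{\tfrac{2}{n}} \\
\sigma\sqrt{\tfrac{2}{n}}\,\big(z_{1-\frac{\alpha}{2}} + z_{1-\beta}\big) &= \delta \\
n &= \frac{2(z_{1-\frac{\alpha}{2}} + z_{1-\beta})^2}{\big(\frac{\delta}{\sigma}\big)^2} \\
\alpha = 0.05,\ \beta = 0.20:\quad n &= \frac{2(1.96 + 0.84)^2}{\big(\frac{\delta}{\sigma}\big)^2} = \frac{15.68}{\big(\frac{\delta}{\sigma}\big)^2} \approx \frac{16}{\big(\frac{\delta}{\sigma}\big)^2}
\end{aligned}
```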

You can read this sample size chapter to get a more in-depth understanding of how this equation is derived.

Conclusion

Using the method described above, you can estimate how long a frequentist A/B test needs to run to reach statistical significance. If you run the test using Bayesian statistics instead, you can read the maths behind the Bayesian duration calculator to understand its implementation.

It is a common practice to perform sample size calculations before starting an experiment to avoid bias in results. If we include very few subjects in an experiment, the results cannot be generalized to the population as this sample will not represent the target population. On the other hand, if we study more subjects than required, we could waste resources. Adequate sample size calculation thus becomes crucial in any statistical experiment to arrive at scientifically valid results. 

  • Senior Data Scientist