LumeLab

Bayesian A/B Testing

Imagine you want to know whether adding images to your WhatsApp marketing messages will make more people buy your products. A traditional A/B test would tell you something like: "If the images made no difference, a result like this would rarely happen by luck, so let's trust it." But it doesn't tell you what the real chance is that the new message format is better, and that can be confusing.

Bayesian A/B testing works in a more natural way. It takes the test data and gives you answers like: "With the information we have, there's an 85% chance that messages with images perform better." This is much easier to understand and to act on. It also shows you the risk of making the wrong decision, and it lets you update the results as new data arrives, without having to wait until the end of the test.

In summary, Bayesian testing reasons more like we do: it calculates chances and helps you decide with more clarity.


A Major Advantage

In Bayesian A/B testing, you don't need to define a fixed sample size before starting. Instead, you can define a decision criterion, such as:

"I want to have at least 95% chance that messages with images perform better."

(this is called probability of victory), or

"I'll only make a decision if the risk of choosing the wrong message format is less than 2%."

(this is called risk-based decision).

From there, you collect data and update the results as they arrive. When your criterion is met (for example, the message format with images reaches a 95% chance of being better), you can stop the test and decide.

So yes, you can run the test until you reach the desired confidence or risk level, without fixing a sample size beforehand. This avoids wasting time and users, and it allows for faster decisions when the data is clear early on.
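
To make this concrete, here is a minimal sketch of how such a criterion could be checked in Python. It assumes conversions are modeled with Beta posteriors and estimates both quantities by Monte Carlo; the counts, thresholds, and the posterior_samples helper are illustrative, not the notebook's actual code.

```python
import numpy as np

rng = np.random.default_rng(42)

def posterior_samples(conversions, views, size=200_000):
    """Draw samples from the Beta posterior of a conversion rate,
    starting from a uniform Beta(1, 1) prior."""
    return rng.beta(1 + conversions, 1 + views - conversions, size=size)

# Illustrative counts: A = text-only messages, B = messages with images.
a = posterior_samples(conversions=120, views=2400)
b = posterior_samples(conversions=145, views=2400)

# Probability of victory: how often B beats A across the posterior draws.
prob_b_better = (b > a).mean()

# Risk-based view: the average conversion-rate drop you would suffer
# in the scenarios where B is actually worse (the expected loss).
expected_loss_b = np.maximum(a - b, 0).mean()

print(f"P(images better): {prob_b_better:.1%}")
print(f"Expected loss if you choose images: {expected_loss_b:.4f}")

# The two decision criteria described above:
if prob_b_better >= 0.95:
    print("Probability-of-victory criterion met (>= 95%).")
if expected_loss_b < 0.02:
    print("Risk-based criterion met (expected loss < 2%).")
```

Because the posteriors are just updated counts, you can rerun this check every time new data arrives and stop as soon as one of your criteria holds.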

Usage Example

You have a WhatsApp number that you use to sell your products. An important part of your business is sending messages with current offers to customers who asked to receive them. So you call your team to discuss what the message should look like. One person says to keep it text-only, arguing that adding an image will only create work and won't make a difference. Another team member says the image is essential because it encourages customers to buy. Who's right?

You decide to run an A/B test to find out, creating two message variants: one with images and one without. Each week you'll send them to different customers and record how many people viewed each message and how many clicked the buy link.

In a "traditional" A/B test, you would have to define a minimum number of people who would need to view the message and run the test until you have that sample. But unlike traditional tests, you didn't define a fixed number of people to wait for. You were clear with your criterion:

"I'll only end this test if messages with images have more than 90% chance of being better, and less than 1% chance of being worse. That's what will make me feel secure."

The days pass. You keep track of the results. At some moments the messages with images look promising; at others they seem almost the same as the text-only ones. But you hold firm to the criterion, neither anxious to decide too quickly nor afraid to act once it is met.

Until one morning, you open the panel and see: "Chance of messages with images being better: 95%. Expected loss risk: 0.8%."

You smile. The decision is clear. Not only did you find the message format with the greatest chance of being better, you also know how confident you can be in it (95%) and, if you're wrong, that the expected average loss is only 0.8%.
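
For the curious, here is a hedged sketch of what that daily check could look like, reusing the same Beta-posterior idea from the earlier sketch. The running totals and the check_criterion helper are hypothetical, invented here for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def check_criterion(conv_a, views_a, conv_b, views_b,
                    min_prob=0.90, max_loss=0.01, size=200_000):
    """Return P(B better), the expected loss of choosing B,
    and whether the stopping criterion from the story is met."""
    a = rng.beta(1 + conv_a, 1 + views_a - conv_a, size=size)
    b = rng.beta(1 + conv_b, 1 + views_b - conv_b, size=size)
    prob_b_better = (b > a).mean()
    expected_loss = np.maximum(a - b, 0).mean()
    stop = prob_b_better > min_prob and expected_loss < max_loss
    return prob_b_better, expected_loss, stop

# Hypothetical running totals as each day's results arrive:
# (conversions_text, views_text, conversions_images, views_images)
daily_totals = [
    (10, 300, 12, 300),
    (25, 700, 33, 700),
    (48, 1400, 66, 1400),
]

for day, (ca, va, cb, vb) in enumerate(daily_totals, start=1):
    p, loss, stop = check_criterion(ca, va, cb, vb)
    print(f"Day {day}: P(images better) = {p:.1%}, expected loss = {loss:.2%}")
    if stop:
        print("Criterion met: stop the test and choose the images variant.")
        break
```

Note that the expected loss in this sketch is measured in conversion-rate points, so a value like 0.8% means that, in the scenarios where you picked the wrong format, you would give up 0.8 percentage points of conversion on average.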

I know, at first glance something like this might seem strange: "Why would I choose an A/B test that, instead of telling me for sure which message format is the best, only tells me which is probably the best?" And the answer is simple: no test can give you certainty. That's just an illusion.

The difference is that the Bayesian test embraces this uncertainty in an honest and useful way: it shows you the real chance of being right, the risk of being wrong, and the size of the loss if that happens.

How to Use

For this, I created a Python notebook. If you don't know what a Python notebook is, I advise you to learn: that goes beyond the scope of this project, and YouTube is full of tutorials far better than any I could create.

But if you already know, here's what I can say: you can run this code directly on your computer, or upload the notebook to a platform like Google Colab and start using this tool in your tests today!