Decision making with Bayesian statistics


Last edited: May 01, 2024

Overview

This topic explains how to make decisions about which variation to choose as the winner in a LaunchDarkly experiment.

When you have many metrics to consider, it can be difficult to come up with a consistent decision-making strategy. This is why we recommend that you choose a single primary metric or evaluation criterion for making decisions before you start the experiment. "Probability to be best" evaluates the primary metric and is the statistic you should use to decide which variation is the winner. The other statistics can help you further understand the difference between variations.

Example: Search engine optimization

Here is an example of how to make decisions using Bayesian statistics. Imagine you have to choose between three equivalent search algorithms for your website. Now imagine someone gives you three probabilities, one for each search algorithm. Each probability represents the chance that its search algorithm returns the best results.

This table displays each search engine's probability of being best:

Variation	Probability to be best
Search engine 1	23%
Search engine 2	31%
Search engine 3	46%

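The most logical choice of algorithm to use is Search engine 3, because it has the highest probability to be best (46%). Choosing the option with the highest probability is considered an "optimal strategy" in decision making.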
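As a minimal sketch of this strategy, the following Python snippet picks the variation with the highest probability to be best. The values are copied from the table above. The function name `pick_winner` is hypothetical and is not part of any LaunchDarkly API.

```python
# Probabilities to be best, copied from the table above.
probability_to_be_best = {
    "Search engine 1": 0.23,
    "Search engine 2": 0.31,
    "Search engine 3": 0.46,
}


def pick_winner(probabilities):
    """Return the variation name with the highest probability to be best."""
    return max(probabilities, key=probabilities.get)


winner = pick_winner(probability_to_be_best)
print(winner)  # Search engine 3
```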

However, this decision-making strategy assumes that there are no costs to switching between options. In online experimentation, this assumption is usually true, but it may not hold for your particular experiment.

Switching between variations in a feature flag is a small configuration change, but there might be hidden costs associated with switching. For example, you may be testing out a new search algorithm for your website that uses more expensive hardware than your current solution. In that case, you should also consider the hardware costs, in addition to the probability to be best, when choosing the winner.
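One way to weigh such costs is sketched below. It keeps the current variation when the leading variation's extra cost is at least as large as its expected gain. This is an illustrative decision rule, not a LaunchDarkly feature. The `Variation` class, the `choose_variation` function, and every figure in the example are hypothetical, and you would need to estimate the gain and cost yourself in comparable units.

```python
from dataclasses import dataclass


@dataclass
class Variation:
    name: str
    probability_to_be_best: float
    expected_gain: float  # hypothetical: estimated value gained versus the current variation
    extra_cost: float     # hypothetical: added cost of running it, such as hardware


def choose_variation(variations, current_name):
    """Pick the most probable winner, but keep the current variation
    when the winner's extra cost is at least its expected gain."""
    leader = max(variations, key=lambda v: v.probability_to_be_best)
    if leader.name != current_name and leader.expected_gain <= leader.extra_cost:
        return current_name
    return leader.name


# Hypothetical figures, expressed in the same currency and time period.
variations = [
    Variation("Current search", 0.35, expected_gain=0.0, extra_cost=0.0),
    Variation("New search", 0.65, expected_gain=1200.0, extra_cost=2000.0),
]
decision = choose_variation(variations, current_name="Current search")
print(decision)  # Current search
```

In this made-up case the new algorithm is more likely to be best, but its assumed hardware cost exceeds its assumed gain, so the sketch keeps the current solution.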

Another example where there may be additional costs is if you are testing a feature that isn’t fully developed. If the new feature performs only marginally better in the experiment than the variation you currently use, then you may not want to continue developing it.

Decision making when the best option changes

You may find in some experiments that the variation most likely to be best changes from day to day. For example, on Monday variation one is the winner, and on Tuesday variation two is the winner. This typically happens when there is no real difference between the variations, so the results change slightly day to day depending on the end users encountering the experiment.
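The following simulation sketches why the leader can flip when the variations barely differ. It estimates probability to be best by sampling from Beta posteriors over conversion rates, a common textbook approach that is an assumption here and not necessarily how LaunchDarkly computes the statistic. The function name, the Beta(1, 1) prior, and the conversion counts are all hypothetical.

```python
import random


def probability_to_be_best(results, draws=20000, seed=0):
    """Estimate each variation's chance of having the highest conversion rate.

    results maps a variation name to (conversions, visitors).
    Assumes a Beta(1, 1) prior, so each posterior is
    Beta(1 + conversions, 1 + visitors - conversions).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible conversion rate per variation.
        sampled = {
            name: rng.betavariate(1 + conv, 1 + visitors - conv)
            for name, (conv, visitors) in results.items()
        }
        # Credit the variation with the highest sampled rate.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Hypothetical cumulative counts for two variations with nearly identical rates.
monday = probability_to_be_best({"Variation 1": (101, 1000), "Variation 2": (99, 1000)})
tuesday = probability_to_be_best({"Variation 1": (199, 2000), "Variation 2": (202, 2000)})

for day, estimate in (("Monday", monday), ("Tuesday", tuesday)):
    leader = max(estimate, key=estimate.get)
    print(day, leader, estimate)
```

With counts this close, both estimates stay near 50%, so a handful of extra conversions on either side is enough to change which variation leads.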