Winning variations

Last edited: Apr 15, 2024

Overview

This topic explains how to choose a winning variation for a completed experiment.

Choosing the winning variation

LaunchDarkly uses Bayesian statistics in its Experimentation model. The winning variation for an experiment is the variation that is most likely to be the best option out of all of the variations tested. To learn more, read Decision making with Bayesian statistics.

Funnel optimization experiments display the variation that had the biggest effect on the final metric in the funnel metric group, and also display the winning variation for each step in the funnel. The winning variation is highlighted at the top of the results section.

In this example, the "Enabled" variation has the highest probability to be best:

A funnel optimization experiment's results tab.

Shipping winning variations

If enough contexts have encountered your experiment to determine a winning variation, you can stop the experiment and ship the winning variation to all of your contexts.

To ship a winning variation:

Navigate to the Experiments list.
Click on the name of the experiment you want to ship a variation for.
Click on the Results tab.
Scroll down to the probability chart. If enough contexts have encountered the experiment to determine a winning variation, a winning variation banner is visible.
Click Ship it. A "Ship the leading variation" dialog appears.
Click Ship it.

The experiment iteration stops. All contexts are now receiving the winning variation of your experiment. LaunchDarkly retains all of the data collected from stopped iterations.

LaunchDarkly provides additional statistics for further information, but you do not need to use these statistics to make a decision about the winning variation. To learn more, read The statistics details tab.

The probability report tab

The probability report tab includes:

probability charts for each metric,
each variation's probability to be best, and
the conversion rate, conversion rate (sum), or posterior mean, depending on the metric type.

Probability charts

An experiment's probability chart provides a visual representation of the performance of each variation tested in the experiment.

Each experiment's probability chart is unique, and how you interpret the results depends on what metric you're measuring and the hypothesis of your experiment. The following sections provide general information about the x-axis and the y-axis to help you interpret experiment results.

To hide any of the variations from the probability chart, uncheck the box next to the variation's name:

The variation checkboxes on a probability chart.

Information about the x-axis

The horizontal x-axis displays the unit of the primary metric included in the experiment. For example, if the metric is measuring revenue, the unit might be dollars, or if the metric is measuring website latency, the unit might be milliseconds.

If the unit you're measuring on the x-axis is something you want to increase, such as revenue or account sign ups, then the farther to the right the curve is, the better. The variation whose curve is farthest to the right has the highest value for the unit the metric is measuring.

If the unit you're measuring on the x-axis is something you want to decrease, such as website latency, then the farther to the left the curve is, the better. The variation whose curve is farthest to the left has the lowest value for the unit the metric is measuring.

The width of a curve along the x-axis reflects the variation's credible interval. A narrower curve means the results of the variation fall within a smaller range of values, so you can be more confident about that variation's likely performance.

In the example below, the green variation has a more precise credible interval than the purple variation:

An example experiment probability chart.

To learn more, read Credible interval.

Information about the y-axis

The vertical y-axis measures probability. You can determine how probable it is that the metric will equal the number on the x-axis by how high the curve is.

In the example above, the green variation has a high probability that the metric will measure 0.4 for any given context. In other words, if someone encounters the green variation, there's a high probability that the metric will measure 0.4 for that person.

Probability to be best

Probability to be best is the likelihood that a variation had the biggest effect on a particular metric.
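To make the idea concrete, here is a minimal sketch of how a probability to be best can be estimated for a conversion metric. It is an illustration only, not LaunchDarkly's implementation: it assumes a Beta(1, 1) prior on each variation's conversion rate, assumes that a higher rate is better, and uses made-up counts. The function name probability_to_be_best is hypothetical.

```python
import random


def probability_to_be_best(results, draws=20_000, seed=0):
    """Estimate each variation's probability to be best by simulation.

    results maps a variation name to (conversions, exposures).
    Assumption: Beta(1, 1) prior, and a higher conversion rate is better.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible conversion rate per variation from its posterior.
        sampled = {
            name: rng.betavariate(1 + conversions, 1 + exposures - conversions)
            for name, (conversions, exposures) in results.items()
        }
        # The variation with the highest sampled rate wins this draw.
        wins[max(sampled, key=sampled.get)] += 1
    # The share of draws a variation wins approximates its probability to be best.
    return {name: count / draws for name, count in wins.items()}


# Hypothetical counts, not taken from a real experiment.
example = {"Control": (120, 1000), "Enabled": (150, 1000)}
probabilities = probability_to_be_best(example)
```

With these counts, "Enabled" wins the large majority of simulated draws, so it would be the variation with the highest probability to be best. For a metric you want to decrease, the same sketch would count the lowest sampled value as the winner instead.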

The variation with the highest probability to be best is highlighted above the probability report:

A funnel optimization experiment's winning variation.

In funnel optimization experiments, the probability report tab provides each variation's probability to be best for each step in the funnel, but the final metric in the funnel is the metric you should use to decide the winning variation for the experiment as a whole.

LaunchDarkly includes all end users that reach the last step in a funnel in the experiment's winning variation calculations, even if an end user skipped some steps in the funnel. For example, if your funnel metric group has four steps, and an end user takes step 1, skips step 2, then takes steps 3 and 4, the experiment still considers the end user to have completed the funnel and includes them in the calculations for the winning variation.
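The rule above can be sketched as a small check. This is an illustration under assumptions: steps are numbered from 1, each end user's recorded steps are held in a set, and the names completed_funnel and users are hypothetical.

```python
def completed_funnel(steps_taken: set[int], total_steps: int) -> bool:
    """Return True if the end user reached the last step of the funnel.

    Only the final step matters here; earlier steps may have been skipped.
    """
    return total_steps in steps_taken


# Hypothetical end users in a four-step funnel.
users = {
    "user-a": {1, 2, 3, 4},  # took every step
    "user-b": {1, 3, 4},     # skipped step 2, still reached step 4
    "user-c": {1, 2},        # never reached step 4
}
completers = sorted(
    name for name, steps in users.items() if completed_funnel(steps, 4)
)
```

Here completers contains "user-a" and "user-b", matching the example of an end user who skips step 2 but still counts as having completed the funnel.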

Conversion rate

The conversion rate is the percentage of end users in the experiment who completed the action the metric is tracking, such as clicking on a button or entering information into a form.

The conversion rate displays only for metrics using the "average" unit aggregation method. To learn more, read Unit aggregation method.

For funnel optimization experiments, the conversion rate includes all end users who completed the step, even if they didn't complete a previous step in the funnel. LaunchDarkly calculates the conversion rate for each step in the funnel by dividing the number of end users who completed that step by the total number of end users who started the funnel. LaunchDarkly considers all end users in the experiment for whom the SDK has sent a flag evaluation event as having started the funnel.
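As a rough sketch of the calculation described above, the per-step rate is the number of end users who completed a step divided by the number who started the funnel. The function name and the counts are hypothetical.

```python
def funnel_conversion_rates(started, completed_by_step):
    """Return the conversion rate for each funnel step.

    started: end users with a flag evaluation event (they started the funnel).
    completed_by_step: number of end users who completed each step, in order.
    """
    if started <= 0:
        raise ValueError("started must be a positive count")
    # Every step is divided by the same denominator: end users who started.
    return [completed / started for completed in completed_by_step]


# Hypothetical counts. Step 3 exceeds step 2 because some end users skipped step 2.
rates = funnel_conversion_rates(200, [150, 90, 100, 40])
```

Because every step shares the same denominator, a later step can show a higher rate than an earlier one when end users skip steps.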

Conversion rate (sum)

The conversion rate (sum) is the average conversions per context that encountered the experiment.

The conversion rate (sum) displays only for metrics using the "sum" unit aggregation method. To learn more, read Unit aggregation method.
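A minimal sketch of this average, assuming you already have a conversion count for every context that encountered the experiment, including contexts with zero conversions. The function name and the data are hypothetical.

```python
def conversion_rate_sum(conversions_per_context):
    """Return the average number of conversions per context.

    conversions_per_context maps a context key to its conversion count.
    Contexts with zero conversions must be included so they count in the average.
    """
    if not conversions_per_context:
        raise ValueError("at least one context is required")
    return sum(conversions_per_context.values()) / len(conversions_per_context)


# Hypothetical data: four contexts encountered the experiment.
counts = {"ctx-1": 3, "ctx-2": 0, "ctx-3": 1, "ctx-4": 0}
rate = conversion_rate_sum(counts)
```

Four conversions across four contexts gives an average of 1.0 conversions per context, even though two contexts never converted.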

Posterior mean

The posterior mean is the variation's average numeric value that you should expect in this experiment, based on the data collected so far.

The posterior mean displays only for numeric metrics.

All of the data in the results table are based on a posterior distribution. To learn more about posterior distributions, read Frequentist and Bayesian modeling. LaunchDarkly automatically checks the results data to make sure that actual context traffic matches the allocation you set. To learn more, read Understanding sample ratios.
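This topic does not describe how the sample ratio check works. One common way to run such a check yourself, shown here purely as an assumption and not as LaunchDarkly's method, is a chi-square goodness-of-fit test. The sketch below covers only the two-variation case, where the p-value has a closed form. The function name and the counts are hypothetical.

```python
import math


def sample_ratio_p_value(observed_a, observed_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit p-value for a two-variation traffic split.

    A very small p-value suggests the observed traffic does not match
    the intended allocation. Valid only for exactly two variations.
    """
    total = observed_a + observed_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = (
        (observed_a - expected_a) ** 2 / expected_a
        + (observed_b - expected_b) ** 2 / expected_b
    )
    # With one degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts for an intended 50/50 allocation.
balanced = sample_ratio_p_value(5000, 5000)
skewed = sample_ratio_p_value(5300, 4700)
```

A perfectly even split produces a p-value of 1, while the 5300 to 4700 split produces a p-value far below common thresholds, which would be a signal to investigate before trusting the results.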

You can also use the REST API: Get experiment results

To learn more about troubleshooting if your experiment hasn't received any metric events, read Experimentation Results page status: "This metric has never received an event for this iteration".

The statistics details tab

This section includes advanced concepts

This section includes an explanation of advanced statistical concepts. We provide them for informational purposes, but you do not need to understand these concepts in order to use Experimentation.

Experiments display more information about the experiment's results on the Statistics details tab, including the relative difference from the control and the credible interval.

A funnel optimization experiment's statistics details tab.

Relative difference

To view the relative difference between variations, choose a variation from the Relative difference from menu. The relative difference displays in the table for each variation.

Relative difference is the difference between the mean of the chosen variation and the lower and upper bounds of the credible interval of the variation in the table, expressed as a percentage of the chosen variation's mean. The credible interval contains 90% of the variation's probable values. For example, imagine the chosen variation has a mean of 1%, and the variation in the table has a credible interval with a lower bound of 1.1% and an upper bound of 1.5%. Relative to 1, 1.1 is 10% higher and 1.5 is 50% higher, so the relative difference of the variation in the table from the chosen variation is 10% to 50%.

The longer you run an experiment, the narrower this interval becomes. As you gather more data, the range of plausible values shrinks, so you can have more confidence in the results.
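The arithmetic in this example can be written out as a short sketch. The function name relative_difference_range is hypothetical, and the inputs are the numbers from the example.

```python
def relative_difference_range(chosen_mean, lower_bound, upper_bound):
    """Return the relative difference of each credible interval bound
    from the chosen variation's mean, as fractions (0.10 means 10%)."""
    if chosen_mean == 0:
        raise ValueError("chosen_mean must be non-zero")
    return (
        (lower_bound - chosen_mean) / chosen_mean,
        (upper_bound - chosen_mean) / chosen_mean,
    )


# Chosen variation mean of 1%, compared against bounds of 1.1% and 1.5%.
low, high = relative_difference_range(1.0, 1.1, 1.5)
print(f"{low:.0%} to {high:.0%}")  # prints "10% to 50%"
```

A negative result would mean the bound sits below the chosen variation's mean.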

Credible interval

The credible interval is the range that contains 90% of the metric's probable values for the variation.

This means the effect of the variation on the metric you're measuring has a 90% probability of falling between these two numbers. The longer you run an experiment, the narrower this interval becomes. As you gather more data, the range of plausible values shrinks, so you can have more confidence in the results.

If the metric's aggregation method is by sum, the credible interval will be expressed in values rather than percentages. To learn more, read Unit aggregation method.
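To illustrate what a 90% credible interval is and why it narrows with more data, here is a minimal simulation sketch for a conversion metric. It assumes a Beta(1, 1) prior and made-up counts, and it is not a description of LaunchDarkly's internal model. The function name credible_interval is hypothetical.

```python
import random


def credible_interval(conversions, exposures, mass=0.90, draws=20_000, seed=0):
    """Approximate the central credible interval of a conversion rate.

    Assumption: Beta(1, 1) prior, so the posterior is
    Beta(1 + conversions, 1 + exposures - conversions).
    """
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + exposures - conversions)
        for _ in range(draws)
    )
    # Cut an equal share of probability from each tail, keeping the middle `mass`.
    tail = (1 - mass) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper


# Same 40% observed rate, but ten times more data in the second case.
small = credible_interval(40, 100)
large = credible_interval(400, 1000)
```

Both intervals sit around 0.4, but the interval computed from 1,000 exposures is noticeably narrower than the one computed from 100, which is the narrowing described above.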

The Credible interval history tab

The Credible interval history tab displays the metric's results over time for each variation.

The solid line represents the point estimate of the metric, which is the estimated mean of the metric values for the variation. The larger shaded area represents the range that contains 90% of the metric's probable values for the variation, called the credible interval.

The longer the experiment runs, the narrower and more precise the credible interval should become. As the credible interval narrows, you can have more confidence in the results of the experiment.

A metric's results over time for each variation in an experiment.