Winning variations for funnel optimization experiments
Last edited: Nov 10, 2023
This topic explains how to choose a winning variation for a completed funnel optimization experiment.
LaunchDarkly uses Bayesian statistics in its Experimentation model. The winning variation for a feature change experiment is the variation that is most likely to be the best option out of all of the variations tested. To learn more, read Decision making with Bayesian statistics.
The winning variation for a completed funnel optimization experiment is the variation that had the biggest effect on the final metric in a funnel metric group. The winning variation is highlighted at the top of the results section.
In this example, the "Enabled" variation has the highest probability to be best:
If enough contexts have encountered your experiment to determine a winning variation, you can stop the experiment and ship the winning variation to all of your contexts.
To ship a winning variation:
- Navigate to the Experiments list.
- Click on the name of the experiment you want to ship a variation for.
- Click on the Results tab.
- Scroll down to the probability chart. If enough contexts have encountered the experiment to determine a winning variation, a winning variation banner is visible.
- Click Ship it. A "Ship the leading variation" dialog appears.
- In the dialog, click Ship it to confirm.
The experiment iteration stops. All contexts are now receiving the winning variation of your experiment. LaunchDarkly retains all of the data collected from stopped iterations.
LaunchDarkly provides additional statistics for further information, but you do not need to use these statistics to make a decision about the winning variation. To learn more, read The statistics details tab.
For each step in the funnel, the probability report tab includes information on each variation's probability to be best, conversion rate, and metric charts.
Probability to be best in a funnel optimization experiment is the likelihood that a variation had the biggest effect on a particular metric within a funnel metric group. The probability report tab provides each variation's probability to be best for each step in the funnel, but the final metric in the funnel is the metric you should use to decide the winning variation for the experiment as a whole.
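To make the idea of probability to be best concrete, here is a minimal sketch that estimates it by simulation. It is not LaunchDarkly's actual model. It assumes a uniform Beta(1, 1) prior on each variation's conversion rate for the final funnel step, assumes that a higher conversion rate is better, and uses made-up counts. The function name `probability_to_be_best` and the `final_step` data are hypothetical.

```python
import random

def probability_to_be_best(results, draws=20000, seed=0):
    """Estimate each variation's probability to be best by Monte Carlo.

    results maps a variation name to (conversions, total_users).
    Assumes a Beta(1, 1) prior and that a higher conversion rate is better.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible conversion rate per variation from its posterior.
        sampled = {
            name: rng.betavariate(1 + conv, 1 + total - conv)
            for name, (conv, total) in results.items()
        }
        # The variation with the highest sampled rate wins this draw.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}

# Hypothetical counts for the final metric in the funnel.
final_step = {"Control": (100, 1000), "Enabled": (130, 1000)}
probs = probability_to_be_best(final_step)
for name, p in probs.items():
    print(f"{name}: {p:.1%}")
```

Each variation's share of winning draws approximates its probability to be best. With these illustrative counts, "Enabled" wins the large majority of draws.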
The variation with the highest probability to be best for the final step is highlighted as the winning variation for the experiment as a whole above the probability report:
LaunchDarkly includes all end users that reach the last step in a funnel in the experiment's winning variation calculations, even if an end user skipped some steps in the funnel. For example, if your funnel metric group has four steps, and an end user takes step 1, skips step 2, then takes steps 3 and 4, the experiment still considers the end user to have completed the funnel and includes them in the calculations for the winning variation.
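The rule described above can be sketched in a few lines. This is an illustration only, with hypothetical step names and a hypothetical helper `completed_funnel`. It assumes that reaching the final step is the only requirement for counting as having completed the funnel.

```python
def completed_funnel(steps_taken, funnel_steps):
    """Return True if the end user reached the final step of the funnel.

    Assumed rule: earlier steps may be skipped, only the last step matters.
    """
    return funnel_steps[-1] in steps_taken

funnel_steps = ["step_1", "step_2", "step_3", "step_4"]
skipped_step_2 = {"step_1", "step_3", "step_4"}
stopped_early = {"step_1", "step_2"}

print(completed_funnel(skipped_step_2, funnel_steps))  # True
print(completed_funnel(stopped_early, funnel_steps))   # False
```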
The conversion rate is the percentage of end users in the experiment who completed the action the metric is tracking, such as clicking on a button or entering information into a form. The conversion rate includes all end users who completed the step, even if they didn't complete a previous step in the funnel.
LaunchDarkly calculates the conversion rate for each step in the funnel by dividing the number of end users who completed that step by the total number of end users who started the funnel. LaunchDarkly considers all end users in the experiment for whom the SDK has sent a flag evaluation event as having started the funnel.
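As a rough illustration of this calculation, the sketch below computes a per-step conversion rate from raw event data. The data shapes, the step names, and the function `step_conversion_rates` are assumptions made for the example and do not reflect how LaunchDarkly stores events.

```python
from collections import defaultdict

def step_conversion_rates(evaluated_users, step_events, funnel_steps):
    """Conversion rate per step = users who completed the step / users who started the funnel.

    evaluated_users: users with a flag evaluation event (they "started" the funnel).
    step_events: (user, step) pairs, one per metric event.
    """
    started = set(evaluated_users)
    if not started:
        return {step: 0.0 for step in funnel_steps}
    completed = defaultdict(set)
    for user, step in step_events:
        if user in started:
            completed[step].add(user)  # count each user at most once per step
    return {step: len(completed[step]) / len(started) for step in funnel_steps}

funnel_steps = ["view_cart", "start_checkout", "purchase"]
evaluated_users = ["u1", "u2", "u3", "u4"]
step_events = [
    ("u1", "view_cart"), ("u1", "start_checkout"), ("u1", "purchase"),
    ("u2", "view_cart"),
    ("u3", "purchase"),  # skipped earlier steps, still counted for "purchase"
]

rates = step_conversion_rates(evaluated_users, step_events, funnel_steps)
for step, rate in rates.items():
    print(f"{step}: {rate:.0%}")
```

Every step uses the same denominator, the four users who started the funnel, so "u3" raises the conversion rate for "purchase" without having completed the earlier steps.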
A funnel experiment's metric charts provide a visual representation of the performance of each variation tested in the experiment:
Each experiment's primary metric chart is unique, and how you interpret the results depends on what metric you're measuring and the hypothesis of your experiment. The following sections provide general information about the x-axis and the y-axis to help you interpret experiment results.
Information about the x-axis
The horizontal x-axis displays the unit of the primary metric included in the experiment. For example, if the metric is measuring revenue, the unit might be dollars, or if the metric is measuring website latency, the unit might be milliseconds.
If the unit you're measuring on the x-axis is something you want to increase, such as revenue or account sign-ups, then the farther to the right the curve is, the better. The variation whose curve is farthest to the right has the highest values for the metric.
If the unit you're measuring on the x-axis is something you want to decrease, such as website latency, then the farther to the left the curve is, the better. The variation whose curve is farthest to the left has the lowest values for the metric.
The width of a curve along the x-axis reflects the credible interval. A narrower curve means the variation's likely results fall within a smaller range of values, so you can be more confident in how that variation is likely to perform.
In the example below, the green variation has a narrower credible interval than the purple variation:
To learn more, read Credible interval.
Information about the y-axis
The vertical y-axis measures probability. The higher the curve is above a value on the x-axis, the more probable it is that the metric equals that value.
In the example above, the green variation has a high probability that the metric will measure 0.4 for any given context. In other words, if someone encounters the green variation, there's a high probability that the metric will measure 0.4 for that person.
This section includes an explanation of advanced statistical concepts. We provide them for informational purposes, but you do not need to understand these concepts in order to use Experimentation.
The statistics details tab
Funnel optimization experiments display more information about the experiment's results on the Statistics details tab, including the relative difference from the control and the credible interval.
The relative difference from the control variation is the difference between the mean of the control variation and the lower and upper bounds of the credible interval of the variation you're testing, expressed as a percentage of the control mean. This range contains 90% of the variation's probable values. For example, imagine you have a control variation with a mean of 1%, and the variation you're testing has a credible interval with a lower bound of 1.1% and an upper bound of 1.5%. Relative to the control mean of 1%, 1.1% is 10% higher and 1.5% is 50% higher, so the treatment's relative difference from control is 10% to 50%.
The longer you run an experiment, the narrower this range becomes, because more data shrinks the range of plausible values and lets you be more confident in the result.
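The arithmetic in this example can be checked with a short sketch. The function name `relative_difference` is hypothetical, and the inputs are the example's numbers expressed as fractions.

```python
def relative_difference(control_mean, lower, upper):
    """Return the credible interval bounds as relative changes from the control mean."""
    return (
        (lower - control_mean) / control_mean,
        (upper - control_mean) / control_mean,
    )

# Control mean of 1%, credible interval of 1.1% to 1.5% for the tested variation.
low, high = relative_difference(0.01, 0.011, 0.015)
print(f"{low:.0%} to {high:.0%}")  # 10% to 50%
```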
The credible interval is the range that contains 90% of probable values.
This means the effect of the variation on the metric you're measuring has a 90% probability of falling between the interval's lower and upper bounds. As with the relative difference, the interval narrows as the experiment gathers more data, because the range of plausible values gets smaller.
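The following sketch shows one way to compute a 90% credible interval and how it narrows with more data. It is an illustration under stated assumptions, not LaunchDarkly's implementation: it assumes a Beta(1, 1) prior on a conversion rate, takes an equal-tailed interval from simulated posterior samples, and uses made-up counts. The function name `credible_interval` is hypothetical.

```python
import random

def credible_interval(conversions, total, mass=0.90, draws=20000, seed=0):
    """Equal-tailed credible interval for a conversion rate.

    Assumes a Beta(1, 1) prior and estimates the interval from sorted
    posterior samples.
    """
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + total - conversions)
        for _ in range(draws)
    )
    tail = (1 - mass) / 2
    lower = samples[round(tail * draws)]
    upper = samples[round((1 - tail) * draws) - 1]
    return lower, upper

# Same observed conversion rate (13%), but 100 times more data in the second case.
small = credible_interval(13, 100)
large = credible_interval(1300, 10000)
print(f"100 users:    {small[0]:.3f} to {small[1]:.3f}")
print(f"10,000 users: {large[0]:.3f} to {large[1]:.3f}")
```

Both intervals surround the same observed rate, but the interval for the larger sample is much narrower, which mirrors how the interval shrinks the longer an experiment runs.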