Interpreting experiment data
Read time: 4 minutes
Last edited: Oct 22, 2021
This topic explains how to read an experiment's card in the flag's Experimentation tab and apply its findings to your product.
When your experiments are running, you can view information about them on the Experimentation tab for the flags connected to them. The Experimentation tab displays all the experiments a flag is participating in, including both experiments that are currently recording and experiments that are stopped.
An experiment's card shows details about what data the experiment is collecting and what that data means. You can see information about the experiment, like its name and experiment type, in the header line.
Here is an image of an experiment's card:
Here are some things you can do from an experiment's card:
- Stop the experiment or start a new iteration. To learn more, read The experiment lifecycle.
- Edit metrics connected to the experiment. Edit experiment metrics to start new experiments using this flag.
- Remove the metric from the experiment:
- Visualize experiment data over a set period of time. Click the iteration dropdown to select an timeframe to see:
- Visualize your experiment's data over time in a chart. Click Show to change views:
The data an experiment has collected is represented in a set of columns in the experiment's card.
Here is an image of numeric experiment's data:
Experiment data differs based on whether you're running a numeric or conversion (clicks, pageviews, or custom) experiment.
Not all columns appear for each experiment. Some are unique to numeric or conversion experiments.
The list below explains what each column means:
- Variation: Which flag variation's data is shown in the row
- Conversions / unique users: How many unique users take action based on the variation they see, relative to the total number of users who views the page. For click experiments, this means the denominator is the number of views matching the URL specified in the click metric UI.
- Conversion rate: The percentage of users who take action based on the variation they see, relative to the total number of unique users who encounter the flag variation.
- Total events: The number of times the metric connected to this flag variation has been evaluated. The total events column only updates when the metric associated with the flag variation registers an event. This is not the total number of times the flag variation has ever been evaluated.
- Average: The average numeric value this flag variation returns.
- Confidence interval: A range of values between which the actual value for the conversion rate likely falls. A smaller confidence interval equates a more confident prediction. For example, a confidence interval of 11%-13% is more reliable than a confidence interval of 10%-30%.
- Change: The difference, in positive or negative values, the flag variation has from the baseline variation. Baseline flag variations say Baseline rather than showing a value.
- P-Value: The likelihood of a variation to impact the users who see it. LaunchDarkly calculates this with a P-value of .05. This means that for every experiment you run with a statistical significance of 95%, there is a 5% chance that there is actually no difference between the comparisons. The smaller this number gets, the lower the probability our significant outcome is random.
variation calls to correlate conversions and unique user impressions within a 24-hour window. If there are more than 24 hours between the time of a
variation call and a
track call on a user key, then LaunchDarkly does not register them in the experiment.
If you want to correlate conversions and unique user impressions that occur more than 24 hours apart, call
variation again so LaunchDarkly registers them as part of the experiment.
If a user sees multiple variations before a conversion, the conversion is attributed to the variation the user saw most recently.
After your experiment has registered enough events to achieve statistical significance compared to the baseline flag variation, LaunchDarkly declares a winning variation. A winning variation appears when that variation reaches 95% statistical significance.
If you run an experiment over an extended period of time, the winning variation may change. Review your experiment data regularly to make sure you're making the most informed choices about your business and product.
You can view the statistical significance of your metrics, and the winning versions, in the Statistical significance column. The baseline variation does not show statistical significance.
Here is an image of an experiment with a winning variation designated:
Your winning variation is the flag variation that has the most positive impact compared to the baseline. If you're doing multivariate testing, which means comparing multiple different flag variations to the baseline simultaneously, you may have multiple winning variations. If you do, you can choose which variation best meets your needs and roll that flag out to your entire user base.
After you determine a winning variation, you can roll the winning variation out to 100% of your users from the flag's targeting page. To learn more, read Percentage rollouts.
If you're done with an experiment and have rolled the winning variation to your user base, it might be a good time to stop your experiment. Experiments on a userbase that only sees one flag variation do not return useful results. Stopping an experiment retains all the data collected so far. To learn more, read Managing experiments.
If you're using Data Export, you can find experiment data in your Data Export destinations to further analyze it using third-party tools of your own.
To learn more, read Data Export.