Interpreting experiment data
Last edited: Mar 16, 2020
This topic explains how to read an experiment's card in the flag's Experiments tab and apply its findings to your product.
When your experiments are running, you can view information about them on the Experiments tab for the flags connected to them. The Experiments tab displays all the experiments a flag is participating in, including both experiments that are currently recording and experiments that are paused.
An experiment's card shows details about what data the experiment is collecting and what that data means. You can see information about the experiment, like its name and experiment type, in the header line.
Here are some things you can do from an experiment's card:
- Pause the experiment or resume recording. To learn more, read The experiment lifecycle.
- Edit the metrics connected to the experiment. Editing experiment metrics starts a new experiment using this flag.
- Reset the experiment data. This leaves the metric connected to the flag, but deletes all the data the experiment has collected so far. After the historical data is deleted, the experiment begins collecting new data.
- Delete the experiment.
- Visualize experiment data over a set period of time. Click the date menu to change the time period of the data you see.
- Visualize your experiment's data over time in a chart. Click Show to change views.
The data an experiment has collected is represented in a set of columns in the experiment's card.
Experiment data differs based on whether you're running a numeric or conversion (clicks, pageviews, or custom) experiment.
Not all columns appear for each experiment. Some are unique to numeric or conversion experiments.
The list below explains what each column means:
- Variation: Which flag variation's data is shown in the row
- Conversions / unique visitors: How many unique visitors take action based on the variation they see, relative to the total number of users who view the page. For click experiments, the denominator is the number of views matching the URL specified in the click metric UI.
- Conversion rate: The percentage of users who take action based on the variation they see, relative to the total number of unique visitors who encounter the flag variation.
- Total evaluations: The number of times this flag variation has been evaluated.
- Average: The average numeric value this flag variation returns.
- Confidence interval: A range of values within which the actual conversion rate likely falls. A narrower confidence interval indicates a more confident prediction. For example, a confidence interval of 11%-13% is more reliable than a confidence interval of 10%-30%.
- Change: The positive or negative difference between this flag variation's results and the baseline variation's results. Baseline flag variations say Baseline rather than showing a value.
- P-Value: The probability of observing a difference at least this large between variations if there were actually no real difference between them. LaunchDarkly uses a significance threshold of .05. This means that for every experiment you run with a statistical significance of 95%, there is a 5% chance that there is actually no difference between the comparisons. The smaller the P-value, the lower the probability that the observed difference is due to chance.
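LaunchDarkly computes all of these columns for you, but a rough sketch of the underlying statistics can make them easier to interpret. The example below uses hypothetical conversion counts and a standard normal approximation for the confidence interval and a two-proportion z-test for the P-value; it is an illustration of the general technique, not LaunchDarkly's exact calculation.

```python
import math

def conversion_stats(conversions, visitors, z=1.96):
    """Conversion rate with a ~95% normal-approximation confidence interval."""
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate, (rate - margin, rate + margin)

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical experiment card: baseline vs. one treatment variation
baseline_rate, baseline_ci = conversion_stats(120, 1000)    # 12.0% conversion
treatment_rate, treatment_ci = conversion_stats(156, 1000)  # 15.6% conversion
p = two_proportion_p_value(120, 1000, 156, 1000)
print(f"Change: {treatment_rate - baseline_rate:+.1%}, P-value: {p:.3f}")
```

With these numbers the P-value falls below .05, which corresponds to the 95% significance threshold at which LaunchDarkly declares a winner.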
After your experiment has registered enough events to achieve statistical significance compared to the baseline flag variation, LaunchDarkly declares a winning variation. A winning variation appears when that variation reaches 95% statistical significance.
If you run an experiment over an extended period of time, the winning variation may change. Review your experiment data regularly to make sure you're making the most informed choices about your business and product.
You can view the statistical significance of your metrics, and the winning versions, in the Statistical significance column. The baseline variation does not show statistical significance.
Your winning variation is the flag variation that has the most positive impact compared to the baseline. If you're doing multivariate testing, which means comparing multiple different flag variations to the baseline simultaneously, you may have multiple winning variations. If you do, you can choose which variation best meets your needs and roll that flag out to your entire user base.
After you determine a winning variation, you can roll the winning variation out to 100% of your users from the flag's targeting page. To learn more, read Percentage rollouts.
If you're done with an experiment and have rolled the winning variation out to your user base, it's a good time to pause the experiment. Experiments on a user base that only sees one flag variation do not return useful results. Pausing an experiment retains all the data collected so far. To learn more, read Managing experiments.
If you're using Data Export, you can find experiment data in your data export destinations to further analyze it using third-party tools of your own.
To learn more, read Data Export.
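For example, if your destination lands events as newline-delimited JSON, you could tally per-variation conversions yourself. The event shapes and field names below (`kind`, `userKey`, `variation`, `key`) are illustrative assumptions, not LaunchDarkly's exact export schema, which varies by destination.

```python
import json
from collections import Counter

# Hypothetical exported event lines; real Data Export schemas differ.
events = [
    '{"kind": "feature", "userKey": "u1", "variation": 0}',
    '{"kind": "feature", "userKey": "u2", "variation": 1}',
    '{"kind": "custom", "userKey": "u2", "key": "signup-click"}',
]

impressions, conversions = Counter(), Counter()
seen_variation = {}
for line in events:
    event = json.loads(line)
    if event["kind"] == "feature":
        # Record which variation each user saw
        seen_variation[event["userKey"]] = event["variation"]
        impressions[event["variation"]] += 1
    elif event["kind"] == "custom" and event["userKey"] in seen_variation:
        # Credit the conversion to the variation the user saw
        conversions[seen_variation[event["userKey"]]] += 1

for variation in sorted(impressions):
    rate = conversions[variation] / impressions[variation]
    print(f"variation {variation}: {rate:.0%} conversion")
```

This kind of offline tally is useful for cross-checking the experiment card or joining experiment results with data from other systems.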