• Home
  • Integrations
  • SDKs
  • Guides
  • API docs
    No results for ""
    EXPAND ALL

    EDIT ON GITHUB

    Reading experiment results

    Read time: 5 minutes
    Last edited: Jan 10, 2023

    Overview

    This topic explains how to read and understand an experiment's results graph and table.

    An experiment's results graph and table provides information about each variation's performance in the experiment, how it compares to the other variations, and which variations are likely to be best out of all the tested options. Understanding how to read them can help you make informed decisions about when to edit, stop, or choose a winning variation for your experiments.

    Experiment result graphs

    The data an experiment has collected is represented in a results graph and table:

    An experiment's results graph and table.
    An experiment's results graph and table.

    To hide any of the variations from the results graph, uncheck the box next to the variation's name:

    An experiment's results graph with two variations unchecked.
    An experiment's results graph with two variations unchecked.

    Each Experimentation results chart is unique, and how you interpret the results depends on what metric you're measuring and the hypothesis of your experiment. However, we've provided some general guidelines for interpreting experiment results.

    You can also use the REST API: Get experiment results

    The x-axis

    The horizontal x-axis displays the unit of the primary metric included in the experiment. For example, if the metric is measuring revenue, the unit might be dollars, or if the metric is measuring website latency, the unit might be milliseconds.

    If the unit you're measuring on the x-axis is something you want to increase, such as revenue, account sign ups, and so on, then the farther to the right the curve is, the better. The variation with the curve farthest to the right means the unit the metric is measuring is highest for that variation.

    If the unit you're measuring on the x-axis is something you want to decrease, such as website latency, then the farther to the left the curve is, the better. The variation with the curve farthest to the left means the unit the metric measuring is lowest for that variation.

    How wide a curve is on the x-axis determines the credible interval. Narrower curves mean the results of the variation fall within a smaller range of values, so you can be more confident in the likely results of that variation's performance.

    In the example below, the green variation has a more precise credible interval than the purple variation:

    An example experiment results graph.
    An example experiment results graph.

    To learn more, read Credible interval.

    The y-axis

    The vertical y-axis measures probability. You can determine how probable it is that the metric will equal the number on the x-axis by how high the curve is.

    In the example above, the green variation has a high probability that the metric will measure 0.4 for any given end user. In other words, if an end user encounters the green variation, there's a high probability that the metric will measure 0.4 for that user.

    Experiment results tables

    Each experiment has a results table with a row for each variation tested and a column for the variation's probability to be best, relative difference from the control, credible interval, and mean. This section explains how to interpret each of these columns in your results table.

    An experiment's results table.
    An experiment's results table.

    Imagine you want to test offering a discount for your online store, to see if more customers complete their purchase when you offer a discount. You run an experiment using a click conversion metric when you offer various discounts. For your control variation, you offer no discount. The sections below explain what each column means.

    Probability to be best

    The probability to be best is the likelihood that a variation had the biggest effect on the primary metric. Of all the variations in the experiment, the one with highest probability is the best option to choose as a winner. In this example, the "15% off" variation has the highest probability to be best.

    Relative difference from [the control]

    The relative difference from the control variation is the range that contains 90% of the variation's probable values. It is written as a percentage of the values from the control variation.

    This means the effect of the variation on the metric you're measuring, compared to the control variation, has a 90% probability of falling within this range. In the above example, the "15% off" variation has a relative difference of [0.61%, 101.96%]. You can have 90% confidence that the effect of the variation on the metric, for any given user who encounters it, will be between 0.61% and 101.96% higher than the control variation.

    To summarize this column, you can have 90% confidence that offering a 15% discount in your store will result in users checking out between about 1% to about 102% more often than those who receive no discount.

    The longer you run an experiment, the more the width of this interval decreases. This is because the more data you gather, the more confidence you can have because the range of plausible values is smaller.

    Credible interval

    The credible interval is the range that contains 90% of probable values.

    This means the effect of the variation on the metric you're measuring has a 90% probability of falling between these two numbers. In the above example, the "15% off" variation has a credible interval of [43.12%, 86.55%]. This means you can have 90% confidence that the probable values of the metric, for any given user who encounters it, will be between 43.12% and 86.55%.

    To summarize this column, you can have 90% confidence that offering a 15% discount in your store will result in users checking out between about 43% to 87% of the time.

    If you view metric event values per unique user as a sum instead of an average, the credible interval will be expressed in values rather than percentages. To learn more, read Viewing metric event values.

    The longer you run an experiment, the more the width of this interval decreases. This is because the more data you gather, the more confidence you can have because the range of plausible values is smaller.

    Mean

    The mean is the average value of the posterior distribution of the variation. If you were to continue collecting data, the mean is the future average numeric value for the variation that you should expect. It doesn’t capture the uncertainty in the measurement, so it should not be the only measurement you use to make decisions.

    In the above example, if you continued to run the experiment, you could expect that offering a 15% discount in your store will result in users checking out about 71% of the time.

    All of the data in the results table are based on a posterior distribution. To learn more about posterior distributions, read Frequentist and Bayesian modeling. LaunchDarkly automatically performs checks on the results data, to make sure that actual user traffic matches the allocation you set. To learn more, read Understanding sample ratios.

    Viewing metric event values

    In each metrics results table you can view metric event values per unique user as an average or a sum, depending on your needs.

    Here's the View metric event values per unique user as option in a metric results table:

    A metric results table with "View metric event values per unique user as" called out.
    A metric results table with "View metric event values per unique user as" called out.

    Imagine you are tracking purchases made by users using a new checkout flow. A user makes three purchases for $15 each. If you view metric event values as an average, the average of their purchases is $15. If you view metric event values as a sum, the sum of their purchases is $45. If your goal is to increase revenue per user, viewing metric event values as a sum makes the most sense.

    Alternatively, imagine you want to decrease the time to checkout per transaction. If a user takes two minutes for their first purchase, four minutes for their second purchase, and three minutes for their third purchase, their average checkout time is three minutes, but their total checkout time is nine minutes. In this case, viewing metric event values as an average makes the most sense.

    Choosing a winning variation

    The winning variation for a completed experiment is the variation that is most likely to be the best option out of all of the variations tested. To learn more, read Winning variations.