Experiment size and run time

Last edited: May 01, 2024


This topic explains how to decide on the number of contexts to include and the run time for an experiment.

Choosing your sample size

The number of contexts included in an experiment is called the sample size. The larger the sample size for an experiment, the more confident you can be in its outcome. How big your sample size should be depends on how confident you want to be in the outcome and how large the credible intervals are for your metrics. Metrics with large credible intervals are sometimes called "noisy" metrics.
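As a rough illustration of why larger samples give tighter intervals, the sketch below approximates the width of a 90% interval around a conversion rate. It assumes a simple conversion metric and uses a normal approximation. The function name `approx_interval_width` and the example numbers are hypothetical, and this is not necessarily how the product computes its credible intervals.

```python
import math


def approx_interval_width(rate, sample_size, z=1.645):
    """Approximate width of an interval around a conversion rate.

    Assumption: normal approximation to a proportion; z=1.645
    corresponds to a two-sided 90% interval.
    """
    standard_error = math.sqrt(rate * (1 - rate) / sample_size)
    return 2 * z * standard_error


# Quadrupling the sample size halves the interval width.
for n in (2_500, 10_000, 40_000):
    print(n, round(approx_interval_width(0.10, n), 4))
```

Because the width shrinks with the square root of the sample size, a noisy metric needs disproportionately more contexts to reach the same level of confidence.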

Sample size estimator

Experiments with two variations display a sample size estimator that gives an estimate of how much more traffic needs to encounter your experiment before reaching your chosen probability to be best. Experiments with more than two variations do not display a sample size estimator.

In this example, for a 90% probability of being best, 53,285 more request contexts should be in the experiment before you stop the iteration and roll out the winning variation to all contexts:

An experiment's sample size estimator results.

You can use the menu to see how many more contexts you need for a probability to be best of 80%, 90%, or 95%.

To be confident that the winning variation is the best out of the variations tested, wait until the sample size estimator indicates you have reached the needed number of contexts. Alternatively, if there is a low level of risk in rolling out the winning variation early, or if you don't anticipate a significant impact on your user base, you can end the experiment before you reach that number.
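To make "probability to be best" more concrete, here is a minimal sketch of one common way to estimate it for two variations with a conversion metric. It assumes a uniform Beta(1, 1) prior and Monte Carlo sampling. The function name, parameters, and numbers are illustrative assumptions, not the estimator's actual implementation.

```python
import random


def probability_to_be_best(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Estimate the probability that variation B beats variation A.

    Assumption: each conversion rate has a Beta(1 + conversions,
    1 + non-conversions) posterior, that is, a uniform prior.
    """
    rng = random.Random(seed)
    wins_b = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins_b += 1
    return wins_b / draws


# Hypothetical data: 1,000 contexts per variation.
p_best = probability_to_be_best(conv_a=100, n_a=1000, conv_b=120, n_b=1000)
threshold = 0.90  # chosen probability to be best
print("keep running" if p_best < threshold else "threshold reached")
```

With the same observed rates, adding more contexts narrows both posteriors, which moves the estimate further from 50%. This is why the estimator reports a number of additional contexts rather than a fixed duration.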

Traffic count

The traffic count table displays how many unique contexts have encountered each variation within the experiment:

An experiment's traffic count table showing each variation's total number of contexts.

Here's how the table counts unique contexts:

  • The table only counts contexts with the same context kind as the experiment's randomization unit. For example, if your experiment's randomization unit is user, then the table counts only user contexts. The table won't include any device or organization contexts that are in the experiment. To learn more, read Randomization units.
  • If the same context is in the experiment multiple times, the table counts the context only once.
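The two counting rules above can be sketched as follows. The shape of the exposure records, a `(context_kind, context_key, variation)` tuple, and the function name are assumptions made for illustration, not the product's data model.

```python
def count_unique_contexts(exposures, randomization_unit):
    """Count unique contexts per variation.

    exposures: iterable of (context_kind, context_key, variation)
    tuples (a hypothetical record shape).
    """
    keys_by_variation = {}
    for kind, key, variation in exposures:
        # Rule 1: skip context kinds other than the randomization unit.
        if kind != randomization_unit:
            continue
        # Rule 2: a set counts a repeated context key only once.
        keys_by_variation.setdefault(variation, set()).add(key)
    return {v: len(keys) for v, keys in keys_by_variation.items()}


exposures = [
    ("user", "user-1", "control"),
    ("user", "user-1", "control"),    # repeat exposure, counted once
    ("user", "user-2", "treatment"),
    ("device", "device-9", "control"),  # ignored: not a user context
]
print(count_unique_contexts(exposures, "user"))
```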

Determining how long to run an experiment for

You may not always know how long to run an experiment for. To help decide, you should consider:

  • the current probability that the winning variation is the best, and how long it would take to improve your confidence
  • the level of risk involved in rolling out the winning variation to all contexts

We recommend running experiments for two to four weeks. If you’re time-constrained, run them for as long as you can. In some cases, you may only be able to run an experiment for a few days, for example if your marketing campaign lasts only 48 hours. It’s still valuable to run short experiments, because making decisions using some data is better than making decisions using no data.