Measuring Experimentation impact with holdout experiments

Read time: 7 minutes
Last edited: Apr 08, 2024

Overview

This guide explains how to create a holdout experiment to measure the overall effectiveness of your Experimentation program.

As you begin planning your Experimentation program, you may want to track how much of an impact your experiments have over time. Will there be any measurable differences in behavior between the end users you include in experiments, and those you do not? Which group of end users will spend more money, sign up for services, or affect other metrics at higher rates? Holdout groups can help you answer these questions.

A holdout group is a set of contexts that you exclude from all of your experiments. This creates a control group against which you can measure the impact of your Experimentation program. If, after a set period of time such as a month or quarter, there are no measurable differences between your Experimentation group and your holdout group, you may want to reconsider the number, scope, and design of the experiments you're running.

To form a holdout group, create a prerequisite flag that assigns each of your contexts to either the holdout group or the general population, then add that flag as a prerequisite to every flag you use in an experiment.
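LaunchDarkly handles the percentage assignment for you, but the underlying idea can be sketched in a few lines of Python. This is an illustrative model only, not LaunchDarkly's actual bucketing algorithm: it deterministically maps each context key to a bucket so that roughly 5% of contexts land in the holdout group, and a given key always gets the same assignment.

```python
import hashlib

def in_holdout(context_key: str, holdout_pct: float = 0.05) -> bool:
    """Deterministically bucket a context key; about holdout_pct of keys
    land in the holdout group. Illustrative model, not LaunchDarkly's algorithm."""
    # Hash the key to an integer, then map it onto [0, 1).
    digest = hashlib.sha256(context_key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < holdout_pct

# The same key always receives the same assignment.
assert in_holdout("user-123") == in_holdout("user-123")
```

Because the assignment is a pure function of the key, a context stays in (or out of) the holdout for as long as the split percentage is unchanged, which is what makes a long-running holdout comparison valid.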

In this tutorial you will:

  • Create a prerequisite holdout flag
  • Add that flag as a prerequisite to all flags you use within an experiment
  • Create a holdout experiment that compares the holdout group to the Experimentation group

To learn more about LaunchDarkly's Experimentation offering, read Experimentation.

Prerequisites

To complete this tutorial, you must have the following prerequisites:

  • An active LaunchDarkly account with Experimentation enabled, and with permissions to create flags and edit experiments.
  • Familiarity with LaunchDarkly's Experimentation feature.
  • A basic understanding of your business's needs or key performance indicators (KPIs).

Concepts

To complete this guide, you should understand the following concepts:

Prerequisite flags

Prerequisites allow you to control feature dependencies in LaunchDarkly. You can configure feature flags to depend on another flag to take effect, making the other flag a prerequisite to enable the feature. To learn more, read Flag prerequisites.
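As an illustrative model (not the SDK's actual implementation), prerequisite evaluation works roughly like this: if a dependent flag's prerequisite does not serve the required variation for a context, the dependent flag short-circuits to its off variation. The flag keys, rules, and contexts below are hypothetical.

```python
def evaluate(flag, flags, context):
    """Toy evaluator: a flag only serves its targeting result if every
    prerequisite flag serves the required variation for this context."""
    for prereq_key, required_variation in flag.get("prerequisites", []):
        prereq = flags[prereq_key]
        if evaluate(prereq, flags, context) != required_variation:
            return flag["off_variation"]  # prerequisite failed: short-circuit
    return flag["targeting"](context)

flags = {
    "holdout-prereq": {
        "prerequisites": [],
        "off_variation": "Not in holdout",
        # Hypothetical rule: "user-1" is in the holdout group.
        "targeting": lambda ctx: "In holdout" if ctx["key"] == "user-1" else "Not in holdout",
    },
    "new-checkout": {
        # This experiment flag requires "Not in holdout" from the prerequisite.
        "prerequisites": [("holdout-prereq", "Not in holdout")],
        "off_variation": False,
        "targeting": lambda ctx: True,
    },
}

# A holdout user never receives the experiment treatment.
print(evaluate(flags["new-checkout"], flags, {"key": "user-1"}))  # False
print(evaluate(flags["new-checkout"], flags, {"key": "user-2"}))  # True
```

This is why a single prerequisite flag can exclude the holdout group from every experiment at once: each dependent flag checks it before serving its own targeting result.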

Creating the holdout flag

To begin, create a prerequisite boolean flag that determines what percentage of your contexts to include in the holdout group. In this example, you will include 5% of your total user base in the holdout group.

To create the prerequisite flag:

  1. Navigate to the flags list.
  2. Click Create flag.
  3. Enter "Experimentation holdout group prerequisite" as the Name.
  4. Enter a Description of the flag such as "The prerequisite flag for all experiments."
  5. (Optional) Update the Maintainer for the flag.
  6. Select the "Boolean" Flag type.
  7. Enter "In holdout" in the Name field for the true variation.
  8. Enter "Not in holdout" in the Name field for the false variation.
The "Create new flag" dialog for a new holdout flag.
  9. Click Create flag.

Next, add this flag as a prerequisite to every flag you use in an experiment.

Adding the flag as a prerequisite

When you create a new flag you plan to use in an experiment, or when you plan to use an existing flag in an experiment, you should add the "Experimentation holdout group prerequisite" flag as a prerequisite.

To add the prerequisite flag:

  1. Navigate to the flags list.
  2. Click on the name of the flag you plan to use in an experiment. The Targeting tab appears.
  3. Click + Add rule and select "Set prerequisites."
  4. In the Flag menu, choose "Experimentation holdout group prerequisite."
  5. In the variation menu, choose "Not in holdout."
  6. Click Review and save.
The "Prerequisites" section of the dependent flag with a prerequisite flag added.

Repeat this procedure for every flag you use in an experiment.

Building a holdout experiment

After you have been running experiments for a few months or a quarter, you may want to evaluate your experimentation program by comparing the overall behavior of end users included in and excluded from experiments.

One way to measure the impact of your Experimentation program is by creating a relevant metric and running an experiment with it on the "Experimentation holdout group prerequisite" flag.

Creating the metric

First, decide what metric you want to measure. Choose a metric that aligns with the KPIs or goals your experiments target, such as average revenue per customer or the percentage of customers who sign up for your service. In this example, you will measure average revenue per customer.

To create your metric:

  1. Navigate to the Metrics list.
  2. Click Create metric. The "Create metric" panel appears.
  3. Enter "Average revenue per user" in the Name field.
  4. Choose Custom in the "Event information" section.
  5. Select Numeric.
  6. Enter the existing event key from your codebase in the Event key field. In this example, the event key is "Average revenue per customer."
Metric keys and event keys are different

LaunchDarkly automatically generates a metric key when you create a metric. You can use the metric key to identify the metric in API calls. To learn more, read Creating metrics.

Custom conversion/binary and custom numeric metrics also require an event key. You can set the event key to anything you want. Adding this event key to your codebase lets your SDK track actions customers take in your app as events. To learn more, read Sending custom events.
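In your application code, the event key is what ties customer actions to this metric. With the LaunchDarkly server-side Python SDK, for example, you would call the client's `track` method with the event key and a numeric metric value. The snippet below is a hedged sketch: the wrapper function, context, and revenue amount are illustrative, and `client` stands in for an initialized `ldclient` instance.

```python
EVENT_KEY = "Average revenue per customer"  # must match the metric's event key exactly

def track_revenue(client, context, amount_usd: float) -> None:
    """Send a custom numeric event; LaunchDarkly averages the metric
    values per user when computing the metric. Illustrative wrapper."""
    # In the real SDK, `client` is an ldclient.LDClient and `context`
    # an ldclient.Context built for the current end user.
    client.track(EVENT_KEY, context, metric_value=amount_usd)
```

Keeping the event key in one constant avoids the silent failure mode where a typo in the key means the metric never receives events.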

  7. Enter "USD" in the Unit of measure field.
  8. Choose "Higher than baseline" from the Success criteria menu.
  9. Choose "Average individual unit values" for the Unit aggregation method.
  10. Choose "Set the value for units with no events to zero" for Units without events.
  11. Choose user as the Randomization unit:
The "Event information" section of a custom numeric metric.

  12. Click Create metric.
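The two aggregation choices in the steps above matter for the math. "Average individual unit values" first averages each user's own event values, then averages across users, and "Set the value for units with no events to zero" means users with no purchases count as zero rather than being dropped. A small worked example with made-up numbers:

```python
# Revenue events per user during the experiment (users with no events included).
events_by_user = {
    "user-1": [20.0, 10.0],  # this user's unit value: (20 + 10) / 2 = 15.0
    "user-2": [40.0],        # unit value: 40.0
    "user-3": [],            # no events: counted as 0.0, not dropped
}

unit_values = [
    sum(e) / len(e) if e else 0.0  # "units without events" set to zero
    for e in events_by_user.values()
]
average_revenue_per_user = sum(unit_values) / len(unit_values)
print(average_revenue_per_user)  # 18.33..., versus 27.5 if user-3 were dropped
```

Counting non-purchasers as zero is usually what you want for a revenue metric, since a holdout comparison should reflect every user you exposed, not just those who bought something.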

Building the experiment

Next, build your holdout comparison experiment. In this example, you will build an experiment to run for the first quarter of the year. To learn more about the experiment builder, read Creating experiments.

To build your experiment:

  1. Navigate to the Experiments list.
  2. Click Create experiment.
  3. Enter "Holdout comparison for Q1 2023" in the Experiment name field.
  4. Enter a Hypothesis for your experiment.
  5. Choose "user" as the Randomization unit.
  6. Click Next. The "Select metrics" step opens.
  7. Choose "Average revenue per user" from the Primary metric menu:
The "Select metrics" section of a new experiment.
  8. Click Next. The "Define variations" step opens.
  9. Choose the "Experimentation holdout group prerequisite" flag from the Select flag menu.
  10. Click Next. The "Set audience" step opens.
  11. Assign 5% of traffic to the true variation, and 95% of traffic to the false variation.
  12. Select false to serve to the remaining population.
The "Set audience" section of a new experiment.
  13. Click Finish. You are returned to the experiment's Design tab.

Next, begin an iteration of your experiment.

Starting the experiment iteration

When you are ready to run the experiment, toggle On the "Experimentation holdout group prerequisite" flag. Then, start an iteration of your experiment.

To start your experiment iteration:

  1. Navigate to the Experiments list.
  2. Click on "Holdout comparison for Q1 2023."
  3. Click Start.

You are now running a holdout experiment. When your quarter is over, you can stop the experiment iteration and analyze your experiment results. To learn more, read Analyzing experiments.

After you stop the experiment iteration, you can start a new iteration at any time. To learn more, read Starting experiment iterations.
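LaunchDarkly computes statistical results for you when you analyze the experiment, but the headline comparison is simply holdout versus Experimentation averages. As a back-of-the-envelope sketch with entirely made-up numbers, the lift might be read like this:

```python
# Illustrative results only: average revenue per user in each group.
holdout_mean = 18.40     # the ~5% of users excluded from all experiments
experiment_mean = 19.78  # the ~95% of users eligible for experiments

absolute_lift = experiment_mean - holdout_mean
relative_lift = absolute_lift / holdout_mean
print(f"Absolute lift: ${absolute_lift:.2f} per user")
print(f"Relative lift: {relative_lift:.1%}")
```

If the relative lift is near zero after a full quarter, that is the signal, discussed in the overview, to revisit the number, scope, and design of your experiments.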

Conclusion

In this guide you learned how to create a holdout experiment using a prerequisite flag to measure the overall impact of your Experimentation program. By assessing the impact of your experiments as a whole, you can fine-tune your audiences and the metrics you're measuring, and ensure you're getting the most value out of LaunchDarkly Experimentation.

Want to know more? Start a trial.

Your 14-day trial begins as soon as you sign up. Learn to use LaunchDarkly with the app's built-in tutorial. You'll discover how easy it is to manage the whole feature lifecycle from concept to launch to control.
