An example of a retention experiment for a subscription SaaS product: a writing tool hypothesizes that users who receive a weekly progress summary email will show 12% higher 30-day retention than users who do not. The experiment runs for 6 weeks with a 50/50 holdout and produces a statistically significant 9% retention lift, confirming the intervention is worth scaling. It also surfaces a secondary finding: sending at Tuesday 10am drives higher open rates than Wednesday.
Retention experiments in subscription SaaS are high-stakes because they run on long measurement timelines (30–90 days to see the signal) and each user passes through a given retention window only once. A poorly designed experiment wastes months of measurement time and a one-time opportunity to test the hypothesis.
This framework gives you the experiment design, the hypothesis template, and the interpretation guide for running retention experiments that produce learning, not just data.
The Retention Experiment Framework
The Hypothesis Template
Every retention experiment begins with a written hypothesis:
We believe that [intervention] will [direction] [primary retention metric]
for [user segment] by [expected magnitude]
because [underlying mechanism].
We will confirm this if [specific outcome] after [time period].
We will reject this if [opposite outcome].
Primary metric: [what we measure]
Counter-metrics: [what must not regress]
Minimum sample size: [calculated from expected effect size]
Duration: [time period to reach minimum sample]
Example hypothesis:
"We believe that a weekly progress summary email will increase 30-day retention for newly activated users (signed up in last 30 days) by 10–15% because progress visibility increases perceived value and creates a re-engagement trigger for users who haven't logged in. We will confirm this if the email cohort shows ≥8% higher 30-day retention than the control. We will reject this if the difference is <3%."
The Experiment Design
Experimental group (50%): Users receive the weekly progress email
Control group (50%): Users receive no intervention email
Why 50/50 vs. smaller test group?
Detecting a 10–15% relative retention improvement at 95% confidence and 80% power requires a fixed minimum sample per cohort (calculated in the next section), and a 50/50 split fills both cohorts at the maximum possible rate. For most subscription SaaS products at growth stage, 50/50 is therefore the fastest path to statistical significance. A 10/90 split accrues users into the smaller arm five times more slowly, so reaching the same confidence takes several times longer, as the sketch below illustrates.
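A minimal sketch of the accrual math, holding the per-arm sample requirement fixed (a simplification: an unbalanced design shifts the exact per-arm requirement somewhat, but the smaller arm still gates the timeline). The activation rate is a hypothetical figure:

```python
def weeks_to_minimum_sample(n_per_arm: int, weekly_activations: int,
                            smaller_arm_share: float) -> float:
    """The smaller arm accrues users slowest, so it gates the timeline."""
    return n_per_arm / (weekly_activations * smaller_arm_share)


# Hypothetical: 400 new activations per week, ~1,400 users needed per arm.
print(weeks_to_minimum_sample(1400, 400, 0.50))  # 50/50 split -> 7.0 weeks
print(weeks_to_minimum_sample(1400, 400, 0.10))  # 10/90 split -> 35.0 weeks
```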
The Sample Size Calculation
Before starting any retention experiment, calculate the minimum sample size needed to detect your expected effect:
For a baseline retention rate of 35% and a hypothesized relative lift of 10–15%, sized to detect the upper end of that range (5 percentage points, 35% → 40%):
- Required sample per cohort: ~1,400–1,500 users
- At 200 new activations per cohort per week: ~7–8 weeks to minimum sample
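For teams that want to automate this, a minimal sketch using statsmodels' power tools (an assumption about your stack; any standard power calculator gives the same answer). It reproduces the figures above:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, target = 0.35, 0.40  # 30-day retention: control vs. expected treatment
effect = proportion_effectsize(target, baseline)  # Cohen's h for two proportions

n_per_cohort = NormalIndPower().solve_power(
    effect_size=effect,
    alpha=0.05,               # 95% confidence
    power=0.80,               # 80% power
    ratio=1.0,                # 50/50 split
    alternative="two-sided",
)
print(round(n_per_cohort))    # ~1,471 users per cohort
```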
Do not analyze results before reaching minimum sample. This is the most common retention experiment failure — analyzing at week 3 because the results "look promising."
The Intervention Design
What Makes a Good Retention Experiment Intervention
Good retention interventions:
- Target a specific mechanism (progress visibility, social proof, habit reinforcement)
- Are proportional to the expected effect size (a small email intervention won't produce a 30% retention lift)
- Are reproducible at scale (if it works, you can ship it)
- Have a clear causal story (why does this intervention cause users to stay?)
Example intervention tiers:
| Intervention | Complexity | Expected effect range |
|-------------|-----------|----------------------|
| Weekly progress summary email | Low | 5–15% retention lift |
| Personalized recommendation in-app | Medium | 10–20% retention lift |
| AI-powered check-in message | Medium | 8–18% retention lift |
| Onboarding redesign | High | 15–30% retention lift |
| Pricing change (annual vs. monthly) | High | 20–40% retention lift |
Start with low-complexity, high-confidence interventions. Reserve high-complexity interventions for when low-complexity options are exhausted.
Interpreting the Results
The Four Outcomes
Outcome 1 — Confirmed (expected lift achieved)
Ship the intervention and plan scale-up. Investigate which user segments benefited most — the lift may be higher for specific cohorts, revealing a targeting opportunity.
Outcome 2 — Partial (some lift but below target)
Investigate why. Was the intervention delivered as designed? Did all users in the experimental group receive the email? Is the effect concentrated in a specific segment? A 9% lift when the target was 15% may still be worth shipping at lower cost.
Outcome 3 — No effect
Disconfirm the hypothesis. Document what was tested and what was learned. The causal story was wrong — update your retention model accordingly.
Outcome 4 — Negative effect
The intervention hurt retention. Investigate urgently: did the email trigger unsubscribes? Did the progress summary expose negative progress that demotivated users? Document and do not scale.
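Once the experiment reaches minimum sample, a two-proportion z-test is the standard way to separate these outcomes. A minimal sketch using statsmodels (an assumption about your stack) with hypothetical counts:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts at day 30: retained users and cohort sizes.
retained = [560, 490]    # [treatment, control]
cohort_n = [1400, 1400]  # equal 50/50 split

z_stat, p_value = proportions_ztest(retained, cohort_n)
lift = retained[0] / cohort_n[0] - retained[1] / cohort_n[1]
print(f"absolute lift: {lift:.3f}, p-value: {p_value:.3f}")
# -> absolute lift: 0.050, p-value ~0.006
# p < 0.05 with lift at/above the confirm threshold -> Outcome 1 (confirmed);
# p < 0.05 with lift below target -> Outcome 2 (partial);
# p >= 0.05 -> Outcome 3 (no effect);
# a significant negative lift -> Outcome 4 (negative effect).
```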
The Retention Learning Document
For each retention experiment, regardless of outcome, write a one-page learning document:
Hypothesis: [what we tested]
Intervention: [what we did]
Result: [outcome with confidence level]
What we learned: [mechanism that explains the result]
What we should test next: [next hypothesis that follows from this result]
What we now know to be false: [hypothesis we can stop pursuing]
According to Lenny Rachitsky's writing on retention experiments, the most valuable outcome of a retention experiment is often the "what we now know to be false" section. "Teams that only run experiments to confirm their beliefs make slower progress than teams that treat disconfirmed experiments as the most valuable learning. Every false hypothesis ruled out narrows the search space for what actually drives retention."
The Retention Experiment Calendar
Running one retention experiment at a time prevents confounding — if two experiments run simultaneously, you cannot attribute a retention change to either.
A healthy retention experiment calendar:
- One active experiment at a time
- 6–8 weeks per experiment (including run-up to minimum sample)
- Experiment review and planning session between each
- Maximum of 6 retention experiments per year for a growth-stage SaaS
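The arithmetic behind that cap is straightforward: one experiment at a time, roughly seven weeks per experiment plus a review week, fits about six cycles into a year. A minimal sketch, with the durations as assumptions rather than prescriptions:

```python
from datetime import date, timedelta


def plan_calendar(start: date, experiment_weeks: int = 7, review_weeks: int = 1,
                  weeks_in_year: int = 52) -> list[tuple[date, date]]:
    """Lay out sequential experiment windows; one active experiment at a time."""
    slots: list[tuple[date, date]] = []
    cursor = start
    while (cursor - start) < timedelta(weeks=weeks_in_year):
        end = cursor + timedelta(weeks=experiment_weeks)
        if end - start > timedelta(weeks=weeks_in_year):
            break  # the next experiment would not finish within the year
        slots.append((cursor, end))
        cursor = end + timedelta(weeks=review_weeks)  # review/planning gap
    return slots


calendar = plan_calendar(date(2025, 1, 6))
print(len(calendar))  # 6 experiments in the year, consistent with the cap above
```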
According to Shreyas Doshi on Lenny's Podcast, the biggest retention experimentation mistake he sees is running too many experiments simultaneously. "You end up with correlation, not causation. The whole point of a controlled experiment is to isolate the variable. If three things changed at once, you learned nothing."
FAQ
Q: What is an example of a retention experiment for a subscription SaaS product?
A: A writing tool tests whether weekly progress summary emails increase 30-day retention for newly activated users by running a 50/50 holdout for 6 weeks, confirming a statistically significant 9% retention lift.
Q: How do you design a retention experiment for a SaaS product?
A: Write a hypothesis with a specific user segment, intervention, expected effect size, and causal mechanism. Calculate the minimum sample size, run a 50/50 holdout for the required duration, and analyze only after reaching minimum sample.
Q: How long should a retention experiment run?
A: Long enough to reach the minimum sample size calculated from your expected effect size and baseline retention rate. Most SaaS retention experiments require 6–12 weeks to reach statistical significance at 95% confidence and 80% power.
Q: What should you do when a retention experiment shows no effect?
A: Document what was tested and what the null result tells you about the causal mechanism. The causal story was wrong — update your retention model and test the next hypothesis in the search space.
Q: How many retention experiments should a SaaS team run per year?
A: Maximum 6 for a growth-stage SaaS team running one at a time. More experiments simultaneously create confounding. Fewer than 4 means insufficient learning velocity about retention drivers.
HowTo: Run a Retention Experiment for a Subscription SaaS Product
- Write a formal hypothesis specifying the user segment, intervention, expected effect magnitude, causal mechanism, confirmation threshold, and rejection threshold
- Calculate the minimum sample size needed to detect your expected effect at 95 percent confidence and 80 percent power before starting the experiment
- Design a 50/50 holdout split and run the experiment for the full duration needed to reach minimum sample — do not analyze results early even if they look promising
- Track the primary retention metric plus counter-metrics (unsubscribe rate, support contacts, feature usage) throughout the experiment without intervening
- Interpret results using the four outcomes framework: confirmed, partial, no effect, or negative effect — each has different next steps
- Write a one-page learning document regardless of outcome covering what was tested, what was found, what the result means for the causal model, and what to test next