A product hypothesis for an A/B test should follow the format: if we make change X, then metric Y will change by Z percent, because of mechanism M. If the hypothesis is disproven, we learn that our assumption about mechanism M was wrong, which is as valuable as a positive result.
Most A/B test hypotheses fail in one of two ways: they're so vague that a positive result proves nothing ("adding a CTA button will increase engagement"), or they're so specific about the solution that a negative result produces no learning ("making the button green instead of blue will improve click rate"). A good hypothesis tests a mechanism, not just a change.
The Anatomy of a Strong Product Hypothesis
Hypothesis structure:
"If we [change X],
then [metric Y] will [increase/decrease] by [Z%],
because [users will do M differently due to N assumption]."
Learning if confirmed: N assumption was correct.
Learning if disproven: N assumption was wrong — investigate why.
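The structure above can be captured as a small record that a team pre-registers before launching a test. This is a minimal sketch; the class and field names (`Hypothesis`, `mechanism`, and so on) are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """A pre-registered A/B test hypothesis in if-then-because form."""
    change: str                 # X: what we change
    metric: str                 # Y: the metric expected to move
    expected_lift_pct: float    # Z: expected relative change, in percent
    mechanism: str              # M: why we expect the move
    learning_if_confirmed: str
    learning_if_disproven: str

# The onboarding example from this article, encoded as a record
onboarding = Hypothesis(
    change="add 'Step 2 of 4' progress indicator to onboarding",
    metric="Day 7 activation rate",
    expected_lift_pct=15.0,
    mechanism="seeing the end state lowers abandonment anxiety",
    learning_if_confirmed="progress visibility reduces abandonment anxiety",
    learning_if_disproven="anxiety assumption wrong, or visibility isn't the barrier",
)
```

Writing the two learning fields at registration time, before any data exists, is the point: if you cannot fill in `learning_if_disproven`, you are testing a change, not a mechanism.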
Product Hypothesis Examples
Example 1: Onboarding Hypothesis
Hypothesis: If we add a progress indicator to the onboarding flow (showing "Step 2 of 4"), then Day 7 activation rate will increase by 15%, because users who see the end state have lower abandonment anxiety and are more likely to complete multi-step flows.
Metric: Day 7 activation rate
Minimum detectable effect: 15% relative increase (from 34% to 39%)
What we learn if confirmed: Progress visibility reduces abandonment anxiety in multi-step flows
What we learn if disproven: Either the anxiety assumption is wrong, or progress visibility doesn't address the real abandonment cause
Example 2: Pricing Page Hypothesis
Hypothesis: If we add social proof (number of companies using each plan) to the pricing page, then trial-to-paid conversion will increase by 10%, because enterprise buyers use social proof to reduce vendor selection risk.
Metric: Trial-to-paid conversion rate (14-day window)
Minimum detectable effect: 10% relative increase (from 4.2% to 4.6%)
What we learn if confirmed: Social proof reduces purchase anxiety for this ICP
What we learn if disproven: Either enterprise buyers don't use this type of social proof, or conversion is blocked by a different friction point
Example 3: Engagement Hypothesis
Hypothesis: If we send a personalized digest email summarizing each user's top 3 unread items (instead of a generic weekly update), then email open rate will increase by 20% and in-app sessions from email will increase by 30%, because personalized content reduces the cognitive load of deciding whether the email is relevant to open.
Metric: Email open rate and session starts attributed to email
What we learn if confirmed: Personalization reduces the open-rate cost of the relevance decision
What we learn if disproven: Either the content isn't the barrier (subject line may be), or users don't have unread item anxiety
According to Shreyas Doshi on Lenny's Podcast, the discipline of writing the learning statement before running an A/B test is what separates experimental cultures from testing cultures. Testing cultures run experiments to get positive results; experimental cultures run experiments to update their model of how users behave, and they treat disproven hypotheses as equally valuable to confirmed ones.
Setting Up the Hypothesis: Statistical Requirements
Minimum Detectable Effect (MDE)
The MDE is the smallest effect size worth detecting. Setting it too small requires enormous sample sizes; setting it too large means you'll miss real but small effects.
Rule of thumb: Set MDE at the smallest effect that would change a product decision. If a 5% improvement in conversion wouldn't change your roadmap priority, don't bother running an experiment sensitive enough to detect 5%.
Sample Size Requirements
Use a sample size calculator before starting. Key inputs:
- Baseline rate: Current metric value (e.g., 4.2% conversion)
- MDE: Your minimum detectable effect (e.g., 10% relative = 4.6% target)
- Statistical power: 80% is standard (20% chance of missing a real effect)
- Significance level: 5% is standard (5% chance of a false positive)
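The four inputs above are enough to compute the required sample size yourself. Below is a minimal sketch using the standard normal approximation for a two-sided, two-proportion test, run with the baseline and MDE from the pricing page example; the helper name `sample_size_per_arm` is ours, not an API from any particular calculator:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline, relative_mde, power=0.80, alpha=0.05):
    """Approximate per-arm sample size for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)          # target rate after the lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 5% significance
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Baseline 4.2% conversion, 10% relative MDE (4.62% target)
n = sample_size_per_arm(0.042, 0.10)
print(f"required sample size: roughly {n:,} users per arm")
```

Note how fast the requirement grows as the MDE shrinks: halving the MDE roughly quadruples the required sample, which is why the "smallest effect that would change a decision" rule matters.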
According to Gibson Biddle on Lenny's Podcast, the most common A/B testing mistake in product development is ending tests early when results look positive. Stopping at the first significant-looking result dramatically inflates the false positive rate and produces a product team that believes its experiments are working when many of the positive results are statistical noise.
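The inflation Biddle describes is easy to reproduce. The sketch below simulates an A/A test (both arms identical, so every "significant" result is a false positive) and compares two stopping rules on the same data: peeking at ten interim looks versus checking once at the pre-determined sample size. All numbers here are illustrative assumptions:

```python
import math
import random

def z_stat(c_a, c_b, n):
    """Two-proportion z-statistic with pooled variance, n users per arm."""
    p_a, p_b = c_a / n, c_b / n
    p_pool = (c_a + c_b) / (2 * n)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
    return 0.0 if se == 0 else (p_a - p_b) / se

random.seed(42)
SIMS, N, PEEKS, Z_CRIT = 1000, 2000, 10, 1.96  # 1.96 = two-sided 5% threshold
peek_fp = fixed_fp = 0

for _ in range(SIMS):
    c_a = c_b = 0
    stopped = False
    for i in range(1, N + 1):
        c_a += random.random() < 0.5   # A/A test: both arms convert at 50%
        c_b += random.random() < 0.5
        if i % (N // PEEKS) == 0 and not stopped:
            if abs(z_stat(c_a, c_b, i)) > Z_CRIT:
                peek_fp += 1           # stopped "significant" at an interim look
                stopped = True
    if abs(z_stat(c_a, c_b, N)) > Z_CRIT:
        fixed_fp += 1                  # significant only at the planned end

print(f"false positive rate, fixed sample size: {fixed_fp / SIMS:.1%}")
print(f"false positive rate, peeking 10 times:  {peek_fp / SIMS:.1%}")
```

The fixed-sample rule lands near the nominal 5%, while the peeking rule triples or quadruples it, even though there is no real effect anywhere in the data.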
Writing Hypotheses That Produce Learning on Failure
The hallmark of a high-quality hypothesis is that disproving it teaches you something specific about user behavior.
Low learning hypothesis: "Making the button more prominent will increase clicks."
- If disproven: You learn that button prominence doesn't matter. That's it.
High learning hypothesis: "Making the button more prominent will increase clicks because users currently miss it due to low visual contrast against the page background."
- If disproven: You learn that visibility isn't the barrier. Users see the button but aren't motivated to click it, which redirects investigation toward copy, value proposition, or timing rather than visual design.
According to Lenny Rachitsky's writing on product experimentation culture, the teams that build the best product intuition over time are those that write explicit mechanism hypotheses — they accumulate a model of how their specific users respond to changes, which compounds into better hypothesis quality with each experiment cycle.
FAQ
Q: What is a product hypothesis for an A/B test? A: A structured statement in the format: if we make change X, then metric Y will change by Z percent, because mechanism M will cause users to behave differently. The mechanism is what you're really testing.
Q: How do you write a good A/B test hypothesis? A: State the change, the expected metric movement with a specific percentage, and the causal mechanism you believe explains the effect. Include what you'll learn if the hypothesis is disproven — that forces you to test a mechanism, not just a change.
Q: What is a minimum detectable effect in A/B testing? A: The smallest effect size worth detecting, set at the level where the result would actually change a product decision. This determines the required sample size and test duration.
Q: When should you stop an A/B test? A: When you've reached your pre-determined sample size, not when results look positive. Stopping early at first significance dramatically increases the false positive rate.
Q: What do you do when an A/B test hypothesis is disproven? A: Document what assumption was wrong and why. A disproven hypothesis updates your model of user behavior just as much as a confirmed one — the learning is equally valuable if you wrote the mechanism explicitly.
HowTo: Write a Product Hypothesis for an A/B Test
- Write the hypothesis using the if-then-because format: if we make change X, then metric Y will change by Z percent, because mechanism M will cause users to behave differently
- State explicitly what you will learn if the hypothesis is confirmed and what you will learn if it is disproven — the disproven learning statement validates that you are testing a mechanism, not just a change
- Define the minimum detectable effect as the smallest improvement that would actually change a product decision, then calculate the required sample size using a statistical power calculator
- Set the test duration based on sample size requirements and traffic volume — never end the test early based on promising interim results
- Document the baseline metric value, the test start date, and the traffic allocation percentage before launching so the test conditions are unambiguous
- After the test concludes, update your product model based on the result — whether confirmed or disproven — and use the mechanism insight to inform the next hypothesis in the iteration cycle
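The sample-size and duration steps above reduce to simple arithmetic once you know your traffic. A short sketch with illustrative numbers (the sample size, traffic volume, and allocation are assumptions, not figures from this article):

```python
import math

n_per_arm = 37500       # required users per variant, from a power calculation
daily_eligible = 4000   # users who hit the tested surface each day
allocation = 0.5        # fraction of eligible traffic enrolled in the test
variants = 2            # control + one treatment

users_per_day_per_arm = daily_eligible * allocation / variants
days = math.ceil(n_per_arm / users_per_day_per_arm)
print(f"minimum test duration: {days} days (~{math.ceil(days / 7)} weeks)")
# → minimum test duration: 38 days (~6 weeks)
```

Committing to this duration before launch, alongside the baseline value and traffic allocation, is what makes "never end the test early" enforceable rather than aspirational.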