📊 Reading A/B tests well is 50% discipline, 50% not fooling yourself

PM A/B Test Analysis Guide
(2026 Edition)

Reading an A/B test well means working through a seven-point checklist covering sample size, statistical significance, effect size, guardrail health, segment consistency, novelty effects, and the A/A check, then applying five decision rules that separate a real win from a costly illusion. The default significance threshold is p < 0.05, tightened to 0.01 for high-stakes launches such as major redesigns or monetisation changes.

7-point checklist for reading results, 5 segmentation lenses, 6 common biases, and 5 decision rules for shipping or killing.

By Naman Goyal · Product manager · Builder of PM Streak · Updated July 3, 2026

7-Point Reading Checklist

☐ Did we reach the pre-committed sample size? If not, the test is not done yet.

☐ Is the effect statistically significant? (p-value < 0.05)

☐ Is the effect size meaningful? (Practical significance, not just statistical)

☐ Did guardrail metrics stay healthy? Winning primary + broken guardrail = net loss.

☐ Does the effect hold across segments? If only 1 segment drives it, that's important context.

☐ Are there novelty effects that might fade? (Run at least 2 full weekly cycles to confirm)

☐ Is the A/A check clean? (An A/A comparison during the run should show no significant difference)
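Checklist items 2 and 3 can be made concrete with a small calculation. The sketch below assumes the primary metric is a conversion rate and uses a standard two-proportion z-test; the function name `two_proportion_test`, the example counts, and the `min_meaningful_lift` threshold are illustrative assumptions, not figures from this guide.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error: used for the test statistic under "no difference".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval on the lift.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"diff": diff, "p_value": p_value, "ci": ci}


# Made-up example: 10.0% vs 11.0% conversion with 10,000 users per arm.
result = two_proportion_test(1000, 10_000, 1100, 10_000)

# Assumption: the team agreed up front that anything below +0.5 points
# is not worth shipping.
min_meaningful_lift = 0.005

statistically_significant = result["p_value"] < 0.05
practically_significant = result["diff"] >= min_meaningful_lift
print(result, statistically_significant, practically_significant)
```

The confidence interval is worth reading alongside the p-value: a result can clear p < 0.05 while the lower end of the interval sits well below the lift you care about.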

5 Segmentation Lenses

New vs existing users: often move in opposite directions

Mobile vs web: mobile-first products often see different results on each

Geographic: Tier-1 vs Tier-2/3 may behave differently

Acquisition channel: organic vs paid users have different baselines

Cohort (signup date): recent cohorts can differ from old ones
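A per-segment readout of the lenses above might look like the following sketch. The rows are invented for illustration, and `lift_by_segment` is a hypothetical helper, not part of any analytics tool. It only computes the raw lift per segment; each slice is smaller than the full test, so treat segment differences as context to investigate rather than as separately significant results.

```python
from collections import defaultdict

# Invented data: (segment, variant, users, conversions).
rows = [
    ("new", "control", 4000, 400),
    ("new", "treatment", 4000, 480),
    ("existing", "control", 6000, 600),
    ("existing", "treatment", 6000, 570),
]


def lift_by_segment(rows):
    """Return the absolute lift (treatment rate minus control rate) per segment."""
    rates = defaultdict(dict)
    for segment, variant, users, conversions in rows:
        rates[segment][variant] = conversions / users
    return {
        segment: by_variant["treatment"] - by_variant["control"]
        for segment, by_variant in rates.items()
    }


lifts = lift_by_segment(rows)

# Flag the case where segments move in opposite directions.
directions_agree = all(v > 0 for v in lifts.values()) or all(
    v < 0 for v in lifts.values()
)
print(lifts, directions_agree)
```

In this made-up data, new users improve while existing users get slightly worse, which is the pattern the first lens warns about.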

6 Common Biases to Avoid

⚠️ Peeking early and stopping when you see significance (p-hacking)

⚠️ Running multiple tests and picking the one that 'won' (multiple-comparison problem)

⚠️ Attributing lift to the feature when seasonality explains it (correlation vs causation)

⚠️ Ignoring guardrails that moved: the primary won, but at what cost?

⚠️ Reading a flat test as 'no effect' when it may mean 'effect too small to detect'; these are different conclusions

⚠️ Using the test as confirmation of your hypothesis rather than a test of it
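One common guard against the multiple-comparison problem is a Bonferroni correction, which divides the significance threshold by the number of comparisons. The guide does not prescribe a specific correction, so this is a sketch of one standard option; the variant names and p-values are invented.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Mark each comparison significant only if p < alpha / number of comparisons."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Invented p-values from four variants tested against the same control.
p_values = {
    "variant_b": 0.04,
    "variant_c": 0.30,
    "variant_d": 0.012,
    "variant_e": 0.60,
}

# Naive reading: compare every p-value to 0.05 on its own.
naive = {name: p < 0.05 for name, p in p_values.items()}

# Corrected reading: the per-comparison threshold becomes 0.05 / 4 = 0.0125.
corrected = bonferroni_significant(p_values)
print(naive)
print(corrected)
```

Bonferroni is conservative; the point of the sketch is that a variant which looks like a winner on its own can stop being one when you account for how many variants were tried.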

5 Decision Rules

Primary wins significantly + guardrails healthy → ship

Primary flat + guardrails healthy → don't ship, but learnings are valuable

Primary wins but a guardrail breaks → don't ship, investigate trade-off

Primary wins in 1 segment only → ship to that segment if big enough; don't generalise

Result is inconclusive (underpowered) → decide: extend the test to a new sample size fixed in advance, rerun at higher N, or make the call on judgment. Never extend just to wait for significance.
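The five rules can be written down as a small function so the order of checks is explicit. This is a sketch: the `Readout` fields, the returned labels, and the precedence (underpowered first, then guardrails, then segments) are assumptions made for illustration, and the "big enough" judgment on a single segment is left to the reader.

```python
from dataclasses import dataclass


@dataclass
class Readout:
    underpowered: bool        # pre-committed sample size not reached
    primary_wins: bool        # primary metric up and statistically significant
    guardrails_healthy: bool  # no guardrail metric degraded
    segments_with_win: int    # number of segments where the win shows up
    segments_total: int       # number of segments examined


def decide(r: Readout) -> str:
    if r.underpowered:
        return "inconclusive"      # extend to a pre-set N, rerun, or judgment call
    if r.primary_wins and not r.guardrails_healthy:
        return "hold"              # don't ship; investigate the trade-off
    if r.primary_wins and r.segments_total > 1 and r.segments_with_win == 1:
        return "ship_to_segment"   # only if the segment is big enough
    if r.primary_wins:
        return "ship"
    return "no_ship"               # flat primary; keep the learnings


print(decide(Readout(False, True, True, 4, 4)))   # ship
print(decide(Readout(False, True, False, 4, 4)))  # hold
```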

FAQ

What p-value threshold should PMs use for A/B tests?

0.05 is the industry default. For high-stakes tests (major redesigns, monetisation changes), use 0.01, because you want higher certainty before shipping. For quick iteration on low-risk features, 0.1 is sometimes acceptable. The trade-off: a stricter threshold means fewer false positives but needs a larger sample, so tests run longer.
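To see the trade-off in numbers, the sketch below uses the standard normal-approximation formula for comparing two proportions. The 10% baseline, the 1-point minimum detectable effect, and 80% power are assumptions chosen for illustration; `sample_size_per_arm` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Same baseline and effect, three thresholds from loosest to strictest.
for alpha in (0.1, 0.05, 0.01):
    print(alpha, sample_size_per_arm(0.10, 0.01, alpha=alpha))
```

Under these assumptions, moving from 0.05 to 0.01 raises the required sample by roughly half, which is where the longer run time comes from.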

How long should A/B tests run?

Pre-determined by your sample size calculation. For most consumer products, 7–14 days is typical (covers weekday/weekend patterns). Shorter tests miss cyclical effects; longer tests delay decisions unnecessarily. The cardinal sin: extending tests mid-run to 'wait for significance' — that's p-hacking, not patience.
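Turning a required sample size into a planned run length is simple arithmetic. The sketch below assumes you already know the per-arm sample size and your daily eligible traffic; rounding up to whole weeks is an assumption made here so that every weekday is covered equally, and `planned_duration_days` is a hypothetical helper.

```python
from math import ceil


def planned_duration_days(n_per_arm, arms, daily_eligible_users, traffic_share=1.0):
    """Days needed to reach the sample size, rounded up to full weeks."""
    total_needed = n_per_arm * arms
    users_per_day = daily_eligible_users * traffic_share
    raw_days = ceil(total_needed / users_per_day)
    return ceil(raw_days / 7) * 7  # whole weeks cover weekday/weekend patterns


# Illustrative numbers: 14,749 users per arm, 2 arms, 3,000 eligible users a day.
days = planned_duration_days(14_749, arms=2, daily_eligible_users=3_000)
print(days)
```

Fix this number before launch and stop on that date, whatever the dashboard shows in between.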
