Chapter 5 · Product Analyst

5. A/B testing & experimentation

Experimentation is how product teams establish causation instead of guessing from correlation. For a product analyst it is a core skill and a heavily tested interview topic.

5.1 Why randomization is everything

In an A/B test, users are randomly assigned to a control (the current experience) or a variant (the change). Randomization makes the two groups equivalent in expectation on everything except the change, including traits you can't observe, so any difference in outcome beyond chance can be attributed to the change. Remove randomization and you only have correlation.

Figure: An A/B test. Randomize, run for a full cycle, then compare with significance and guardrails.
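To see why randomization matters, here is a small Python simulation. It is a sketch under made-up assumptions: a hypothetical hidden "engagement" trait drives conversion, and the feature itself has zero effect. When engaged users opt in on their own, the naive treated-vs-untreated comparison shows a lift anyway. When a coin flip assigns the feature, the gap disappears. The function names, probabilities, and user count are illustrative, not taken from any real product.

```python
import random

def conversion_gap(assign, n_users=20_000, seed=7):
    """Treated-minus-untreated conversion rate in a world where the feature does nothing."""
    rng = random.Random(seed)
    conversions = {True: 0, False: 0}
    users = {True: 0, False: 0}
    for _ in range(n_users):
        engagement = rng.random()             # hypothetical hidden trait in [0, 1)
        treated = assign(engagement, rng)     # how the user ends up with the feature
        p_convert = 0.05 + 0.20 * engagement  # depends on engagement only, never on the feature
        conversions[treated] += rng.random() < p_convert
        users[treated] += 1
    return conversions[True] / users[True] - conversions[False] / users[False]

def opt_in(engagement, rng):
    # Self-selection: more engaged users are more likely to turn the feature on.
    return rng.random() < engagement

def coin_flip(engagement, rng):
    # Randomization: assignment ignores engagement entirely.
    return rng.random() < 0.5

self_selected_gap = conversion_gap(opt_in)
randomized_gap = conversion_gap(coin_flip)
print(f"self-selected: {self_selected_gap:+.3f}  randomized: {randomized_gap:+.3f}")
```

Under these assumptions the self-selected gap should land near 0.2 × (2/3 − 1/3) ≈ 0.067, which is pure correlation. The randomized gap should sit near zero.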

5.2 The lifecycle of a sound experiment

Hypothesis. A specific, falsifiable prediction with a primary metric chosen in advance.
Power analysis. Compute the sample size needed to detect the smallest effect worth caring about.
Randomize & run. Assign users, run for full business cycles (cover weekends), avoid peeking.
Analyze. Compare the primary metric, compute significance and CI, check guardrails.
Decide. Ship, iterate, or kill, weighing statistical and practical significance together.
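You can check the "avoid peeking" rule by simulation. This Python sketch runs many A/A tests, which have no true difference. It assumes a normally distributed metric with known variance to keep things simple. It then compares two stopping rules: test once at the planned end, or stop the first time any of ten interim looks crosses p < 0.05. The batch size and number of looks are arbitrary choices for illustration.

```python
import math
import random

def false_positive_rates(n_experiments=2000, n_looks=10, batch=100, seed=1):
    """Simulate A/A tests and return (fixed-horizon FPR, stop-at-first-significant FPR)."""
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided threshold for alpha = 0.05
    fixed_hits = peeking_hits = 0
    for _ in range(n_experiments):
        sum_a = sum_b = 0.0
        ever_significant = False
        for look in range(1, n_looks + 1):
            # The sum of `batch` N(0, 1) draws is N(0, sqrt(batch)), so draw it in one step.
            sum_a += rng.gauss(0, math.sqrt(batch))
            sum_b += rng.gauss(0, math.sqrt(batch))
            n = look * batch                          # users per arm so far
            z = (sum_a - sum_b) / math.sqrt(2 * n)    # z-test on the difference in means
            if abs(z) > z_crit:
                ever_significant = True               # a peeker would stop and ship here
        fixed_hits += abs(z) > z_crit                 # only the final, planned look counts
        peeking_hits += ever_significant
    return fixed_hits / n_experiments, peeking_hits / n_experiments

fixed_fpr, peeking_fpr = false_positive_rates()
print(f"fixed horizon: {fixed_fpr:.3f}  peeking: {peeking_fpr:.3f}")
```

The fixed-horizon rule should keep the false-positive rate near the nominal 5%. Stopping at the first significant look pushes it far higher, to roughly 19% for ten equally spaced looks. This is why the power-analysis step fixes the sample size before launch.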

5.3 The concepts interviewers probe

Concept	Plain meaning	Common trap
Null hypothesis	'No real difference'	Forgetting it's the default you must find evidence against
p-value	Probability of data at least this extreme if the null is true	Reading it as '95% chance B is better'
Significance (α)	False-positive rate you accept (typically 5%)	Confusing it with effect size
Power (1−β)	Chance of detecting a real effect of the target size (typically 80%)	Treating an underpowered null result as proof of no effect
Confidence interval	Plausible range for the true effect	Ignoring it; it says more than p alone
MDE	Smallest effect the test is powered to detect	Choosing it after the test instead of before
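For a conversion-rate metric, you can compute the p-value and confidence interval from the table with a two-proportion z-test. Below is a minimal Python sketch that uses only the standard library. It assumes samples large enough for the normal approximation to hold. The counts are hypothetical and the function name is illustrative.

```python
from statistics import NormalDist

def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Return (difference B - A, two-sided p-value, (CI low, CI high)) for conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Test statistic: pooled standard error, because the null says both rates are equal.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Confidence interval: unpooled standard error, because we no longer assume equal rates.
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, p_value, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical counts: 10.0% vs 11.0% conversion on 10,000 users per arm.
diff, p_value, ci = two_proportion_test(1_000, 10_000, 1_100, 10_000)
print(f"lift {diff:+.4f}, p = {p_value:.4f}, 95% CI [{ci[0]:+.4f}, {ci[1]:+.4f}]")
```

With these counts the p-value comes out around 0.02 and the interval excludes zero. The interval also shows the true lift could be as small as about 0.15 percentage points, and that is where practical significance comes in.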

5.4 When you can't run a clean A/B test

Sometimes randomization isn't possible, for example with strong network effects, a feature everyone sees, or a pricing change. In those cases, reach for quasi-experimental methods: difference-in-differences, holdout groups, or staged geo rollouts. Naming one when asked 'what if you can't A/B test?' sets you apart.
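Difference-in-differences takes the before/after change in the group that got the change and subtracts the change over the same period in a comparable group that didn't. This removes the trend the two groups share. Below is a minimal Python sketch on aggregate means with hypothetical numbers. The estimate is only valid under the parallel-trends assumption: without the change, both groups would have moved by the same amount.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Effect estimate = change in treated group minus change in comparison group."""
    treated_change = treated_after - treated_before  # effect of the change + shared trend
    control_change = control_after - control_before  # shared trend only (by assumption)
    return treated_change - control_change

# Hypothetical weekly orders per user: a pricing change launched in one region only.
effect = diff_in_diff(treated_before=10.0, treated_after=12.0,
                      control_before=9.0, control_after=10.0)
print(effect)  # 1.0
```

A naive before/after comparison would credit the change with the full 2.0 increase. Half of that increase is the trend the comparison region also experienced.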

5.5 Sample size, intuitively

If you want to…	Sample size…
Detect a smaller effect	Increases (often dramatically)
Measure a noisier metric	Increases
Raise confidence (lower α) or power	Increases
Test a conversion metric with a higher base rate (same relative lift)	Decreases
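The table's intuitions follow from the standard normal-approximation formula for comparing two proportions: n per arm ≈ (z_(1−α/2) + z_(1−β))² × [p1(1 − p1) + p2(1 − p2)] / (p2 − p1)². Below is a Python sketch that assumes a binary conversion metric and equal-sized arms. Real tools may use slightly different variants of this formula, so treat the outputs as ballpark figures.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect base_rate -> base_rate * (1 + relative_lift)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of per-user Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

baseline = sample_size_per_arm(0.10, 0.10)                    # 10% -> 11%
smaller_effect = sample_size_per_arm(0.10, 0.05)              # half the lift
stricter_alpha = sample_size_per_arm(0.10, 0.10, alpha=0.01)  # lower alpha
higher_base = sample_size_per_arm(0.20, 0.10)                 # 20% -> 22%
print(baseline, smaller_effect, stricter_alpha, higher_base)
```

Under these assumptions, halving the lift roughly quadruples the required sample, because the effect size enters the formula squared. The same 10% relative lift on a 20% base rate needs less than half as many users as on a 10% base rate.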

5.6 Novelty and primacy effects

Early results can mislead in two opposite ways. A novelty effect is a temporary lift because the change is new. A primacy effect is a temporary dip because users are jarred by the change before they adapt. Both fade over time, so run the test for full cycles before reading the result.
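One way to "watch the trend" is to compute the lift separately for each week of the test rather than only cumulatively. Below is a small Python sketch on hypothetical weekly conversion rates. The fading rule (the last week's lift falls below half of the first week's) is an illustrative heuristic, not a standard test.

```python
def weekly_lift(control_rates, variant_rates):
    """Relative lift of variant over control, one value per week."""
    return [v / c - 1 for c, v in zip(control_rates, variant_rates)]

def looks_like_novelty(lifts, ratio=0.5):
    """Hypothetical heuristic: the latest weekly lift has shrunk below `ratio` of the first."""
    return lifts[0] > 0 and lifts[-1] < ratio * lifts[0]

# Hypothetical data: a strong week-1 lift that decays toward a small steady state.
control = [0.100, 0.101, 0.099, 0.100]
variant = [0.115, 0.108, 0.103, 0.102]
lifts = weekly_lift(control, variant)
print([round(x, 3) for x in lifts], looks_like_novelty(lifts))
```

A primacy effect shows the mirror-image pattern: a negative early lift that climbs back as users adapt.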

5.7 Network interference

Standard A/B testing assumes one user's experience doesn't affect another's. That assumption breaks for social, marketplace, and communication products. Teams handle this with cluster-randomized designs, which randomize whole communities or geographies rather than individuals.
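A common way to implement assignment is to hash a stable ID together with the experiment name, so the same unit always lands in the same arm. For cluster randomization, you hash the cluster ID instead of the user ID, so every user in a cluster shares one arm. Below is a Python sketch. The function names, the 50/50 split, and the user-to-cluster mapping are all hypothetical.

```python
import hashlib

def arm_for(unit_id: str, experiment: str, variant_share: int = 50) -> str:
    """Deterministic assignment: hash experiment + unit ID into one of 100 buckets."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "variant" if bucket < variant_share else "control"

def arm_for_user(user_id: str, user_to_cluster: dict, experiment: str) -> str:
    """Cluster randomization: a user inherits the arm of their cluster (e.g. a city)."""
    return arm_for(user_to_cluster[user_id], experiment)

# Hypothetical mapping of users to geographic clusters.
user_to_cluster = {"u1": "city_a", "u2": "city_a", "u3": "city_b"}
for user in user_to_cluster:
    print(user, arm_for_user(user, user_to_cluster, "new_feed_ranking"))
```

Users within a cluster influence each other and tend to behave alike. As a result, the effective sample size is closer to the number of clusters than to the number of users. The analysis, including its standard errors, should therefore be done at the cluster level or otherwise account for clustering.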

5.8 Failure modes: a reference

Pitfall	What goes wrong	The fix
Peeking	Stopping early at significance inflates false positives	Fix sample size up front
Underpowered test	Too few users to detect the real effect	Power analysis before launch
Ignoring guardrails	Primary metric wins, key metric quietly breaks	Pre-define guardrails
Novelty effect	Early lift fades as novelty wears off	Run full cycles; watch trend
Network interference	Variant affects control via the network	Cluster-randomize
Multiple comparisons	Testing many metrics finds false winners	Correct α (e.g. Bonferroni)
Sample ratio mismatch	Observed split differs from the planned one (e.g. not 50/50) → broken assignment or logging	Test the ratio before reading any results
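Two rows of the table reduce to one-line checks. The sample ratio mismatch check is a chi-square goodness-of-fit test of the observed arm sizes against the planned split. The Bonferroni correction divides α by the number of comparisons. Below is a standard-library Python sketch with hypothetical counts. It uses a 0.001 SRM alarm threshold, a conservative convention many teams use, which is treated here as an assumption.

```python
import math

def srm_p_value(n_control, n_variant, expected_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the observed split."""
    total = n_control + n_variant
    expected = (total * (1 - expected_share), total * expected_share)
    stat = sum((obs - exp) ** 2 / exp for obs, exp in zip((n_control, n_variant), expected))
    # With 1 degree of freedom, the chi-square statistic is a squared standard normal.
    return math.erfc(math.sqrt(stat / 2))

def bonferroni_winners(p_values, alpha=0.05):
    """Flag metrics that stay significant after dividing alpha by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

healthy = srm_p_value(50_100, 49_900)  # small imbalance, plausibly chance
broken = srm_p_value(50_000, 51_500)   # imbalance too large to be chance
print(f"SRM p: healthy {healthy:.3f}, broken {broken:.2e}, alarm at < 0.001")
print(bonferroni_winners([0.004, 0.03, 0.2, 0.6, 0.049]))  # five metrics -> threshold 0.01
```

With five metrics, the per-metric threshold drops to 0.01, so the 0.03 and 0.049 "wins" no longer count. When the SRM check fails, the fix is to find the assignment or logging bug rather than to analyze the results anyway.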
