Chapter 5 · Product Analyst

5. A/B testing & experimentation


Experimentation is how product teams establish causation instead of guessing from correlation. For a product analyst it is a core skill and a heavily tested interview topic.

5.1 Why randomization is everything

In an A/B test, users are randomly assigned to a control (the current experience) or a variant (the change). Randomization makes the two groups statistically equivalent, on average, in everything except the change, so a difference in outcome larger than chance variation can be attributed to the change. Remove randomization and you only have correlation.
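One common way to implement random assignment in practice is to hash the user ID together with the experiment name. The sketch below is an assumption about how a team might do it, not a method described in this chapter; `assign_variant`, the `user-<n>` IDs, and the experiment name `new_checkout_button` are hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'variant' (hypothetical helper)."""
    # Salt with the experiment name so the same user can land in
    # different arms across different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex characters -> integer in [0, 2**32), scaled to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "variant" if bucket < treatment_share else "control"

# Assign 10,000 made-up users and count each arm.
counts = {"control": 0, "variant": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "new_checkout_button")] += 1
print(counts)
```

Because the hash is deterministic, a returning user always sees the same experience, and the split should come out close to the intended share.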

Figure: An A/B test. Randomize, run for a full cycle, then compare with significance and guardrails.

5.2 The lifecycle of a sound experiment

  1. Hypothesis. A specific, falsifiable prediction with a primary metric chosen in advance.
  2. Power analysis. Compute the sample size to detect the smallest effect worth caring about.
  3. Randomize & run. Assign users, run for full business cycles (cover weekends), avoid peeking.
  4. Analyze. Compare the primary metric, compute significance and CI, check guardrails.
  5. Decide. Ship, iterate, or kill, weighing statistical and practical significance together.
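The analyze step can be sketched for a conversion-rate metric with a two-proportion z-test. This is a minimal illustration using only the standard library; the function name and the conversion counts are made up, and a real analysis may call for a different test depending on the metric.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_test(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Return (difference, two-sided p-value, confidence interval) for variant minus control."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    diff = p_v - p_c
    # The p-value uses the pooled rate, because the null hypothesis says both arms share one rate.
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_null)))
    # The confidence interval uses the unpooled standard error of the difference.
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, p_value, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical result: 10.0% vs 11.0% conversion with 10,000 users per arm.
diff, p_value, ci = two_proportion_test(1000, 10_000, 1100, 10_000)
print(f"lift={diff:.4f}  p={p_value:.4f}  95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

The interval is what feeds the decide step: it shows whether the plausible range of the lift is large enough to matter in practice, not only whether it excludes zero.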

5.3 The concepts interviewers probe

| Concept | Plain meaning | Common trap |
|---|---|---|
| Null hypothesis | "No real difference" | Forgetting it is the default assumption |
| p-value | Probability of data at least this extreme if the null is true | Reading it as "95% chance B is better" (wrong) |
| Significance (α) | False-positive rate you accept (commonly 5%) | Confusing it with effect size |
| Power (1−β) | Chance of catching a real effect (commonly 80%) | Underpowered = inconclusive |
| Confidence interval | Plausible range for the true effect | Reporting p alone; the interval is more useful |
| MDE (minimum detectable effect) | Smallest effect you can reliably detect | Set it before the test, not after |
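To make α concrete, the simulation below runs many A/A tests, where both arms share the same true conversion rate, and counts how often a z-test still reports significance. It is an illustrative sketch: the sample size, rate, and seed are arbitrary choices, not values from this chapter.

```python
import random
from math import sqrt
from statistics import NormalDist

def p_value_two_proportions(conv_a, conv_b, n):
    """Two-sided z-test p-value for two equally sized arms."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0  # identical all-or-nothing arms: no evidence of a difference
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(42)  # fixed seed so the run is repeatable
n_users, true_rate, alpha, n_tests = 500, 0.10, 0.05, 1000

false_positives = 0
for _ in range(n_tests):
    # Both arms are drawn from the same rate, so the null hypothesis is true by construction.
    conv_a = sum(rng.random() < true_rate for _ in range(n_users))
    conv_b = sum(rng.random() < true_rate for _ in range(n_users))
    if p_value_two_proportions(conv_a, conv_b, n_users) < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_tests
print(f"share of A/A tests flagged significant: {false_positive_rate:.3f}")
```

The share should land near α. That is all a 5% significance level promises: when there is no real difference, roughly 1 test in 20 will still look like a win.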

5.4 When you can't run a clean A/B test

Sometimes randomization isn't possible: network effects, a feature everyone sees, a pricing change. Reach for quasi-experimental methods such as difference-in-differences, holdout groups, or staged geo rollouts. Naming one when asked "what if you can't A/B?" sets you apart.
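Difference-in-differences reduces to simple arithmetic: the change in the treated group minus the change in a comparison group over the same period. The numbers below are invented for illustration, and the estimate is only credible if the two groups would have moved in parallel without the change.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical conversion rates before and after a staged geo rollout.
effect = diff_in_diff(
    treated_pre=0.100, treated_post=0.125,   # region that received the change
    control_pre=0.098, control_post=0.108,   # region that did not
)
print(f"estimated effect: {effect:.3f}")
```

Here the treated region rose by 2.5 points, but the comparison region rose by 1.0 point on its own, so only about 1.5 points are credited to the change.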

5.5 Sample size, intuitively

| If you want to… | Sample size… |
|---|---|
| Detect a smaller effect | Increases (often dramatically) |
| Measure a noisier metric | Increases |
| Raise confidence (lower α) or power | Increases |
| Test on a higher base rate | Decreases (for the same relative lift) |
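The directions in the table can be checked with the standard normal-approximation formula for comparing two proportions. The sketch assumes a conversion-rate metric, a two-sided test, equal-sized arms, and an MDE expressed as a relative lift; `sample_size_per_arm` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a relative lift in a conversion rate."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

baseline = sample_size_per_arm(0.10, 0.10)
print("10% base rate, 10% relative lift:", baseline)
print("halve the lift to 5%:            ", sample_size_per_arm(0.10, 0.05))
print("raise power to 90%:              ", sample_size_per_arm(0.10, 0.10, power=0.90))
print("double the base rate to 20%:     ", sample_size_per_arm(0.20, 0.10))
```

Halving the detectable effect roughly quadruples the required sample, because the effect enters the formula squared. That is the "often dramatically" in the first row.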

5.6 Novelty and primacy effects

Early results can lie in two opposite ways. A novelty effect is a temporary lift because the change is new. A primacy effect is a temporary dip because users are jarred before they adapt. Both fade over time, so run full cycles and watch how the effect trends before deciding.

5.7 Network interference

Standard A/B testing assumes one user's experience doesn't affect another's. That breaks for social, marketplace, and communication products, where a user in the variant can change what a user in control sees. Teams handle this with cluster-randomized designs: randomize whole communities or geographies rather than individuals.
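A cluster-randomized assignment can be sketched by hashing the cluster rather than the user, so everyone in the same community or geography gets the same experience. The mapping, cluster IDs, and experiment name below are hypothetical.

```python
import hashlib

def assign_cluster(cluster_id: str, experiment: str) -> str:
    """Assign a whole cluster (e.g. a city or community) to one arm."""
    digest = hashlib.sha256(f"{experiment}:{cluster_id}".encode("utf-8")).hexdigest()
    return "variant" if int(digest[:8], 16) % 2 == 0 else "control"

# Made-up mapping of users to the cluster they belong to.
user_to_cluster = {"alice": "city-1", "bob": "city-1", "carol": "city-2", "dave": "city-3"}

# Each user inherits the arm of their cluster, never an individual assignment.
assignments = {
    user: assign_cluster(cluster, "group_chat_v2")
    for user, cluster in user_to_cluster.items()
}
print(assignments)
```

The trade-off is that the effective sample size becomes the number of clusters, not the number of users, so these designs typically need more traffic and an analysis that accounts for clustering.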

5.8 Failure modes, a reference

| Pitfall | What goes wrong | The fix |
|---|---|---|
| Peeking | Stopping early at significance inflates false positives | Fix sample size up front |
| Underpowered test | Too few users to detect the real effect | Power analysis before launch |
| Ignoring guardrails | Primary metric wins, key metric quietly breaks | Pre-define guardrails |
| Novelty effect | Early lift fades as novelty wears off | Run full cycles; watch the trend |
| Network interference | Variant affects control via the network | Cluster-randomize |
| Multiple comparisons | Testing many metrics finds false winners | Correct α (e.g. Bonferroni) |
| Sample ratio mismatch | Observed split differs from the planned one (e.g. 50/50), which signals broken assignment | Check the ratio before reading results |
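The sample ratio check in the last row is usually a chi-square goodness-of-fit test on the arm counts. The sketch below handles the two-arm case with the standard library only, using the fact that a chi-square statistic with one degree of freedom is a squared standard normal. The counts and the strict 0.001 alert threshold are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def srm_p_value(n_control, n_variant, expected_share_control=0.5):
    """p-value that the observed split is consistent with the planned split (two arms)."""
    total = n_control + n_variant
    exp_control = total * expected_share_control
    exp_variant = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variant - exp_variant) ** 2 / exp_variant)
    # With 1 degree of freedom, P(chi2 > x) equals P(|Z| > sqrt(x)).
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))

# Hypothetical counts for a planned 50/50 split.
for control, variant in [(50_000, 50_100), (50_000, 51_500)]:
    p = srm_p_value(control, variant)
    flag = "investigate assignment" if p < 0.001 else "ok"
    print(control, variant, f"p={p:.6f}", flag)
```

A small imbalance like 50,000 vs 50,100 is ordinary noise, while 50,000 vs 51,500 is far too lopsided to be chance. In that case the metric comparison should not be trusted until the assignment bug is found.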
