Chapter 6 · Data Analyst
Statistics analysts actually use
~8 min read
You do not need a statistics degree to be an excellent analyst. You need a working, intuitive grip on a small set of ideas and the discipline to know when a difference is real versus noise. This chapter is the practical core, and it skips the parts you will rarely touch.
6.1 Describing data before testing it
When someone asks for average revenue per user, ask whether they want the mean or the median. With a few whales in the data, the mean lies and the median tells the truth. Knowing which to use, and why, is a maturity signal interviewers look for and a mistake juniors make constantly.
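To make the whale effect concrete, here is a minimal sketch using Python's standard `statistics` module. The revenue figures are invented for illustration: nine ordinary users and one very large spender.

```python
from statistics import mean, median

# Hypothetical revenue per user: nine ordinary users and one "whale".
revenue = [10, 12, 9, 11, 10, 13, 8, 12, 10, 5000]

avg = mean(revenue)    # pulled far upward by the single whale
mid = median(revenue)  # the middle of the sorted values, barely affected

print(f"mean:   {avg:.1f}")
print(f"median: {mid:.1f}")
```

In this made-up data the mean lands in the hundreds even though nine of the ten users spent about 10, while the median stays close to what a typical user spent.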
6.2 Correlation is not causation
Two things moving together does not mean one causes the other. A third factor may drive both, or it may be coincidence. The useful version of this idea is not the slogan; it is being able to propose how you would establish causation. That usually means a controlled experiment, which is the next section.
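The "third factor" case can be simulated in a few lines. This is a sketch on synthetic data, not a recipe for real analysis: the variable names (`temperature`, `ice_cream`, `sunburns`) and the noise levels are assumptions chosen for illustration, and in real data you can rarely remove a hidden driver this cleanly.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden driver that affects both metrics.
temperature = [rng.gauss(0, 1) for _ in range(5000)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
sunburns = [t + rng.gauss(0, 1) for t in temperature]

# The two metrics never influence each other, yet they are correlated.
raw = pearson(ice_cream, sunburns)

# Subtract the part driven by temperature, then correlate what is left.
ice_resid = [x - t for x, t in zip(ice_cream, temperature)]
burn_resid = [y - t for y, t in zip(sunburns, temperature)]
adjusted = pearson(ice_resid, burn_resid)

print(f"raw correlation:      {raw:.2f}")
print(f"after removing cause: {adjusted:.2f}")
```

The raw correlation is clearly positive, and it falls to roughly zero once the shared driver is accounted for. Neither metric causes the other; both follow the temperature.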
6.3 A working grasp of A/B testing
An A/B test, or controlled experiment, is how teams learn what actually works. Users are randomly split; one group sees the current experience (control), another sees a change (variant); you compare a target metric and decide whether the difference is real. Randomization is what lets you claim causation rather than correlation.
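The random split itself is simple to sketch. The function name `split_users`, the seed, and the user IDs below are hypothetical; production experiment platforms usually assign users in other ways, but the principle is the same: assignment must not depend on anything about the user.

```python
import random

def split_users(user_ids, seed=42):
    """Shuffle users with a seeded generator and cut the list in half."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

user_ids = [f"user_{i}" for i in range(1000)]  # hypothetical IDs
control, variant = split_users(user_ids)

print(len(control), len(variant))
```

Because chance alone decides who lands in which group, any systematic difference in the target metric can be attributed to the change rather than to who happened to see it.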
Four concepts to be able to explain
- Hypothesis. A specific, measurable prediction, with a null hypothesis of no difference.
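- p-value. The probability of seeing a result at least this extreme if there were truly no difference. A value below your threshold, usually 0.05, means the data would be surprising under the null hypothesis. It is not the probability that the result is due to chance.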
- Statistical significance. The result is unlikely to be noise. It says nothing about whether the effect is large enough to matter.
- Practical significance. Whether the effect is big enough to be worth acting on. A tiny lift can be significant and still pointless.
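The four concepts above can be tied together in one small calculation. This sketch assumes the metric is a conversion rate and uses a two-proportion z-test, which is one common choice rather than the only valid one. The conversion counts, the 0.05 threshold, and the minimum worthwhile lift of one percentage point are all invented for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift, two-sided p-value) for two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both groups share one pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return rate_b - rate_a, p_value

# Hypothetical results: control converts 1000 of 20000, variant 1100 of 20000.
lift, p_value = two_proportion_z_test(1000, 20000, 1100, 20000)

ALPHA = 0.05                # significance threshold (assumed)
MIN_WORTHWHILE_LIFT = 0.01  # smallest lift worth shipping (assumed)

statistically_significant = p_value < ALPHA
practically_significant = lift >= MIN_WORTHWHILE_LIFT

print(f"lift: {lift:.3%}, p-value: {p_value:.3f}")
print("statistically significant:", statistically_significant)
print("practically significant:  ", practically_significant)
```

With these numbers the half-point lift clears the 0.05 threshold but falls short of the assumed one-point bar, which is exactly the "significant and still pointless" case.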
6.4 Samples and confidence
Analysts almost always work with a sample of reality, not the whole truth, so a little humility about uncertainty goes a long way. A confidence interval expresses that uncertainty as a range of plausible values for the true figure: 12 percent plus or minus 2 percent at 95 percent confidence is far more honest than a bare 12 percent. Sample size matters enormously: a 12 percent result from 50 people carries far less weight than the same result from 50,000, and treating a tiny sample as solid is a classic mistake.
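The effect of sample size on the interval can be checked directly. This sketch uses the normal approximation for a proportion, which is a simplification: it is rough for small samples, and other interval methods exist. The function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for an observed proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 percent
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# The same 12 percent result from two very different sample sizes.
for n in (50, 50_000):
    low, high = proportion_ci(0.12, n)
    print(f"n={n:>6}: {low:.1%} to {high:.1%}")
```

Under this approximation the 50-person sample gives a margin of about nine percentage points either way, while the 50,000-person sample narrows it to about three tenths of a point.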
6.5 Statistics practice
Explain a p-value to a non-technical stakeholder.
When would you report the median instead of the mean?
