Chapter 6 · Data Analyst

Statistics analysts actually use


You do not need a statistics degree to be an excellent analyst. You need a working, intuitive grip on a small set of ideas and the discipline to know when a difference is real versus noise. This chapter is the practical core, and it skips the parts you will rarely touch.

6.1 Describing data before testing it

When someone asks for average revenue per user, ask whether they want the mean or the median. With a few whales (very high spenders) in the data, the mean is pulled far above what a typical user spends, while the median still describes the user in the middle. Knowing which to use, and why, is a maturity signal interviewers look for; reaching for the mean by default is a mistake juniors make constantly.
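To make the whale effect concrete, here is a minimal sketch using Python's standard `statistics` module. The revenue figures are invented for illustration: nine ordinary users and one very large spender.

```python
from statistics import mean, median

# Hypothetical monthly revenue per user, in dollars (made-up numbers):
# nine ordinary users and one whale.
revenue = [8, 9, 10, 10, 11, 12, 12, 13, 15, 4900]

avg = mean(revenue)        # dragged far upward by the single whale
typical = median(revenue)  # midpoint of the sorted values, barely affected

print(f"mean:   {avg:.2f}")
print(f"median: {typical:.2f}")
```

In this toy data the mean lands at 500 even though no ordinary user spends more than 15, while the median stays at 11.5. Which one to report depends on the question: total revenue planning may want the mean, while "what does a typical user spend" wants the median.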

6.2 Correlation is not causation

Two things moving together does not mean one causes the other. A third factor may drive both, or it may be coincidence. The useful version of this idea is not the slogan; it is being able to propose how you would establish causation. That usually means a controlled experiment, which is the next section.
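The "third factor" case can be simulated in a few lines. The sketch below is an illustration under invented assumptions, not real data: a hypothetical daily temperature drives both ice cream sales and sunburn counts, and the two outcomes never influence each other. The variable names, slopes, and noise levels are all made up.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical confounder: temperature drives both outcomes independently.
temperature = [rng.uniform(0, 30) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburns = [1.5 * t + rng.gauss(0, 5) for t in temperature]

raw = pearson(ice_cream, sunburns)

# Subtract the part explained by temperature. The true slopes are known
# here only because the data is simulated; real data would need a fitted model.
ice_resid = [i - 2.0 * t for i, t in zip(ice_cream, temperature)]
sun_resid = [s - 1.5 * t for s, t in zip(sunburns, temperature)]
adjusted = pearson(ice_resid, sun_resid)

print(f"raw correlation:              {raw:.2f}")
print(f"after removing temperature:   {adjusted:.2f}")
```

The raw correlation comes out strong, and it collapses toward zero once temperature is accounted for. In real work the confounder is rarely known in advance, which is why randomized experiments are the more reliable route to a causal claim.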

6.3 A working grasp of A/B testing

An A/B test, or controlled experiment, is how teams learn what actually works. Users are randomly split; one group sees the current experience (control), another sees a change (variant); you compare a target metric and decide whether the difference is real. Randomization is what lets you claim causation rather than correlation.

Figure 5. Anatomy of an A/B test. Random assignment is the ingredient that makes the comparison causal.
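The random split described above can be sketched as follows. This is a simplified illustration, not how any particular experimentation platform works: the function name `assign_groups`, the seed, and the user ids are all hypothetical.

```python
import random

def assign_groups(user_ids, seed=0):
    """Randomly split users into (control, variant) halves."""
    rng = random.Random(seed)   # seeded so the split is reproducible
    shuffled = list(user_ids)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical user ids for illustration.
users = [f"user_{i}" for i in range(1000)]
control, variant = assign_groups(users, seed=7)

print(len(control), len(variant))
```

Because assignment depends only on the shuffle and not on anything about the user, the two groups should be similar on average in every respect except the change being tested. Letting users, or the team, choose who sees the variant removes that guarantee.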

Four concepts to be able to explain

Hypothesis. A specific, measurable prediction, with a null hypothesis of no difference.
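p-value. The probability of seeing a result at least this extreme if there were truly no difference. A value below your threshold, usually 0.05, means the result would be surprising under chance alone. It is not the probability that the result is due to chance.
Statistical significance. The p-value fell below your threshold, so the result is unlikely to be noise alone. It says nothing about whether the effect is large enough to matter.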
Practical significance. Whether the effect is big enough to be worth acting on. A tiny lift can be significant and still pointless.
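The four concepts can be seen together in a small worked example. The sketch below uses a pooled two-proportion z-test, one common choice for comparing conversion rates; the source does not prescribe a specific test. The conversion counts and the `MIN_WORTHWHILE_LIFT` threshold are invented for illustration.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift, two-sided p-value) from a pooled z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return p_b - p_a, p_value

# Hypothetical results: control converts 10.0%, variant converts 10.3%.
lift, p_value = two_proportion_z_test(10_000, 100_000, 10_300, 100_000)

ALPHA = 0.05                 # conventional significance threshold
MIN_WORTHWHILE_LIFT = 0.005  # assumed business threshold: half a percentage point

print(f"lift: {lift:.3%}, p-value: {p_value:.4f}")
print("statistically significant:", p_value < ALPHA)
print("practically significant:  ", lift >= MIN_WORTHWHILE_LIFT)
```

With these made-up numbers the p-value falls below 0.05, so the lift is statistically significant, yet it is smaller than the assumed minimum worth shipping. That is the "significant and still pointless" case from the list above. The practical threshold is a business judgment, not something the test can supply.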

6.4 Samples and confidence

Analysts almost always work with a sample of reality, not the whole truth, so a little humility about uncertainty goes a long way. A confidence interval expresses a range of values that plausibly contains the true value: 12 percent plus or minus 2 percent at 95 percent confidence is far more honest than a bare 12 percent. Sample size matters enormously too: a 12 percent result from 50 people carries far less weight than the same 12 percent from 50,000 people, and treating a tiny sample as solid is a classic mistake.
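The effect of sample size on the interval can be computed directly. The sketch below uses the normal-approximation (Wald) interval for a proportion, which is the simplest formula rather than the most accurate one for small samples; the helper name `proportion_ci` is hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

def proportion_ci(p_hat, n, z=Z_95):
    """Normal-approximation confidence interval for an observed proportion."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# The same observed 12 percent at the two sample sizes from the text.
for n in (50, 50_000):
    low, high = proportion_ci(0.12, n)
    print(f"n={n:>6}: 12% observed, 95% CI {low:.1%} to {high:.1%}")
```

Under this approximation, 12 percent from 50 people spans roughly 3 to 21 percent, while 12 percent from 50,000 people spans about 11.7 to 12.3 percent. For small samples or proportions near 0 or 1, other interval methods tend to behave better than this one.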

6.5 Statistics practice

Explain a p-value to a non-technical stakeholder.

When would you report the median instead of the mean?
