Chapter 2 · Data Analyst
SQL, the analyst's primary language
~10 min read
If you learn one thing well, learn SQL. It is the most-tested skill in analyst interviews and the tool you reach for every day. The encouraging news is that the twenty percent of SQL you use eighty percent of the time is learnable in weeks. This chapter takes you from the mental model through the core verbs; the next chapter covers the advanced patterns that separate strong candidates from the rest.
2.1 SQL is set logic, not a script#
Beginners read a query top to bottom like a recipe. The database does not. It resolves clauses in a logical order that differs from how you write them. Internalizing one diagram fixes the majority of confusing errors, including the classic complaint that you cannot filter on an alias you just created in SELECT.
2.2 The core verbs#
| Clause | Job | Example fragment |
|---|---|---|
| SELECT | Choose columns or compute values | SELECT name, price*qty |
| FROM / JOIN | Choose and combine tables | FROM orders o JOIN users u |
| WHERE | Filter rows before grouping | WHERE status = 'paid' |
| GROUP BY | Collapse rows into groups | GROUP BY region |
| HAVING | Filter groups after aggregating | HAVING SUM(rev) > 1000 |
| ORDER BY | Sort the result | ORDER BY rev DESC |
| LIMIT | Cap the rows returned | LIMIT 10 |
2.3 Combining tables with JOINs#
Data lives in many tables; insight requires combining them. A JOIN matches rows from two tables on a shared key. Four types cover almost everything you will do.
| Join | Returns | Typical use |
|---|---|---|
| INNER | Only rows matched in both tables | Orders that have a valid customer |
| LEFT | All left rows plus matches, NULL if none | All customers, even those with zero orders |
| RIGHT | All right rows plus matches | Rare; usually rewritten as LEFT |
| FULL OUTER | All rows from both sides | Reconciling two systems or finding gaps |
-- Customers who have never placed an order
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;2.4 Aggregation, turning rows into answers#
Aggregate functions (COUNT, SUM, AVG, MIN, MAX) collapse many rows into one value, almost always paired with GROUP BY to produce one value per group. This is how revenue by region or signups per day get built.
-- Monthly revenue and order count, most recent first
SELECT DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS orders,
SUM(amount) AS revenue,
ROUND(AVG(amount),2) AS avg_order_value
FROM orders
WHERE status = 'paid'
GROUP BY 1
ORDER BY month DESC;2.5 SQL practice#
What is the difference between WHERE and HAVING?
Find customers with no orders. What is the cleanest approach?
Get the next chapter and weekly interview tips by email
One short email per week. Skim in a minute. Unsubscribe anytime.
