Chapter 2 · Data Analyst

SQL, the analyst's primary language

~10 min read

If you learn one thing well, learn SQL. It is the most-tested skill in analyst interviews and the tool you reach for every day. The encouraging news is that the twenty percent of SQL you use eighty percent of the time is learnable in weeks. This chapter takes you from the mental model through the core verbs; the next chapter covers the advanced patterns that separate strong candidates from the rest.

2.1 SQL is set logic, not a script#

Beginners read a query top to bottom like a recipe. The database does not. It resolves clauses in a logical order that differs from how you write them. Internalizing one diagram fixes the majority of confusing errors, including the classic complaint that you cannot filter on an alias you just created in SELECT.

Figure 2. Written order versus logical execution order. The database evaluates FROM first and SELECT almost last.

2.2 The core verbs#

ClauseJobExample fragment
SELECTChoose columns or compute valuesSELECT name, price*qty
FROM / JOINChoose and combine tablesFROM orders o JOIN users u
WHEREFilter rows before groupingWHERE status = 'paid'
GROUP BYCollapse rows into groupsGROUP BY region
HAVINGFilter groups after aggregatingHAVING SUM(rev) > 1000
ORDER BYSort the resultORDER BY rev DESC
LIMITCap the rows returnedLIMIT 10

2.3 Combining tables with JOINs#

Data lives in many tables; insight requires combining them. A JOIN matches rows from two tables on a shared key. Four types cover almost everything you will do.

Figure 3. The four JOINs.
JoinReturnsTypical use
INNEROnly rows matched in both tablesOrders that have a valid customer
LEFTAll left rows plus matches, NULL if noneAll customers, even those with zero orders
RIGHTAll right rows plus matchesRare; usually rewritten as LEFT
FULL OUTERAll rows from both sidesReconciling two systems or finding gaps
sql
-- Customers who have never placed an order
SELECT c.customer_id, c.name
FROM   customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE  o.order_id IS NULL;

2.4 Aggregation, turning rows into answers#

Aggregate functions (COUNT, SUM, AVG, MIN, MAX) collapse many rows into one value, almost always paired with GROUP BY to produce one value per group. This is how revenue by region or signups per day get built.

sql
-- Monthly revenue and order count, most recent first
SELECT DATE_TRUNC('month', created_at) AS month,
       COUNT(*)             AS orders,
       SUM(amount)          AS revenue,
       ROUND(AVG(amount),2) AS avg_order_value
FROM   orders
WHERE  status = 'paid'
GROUP BY 1
ORDER BY month DESC;

2.5 SQL practice#

What is the difference between WHERE and HAVING?

Find customers with no orders. What is the cleanest approach?

Get the next chapter and weekly interview tips by email

One short email per week. Skim in a minute. Unsubscribe anytime.