Chapter 3 · Product Analyst

3. SQL for product analysts

~10 min read

Product analyst SQL has a distinct flavor: it is mostly about events, timestamped user actions, and the cohort, funnel, and retention questions you ask of them. This chapter focuses on the patterns that come up daily and in interviews.

3.1 The events table mental model#

Most product data is a long, narrow events table: one row per action, with a user id, an event name, a timestamp, and properties. Almost every product question becomes 'filter to the right events, group by the right key, and count or window over time.'

sql
-- events(user_id, event_name, ts, platform, properties)
-- Daily active users by platform, last 30 days
SELECT DATE_TRUNC('day', ts) AS day,
       platform,
       COUNT(DISTINCT user_id) AS dau
FROM   events
WHERE  ts >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1, 2
ORDER BY day;

3.2 Window functions, your power tool#

The two you'll reach for most are ROW_NUMBER() (first/last event per user) and LAG() (gap between a user's actions).

sql
-- Each user's FIRST purchase (the activation event)
WITH ranked AS (
  SELECT user_id, ts, amount,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts) AS rn
  FROM   events WHERE event_name = 'purchase'
)
SELECT user_id, ts AS first_purchase_at, amount
FROM   ranked WHERE rn = 1;

3.3 Cohort retention in SQL#

sql
-- Day-7 retention by signup-week cohort
WITH signups AS (
  SELECT user_id, DATE_TRUNC('week', MIN(ts)) AS cohort
  FROM events GROUP BY user_id
),
activity AS (
  SELECT DISTINCT user_id, DATE_TRUNC('day', ts) AS active_day
  FROM events
)
SELECT s.cohort,
       COUNT(DISTINCT s.user_id) AS cohort_size,
       COUNT(DISTINCT CASE WHEN a.active_day = s.cohort + INTERVAL '7 days'
             THEN a.user_id END) AS retained_d7
FROM signups s
LEFT JOIN activity a ON a.user_id = s.user_id
GROUP BY s.cohort ORDER BY s.cohort;

3.4 Funnels in SQL#

sql
-- View -> Add to cart -> Purchase funnel
SELECT
  COUNT(*)                                    AS viewed,
  COUNT(cart_ts)                              AS added,
  COUNT(CASE WHEN buy_ts > cart_ts THEN 1 END) AS bought
FROM (
  SELECT user_id,
    MIN(CASE WHEN event_name='view' THEN ts END) AS view_ts,
    MIN(CASE WHEN event_name='cart' THEN ts END) AS cart_ts,
    MIN(CASE WHEN event_name='buy'  THEN ts END) AS buy_ts
  FROM events GROUP BY user_id
) u;

3.5 Sessionization#

sql
WITH gaps AS (
  SELECT user_id, ts,
    CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts)
           > INTERVAL '30 minutes' THEN 1 ELSE 0 END AS new_session
  FROM events
)
SELECT user_id, ts,
  SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_id
FROM gaps;

Get the next chapter and weekly interview tips by email

One short email per week. Skim in a minute. Unsubscribe anytime.