Chapter 10 · Data Analyst
Handling data responsibly
Most training teaches you to query data and ignores the question of whether you are allowed to. Mishandling personal data is a legal and ethical risk, not just a technical one, and it is increasingly tested in interviews. This chapter covers what a data analyst specifically needs. A separate cross-role Compliance and Domain Knowledge guide goes deeper across industries; here we focus on the habits that apply while you write SQL and build reports.
10.1 Know what you are holding
Personally identifiable information, or PII, is any data that can identify a specific person, either directly (name, email, national ID) or indirectly in combination (birth date plus postal code plus gender can re-identify someone). Two subsets carry extra rules: health data, which HIPAA regulates in the United States, and payment-card data, which is covered by the PCI DSS industry standard. Many privacy laws now also single out a sensitive category, including government IDs, precise location, and biometric data, for stricter handling.
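To see how indirect identification works in practice, here is a minimal sketch that counts how many records are unique on the combination of birth date, postal code, and gender. The rows, field names, and the `unique_combinations` helper are all hypothetical and made up for illustration; a real check would run inside an approved environment against your own schema.

```python
from collections import Counter

# Hypothetical sample rows (not real people); field names are assumptions.
rows = [
    {"birth_date": "1990-04-12", "postal_code": "30301", "gender": "F"},
    {"birth_date": "1990-04-12", "postal_code": "30301", "gender": "F"},
    {"birth_date": "1985-11-02", "postal_code": "30301", "gender": "M"},
    {"birth_date": "1972-07-23", "postal_code": "98101", "gender": "F"},
]

QUASI_IDENTIFIERS = ("birth_date", "postal_code", "gender")


def unique_combinations(records, fields=QUASI_IDENTIFIERS):
    """Return the field combinations that match exactly one record."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return [combo for combo, n in counts.items() if n == 1]


risky = unique_combinations(rows)
print(f"{len(risky)} of {len(rows)} records are unique on {QUASI_IDENTIFIERS}")
```

Any combination that appears only once points at a single person even though no name or email is present, which is why such fields deserve the same care as direct identifiers.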
10.2 Safe-handling habits
- Pull the minimum. If you do not need the email to answer the question, do not select it. The safest field is the one you never queried.
- Prefer aggregated or de-identified data. Work at the level the question needs. A trend by region rarely requires individual records.
- Keep regulated data in approved systems. Do not export PII to a local spreadsheet, a personal drive, or an unapproved AI tool.
- Respect access and purpose. Having query access to a field is not the same as being allowed to use it for this purpose.
- Ask before, not after. If you are unsure whether a field is permitted, check with data governance before you run the query.
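The first two habits can be sketched as a single query, run here through Python's built-in `sqlite3` module against an in-memory table. The `orders` table, its columns, and the sample rows are hypothetical. The `MIN_GROUP_SIZE` threshold is an extra assumption that is not part of the habits above: it drops very small groups, which some teams do so that an aggregate cannot point back at one person.

```python
import sqlite3

# Assumed threshold for illustration; your governance policy sets the real value.
MIN_GROUP_SIZE = 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_email TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("a@example.com", "North", 120.0),
        ("b@example.com", "North", 80.0),
        ("c@example.com", "North", 100.0),
        ("d@example.com", "South", 50.0),
    ],
)

# The question is "sales by region", so customer_email is never selected,
# and the result is aggregated rather than returned row by row.
query = """
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region
    HAVING COUNT(*) >= ?
    ORDER BY region
"""
region_totals = conn.execute(query, (MIN_GROUP_SIZE,)).fetchall()
conn.close()

print(region_totals)
```

The email column exists in the table but never leaves the database, so there is nothing sensitive to leak from the result set.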
10.3 Using AI tools responsibly
- Always validate the output. AI can produce confident, wrong queries. Check results against known totals exactly as you would your own work, and be able to explain any generated query line by line, because you own the result.
- Never paste regulated data into an unapproved tool. Feeding PII, health, payment, or confidential records into a public AI tool can be a serious data breach. Use only employer-approved, governed environments for anything touching sensitive data.
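One way to check a generated query against known totals is a small reconciliation helper like the sketch below. The `check_against_control` function, the control figures, and the sample results are hypothetical; in practice the control total and row count would come from a trusted report or source system, and the tolerance would depend on your data.

```python
import math


def check_against_control(result_rows, control_total, control_row_count):
    """Compare a query result with trusted control figures; return any mismatches."""
    problems = []
    if len(result_rows) != control_row_count:
        problems.append(
            f"row count {len(result_rows)} != expected {control_row_count}"
        )
    total = sum(amount for _, amount in result_rows)
    # Allow a half-cent tolerance for floating-point rounding (an assumed value).
    if not math.isclose(total, control_total, abs_tol=0.005):
        problems.append(f"total {total:.2f} != expected {control_total:.2f}")
    return problems


good_result = [("North", 300.0), ("South", 50.0)]
# A join that fans out can silently duplicate rows and inflate totals.
inflated_result = [("North", 300.0), ("North", 300.0), ("South", 50.0)]

print(check_against_control(good_result, control_total=350.0, control_row_count=2))
print(check_against_control(inflated_result, control_total=350.0, control_row_count=2))
```

An empty list means the result reconciles; anything else is a reason to read the generated query again before sharing the numbers.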
10.4 Domain knowledge
| Industry | Know that it exists | Why it matters |
|---|---|---|
| Banking / finance | FFIEC reporting; Federal Reserve FR Y-14 stress-test data | Regulatory reports demand accuracy and lineage |
| Healthcare | HIPAA and protected health information | Default to de-identified data; strict access |
| Retail / e-commerce | PCI DSS for card data; retail metrics | Never store raw card data; know the KPIs |
| Technology | GDPR and CCPA on user data | User-level data is tightly access-controlled |
