PII redaction in logs
Don't leak personal data into log files
Logs are often forwarded to external systems (a SIEM, a log aggregator, ELK), so any PII in them ends up in those systems too. Some of that data is low-risk and some is regulated, which makes mismanaged logs a GDPR risk.
What is PII
For GDPR / CCPA purposes:
- Email addresses.
- Names.
- Phone numbers.
- IP addresses (yes, in EU).
- Identity UUID (linkable identifier).
Less obviously:
- User-Agent (linkable).
- Geographic info (city, country).
- Free-text user-provided fields.
What's "safe enough"
- Internal logs in your own DB (audit log): PII is OK with a retention policy.
- Logs in external SaaS (Datadog, Sentry): redact.
- Logs on a shared file system (rotated to S3): minimize.
Redaction strategies
Strategy A: Don't log it
// Don't
logger.info({ email: user.email, action: "login" });
// Do
logger.info({ user_id: user.id, action: "login" });
Use IDs as references. The audit log (separate, retention-managed) has the email.
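Strategy B: Hash or mask
import { createHash } from "crypto";
// Hash: a stable pseudonym for correlating events without showing the address.
function hashEmail(email: string) {
  return createHash("sha256").update(email.trim().toLowerCase()).digest("hex").slice(0, 16);
}
// Mask: alice@example.com → "a***@example.com"
function maskEmail(email: string) {
  const [local = "", domain = ""] = email.split("@");
  return `${local.slice(0, 1)}***@${domain}`;
}
logger.info({ user_email_masked: maskEmail(user.email) });
// or: logger.info({ user_email_hash: hashEmail(user.email) });
Masking is pattern-preserving, which is useful for debugging without exposing the full address.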
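Where you need a stable value to correlate events rather than a readable fragment, a keyed hash is one option. An unkeyed hash of an email can be reversed by hashing a list of candidate addresses, so the sketch below uses an HMAC with a secret key. The helper name `pseudonymizeEmail` and the `LOG_HASH_KEY` environment variable are assumptions for illustration, not part of any library.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: keyed hash of an email, usable as a log correlation ID.
// `key` is a secret that must never appear in the logs itself.
function pseudonymizeEmail(email: string, key: string): string {
  return createHmac("sha256", key)
    .update(email.trim().toLowerCase()) // normalize so casing/whitespace variants collide
    .digest("hex")
    .slice(0, 16); // truncated: still enough to correlate log lines
}

// Assumption: the key comes from an env var; the name is illustrative only.
const hashKey = process.env.LOG_HASH_KEY ?? "dev-only-key";
console.log({ user_email_hmac: pseudonymizeEmail("alice@example.com", hashKey) });
```

The output is still a linkable identifier, so treat it as pseudonymous data rather than anonymous data.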
Strategy C: Tokenize
Replace PII with a token; keep the mapping in a secure separate store:
const token = await tokenize(user.email);
// Stores `email_token:X → real email` in encrypted vault
logger.info({ user_email_token: token });
Reversible only by someone with vault access. Rarely worth the complexity for most apps.
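For the cases where tokenization is warranted, the following is a minimal sketch of what a `tokenize` function could look like. The `TokenVault` class is hypothetical and keeps the mapping in memory purely for illustration; a real vault would be an encrypted, access-controlled store.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory vault. Illustration only: not encrypted, not persistent.
class TokenVault {
  private byToken = new Map<string, string>();
  private byValue = new Map<string, string>();

  // Returns the existing token for a value, or mints a new random one.
  async tokenize(value: string): Promise<string> {
    const existing = this.byValue.get(value);
    if (existing) return existing;
    const token = `email_token:${randomUUID()}`;
    this.byToken.set(token, value);
    this.byValue.set(value, token);
    return token;
  }

  // Reversal requires access to the vault itself.
  async detokenize(token: string): Promise<string | undefined> {
    return this.byToken.get(token);
  }
}

const vault = new TokenVault();
vault.tokenize("alice@example.com").then((token) => {
  console.log({ user_email_token: token }); // the token carries no part of the email
});
```

Because the token is random rather than derived from the email, nothing can be recovered from the logs alone.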
Implementation
Pino (Node.js)
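import pino from "pino";
const logger = pino({
  redact: {
    paths: [
      "*.email",
      "*.phone",
      "request.headers.authorization",
      "request.headers.cookie",
      "response.body.access_token",
      "response.body.refresh_token",
    ],
    censor: "***REDACTED***",
  },
});
Pino replaces the values at these paths with the censor string in every log line. Note that a wildcard such as "*.email" matches one level down (for example user.email), not a top-level email key.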
Bunyan / Winston
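Neither has a direct equivalent of Pino's redact option: Bunyan relies on serializers and Winston on custom formats. Either way, keep a documented list of fields to strip.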
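For loggers without a built-in redaction option, a dependency-free helper can do the stripping before the object reaches the logger. This is a sketch under assumptions: `redactFields` and `SENSITIVE_KEYS` are hypothetical names, and the function expects plain JSON-like data (no circular references, no class instances). It could be called from a Winston custom format or a Bunyan serializer.

```typescript
// Hypothetical field list; keep it documented and reviewed like any other config.
const SENSITIVE_KEYS = new Set([
  "email", "phone", "authorization", "cookie", "access_token", "refresh_token",
]);

// Returns a deep copy with sensitive keys censored; the input is not mutated.
function redactFields(value: unknown, censor = "***REDACTED***"): unknown {
  if (Array.isArray(value)) return value.map((v) => redactFields(v, censor));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      // Key match is case-insensitive so "Authorization" and "authorization" both hit.
      out[k] = SENSITIVE_KEYS.has(k.toLowerCase()) ? censor : redactFields(v, censor);
    }
    return out;
  }
  return value;
}

console.log(JSON.stringify(
  redactFields({ user: { email: "alice@example.com", id: 7 }, action: "login" }),
));
// → {"user":{"email":"***REDACTED***","id":7},"action":"login"}
```

Unlike path-based redaction, this matches a key at any depth, which trades precision for coverage.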
Caddy access logs
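log {
	format filter {
		wrap json
		fields {
			request>headers>Authorization delete
			request>headers>Cookie delete
			resp_headers>Set-Cookie delete
		}
	}
}
Deletes those headers from access logs. Recent Caddy versions already redact these credential headers by default, so check what your version emits (and its filter encoder syntax) before relying on this.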
Sentry / Bugsnag / error-trackers
Configure scrubbing:
Sentry.init({
  beforeSend(event) {
    if (event.request?.cookies) delete event.request.cookies;
    if (event.user?.email) event.user.email = maskEmail(event.user.email);
    return event;
  },
});
Datadog / Loki / Splunk
Most log aggregators offer server-side field redaction. Configure it as soon as you onboard, but don't rely on it alone: redact at the source as well (defense in depth).
Stack traces
Stack traces can include variable values, which can include PII:
Error: failed to send
    at sendEmail (...)
    email = "alice@example.com"
    attempts = 3
Most logging frameworks omit local variables in production, but some error trackers and debug modes capture them. Verify what yours emits.
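Even without local variables, the error message itself can carry PII. One way to reduce that risk is to scrub errors before they are logged. The sketch below is an assumption-laden example: `toLoggableError` is a hypothetical helper, and the email pattern is deliberately simple rather than RFC-complete.

```typescript
// Simple email pattern; good enough for scrubbing, not for validation.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: builds a log-safe view of an error.
function toLoggableError(err: unknown): { name: string; message: string; stack: string | undefined } {
  const e = err instanceof Error ? err : new Error(String(err));
  const scrub = (s: string) => s.replace(EMAIL_RE, "<email>");
  return {
    name: e.name,
    message: scrub(e.message),
    // The stack usually repeats the message on its first line, so scrub it too.
    stack: e.stack ? scrub(e.stack) : undefined,
  };
}

console.log(toLoggableError(new Error("failed to send to alice@example.com")).message);
// → failed to send to <email>
```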
NODE_ENV=production node app.js # run in production mode, then check what your logger actually emits for errors
Database queries
Don't log queries that include PII:
// The query itself is fine: it is parameterized.
db.query("SELECT * FROM users WHERE email = $1", [email]).then(...)
// Bad: a query logger (e.g. drizzle's) that writes the interpolated form:
logger.debug(`Query: SELECT * FROM users WHERE email = 'alice@example.com'`);
Configure your ORM to log the parameterized form, not the interpolated one:
SELECT * FROM users WHERE email = $1
[REDACTED]
URLs in logs
GET /users/01HZQ2X.../sessions
Identity UUIDs in paths are PII (linkable). Either:
- Don't log full paths.
- Mask the UUID portion.
function redactPath(path: string) {
  return path.replace(/\/users\/[^/]+/, "/users/<id>");
}
What you DO want to log
Despite redaction, logs should be useful:
| Useful | Sensitive |
|---|---|
| Event types (login_failed, registration) | The user's email |
| Outcomes (success, validation_error) | The actual password attempt |
| Response codes | Cookie values |
| Latency | Token contents |
| Path patterns (/users/<id>) | Full URLs with PII |
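One way to enforce the "Useful" column of the table above is to make it the only shape the logging helper accepts. The sketch below is illustrative: the `SafeLogEvent` interface, its field names, and `formatEvent` are assumptions, not an existing API. The formatter copies an explicit allowlist of fields, so anything extra on the object is dropped at runtime as well as rejected at compile time.

```typescript
// Hypothetical event shape: only non-sensitive fields are representable.
interface SafeLogEvent {
  event: "login_failed" | "login_succeeded" | "registration";
  outcome: "success" | "validation_error" | "error";
  status: number;      // response code
  latencyMs: number;
  pathPattern: string; // e.g. "/users/<id>/sessions", never the raw URL
}

// Serializes only the allowlisted fields, in a fixed order.
function formatEvent(e: SafeLogEvent): string {
  return JSON.stringify({
    event: e.event,
    outcome: e.outcome,
    status: e.status,
    latencyMs: e.latencyMs,
    pathPattern: e.pathPattern,
  });
}

console.log(formatEvent({
  event: "login_failed",
  outcome: "validation_error",
  status: 401,
  latencyMs: 12,
  pathPattern: "/users/<id>/sessions",
}));
// → {"event":"login_failed","outcome":"validation_error","status":401,"latencyMs":12,"pathPattern":"/users/<id>/sessions"}
```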
Audit logs are different
The audit log (the security_audit table) is supposed to contain PII; that's the point. It is kept separate from app logs:
- Stored encrypted at rest.
- Retention-limited (90 days for routine).
- Access-controlled.
- Not forwarded to external systems by default.
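The retention limit above can be sketched as a small selection step that a scheduled job might run before deleting rows. The `AuditEntry` shape and `expiredEntries` function are hypothetical; the 90-day default mirrors the figure given for routine entries.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical minimal view of an audit row.
interface AuditEntry {
  id: string;
  createdAt: Date;
}

// Returns the entries older than the retention window, i.e. the ones to purge.
function expiredEntries(entries: AuditEntry[], now: Date, retentionDays = 90): AuditEntry[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return entries.filter((e) => e.createdAt.getTime() < cutoff);
}

const now = new Date("2024-06-01T00:00:00Z");
const toPurge = expiredEntries(
  [
    { id: "old", createdAt: new Date("2024-01-01T00:00:00Z") },
    { id: "recent", createdAt: new Date("2024-05-15T00:00:00Z") },
  ],
  now,
);
console.log(toPurge.map((e) => e.id)); // only "old" falls outside the 90-day window
```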
Test for leaks
Periodic grep over a sample of recent logs:
grep -E "(@|password|token|secret)" /var/log/app.log | head -100
If you see PII or secrets, fix the source.
Automated: use an external scanner (gitleaks, trufflehog) on log streams to catch leaked secrets.
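The same check can also run in a test suite or CI step against sample log output. The sketch below is a minimal, hypothetical scanner: `scanLogLines` and the pattern list are assumptions, and the patterns are intentionally coarse, so expect some false positives.

```typescript
// Coarse patterns; none use the "g" flag, so RegExp.test() keeps no state between calls.
const LEAK_PATTERNS: Array<{ name: string; re: RegExp }> = [
  { name: "email", re: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { name: "bearer_token", re: /Bearer\s+[A-Za-z0-9._~+\/-]{8,}/i },
  { name: "password_field", re: /"password"\s*:/i },
];

interface Finding {
  line: number;    // 1-based line number in the sample
  pattern: string; // which pattern matched
}

// Reports every (line, pattern) pair that looks like a leak.
function scanLogLines(lines: string[]): Finding[] {
  const findings: Finding[] = [];
  lines.forEach((text, i) => {
    for (const { name, re } of LEAK_PATTERNS) {
      if (re.test(text)) findings.push({ line: i + 1, pattern: name });
    }
  });
  return findings;
}

const sample = [
  '{"user_id":7,"action":"login"}',
  '{"email":"alice@example.com","action":"login"}',
];
console.log(scanLogLines(sample)); // reports line 2 under the "email" pattern
```

Failing the build on any finding turns the periodic grep into a regression check.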
GDPR-specific
If a user sends you a DSR (data subject request):
- The audit log gets cleared per their request.
- App logs SHOULD already be PII-free.
If app logs DO contain their email, you'll need to expunge those too. That is operationally painful, so avoid it by keeping PII out of app logs in the first place.