Anonymizing audit logs after retention
Keep aggregate stats; drop personal data
GDPR's data minimization: don't keep PII longer than necessary. But you want long-term metrics (year-over-year login trends, etc.). Solution: anonymize after retention.
What anonymization means
After N days, replace identifiers with random values, drop PII fields:
UPDATE security_audit
SET
identity_id = NULL, -- can't link to user
source_ip = '0.0.0.0', -- generic
user_agent = 'redacted', -- no fingerprint
metadata = '{}'::jsonb -- no specifics
WHERE created_at < NOW() - INTERVAL '90 days';What's left: event_type, outcome, created_at. Useful for aggregate stats.
Schedule
Daily cron:
# /etc/cron.daily/audit-anonymize
psql -d olympus -c "
UPDATE security_audit
SET identity_id = NULL, source_ip = '0.0.0.0', user_agent = 'redacted', metadata = '{}'::jsonb
WHERE created_at < NOW() - INTERVAL '90 days'
AND identity_id IS NOT NULL;
"Run nightly during low-traffic. Indexed on created_at, fast.
Aggregate aggregates
Once anonymized, build aggregate tables for analytics:
CREATE TABLE daily_audit_stats (
day DATE PRIMARY KEY,
total_logins BIGINT,
failed_logins BIGINT,
total_registrations BIGINT,
total_password_changes BIGINT,
-- ... per event_type
unique_user_estimate BIGINT -- HyperLogLog
);
INSERT INTO daily_audit_stats
SELECT DATE_TRUNC('day', created_at) AS day,
SUM(CASE WHEN event_type = 'login' THEN 1 ELSE 0 END) AS total_logins,
SUM(CASE WHEN event_type = 'login' AND outcome = 'failure' THEN 1 ELSE 0 END) AS failed_logins,
-- ...
FROM security_audit
WHERE created_at > NOW() - INTERVAL '1 day'
GROUP BY 1;Aggregate before anonymizing. Lose user-level detail, keep stats.
Retention policy
| Data | Retention |
|---|---|
| Full audit (identifiable) | 90 days |
| Anonymized event log | 2 years |
| Aggregate stats | 7+ years |
Trade-off: less granular over time, but storage grows linearly. Aggregate-only is tiny.
What you lose
- Investigating "who did X" only works within 90 days.
- DSR requests beyond 90 days: less to return.
- Long-tail abuse patterns are obscured.
If you need longer history of specific events (security-relevant), keep those longer:
UPDATE security_audit
SET identity_id = NULL, ...
WHERE created_at < NOW() - INTERVAL '90 days'
AND event_type NOT IN ('admin_role_changed', 'admin_account_locked', 'admin_account_deleted');Admin actions retained longer (2 years).
"Pseudonymous" vs "anonymous"
- Anonymous: cannot be re-identified, even with effort.
- Pseudonymous: identifiers replaced with random IDs; could be re-identified with mapping.
True anonymization is hard:
SELECT event_type, source_ip, user_agent FROM audit;Even with names removed, the combination of IP + UA + timestamp can identify a person. Aggressive anonymization replaces ALL.
For GDPR claim: prefer aggregate stats over per-event records past retention.
Pseudonymization for retention
If you want to be able to research patterns ("this user pattern repeats"):
UPDATE security_audit
SET identity_id = digest(identity_id::text || 'secret', 'sha256')
WHERE created_at < NOW() - INTERVAL '90 days';Now you can ask "events for identity_pseudonym_X", but you can't get back to the real user.
Pseudonyms are still personal data under GDPR if reversible. Keep the secret well-secured.
Verify anonymization
Periodic check:
SELECT COUNT(*) FROM security_audit
WHERE created_at < NOW() - INTERVAL '90 days'
AND identity_id IS NOT NULL;Should be 0. If non-zero, the cron isn't running.
Alert:
audit_unanonymized_old_count > 100Audit log of anonymization
Anonymize is itself an audit event:
INSERT INTO security_audit (event_type, metadata, created_at)
VALUES ('audit_anonymized', '{"rows_affected": $count, "cutoff_date": "$date"}', NOW());Trace of what got anonymized when.
Schema design for anonymization
Plan from day 1, design schema to facilitate:
CREATE TABLE security_audit (
id UUID PRIMARY KEY,
identity_id UUID, -- nullable, for post-anon
identity_id_anonymized BOOLEAN DEFAULT false,
source_ip INET,
source_ip_redacted BOOLEAN DEFAULT false,
...
);Flags signal "this was anonymized." Useful for differentiating.
DSR conflict
A user requests deletion of their data. You comply: delete identity, but their audit events might still be linkable (pre-anonymization).
If their data is within retention (< 90 days), anonymize their specific events early:
async function deleteIdentity(id) {
await kratos.adminDelete(id);
await db`
UPDATE security_audit
SET identity_id = NULL, source_ip = '0.0.0.0', user_agent = 'redacted', metadata = '{}'
WHERE identity_id = ${id}
`;
}GDPR right to be forgotten + audit retention can conflict, usually right-to-be-forgotten wins, but document retention for "legal obligation" cases.