Olympus Docs
CookbookDefensive security

Anonymizing audit logs after retention

Keep aggregate stats; drop personal data

GDPR's data minimization: don't keep PII longer than necessary. But you want long-term metrics (year-over-year login trends, etc.). Solution: anonymize after retention.

What anonymization means

After N days, replace identifiers with random values, drop PII fields:

UPDATE security_audit
SET
  identity_id = NULL,                       -- can't link to user
  source_ip = '0.0.0.0',                    -- generic
  user_agent = 'redacted',                  -- no fingerprint
  metadata = '{}'::jsonb                    -- no specifics
WHERE created_at < NOW() - INTERVAL '90 days';

What's left: event_type, outcome, created_at. Useful for aggregate stats.

Schedule

Daily cron:

# /etc/cron.daily/audit-anonymize
psql -d olympus -c "
  UPDATE security_audit
  SET identity_id = NULL, source_ip = '0.0.0.0', user_agent = 'redacted', metadata = '{}'::jsonb
  WHERE created_at < NOW() - INTERVAL '90 days'
    AND identity_id IS NOT NULL;
"

Run nightly during low-traffic. Indexed on created_at, fast.

Aggregate aggregates

Once anonymized, build aggregate tables for analytics:

CREATE TABLE daily_audit_stats (
  day DATE PRIMARY KEY,
  total_logins BIGINT,
  failed_logins BIGINT,
  total_registrations BIGINT,
  total_password_changes BIGINT,
  -- ... per event_type
  unique_user_estimate BIGINT  -- HyperLogLog
);

INSERT INTO daily_audit_stats
SELECT DATE_TRUNC('day', created_at) AS day,
       SUM(CASE WHEN event_type = 'login' THEN 1 ELSE 0 END) AS total_logins,
       SUM(CASE WHEN event_type = 'login' AND outcome = 'failure' THEN 1 ELSE 0 END) AS failed_logins,
       -- ...
FROM security_audit
WHERE created_at > NOW() - INTERVAL '1 day'
GROUP BY 1;

Aggregate before anonymizing. Lose user-level detail, keep stats.

Retention policy

DataRetention
Full audit (identifiable)90 days
Anonymized event log2 years
Aggregate stats7+ years

Trade-off: less granular over time, but storage grows linearly. Aggregate-only is tiny.

What you lose

  • Investigating "who did X" only works within 90 days.
  • DSR requests beyond 90 days: less to return.
  • Long-tail abuse patterns are obscured.

If you need longer history of specific events (security-relevant), keep those longer:

UPDATE security_audit
SET identity_id = NULL, ...
WHERE created_at < NOW() - INTERVAL '90 days'
  AND event_type NOT IN ('admin_role_changed', 'admin_account_locked', 'admin_account_deleted');

Admin actions retained longer (2 years).

"Pseudonymous" vs "anonymous"

  • Anonymous: cannot be re-identified, even with effort.
  • Pseudonymous: identifiers replaced with random IDs; could be re-identified with mapping.

True anonymization is hard:

SELECT event_type, source_ip, user_agent FROM audit;

Even with names removed, the combination of IP + UA + timestamp can identify a person. Aggressive anonymization replaces ALL.

For GDPR claim: prefer aggregate stats over per-event records past retention.

Pseudonymization for retention

If you want to be able to research patterns ("this user pattern repeats"):

UPDATE security_audit
SET identity_id = digest(identity_id::text || 'secret', 'sha256')
WHERE created_at < NOW() - INTERVAL '90 days';

Now you can ask "events for identity_pseudonym_X", but you can't get back to the real user.

Pseudonyms are still personal data under GDPR if reversible. Keep the secret well-secured.

Verify anonymization

Periodic check:

SELECT COUNT(*) FROM security_audit
WHERE created_at < NOW() - INTERVAL '90 days'
  AND identity_id IS NOT NULL;

Should be 0. If non-zero, the cron isn't running.

Alert:

audit_unanonymized_old_count > 100

Audit log of anonymization

Anonymize is itself an audit event:

INSERT INTO security_audit (event_type, metadata, created_at)
VALUES ('audit_anonymized', '{"rows_affected": $count, "cutoff_date": "$date"}', NOW());

Trace of what got anonymized when.

Schema design for anonymization

Plan from day 1, design schema to facilitate:

CREATE TABLE security_audit (
  id UUID PRIMARY KEY,
  identity_id UUID,                  -- nullable, for post-anon
  identity_id_anonymized BOOLEAN DEFAULT false,
  source_ip INET,
  source_ip_redacted BOOLEAN DEFAULT false,
  ...
);

Flags signal "this was anonymized." Useful for differentiating.

DSR conflict

A user requests deletion of their data. You comply: delete identity, but their audit events might still be linkable (pre-anonymization).

If their data is within retention (< 90 days), anonymize their specific events early:

async function deleteIdentity(id) {
  await kratos.adminDelete(id);
  await db`
    UPDATE security_audit 
    SET identity_id = NULL, source_ip = '0.0.0.0', user_agent = 'redacted', metadata = '{}' 
    WHERE identity_id = ${id}
  `;
}

GDPR right to be forgotten + audit retention can conflict, usually right-to-be-forgotten wins, but document retention for "legal obligation" cases.

On this page