
Postgres disk full

Reclaiming space, root causes, prevention

Symptom: Postgres reports "No space left on device", and the service is down or failing writes.

Immediate triage

df -h

Find which volume is at 100%. Usually /var/lib/postgresql or /var/lib/containers.
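On a host with many mounts, the full volume is easy to miss in raw df output. A minimal sketch that filters for it, assuming POSIX `df -P` output and mount points without spaces; the function name `full_mounts` and the default threshold of 90% are illustrative and not part of Olympus:

```shell
# full_mounts [LIMIT]: read `df -P` output on stdin and print
# "<use%> <mountpoint>" for every filesystem at or above LIMIT percent.
# LIMIT defaults to 90 (an assumption, tune to taste).
full_mounts() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      pct = $5
      sub(/%$/, "", pct)                       # "97%" -> "97"
      if (pct + 0 >= limit + 0) print pct "% " $6
    }'
}

# Usage: df -P | full_mounts 95
```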

Quick win: delete old logs

# Delete Postgres log files last modified more than 24 hours ago
find /var/lib/postgresql/data/log -type f -mmin +1440 -delete

Truncate WAL if archiving is disabled

This is risky. Do it only if point-in-time recovery is not enabled:

# Check WAL usage
ls -la /var/lib/postgresql/data/pg_wal/

# If recovery is not configured, WAL segments older than the latest checkpoint's REDO file can be removed:
REDO_WAL=$(podman exec ciam-postgres pg_controldata /var/lib/postgresql/data | grep "Latest checkpoint's REDO WAL file" | awk '{print $NF}')
podman exec ciam-postgres pg_archivecleanup /var/lib/postgresql/data/pg_wal "$REDO_WAL"

Do NOT simply rm files in pg_wal. Doing so can corrupt the database.
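Because this step is destructive, it can help to preview what would be removed first. The sketch below only extracts the REDO WAL file name from pg_controldata output; the helper name `redo_wal_file` is hypothetical. The usage lines rely on pg_archivecleanup's `-n` flag, which performs a dry run and prints the files it would remove without deleting them.

```shell
# redo_wal_file: read `pg_controldata` output on stdin and print the name of
# the WAL segment that holds the latest checkpoint's REDO record.
redo_wal_file() {
  # "." stands in for the apostrophe so the awk script can stay single-quoted.
  awk -F': *' '/^Latest checkpoint.s REDO WAL file/ { print $2 }'
}

# Usage (dry run: lists the segments that would be removed, deletes nothing):
#   redo=$(podman exec ciam-postgres pg_controldata /var/lib/postgresql/data | redo_wal_file)
#   podman exec ciam-postgres pg_archivecleanup -n /var/lib/postgresql/data/pg_wal "$redo"
```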

Identify what's consuming space

SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size
FROM pg_tables
WHERE schemaname IN ('public', 'kratos', 'hydra', 'athena')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 20;

Common culprits:

  • security_audit table: audit logs accumulate.
  • hydra_oauth2_access: issued access tokens (Hydra cleans these up, but not always).
  • kratos_session: sessions, if you don't expire old ones.
  • Old Kratos flows: registration/login/recovery flow records, often not garbage-collected.

Cleanups by table

Kratos: old flows

DELETE FROM identity_recovery_codes WHERE expires_at < NOW() - INTERVAL '7 days';
DELETE FROM identity_verification_codes WHERE expires_at < NOW() - INTERVAL '7 days';
DELETE FROM selfservice_login_flows WHERE expires_at < NOW() - INTERVAL '7 days';
DELETE FROM selfservice_registration_flows WHERE expires_at < NOW() - INTERVAL '7 days';
-- etc. for recovery, verification, settings

Or use Kratos's CLI:

podman exec ciam-kratos kratos cleanup sql --config /etc/config/kratos.yml --read-from-env --keep-last 168h

Hydra: expired tokens

podman exec ciam-hydra hydra janitor \
  --tokens \
  --access-lifespan 8760h \
  --refresh-lifespan 8760h \
  --consent-request-lifespan 720h \
  postgres://...

Audit log retention

DELETE FROM security_audit
WHERE event_type = 'login' 
  AND created_at < NOW() - INTERVAL '90 days';

DELETE FROM security_audit
WHERE event_type NOT IN ('login') 
  AND created_at < NOW() - INTERVAL '2 years';
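A single large DELETE writes WAL for every removed row, which is awkward on a volume that is already full. One option is to delete in bounded batches so each statement commits on its own. The sketch below is an illustration under stated assumptions, not an Olympus tool: the function names, the default batch size of 10000, and the `run_psql` wrapper are all hypothetical. It relies on psql printing the command tag (`DELETE n`) for each statement.

```shell
# batch_delete_sql TABLE WHERE_CLAUSE [BATCH]: print a DELETE statement that
# removes at most BATCH rows (default 10000) matching WHERE_CLAUSE.
batch_delete_sql() {
  printf 'DELETE FROM %s WHERE ctid IN (SELECT ctid FROM %s WHERE %s LIMIT %d);\n' \
    "$1" "$1" "$2" "${3:-10000}"
}

# delete_in_batches RUN_SQL TABLE WHERE_CLAUSE [BATCH]: call RUN_SQL with the
# statement repeatedly until it reports "DELETE 0". RUN_SQL is any command
# that takes the SQL as its single argument and prints psql's command tag.
delete_in_batches() {
  run_sql=$1; shift
  sql=$(batch_delete_sql "$@")
  while :; do
    tag=$("$run_sql" "$sql") || return 1   # stop if the SQL runner fails
    [ "$tag" = "DELETE 0" ] && break       # nothing left to delete
  done
}

# Usage, assuming the container and user names used elsewhere on this page:
#   run_psql() { podman exec ciam-postgres psql -U olympus -c "$1"; }
#   delete_in_batches run_psql security_audit \
#     "event_type = 'login' AND created_at < NOW() - INTERVAL '90 days'" 5000
```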

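After deleting, run VACUUM so Postgres can reuse the freed space (plain VACUUM usually does not return it to the operating system):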

VACUUM (ANALYZE, VERBOSE) security_audit;

VACUUM FULL (extreme)

VACUUM FULL security_audit;

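This reclaims disk space by rewriting the table into a new file and releasing the old one to the operating system. It takes an exclusive lock, so the table is unavailable for the duration, and it needs enough free disk space to hold the new copy while it runs. Plan it for a maintenance window.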
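VACUUM FULL writes a complete new copy of the table before dropping the old one, so it can itself fail on a nearly full disk. A conservative pre-check is to require at least the table's current size in free space, since the rewritten copy is normally no larger than the original. The helper name `can_vacuum_full` is hypothetical, and the usage lines assume the container, user, and path names from this page.

```shell
# can_vacuum_full TABLE_BYTES AVAIL_BYTES: succeed only if the free space on
# the data volume is at least the table's current total size.
can_vacuum_full() {
  [ "$2" -ge "$1" ]
}

# Usage:
#   table_bytes=$(podman exec ciam-postgres psql -U olympus -At \
#     -c "SELECT pg_total_relation_size('security_audit')")
#   avail_kb=$(df -Pk /var/lib/postgresql | awk 'NR == 2 { print $4 }')
#   can_vacuum_full "$table_bytes" "$((avail_kb * 1024))" \
#     || echo "not enough free space for VACUUM FULL" >&2
```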

Add storage

Cheapest fix if you can't shrink fast enough:

# Hetzner: resize volume in web console, then on host.
# resize2fs applies to ext2/3/4 filesystems; XFS volumes use xfs_growfs instead.
sudo resize2fs /dev/disk/by-id/scsi-0HC_Volume_XXX

Prevention

Daily janitor cron

#!/bin/sh
# /etc/cron.daily/olympus-janitor (must be executable: chmod +x)
podman exec ciam-kratos kratos cleanup sql --config /etc/config/kratos.yml --read-from-env --keep-last 168h
podman exec ciam-hydra hydra janitor --tokens postgres://...

Audit log retention enforcer

#!/bin/sh
# /etc/cron.daily/audit-retention (must be executable: chmod +x)
podman exec ciam-postgres psql -U olympus -c "
  DELETE FROM security_audit WHERE event_type='login' AND created_at < NOW() - INTERVAL '90 days';
  DELETE FROM security_audit WHERE created_at < NOW() - INTERVAL '2 years';
"

Disk monitoring

Alert when disk usage exceeds 80% (PromQL over node_exporter metrics):

node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.2

You should be alerted weeks before the disk is full. If the first alert you received was at 100%, your alerts are misconfigured.

When you need a bigger DB

If you genuinely need more space than your VPS allows, it is time to migrate to managed Postgres (Neon, RDS, Cloud SQL). See Cookbook: Managed Postgres (Neon).
