Postgres tuning from zero
Setting up Postgres for Olympus from default
Olympus's default Postgres is the official image with default config. Fine for dev. For production, tune.
Default vs tuned
Default postgres:16 settings:
shared_buffers: 128 MB.work_mem: 4 MB.max_connections: 100.
For 4 GB RAM host, these are wrong. Should be higher.
Tuning for Olympus workload
Olympus is mixed:
- Many small reads (whoami, introspect).
- Steady writes (sessions, audit).
- Occasional larger reads (admin queries).
Tune for OLTP.
Suggested config (8GB RAM host)
# postgresql.conf
# Memory
shared_buffers = 2GB
work_mem = 16MB
maintenance_work_mem = 256MB
effective_cache_size = 6GB
# Connections
max_connections = 100
# WAL
wal_buffers = 64MB
max_wal_size = 4GB
min_wal_size = 1GB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
# Vacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 30s
# Logging
log_min_duration_statement = 1000 # log queries > 1s
log_line_prefix = '%t [%p]: '
log_statement = 'none' # too noisy for `all`
log_connections = on
log_disconnections = on
log_lock_waits = on
# Other
random_page_cost = 1.1 # SSDs are fast
effective_io_concurrency = 200 # SSDs handle concurrencyFor larger hosts
Scale memory ~proportionally:
- 16 GB host:
shared_buffers = 4GB,effective_cache_size = 12GB. - 32 GB:
shared_buffers = 8GB.
shared_buffers rule of thumb: 25% of RAM.
In Compose
ciam-postgres:
image: postgres:16
command: ["postgres", "-c", "config_file=/etc/postgresql.conf"]
volumes:
- ./postgresql.conf:/etc/postgresql.conf:ro
- data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}Indexes
Olympus's tables have basic indexes. Add custom for your queries:
-- For finding identities by email
CREATE INDEX CONCURRENTLY ix_identities_email ON identities ((traits->>'email'));
-- For audit log filtering
CREATE INDEX CONCURRENTLY ix_audit_identity_created ON security_audit (identity_id, created_at DESC);
-- For session lookup
CREATE INDEX CONCURRENTLY ix_sessions_token ON kratos.sessions (token); -- usually already thereCONCURRENTLY: no exclusive lock during creation.
Vacuum
Postgres auto-vacuums but for heavy-write tables, tune:
ALTER TABLE security_audit SET (
autovacuum_vacuum_scale_factor = 0.05, -- vacuum at 5% dead rows (default 20%)
autovacuum_analyze_scale_factor = 0.02
);Audit log grows fast, more frequent vacuum.
Statistics target
For better query plans:
ALTER TABLE identities ALTER COLUMN traits SET STATISTICS 1000;
ANALYZE identities;Default 100. 1000 gives Postgres more data to pick good plans.
Long-running query monitoring
SELECT pid, query_start, state, query
FROM pg_stat_activity
WHERE state = 'active' AND query_start < NOW() - INTERVAL '5 minutes';Stuck queries. Cancel:
SELECT pg_cancel_backend(pid);
SELECT pg_terminate_backend(pid); -- harderIndex usage
SELECT
schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;Indexes never used: drop. Indexes with high idx_tup_read per scan: investigate (might not be selective enough).
Slow query log
log_min_duration_statement = 1000 # log queries > 1sPeriodically review:
podman logs ciam-postgres | grep "duration:" | head -100Top slow queries → optimize them (index, rewrite).
Connection pooling
Each connection: ~10 MB RAM. 100 connections = 1 GB just for backends.
For more concurrent apps: PgBouncer in front (see PgBouncer).
Backups
pg_basebackup for full backup:
pg_basebackup -D /backup -Fp -X stream -POr pg_dump for logical:
pg_dump olympus | gzip > backup.sql.gzDaily cron. See Backups - Postgres.
Replication
For HA:
- Configure primary for WAL streaming.
- Set up replica via pg_basebackup.
- Promote replica if primary fails.
See read replicas.
Monitoring
Postgres exporter for Prometheus:
postgres-exporter:
image: quay.io/prometheuscommunity/postgres-exporter
environment:
DATA_SOURCE_NAME: "postgresql://exporter:pass@ciam-postgres:5432/olympus?sslmode=disable"Grafana dashboard: ID 9628 (postgres overview).
Key metrics:
- Connections in use.
- Cache hit ratio (should be > 95%).
- Replication lag.
- Disk usage.
Disk space
SELECT
schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;Top space-consumers. If security_audit is huge: retention enforcement.
Major version upgrade
Postgres 16 → 17 ← BREAKING (in-place upgrade with pg_upgrade)
Postgres 16.3 → 16.4 ← minor, drop-inFor minor: stop, replace image, start. Quick.
For major: pg_upgrade with downtime, OR pg_dump+restore (longer downtime, simpler).
Common mistakes
Default work_mem too low
Complex queries spill to disk. Slow. Increase work_mem (per-session).
max_connections too high
Each backend is RAM. 500 connections = 5 GB just for backends. Use pgbouncer instead.
Tuning shared_buffers > 40% RAM
Above 40%, diminishing returns. Postgres benefits from OS filesystem cache too.