Postgres tuning from zero

Olympus's default Postgres is the official image with default config. Fine for dev. For production, tune.

Default vs tuned

Default postgres:16 settings:

shared_buffers: 128 MB.
work_mem: 4 MB.
max_connections: 100.

For 4 GB RAM host, these are wrong. Should be higher.

Tuning for Olympus workload

Olympus is mixed:

Many small reads (whoami, introspect).
Steady writes (sessions, audit).
Occasional larger reads (admin queries).

Tune for OLTP.

Suggested config (8GB RAM host)

# postgresql.conf

# Memory
shared_buffers = 2GB
work_mem = 16MB
maintenance_work_mem = 256MB
effective_cache_size = 6GB

# Connections
max_connections = 100

# WAL
wal_buffers = 64MB
max_wal_size = 4GB
min_wal_size = 1GB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

# Vacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 30s

# Logging
log_min_duration_statement = 1000   # log queries > 1s
log_line_prefix = '%t [%p]: '
log_statement = 'none'              # too noisy for `all`
log_connections = on
log_disconnections = on
log_lock_waits = on

# Other
random_page_cost = 1.1              # SSDs are fast
effective_io_concurrency = 200      # SSDs handle concurrency

For larger hosts

Scale memory ~proportionally:

16 GB host: shared_buffers = 4GB, effective_cache_size = 12GB.
32 GB: shared_buffers = 8GB.

shared_buffers rule of thumb: 25% of RAM.

In Compose

ciam-postgres:
  image: postgres:16
  command: ["postgres", "-c", "config_file=/etc/postgresql.conf"]
  volumes:
    - ./postgresql.conf:/etc/postgresql.conf:ro
    - data:/var/lib/postgresql/data
  environment:
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}

Indexes

Olympus's tables have basic indexes. Add custom for your queries:

-- For finding identities by email
CREATE INDEX CONCURRENTLY ix_identities_email ON identities ((traits->>'email'));

-- For audit log filtering
CREATE INDEX CONCURRENTLY ix_audit_identity_created ON security_audit (identity_id, created_at DESC);

-- For session lookup
CREATE INDEX CONCURRENTLY ix_sessions_token ON kratos.sessions (token);  -- usually already there

CONCURRENTLY: no exclusive lock during creation.

Vacuum

Postgres auto-vacuums but for heavy-write tables, tune:

ALTER TABLE security_audit SET (
  autovacuum_vacuum_scale_factor = 0.05,  -- vacuum at 5% dead rows (default 20%)
  autovacuum_analyze_scale_factor = 0.02
);

Audit log grows fast, more frequent vacuum.

Statistics target

For better query plans:

ALTER TABLE identities ALTER COLUMN traits SET STATISTICS 1000;
ANALYZE identities;

Default 100. 1000 gives Postgres more data to pick good plans.

Long-running query monitoring

SELECT pid, query_start, state, query
FROM pg_stat_activity
WHERE state = 'active' AND query_start < NOW() - INTERVAL '5 minutes';

Stuck queries. Cancel:

SELECT pg_cancel_backend(pid);
SELECT pg_terminate_backend(pid);  -- harder

Index usage

SELECT 
  schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Indexes never used: drop. Indexes with high idx_tup_read per scan: investigate (might not be selective enough).

Slow query log

log_min_duration_statement = 1000   # log queries > 1s

Periodically review:

podman logs ciam-postgres | grep "duration:" | head -100

Top slow queries → optimize them (index, rewrite).

Connection pooling

Each connection: ~10 MB RAM. 100 connections = 1 GB just for backends.

For more concurrent apps: PgBouncer in front (see PgBouncer).

Backups

pg_basebackup for full backup:

pg_basebackup -D /backup -Fp -X stream -P

Or pg_dump for logical:

pg_dump olympus | gzip > backup.sql.gz

Daily cron. See Backups - Postgres.

Replication

For HA:

Configure primary for WAL streaming.
Set up replica via pg_basebackup.
Promote replica if primary fails.

See read replicas.

Monitoring

Postgres exporter for Prometheus:

postgres-exporter:
  image: quay.io/prometheuscommunity/postgres-exporter
  environment:
    DATA_SOURCE_NAME: "postgresql://exporter:pass@ciam-postgres:5432/olympus?sslmode=disable"

Grafana dashboard: ID 9628 (postgres overview).

Key metrics:

Connections in use.
Cache hit ratio (should be > 95%).
Replication lag.
Disk usage.

Disk space

SELECT 
  schemaname, tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;

Top space-consumers. If security_audit is huge: retention enforcement.

Major version upgrade

Postgres 16 → 17 ← BREAKING (in-place upgrade with pg_upgrade)
Postgres 16.3 → 16.4 ← minor, drop-in

For minor: stop, replace image, start. Quick.

For major: pg_upgrade with downtime, OR pg_dump+restore (longer downtime, simpler).

Common mistakes

Default work_mem too low

Complex queries spill to disk. Slow. Increase work_mem (per-session).

max_connections too high

Each backend is RAM. 500 connections = 5 GB just for backends. Use pgbouncer instead.

Tuning shared_buffers > 40% RAM

Above 40%, diminishing returns. Postgres benefits from OS filesystem cache too.

Postgres tuning from zero

On this page