Postgres read replicas
Offload read traffic to replicas
For larger deployments, read traffic dominates: every whoami, every token introspection, every audit log query reads. Offload them to read replicas to reduce primary load.
Topology
┌─ writes ──► primary ─────► replication ──► replica ─┐
│ (streaming) │
┌────┴────┐ ┌──────▼─────┐
│ Kratos │ │ reads │
│ Hydra │ │ Athena │
│ Athena │ │ (selects) │
└─────────┘ └────────────┘Set up replication
On primary:
# postgresql.conf
wal_level = replica
max_wal_senders = 5
max_replication_slots = 5CREATE USER replicator REPLICATION PASSWORD 'X';
SELECT pg_create_physical_replication_slot('replica1');# pg_hba.conf
host replication replicator <replica-ip>/32 scram-sha-256Restart primary.
On replica:
pg_basebackup -h <primary> -U replicator -D /var/lib/postgresql/data -R -S replica1 -P# postgresql.conf
primary_conninfo = 'host=primary user=replicator password=X application_name=replica1'
primary_slot_name = 'replica1'
hot_standby = onStart replica:
systemctl start postgresqlVerify:
-- On primary
SELECT * FROM pg_stat_replication;Should show replica's state, lag.
Route reads
Olympus services use a single DSN today. To split reads/writes, edit each service:
Kratos
Kratos doesn't directly support a separate read URL. Workaround: route via PgBouncer with two pools (olympus for writes, olympus_ro for reads) and configure your code to use the right one for each query.
Practical: don't split for Kratos. The savings are marginal because Kratos is mostly latency-sensitive auth flows.
Hydra
Same, single DSN. Skip splitting.
Athena
Athena's reads (admin queries, list views) are the heavy ones. Provide a separate read URL:
// athena/src/lib/db.ts
import { sql } from "drizzle-orm";
export const db = drizzle(pool); // primary, for writes
export const dbRO = drizzle(replicaPool); // replica
// In code:
await db`UPDATE identities SET ...`;
const identities = await dbRO`SELECT * FROM identities`;Manual choice per query. Worth the effort for analytics-heavy admin UIs.
Replication lag
Replicas are eventually consistent. Lag is typically < 100ms but can spike under load.
-- On replica
SELECT NOW() - pg_last_xact_replay_timestamp() AS lag;Monitor:
pg_replication_lag_seconds > 5If lag > 5s consistently, your replica can't keep up. Either:
- Reduce write load on primary.
- Faster network / disk on replica.
- Larger replica instance.
Read-your-writes
A user updates their profile, refreshes, sees old data because the replica hasn't caught up.
Avoid by:
- Routing reads from the same session to primary for N seconds after a write.
- Reading from primary for "current user" queries.
async function getCurrentUserData(userId) {
const recentWrite = await redis.get(`recent_write:${userId}`);
if (recentWrite) return await db`SELECT * FROM identities WHERE id=${userId}`;
return await dbRO`SELECT * FROM identities WHERE id=${userId}`;
}
async function updateUserData(userId, data) {
await db`UPDATE ...`;
await redis.setex(`recent_write:${userId}`, 5, "1");
}Failover
If primary fails, promote a replica:
pg_ctl promote -D /var/lib/postgresql/dataReplica is now writable. Point all services at it.
This is a manual failover. For automatic, use Patroni or repmgr.
Olympus doesn't ship with HA. If you need it, layer Patroni on top.
Caveats
- Replica is read-only, writes go to primary.
- DDL (schema changes) on primary replicate automatically.
- Replication uses WAL, large transactions can lag.
- Replica restart re-syncs from primary (small) or pg_basebackup (large).
Cost
Each replica:
- Roughly same resources as primary (or smaller if read-only).
- 2x storage (replica has its own copy).
For < 100k MAU, one replica is plenty.