Blue/green deployment
Zero-downtime upgrades for Olympus
Blue/green: two complete environments. Blue is live; green is the new version. Cut traffic over atomically; if anything's wrong, cut back.
Topology
```
        ┌─ Load balancer (Caddy / Cloudflare LB) ─┐
        │                                         │
  ┌─────▼─────┐                             ┌─────▼─────┐
  │   BLUE    │                             │   GREEN   │
  │  v1.4.2   │ ← all traffic               │  v1.5.0   │ ← idle, warm
  │           │                             │           │
  │ Hera Hydra│                             │ Hera Hydra│
  │ Kratos    │                             │ Kratos    │
  │ Athena    │                             │ Athena    │
  └─────┬─────┘                             └─────┬─────┘
        │                                         │
        └──────────► Postgres ◄───────────────────┘
                (shared, primary)
```
Both environments share the same database. The database is upgraded in advance (backward-compatible migrations).
Process
1. Migrations first
```shell
# On primary
podman exec ciam-kratos kratos migrate sql up
podman exec ciam-hydra hydra migrate sql up postgres://...
```
Migrations must be backward-compatible with the currently running version. Specifically:
- Add new columns as nullable.
- Add tables freely (the running version never reads them).
- Don't drop or rename columns yet; blue still reads them.
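For any hand-written SQL migrations you maintain alongside the Ory ones, a rough pre-flight check can catch the most common backward-incompatible statements before they reach the primary. This is a minimal sketch, not a SQL parser: it greps line by line for `DROP COLUMN`, `DROP TABLE`, and `RENAME`, plus `ADD COLUMN ... NOT NULL` without a `DEFAULT` on the same line. The helper name `expand_only` is hypothetical.

```shell
# Hypothetical helper: rough, line-based check that a SQL migration file only
# "expands" the schema. It is a grep heuristic, not a SQL parser, so
# statements split across several lines can slip through.
expand_only() {
  rc=0
  # Dropping or renaming breaks the version that is still running (blue).
  if grep -Eiq 'drop[[:space:]]+(column|table)|rename[[:space:]]' "$1"; then
    echo "$1: drops or renames a schema object" >&2
    rc=1
  fi
  # A NOT NULL column without a DEFAULT breaks blue's INSERTs,
  # because blue does not know about the column and never sets it.
  if grep -Ei 'add[[:space:]]+column' "$1" \
    | grep -Ei 'not[[:space:]]+null' \
    | grep -Eivq 'default'; then
    echo "$1: adds a NOT NULL column without a DEFAULT" >&2
    rc=1
  fi
  return "$rc"
}
```

Run it over every new migration file before step 1, for example `for f in migrations/*.sql; do expand_only "$f" || exit 1; done` (the `migrations/` path is an assumption).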
2. Deploy green
```shell
# Green host
git pull
# Pin to the new version
sed -i 's/HERA_TAG=.*/HERA_TAG=v1.5.0/' .env
podman-compose pull
podman-compose up -d
# Smoke test green directly; -f makes curl exit non-zero on HTTP errors
curl -fsS https://green.your-domain/healthz
```
3. Cut traffic
In your load balancer:
```caddyfile
# Caddy
ciam.your-domain.com {
    reverse_proxy {
        to green-host:443
        # (was blue-host:443)
    }
}
```
Then reload:
```shell
caddy reload
```
Cloudflare LB: change the pool priority / weights via the API.
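The smoke test and the Caddy edit can be combined so traffic only moves once green answers its health check. A minimal sketch, assuming the Caddyfile layout shown in step 3 with a single `to <host:port>` line inside `reverse_proxy`; `wait_healthy` and `switch_upstream` are hypothetical helper names, and the default retry count and delay are illustrative.

```shell
# Hypothetical helpers for a health-gated cutover. Assumes the Caddyfile has
# exactly one `to <upstream>` line inside its reverse_proxy block.

# Poll URL until curl gets a non-error response, up to $2 attempts
# (default 10), sleeping $3 seconds between attempts (default 3).
wait_healthy() {
  url=$1 attempts=${2:-10} delay=${3:-3}
  i=1
  while :; do
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      break
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$url: not healthy after $attempts attempt(s)" >&2
  return 1
}

# Rewrite the `to` line(s) to point at $2, keeping a .bak copy of the file.
switch_upstream() {
  file=$1 upstream=$2
  if ! grep -q '^[[:space:]]*to[[:space:]]' "$file"; then
    echo "$file: no 'to' line found" >&2
    return 1
  fi
  cp "$file" "$file.bak"
  # Keep the original indentation (\1), replace only the upstream address.
  sed "s|^\([[:space:]]*\)to[[:space:]].*|\1to $upstream|" "$file.bak" > "$file"
}
```

For example: `wait_healthy https://green.your-domain/healthz && switch_upstream /etc/caddy/Caddyfile green-host:443 && caddy reload --config /etc/caddy/Caddyfile` (the Caddyfile path is an assumption). Rolling back is the same `switch_upstream` call with `blue-host:443`, followed by another reload. Caddy validates the new config on reload and keeps the old one running if it is rejected.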
4. Watch
Tail metrics, errors, and logs. Look for:
- Error rate spikes.
- Latency regressions.
- Auth failures specific to green.
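Error-rate spikes can also be checked mechanically after the cutover. A sketch, assuming Caddy writes JSON access logs (one object per line containing a field like `"status":502`); the helper names and the threshold you pass are hypothetical, and this is a quick look at 5xx responses, not a replacement for real metrics.

```shell
# Hypothetical helper: print the percentage of 5xx responses in a JSON
# access log where each line contains a field like "status":502.
five_xx_pct() {
  awk '/"status":5[0-9][0-9]/ { e++ }
       { t++ }
       END { if (t == 0) print "0.0"; else printf "%.1f\n", 100 * e / t }' "$1"
}

# Succeed if the 5xx percentage in log $1 is at most $2 (e.g. 1.0).
five_xx_ok() {
  pct=$(five_xx_pct "$1")
  awk -v p="$pct" -v max="$2" 'BEGIN { exit !(p + 0 <= max + 0) }'
}
```

For example, `five_xx_ok /path/to/green-access.log 1.0 || echo "5xx rate too high: consider rolling back"`, run against green's log a few minutes after the cutover.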
5a. Rollback
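If anything looks wrong, point Caddy back at blue and reload:
```caddyfile
ciam.your-domain.com {
    reverse_proxy {
        to blue-host:443
    }
}
```
```shell
caddy reload
```
Rollback takes seconds.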
5b. Cleanup
If green stays healthy for an hour:
- Stop blue.
- Run any forward-only migrations (drop old columns, etc.).
- Mark green as the new blue.
Sessions across blue/green
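Both environments share the database, so a Kratos session created on blue is also valid on green and the user doesn't have to log in again.
OAuth2 tokens are signed with the same Hydra keys (stored in the shared DB), so tokens issued by blue verify on green. Both hold only if blue and green run with identical secrets (Kratos `secrets.cookie`, Hydra `secrets.system`).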
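Before cutover, it is worth confirming that blue's and green's configuration agree on the database URLs and shared secrets. A minimal sketch, assuming each environment keeps its settings in a compose `.env` file; the key names in the usage example (`KRATOS_DSN`, `HYDRA_DSN`, `HYDRA_SECRETS_SYSTEM`, `KRATOS_SECRETS_COOKIE`) and the helper `env_same` are hypothetical, so adjust them to whatever your compose files actually pass through.

```shell
# Hypothetical helper: succeed only if KEY ($1) is set, non-empty, and
# identical in both env files ($2, $3). The first KEY=... line wins, and
# values are compared literally (KEY="x" and KEY=x count as different).
env_same() {
  key=$1
  va=$(sed -n "s/^$key=//p" "$2" | head -n 1)
  vb=$(sed -n "s/^$key=//p" "$3" | head -n 1)
  [ -n "$va" ] && [ "$va" = "$vb" ]
}
```

For example: `for k in KRATOS_DSN HYDRA_DSN HYDRA_SECRETS_SYSTEM KRATOS_SECRETS_COOKIE; do env_same "$k" blue.env green.env || echo "mismatch: $k"; done`.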
Database upgrades
If you need a non-backward-compatible schema change:
- Plan a maintenance window (or accept brief unavailability).
- Stop blue.
- Run migration.
- Start green.
Or use the expand/contract pattern:
- Expand: add the new column (nullable). Blue/green deploy v1.5, which writes to both the old and new columns.
- Backfill: populate the new column for existing rows.
- Contract: blue/green deploy v1.6, which reads only the new column. Once v1.5 is fully retired, drop the old column.
Three deploys, but zero downtime.
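To make the phases concrete, the hypothetical helper below prints the PostgreSQL statements for moving data from an old column to a new one. Table, column, and type names are placeholders; the dual-write logic in v1.5 lives in application code and is not shown. On large tables, run the backfill in batches rather than as a single `UPDATE`.

```shell
# Hypothetical helper: print the SQL for one phase of an expand/contract
# migration that moves data from column OLD to column NEW.
# Usage: ec_sql PHASE TABLE OLD NEW COLTYPE
ec_sql() {
  phase=$1 table=$2 old=$3 new=$4 coltype=$5
  case $phase in
    expand)
      # Nullable new column: the old version keeps working unchanged.
      printf 'ALTER TABLE %s ADD COLUMN %s %s;\n' "$table" "$new" "$coltype" ;;
    backfill)
      # Re-runnable: only rows whose NEW value is still NULL are updated.
      printf 'UPDATE %s SET %s = %s WHERE %s IS NULL;\n' "$table" "$new" "$old" "$new" ;;
    contract)
      # Run only after every instance that reads OLD has been retired.
      printf 'ALTER TABLE %s DROP COLUMN %s;\n' "$table" "$old" ;;
    *)
      echo "unknown phase: $phase (expected expand, backfill, contract)" >&2
      return 2 ;;
  esac
}
```

For example, `ec_sql expand users email_addr email text | psql "$DSN"` applies the expand step, assuming `$DSN` holds a Postgres connection string.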
Caveats
- Shared DB load: both environments hold connections to and query the same DB. Add DB capacity (including connection headroom) before deploying.
- Cache invalidation: if either environment caches DB results, users can briefly see stale data around the cutover. Keep cache TTLs short.
- WebSocket connections: persistent connections to blue don't transfer to green. This is acceptable as long as clients reconnect.
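Extra DB capacity largely means connection headroom: while both environments are up, every service's connection pool is open twice. A back-of-the-envelope check, assuming you know each service's pool size and Postgres's `max_connections`; the helper name, the reserve, and the pool sizes in the example are hypothetical.

```shell
# Hypothetical helper. Usage: pools_fit MAX_CONNECTIONS RESERVED POOL...
# Prints the sum of all POOL sizes (list blue's and green's pools) and
# succeeds only if it fits under MAX_CONNECTIONS minus RESERVED (kept free
# for admin sessions and migrations).
pools_fit() {
  max=$1 reserved=$2
  shift 2
  total=0
  for pool in "$@"; do
    total=$((total + pool))
  done
  echo "$total"
  [ "$total" -le $((max - reserved)) ]
}
```

With four services per environment at a pool of 20 each, `pools_fit 100 10 20 20 20 20 20 20 20 20` prints 160 and fails: two environments need 160 connections, far more than PostgreSQL's default `max_connections` of 100.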
Compare to rolling deploys
| Aspect | Blue/Green | Rolling |
|---|---|---|
| Resource cost | 2x (briefly) | 1x |
| Rollback | Instant (LB cut) | Slower (rollback is another rolling deploy) |
| Database migrations | Same | Same |
| Operational simplicity | Higher | Lower |
For Olympus's typical single-host deployments, blue/green is easy. For multi-host deployments, rolling deploys are more common.