Blue/green deployment

Blue/green deployment runs two complete environments. Blue is live; green runs the new version. Cut traffic over with a single load-balancer change; if anything goes wrong, cut back.

Topology

                  ┌─ Load balancer (Caddy / Cloudflare LB) ─┐
                  │                                          │
            ┌─────▼─────┐                              ┌─────▼─────┐
            │   BLUE    │                              │   GREEN   │
            │  v1.4.2   │ ← all traffic                │  v1.5.0   │ ← idle, warm
            │           │                              │           │
            │ Hera Hydra│                              │ Hera Hydra│
            │ Kratos    │                              │ Kratos    │
            │ Athena    │                              │ Athena    │
            └─────┬─────┘                              └─────┬─────┘
                  │                                          │
                  └──────────► Postgres ◄────────────────────┘
                            (shared, primary)

Both environments share the same database. The database is upgraded in advance (backward-compatible migrations).

Process

1. Migrations first

# On primary
podman exec ciam-kratos kratos migrate sql up
podman exec ciam-hydra hydra migrate sql up postgres://...

Migrations must be backward-compatible with the currently-running version. Specifically:

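Add new columns as nullable, so the old version can keep inserting rows without them.
Add new tables freely; existing readers ignore them.
Don't drop or rename columns yet; the old version still uses them.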

2. Deploy green

# Green host
git pull
# pin to new version
sed -i 's/HERA_TAG=.*/HERA_TAG=v1.5.0/' .env
podman-compose pull
podman-compose up -d
# Smoke test green directly (-f makes curl exit non-zero on an HTTP error)
curl -fsS https://green.your-domain/healthz
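
The single curl check above can be wrapped in a retry loop so that a slow-starting container does not fail the smoke test on the first request. This is a minimal sketch, assuming green exposes the same /healthz endpoint; the function name wait_healthy, the attempt count, and the delay are illustrative and not part of Olympus.

```shell
# wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl succeeds (HTTP status below 400) or attempts run out.
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: exit non-zero on HTTP errors; -sS: no progress bar, keep error text
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unhealthy after $attempts attempts" >&2
  return 1
}

# Hypothetical usage on the green host:
#   wait_healthy https://green.your-domain/healthz 20 3 || exit 1
```

Only move on to the traffic cut when the function returns success.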

3. Cut traffic

In your load balancer:

# Caddy
ciam.your-domain.com {
  reverse_proxy {
    to green-host:443
    # (was blue-host:443)
  }
}

caddy reload

Cloudflare LB: change the priority / weights via API.
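
As a rough sketch of the Cloudflare route: one way to change priority is to reorder the load balancer's default_pools list so that the green pool comes first. The pool IDs, the CF_* environment variable names, and the helper names below are assumptions for illustration; check the endpoint and field names against the Cloudflare API reference before relying on them.

```shell
# lb_payload PRIMARY_POOL_ID STANDBY_POOL_ID
# Prints a JSON body that lists the primary pool first (highest priority).
lb_payload() {
  printf '{"default_pools":["%s","%s"]}\n' "$1" "$2"
}

# cut_over PRIMARY_POOL_ID STANDBY_POOL_ID
# Assumes CF_API_TOKEN, CF_ZONE_ID and CF_LB_ID are set in the environment.
cut_over() {
  curl -fsS -X PATCH \
    -H "Authorization: Bearer $CF_API_TOKEN" \
    -H "Content-Type: application/json" \
    --data "$(lb_payload "$1" "$2")" \
    "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/load_balancers/$CF_LB_ID"
}

# Hypothetical usage: make green primary, keep blue as standby.
#   cut_over "$GREEN_POOL_ID" "$BLUE_POOL_ID"
```

Swapping the two arguments gives the rollback call.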

4. Watch

Tail metrics, errors, logs. Look for:

Error rate spikes.
Latency regressions.
Auth failures specific to green.
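
The error-rate signal can be reduced to a threshold check that decides whether to roll back. The sketch below assumes you can already obtain an error count and a request count for green from your metrics or logs; how you collect them is not specified here, and the function name and the 2% budget are illustrative.

```shell
# error_rate_exceeds ERRORS TOTAL MAX_PERCENT
# Succeeds (exit 0) when ERRORS/TOTAL is strictly above MAX_PERCENT.
# Integer arithmetic only: errors * 100 > total * max_percent.
error_rate_exceeds() {
  local errors="$1" total="$2" max_percent="$3"
  # No traffic yet: nothing to judge, so do not trigger a rollback.
  [ "$total" -gt 0 ] || return 1
  [ $((errors * 100)) -gt $((total * max_percent)) ]
}

# Example: 12 errors out of 400 requests is 3%, above a 2% budget.
if error_rate_exceeds 12 400 2; then
  echo "rollback"
else
  echo "keep green"
fi
```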

5a. Rollback

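If anything looks wrong, point the load balancer back at blue:

# Caddy: restore the old upstream inside the site block
reverse_proxy blue-host:443

Then run caddy reload again. The switch takes effect in seconds, because blue is still running the old version.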
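
The cutover in step 3 and the rollback in step 5a are the same edit in opposite directions, so both can share one helper. This is a minimal sketch, assuming the upstream appears in the Caddyfile as a "to host:port" line as in step 3; the function name and the /etc/caddy/Caddyfile path are assumptions.

```shell
# switch_upstream FROM TO CADDYFILE
# Rewrites "to FROM" into "to TO" inside CADDYFILE, keeping a .bak copy.
switch_upstream() {
  local from="$1" to="$2" file="$3"
  # Refuse to continue if the expected upstream is not present.
  if ! grep -q -- "to $from" "$file"; then
    echo "upstream $from not found in $file" >&2
    return 1
  fi
  sed -i.bak "s/to $from/to $to/" "$file"
}

# Hypothetical rollback:
#   switch_upstream green-host:443 blue-host:443 /etc/caddy/Caddyfile \
#     && caddy validate --config /etc/caddy/Caddyfile \
#     && caddy reload --config /etc/caddy/Caddyfile
```

Validating before reloading keeps a typo in the Caddyfile from turning a rollback into an outage.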

5b. Cleanup

If green has been healthy for an hour:

Stop blue.
Run any forward-only migrations (drop old columns, etc.). After this point blue can no longer serve as a rollback target.
Mark green as the new blue.

Sessions across blue/green

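Both environments share the database, so a Kratos session created on blue is valid on green and the user doesn't have to log in again.

OAuth2 tokens are signed with the same Hydra keys, which are stored in the shared database, so tokens issued by blue verify on green. Both environments must be configured with the same Hydra system secret, because Hydra uses it to encrypt those keys at rest.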
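
One way to confirm session continuity before cutting traffic is to present the same session token to Kratos in both environments. The sketch below assumes the Kratos public API is reachable at each environment's base URL and uses its /sessions/whoami endpoint with the X-Session-Token header; the base URLs and the function name are illustrative, and browser sessions would send the session cookie instead.

```shell
# session_valid BASE_URL SESSION_TOKEN
# Succeeds when Kratos at BASE_URL accepts the session token.
session_valid() {
  local base_url="$1" token="$2"
  curl -fsS -o /dev/null \
    -H "X-Session-Token: $token" \
    "$base_url/sessions/whoami"
}

# Hypothetical usage with a token obtained by logging in through blue:
#   session_valid https://blue.your-domain  "$TOKEN" && echo "blue ok"
#   session_valid https://green.your-domain "$TOKEN" && echo "green ok"
```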

Database upgrades

If you need a non-backward-compatible schema change:

Plan a maintenance window (or accept brief unavailability).
Stop blue.
Run migration.
Start green.

OR use the expand/contract pattern:

Expand: add the new column (nullable). Deploy v1.5, which writes to both the old and the new column, using the blue/green process above.
Backfill: populate the new column for rows written before v1.5.
Contract: deploy v1.6, which reads and writes only the new column. Once no v1.5 instance is running, drop the old column.

Three steps instead of one, but zero downtime.
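
The three phases can be sketched as SQL piped into psql. The table users and the columns fullname and display_name are hypothetical and only illustrate the shape of each phase; schema owned by Kratos or Hydra should still be changed through their own migrate commands.

```shell
# Hypothetical example: replacing users.fullname with users.display_name.

expand_sql() {
  cat <<'SQL'
-- Expand: additive and nullable, so the old version keeps working.
ALTER TABLE users ADD COLUMN IF NOT EXISTS display_name text;
SQL
}

backfill_sql() {
  cat <<'SQL'
-- Backfill: copy values for rows written before the dual-write release.
UPDATE users SET display_name = fullname WHERE display_name IS NULL;
SQL
}

contract_sql() {
  cat <<'SQL'
-- Contract: run only after no running version reads or writes fullname.
ALTER TABLE users DROP COLUMN IF EXISTS fullname;
SQL
}

# Usage (DATABASE_URL is an assumption), one phase per deploy:
#   expand_sql   | psql "$DATABASE_URL" -v ON_ERROR_STOP=1
#   backfill_sql | psql "$DATABASE_URL" -v ON_ERROR_STOP=1
#   contract_sql | psql "$DATABASE_URL" -v ON_ERROR_STOP=1
```

On a large table the backfill would normally run in batches to avoid one long-running UPDATE; the single statement here is kept short for clarity.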

Caveats

Shared DB load: both environments hold connections to and query the same DB. Make sure it has spare capacity (connections, CPU) before the deploy.
Cache invalidation: if either environment caches DB results, users can briefly see stale data after a cutover. Keep cache TTLs short and bounded.
WebSocket connections: persistent connections to blue don't transfer to green. This is acceptable as long as clients reconnect.

Compare to rolling deploys

Aspect                  Blue/Green         Rolling
Resource cost           2x (briefly)       1x
Rollback                Instant (LB cut)   Slower (the rollback is itself a rolling deploy)
Database migrations     Same               Same
Operational simplicity  Higher             Lower

For Olympus's typical single-host deployments, blue/green is easy. For multi-host, rolling is more common.
