Canary deployment
Gradual rollout of a new Olympus version
A canary deployment sends a small percentage of traffic to the new version, observes it, then gradually increases that share.
When to canary
Most Olympus changes are routine and don't warrant a canary; blue/green is fine. Canary for:
- Risky changes (Hydra major version bump).
- Behavior changes (new MFA enforcement, password policy change).
- Performance-sensitive changes.
Setup
Two backends weighted by your load balancer:
ciam.your-domain.com {
    reverse_proxy {
        # Upstream order matches the weight order below: stable first, canary second.
        to stable-host:443 canary-host:443
        lb_policy weighted_round_robin 9 1
    }
}
This sends ~10% of traffic to canary.
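To relate a target canary percentage to the weight pair used above, a small helper can reduce the ratio to its smallest integers. This is a minimal sketch; `weightsFor` and `canaryShare` are hypothetical names, not part of Olympus or Caddy.

```typescript
// Greatest common divisor, used to reduce the weight ratio.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Hypothetical helper: smallest integer weight pair [stable, canary]
// that sends `canaryPercent` percent of requests to the canary.
function weightsFor(canaryPercent: number): [number, number] {
  if (Math.floor(canaryPercent) !== canaryPercent || canaryPercent < 0 || canaryPercent > 100) {
    throw new RangeError("canaryPercent must be an integer from 0 to 100");
  }
  const stable = 100 - canaryPercent;
  // The two values always sum to 100, so the divisor is never 0.
  const divisor = gcd(stable, canaryPercent);
  return [stable / divisor, canaryPercent / divisor];
}

// Expected share of requests that reach the canary for a weight pair.
function canaryShare(weights: [number, number]): number {
  const [stable, canary] = weights;
  return canary / (stable + canary);
}
```

For example, `weightsFor(10)` yields the 9:1 pair from the config, and `weightsFor(25)` yields 3:1.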
What to watch
For the canary backend only:
# Error rate
sum(rate(http_requests_total{instance="canary",status=~"5.."}[5m]))
/ sum(rate(http_requests_total{instance="canary"}[5m]))
# Login success rate
sum(rate(kratos_login_total{instance="canary",outcome="success"}[5m]))
/ sum(rate(kratos_login_total{instance="canary"}[5m]))
# Latency p99
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket{instance="canary"}[5m])))
Compare to stable. If canary is worse on any:
- Reduce to 0% (back to stable-only).
- Investigate.
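The comparison with stable can be sketched as a function that lists which signals regressed. The tolerance fields and their example values are assumptions for illustration; the source only says "worse on any", so tune them to your own traffic and noise levels.

```typescript
// One snapshot of the three signals from the queries above, for one backend.
interface BackendMetrics {
  errorRate: number;         // 5xx share of requests, 0..1
  loginSuccessRate: number;  // successful share of logins, 0..1
  latencyP99Seconds: number; // p99 request latency
}

// Hypothetical tolerances that absorb normal noise between backends.
interface Tolerances {
  maxErrorRateIncrease: number; // absolute, e.g. 0.005 = +0.5 points
  maxLoginSuccessDrop: number;  // absolute, e.g. 0.02 = -2 points
  maxLatencyRatio: number;      // relative, e.g. 1.2 = 20% slower
}

// Returns the names of the signals on which canary is worse than stable.
// An empty array means the canary may advance.
function canaryRegressions(
  stable: BackendMetrics,
  canary: BackendMetrics,
  tol: Tolerances,
): string[] {
  const regressions: string[] = [];
  if (canary.errorRate - stable.errorRate > tol.maxErrorRateIncrease) {
    regressions.push("error rate");
  }
  if (stable.loginSuccessRate - canary.loginSuccessRate > tol.maxLoginSuccessDrop) {
    regressions.push("login success rate");
  }
  if (canary.latencyP99Seconds > stable.latencyP99Seconds * tol.maxLatencyRatio) {
    regressions.push("latency p99");
  }
  return regressions;
}
```

Any non-empty result maps to the actions above: drop the canary to 0% and investigate.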
Gradual ramp
0% → 1% → 5% → 25% → 50% → 100%
Each step: wait at least 15 min (or one busy period), check metrics, advance.
Automate this via your deploy tool (Argo Rollouts, Flagger, custom script reloading Caddy with new weights).
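If you go the custom-script route, the ramp loop might look like the sketch below. The `RampHooks` interface is hypothetical: how you set weights (for example, rewriting the Caddy config and reloading) and how you judge health are left to the caller. Tools such as Argo Rollouts and Flagger have their own configuration and do not use this code.

```typescript
// Hypothetical hooks supplied by your deploy tooling.
interface RampHooks {
  setCanaryPercent(percent: number): Promise<void>; // e.g. write new weights, reload the proxy
  isCanaryHealthy(): Promise<boolean>;              // e.g. compare canary metrics with stable
  wait(ms: number): Promise<void>;                  // soak time between steps
}

interface RampResult {
  completed: boolean;      // true if every step passed its health check
  lastStepPercent: number; // last percentage applied before finishing or aborting
}

const RAMP_STEPS: readonly number[] = [1, 5, 25, 50, 100];
const SOAK_MS = 15 * 60 * 1000; // at least 15 minutes per step

async function rampCanary(
  hooks: RampHooks,
  steps: readonly number[] = RAMP_STEPS,
  soakMs: number = SOAK_MS,
): Promise<RampResult> {
  let lastStepPercent = 0;
  for (const percent of steps) {
    await hooks.setCanaryPercent(percent);
    lastStepPercent = percent;
    await hooks.wait(soakMs);
    if (!(await hooks.isCanaryHealthy())) {
      // Roll back to stable-only and stop the ramp.
      await hooks.setCanaryPercent(0);
      return { completed: false, lastStepPercent };
    }
  }
  return { completed: true, lastStepPercent };
}
```

Injecting the hooks keeps the loop testable without a real proxy: a test can pass a `wait` that resolves immediately and a health check that fails on a chosen step.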
Sticky canary
A returning user should see the same version they saw last time; otherwise their flow breaks mid-session.
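Option A: stick by load-balancer cookie:
reverse_proxy {
    to stable-host:443
    to canary-host:443
    lb_policy cookie canary_lb
}
On a client's first request, Caddy picks a backend and sets this cookie to a hash of that backend's address; later requests carrying the cookie return to the same backend. Give the cookie its own name (canary_lb is a placeholder): reusing ory_kratos_session would make Caddy overwrite the Kratos session cookie. The first pick uses the policy's fallback, which is random by default, so it ignores the 9:1 weights unless you configure a weighted fallback. This is the cleanest option.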
Option B: stick by hashed user ID at app level. Requires app-level routing logic. Skip unless you need precise targeting.
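A minimal sketch of Option B, assuming the app knows the user ID at routing time. The hash choice (FNV-1a, 32-bit) and the names `fnv1a32` and `backendFor` are illustrative assumptions, not something the source prescribes; any stable hash works.

```typescript
type Backend = "stable" | "canary";

// FNV-1a, 32-bit. Dependency-free and stable across processes and restarts.
// It hashes UTF-16 code units, which matches the byte-wise algorithm for ASCII IDs.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Maps a user to a fixed bucket 0..99, then compares it with the canary share.
// The same user always lands in the same bucket, so raising canaryPercent
// only adds users to the canary and never moves existing canary users back.
function backendFor(userId: string, canaryPercent: number): Backend {
  const bucket = fnv1a32(userId) % 100;
  return bucket < canaryPercent ? "canary" : "stable";
}
```

Because the bucket depends only on the user ID, stickiness survives cookie loss and device changes, which is the main reason to accept the extra routing logic.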
Canary-only flags
Beyond traffic-splitting, use feature flags so the behavior is gated:
const useNewLoginFlow = canary && userId.startsWith("0"); // 1/16 of canary users, if user IDs are hex (e.g. UUIDs)
This combines two small blast radii: 10% of traffic × 1/16 of those users ≈ 0.6% of users. Very safe.
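The exposure arithmetic for the flag above can be made explicit. This sketch assumes user IDs begin with a uniformly distributed hex character, as with random UUIDs; `isFlagEnabled` and `expectedExposure` are hypothetical names.

```typescript
// Gate: on the canary backend, and the ID's first hex character is in the enabled set.
function isFlagEnabled(
  canary: boolean,
  userId: string,
  enabledPrefixes: string[] = ["0"],
): boolean {
  if (!canary || userId.length === 0) return false;
  return enabledPrefixes.indexOf(userId.charAt(0).toLowerCase()) !== -1;
}

// Expected share of all users who see the new behavior:
// (share of traffic on canary) × (enabled prefixes / 16 possible hex characters).
function expectedExposure(
  canaryTrafficShare: number,
  enabledPrefixes: string[] = ["0"],
): number {
  return canaryTrafficShare * (enabledPrefixes.length / 16);
}
```

With 10% of traffic on canary and one prefix enabled, the expected exposure is 0.1 × 1/16 = 0.00625, the ~0.6% quoted above. Enabling a second prefix doubles it to 1.25%.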
When canary fails
Symptoms:
- Error rate up.
- Auth failures up.
- Latency degraded.
Actions:
- Set canary weight to 0 (Caddy reload).
- Capture last 15 min of canary logs.
- Compare with stable.
- If reversible: investigate, fix forward.
- If irreversible damage (e.g., bad data written): incident response, restore from backup of affected scope.
Canary in single-host deployments
If you only run one host, canary is harder because there's no second backend. Two options:
Option A: Local two-stack. Run a separate set of containers on the same host with a different external port, route via Caddy.
# docker-compose.canary.yml
services:
  ciam-hera-canary:
    image: ghcr.io/olympusoss/hera:v1.5.0-rc1
    ports: ["13000:3000"]
# Caddyfile
reverse_proxy {
    # Upstream order matches the weight order: stable first, canary second.
    to localhost:3000 localhost:13000
    lb_policy weighted_round_robin 9 1
}
Option B: Feature-flag only. Skip traffic splitting; deploy the new version everywhere but gate new behavior behind a flag. Ramp the flag.
This is what Olympus uses internally for most behavior changes.