Rate Limiting
Caddy rate_limit module configuration for Hera, Athena, and Ory APIs
Overview
The Olympus CIAM login endpoint has two independent rate limiting layers that operate at different levels of the stack and produce distinct error responses. Integration consumers must handle both.
How It Works
Internet → Caddy (per-IP layer) → Hera login flow → SDK lockout check → Kratos
Layer 1 (Caddy) fires first. If a request passes the IP rate limit, Layer 2 (SDK) checks the per-account lockout state before forwarding credentials to Kratos. The two layers are independent: hitting Layer 1 does not affect Layer 2 counters, and vice versa.
Layer 1: Per-IP Throttling (Caddy)
| Property | Value |
|---|---|
| Scope | Per source IP address |
| Limit | 5 requests per 10 seconds |
| Response | HTTP 429 Too Many Requests |
| Reset | Sliding window; the counter resets after 10 seconds with no requests |
| Applied to | All requests to CIAM Hera login routes |
The Caddy rate limit is the network-layer backstop. It fires before any application code runs. There is no exponential backoff: every request that exceeds the limit receives a flat 429.
Consumer action: Implement request queuing or exponential backoff on receipt of a 429. Do not retry immediately. A 429 from Caddy contains no retry_after field.
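Because the Caddy 429 carries no Retry-After hint, the client has to choose its own delays. The sketch below shows one possible delay schedule. The helper name `backoffDelayMs`, the 10-second base (one Caddy window), the 60-second cap, and the 1-second jitter are illustrative assumptions, not part of the Olympus SDK.

```typescript
// Hypothetical helper, not part of @olympusoss/sdk.
// Returns how long to wait (in ms) before retry number `attempt` (0-based).
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random, // injectable so tests can be deterministic
  baseMs = 10_000, // assumption: first wait covers one full Caddy window
  capMs = 60_000, // assumption: never wait longer than a minute (before jitter)
  jitterMs = 1_000, // assumption: up to 1 s of random spread between clients
): number {
  // Double the delay on each attempt, bounded by the cap
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  // Add jitter on top so the wait never drops below the exponential delay
  return exponential + Math.floor(random() * jitterMs);
}
```

The jitter is added to the delay rather than subtracted from it, so the first retry always waits at least one full window. Clients behind the same IP still retry at slightly different times.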
Layer 2: Per-Account Lockout (SDK)
| Property | Value |
|---|---|
| Scope | Per account identifier (email or username, lowercased) |
| Default threshold | 5 failed login attempts |
| Default window | 600 seconds (10 minutes), sliding |
| Default lockout duration | 900 seconds (15 minutes) |
| Response (browser) | HTTP 200 with Hera lockout page |
| Response (API consumer) | See JSON error shape below |
| Configuration | SDK settings table (security.brute_force.* keys) |
The SDK lockout check runs inside the Hera login flow, before credentials are forwarded to Kratos. A locked account never contacts Kratos; the lockout is enforced at the application layer.
Consumer action: Surface the lockout message to the user. Do not retry automatically. The account remains locked until lockout_duration_seconds expires or an administrator unlocks it manually via the Athena admin panel.
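The interaction between threshold, sliding window, and lockout duration can be modeled with a small in-memory sketch. This is not the SDK's implementation: the real check persists attempts in the database, and the `LockoutTracker` class and its method names are hypothetical. Only the default values and the lowercasing of identifiers come from the table above.

```typescript
// Hypothetical in-memory model of the lockout rules; SDK internals may differ.
interface LockoutConfig {
  maxAttempts: number; // security.brute_force.max_attempts
  windowSeconds: number; // security.brute_force.window_seconds
  lockoutDurationSeconds: number; // security.brute_force.lockout_duration_seconds
}

const DEFAULTS: LockoutConfig = { maxAttempts: 5, windowSeconds: 600, lockoutDurationSeconds: 900 };

class LockoutTracker {
  private config: LockoutConfig;
  private failures = new Map<string, number[]>(); // identifier -> failure timestamps (ms)
  private lockedUntil = new Map<string, number>(); // identifier -> lockout expiry (ms)

  constructor(config: LockoutConfig = DEFAULTS) {
    this.config = config;
  }

  // Seconds remaining on the lockout, or 0 if the account is not locked.
  secondsLocked(identifier: string, nowMs: number): number {
    const until = this.lockedUntil.get(identifier.toLowerCase()) ?? 0;
    return until > nowMs ? Math.ceil((until - nowMs) / 1000) : 0;
  }

  // Record one failed login; lock the account once the threshold is reached.
  recordFailure(identifier: string, nowMs: number): void {
    const key = identifier.toLowerCase();
    const windowStart = nowMs - this.config.windowSeconds * 1000;
    // Sliding window: keep only failures newer than the window start
    const recent = (this.failures.get(key) ?? []).filter((t) => t > windowStart);
    recent.push(nowMs);
    this.failures.set(key, recent);
    if (recent.length >= this.config.maxAttempts) {
      this.lockedUntil.set(key, nowMs + this.config.lockoutDurationSeconds * 1000);
    }
  }
}
```

With the defaults, the fifth failure inside ten minutes locks the account for fifteen minutes. Failures older than the window stop counting toward the threshold.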
API / Technical Details
Error Response Shapes
Layer 1, Caddy 429:
HTTP/1.1 429 Too Many Requests
Content-Type: text/plain
Too Many Requests
No Retry-After or X-RateLimit-* headers are returned. The response body is plain text. This is a known V1 limitation: the Caddy rate limit module does not emit structured headers or a JSON body in the current configuration. Standardizing the Caddy 429 response (adding Retry-After and X-RateLimit-* headers) is tracked as a follow-on infrastructure story (DX-P10-2).
Layer 2, SDK lockout (browser flow):
HTTP/1.1 200 OK
Content-Type: text/html
<!-- Hera lockout page with message: "Account temporarily locked. Try again in N minutes." -->
Layer 2, SDK lockout (API consumer, JSON):
Note: The structured JSON lockout response is planned for a follow-on story (DX-P10-3, tracked in hera#26). The V1 implementation returns HTML for all lockout responses regardless of the Accept header. Do not implement integrations against the JSON shape until hera#26 ships.
When the Accept: application/json header is present, Hera will return a structured error (planned, not yet implemented):
{
  "error": "account_locked",
  "message": "Account temporarily locked due to too many failed attempts.",
  "retry_after": 847,
  "retry_at": "2026-04-06T14:23:00Z"
}
The retry_after field will be the number of seconds remaining until the lockout expires. The retry_at field will be the ISO 8601 timestamp at which the lockout expires.
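For planning purposes, the shape described here could be captured as a TypeScript type with a runtime guard. This is a sketch against the planned payload only: the V1 implementation does not return it yet, and the names `AccountLockedError` and `isAccountLockedError` are hypothetical.

```typescript
// Planned shape as described on this page; not returned by V1.
interface AccountLockedError {
  error: "account_locked";
  message: string;
  retry_after: number; // seconds until the lockout expires
  retry_at: string; // ISO 8601 timestamp at which the lockout expires
}

// Hypothetical type guard: checks an already-parsed JSON body at runtime.
function isAccountLockedError(body: unknown): body is AccountLockedError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    b.error === "account_locked" &&
    typeof b.message === "string" &&
    typeof b.retry_after === "number" &&
    typeof b.retry_at === "string" &&
    !Number.isNaN(Date.parse(b.retry_at)) // retry_at must be a parseable date
  );
}
```

A guard like this keeps client code from trusting an HTML or otherwise unexpected body as a lockout payload.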
Configuration Keys (SDK Settings Table)
Lockout thresholds are configurable via the SDK settings table. Changes take effect within the 60-second cache TTL; no service restart is required.
| Key | Type | Default | Min | Max | Description |
|---|---|---|---|---|---|
| security.brute_force.max_attempts | integer | 5 | 1 | 100 | Failed attempts before lockout |
| security.brute_force.window_seconds | integer | 600 | 60 | 86400 | Sliding window duration |
| security.brute_force.lockout_duration_seconds | integer | 900 | 60 | 86400 | How long lockout lasts |
| security.brute_force.fail_open | boolean | true | - | - | Behavior if the database is unavailable |
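The ranges in the table above can be expressed as a range check. The sketch below is an assumption about how out-of-range values might be handled. This page documents the fall-back-to-default behavior only for lockout_duration_seconds, and the sketch assumes the same rule for the other integer keys. The `resolveSetting` helper is hypothetical and not an SDK export.

```typescript
// Defaults and bounds copied from the table; the validation logic is a sketch.
const RANGES: Record<string, { def: number; min: number; max: number }> = {
  "security.brute_force.max_attempts": { def: 5, min: 1, max: 100 },
  "security.brute_force.window_seconds": { def: 600, min: 60, max: 86400 },
  "security.brute_force.lockout_duration_seconds": { def: 900, min: 60, max: 86400 },
};

// Hypothetical helper: returns the value to use, falling back to the default
// (with a warning) when the stored string is not an integer within range.
function resolveSetting(key: string, raw: string): number {
  const range = RANGES[key];
  if (!range) throw new Error(`Unknown setting: ${key}`);
  const value = Number(raw);
  const valid = Number.isInteger(value) && value >= range.min && value <= range.max;
  if (!valid) {
    console.warn(`${key}=${raw} is outside ${range.min}..${range.max}; using default ${range.def}`);
    return range.def;
  }
  return value;
}
```

Under this assumption, a stored lockout duration of "0" resolves to 900 rather than silently disabling the lockout.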
Set configuration values via the Athena admin settings panel or directly via the SDK:
import { setSetting } from "@olympusoss/sdk";

// Tighten threshold to 3 attempts in 5 minutes, lockout for 30 minutes
await setSetting("security.brute_force.max_attempts", "3");
await setSetting("security.brute_force.window_seconds", "300");
await setSetting("security.brute_force.lockout_duration_seconds", "1800");
Port Architecture
In production, Caddy is the only service that exposes host ports (80 and 443). All application services (Hera, Athena, Kratos, Hydra) are accessible only within the internal container network and have no direct host port bindings. This constraint is documented inline in platform/prod/compose.prod.yml and enforced by deployment policy (see CLAUDE.md).
| Service | Accessible from | Not accessible from |
|---|---|---|
| Hera (login UI) | Caddy (internal) | Public internet directly |
| Athena (admin panel) | Caddy (internal) | Public internet directly |
| Kratos admin API | Internal network only | Caddy, public internet |
| Hydra admin API | Internal network only | Caddy, public internet |
Examples
Handling a 429 from Caddy
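// Resolve after `ms` milliseconds
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// retriesLeft bounds the recursion; the default of 3 is an arbitrary example value
async function loginWithRetry(email: string, password: string, retriesLeft = 3): Promise<Response> {
  const response = await fetch("/self-service/login", {
    method: "POST",
    body: JSON.stringify({ identifier: email, password }),
  });
  if (response.status === 429 && retriesLeft > 0) {
    // Caddy layer: back off before retrying
    await sleep(10_000); // Wait at least one Caddy window
    return loginWithRetry(email, password, retriesLeft - 1);
  }
  return response;
}
Handling a lockout response (JSON API consumer, planned)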
Note: The JSON lockout response requires hera#26 (DX-P10-3). The V1 SDK lockout response is HTML-only. This example applies after hera#26 ships.
const response = await fetch("/self-service/login", {
  method: "POST",
  headers: { "Accept": "application/json" },
  body: JSON.stringify({ identifier: email, password }),
});
const data = await response.json();
if (data.error === "account_locked") {
  showLockoutMessage(`Account locked. Try again in ${Math.ceil(data.retry_after / 60)} minutes.`);
  return;
}
Distinguishing the two layers (planned, after hera#26)
Note: Distinguishing Layer 1 (Caddy 429) from Layer 2 (SDK lockout) by Accept: application/json content negotiation requires hera#26. Until then, Layer 2 lockout responses are HTML pages and cannot be distinguished programmatically from a successful HTML flow.
const response = await fetch("/self-service/login", {
  method: "POST",
  headers: { "Accept": "application/json" },
  body: JSON.stringify({ identifier: email, password }),
});
if (response.status === 429) {
  // Layer 1: Caddy per-IP limit, too many requests from this IP
  handleRateLimit();
} else if (response.ok) {
  const data = await response.json();
  if (data.error === "account_locked") {
    // Layer 2: SDK per-account lockout, this specific account is locked
    handleAccountLockout(data.retry_after);
  } else {
    handleLoginResult(data);
  }
}
Edge Cases
Database unavailable during lockout check
If the olympus PostgreSQL database is unavailable when Hera calls checkLockout(), the SDK logs the failure at ERROR level and returns { locked: false }, so the login proceeds (fail-open). This is a documented design decision: a database outage is already a P0 incident, and blocking all logins on top of it compounds the impact for marginal security benefit.
The log entry carries the tag [security][brute_force][fail_open] for monitoring alert wiring:
[ERROR][security][brute_force][fail_open] Database unavailable, lockout check bypassed. Login proceeding.
The Caddy layer remains active as a backstop during a database outage.
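The fail-open behavior can be sketched as a wrapper around the database lookup. This is an illustration, not the SDK's code: `checkLockoutFailOpen` and its `query` parameter are hypothetical stand-ins. The fail-closed branch is an assumption, because this page does not describe what happens when security.brute_force.fail_open is false.

```typescript
interface LockoutStatus {
  locked: boolean;
}

// Hypothetical wrapper: `query` stands in for the SDK's database lookup.
async function checkLockoutFailOpen(
  query: () => Promise<LockoutStatus>,
  failOpen = true, // mirrors security.brute_force.fail_open
  log: (line: string) => void = console.error,
): Promise<LockoutStatus> {
  try {
    return await query();
  } catch {
    if (failOpen) {
      // Same tagged line as documented above, so alerts can match on it
      log("[ERROR][security][brute_force][fail_open] Database unavailable, lockout check bypassed. Login proceeding.");
      return { locked: false };
    }
    // Assumption: fail-closed treats the account as locked when the check cannot run
    return { locked: true };
  }
}
```

Keeping the log call next to the bypass means every bypassed check emits a line that monitoring can alert on.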
Distributed credential stuffing (many IPs, many accounts)
The per-account lockout layer (Layer 2) is not effective against distributed attacks that spread attempts across many accounts from many IP addresses. The per-IP Caddy layer (Layer 1) provides partial mitigation for high-volume distributed attacks.
Distributed brute force via botnet IP rotation, where an attacker cycles through many source IPs, is an accepted residual risk not mitigated by per-IP rate limiting. The CAPTCHA layer (platform#17) provides additional protection. Botnet-scale attacks require a WAF or Cloudflare-level control.
Proxy-in-front topology
If Olympus is deployed behind a load balancer, CDN, or reverse proxy, the Caddy rate limit key must be
updated from remote.host (TCP peer address) to an X-Forwarded-For-based key. Failing to do this causes
all users to share a single rate limit bucket (the proxy IP address).
See caddy-supply-chain.md, Proxy-in-Front Topology
for the required configuration changes, CIDR scoping requirements, and the caddy validate verification step.
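To illustrate why the key must change and why the proxy range needs careful scoping, the sketch below models the key selection in TypeScript. It is not Caddy configuration, since Caddy performs this selection through its own settings. The `rateLimitKey` function is hypothetical, and it matches trusted proxies by exact IP for brevity where a real deployment would match CIDR ranges.

```typescript
// Illustration only: models which address ends up as the rate limit key.
function rateLimitKey(
  peerAddress: string, // TCP peer address as seen by Caddy
  forwardedFor: string | undefined, // raw X-Forwarded-For header value, if any
  trustedProxies: ReadonlySet<string>, // exact IPs here; real setups scope by CIDR
): string {
  // Honor the header only when the peer is a known proxy; otherwise any
  // client could spoof it to escape its own bucket.
  if (!forwardedFor || !trustedProxies.has(peerAddress)) return peerAddress;
  const hops = forwardedFor
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);
  // Walk from the right: the first hop that is not a trusted proxy is the client
  const client = [...hops].reverse().find((hop) => !trustedProxies.has(hop));
  return client ?? peerAddress;
}
```

Keying on the peer address alone would place every request arriving through the proxy in the proxy's single bucket, which is the failure mode described above.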
Low max_attempts off-by-one
At max_attempts=1 or max_attempts=2, concurrent login requests processed simultaneously can cause
one extra attempt to pass before the lockout record is committed. This is a known V1 behavior of the
append-then-count pattern. At the default of max_attempts=5, the practical effect is negligible. Avoid
setting max_attempts below 3 in production.
Related Documentation
- caddy-supply-chain.md: version pinning, post-build smoke test, SHA-tagged image releases, and proxy topology configuration for the Caddy layer
- kratos-production-config.md: required Kratos production configuration, including leak_sensitive_values: false
Security Considerations
- The lockout check runs before Kratos credential validation. A locked account never reaches Kratos, preventing timing-based username enumeration via response latency differences.
- The lockout response does not reveal whether the account exists or how many attempts remain. The message is generic regardless of lockout cause.
- IP addresses are recorded in ciam_login_attempts for administrative visibility. Hera reads the client IP from the trusted proxy header (X-Real-IP as set by Caddy), not from X-Forwarded-For, which is spoofable by clients.
- All manual account unlocks by administrators are recorded in ciam_security_audit with the admin's identity ID, the target identifier, and a timestamp. Audit records are append-only and never deleted.
- The security.brute_force.lockout_duration_seconds setting enforces a minimum of 60 seconds; values below 60 are rejected with a warning and the default (900s) is used instead. This prevents misconfiguration where a 0-second lockout silently disables protection.