Monitoring, PKCE Analytics
Observability for PKCE-protected OAuth2 traffic
Ticket: hera#32, hera#46
Last updated: 2026-04-12
Overview
hera/src/lib/analytics.ts provides a structured analytics emitter. hera/src/lib/auth.ts uses it to emit three auth.pkce.* events that cover the lifetime of an OAuth2 PKCE authorization code flow. These events feed the Athena analytics dashboard (PKCE adoption coverage, enforcement rejection rate, auth flow completion funnel, and code interception attack signal).
How It Works
Emitter Architecture
emitAnalyticsEvent() in analytics.ts writes newline-delimited JSON to stdout with a type:"analytics" discriminator. The Loki log pipeline consumes these with the query:
{app="hera"} | json | type="analytics"
Events are emitted unconditionally; there is no NODE_ENV gate on the emit call. Every event carries an env property set to process.env.NODE_ENV. The Athena dashboard filters all production panels to env === 'production' at query time, so dev and staging traffic never reaches production dashboards. Do not add a NODE_ENV guard to the emit call: an inverted or misconfigured check would silently suppress production events.
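The emitter behaviour described above can be sketched as follows. This is a minimal illustration, not the contents of analytics.ts: the names buildAnalyticsLine and emitAnalyticsEventSketch are hypothetical, and env and the write sink are passed in as parameters (the real emitter reads process.env.NODE_ENV and writes to stdout) so that the sketch stays pure and testable.

```typescript
// Hypothetical sketch; not the actual contents of analytics.ts.
type AnalyticsProperties = { event: string } & Record<string, unknown>;

// Builds one newline-delimited JSON record. `env` and `now` are parameters
// here; the real emitter is described as reading process.env.NODE_ENV.
function buildAnalyticsLine(
  properties: AnalyticsProperties,
  env: string,
  now: Date = new Date(),
): string {
  const record = {
    type: "analytics", // discriminator matched by the Loki query
    ...properties,
    timestamp: now.toISOString(),
    env,
  };
  return JSON.stringify(record) + "\n";
}

// No environment gate: every environment emits, and dashboards filter on `env`.
function emitAnalyticsEventSketch(
  properties: AnalyticsProperties,
  env: string,
  write: (line: string) => void,
): void {
  write(buildAnalyticsLine(properties, env));
}
```

Because the environment is recorded as data rather than used as a condition, a development event and a production event differ only in the env field of the emitted line.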
Event Flow
auth.pkce.started , emitted when code_verifier is generated and cookie is set
(before the authorization redirect to Hydra)
auth.pkce.completed , emitted when Hydra returns tokens (token exchange succeeded)
duration_ms = Date.now() - cookie.started_at
auth.pkce.failed , emitted when token exchange fails for any PKCE-related reason
reason maps to error_code and error_source fields
The started_at Unix ms timestamp is stored inside the pkce_code_verifier cookie as part of the PkceCookiePayload interface, not as a property of auth.pkce.started. This design is intentional: the callback handler must be able to compute duration_ms at auth.pkce.completed time without access to the original server action that emitted auth.pkce.started.
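A minimal sketch of why carrying started_at in the cookie is sufficient: the redirect side serializes the timestamp next to the verifier, and the callback side needs only the cookie value and the current clock. The helper names serializePkceCookie and computeDurationMs are hypothetical and do not appear in auth.ts.

```typescript
interface PkceCookiePayload {
  code_verifier: string;
  started_at: number; // Unix ms
}

// Redirect side (hypothetical helper): store the verifier with its start time.
function serializePkceCookie(codeVerifier: string, nowMs: number): string {
  const payload: PkceCookiePayload = {
    code_verifier: codeVerifier,
    started_at: nowMs,
  };
  return JSON.stringify(payload);
}

// Callback side (hypothetical helper): no shared server state is required,
// only the cookie value and the current time.
function computeDurationMs(cookieValue: string, nowMs: number): number {
  const payload = JSON.parse(cookieValue) as PkceCookiePayload;
  return nowMs - payload.started_at;
}
```

In a real handler nowMs would be Date.now() on both sides; it is a parameter here so the arithmetic can be checked deterministically.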
API / Technical Details
emitAnalyticsEvent(properties), analytics.ts
The base emitter. All auth.pkce.* helpers in auth.ts call this.
emitAnalyticsEvent({
event: "auth.pkce.started",
client_id: "hera-ciam-client",
domain: "ciam",
method: "S256",
});
Output written to stdout:
{
"type": "analytics",
"event": "auth.pkce.started",
"client_id": "hera-ciam-client",
"domain": "ciam",
"method": "S256",
"timestamp": "2026-04-06T00:00:00.000Z",
"env": "production"
}
auth.pkce.started
Emitted inside redirectToDefaultClient() immediately after the pkce_code_verifier cookie is set and before the redirect to Hydra.
Properties: { client_id, domain, method: "S256", timestamp, env }
Scope limitation (hera#46): As of commit d16122e, this event only fires on the fallback path (direct Hydra redirect for non-standard clients). On the primary path, redirectToDefaultClient() calls getSiteLoginUrl() and redirects to the Site's /login/<domain> route before reaching the PKCE setup code. The Site app handles PKCE and state cookie setup on its own origin. If the Site app does not emit its own auth.pkce.started event, PKCE telemetry will not cover the primary login flow.
Important: there is no exported emitPkceStarted helper. The event is emitted inline inside redirectToDefaultClient() as a direct emitAnalyticsEvent() call. This is intentional: auth.pkce.started is only ever emitted in one place (the fallback non-Site-client path), within the same function that generates the verifier, so there is no benefit to exporting a helper. The export pattern (emitPkceCompleted, emitPkceFailed) is used only where the event may need to be emitted from an external callback route handler.
emitPkceCompleted(clientId, startedAt), exported
Emits auth.pkce.completed with duration_ms computed from the stored started_at timestamp.
Pre-condition: Only call this after the Hydra token endpoint returns a 200 response. Calling it before confirming a successful token exchange would misrepresent the event as a completed flow.
Scope: This helper is exported for use by Hera callback route handlers only. It is not intended for import by Site or other apps: the Site app runs its own PKCE flow independently and has no dependency on Hera's auth module. Importing emitPkceCompleted into Site would create a cross-app dependency that violates the per-app module boundary.
Properties: { client_id, domain, duration_ms, timestamp, env }
// In a Hera callback route handler, after confirming 200 from Hydra:
const raw = cookieStore.get("pkce_code_verifier")?.value;
if (raw) {
const payload: PkceCookiePayload = JSON.parse(raw);
emitPkceCompleted(clientId, payload.started_at);
}
emitPkceFailed(clientId, reason), exported
Emits auth.pkce.failed with mapped error_code and error_source fields.
Reason values and error_code mapping:
| reason | error_code | When to use |
|---|---|---|
| cookie_missing | invalid_request | pkce_code_verifier cookie absent at callback time |
| missing_verifier | invalid_request | code_verifier not sent in the token exchange request |
| challenge_mismatch | invalid_grant | Hydra returned invalid_grant (verifier/challenge mismatch) |
| invalid_request | invalid_request | Hydra returned invalid_request on the token endpoint |
Note on error_code: "invalid_request" for cookie_missing and missing_verifier: These are pre-Hydra failures. The token exchange request is never sent to Hydra when the cookie is missing or the verifier is absent. Even so, the error_code field uses the Hydra-style invalid_request string for consistency with the analytics schema. As a result, error_code alone cannot distinguish a pre-Hydra failure from a Hydra-returned invalid_request. Use the error_source field (which carries the reason value) to tell them apart when querying dashboards or correlating against Hydra logs.
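The mapping in the table can be sketched as a lookup. This is an illustration of the documented mapping under assumed names (PkceFailureReason, buildFailedFields, and isPreHydraFailure are hypothetical), not the code in auth.ts.

```typescript
type PkceFailureReason =
  | "cookie_missing"
  | "missing_verifier"
  | "challenge_mismatch"
  | "invalid_request";

type PkceErrorCode = "invalid_request" | "invalid_grant";

// reason -> error_code, as listed in the table above.
const ERROR_CODE_BY_REASON: Record<PkceFailureReason, PkceErrorCode> = {
  cookie_missing: "invalid_request",
  missing_verifier: "invalid_request",
  challenge_mismatch: "invalid_grant",
  invalid_request: "invalid_request",
};

// Reasons that fail before any request reaches Hydra.
const PRE_HYDRA_REASONS: ReadonlyArray<PkceFailureReason> = [
  "cookie_missing",
  "missing_verifier",
];

// error_source carries the reason value, so it stays unambiguous
// even where error_code collapses to "invalid_request".
function buildFailedFields(reason: PkceFailureReason) {
  return {
    reason,
    error_code: ERROR_CODE_BY_REASON[reason],
    error_source: reason,
  };
}

function isPreHydraFailure(errorSource: string): boolean {
  return PRE_HYDRA_REASONS.some((r) => r === errorSource);
}
```

Three of the four reasons share the same error_code, which is why filtering on error_source is the reliable way to separate pre-Hydra failures from errors Hydra actually returned.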
Properties: { client_id, domain, reason, error_code, error_source, timestamp, env }
PkceCookiePayload interface, exported from auth.ts
The JSON payload stored in the pkce_code_verifier cookie:
export interface PkceCookiePayload {
code_verifier: string; // passed to Hydra at token exchange time
started_at: number; // Unix ms, used to compute duration_ms at auth.pkce.completed
}
The cookie value is JSON.stringify(payload); callers must JSON.parse it to read the fields.
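Because the cookie is client-supplied, JSON.parse can throw or return an unexpected shape. A defensive reader might look like the sketch below; parsePkceCookie is a hypothetical helper that is not part of auth.ts, and returning null on bad input is an assumption about how a caller would want to handle it.

```typescript
interface PkceCookiePayload {
  code_verifier: string;
  started_at: number; // Unix ms
}

// Hypothetical helper: returns null instead of throwing on absent,
// malformed, or incorrectly shaped cookie values.
function parsePkceCookie(raw: string | undefined): PkceCookiePayload | null {
  if (!raw) return null;
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const candidate = value as Record<string, unknown>;
  const verifier = candidate["code_verifier"];
  const startedAt = candidate["started_at"];
  if (typeof verifier !== "string") return null;
  if (typeof startedAt !== "number" || !isFinite(startedAt)) return null;
  return { code_verifier: verifier, started_at: startedAt };
}
```

A callback handler could treat a null result the same way as an absent cookie, for example by emitting auth.pkce.failed with reason cookie_missing; whether a malformed cookie should map to that reason is a design decision this document does not specify.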
Examples
Emitting auth.pkce.started (inline in redirectToDefaultClient)
// Inside redirectToDefaultClient(), after cookie is set:
emitAnalyticsEvent({
event: "auth.pkce.started",
client_id: DEFAULT_OAUTH2_CLIENT_ID,
domain: IS_IAM ? "iam" : "ciam",
method: "S256",
});
Emitting auth.pkce.completed from a callback route
import { emitPkceCompleted, PkceCookiePayload } from "@/lib/auth";
// In a Hera callback route handler, after confirming 200 from Hydra:
const raw = cookieStore.get("pkce_code_verifier")?.value;
if (raw) {
const payload: PkceCookiePayload = JSON.parse(raw);
emitPkceCompleted(clientId, payload.started_at);
}
cookieStore.delete("pkce_code_verifier");
Emitting auth.pkce.failed from a callback route
import { emitPkceFailed } from "@/lib/auth";
// Cookie absent at callback time:
if (!cookieStore.get("pkce_code_verifier")) {
emitPkceFailed(clientId, "cookie_missing");
return new Response("Missing PKCE verifier", { status: 400 });
}
// Hydra returned invalid_grant:
if (hydraError === "invalid_grant") {
emitPkceFailed(clientId, "challenge_mismatch");
}
Querying events in Athena
# All PKCE events from production:
{app="hera"} | json | type="analytics" | env="production" | event=~"auth\\.pkce\\..*"
# Failed flows only, grouped by error source (LogQL metric query; the 1h range is illustrative):
sum by (error_source) (
  count_over_time({app="hera"} | json | type="analytics" | env="production" | event="auth.pkce.failed" [1h])
)
# Challenge mismatch, code interception attack signal:
{app="hera"} | json | type="analytics" | env="production" | event="auth.pkce.failed"
| error_source="challenge_mismatch"
Edge Cases
auth.pkce.abandoned is absent
The auth.pkce.abandoned event is not implemented. An abandoned flow is one where auth.pkce.started fires but no auth.pkce.completed or auth.pkce.failed ever follows within the 10-minute authorization code TTL.
Implementing this requires the server to know which started flows never finished. Next.js server actions are stateless and ephemeral, so there is no in-process mechanism to observe that a started event has not been followed by a completed event within 10 minutes. Detecting abandonment server-side would require a Redis store or database table with a background job scanning for uncompleted flows.
The deferred approach: the Athena analytics pipeline can compute abandonment post-hoc by querying auth.pkce.started events with no matching auth.pkce.completed or auth.pkce.failed within 10 minutes. This requires no new infrastructure and no background job. It is not implemented as a dedicated emitted event in the current cycle.
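One way the post-hoc computation could look is sketched below. The event properties listed in this document include no per-flow identifier, so this sketch does not join individual flows; it estimates abandonment per client_id as started minus (completed plus failed) over a batch of events. The function name estimateAbandonedByClient and the PkceEventRecord shape are hypothetical, and the caller is assumed to pass only events from a window that closed at least 10 minutes ago.

```typescript
// Hypothetical minimal shape of a parsed analytics record.
interface PkceEventRecord {
  event: string;
  client_id: string;
}

// Count-based estimate: each started adds one open flow, each completed or
// failed closes one. Results are clamped at zero because terminal events
// can outnumber starts within a window.
function estimateAbandonedByClient(
  events: PkceEventRecord[],
): Record<string, number> {
  const balance: Record<string, number> = {};
  for (const e of events) {
    let delta: number;
    if (e.event === "auth.pkce.started") {
      delta = 1;
    } else if (
      e.event === "auth.pkce.completed" ||
      e.event === "auth.pkce.failed"
    ) {
      delta = -1;
    } else {
      continue; // ignore unrelated analytics events
    }
    balance[e.client_id] = (balance[e.client_id] ?? 0) + delta;
  }
  for (const clientId of Object.keys(balance)) {
    balance[clientId] = Math.max(0, balance[clientId] ?? 0);
  }
  return balance;
}
```

This is an approximation only: flows that straddle the window boundary, or terminal events emitted without a matching auth.pkce.started, will skew the count.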
emitPkceCompleted called before Hydra confirms 200
If emitPkceCompleted is called before confirming a 200 response from Hydra, the auth.pkce.completed event is emitted for a flow that has not actually completed. This inflates the completion rate metric and corrupts the duration_ms baseline. Always confirm the token response is a success before calling emitPkceCompleted.
Dev traffic in production dashboards
Events emitted in development (e.g., local npm run dev) carry env: "development". The Athena dashboard filters all production panels to env === 'production' at query time. Dev events are not suppressed at emit time; they are excluded at query time. As a result, dev events appear in raw log queries but not in dashboard panels.
Security Considerations
- No PII in any event property. The client_id is an OAuth2 client identifier, not a user identifier.
- The challenge_mismatch reason in auth.pkce.failed is the highest-severity security signal. A legitimate user's code_verifier always matches the stored code_challenge. Any challenge_mismatch event from a non-dev environment is either a code interception attack or a bug; both require immediate investigation.
- The cookie_missing and missing_verifier reasons use error_code: "invalid_request" even though the failure occurs before the Hydra token endpoint is called. When correlating auth.pkce.failed events against Hydra access logs, filter by error_source (not error_code) to isolate pre-Hydra failures from Hydra-returned errors. Using error_code alone for correlation will produce false matches against actual Hydra invalid_request responses.
- emitPkceFailed is exported for use by Hera callback routes only. Importing it into Site would bypass the per-app module boundary and could result in events with incorrect domain values (the domain is computed from Hera's IS_IAM config, which may differ from Site's environment).
References
- hera/src/lib/analytics.ts: base emitter
- hera/src/lib/auth.ts: emitPkceCompleted, emitPkceFailed, PkceCookiePayload
- hera/docs/oauth2-pkce.md: PKCE S256 implementation guide (cookie spec, deployment order, Hydra client config)
- hera#32: origin ticket