Redis for session caching
Cache whoami / token introspection results
whoami and token introspection are called on most API requests, and each round-trip to Kratos/Hydra adds ~30-100ms. Caching the results in Redis for ~60s removes most of those round-trips and sharply reduces load on Kratos and Hydra.
Architecture
```text
Your API ──► Redis cache ──► Kratos/Hydra (on miss)
                 ↑
                 └── 60s TTL
```
Cache hit: response in 1ms. Cache miss: full lookup in 30-100ms, then populate the cache.
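The flow above is the cache-aside pattern: read from the cache, call the backend only on a miss, then write the result back with a TTL. A minimal sketch of that control flow, using a hypothetical in-memory `TtlStore` in place of Redis so it runs on its own (an illustration, not the Redis client used below):

```typescript
// Hypothetical in-memory stand-in for Redis GET/SETEX, for illustration only.
class TtlStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: behave like a Redis miss
      return null;
    }
    return entry.value;
  }

  setEx(key: string, ttlSeconds: number, value: string): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside: check the cache, call the loader only on a miss, then populate.
async function cacheAside<T>(
  store: TtlStore,
  key: string,
  ttlSeconds: number,
  loader: () => Promise<T>,
): Promise<T> {
  const cached = store.get(key);
  if (cached !== null) return JSON.parse(cached) as T; // hit
  const value = await loader(); // miss: e.g. a Kratos/Hydra round-trip
  store.setEx(key, ttlSeconds, JSON.stringify(value));
  return value;
}
```

In the real code the store is Redis (GET/SETEX) and the loader is the Kratos or Hydra call.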
Setup
Add Redis to your stack:
```yaml
# docker-compose.yml
services:
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: ["redis-server", "--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
    volumes: ["redis-data:/data"]

volumes:
  redis-data:
```
Caching whoami
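```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
// node-redis requires an error listener; without one, a connection error crashes the process
redis.on("error", (err) => log.warn("redis_error", err));
await redis.connect();

// Assumes kratos.toSession() resolves to the session object, or null for an invalid cookie
async function whoamiCached(cookie: string) {
  const key = `whoami:${hash(cookie)}`; // hash the cookie, never store it raw
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const session = await kratos.toSession({ cookie });
  if (!session) return null; // don't cache invalid sessions
  await redis.setEx(key, 60, JSON.stringify(session));
  return session;
}
```
A 60s TTL balances freshness against hit rate. For sensitive endpoints, use 10s.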
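The snippet assumes a `hash()` helper that is not shown. One possible version, assuming a SHA-256 hex digest from Node's built-in `crypto` module is acceptable (session cookies and tokens are high-entropy, so a plain digest keeps raw credentials out of Redis); `whoamiKey` is a hypothetical convenience name:

```typescript
import { createHash } from "node:crypto";

// Digest a cookie or token so the raw credential never becomes a Redis key.
// SHA-256 is an assumption; any stable, collision-resistant digest works.
function hash(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

// Build the whoami cache key the same way when setting and when deleting it.
function whoamiKey(cookie: string): string {
  return `whoami:${hash(cookie)}`;
}
```

Using one helper for both the set path and the invalidation path matters: if the two ever hash differently, deletes silently miss.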
Caching introspection
```typescript
async function introspectCached(token: string) {
  const key = `intro:${hash(token)}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const intro = await hydra.introspectToken({ token });
  if (!intro.active) {
    // Cache the negative result briefly
    await redis.setEx(key, 5, JSON.stringify({ active: false }));
    return intro;
  }

  // Cache the positive result for at most 60s, never past the token's exp
  const now = Math.floor(Date.now() / 1000);
  const ttl = Math.min(60, (intro.exp ?? 0) - now);
  // SETEX rejects a TTL <= 0, so skip caching tokens that are expired or lack exp
  if (ttl > 0) await redis.setEx(key, ttl, JSON.stringify(intro));
  return intro;
}
```
The TTL is capped at the token's remaining lifetime: a token about to expire is cached only until its exp, while tokens with plenty of life are cached for the full 60s.
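Because Redis rejects SETEX with a TTL of 0 or less, the TTL calculation is worth isolating and testing. A sketch as a pure function; the 60s cap mirrors the code above, and treating a missing exp as "don't cache" is an assumption:

```typescript
// Introspection cache TTL: at most capSeconds, never past the token's exp.
// Returns 0 when the result should not be cached at all.
function introspectionTtl(
  exp: number | undefined,
  nowSeconds: number,
  capSeconds = 60,
): number {
  if (exp === undefined) return 0; // assumption: don't cache tokens without exp
  const remaining = Math.floor(exp - nowSeconds);
  return Math.max(0, Math.min(capSeconds, remaining));
}
```

In introspectCached, call it with intro.exp and the current Unix time in seconds, and skip the SETEX when it returns 0.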
Cache invalidation
When a user logs out or an admin revokes a session:
```typescript
async function invalidateSession(cookieOrToken: string) {
  const cookieKey = `whoami:${hash(cookieOrToken)}`;
  const introKey = `intro:${hash(cookieOrToken)}`;
  await redis.del(cookieKey);
  await redis.del(introKey);
}
```
Hook this into the logout and revoke flows. Without it, a logged-out session keeps passing cached whoami checks until its entry expires.
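When hooking this in, the order of the two steps matters. A sketch with hypothetical injected functions (`revokeUpstream` ends the session in Kratos/Hydra, `invalidateCache` deletes the Redis entries): revoke first, then invalidate, so a concurrent request cannot re-populate the cache from a session that is still active upstream. A lookup already in flight can still write a stale entry, but only for up to the TTL.

```typescript
// Hypothetical dependencies, named for illustration only.
interface LogoutDeps {
  revokeUpstream: (credential: string) => Promise<void>;
  invalidateCache: (credential: string) => Promise<void>;
}

// Revoke first, then invalidate. If revocation throws, the cache is left
// untouched (the session is still active upstream) and the error propagates.
async function logoutAndInvalidate(credential: string, deps: LogoutDeps): Promise<void> {
  await deps.revokeUpstream(credential);
  await deps.invalidateCache(credential);
}
```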
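For broader invalidation (revoke all sessions for a user), keep an index of cache keys per user. The keys are hashes of the cookie or token, so they can't be found by user ID without one:
```typescript
async function invalidateUser(userId: string) {
  // Delete every cache entry tagged with this user, then the index set itself
  const keys = await redis.sMembers(`sessions_by_user:${userId}`);
  if (keys.length > 0) await redis.del(keys);
  await redis.del(`sessions_by_user:${userId}`);
}
```
At cache-set time, tag the entry with its user:
```typescript
const userKey = `sessions_by_user:${session.identity.id}`;
await redis.setEx(key, 60, JSON.stringify(session));
await redis.sAdd(userKey, key);
// Keep the index from growing forever; it only needs to outlive its newest entry
await redis.expire(userKey, 60);
```
Stampede prevention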
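When many requests for the same uncached key arrive at once, they all miss together and all go to Kratos (a cache stampede).
The single-flight pattern makes concurrent callers share one in-progress lookup:
```typescript
const inFlight = new Map<string, Promise<any>>();

async function whoamiCached(cookie: string) {
  const key = `whoami:${hash(cookie)}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Join an identical lookup already running in this process
  const pending = inFlight.get(key);
  if (pending) return pending;

  const promise = kratos
    .toSession({ cookie })
    .then(async (session) => {
      if (session) await redis.setEx(key, 60, JSON.stringify(session));
      return session;
    })
    // Clean up in the same chain so a rejection reaches the caller, not an unhandled branch
    .finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```
This makes only one Kratos call per API instance, even with 100 concurrent requests for the same cookie. The map is in-process, so N instances can still make up to N calls.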
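The same idea can be factored into a reusable helper so whoami and introspection share one implementation. A sketch; the `createSingleFlight` name is hypothetical, and like the Map above it only deduplicates within one process:

```typescript
// Concurrent callers with the same key share one in-flight promise.
function createSingleFlight<T>() {
  const inFlight = new Map<string, Promise<T>>();
  return function run(key: string, fn: () => Promise<T>): Promise<T> {
    const pending = inFlight.get(key);
    if (pending) return pending;
    // Remove the entry once settled (success or failure) so later calls retry
    const promise = fn().finally(() => inFlight.delete(key));
    inFlight.set(key, promise);
    return promise;
  };
}
```

For example, whoamiCached could wrap its backend call as `run(key, () => kratos.toSession({ cookie }))`.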
TTL trade-offs
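| TTL | Hit rate (rough) | Max staleness |
|---|---|---|
| 5s | Low (~30%) | < 5s |
| 60s | High (~95%) | < 60s |
| 300s | Very high (~99%) | < 5 min |
Actual hit rates depend on how often each session calls the API. For most deployments, 60s is the sweet spot; for sensitive endpoints, use 10s.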
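A toy model shows why longer TTLs raise the hit rate. It assumes one session sending a request at a fixed interval with no early eviction; real traffic is burstier, so treat it as intuition, not a prediction:

```typescript
// Toy model (assumption): one request every intervalSeconds, no eviction.
// Requests landing strictly before t0 + TTL share one miss at t0; the rest are hits.
function estimatedHitRate(ttlSeconds: number, intervalSeconds: number): number {
  const requestsPerWindow = Math.ceil(ttlSeconds / intervalSeconds);
  return 1 - 1 / requestsPerWindow;
}
```

With a request every 2s, this gives about 67% at 5s, 97% at 60s and 99% at 300s. If requests are further apart than the TTL, every request misses.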
Memory usage
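A cached session is ~1 KB, so 100k unique sessions in cache take ~100 MB.
Configure maxmemory in redis.conf (the docker-compose command above passes the same settings as flags):
```conf
maxmemory 256mb
# evict the least recently used keys when full
maxmemory-policy allkeys-lru
```
The LRU policy keeps active sessions cached and evicts inactive ones first, which is a good fit for session caching.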
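The sizing above is simple division, sketched here as a helper. The per-key overhead parameter is an assumption (Redis stores some metadata per key); measure real entries with `MEMORY USAGE <key>` before relying on the number:

```typescript
// Rough capacity: how many cached entries fit in maxmemory before eviction starts.
function maxCachedEntries(
  maxmemoryBytes: number,
  entryBytes: number,
  perKeyOverheadBytes = 0, // assumption: fill in from MEMORY USAGE measurements
): number {
  return Math.floor(maxmemoryBytes / (entryBytes + perKeyOverheadBytes));
}
```

With 256 MB and ~1 KB entries that is about 262k sessions before LRU eviction starts, comfortably above the 100k example.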
High availability
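A single Redis instance is a single point of failure. If Redis is down, fall back to direct Kratos/Hydra calls (slow, but it works):
```typescript
async function whoamiCached(cookie: string) {
  const key = `whoami:${hash(cookie)}`;
  try {
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);
  } catch (err) {
    log.warn("redis_unavailable", err);
    // Fall through to the non-cached lookup
  }

  const session = await kratos.toSession({ cookie });
  if (session) {
    try {
      await redis.setEx(key, 60, JSON.stringify(session)); // best effort
    } catch (err) {
      log.warn("redis_unavailable", err);
    }
  }
  return session;
}
```
This is resilient: a Redis outage degrades performance but doesn't break auth. One caveat: by default node-redis queues commands while it reconnects, so calls can wait instead of throwing. Creating the client with disableOfflineQueue: true makes them fail fast and reach the fallback.
For HA Redis, use Sentinel (3+ instances) or Redis Cluster. That is overkill for most Olympus deployments.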
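A cache call that hangs, rather than failing, defeats the fallback above. A minimal timeout wrapper sketched with `Promise.race`; the `withTimeout` name and the budget you pass are assumptions:

```typescript
// Reject if the wrapped promise doesn't settle within ms, so the caller can
// fall through to Kratos/Hydra instead of stalling the request.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`cache call timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it doesn't keep the event loop alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside the try block, `const cached = await withTimeout(redis.get(key), 50);` would give Redis a 50ms budget (an arbitrary example value) before the lookup goes straight to Kratos.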
Pub/sub for revocation
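For cross-region invalidation, broadcast revocations over Redis pub/sub.
```typescript
// On revoke:
await redis.publish("invalidations", JSON.stringify({ userId, type: "session" }));

// All API instances subscribe. A client in subscriber mode can't run other
// commands, so subscribe on a dedicated connection.
const subscriber = redis.duplicate();
subscriber.on("error", (err) => log.warn("redis_error", err));
await subscriber.connect();
await subscriber.subscribe("invalidations", (message) => {
  const { userId } = JSON.parse(message);
  invalidateUser(userId).catch((err) => log.warn("invalidate_failed", err));
});
```
Within milliseconds, every API instance clears its cached entries for that user. Pub/sub is fire-and-forget, though: an instance that is disconnected when the message is published misses it, so the cache TTL remains the upper bound on staleness.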
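The subscriber calls `JSON.parse` on whatever arrives on the channel, and a malformed payload would throw inside the callback. A sketch of a validating parser; the `InvalidationMessage` type is an assumption based on the fields published on revoke:

```typescript
// Assumed shape of messages on the "invalidations" channel.
interface InvalidationMessage {
  userId: string;
  type: string;
}

// Return the parsed message, or null if the payload is malformed.
function parseInvalidation(raw: string): InvalidationMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { userId, type } = data as Record<string, unknown>;
  if (typeof userId !== "string" || userId === "") return null;
  if (typeof type !== "string") return null;
  return { userId, type };
}
```

The callback can then log and skip any message for which this returns null.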
Don't cache
- Failed logins (we want to count each).
- Errors (might be transient).
- Tokens during issuance (cache window <1s, not worth it).
Logging cache effectiveness
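```typescript
metrics.increment("auth.cache.hit");
// or
metrics.increment("auth.cache.miss");
```
Hit rate = hits / (hits + misses). Aim for > 80%.
A low hit rate means paying for the cache without much benefit. Adjust the TTL, or check that repeated calls for the same user reuse the same cookie or token (and therefore hit the same cache key).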
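For local testing, or before the metrics pipeline is wired up, the same ratio can be tracked in-process. A minimal sketch; `CacheStats` is a hypothetical name, and in production the hit/miss counters above would feed a dashboard instead:

```typescript
// In-process hit/miss counter mirroring auth.cache.hit / auth.cache.miss.
class CacheStats {
  private hits = 0;
  private misses = 0;

  recordHit(): void {
    this.hits++;
  }

  recordMiss(): void {
    this.misses++;
  }

  // hits / (hits + misses); 0 when nothing has been recorded yet
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```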