Build an "Operations Dashboard"
At-a-glance Olympus health metrics for ops
A single-page dashboard that shows ops the key metrics in one place. It doesn't replace Grafana; it provides quick context.
Widgets
Auth volume
[Login attempts (last 24h): 12,345]
[Login success rate: 99.2%]
[Registration: 234]
[Recovery: 12]
Big numbers. Trends over yesterday.
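The trend against yesterday can be reduced to a direction and a percentage. A minimal sketch follows; `trendVsYesterday` and `formatTrend` are hypothetical helper names, and the "yesterday" figure in the usage note is made up for illustration.

```typescript
// Direction and size of the change between yesterday's and today's value.
interface Trend {
  direction: "up" | "down" | "flat";
  percent: number; // absolute percentage change, rounded to one decimal
}

function trendVsYesterday(today: number, yesterday: number): Trend {
  if (yesterday === 0) {
    // No baseline: a percentage is undefined, so report the direction only.
    return { direction: today > 0 ? "up" : "flat", percent: 0 };
  }
  const change = ((today - yesterday) / yesterday) * 100;
  const percent = Math.round(Math.abs(change) * 10) / 10;
  const direction = percent === 0 ? "flat" : change > 0 ? "up" : "down";
  return { direction, percent };
}

// Short label to render next to the big number.
function formatTrend(t: Trend): string {
  if (t.direction === "flat") return "no change";
  return `${t.direction === "up" ? "▲" : "▼"} ${t.percent}%`;
}
```

For example, `formatTrend(trendVsYesterday(12345, 11000))` gives `▲ 12.2%`.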
Active users
[Active sessions: 1,234]
[Active in last 5 min: 234]
[New today: 45]
Issues
⚠ 2 services degraded
⚠ Disk 78% full
✓ All other checks green
Recent incidents
[Last incident: 3 days ago, 30 min outage]
[Resolved: 2026-05-10 ✓]
Latency
[Login p50: 45 ms]
[Login p99: 180 ms]
[Token introspect p99: 12 ms]
Implementation
Query Prometheus + render:
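`getMetric`, imported in the component below, is not shown in this post. One way to write it is sketched here against the Prometheus instant-query endpoint (`/api/v1/query`). The `PROM_URL` value, the helper names, and the injected `fetchJson` parameter are assumptions made for illustration.

```typescript
// Subset of the Prometheus instant-query response (GET /api/v1/query) used here.
interface PromResponse {
  status: "success" | "error";
  data?: {
    resultType: string;
    result: Array<{ metric: Record<string, string>; value: [number, string] }>;
  };
  error?: string;
}

// Assumption: where Prometheus is reachable from the app.
const PROM_URL = "http://prometheus:9090";

function buildQueryUrl(baseUrl: string, query: string): string {
  return `${baseUrl}/api/v1/query?query=${encodeURIComponent(query)}`;
}

// Pull a single number out of an instant-vector result.
// An empty result yields NaN so the widget can show "n/a" instead of crashing.
function parseScalar(body: PromResponse): number {
  if (body.status !== "success" || !body.data) {
    throw new Error(`Prometheus query failed: ${body.error ?? "unknown error"}`);
  }
  const first = body.data.result[0];
  // Each sample is [unixTimestamp, "value as string"].
  return first ? Number(first.value[1]) : NaN;
}

// The fetcher is injected so the helper can be exercised without a network.
type JsonFetcher = (url: string) => Promise<unknown>;

function getMetric(query: string, fetchJson: JsonFetcher, baseUrl: string = PROM_URL): Promise<number> {
  return fetchJson(buildQueryUrl(baseUrl, query)).then((body) => parseScalar(body as PromResponse));
}
```

Binding the fetcher once, for instance to `(url) => fetch(url).then((r) => r.json())`, would give the one-argument `getMetric(query)` that the component imports from `@/lib/prom`. With a helper along these lines, the page component looks like this: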
import { getMetric } from "@/lib/prom";
// Grid, Metric and formatNumber are assumed to be imported from elsewhere in the app.

export default async function Dashboard() {
  // Run the queries in parallel; getMetric is expected to resolve to a single number.
  const [loginRate, errorRate, latency, activeSessions, diskFree] = await Promise.all([
    // Average logins per second over the last 24h
    getMetric("sum(rate(kratos_login_total[24h]))"),
    // Fraction of logins that failed over the last hour
    getMetric("sum(rate(kratos_login_total{outcome='failure'}[1h])) / sum(rate(kratos_login_total[1h]))"),
    // p99 login latency in seconds
    getMetric("histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{path='/login'}[5m])) by (le))"),
    getMetric("kratos_active_sessions"),
    // Fraction of disk space free; with several filesystems, add a mountpoint matcher
    getMetric("node_filesystem_avail_bytes / node_filesystem_size_bytes"),
  ]);

  return (
    <Grid>
      {/* loginRate is per second, so multiply by 86,400 s for the 24h total */}
      <Metric label="24h logins" value={formatNumber(loginRate * 86400)} />
      <Metric label="Error rate" value={(errorRate * 100).toFixed(2) + "%"} trend={errorRate < 0.01 ? "good" : "bad"} />
      <Metric label="Login p99" value={`${(latency * 1000).toFixed(0)}ms`} />
      <Metric label="Active sessions" value={formatNumber(activeSessions)} />
      <Metric label="Disk free" value={`${(diskFree * 100).toFixed(0)}%`} trend={diskFree > 0.2 ? "good" : "bad"} />
    </Grid>
  );
}

Color coding
Visual signals:
- Green: all good.
- Yellow: warning threshold.
- Red: alert threshold.
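function Metric({ label, value, trend }) {
  // Widgets without a trend (plain counts) get a neutral style instead of falling through to red.
  const colors = { good: "green", warn: "yellow", bad: "red" };
  const color = colors[trend] ?? "neutral";
  return (
    <Card className={`metric metric-${color}`}>
      <CardLabel>{label}</CardLabel>
      <CardValue>{value}</CardValue>
    </Card>
  );
}

Color makes the state readable at a glance.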
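The yellow state needs a `warn` trend, but the dashboard code in the Implementation section only ever passes `good` or `bad`. One way to derive all three levels from a pair of thresholds is sketched below. `trendFor` and the `warn` values in the usage note are assumptions, not part of the original code.

```typescript
type TrendLevel = "good" | "warn" | "bad";

interface Thresholds {
  warn: number;  // value at which the widget turns yellow
  alert: number; // value at which the widget turns red
}

// Works in both directions: if alert > warn, higher values are worse (error rate);
// if alert < warn, lower values are worse (disk free).
function trendFor(value: number, { warn, alert }: Thresholds): TrendLevel {
  const higherIsWorse = alert > warn;
  const past = (limit: number) => (higherIsWorse ? value >= limit : value <= limit);
  if (past(alert)) return "bad";
  if (past(warn)) return "warn";
  return "good";
}
```

The error-rate widget could then use `trendFor(errorRate, { warn: 0.005, alert: 0.01 })`, and the disk-free widget `{ warn: 0.3, alert: 0.2 }`. Both keep the existing red cut-offs of 1% errors and 20% free.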
Sparklines
Beyond single numbers, mini-charts:
<Card>
  <CardLabel>Logins (last hour)</CardLabel>
  <CardValue>{currentLoginRate}/s</CardValue>
  <Sparkline data={historicalRate} />
</Card>

Trend visible. Spikes obvious.
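`<Sparkline>` is not defined in this post. If it is hand-rolled rather than taken from a chart library, the core is a function that scales the series into the `points` attribute of an SVG `<polyline>`. A sketch, with `sparklinePoints` and its default dimensions as assumptions:

```typescript
// Convert a series of values into the `points` attribute of an SVG <polyline>.
function sparklinePoints(data: number[], width = 100, height = 24): string {
  if (data.length === 0) return "";
  const min = Math.min(...data);
  const max = Math.max(...data);
  const range = max - min || 1; // flat series: avoid division by zero
  const step = data.length > 1 ? width / (data.length - 1) : 0;
  return data
    .map((v, i) => {
      const x = i * step;
      // SVG's y axis points down, so invert to draw larger values higher.
      const y = height - ((v - min) / range) * height;
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}
```

`sparklinePoints([0, 5, 10], 100, 20)` returns `"0.0,20.0 50.0,10.0 100.0,0.0"`, which can be passed to `<polyline points={...} fill="none" />` inside an `<svg>` of the same size.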
Refresh
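"use client";
import { useEffect } from "react";
import { useRouter } from "next/navigation";

export function AutoRefresh() {
  const router = useRouter();
  useEffect(() => {
    const interval = setInterval(() => router.refresh(), 30_000); // every 30 s
    return () => clearInterval(interval);
  }, [router]);
  return null; // renders nothing; mount it once inside the dashboard page
}

Live updates without a manual refresh.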
Linking to deep-dive
Each widget links to a deeper view:
<Metric
  label="Error rate"
  value="0.12%"
  href="/admin/audit?q=event:login_failed"
/>

Clicking the widget opens the audit log filtered to failed logins.
Status of services
{services.map(s => (
  <ServiceStatus key={s.name}>
    {s.name}: <StatusBadge status={s.status} />
  </ServiceStatus>
))}

const services = await Promise.all([
  checkService("kratos", `${KRATOS_URL}/health/ready`),
  checkService("hydra", `${HYDRA_URL}/health/ready`),
  checkService("postgres", "...", checkPostgres),
  checkService("redis", "...", checkRedis),
]);

// probe is an optional custom check (e.g. a database ping). It is called with the URL
// and expected to resolve to true or false; without it, an HTTP health check runs.
async function checkService(name, url, probe) {
  const start = Date.now();
  try {
    const ok = probe
      ? await probe(url)
      : (await fetch(url, { signal: AbortSignal.timeout(5000) })).ok;
    return { name, status: ok ? "up" : "down", latency: Date.now() - start };
  } catch (err) {
    return { name, status: "down", error: err.message };
  }
}

Every refresh pings each service live.
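The check results can also feed the one-line summary shown in the Issues widget. A sketch, assuming the result shape returned by `checkService`; `summarizeServices` and the 1-second `SLOW_MS` cut-off for "degraded" are hypothetical.

```typescript
interface ServiceResult {
  name: string;
  status: "up" | "down";
  latency?: number; // ms, present when the probe completed
  error?: string;
}

// Assumed threshold: a service that answers but takes longer than this counts as degraded.
const SLOW_MS = 1000;

function summarizeServices(results: ServiceResult[]): string {
  const down = results.filter((r) => r.status === "down").map((r) => r.name);
  const slow = results
    .filter((r) => r.status === "up" && (r.latency ?? 0) > SLOW_MS)
    .map((r) => r.name);
  const parts: string[] = [];
  if (down.length) parts.push(`${down.length} down (${down.join(", ")})`);
  if (slow.length) parts.push(`${slow.length} degraded (${slow.join(", ")})`);
  return parts.length ? `⚠ ${parts.join("; ")}` : "✓ All services up";
}
```

Naming the affected services in the summary saves a click during an incident.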
Recent activity
Show recent audit events:
const recent = await db`
  SELECT id, created_at, event_type, identity_id
  FROM security_audit
  ORDER BY created_at DESC
  LIMIT 20
`;

<ActivityFeed>
  {recent.map(e => (
    <ActivityItem key={e.id} time={e.created_at} event={e.event_type} user={e.identity_id} />
  ))}
</ActivityFeed>

This shows what is happening right now.
Anomaly highlighting
If a metric is unusual, call it out:
{loginRate < expected * 0.5 && (
  <Alert>
    Login rate is less than half of typical. Investigate.
  </Alert>
)}

Compute the typical value as a 7-day rolling average.
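The rolling average and the comparison can be sketched as two small functions. This assumes `history` holds one value per day, oldest first; the function names and default parameters are illustrative.

```typescript
// Mean of the most recent `window` samples (all samples if fewer exist).
function rollingAverage(samples: number[], window: number): number {
  const recent = samples.slice(-window);
  if (recent.length === 0) return NaN;
  return recent.reduce((sum, v) => sum + v, 0) / recent.length;
}

// True when `current` is below `fraction` of the typical value.
function isBelowTypical(current: number, history: number[], window = 7, fraction = 0.5): boolean {
  const expected = rollingAverage(history, window);
  // Comparisons with NaN are false, so an empty history never raises an alert.
  return current < expected * fraction;
}
```

With a week of history averaging 100, a current value of 40 triggers the alert and 60 does not. A daily average ignores time-of-day patterns, so comparing against the same hour on previous days may give fewer false alarms at night.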
Access control
Only ops and admins should see this:
if (!session?.identity?.traits?.role?.includes("admin")) {
  return redirect("/forbidden");
}

Some metrics are sensitive (user counts, for example). Don't expose them to everyone.
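If `role` is a string, `includes` is a substring test, so a value such as `superadmin` or `not-admin` would also pass the check above. A stricter sketch follows. It assumes `role` is either a string or an array of strings and that `admin` and `ops` are the allowed role names; both are assumptions, since Kratos traits are defined by your own identity schema.

```typescript
// Assumed trait shape; the real one comes from the Kratos identity schema.
interface Traits {
  role?: string | string[];
}

const ALLOWED_ROLES = ["admin", "ops"]; // assumed role names

function canViewDashboard(traits: Traits | undefined): boolean {
  const role = traits?.role;
  if (role === undefined) return false;
  const roles = Array.isArray(role) ? role : [role];
  // Exact match per role: "superadmin" must not pass as "admin".
  return roles.some((r) => ALLOWED_ROLES.indexOf(r) !== -1);
}
```

The page would then redirect when `canViewDashboard(session?.identity?.traits)` is false.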
Mobile-friendly
Ops may need this from a phone during an incident, so make it responsive:
<Grid columns={{ base: 1, md: 2, lg: 3 }}>
  <Metric ... />
</Grid>

Single column on a phone; grid on desktop.
Don't replace dashboards
This is a quick-glance view. For deep analysis, use Grafana, Datadog, etc.
This dashboard answers: "are things broken right now?"
Grafana answers: "why?"