Alerts for cert expiry
Configure proactive notifications before certs expire
Caddy auto-renews certificates about 30 days before expiry, but renewal can fail without anyone noticing when a Let's Encrypt rate limit is hit, DNS is misconfigured, or the ACME service is down. Set up monitoring to catch this.
Built-in: cert-expiry-check.yml
The platform repo ships a daily workflow:
# .github/workflows/cert-expiry-check.yml
on:
  schedule:
    - cron: "0 7 * * *" # Daily at 07:00 UTC
jobs:
  check:
    runs-on: ubuntu-latest
    permissions:
      issues: write # needed for gh issue create
    steps:
      - name: Check cert expiry
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
        run: |
          for domain in ciam.example.com iam.example.com www.example.com; do
            # Read the notAfter date of the certificate served on port 443
            expiry=$(echo | openssl s_client -connect "$domain:443" -servername "$domain" 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
            days=$(( ($(date -d "$expiry" +%s) - $(date +%s)) / 86400 ))
            if [ "$days" -lt 30 ]; then
              # Open a GitHub issue
              gh issue create --title "Cert expiring in $days days: $domain" \
                --body "$(printf 'Domain: %s\nExpires: %s' "$domain" "$expiry")"
            fi
          done
Configure your apex domain in the workflow. The workflow opens a GitHub Issue at the 30/14/7-day windows. Tag yourself for notifications.
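The date arithmetic in the workflow is easy to get subtly wrong, so it can help to isolate it in a function that can be run locally. The sketch below assumes GNU `date` (as on Ubuntu runners); `days_until` is a hypothetical helper name, not part of the shipped workflow. It also rejects an empty date, which would otherwise be treated as "today" when the TLS connection fails.

```bash
#!/usr/bin/env bash
# Sketch: convert an openssl "notAfter" date into whole days remaining.
# days_until is a hypothetical helper, not part of the shipped workflow.
days_until() {
  local expiry=$1
  local now_epoch=${2:-$(date -u +%s)}   # optional fixed "now", useful for testing
  local expiry_epoch

  # GNU date silently parses an empty string as today, so reject it explicitly.
  [ -n "$expiry" ] || return 1
  expiry_epoch=$(date -u -d "$expiry" +%s 2>/dev/null) || return 1

  # Integer division truncates partial days.
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}
```

For example, `days_until "$(echo | openssl s_client -connect ciam.example.com:443 -servername ciam.example.com 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)"` prints the days left, and returns a non-zero status if no certificate could be read.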
External uptime monitoring
Most external uptime services, including UptimeRobot, StatusCake, and Better Uptime, offer cert-expiry checks. Free tiers cover small deployments.
Configure them to monitor:
- https://ciam.<your-domain>/.well-known/openid-configuration
- https://iam.<your-domain>/.well-known/openid-configuration
- https://<your-domain>/ (Site)
These services alert via email, SMS, or Slack on:
- HTTP error (covers most outages).
- Cert expiry within configurable threshold.
- TLS handshake failure (covers cert issues before expiry).
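Before registering the endpoints with an external service, it can be worth confirming that they respond from your own machine. The following is a minimal sketch, assuming `curl` is installed; `monitor_urls` and `smoke_check` are hypothetical helper names. `curl -f` fails on HTTP error statuses, and curl also fails on TLS errors such as an expired certificate.

```bash
#!/usr/bin/env bash
# Sketch: list the endpoints to monitor and probe each one once.
# monitor_urls and smoke_check are hypothetical helpers.

# Print the three monitored URLs for a given apex domain.
monitor_urls() {
  local domain=$1
  printf 'https://ciam.%s/.well-known/openid-configuration\n' "$domain"
  printf 'https://iam.%s/.well-known/openid-configuration\n' "$domain"
  printf 'https://%s/\n' "$domain"
}

# Read URLs from stdin and report whether each one answers successfully.
smoke_check() {
  local url
  while read -r url; do
    if curl -fsS --max-time 10 -o /dev/null "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

Usage: `monitor_urls example.com | smoke_check`.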
Slack / Discord webhook
Integrate cert-expiry-check.yml with Slack:
# DOMAIN and DAYS_LEFT must be written to $GITHUB_ENV by an earlier step
- name: Notify Slack
  if: env.DAYS_LEFT < 7
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "🚨 Olympus cert for ${{ env.DOMAIN }} expires in ${{ env.DAYS_LEFT }} days"
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
Internal check
If you'd rather check from inside the deployment:
# Cron on the VPS
*/30 * * * * /usr/local/bin/cert-check.sh

#!/bin/bash
# cert-check.sh
for d in ciam.example.com iam.example.com; do
  # Read the expiry date from the certificate file in Caddy's storage
  expiry=$(podman exec olympus-caddy cat /data/caddy/certificates/.../$d/$d.crt | openssl x509 -noout -enddate | cut -d= -f2)
  # ... compute days, alert via curl to webhook
done
Because it runs every 30 minutes, this catches issues earlier than the daily GitHub workflow.
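The "alert via curl to webhook" step could look roughly like the sketch below. It assumes a Slack-style incoming webhook that accepts a JSON body with a `text` field (Discord webhooks expect `content` instead), and that the URL is supplied in an environment variable named `WEBHOOK_URL`; that variable and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: post a cert-expiry alert to a Slack-style incoming webhook.
# alert_payload, send_alert, and WEBHOOK_URL are hypothetical names.

# Build the JSON body. Domain names and integers need no JSON escaping,
# so printf is sufficient for these two inputs.
alert_payload() {
  printf '{"text": "Cert for %s expires in %d days"}' "$1" "$2"
}

# POST the payload; -f makes curl exit non-zero if the webhook rejects it.
send_alert() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(alert_payload "$1" "$2")" "$WEBHOOK_URL"
}
```

Inside the loop, a call such as `send_alert "$d" "$days"` would then send one message per domain that is below your chosen threshold.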
Caddy's own logs
Caddy logs ACME-related events at INFO level. Watch:
podman compose logs caddy --since 1d | grep -iE "(error|fail|renew)"
If you see renewal errors days before expiry, that's your early-warning system.
Multi-step alert
Recommended setup:
- 30 days: GitHub Issue (informational).
- 14 days: Slack message in #ops.
- 7 days: SMS / PagerDuty page.
Tune to your team's escalation needs.
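The three tiers above can be expressed as a small lookup that a check script calls to decide where an alert should go. This is only a sketch: `alert_tier` and the tier labels (`issue`, `slack`, `page`) are hypothetical names, and the cut-offs simply mirror the recommended 30/14/7-day setup.

```bash
#!/usr/bin/env bash
# Sketch: map days remaining to an escalation tier.
# alert_tier and its output labels are hypothetical.
alert_tier() {
  local days=$1
  if   [ "$days" -le 7 ];  then echo page    # SMS / PagerDuty
  elif [ "$days" -le 14 ]; then echo slack   # message in #ops
  elif [ "$days" -le 30 ]; then echo issue   # informational GitHub Issue
  else                          echo none    # nothing to report yet
  fi
}
```

A script can then branch on the result, for example `case "$(alert_tier "$days")" in page) ... ;; esac`.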
After fixing expiry
After resolving the underlying issue (DNS, rate limit, etc.):
ssh prod 'podman exec olympus-caddy caddy reload --config /etc/caddy/Caddyfile'
Caddy attempts renewal. Verify:
echo | openssl s_client -connect ciam.your-domain:443 -servername ciam.your-domain 2>/dev/null \
| openssl x509 -noout -dates
notAfter should be ~90 days in the future.
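To avoid checking the date by eye, the `-dates` output can be piped into a small check. The sketch below assumes GNU `date`; `renewed_recently` is a hypothetical helper, and the 60-day minimum is an arbitrary assumption chosen to sit comfortably below a fresh ~90-day certificate.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the notAfter date read from stdin is far enough away.
# renewed_recently and the 60-day default are assumptions, not Caddy behavior.
renewed_recently() {
  local min_days=${1:-60}
  local now_epoch=${2:-$(date -u +%s)}   # optional fixed "now", useful for testing
  local not_after expiry_epoch

  # Extract the date from the "notAfter=..." line of `openssl x509 -dates`.
  not_after=$(sed -n 's/^notAfter=//p')
  [ -n "$not_after" ] || return 1
  expiry_epoch=$(date -u -d "$not_after" +%s) || return 1

  [ $(( (expiry_epoch - now_epoch) / 86400 )) -ge "$min_days" ]
}
```

Appending `| renewed_recently && echo renewed` to the verification command prints `renewed` only when the new certificate is in place.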