No leader fencing exists — during a network partition both nodes run
checks and fire alerts independently. Document the behavior honestly:
duplicate alerts, doubled history, ~15s takeover, converges on heal.
smtp.SendMail ignores context entirely — a blackholed SMTP server
hangs the alert goroutine for the OS TCP timeout (minutes), while the
30s context from the engine does nothing.
Replace with sendMailContext: dials with ctx deadline, sets connection
deadlines, handles STARTTLS and AUTH when advertised. Behavioral
parity with smtp.SendMail but cancellation works throughout.
Cluster-secret holder could POST a backup with their own admin key to
/api/backup/import, replacing all users — privilege escalation from
cluster-auth to admin. Also, Kuma imports produced zero users but
ImportWipe unconditionally deleted the users table — locking out all
accounts until restart reseeded UPTOP_ADMIN_KEY.
- Server handlers strip data.Users (set nil) before calling ImportData
- ImportData only wipes+replaces users when data.Users != nil
- New ImportWipeUsers dialect method separates user wipe from data wipe
- CLI restore (main.go) unchanged — full import still replaces users
Pre-check resolved and validated the target IP, then runPingCheck and
runPortCheck re-resolved by hostname — a DNS rebind between the two
lookups could redirect to a private IP, bypassing the SSRF guard.
Resolve once in RunCheck, pin the validated IP, and pass it down:
- runPingCheck: SetIPAddr with pinned IP (skips internal resolve)
- runPortCheck: dial pinned IP literal instead of hostname
HTTP checks are unaffected (SafeDialContext resolves+validates at
dial time). DNS checks validate the server address, not the target.