fix(security): pre-release security batch #121

Merged
lerko merged 4 commits from fix/prerelease-security-batch into main 2026-06-12 17:13:26 +00:00
Owner

Final security items before v1.0.0 tag. Cleans section B of review-findings.local.md.

Changes

1. DNS-rebind TOCTOU on ping/port checks (checker.go)
Pre-check resolved+validated the target, then runPingCheck/runPortCheck re-resolved by hostname — DNS rebind between lookups bypassed SSRF guard. Fix: resolve once, pin validated IP, pass to check functions. HTTP unaffected (SafeDialContext validates at dial time).

2. API import no longer replaces user accounts (server.go, sqlstore.go, dialects)
Cluster-secret holder could POST a backup replacing all admin accounts via /api/backup/import. Also, Kuma imports (zero users) unconditionally wiped the users table — locked out all accounts until restart. Fix: API handlers strip data.Users; store only wipes+replaces users when explicitly provided (CLI restore). New ImportWipeUsers dialect method.

3. Email send respects context deadline (alert.go)
smtp.SendMail ignores context — blackholed SMTP hung alert goroutines for OS TCP timeout (minutes). Replaced with sendMailContext: dials with ctx deadline, sets connection deadlines, handles STARTTLS/AUTH. 200ms timeout test validates.

4. Split-brain failover docs (docs/clustering.md)
Honest documentation: no leader fencing, both-active during partitions, duplicate alerts expected, ~15s takeover, converges on heal.

Test plan

  • go test -race ./... — all pass
  • golangci-lint run ./... — 0 issues
  • New tests: pinned-IP port check, nil-users import preservation, SMTP timeout, SMTP happy path
Final security items before v1.0.0 tag. Cleans section B of review-findings.local.md. ## Changes **1. DNS-rebind TOCTOU on ping/port checks** (`checker.go`) Pre-check resolved+validated the target, then `runPingCheck`/`runPortCheck` re-resolved by hostname — DNS rebind between lookups bypassed SSRF guard. Fix: resolve once, pin validated IP, pass to check functions. HTTP unaffected (SafeDialContext validates at dial time). **2. API import no longer replaces user accounts** (`server.go`, `sqlstore.go`, dialects) Cluster-secret holder could POST a backup replacing all admin accounts via `/api/backup/import`. Also, Kuma imports (zero users) unconditionally wiped the users table — locked out all accounts until restart. Fix: API handlers strip `data.Users`; store only wipes+replaces users when explicitly provided (CLI restore). New `ImportWipeUsers` dialect method. **3. Email send respects context deadline** (`alert.go`) `smtp.SendMail` ignores context — blackholed SMTP hung alert goroutines for OS TCP timeout (minutes). Replaced with `sendMailContext`: dials with ctx deadline, sets connection deadlines, handles STARTTLS/AUTH. 200ms timeout test validates. **4. Split-brain failover docs** (`docs/clustering.md`) Honest documentation: no leader fencing, both-active during partitions, duplicate alerts expected, ~15s takeover, converges on heal. ## Test plan - [x] `go test -race ./...` — all pass - [x] `golangci-lint run ./...` — 0 issues - [x] New tests: pinned-IP port check, nil-users import preservation, SMTP timeout, SMTP happy path
lerko added 4 commits 2026-06-12 16:48:12 +00:00
Pre-check resolved and validated the target IP, then runPingCheck and
runPortCheck re-resolved by hostname — a DNS rebind between the two
lookups could redirect to a private IP, bypassing the SSRF guard.

Resolve once in RunCheck, pin the validated IP, and pass it down:
- runPingCheck: SetIPAddr with pinned IP (skips internal resolve)
- runPortCheck: dial pinned IP literal instead of hostname

HTTP checks are unaffected (SafeDialContext resolves+validates at
dial time). DNS checks validate the server address, not the target.
Cluster-secret holder could POST a backup with their own admin key to
/api/backup/import, replacing all users — privilege escalation from
cluster-auth to admin. Also, Kuma imports produced zero users but
ImportWipe unconditionally deleted the users table — locking out all
accounts until restart reseeded UPTOP_ADMIN_KEY.

- Server handlers strip data.Users (set nil) before calling ImportData
- ImportData only wipes+replaces users when data.Users != nil
- New ImportWipeUsers dialect method separates user wipe from data wipe
- CLI restore (main.go) unchanged — full import still replaces users
smtp.SendMail ignores context entirely — a blackholed SMTP server
hangs the alert goroutine for the OS TCP timeout (minutes), while the
30s context from the engine does nothing.

Replace with sendMailContext: dials with ctx deadline, sets connection
deadlines, handles STARTTLS and AUTH when advertised. Behavioral
parity with smtp.SendMail but cancellation works throughout.
docs(cluster): document split-brain limitation in failover
CI / test (pull_request) Successful in 1m56s
CI / lint (pull_request) Successful in 1m16s
CI / vulncheck (pull_request) Successful in 1m1s
7bf278e538
No leader fencing exists — during a network partition both nodes run
checks and fire alerts independently. Document the behavior honestly:
duplicate alerts, doubled history, ~15s takeover, converges on heal.
lerko merged commit 7bf278e538 into main 2026-06-12 17:13:26 +00:00
lerko deleted branch fix/prerelease-security-batch 2026-06-12 17:13:26 +00:00
Sign in to join this conversation.
No Reviewers
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: lerkolabs/uptop#121