fix: critical bugs and security hardening #19

Merged
lerko merged 8 commits from fix/critical-bugs-security-hardening into main 2026-05-24 01:45:12 +00:00

Summary

  • Store: Replace panic() in generateToken() with proper error return. Check all json.Unmarshal errors instead of silently ignoring them. Add Close() to Store interface.
  • Alerts: Add context.Context to Provider.Send() so HTTP alerts respect timeout/cancellation. Log alert delivery failures instead of swallowing errors.
  • Security: Use crypto/subtle.ConstantTimeCompare for cluster secret checks (7 endpoints). Add http.MaxBytesReader (1MB) to all POST handlers.
  • Shutdown: Graceful shutdown for HTTP and SSH servers with 30s timeout. Close database on exit. Replace log.Fatalf in goroutines with log.Printf.

Test plan

  • [x] go build ./... passes
  • [x] go test ./... passes
  • [x] go test -race ./... clean
  • [ ] Manual: start app, Ctrl+C, verify clean shutdown (no hanging)
  • [ ] Manual: send push heartbeat, verify still works
lerko added 4 commits 2026-05-24 00:00:00 +00:00
generateToken() now returns (string, error) instead of panicking on
crypto/rand failure. All json.Unmarshal calls for alert settings now
check and propagate errors instead of silently ignoring them.

Adds Close() to Store interface for graceful shutdown support.
Skips malformed notification entries during Kuma import.
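
For readers without the diff open, the pattern in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the 16-byte token length, the hex encoding, and the `parseAlertSettings` helper with its `map[string]string` shape are assumptions made for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// generateToken returns a random hex token, or an error if the system
// random source fails. Length and encoding are assumptions for this sketch.
func generateToken() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate token: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

// parseAlertSettings is a hypothetical helper: it propagates the
// json.Unmarshal error instead of discarding it.
func parseAlertSettings(raw []byte) (map[string]string, error) {
	var settings map[string]string
	if err := json.Unmarshal(raw, &settings); err != nil {
		return nil, fmt.Errorf("parse alert settings: %w", err)
	}
	return settings, nil
}

func main() {
	tok, err := generateToken()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("token length:", len(tok))

	// Malformed input now surfaces as an error the caller can handle.
	if _, err := parseAlertSettings([]byte(`{"url":`)); err != nil {
		fmt.Println("error:", err)
	}
}
```

Returning the error lets the caller decide whether a failed token generation should abort one request or the whole process, which a `panic()` does not.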
Provider.Send now accepts context.Context for timeout/cancellation.
HTTPProvider and NtfyProvider use NewRequestWithContext so HTTP alerts
respect the 30s deadline. triggerAlert logs send failures and config
load errors instead of silently swallowing them.
Replace string equality checks on cluster secret with
crypto/subtle.ConstantTimeCompare to prevent timing attacks.
Add http.MaxBytesReader (1MB) to all POST endpoints that decode
JSON bodies. Change Start() to return *http.Server for graceful
shutdown support. Replace log.Fatalf with log.Printf in HTTP
server goroutine.
HTTP and SSH servers now shut down cleanly on SIGINT/SIGTERM with a
30s timeout. Database connection closed via defer. Replaced log.Fatalf
in SSH goroutine with log.Printf + ErrServerClosed check to prevent
unclean process exits.
lerko added 1 commit 2026-05-24 00:05:34 +00:00
Monitors with the same interval no longer fire simultaneously.
Each tick adds up to 10% random jitter. Initial checks stagger
over 0-3s to avoid thundering herd on startup.
lerko added 1 commit 2026-05-24 01:06:30 +00:00
55 tests covering state machine transitions, heartbeat handling, push
deadline checks, group aggregation, history recording, probe aggregation,
log management, state management, and concurrency safety.

Checker tests cover HTTP (via httptest), port (via net.Listen),
isCodeAccepted ranges, and siteTimeout defaults. Ping and DNS
checkers skipped (need ICMP privileges and DNS server).

Coverage: 64.2% overall, 100% on handleStatusChange, triggerAlert,
checkPush, recordCheck, and AggregateStatus.
lerko added 1 commit 2026-05-24 01:10:35 +00:00
24 tests covering push heartbeat, health check, backup export/import,
probe registration/assignments/results, and status page endpoints.
Tests verify auth enforcement (constant-time secret), method validation,
input validation, token stripping on status JSON, and maintenance
window overrides.
lerko added 1 commit 2026-05-24 01:23:29 +00:00
15 tests covering leader/follower mode selection, follower failover
after 3 consecutive health check failures, recovery when leader returns,
secret header propagation, context cancellation, probe registration,
assignment fetching, concurrent check execution (verifies 10-semaphore
cap), and result reporting.
lerko merged commit da61ce0f88 into main 2026-05-24 01:45:12 +00:00

Reference: lerkolabs/uptop#19