Every Store interface method (except Close) now takes context.Context
as first parameter. All 54 db.Query/Exec/QueryRow calls in SQLStore
replaced with their *Context variants. DB operations now respect
cancellation and deadlines.
Context sources by caller:
- Engine dbWriter/poll/pruner: engine ctx from Start()
- HTTP handlers: r.Context()
- config.Apply/Export: caller-provided ctx
- TUI/main.go init: context.Background()
RunCheck and all sub-checks (HTTP/ping/port/DNS) accept parent ctx.
HTTP checks now inherit shutdown cancellation instead of rooting in
context.Background(). dbWrite.exec takes ctx so the writer goroutine
can cancel stuck DB operations.
DeleteSite/ImportData use BeginTx(ctx) instead of Begin().
1. Group auto-pause trap: remove the one-way Paused=true mutation
from checkGroup — monitorRoutine skipped paused groups, so they
could never re-evaluate or auto-unpause.
2. Retry logic: apply MaxRetries to all →DOWN transitions, not just
UP→DOWN. New monitors (PENDING) no longer alert on first transient
failure when retries are configured.
3. Shutdown drain hole: track checker goroutines with checkerWG so
Stop() waits for in-flight checks before draining the write queue.
Final drainWrites() catches any writes enqueued after the writer's
own drain.
4. Probe-ingest writer bypass: route SaveCheckFromNode through the
engine's serialized dbWriter instead of writing directly to the
store from the HTTP handler.
5. Dead-probe expiry: expire stale probe results (>3× site interval)
before aggregation so a dead probe can't poison status forever.
Also clean probeResults in RemoveSite.
6. Maintenance-cache N+1: replace per-check DB query with a
fully-resolved in-memory cache refreshed every poll cycle. One
GetActiveMaintenanceWindows() call instead of N IsMonitorInMaintenance.
ImportData now wipes check_history, state_changes, and alert_health
so re-inserted IDs don't inherit stale history from prior occupants.
Four fixes hardening the secrets and rate-limit posture a prior audit
left or that regressed:
X-Forwarded-For rate-limit bypass + memory DoS (ratelimit.go): clientIP
returned the raw XFF header, so an attacker rotating it minted unlimited
distinct limiter keys — never tripping the limit and growing the visitors
map without bound. XFF is now honored only when the immediate peer is a
configured trusted proxy (UPTOP_TRUSTED_PROXIES, CIDRs or bare IPs), using
the right-most non-trusted hop; otherwise the key is the real RemoteAddr.
The visitors map is bounded with LRU eviction as defense in depth.
Export redaction denylist -> per-provider allowlist (server.go): the old
six-key denylist missed the actual credentials — the webhook URL for
discord/slack/webhook/ntfy/gotify and api_key for opsgenie — exporting
them in the clear. redactByProvider keeps only known-safe keys per
provider type and redacts everything else, so unknown/new keys fail safe.
ImportData plaintext secrets (sqlstore.go): import inserted raw
json.Marshal(settings), bypassing the encryption AddAlert/UpdateAlert
use. It now routes through marshalSettings, so a restore with
UPTOP_ENCRYPTION_KEY set stores enc:-prefixed ciphertext, not plaintext.
Alert error credential leak (alert.go): provider Send returned the raw
*url.Error, whose URL carries the secret (Telegram bot token in the path,
webhook secrets in the URL); it was persisted to AlertHealth.LastError
and shown in the TUI. sanitizeError strips the URL, keeping the operation
and underlying cause.
Tests cover trusted/untrusted XFF + spoofed-bypass + map bound, the
allowlist per provider, encrypted-on-import round-trip, and URL-stripped
errors. README documents UPTOP_TRUSTED_PROXIES. Full suite green under
-race; golangci-lint clean.
Every check spawned `go e.db.Save*(...)` with the error discarded: a
fire-and-forget goroutine per log line, check, state change, and alert
health update. SaveLog ran a full-table prune DELETE on every insert and
SaveCheck a COUNT + conditional prune on every check, so the hot path
amplified each write into several statements. Nothing tracked these
goroutines, so at shutdown they raced the store's Close() — writes to a
closing DB, silently swallowed.
Introduce a single writer goroutine that drains a buffered channel of
typed dbWrite values (log/check/state-change/alert-health). Writes are
enqueued non-blocking; a saturated queue drops and notes it in the
in-memory log rather than blocking the check loop. Write errors are now
logged instead of discarded. Retention moves off the hot path: SaveLog
and SaveCheck become plain INSERTs, and PruneLogs/PruneCheckHistory/
PruneStateChanges run on a 10-minute timer inside the writer (single
keep-newest-N-per-site pass via a window function). state_changes was
previously never pruned — now bounded.
Add Engine.Stop(): cancels the engine's context, then waits for the
writer to drain every buffered write before returning. main wires it in
before the deferred store Close() so no write races a closed DB.
SQLite gains busy_timeout=5000 and synchronous=NORMAL, applied via the
DSN so every pooled connection inherits them (a post-open PRAGMA only
touches one connection); WAL moves to the DSN too. :memory: test DBs are
left as-is.
Tests: writer drains on Stop, Stop is idempotent, and the prune queries
keep newest-N per site / N logs on real SQLite. Full suite green under
-race.
Background goroutine runs every 15 minutes, deletes maintenance windows
that expired beyond the retention period (default 7 days). Configurable
via UPTOP_MAINT_RETENTION env var (Go duration format).
Closes#72
DeleteSite now removes maintenance_windows, check_history, and
state_changes for the site within a transaction before deleting
the site itself. Prevents orphaned rows.
Closes#71
Move Go module from gitea.lerkolabs.com/lerko/uptop to
gitea.lerkolabs.com/lerkolabs/uptop. Updates all imports,
go.mod, goreleaser owner, and README links.
Replace all monitor package-level mutable state with Engine struct.
All state (liveState, logStore, histories, tokenIndex, HTTP clients)
is now encapsulated in Engine, created via NewEngine(store).
Key changes:
- Engine struct holds all monitor state with proper mutex protection
- Engine.Start(ctx) and monitorRoutine respect context cancellation
for graceful shutdown — no more leaked goroutines
- cluster.runFollowerLoop also respects context for clean exit
- Token index (map[string]int) for O(1) push heartbeat lookup,
replacing O(n) linear scan through LiveState
- UpdateSiteConfig preserves 8 runtime fields instead of copying
17 config fields individually
- triggerAlert goroutines get 30s timeout context
- All consumers (TUI, server, cluster, main) receive *Engine via
constructor/parameter — no package-level state access
- main.go creates context.WithCancel, passes to engine and cluster
First test suite: 12 tests across store and alert packages
- Store: CRUD for sites/alerts/users, push token generation,
import/export round-trip, check history persistence
- Alert: Discord/Slack/Webhook payload format, HTTP 4xx error
propagation, Ntfy headers, unknown provider returns nil