Compare commits

1 Commits

Author SHA1 Message Date
lerko 8f479635f9 fix(monitor): serialize DB writes through a single drained writer
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Failing after 56s
CI / vulncheck (pull_request) Successful in 51s
Every check spawned `go e.db.Save*(...)` with the error discarded: a
fire-and-forget goroutine per log line, check, state change, and alert
health update. SaveLog ran a full-table prune DELETE on every insert and
SaveCheck a COUNT + conditional prune on every check, so the hot path
amplified each write into several statements. Nothing tracked these
goroutines, so at shutdown they raced the store's Close() — writes to a
closing DB, silently swallowed.

Introduce a single writer goroutine that drains a buffered channel of
typed dbWrite values (log/check/state-change/alert-health). Writes are
enqueued non-blocking; a saturated queue drops and notes it in the
in-memory log rather than blocking the check loop. Write errors are now
logged instead of discarded. Retention moves off the hot path: SaveLog
and SaveCheck become plain INSERTs, and PruneLogs/PruneCheckHistory/
PruneStateChanges run on a 10-minute timer inside the writer (single
keep-newest-N-per-site pass via a window function). state_changes was
previously never pruned — now bounded.

Add Engine.Stop(): cancels the engine's context, then waits for the
writer to drain every buffered write before returning. main wires it in
before the deferred store Close() so no write races a closed DB.

SQLite gains busy_timeout=5000 and synchronous=NORMAL, applied via the
DSN so every pooled connection inherits them (a post-open PRAGMA only
touches one connection); WAL moves to the DSN too. :memory: test DBs are
left as-is.

Tests: writer drains on Stop, Stop is idempotent, and the prune queries
keep newest-N per site / N logs on real SQLite. Full suite green under
-race.
2026-06-10 17:11:12 -04:00
+1 -4
View File
@@ -392,10 +392,7 @@ func (e *Engine) removeFromTokenIndex(id int) {
}
func (e *Engine) Start(ctx context.Context) {
// e.cancel is invoked by Stop() to drain and halt the writer; gosec can't
// trace the cross-method call, and cancelling the parent reaps this child
// regardless, so the leak it warns about can't occur.
ctx, e.cancel = context.WithCancel(ctx) //nolint:gosec // cancel is called in Stop()
ctx, e.cancel = context.WithCancel(ctx)
e.writerWG.Add(1)
go e.dbWriter(ctx)