8b39d4c1a1
Every check spawned `go e.db.Save*(...)` with the error discarded: a fire-and-forget goroutine per log line, check, state change, and alert-health update. SaveLog ran a full-table prune DELETE on every insert, and SaveCheck ran a COUNT plus a conditional prune on every check, so the hot path amplified each write into several statements. Nothing tracked these goroutines, so at shutdown they raced the store's Close(): writes hit a closing DB and failed silently.

Introduce a single writer goroutine that drains a buffered channel of typed dbWrite values (log/check/state-change/alert-health). Writes are enqueued without blocking; when the queue is saturated, the write is dropped and the drop is noted in the in-memory log rather than blocking the check loop. Write errors are now logged instead of discarded.

Retention moves off the hot path: SaveLog and SaveCheck become plain INSERTs, and PruneLogs/PruneCheckHistory/PruneStateChanges run on a 10-minute timer inside the writer (a single keep-newest-N-per-site pass via a window function). state_changes was previously never pruned; it is now bounded.

Add Engine.Stop(): it cancels the engine's context, then waits for the writer to drain every buffered write before returning. main wires it in so that it runs before the deferred store Close(), and no write races a closed DB.

SQLite gains busy_timeout=5000 and synchronous=NORMAL, applied via the DSN so every pooled connection inherits them (a PRAGMA issued after open only reaches one connection); WAL moves to the DSN too. :memory: test DBs are left as-is.

Tests: the writer drains on Stop, Stop is idempotent, and the prune queries keep the newest N rows per site and the newest N logs on real SQLite. Full suite green under -race.
47 lines
1.4 KiB
Go
package monitor

import (
	"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
	"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
)

// dbWrite is a single unit of deferred persistence. The engine enqueues these
// onto a buffered channel; a single writer goroutine drains and executes them,
// serializing all writes through that one goroutine and surfacing errors instead of
// discarding them. desc names the write for diagnostics on drop/failure.
type dbWrite interface {
	exec(s store.Store) error
	desc() string
}

type writeLog struct{ message string }

func (w writeLog) exec(s store.Store) error { return s.SaveLog(w.message) }
func (w writeLog) desc() string { return "log" }

type writeCheck struct {
	siteID    int
	latencyNs int64
	isUp      bool
}

func (w writeCheck) exec(s store.Store) error { return s.SaveCheck(w.siteID, w.latencyNs, w.isUp) }
func (w writeCheck) desc() string { return "check" }

type writeStateChange struct {
	siteID     int
	fromStatus string
	toStatus   string
	reason     string
}

func (w writeStateChange) exec(s store.Store) error {
	return s.SaveStateChange(w.siteID, w.fromStatus, w.toStatus, w.reason)
}
func (w writeStateChange) desc() string { return "state-change" }

type writeAlertHealth struct{ rec models.AlertHealthRecord }

func (w writeAlertHealth) exec(s store.Store) error { return s.SaveAlertHealth(w.rec) }
func (w writeAlertHealth) desc() string { return "alert-health" }
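Because every write type implements `dbWrite`, the writer can execute and report them uniformly without a type switch. The sketch below is self-contained, so it redeclares trimmed copies of the interface and two write types against a hypothetical two-method `store` interface. It adds a hypothetical `fakeStore` that rejects every check and uses a slice in place of the buffered channel. It illustrates the pattern and is not the project's writer loop.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical two-method subset of the real store.Store.
type store interface {
	SaveLog(message string) error
	SaveCheck(siteID int, latencyNs int64, isUp bool) error
}

// dbWrite mirrors the file above, with exec taking the local store type.
type dbWrite interface {
	exec(s store) error
	desc() string
}

type writeLog struct{ message string }

func (w writeLog) exec(s store) error { return s.SaveLog(w.message) }
func (w writeLog) desc() string { return "log" }

type writeCheck struct {
	siteID    int
	latencyNs int64
	isUp      bool
}

func (w writeCheck) exec(s store) error { return s.SaveCheck(w.siteID, w.latencyNs, w.isUp) }
func (w writeCheck) desc() string { return "check" }

// fakeStore accepts logs and rejects every check, so both outcomes show up.
type fakeStore struct{ logs []string }

func (f *fakeStore) SaveLog(message string) error {
	f.logs = append(f.logs, message)
	return nil
}

func (f *fakeStore) SaveCheck(int, int64, bool) error { return errors.New("database is locked") }

// drain runs each write in order and reports failures by desc() instead of
// dropping them; dbWrite carries the behavior, so no type switch is needed.
func drain(s store, writes []dbWrite) []string {
	var failures []string
	for _, w := range writes {
		if err := w.exec(s); err != nil {
			failures = append(failures, fmt.Sprintf("db write %s failed: %v", w.desc(), err))
		}
	}
	return failures
}

func main() {
	s := &fakeStore{}
	failures := drain(s, []dbWrite{
		writeLog{message: "site 1 went down"},
		writeCheck{siteID: 1, latencyNs: 1_500_000, isUp: false},
	})
	fmt.Println(s.logs)   // [site 1 went down]
	fmt.Println(failures) // [db write check failed: database is locked]
}
```

With this shape, adding a new kind of deferred write means adding one small type with `exec` and `desc`; the drain loop itself does not change.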