feat/next: alert providers, prometheus metrics, core refactors #6

Merged
lerko merged 10 commits from feat/next into develop 2026-05-15 19:35:12 +00:00
Owner

Summary

  • Core refactors: store dialect unification, global singleton removal, engine encapsulation, error returns on Store interface, shared table rendering, shared HTTPProvider
  • 4 new alert providers: Telegram, PagerDuty, Pushover, Gotify (5 → 9 total)
  • Prometheus /metrics endpoint: zero-dependency text format, all monitor state
  • Correctness fixes: graceful shutdown, cursor bounds, robustness across subsystems

Test plan

  • go test ./...
  • TUI navigation, monitor CRUD, alert CRUD
  • New alert provider forms (Telegram, PagerDuty, Pushover, Gotify)
  • Curl /metrics for Prometheus output
  • Status page still renders
## Summary - **Core refactors**: store dialect unification, global singleton removal, engine encapsulation, error returns on Store interface, shared table rendering, shared HTTPProvider - **4 new alert providers**: Telegram, PagerDuty, Pushover, Gotify (5 → 9 total) - **Prometheus `/metrics` endpoint**: zero-dependency text format, all monitor state - **Correctness fixes**: graceful shutdown, cursor bounds, robustness across subsystems ## Test plan - [x] `go test ./...` - [x] TUI navigation, monitor CRUD, alert CRUD - [x] New alert provider forms (Telegram, PagerDuty, Pushover, Gotify) - [x] Curl `/metrics` for Prometheus output - [x] Status page still renders
lerko added 10 commits 2026-05-15 19:34:26 +00:00
- Move status page template to package-level template.Must (panic on
  parse error at init instead of nil deref at runtime)
- Fix XSS in import error responses (log detail server-side, return
  generic message to client)
- Handle ListenAndServe errors in HTTP and SSH servers
- Use defer resp.Body.Close() in all alert providers, check
  json.Marshal errors
- Share HTTP clients across checks instead of creating per-request
- Use http.NewRequestWithContext for per-site timeout control
- Support HTTP method field (was always GET despite DB storing method)
- Implement AcceptedCodes validation (was hardcoded >= 400 despite DB
  storing accepted code ranges)
- Add defer tx.Rollback() to ImportData for transaction safety
Extract shared SQLStore with Dialect interface for the ~5% that
differs between backends (DDL, placeholders, sequence resets).

- New dialect.go: Dialect interface + placeholder rewriter (? → $N)
- New sqlstore.go: single implementation of all 19 Store methods
- sqlite.go: reduced from 286 to 83 lines (SQLiteDialect only)
- postgres.go: reduced from 266 to 78 lines (PostgresDialect only)
- main.go: use NewSQLiteStore/NewPostgresStore constructors

Zero CRUD logic duplication. Every future schema change written once.
Every Store method now returns an error. Callers handle errors
gracefully — TUI logs to event log, server returns HTTP 500,
monitor engine logs and retries. All rows.Scan() errors are now
checked in sqlstore.go instead of silently appending corrupt data.

- GetSites, GetAllAlerts, GetAllUsers return ([]T, error)
- GetAlert returns (AlertConfig, error) instead of (AlertConfig, bool)
- AddSite, UpdateSite, DeleteSite, etc. all return error
- SaveCheck, LoadAllHistory, ExportData return error
- ~25 caller sites updated across tui, server, monitor, main
Remove store.Get()/SetGlobal()/Current. Store is now passed explicitly
to all consumers via constructor parameters and function arguments.

- TUI Model holds store field, set via InitialModel(isAdmin, store)
- monitor.StartEngine(s) and InitHistoryFromStore(s) accept store
- server.Start(cfg, s) closes over store in HTTP handlers
- main.go threads store to SSH server, TUI, monitor, server
- isKeyAllowed receives store as parameter

No more hidden dependency on package-level mutable state in store pkg.
Monitor package still uses package-level state (LiveState, etc.) — will
be encapsulated into Engine struct in Phase 7.
Discord, Slack, and Webhook providers now use a single HTTPProvider
struct with a PayloadFunc for the only part that differs. Centralizes
response body handling and adds HTTP status code checking (4xx/5xx
now return errors instead of being silently ignored).

Email and Ntfy keep separate implementations (different protocols).
Adding a new HTTP-based alert provider is now a one-line PayloadFunc.
- New table_helpers.go with renderTable() and shared styles
- Remove 4 duplicated style blocks (header/cell/selected/border)
  from tab_alerts.go and tab_users.go
- All 3 tab views now use renderTable() for offset/end calc,
  selected row highlighting, and table construction
- Sites tab keeps siteGroupStyle via StyleOverride callback
- Clamp cursor to list length at end of refreshData() to prevent
  index-out-of-bounds after concurrent list changes
- Fix off-by-one in tab click handler (i <= maxTabs → i < tabCount)
Replace all monitor package-level mutable state with Engine struct.
All state (liveState, logStore, histories, tokenIndex, HTTP clients)
is now encapsulated in Engine, created via NewEngine(store).

Key changes:
- Engine struct holds all monitor state with proper mutex protection
- Engine.Start(ctx) and monitorRoutine respect context cancellation
  for graceful shutdown — no more leaked goroutines
- cluster.runFollowerLoop also respects context for clean exit
- Token index (map[string]int) for O(1) push heartbeat lookup,
  replacing O(n) linear scan through LiveState
- UpdateSiteConfig preserves 8 runtime fields instead of copying
  17 config fields individually
- triggerAlert goroutines get 30s timeout context
- All consumers (TUI, server, cluster, main) receive *Engine via
  constructor/parameter — no package-level state access
- main.go creates context.WithCancel, passes to engine and cluster

First test suite: 12 tests across store and alert packages
- Store: CRUD for sites/alerts/users, push token generation,
  import/export round-trip, check history persistence
- Alert: Discord/Slack/Webhook payload format, HTTP 4xx error
  propagation, Ntfy headers, unknown provider returns nil
Expand alert provider count from 5 to 9. All new providers use
the shared HTTPProvider with closure-based payload functions.
Includes TUI form support and tests for each provider.
Zero-dependency Prometheus text exposition format. Exposes monitor
up/down, latency, status code, check timestamps, pause state,
SSL cert expiry, and check counters — all from in-memory state.
Reviewed-on: lerko/uptime#5
lerko merged commit 4ebba64ba1 into develop 2026-05-15 19:35:12 +00:00
lerko deleted branch feat/next 2026-05-15 19:35:14 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: lerkolabs/uptop#6