feat: alert providers, prometheus metrics, and core refactors #4

Closed
lerko wants to merge 0 commits from feat/prometheus-metrics into main
Owner
No description provided.
lerko added 16 commits 2026-05-15 15:32:47 +00:00
Per-site pause: [p] key toggles pause for selected monitor in TUI.
Paused monitors skip checks, persist to DB, show on status page.

Status page: replace full-page reload with fetch-based DOM updates
to eliminate scroll-jump on refresh. Add summary bar (UP/DOWN/PAUSED
counts), stale-data indicator, and fix SSL EXP CSS class bug.

TUI: constrain tables to terminal width via lipgloss .Width() to
prevent row wrapping that pushed header off-screen. Add MaxHeight
safety net. Bump subtle style from #383838 to #565f89 for
readability on dark terminals.
Reviewed-on: lerko/uptime#1
Prevent accidental deletes with y/n confirmation dialog. Validate all
numeric form inputs (interval, port, timeout, threshold, retries) with
range checks instead of silently defaulting to zero. Escape user-supplied
data in status page JavaScript to close XSS via monitor names. Persist
check history to new check_history table so sparklines and uptime
percentages survive restarts.
Display sequential # instead of internal database IDs in sites, alerts,
and users tables for a cleaner view without gaps from deleted records.
Groups act as visual organizers in the sites table. Monitors can be
assigned to a parent group via the form. Group rows show aggregated
worst-child status, children render with tree chars (├/└), and Space
toggles collapse/expand. Group form hides irrelevant connection and
advanced sections.
Reviewed-on: lerko/uptime#2
Use lipgloss StyleFunc to set per-column widths, with NAME as
the flex column absorbing remaining space. History column tied
to sparkWidth for consistency.
- Move status page template to package-level template.Must (panic on
  parse error at init instead of nil deref at runtime)
- Fix XSS in import error responses (log detail server-side, return
  generic message to client)
- Handle ListenAndServe errors in HTTP and SSH servers
- Use defer resp.Body.Close() in all alert providers, check
  json.Marshal errors
- Share HTTP clients across checks instead of creating per-request
- Use http.NewRequestWithContext for per-site timeout control
- Support HTTP method field (was always GET despite DB storing method)
- Implement AcceptedCodes validation (was hardcoded >= 400 despite DB
  storing accepted code ranges)
- Add defer tx.Rollback() to ImportData for transaction safety
Extract shared SQLStore with Dialect interface for the ~5% that
differs between backends (DDL, placeholders, sequence resets).

- New dialect.go: Dialect interface + placeholder rewriter (? → $N)
- New sqlstore.go: single implementation of all 19 Store methods
- sqlite.go: reduced from 286 to 83 lines (SQLiteDialect only)
- postgres.go: reduced from 266 to 78 lines (PostgresDialect only)
- main.go: use NewSQLiteStore/NewPostgresStore constructors

Zero CRUD logic duplication. Every future schema change written once.
Every Store method now returns an error. Callers handle errors
gracefully — TUI logs to event log, server returns HTTP 500,
monitor engine logs and retries. All rows.Scan() errors are now
checked in sqlstore.go instead of silently appending corrupt data.

- GetSites, GetAllAlerts, GetAllUsers return ([]T, error)
- GetAlert returns (AlertConfig, error) instead of (AlertConfig, bool)
- AddSite, UpdateSite, DeleteSite, etc. all return error
- SaveCheck, LoadAllHistory, ExportData return error
- ~25 caller sites updated across tui, server, monitor, main
Remove store.Get()/SetGlobal()/Current. Store is now passed explicitly
to all consumers via constructor parameters and function arguments.

- TUI Model holds store field, set via InitialModel(isAdmin, store)
- monitor.StartEngine(s) and InitHistoryFromStore(s) accept store
- server.Start(cfg, s) closes over store in HTTP handlers
- main.go threads store to SSH server, TUI, monitor, server
- isKeyAllowed receives store as parameter

No more hidden dependency on package-level mutable state in store pkg.
Monitor package still uses package-level state (LiveState, etc.) — will
be encapsulated into Engine struct in Phase 7.
Discord, Slack, and Webhook providers now use a single HTTPProvider
struct with a PayloadFunc for the only part that differs. Centralizes
response body handling and adds HTTP status code checking (4xx/5xx
now return errors instead of being silently ignored).

Email and Ntfy keep separate implementations (different protocols).
Adding a new HTTP-based alert provider is now a one-line PayloadFunc.
- New table_helpers.go with renderTable() and shared styles
- Remove 4 duplicated style blocks (header/cell/selected/border)
  from tab_alerts.go and tab_users.go
- All 3 tab views now use renderTable() for offset/end calc,
  selected row highlighting, and table construction
- Sites tab keeps siteGroupStyle via StyleOverride callback
- Clamp cursor to list length at end of refreshData() to prevent
  index-out-of-bounds after concurrent list changes
- Fix off-by-one in tab click handler (i <= maxTabs → i < tabCount)
Replace all monitor package-level mutable state with Engine struct.
All state (liveState, logStore, histories, tokenIndex, HTTP clients)
is now encapsulated in Engine, created via NewEngine(store).

Key changes:
- Engine struct holds all monitor state with proper mutex protection
- Engine.Start(ctx) and monitorRoutine respect context cancellation
  for graceful shutdown — no more leaked goroutines
- cluster.runFollowerLoop also respects context for clean exit
- Token index (map[string]int) for O(1) push heartbeat lookup,
  replacing O(n) linear scan through LiveState
- UpdateSiteConfig preserves 8 runtime fields instead of copying
  17 config fields individually
- triggerAlert goroutines get 30s timeout context
- All consumers (TUI, server, cluster, main) receive *Engine via
  constructor/parameter — no package-level state access
- main.go creates context.WithCancel, passes to engine and cluster

First test suite: 12 tests across store and alert packages
- Store: CRUD for sites/alerts/users, push token generation,
  import/export round-trip, check history persistence
- Alert: Discord/Slack/Webhook payload format, HTTP 4xx error
  propagation, Ntfy headers, unknown provider returns nil
Expand alert provider count from 5 to 9. All new providers use
the shared HTTPProvider with closure-based payload functions.
Includes TUI form support and tests for each provider.
Zero-dependency Prometheus text exposition format. Exposes monitor
up/down, latency, status code, check timestamps, pause state,
SSL cert expiry, and check counters — all from in-memory state.
lerko closed this pull request 2026-05-15 15:34:29 +00:00

Pull request closed

Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: lerkolabs/uptop#4