Wire up monitor detail view in TUI with type-specific icons.
Add SQLite WAL glob to gitignore. Extend store interface with
bulk-fetch and history queries for the detail panel.
Track alert delivery health at runtime:
- AlertHealth struct: LastSendAt, LastSendOK, LastError, SendCount, FailCount
- triggerAlert records success/failure after each Send()
- Health data exposed via GetAlertHealth() for TUI
Alerts tab enriched:
- Health dot column: green (OK), red (failed), gray (never sent)
- LAST SENT column: relative time ("2m ago", "never")
- [t] key sends test notification through selected channel
Inspired by Grafana's contact point health columns.
Push monitors no longer lie about status:
- PENDING stays until first heartbeat (no auto-promote to UP)
- LATE state (amber) when overdue but within grace period
- DOWN only after grace period expires
- Grace period = interval/2, minimum 60s
RecordHeartbeat now handles all transitions:
- PENDING → UP (first heartbeat, logged)
- LATE → UP (late arrival, logged)
- DOWN → UP (recovery, alert + state change persisted)
TUI updates:
- LATE rendered in amber/warning color
- Status bar shows LATE count separately
- Tab badge shows ⚠ for late monitors
- Sort order: DOWN > LATE > UP > PENDING > PAUSED
- Detail panel shows error for LATE monitors
Inspired by Healthchecks.io state machine (new/up/grace/down).
Propagate check failure reasons through the entire stack:
- Checker captures specific errors (DNS, timeout, HTTP status, SSL, etc.)
- Engine tracks LastError, StatusChangedAt, LastSuccessAt per monitor
- State transitions persisted to new state_changes table
- Detail panel shows error reason, HTTP code, state duration, last
success time, and last 5 state change events
- Monitor table shows inline error preview for DOWN services
- Alert messages include error reason
- Probe nodes forward error reasons to leader
15 files changed across models, checker, engine, store, TUI, and probes.
- Redact PostgreSQL DSN password from stdout/logs
- Harden .dockerignore to exclude .ssh/, .claude/, *.db, *.local files
- SSRF protection: block private/loopback/link-local IPs by default
(UPTOP_ALLOW_PRIVATE_TARGETS=true to override for homelab use)
- Fix email header injection via CRLF in monitor names
- AES-256-GCM encryption for alert credentials at rest
(UPTOP_ENCRYPTION_KEY env var, migrate-secrets subcommand)
- TLS support for HTTP server (UPTOP_TLS_CERT/UPTOP_TLS_KEY)
with HSTS header when TLS enabled
55 tests covering state machine transitions, heartbeat handling, push
deadline checks, group aggregation, history recording, probe aggregation,
log management, state management, and concurrency safety.
Checker tests cover HTTP (via httptest), port (via net.Listen),
isCodeAccepted ranges, and siteTimeout defaults. Ping and DNS
checkers skipped (need ICMP privileges and DNS server).
Coverage: 64.2% overall, 100% on handleStatusChange, triggerAlert,
checkPush, recordCheck, and AggregateStatus.
Monitors with the same interval no longer fire simultaneously.
Each tick adds up to 10% random jitter. Initial checks stagger
over 0-3s to avoid thundering herd on startup.
Provider.Send now accepts context.Context for timeout/cancellation.
HTTPProvider and NtfyProvider use NewRequestWithContext so HTTP alerts
respect the 30s deadline. triggerAlert logs send failures and config
load errors instead of silently swallowing them.
Group status now treats maintenance'd children like paused ones —
they're excluded from the UP/DOWN calculation. Prevents group from
showing DOWN when its only failing child is under maintenance.
Maintenance windows suppress alerts during planned downtime while checks
continue running. Incidents provide informational tracking. Supports
targeting all monitors, single monitor, or group (applies to children).
New Maint tab in TUI with create/end/delete. Status page, JSON API, and
Prometheus metrics all reflect maintenance state.
Phase 2 of distributed probing:
- Extract check logic into standalone RunCheck() for use by probes
- Add probe cluster mode: stateless nodes that fetch assignments, execute
checks, and report results to the leader
- Add multi-node result aggregation with configurable strategy
(any-down, majority-down, all-down)
- Leader ingests probe results into engine live state and triggers alerts
- New env vars: UPKEEP_NODE_ID, UPKEEP_NODE_NAME, UPKEEP_NODE_REGION,
UPKEEP_AGG_STRATEGY
- Example docker-compose.probe.yml with leader + 2 regional probes
Replace all monitor package-level mutable state with Engine struct.
All state (liveState, logStore, histories, tokenIndex, HTTP clients)
is now encapsulated in Engine, created via NewEngine(store).
Key changes:
- Engine struct holds all monitor state with proper mutex protection
- Engine.Start(ctx) and monitorRoutine respect context cancellation
for graceful shutdown — no more leaked goroutines
- cluster.runFollowerLoop also respects context for clean exit
- Token index (map[string]int) for O(1) push heartbeat lookup,
replacing O(n) linear scan through LiveState
- UpdateSiteConfig preserves 8 runtime fields instead of copying
17 config fields individually
- triggerAlert goroutines get 30s timeout context
- All consumers (TUI, server, cluster, main) receive *Engine via
constructor/parameter — no package-level state access
- main.go creates context.WithCancel, passes to engine and cluster
First test suite: 12 tests across store and alert packages
- Store: CRUD for sites/alerts/users, push token generation,
import/export round-trip, check history persistence
- Alert: Discord/Slack/Webhook payload format, HTTP 4xx error
propagation, Ntfy headers, unknown provider returns nil
Remove store.Get()/SetGlobal()/Current. Store is now passed explicitly
to all consumers via constructor parameters and function arguments.
- TUI Model holds store field, set via InitialModel(isAdmin, store)
- monitor.StartEngine(s) and InitHistoryFromStore(s) accept store
- server.Start(cfg, s) closes over store in HTTP handlers
- main.go threads store to SSH server, TUI, monitor, server
- isKeyAllowed receives store as parameter
No more hidden dependency on package-level mutable state in store pkg.
Monitor package still uses package-level state (LiveState, etc.) — will
be encapsulated into Engine struct in Phase 7.
Every Store method now returns an error. Callers handle errors
gracefully — TUI logs to event log, server returns HTTP 500,
monitor engine logs and retries. All rows.Scan() errors are now
checked in sqlstore.go instead of silently appending corrupt data.
- GetSites, GetAllAlerts, GetAllUsers return ([]T, error)
- GetAlert returns (AlertConfig, error) instead of (AlertConfig, bool)
- AddSite, UpdateSite, DeleteSite, etc. all return error
- SaveCheck, LoadAllHistory, ExportData return error
- ~25 caller sites updated across tui, server, monitor, main
- Move status page template to package-level template.Must (panic on
parse error at init instead of nil deref at runtime)
- Fix XSS in import error responses (log detail server-side, return
generic message to client)
- Handle ListenAndServe errors in HTTP and SSH servers
- Use defer resp.Body.Close() in all alert providers, check
json.Marshal errors
- Share HTTP clients across checks instead of creating per-request
- Use http.NewRequestWithContext for per-site timeout control
- Support HTTP method field (was always GET despite DB storing method)
- Implement AcceptedCodes validation (was hardcoded >= 400 despite DB
storing accepted code ranges)
- Add defer tx.Rollback() to ImportData for transaction safety
Groups act as visual organizers in the sites table. Monitors can be
assigned to a parent group via the form. Group rows show aggregated
worst-child status, children render with tree chars (├/└), and Space
toggles collapse/expand. Group form hides irrelevant connection and
advanced sections.
Prevent accidental deletes with y/n confirmation dialog. Validate all
numeric form inputs (interval, port, timeout, threshold, retries) with
range checks instead of silently defaulting to zero. Escape user-supplied
data in status page JavaScript to close XSS via monitor names. Persist
check history to new check_history table so sparklines and uptime
percentages survive restarts.
Per-site pause: [p] key toggles pause for selected monitor in TUI.
Paused monitors skip checks, persist to DB, show on status page.
Status page: replace full-page reload with fetch-based DOM updates
to eliminate scroll-jump on refresh. Add summary bar (UP/DOWN/PAUSED
counts), stale-data indicator, and fix SSL EXP CSS class bug.
TUI: constrain tables to terminal width via lipgloss .Width() to
prevent row wrapping that pushed header off-screen. Add MaxHeight
safety net. Bump subtle style from #383838 to #565f89 for
readability on dark terminals.
Implement checkPing (pro-bing ICMP), checkPort (TCP dial), and checkDNS
(miekg/dns) with per-monitor timeout, configurable DNS record types,
and fallback defaults. Groups skip checks entirely.
Add Hostname, Port, Timeout, Method, Description, ParentID, AcceptedCodes,
DNSResolveType, DNSServer, and IgnoreTLS fields. Refactor AddSite/UpdateSite
to accept models.Site instead of individual params. Includes DB migrations
for existing databases, per-monitor timeout/TLS in the engine, new type
options in TUI forms, and TYPE column in the sites table.