uptop

Author	SHA1	Message	Date
lerko	03cbe283df	chore(tui): polish demo + regenerate screenshots Rework the VHS demo so the README screenshots actually entice a download. Demo data / tooling: - seed.yaml: real, reachable service URLs (detail now shows nextcloud.com, not example.com); Auth Portal -> non-resolving home.arpa host so it reads as a believable, reliably-DOWN monitor - backfill: transient outages for Nextcloud/Jellyfin/Immich aligned with their state changes (uptime % now matches); log timestamps derived from now so the Logs view reads chronologically; real SSL warning; three probe nodes across regions; seeded alert send health - demo.tape: shorter warm-up, added Nodes + theme captures, ordered so every shot stays inside the 60s node-freshness window (consistent probe count) - vhs/crop: new tool to trim the empty terminal border around each screenshot - setup.sh: build backfill up front for deterministic timing; UPTOP_DEMO=1 Supporting code: - persist alert send health (new alert_health table, load on startup, best-effort save on send) so health/last-sent survive restarts - latency Min/Avg/Max ignore failed checks (no more "Min 0ms") - correct "probe"/"probes" pluralization - stable status dot instead of an animated spinner under UPTOP_DEMO	2026-05-28 22:32:45 -04:00
lerko	5dc31108f8	feat: proper push monitor lifecycle — PENDING, LATE, DOWN states Push monitors no longer lie about status: - PENDING stays until first heartbeat (no auto-promote to UP) - LATE state (amber) when overdue but within grace period - DOWN only after grace period expires - Grace period = interval/2, minimum 60s RecordHeartbeat now handles all transitions: - PENDING → UP (first heartbeat, logged) - LATE → UP (late arrival, logged) - DOWN → UP (recovery, alert + state change persisted) TUI updates: - LATE rendered in amber/warning color - Status bar shows LATE count separately - Tab badge shows ⚠ for late monitors - Sort order: DOWN > LATE > UP > PENDING > PAUSED - Detail panel shows error for LATE monitors Inspired by Healthchecks.io state machine (new/up/grace/down).	2026-05-27 19:56:50 -04:00
lerko	bc3a44beac	feat: show error reason when monitors go DOWN Propagate check failure reasons through the entire stack: - Checker captures specific errors (DNS, timeout, HTTP status, SSL, etc.) - Engine tracks LastError, StatusChangedAt, LastSuccessAt per monitor - State transitions persisted to new state_changes table - Detail panel shows error reason, HTTP code, state duration, last success time, and last 5 state change events - Monitor table shows inline error preview for DOWN services - Alert messages include error reason - Probe nodes forward error reasons to leader 15 files changed across models, checker, engine, store, TUI, and probes.	2026-05-27 19:32:30 -04:00
lerko	9d12e3ecf1	chore: complete rename from go-upkeep to uptop - Module path: gitea.lerkolabs.com/lerko/uptop - Binary: cmd/uptop/ - All imports updated to full module path - Env vars: UPKEEP_* → UPTOP_* - Prometheus metrics: upkeep_* → uptop_* - Default DB: uptop.db - Docker image: lerko/uptop - All docs, compose files, CI updated Only remaining "go-upkeep" reference is the fork attribution in README.	2026-05-24 20:20:35 -04:00
lerko	94296e8286	test(monitor): add comprehensive test suite for engine and checkers 55 tests covering state machine transitions, heartbeat handling, push deadline checks, group aggregation, history recording, probe aggregation, log management, state management, and concurrency safety. Checker tests cover HTTP (via httptest), port (via net.Listen), isCodeAccepted ranges, and siteTimeout defaults. Ping and DNS checkers skipped (need ICMP privileges and DNS server). Coverage: 64.2% overall, 100% on handleStatusChange, triggerAlert, checkPush, recordCheck, and AggregateStatus.	2026-05-23 21:06:28 -04:00

5 Commits