uptop

Author	SHA1	Message	Date
lerko	03cbe283df	chore(tui): polish demo + regenerate screenshots Rework the VHS demo so the README screenshots actually entice a download. Demo data / tooling: - seed.yaml: real, reachable service URLs (detail now shows nextcloud.com, not example.com); Auth Portal -> non-resolving home.arpa host so it reads as a believable, reliably-DOWN monitor - backfill: transient outages for Nextcloud/Jellyfin/Immich aligned with their state changes (uptime % now matches); log timestamps derived from now so the Logs view reads chronologically; real SSL warning; three probe nodes across regions; seeded alert send health - demo.tape: shorter warm-up, added Nodes + theme captures, ordered so every shot stays inside the 60s node-freshness window (consistent probe count) - vhs/crop: new tool to trim the empty terminal border around each screenshot - setup.sh: build backfill up front for deterministic timing; UPTOP_DEMO=1 Supporting code: - persist alert send health (new alert_health table, load on startup, best-effort save on send) so health/last-sent survive restarts - latency Min/Avg/Max ignore failed checks (no more "Min 0ms") - correct "probe"/"probes" pluralization - stable status dot instead of an animated spinner under UPTOP_DEMO	2026-05-28 22:32:45 -04:00
lerko	bc3a44beac	feat: show error reason when monitors go DOWN Propagate check failure reasons through the entire stack: - Checker captures specific errors (DNS, timeout, HTTP status, SSL, etc.) - Engine tracks LastError, StatusChangedAt, LastSuccessAt per monitor - State transitions persisted to new state_changes table - Detail panel shows error reason, HTTP code, state duration, last success time, and last 5 state change events - Monitor table shows inline error preview for DOWN services - Alert messages include error reason - Probe nodes forward error reasons to leader 15 files changed across models, checker, engine, store, TUI, and probes.	2026-05-27 19:32:30 -04:00
lerko	9d12e3ecf1	chore: complete rename from go-upkeep to uptop - Module path: gitea.lerkolabs.com/lerko/uptop - Binary: cmd/uptop/ - All imports updated to full module path - Env vars: UPKEEP_* → UPTOP_* - Prometheus metrics: upkeep_* → uptop_* - Default DB: uptop.db - Docker image: lerko/uptop - All docs, compose files, CI updated Only remaining "go-upkeep" reference is the fork attribution in README.	2026-05-24 20:20:35 -04:00
lerko	ae141c62ba	fix(store): replace panic with error return, handle unmarshal errors generateToken() now returns (string, error) instead of panicking on crypto/rand failure. All json.Unmarshal calls for alert settings now check and propagate errors instead of silently ignoring them. Adds Close() to Store interface for graceful shutdown support. Skips malformed notification entries during Kuma import.	2026-05-23 13:15:39 -04:00
lerko	e84b64f8ed	feat(tui): zebra striping, detail breadcrumb, sparkline stats, collapse persistence Add alternating row backgrounds for easier table scanning. Detail panel now shows breadcrumb path (Sites > Group > Name) and min/avg/max latency stats below the sparkline. Group collapse state persists across restarts via new preferences table in both SQLite and Postgres.	2026-05-22 20:53:23 -04:00
lerko	b146f34d19	feat: add incident management and maintenance windows Maintenance windows suppress alerts during planned downtime while checks continue running. Incidents provide informational tracking. Supports targeting all monitors, single monitor, or group (applies to children). New Maint tab in TUI with create/end/delete. Status page, JSON API, and Prometheus metrics all reflect maintenance state.	2026-05-22 18:45:02 -04:00
lerko	ed082e4080	feat: persist logs to DB, load on startup	2026-05-16 15:25:08 -04:00
lerko	ca9faa0acd	feat(cluster): add distributed probing foundation — schema, models, and probe APIs Add node-aware check history and probe registration infrastructure: - ProbeNode model and nodes table (SQLite + Postgres) - node_id column on check_history for multi-source tracking - Store interface: RegisterNode, GetNode, GetAllNodes, DeleteNode, SaveCheckFromNode - Dialect: UpsertNodeSQL (INSERT OR REPLACE / ON CONFLICT) - API endpoints: POST /api/probe/register, GET /api/probe/assignments, POST /api/probe/results - Backward compatible: existing SaveCheck wraps SaveCheckFromNode with empty node_id	2026-05-16 11:05:06 -04:00
lerko	5b01b9ee30	feat(config): add config-as-code YAML import/export Add declarative config-as-code support via YAML files. Monitors and alerts can be exported, version controlled, and applied across instances. - goupkeep export [-o file.yaml] dumps current state - goupkeep apply -f file.yaml creates/updates to match desired state - --dry-run shows planned changes without applying - --prune deletes monitors/alerts not in the YAML - Matching by name, alert references by name, nested group children - CLI refactored to subcommands (apply, export, serve) with backward compat - 24 tests covering apply, export, validation, round-trip idempotency	2026-05-15 20:40:49 -04:00
lerko	b7b8aa6f03	feat(metrics): add Prometheus /metrics endpoint Zero-dependency Prometheus text exposition format. Exposes monitor up/down, latency, status code, check timestamps, pause state, SSL cert expiry, and check counters — all from in-memory state.	2026-05-15 11:26:21 -04:00

10 Commits