uptop

Author	SHA1	Message	Date
lerko	92efb8e270	fix(security): make SSH key revocation fail closed keyCache.Invalidate existed but had zero callers, and refresh silently swallowed store errors, so a revoked key kept working off the stale cache for as long as the DB stayed down. Invalidate now clears the key set (not just the timestamp) and is wired through userInvalidatingStore, a decorator at the composition root that drops the cache on AddUser/UpdateUser/DeleteUser/ImportData. Transient refresh errors still retain the previous key set so a DB blip can't lock every admin out, but a post-revocation refresh failure denies. Refresh errors are logged. First tests for the SSH auth gate. Also suppresses per-request HTTP logging when the local TUI owns the terminal, because request logs scribbled over the alt screen.	2026-06-11 12:26:40 -04:00
lerko	809620340e	fix(security): close XFF bypass and three secret-leak paths Four fixes hardening the secrets and rate-limit posture, covering gaps that a prior audit left open or that have since regressed: X-Forwarded-For rate-limit bypass + memory DoS (ratelimit.go): clientIP returned the raw XFF header, so an attacker rotating it minted unlimited distinct limiter keys, never tripping the limit and growing the visitors map without bound. XFF is now honored only when the immediate peer is a configured trusted proxy (UPTOP_TRUSTED_PROXIES, CIDRs or bare IPs), using the right-most non-trusted hop; otherwise the key is the real RemoteAddr. The visitors map is bounded with LRU eviction as defense in depth. Export redaction denylist -> per-provider allowlist (server.go): the old six-key denylist missed the actual credentials (the webhook URL for discord/slack/webhook/ntfy/gotify and api_key for opsgenie), exporting them in the clear. redactByProvider keeps only known-safe keys per provider type and redacts everything else, so unknown/new keys fail safe. ImportData plaintext secrets (sqlstore.go): import inserted raw json.Marshal(settings), bypassing the encryption AddAlert/UpdateAlert use. It now routes through marshalSettings, so a restore with UPTOP_ENCRYPTION_KEY set stores enc:-prefixed ciphertext, not plaintext. Alert error credential leak (alert.go): provider Send returned the raw *url.Error, whose URL carries the secret (Telegram bot token in the path, webhook secrets in the URL); it was persisted to AlertHealth.LastError and shown in the TUI. sanitizeError strips the URL, keeping the operation and underlying cause. Tests cover trusted/untrusted XFF + spoofed-bypass + map bound, the allowlist per provider, encrypted-on-import round-trip, and URL-stripped errors. README documents UPTOP_TRUSTED_PROXIES. Full suite green under -race; golangci-lint clean.	2026-06-10 18:50:19 -04:00
lerko	8b39d4c1a1	fix(monitor): serialize DB writes through a single drained writer Every check spawned `go e.db.Save*(...)` with the error discarded: a fire-and-forget goroutine per log line, check, state change, and alert health update. SaveLog ran a full-table prune DELETE on every insert and SaveCheck a COUNT + conditional prune on every check, so the hot path amplified each write into several statements. Nothing tracked these goroutines, so at shutdown they raced the store's Close(), producing writes to a closing DB that were silently swallowed. Introduce a single writer goroutine that drains a buffered channel of typed dbWrite values (log/check/state-change/alert-health). Writes are enqueued non-blocking; a saturated queue drops and notes it in the in-memory log rather than blocking the check loop. Write errors are now logged instead of discarded. Retention moves off the hot path: SaveLog and SaveCheck become plain INSERTs, and PruneLogs/PruneCheckHistory/PruneStateChanges run on a 10-minute timer inside the writer (single keep-newest-N-per-site pass via a window function). state_changes was previously never pruned; it is now bounded. Add Engine.Stop(): cancels the engine's context, then waits for the writer to drain every buffered write before returning. main wires it in before the deferred store Close() so no write races a closed DB. SQLite gains busy_timeout=5000 and synchronous=NORMAL, applied via the DSN so every pooled connection inherits them (a post-open PRAGMA only touches one connection); WAL moves to the DSN too. :memory: test DBs are left as-is. Tests: writer drains on Stop, Stop is idempotent, and the prune queries keep newest-N per site / N logs on real SQLite. Full suite green under -race.	2026-06-10 18:14:28 -04:00
lerko	21a1563e53	feat(monitor): auto-prune expired maintenance windows Background goroutine runs every 15 minutes, deletes maintenance windows that expired beyond the retention period (default 7 days). Configurable via UPTOP_MAINT_RETENTION env var (Go duration format). Closes #72	2026-06-05 18:27:42 -04:00
lerko	e0cb0adebd	fix(tui): quick wins batch (version footer, column widths, zebra, sparkline) - Show version in dashboard footer (wired from goreleaser ldflags) - Cap name column at 35, raise sparkline minimum to 15 chars - Preserve zebra background on group rows (was lost by style override) - Group sparkline uses bullet • instead of heavy circle ●	2026-06-04 14:56:01 -04:00
lerko	8f17deba67	chore: migrate module path to lerkolabs org Move Go module from gitea.lerkolabs.com/lerko/uptop to gitea.lerkolabs.com/lerkolabs/uptop. Updates all imports, go.mod, goreleaser owner, and README links.	2026-05-29 14:22:49 -04:00
lerko	026e969b74	chore: TUI screenshots, README polish, changelog rewrite (#32) - Add 6 TUI screenshots to assets/ (monitors, alerts, logs, nodes, detail, theme) - Rewrite README with hero image, badges, collapsible install sections - Rewrite changelog to match actual CalVer tag history - VHS tooling extracted to lerko/uptop-vhs Reviewed-on: lerko/uptop#32	2026-05-29 17:45:31 +00:00
lerko	d8a2cab90f	feat: seed SSH users from env var and authorized_keys file (#31) ## Summary Docker onboarding was broken: there was no way to add the first SSH user without `docker attach` to the TUI. Now reads SSH public keys from two sources on startup: - `UPTOP_ADMIN_KEY` env var: single key for quick single-user setup - `UPTOP_KEYS` file path: authorized_keys format for team setup Dockerfile already sets `UPTOP_KEYS=/data/authorized_keys` and compose mounts `./data:/data`, so the flow is: `echo "ssh-ed25519 AAAA... me@host" > ./data/authorized_keys`, then `docker compose up -d`, then `ssh -p 23234 localhost` ### Behavior - Skips keys already in DB (idempotent across restarts) - All seeded users get admin role - Username parsed from key comment (e.g. `tyler@macbook` → `tyler`) - Comments and blank lines in keys file are ignored ### Tested - UPTOP_ADMIN_KEY seeds single admin user - UPTOP_KEYS file seeds multiple users with correct usernames - Second startup skips existing keys (no duplicates) - Build and all tests pass Reviewed-on: lerko/uptop#31	2026-05-27 21:15:00 +00:00
lerko	986f9f1d55	fix(security): phase 4 code quality and low-severity fixes - Fix limitStr to handle multi-byte UTF-8 characters correctly - Sanitize log messages: strip ANSI escape sequences and newlines - URL-encode probe node_id instead of string concatenation - Fix follower resp.Body leak on non-200 responses - Make SSH host key path configurable via UPTOP_SSH_HOST_KEY env var - Add HTTP method checks on GET-only endpoints (405 for wrong methods) - Extract magic numbers into named constants across monitor/store/server - Standardize error output to stderr for all startup errors	2026-05-26 17:25:47 -04:00
lerko	bd561d9a5e	fix(security): phase 3 medium reliability and hardening - Fail hard on critical migration errors (ignore only "already exists") - Cache SSH user keys with 30s TTL (avoid DB query per auth attempt) - Configure DB connection pooling (25 open, 5 idle, 5m lifetime) - Enable SQLite WAL mode for concurrent read/write - Optimize check history pruning (only prune above 1100 rows) - Add security headers: X-Content-Type-Options, X-Frame-Options, CSP, Referrer-Policy - Add CORS policy on /status/json via UPTOP_CORS_ORIGIN env var - Add HTTP request logging middleware (method, path, status, duration, IP) - Fix config file permissions from 0644 to 0600 - Pin Docker images: golang:1.24-alpine3.21, alpine:3.21 - Fix Docker CI tag pattern for CalVer (was semver) - Pass build args (VERSION, COMMIT, BUILD_DATE) to Docker build	2026-05-26 16:57:03 -04:00
lerko	d30d1460bd	fix(security): phase 2 high-severity hardening - Push heartbeat accepts Authorization: Bearer header (query string deprecated) - Gotify alerts use X-Gotify-Key header instead of token in URL - Per-IP rate limiting on all API endpoints (token-bucket) - /metrics gated behind cluster secret (UPTOP_METRICS_PUBLIC=true to opt out) - Config export redacts passwords/tokens by default (redact_secrets=false to override) - Fix rewritePlaceholders for 100+ SQL parameters - Fix AddSiteReturningID/AddAlertReturningID race with LastInsertId/RETURNING - HTTP server timeouts: read 30s, write 60s, idle 120s	2026-05-25 21:15:33 -04:00
lerko	60b30935b3	fix(security): phase 1 critical fixes for public release - Redact PostgreSQL DSN password from stdout/logs - Harden .dockerignore to exclude .ssh/, .claude/, .db, .local files - SSRF protection: block private/loopback/link-local IPs by default (UPTOP_ALLOW_PRIVATE_TARGETS=true to override for homelab use) - Fix email header injection via CRLF in monitor names - AES-256-GCM encryption for alert credentials at rest (UPTOP_ENCRYPTION_KEY env var, migrate-secrets subcommand) - TLS support for HTTP server (UPTOP_TLS_CERT/UPTOP_TLS_KEY) with HSTS header when TLS enabled	2026-05-25 11:26:47 -04:00
lerko	9d12e3ecf1	chore: complete rename from go-upkeep to uptop - Module path: gitea.lerkolabs.com/lerko/uptop - Binary: cmd/uptop/ - All imports updated to full module path - Env vars: UPKEEP_* → UPTOP_* - Prometheus metrics: upkeep_* → uptop_* - Default DB: uptop.db - Docker image: lerko/uptop - All docs, compose files, CI updated Only remaining "go-upkeep" reference is the fork attribution in README.	2026-05-24 20:20:35 -04:00
lerko	8f9210b451	feat: add --version flag with build metadata injection Supports `goupkeep version`, `--version`, and `-v`. Prints version, commit hash, and build date when injected via ldflags. Shows "dev" for local builds. Dockerfile updated with ARGs for version injection.	2026-05-24 14:14:13 -04:00
lerko	359cff7292	chore: add golangci-lint config and fix all lint issues Add .golangci.yml enabling errcheck, staticcheck, govet, gosec, ineffassign, and unused linters. Fix 66 issues across 16 files: - Check all unchecked errors (errcheck) - Use HTTP status constants instead of numeric literals (staticcheck) - Replace deprecated LineUp/LineDown with ScrollUp/ScrollDown (staticcheck) - Convert sprintf+write patterns to fmt.Fprintf (staticcheck) - Add ReadHeaderTimeout to http.Server (gosec) - Remove unused types and functions (unused) - Add nolint comments for intentional patterns (InsecureSkipVerify, math/rand for jitter, dialect-only SQL formatting)	2026-05-23 22:02:06 -04:00
lerko	4891843c94	fix: graceful shutdown for HTTP, SSH servers and database HTTP and SSH servers now shut down cleanly on SIGINT/SIGTERM with a 30s timeout. Database connection closed via defer. Replaced log.Fatalf in SSH goroutine with log.Printf + ErrServerClosed check to prevent unclean process exits.	2026-05-23 13:23:27 -04:00
lerko	ed082e4080	feat: persist logs to DB, load on startup	2026-05-16 15:25:08 -04:00
lerko	ca5a42314f	feat(cluster): add probe execution mode, check extraction, and result aggregation Phase 2 of distributed probing: - Extract check logic into standalone RunCheck() for use by probes - Add probe cluster mode: stateless nodes that fetch assignments, execute checks, and report results to the leader - Add multi-node result aggregation with configurable strategy (any-down, majority-down, all-down) - Leader ingests probe results into engine live state and triggers alerts - New env vars: UPKEEP_NODE_ID, UPKEEP_NODE_NAME, UPKEEP_NODE_REGION, UPKEEP_AGG_STRATEGY - Example docker-compose.probe.yml with leader + 2 regional probes	2026-05-16 11:19:57 -04:00
lerko	5b01b9ee30	feat(config): add config-as-code YAML import/export Add declarative config-as-code support via YAML files. Monitors and alerts can be exported, version controlled, and applied across instances. - goupkeep export [-o file.yaml] dumps current state - goupkeep apply -f file.yaml creates/updates to match desired state - --dry-run shows planned changes without applying - --prune deletes monitors/alerts not in the YAML - Matching by name, alert references by name, nested group children - CLI refactored to subcommands (apply, export, serve) with backward compat - 24 tests covering apply, export, validation, round-trip idempotency	2026-05-15 20:40:49 -04:00
lerko	f023e38fdc	refactor(monitor): encapsulate engine state, add graceful shutdown and tests Replace all monitor package-level mutable state with Engine struct. All state (liveState, logStore, histories, tokenIndex, HTTP clients) is now encapsulated in Engine, created via NewEngine(store). Key changes: - Engine struct holds all monitor state with proper mutex protection - Engine.Start(ctx) and monitorRoutine respect context cancellation for graceful shutdown — no more leaked goroutines - cluster.runFollowerLoop also respects context for clean exit - Token index (map[string]int) for O(1) push heartbeat lookup, replacing O(n) linear scan through LiveState - UpdateSiteConfig preserves 8 runtime fields instead of copying 17 config fields individually - triggerAlert goroutines get 30s timeout context - All consumers (TUI, server, cluster, main) receive *Engine via constructor/parameter — no package-level state access - main.go creates context.WithCancel, passes to engine and cluster First test suite: 12 tests across store and alert packages - Store: CRUD for sites/alerts/users, push token generation, import/export round-trip, check history persistence - Alert: Discord/Slack/Webhook payload format, HTTP 4xx error propagation, Ntfy headers, unknown provider returns nil	2026-05-15 08:21:17 -04:00
lerko	a6bb9a7aff	refactor(core): remove store global singleton, thread store explicitly Remove store.Get()/SetGlobal()/Current. Store is now passed explicitly to all consumers via constructor parameters and function arguments. - TUI Model holds store field, set via InitialModel(isAdmin, store) - monitor.StartEngine(s) and InitHistoryFromStore(s) accept store - server.Start(cfg, s) closes over store in HTTP handlers - main.go threads store to SSH server, TUI, monitor, server - isKeyAllowed receives store as parameter No more hidden dependency on package-level mutable state in store pkg. Monitor package still uses package-level state (LiveState, etc.) — will be encapsulated into Engine struct in Phase 7.	2026-05-15 00:45:07 -04:00
lerko	d4f4012c8a	refactor(store): add error returns to all Store interface methods Every Store method now returns an error. Callers handle errors gracefully — TUI logs to event log, server returns HTTP 500, monitor engine logs and retries. All rows.Scan() errors are now checked in sqlstore.go instead of silently appending corrupt data. - GetSites, GetAllAlerts, GetAllUsers return ([]T, error) - GetAlert returns (AlertConfig, error) instead of (AlertConfig, bool) - AddSite, UpdateSite, DeleteSite, etc. all return error - SaveCheck, LoadAllHistory, ExportData return error - ~25 caller sites updated across tui, server, monitor, main	2026-05-15 00:37:20 -04:00
lerko	ab75f61c6b	refactor(store): unify SQLite and Postgres into dialect-based SQLStore Extract shared SQLStore with Dialect interface for the ~5% that differs between backends (DDL, placeholders, sequence resets). - New dialect.go: Dialect interface + placeholder rewriter (? → $N) - New sqlstore.go: single implementation of all 19 Store methods - sqlite.go: reduced from 286 to 83 lines (SQLiteDialect only) - postgres.go: reduced from 266 to 78 lines (PostgresDialect only) - main.go: use NewSQLiteStore/NewPostgresStore constructors Zero CRUD logic duplication. Every future schema change written once.	2026-05-15 00:31:44 -04:00
lerko	4d5116644f	fix(core): correctness and robustness fixes across all subsystems - Move status page template to package-level template.Must (panic on parse error at init instead of nil deref at runtime) - Fix XSS in import error responses (log detail server-side, return generic message to client) - Handle ListenAndServe errors in HTTP and SSH servers - Use defer resp.Body.Close() in all alert providers, check json.Marshal errors - Share HTTP clients across checks instead of creating per-request - Use http.NewRequestWithContext for per-site timeout control - Support HTTP method field (was always GET despite DB storing method) - Implement AcceptedCodes validation (was hardcoded >= 400 despite DB storing accepted code ranges) - Add defer tx.Rollback() to ImportData for transaction safety	2026-05-15 00:00:02 -04:00
lerko	e97780ad38	fix(tui,status,store): add delete confirm, input validation, XSS fix, history persistence Prevent accidental deletes with y/n confirmation dialog. Validate all numeric form inputs (interval, port, timeout, threshold, retries) with range checks instead of silently defaulting to zero. Escape user-supplied data in status page JavaScript to close XSS via monitor names. Persist check history to new check_history table so sparklines and uptime percentages survive restarts.	2026-05-14 20:51:06 -04:00
lerko	6d92df4f46	feat(importer): add Uptime Kuma backup converter with CLI and API Convert Kuma monitorList/notificationList to go-upkeep Backup format. Maps all monitor types (http, ping, port, dns, group), ntfy notifications with auth, parent IDs, and alert assignments. Available via --import-kuma flag and POST /api/import/kuma endpoint.	2026-05-14 17:30:17 -04:00
lerko	f06dd5702b	feat(models): widen Site struct and DB schema for ping, port, dns, group monitor types Add Hostname, Port, Timeout, Method, Description, ParentID, AcceptedCodes, DNSResolveType, DNSServer, and IgnoreTLS fields. Refactor AddSite/UpdateSite to accept models.Site instead of individual params. Includes DB migrations for existing databases, per-monitor timeout/TLS in the engine, new type options in TUI forms, and TYPE column in the sites table.	2026-05-14 17:10:56 -04:00
lerko	11848ce674	fix(security): harden TLS, timeouts, validation, logging, and token generation - Default TLS verification on, opt-in UPKEEP_INSECURE_SKIP_VERIFY - Alert webhooks use 10s timeout client, close response bodies - URL input validates http/https scheme for HTTP monitors - Stdlib logs route to stderr instead of discard - Panic on crypto/rand failure in token generation - Cluster startup warnings for non-HTTPS and missing secret - Replace demo SMTP creds with obvious placeholders - Color-coded log entries and scroll hints in logs tab	2026-05-14 15:28:04 -04:00
lerko	02f0a39d97	feat: initial commit — uptime monitor (forked from go-upkeep) Go-based uptime monitor with SQLite/Postgres storage, TUI dashboard, SSH server, alerting, and clustering support.	2026-05-14 11:05:10 -04:00

29 Commits
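
Commit 809620340e describes honoring X-Forwarded-For only when the immediate peer is a trusted proxy, and then keying the rate limiter on the right-most non-trusted hop. The following is a minimal Go sketch of that rule, not the repository's code: `parseTrusted`, `isTrusted`, and `clientKey` are hypothetical names chosen for illustration, and the fallback to the peer address when a hop is malformed is an assumption rather than something the commit message states.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strings"
)

// parseTrusted parses a comma-separated list of CIDRs or bare IPs,
// the format the commit describes for UPTOP_TRUSTED_PROXIES.
func parseTrusted(list string) ([]netip.Prefix, error) {
	var out []netip.Prefix
	for _, item := range strings.Split(list, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		if p, err := netip.ParsePrefix(item); err == nil {
			out = append(out, p.Masked())
			continue
		}
		addr, err := netip.ParseAddr(item)
		if err != nil {
			return nil, fmt.Errorf("invalid trusted proxy %q: %w", item, err)
		}
		addr = addr.Unmap()
		// A bare IP becomes a single-address prefix (/32 or /128).
		out = append(out, netip.PrefixFrom(addr, addr.BitLen()))
	}
	return out, nil
}

// isTrusted reports whether addr falls inside any trusted prefix.
func isTrusted(addr netip.Addr, trusted []netip.Prefix) bool {
	addr = addr.Unmap()
	for _, p := range trusted {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

// clientKey returns the rate-limit key for a request. XFF is ignored
// unless the direct peer is trusted; otherwise the header is walked
// right to left and the first non-trusted hop wins.
func clientKey(remoteAddr, xff string, trusted []netip.Prefix) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present; use the value as-is
	}
	peer, err := netip.ParseAddr(host)
	if err != nil || !isTrusted(peer, trusted) {
		return host
	}
	hops := strings.Split(xff, ",")
	for i := len(hops) - 1; i >= 0; i-- {
		hop, err := netip.ParseAddr(strings.TrimSpace(hops[i]))
		if err != nil {
			break // assumption: a malformed hop means fall back to the peer
		}
		if !isTrusted(hop, trusted) {
			return hop.Unmap().String()
		}
	}
	return host
}

func main() {
	trusted, err := parseTrusted("10.0.0.0/8, 192.168.1.5")
	if err != nil {
		panic(err)
	}
	// Direct, untrusted peer: the spoofed header is ignored.
	fmt.Println(clientKey("203.0.113.9:4444", "1.2.3.4", trusted))
	// Trusted proxy: entries the client prepended on the left are ignored.
	fmt.Println(clientKey("10.0.0.2:5000", "6.6.6.6, 198.51.100.7, 10.0.0.3", trusted))
}
```

Walking from the right matters because each proxy appends the address it saw, so only the right-hand entries were written by infrastructure the operator controls; anything to the left of the first non-trusted hop is client-supplied and can be rotated freely.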