Commit Graph

24 Commits

Author SHA1 Message Date
lerko 7d0b4dab8b fix(tui): clean pseudo-version in footer, rename Sites tab to Monitors
Strip Go module pseudo-version suffix (timestamp+hash+dirty) from the
footer version string — shows "v0.1.0" instead of the full build
metadata. Rename "Sites" tab and breadcrumbs to "Monitors" for
consistency with README, CLI help, and user-facing docs.
2026-06-20 14:03:48 -04:00
lerko dbd519c121 fix: 4 additional release-consistency findings
CI / test (pull_request) Successful in 1m46s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
- Disable healthcheck on probe compose services (no HTTP server)
- Remove stale "(Phase 4)" comment from dev compose
- Add data/ to .gitignore (compose volume creates deploy/data)
- Clarify -db-type flag help text (sqlite or postgres)
2026-06-19 20:37:42 -04:00
lerko b32145fb58 fix: resolve 13 release-consistency findings
Documentation:
- Fix CI badge href to /actions (was 404 on Gitea)
- Add UPTOP_METRICS_PUBLIC + UPTOP_MAINT_RETENTION to README env table
- Link maintenance retention to env var name in data retention section
- Note metrics auth requirement in features list
- Fix clustering.md: fail-closed wording, mark AGG_STRATEGY/NODE_REGION optional
- Fix .env.example: wording (no .env loader), add TRUSTED_PROXIES + MAINT_RETENTION
- Add CLI help/usage with subcommand listing, accept serve/help/-h/-version

Docker/deploy:
- Add EXPOSE 8080 to Dockerfile
- Remove dead LIPGLOSS_RENDERER_HAS_DARK_BACKGROUND env
- Exempt /api/health from cluster auth (fixes Docker HEALTHCHECK 401)
- Add sysctls for unprivileged ping to all compose files

Cosmetic:
- Fix bug_report.yaml: SemVer placeholder, remove nonexistent serve subcommand
2026-06-19 20:09:03 -04:00
lerko 9ee5908af5 fix(version): fall back to embedded build info when ldflags absent
CI / test (pull_request) Successful in 1m49s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
go install module@tag compiles without GoReleaser's ldflags, so
--version reported "dev" and the TUI footer matched. The module
version and vcs stamps are embedded in every binary; read them via
debug.ReadBuildInfo when the ldflags defaults are untouched. Release
builds unchanged — ldflags still win.
2026-06-12 18:41:43 -04:00
lerko edfe6122b1 fix: Kuma import tokens/paused, Docker hardening, migrate-secrets idempotency
CI / test (pull_request) Successful in 1m54s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 56s
1. Kuma import now maps push monitor tokens (generates crypto/rand
   token) and paused state (Active=false → Paused=true). Previously
   push monitors imported with empty token sat DOWN forever, and
   paused Kuma monitors came in unpaused and started alerting.

2. Dockerfile adds HEALTHCHECK against /api/health on port 8080.
   Container orchestrators can now detect unhealthy instances.

3. migrate-secrets sets the encryptor before loading alerts, so
   already-encrypted settings are decrypted correctly on second run
   instead of failing with a JSON unmarshal error.

4. docker-compose.yml adds container hardening: read_only filesystem,
   cap_drop ALL, no-new-privileges, tmpfs for /tmp.
2026-06-12 08:39:30 -04:00
lerko 5d2b7a3e66 fix: seven quick-win bug fixes across engine, server, TUI, CLI
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. Alertless monitors no longer spam error logs — triggerAlert
   returns early when alertID <= 0.

2. HTTP response body drained before close — enables connection
   reuse via keep-alive instead of fresh TCP+TLS per check.

3. /api/backup/export enforces GET — was the only endpoint
   accepting any HTTP method.

4. limitStr guards against max < 3 — prevents negative slice
   index panic on very narrow terminals.

5. Filter input accepts multibyte characters — len(msg.Runes)
   instead of len(msg.String()) for proper Unicode support.

6. Startup warning corrected — with no UPTOP_CLUSTER_SECRET,
   endpoints reject (401), not accept. Warning now says so.

7. UPTOP_KEYS file open failure logged — was silently swallowed,
   leaving operators with no admin seeded and no message.
2026-06-11 18:28:32 -04:00
lerko 341d60d2fe refactor: unify logging with log/slog
CI / test (pull_request) Successful in 1m57s
CI / lint (pull_request) Successful in 1m22s
CI / vulncheck (pull_request) Successful in 56s
Replace three uncoordinated logging systems (log.Printf, fmt.Fprintf
to stderr, fmt.Println warnings) with structured slog calls.

68 log calls migrated:
- log.Printf → slog.Error/Warn/Info (45 calls across 5 files)
- fmt.Fprintf(os.Stderr) → slog.Error (23 calls in main.go)

Kept unchanged:
- fmt.Println/Printf for CLI user output (version, banners, import results)
- engine.AddLog for TUI-visible ring buffer (monitoring events)

Store migration diagnostics demoted to slog.Debug (silent at default
info level). HTTP request logging now structured with method/path/
status/duration/ip attributes.
2026-06-11 18:00:19 -04:00
lerko 52ccd7ad91 refactor(models): split Site into SiteConfig + SiteState
CI / test (pull_request) Successful in 1m58s
CI / lint (pull_request) Successful in 1m21s
CI / vulncheck (pull_request) Successful in 1m2s
Site now embeds SiteConfig (22 persistent fields) and SiteState
(11 ephemeral runtime fields). Field access unchanged via promotion
— site.Name and site.Status still work.

Store layer deals exclusively in SiteConfig — the DB never sees
runtime state. Engine's liveState keeps full Site composites.
UpdateSiteConfig reduced from 11-line field-by-field copy to
`existing.SiteConfig = cfg`.

RunCheck takes SiteConfig (only needs config fields). Checker is
now statically prevented from reading/writing runtime state.

Backup.Sites changed to []SiteConfig — exports no longer carry
zero-valued runtime fields. Import backward-compatible (json
ignores unknown fields).
2026-06-11 17:13:09 -04:00
lerko 54790db5c8 refactor(config): consolidate env parsing into appConfig struct
New cmd/uptop/config.go with appConfig struct + parseConfig() that
reads all 25 UPTOP_* env vars in one place with defaults. Replaces
~120 lines of scattered os.Getenv calls in runServe.

runServe now reads cfg := parseConfig() up front. ServerConfig
built via cfg.serverConfig(). Uniform flag > env > default
precedence for port/db-type/dsn via flag defaults from config.
2026-06-11 16:29:47 -04:00
lerko 2b357341c8 refactor(store): shared storetest.BaseMock replaces 5 duplicated mocks
CI / test (pull_request) Successful in 1m57s
CI / lint (pull_request) Successful in 1m16s
CI / vulncheck (pull_request) Successful in 1m1s
New internal/store/storetest/mock.go provides BaseMock implementing
the full Store interface with no-op defaults and optional Func field
overrides. Each test file embeds BaseMock and shadows only the methods
it needs.

Removes ~400 lines of duplicated stub methods across 6 test files.
Adding a Store method now requires one addition (BaseMock) instead
of editing 6 files.
2026-06-11 16:09:29 -04:00
lerko 70a83a1da9 refactor(store): propagate context.Context through all Store methods
Every Store interface method (except Close) now takes context.Context
as first parameter. All 54 db.Query/Exec/QueryRow calls in SQLStore
replaced with their *Context variants. DB operations now respect
cancellation and deadlines.

Context sources by caller:
- Engine dbWriter/poll/pruner: engine ctx from Start()
- HTTP handlers: r.Context()
- config.Apply/Export: caller-provided ctx
- TUI/main.go init: context.Background()

RunCheck and all sub-checks (HTTP/ping/port/DNS) accept parent ctx.
HTTP checks now inherit shutdown cancellation instead of rooting in
context.Background(). dbWrite.exec takes ctx so the writer goroutine
can cancel stuck DB operations.

DeleteSite/ImportData use BeginTx(ctx) instead of Begin().
2026-06-11 14:40:30 -04:00
lerko 92efb8e270 fix(security): make SSH key revocation fail closed
CI / test (pull_request) Successful in 2m37s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
keyCache.Invalidate existed but had zero callers, and refresh silently
swallowed store errors — a revoked key kept working off the stale
cache for as long as the DB stayed down.

Invalidate now clears the key set (not just the timestamp) and is
wired through userInvalidatingStore, a decorator at the composition
root that drops the cache on AddUser/UpdateUser/DeleteUser/ImportData.
Transient refresh errors still retain the previous key set so a DB
blip can't lock every admin out, but a post-revocation refresh failure
denies. Refresh errors are logged. First tests for the SSH auth gate.

Also suppresses per-request HTTP logging when the local TUI owns the
terminal — request logs scribbled over the alt screen.
2026-06-11 12:26:40 -04:00
lerko 809620340e fix(security): close XFF bypass and three secret-leak paths
CI / test (pull_request) Successful in 2m36s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 46s
Four fixes hardening the secrets and rate-limit posture a prior audit
left or that regressed:

X-Forwarded-For rate-limit bypass + memory DoS (ratelimit.go): clientIP
returned the raw XFF header, so an attacker rotating it minted unlimited
distinct limiter keys — never tripping the limit and growing the visitors
map without bound. XFF is now honored only when the immediate peer is a
configured trusted proxy (UPTOP_TRUSTED_PROXIES, CIDRs or bare IPs), using
the right-most non-trusted hop; otherwise the key is the real RemoteAddr.
The visitors map is bounded with LRU eviction as defense in depth.

Export redaction denylist -> per-provider allowlist (server.go): the old
six-key denylist missed the actual credentials — the webhook URL for
discord/slack/webhook/ntfy/gotify and api_key for opsgenie — exporting
them in the clear. redactByProvider keeps only known-safe keys per
provider type and redacts everything else, so unknown/new keys fail safe.

ImportData plaintext secrets (sqlstore.go): import inserted raw
json.Marshal(settings), bypassing the encryption AddAlert/UpdateAlert
use. It now routes through marshalSettings, so a restore with
UPTOP_ENCRYPTION_KEY set stores enc:-prefixed ciphertext, not plaintext.

Alert error credential leak (alert.go): provider Send returned the raw
*url.Error, whose URL carries the secret (Telegram bot token in the path,
webhook secrets in the URL); it was persisted to AlertHealth.LastError
and shown in the TUI. sanitizeError strips the URL, keeping the operation
and underlying cause.

Tests cover trusted/untrusted XFF + spoofed-bypass + map bound, the
allowlist per provider, encrypted-on-import round-trip, and URL-stripped
errors. README documents UPTOP_TRUSTED_PROXIES. Full suite green under
-race; golangci-lint clean.
2026-06-10 18:50:19 -04:00
lerko 8b39d4c1a1 fix(monitor): serialize DB writes through a single drained writer
CI / test (pull_request) Successful in 2m36s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Every check spawned `go e.db.Save*(...)` with the error discarded: a
fire-and-forget goroutine per log line, check, state change, and alert
health update. SaveLog ran a full-table prune DELETE on every insert and
SaveCheck a COUNT + conditional prune on every check, so the hot path
amplified each write into several statements. Nothing tracked these
goroutines, so at shutdown they raced the store's Close() — writes to a
closing DB, silently swallowed.

Introduce a single writer goroutine that drains a buffered channel of
typed dbWrite values (log/check/state-change/alert-health). Writes are
enqueued non-blocking; a saturated queue drops and notes it in the
in-memory log rather than blocking the check loop. Write errors are now
logged instead of discarded. Retention moves off the hot path: SaveLog
and SaveCheck become plain INSERTs, and PruneLogs/PruneCheckHistory/
PruneStateChanges run on a 10-minute timer inside the writer (single
keep-newest-N-per-site pass via a window function). state_changes was
previously never pruned — now bounded.

Add Engine.Stop(): cancels the engine's context, then waits for the
writer to drain every buffered write before returning. main wires it in
before the deferred store Close() so no write races a closed DB.

SQLite gains busy_timeout=5000 and synchronous=NORMAL, applied via the
DSN so every pooled connection inherits them (a post-open PRAGMA only
touches one connection); WAL moves to the DSN too. :memory: test DBs are
left as-is.

Tests: writer drains on Stop, Stop is idempotent, and the prune queries
keep newest-N per site / N logs on real SQLite. Full suite green under
-race.
2026-06-10 18:14:28 -04:00
lerko 21a1563e53 feat(monitor): auto-prune expired maintenance windows
CI / test (pull_request) Successful in 2m33s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 50s
Background goroutine runs every 15 minutes, deletes maintenance windows
that expired beyond the retention period (default 7 days). Configurable
via UPTOP_MAINT_RETENTION env var (Go duration format).

Closes #72
2026-06-05 18:27:42 -04:00
lerko e0cb0adebd fix(tui): quick wins batch — version footer, column widths, zebra, sparkline
CI / test (pull_request) Successful in 2m34s
CI / lint (pull_request) Successful in 57s
CI / vulncheck (pull_request) Successful in 51s
- Show version in dashboard footer (wired from goreleaser ldflags)
- Cap name column at 35, raise sparkline minimum to 15 chars
- Preserve zebra background on group rows (was lost by style override)
- Group sparkline uses bullet • instead of heavy circle ●
2026-06-04 14:56:01 -04:00
lerko 8f17deba67 chore: migrate module path to lerkolabs org
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Successful in 1m6s
CI / vulncheck (pull_request) Successful in 46s
Move Go module from gitea.lerkolabs.com/lerko/uptop to
gitea.lerkolabs.com/lerkolabs/uptop. Updates all imports,
go.mod, goreleaser owner, and README links.
2026-05-29 14:22:49 -04:00
lerko 026e969b74 chore: TUI screenshots, README polish, changelog rewrite (#32)
CI / test (push) Successful in 2m41s
CI / lint (push) Successful in 1m11s
CI / vulncheck (push) Successful in 56s
- Add 6 TUI screenshots to assets/ (monitors, alerts, logs, nodes, detail, theme)
- Rewrite README with hero image, badges, collapsible install sections
- Rewrite changelog to match actual CalVer tag history
- VHS tooling extracted to lerko/uptop-vhs

Reviewed-on: lerko/uptop#32
2026-05-29 17:45:31 +00:00
lerko d8a2cab90f feat: seed SSH users from env var and authorized_keys file (#31)
CI / test (push) Successful in 2m36s
CI / lint (push) Successful in 1m12s
CI / vulncheck (push) Successful in 56s
Release / release (push) Has been cancelled
Release / docker (push) Has been cancelled
## Summary

Docker onboarding was broken — no way to add first SSH user without `docker attach` to TUI.

Now reads SSH public keys from two sources on startup:
- `UPTOP_ADMIN_KEY` env var — single key for quick single-user setup
- `UPTOP_KEYS` file path — authorized_keys format for team setup

Dockerfile already sets `UPTOP_KEYS=/data/authorized_keys` and compose mounts `./data:/data`, so the flow is:

```
echo "ssh-ed25519 AAAA... me@host" > ./data/authorized_keys
docker compose up -d
ssh -p 23234 localhost
```

### Behavior
- Skips keys already in DB (idempotent across restarts)
- All seeded users get admin role
- Username parsed from key comment (e.g. `tyler@macbook` → `tyler`)
- Comments and blank lines in keys file are ignored

### Tested
- UPTOP_ADMIN_KEY seeds single admin user
- UPTOP_KEYS file seeds multiple users with correct usernames
- Second startup skips existing keys (no duplicates)
- Build and all tests pass

Reviewed-on: lerko/uptop#31
2026-05-27 21:15:00 +00:00
lerko 986f9f1d55 fix(security): phase 4 code quality and low-severity fixes
CI / test (pull_request) Successful in 4m24s
CI / lint (pull_request) Successful in 1m1s
- Fix limitStr to handle multi-byte UTF-8 characters correctly
- Sanitize log messages: strip ANSI escape sequences and newlines
- URL-encode probe node_id instead of string concatenation
- Fix follower resp.Body leak on non-200 responses
- Make SSH host key path configurable via UPTOP_SSH_HOST_KEY env var
- Add HTTP method checks on GET-only endpoints (405 for wrong methods)
- Extract magic numbers into named constants across monitor/store/server
- Standardize error output to stderr for all startup errors
2026-05-26 17:25:47 -04:00
lerko bd561d9a5e fix(security): phase 3 medium reliability and hardening
CI / test (pull_request) Successful in 4m23s
CI / lint (pull_request) Successful in 1m11s
- Fail hard on critical migration errors (ignore only "already exists")
- Cache SSH user keys with 30s TTL (avoid DB query per auth attempt)
- Configure DB connection pooling (25 open, 5 idle, 5m lifetime)
- Enable SQLite WAL mode for concurrent read/write
- Optimize check history pruning (only prune above 1100 rows)
- Add security headers: X-Content-Type-Options, X-Frame-Options, CSP, Referrer-Policy
- Add CORS policy on /status/json via UPTOP_CORS_ORIGIN env var
- Add HTTP request logging middleware (method, path, status, duration, IP)
- Fix config file permissions from 0644 to 0600
- Pin Docker images: golang:1.24-alpine3.21, alpine:3.21
- Fix Docker CI tag pattern for CalVer (was semver)
- Pass build args (VERSION, COMMIT, BUILD_DATE) to Docker build
2026-05-26 16:57:03 -04:00
lerko d30d1460bd fix(security): phase 2 high-severity hardening
CI / test (pull_request) Successful in 4m31s
CI / lint (pull_request) Successful in 56s
- Push heartbeat accepts Authorization: Bearer header (query string deprecated)
- Gotify alerts use X-Gotify-Key header instead of token in URL
- Per-IP rate limiting on all API endpoints (token-bucket)
- /metrics gated behind cluster secret (UPTOP_METRICS_PUBLIC=true to opt out)
- Config export redacts passwords/tokens by default (redact_secrets=false to override)
- Fix rewritePlaceholders for 100+ SQL parameters
- Fix AddSiteReturningID/AddAlertReturningID race with LastInsertId/RETURNING
- HTTP server timeouts: read 30s, write 60s, idle 120s
2026-05-25 21:15:33 -04:00
lerko 60b30935b3 fix(security): phase 1 critical fixes for public release
CI / test (pull_request) Successful in 4m40s
CI / lint (pull_request) Successful in 1m2s
- Redact PostgreSQL DSN password from stdout/logs
- Harden .dockerignore to exclude .ssh/, .claude/, *.db, *.local files
- SSRF protection: block private/loopback/link-local IPs by default
  (UPTOP_ALLOW_PRIVATE_TARGETS=true to override for homelab use)
- Fix email header injection via CRLF in monitor names
- AES-256-GCM encryption for alert credentials at rest
  (UPTOP_ENCRYPTION_KEY env var, migrate-secrets subcommand)
- TLS support for HTTP server (UPTOP_TLS_CERT/UPTOP_TLS_KEY)
  with HSTS header when TLS enabled
2026-05-25 11:26:47 -04:00
lerko 9d12e3ecf1 chore: complete rename from go-upkeep to uptop
CI / test (pull_request) Successful in 4m26s
CI / lint (pull_request) Successful in 1m11s
- Module path: gitea.lerkolabs.com/lerko/uptop
- Binary: cmd/uptop/
- All imports updated to full module path
- Env vars: UPKEEP_* → UPTOP_*
- Prometheus metrics: upkeep_* → uptop_*
- Default DB: uptop.db
- Docker image: lerko/uptop
- All docs, compose files, CI updated

Only remaining "go-upkeep" reference is the fork attribution in README.
2026-05-24 20:20:35 -04:00