Compare commits

..

42 Commits

Author SHA1 Message Date
lerko 47d3b0e68f fix(tui): bump Subtle ANSI fallback from "8" to "7"
CI / test (pull_request) Successful in 1m49s
CI / lint (pull_request) Successful in 1m12s
CI / vulncheck (pull_request) Successful in 56s
Bright black ("8") plus Faint made PENDING status and dividers nearly
invisible in 16-color terminals. White ("7") with Faint renders as a
readable dim gray while still sitting below Muted in the hierarchy.
2026-06-19 17:27:54 -04:00
lerko 8fd13fefbf feat(tui): add monochrome emphasis attributes for SSH readability
Apply Bold/Faint attributes to semantic styles following htop's
monochrome design principle. Creates 4-tier visual hierarchy that
works even when colors collapse: Bold (danger/warn), Normal (success/
default), Faint (subtle/stale/borders/inactive tabs). Complements
the ANSI-16 color fallbacks without affecting TrueColor appearance.
2026-06-19 17:27:54 -04:00
lerko 974c4b61ea fix(tui): add ANSI-16 color fallbacks for SSH terminals
Theme colors now use lipgloss.CompleteColor with hand-picked ANSI-16
values instead of raw hex. Prevents algorithmic degradation from
collapsing dark backgrounds into indistinguishable ANSI colors over
SSH. Backgrounds fall through to terminal default in 16-color mode;
semantic colors map to distinct ANSI indices (green/yellow/red/blue/
cyan/magenta). TrueColor rendering is unchanged.
2026-06-19 17:27:54 -04:00
lerko d50a5159d4 fix(release): pin GoReleaser to triggering tag
CI / test (pull_request) Successful in 1m43s
CI / lint (pull_request) Successful in 1m22s
CI / vulncheck (pull_request) Successful in 56s
GORELEASER_CURRENT_TAG prevents GoReleaser from resolving the
wrong tag via git-describe when multiple tags point to the same
commit (e.g. v0.1.0 + v0.1.0-rc.5 on adf8fed).
2026-06-17 17:26:16 -04:00
lerko adf8fed44f docs(changelog): regenerate at HEAD before v0.1.0 tag
CI / test (pull_request) Successful in 1m49s
CI / lint (pull_request) Successful in 1m17s
CI / vulncheck (pull_request) Successful in 51s
Release Binaries / release (push) Successful in 2m22s
Release Docker / docker (push) Successful in 11m3s
2026-06-17 14:00:05 -04:00
lerko c2bfa5ad82 fix: resolve 4 tag-blocking issues for v0.1.0
CI / test (pull_request) Successful in 1m43s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
- README/CONTRIBUTING quick start: add UPTOP_ADMIN_KEY so SSH works on
  fresh DB, fix single-file go run path that doesn't compile
- apply --dry-run: assign placeholder IDs for new alerts and groups so
  resolveAlertID succeeds when monitors reference not-yet-created alerts
- deploy/*.yml: switch user-facing compose files from broken build
  context to image: lerkolabs/uptop:latest, fix dev context to ..
2026-06-16 20:32:41 -04:00
lerko 2e07e16b45 refactor(tui): restructure site form to 2 type-aware pages
CI / test (pull_request) Successful in 2m2s
CI / lint (pull_request) Successful in 1m17s
CI / vulncheck (pull_request) Successful in 56s
Replace 4-page paginated form (17 fields for HTTP) with a 2-page
type-aware layout. Page 1 shows core fields + type-specific target
(URL for HTTP, Hostname for ping, etc). Page 2 shows configuration
with pre-filled defaults. Group type gets 1 page.

Form rebuilds dynamically when monitor type changes, preserving
all entered values via pointer-bound siteFormData. Focus returns
to the Type select after rebuild so users can continue forward.
WithWidth set explicitly on rebuild to prevent placeholder truncation.
2026-06-16 19:39:52 -04:00
lerko dd34da4d67 fix(tui): sync selectedID on click so refreshLive doesn't revert cursor
handleClick set m.cursor but returned without calling syncSelectedID,
causing the next refreshLive tick to snap the cursor back to the
previously selected site.
2026-06-16 16:58:56 -04:00
lerko de51dde6e6 docs(changelog): regenerate full history for v0.1.0
CI / test (pull_request) Successful in 1m44s
CI / lint (pull_request) Successful in 1m6s
CI / vulncheck (pull_request) Successful in 57s
Replaces the stale CalVer changelog (last updated 2026-06-02, all
referenced tags since deleted) with git-cliff output simulated against
the v0.1.0 tag — matches what release-binaries.yml will publish as
release notes.
2026-06-12 19:38:55 -04:00
lerko e2024bcab1 fix(release): drop body-grep Security grouping, map polish type in cliff
The body=".*security" parser ran before the docs/chore skip rules, so
any commit merely mentioning security in its body landed in the
Security section (e.g. a docs commit and the CalVer->SemVer ci commit).
Real security fixes never reached it — ^fix matches first — so the rule
only ever produced miscategorized entries. Removed.

polish(...) commits had no parser, and git-cliff defaults the group to
the raw type — rendering a stray lowercase "polish" section. Mapped to
Changed.
2026-06-12 19:38:55 -04:00
lerko fb4e14ecd1 docs(readme): fix broken quick start, stale binary targets, canonical origin
- go run cmd/uptop/main.go broke when PR #108 split config.go out of
  main.go — use ./cmd/uptop package path (same fix #104 applied to
  Dockerfile and goreleaser)
- binary section claimed Linux amd64 only; goreleaser ships
  linux/darwin/windows x amd64/arm64 + deb/rpm since #104
- state canonical repo (Gitea) vs mirror (GitHub) up front
- UPTOP_ADMIN_KEY is the only documented first-run path
2026-06-12 19:38:55 -04:00
lerko 9ee5908af5 fix(version): fall back to embedded build info when ldflags absent
CI / test (pull_request) Successful in 1m49s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
go install module@tag compiles without GoReleaser's ldflags, so
--version reported "dev" and the TUI footer matched. The module
version and vcs stamps are embedded in every binary; read them via
debug.ReadBuildInfo when the ldflags defaults are untouched. Release
builds unchanged — ldflags still win.
2026-06-12 18:41:43 -04:00
lerko eff67332aa fix(release): exclude rc tags from cliff tag_pattern so launch notes span full history
CI / test (pull_request) Successful in 1m48s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
ignore_tags drops rc-tagged commits from the final tag's section instead
of folding them forward — a simulated v0.1.0 rendered zero commits.
Excluding rc tags from tag_pattern makes finals span back to the last
real tag (full history for v0.1.0, verified 8.8KB in a scratch clone)
and rc tags render [Unreleased] with everything pending.
2026-06-12 17:47:48 -04:00
lerko dc4c5fdf8a fix(release): remove tagged scan image in cleanup step
CI / test (pull_request) Successful in 1m51s
CI / lint (pull_request) Successful in 1m12s
CI / vulncheck (pull_request) Successful in 51s
Release Binaries / release (push) Successful in 2m18s
Release Docker / docker (push) Successful in 10m52s
2026-06-12 17:21:42 -04:00
lerko 96eb3e8185 fix(release): scan gates docker push, rc tags spare :latest, mirror waits for stable assets
CI / test (pull_request) Successful in 1m52s
CI / lint (pull_request) Successful in 1m16s
CI / vulncheck (pull_request) Successful in 50s
rc.2 proved the grype gate was decorative — buildx pushed before the
scan ran, so a red run still shipped the image (and rc tags moved
:latest). Build amd64 locally, scan that, then run the multi-arch push
from the warm builder cache. :latest now only moves on non-rc tags.

mirror-release: poll until the Gitea asset count is stable across two
polls (GoReleaser uploads sequentially — assets>0 could mirror a partial
set) and stretch the timeout to 20 min since the release run can queue
behind the Docker job on the single runner.
2026-06-12 17:20:48 -04:00
lerko 37bf443e29 fix(release): suppress wish GHSA alias in grype, fold rc tags into launch notes
CI / test (pull_request) Successful in 1m44s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
Release Binaries / release (push) Successful in 2m9s
Release Docker / docker (push) Successful in 10m18s
The existing .grype.yaml ignore listed the wish SCP traversal only by CVE
id; grype's db now matches it as GHSA-xjvp-7243-rg9h and ignores are
exact-id, so the rc.2 scan gate tripped on an already-triaged finding.
List both ids. Vulnerable SCP middleware is never compiled in; real fix
is the charm v2 stack migration (#126).

cliff.toml ignore_tags folds rc tags into the next real release so
v0.1.0's notes cover full history instead of commits-since-rc.2.
2026-06-12 17:02:55 -04:00
lerko f53dfa1c4c fix(release): repair pipeline defects found in v0.1.0-rc.1 rehearsal
CI / test (pull_request) Successful in 1m44s
CI / lint (pull_request) Successful in 1m12s
CI / vulncheck (pull_request) Successful in 50s
Release Binaries / release (push) Successful in 2m11s
Release Docker / docker (push) Failing after 10m3s
Four defects from the rc.1 dress rehearsal:

- Dockerfile pinned golang:1.26-alpine3.23 at a 1.26.3 digest while
  go.mod requires 1.26.4; golang images set GOTOOLCHAIN=local, so the
  build hard-fails. Pin 1.26.4-alpine3.23 explicitly.
- changelog.disable swallowed --release-notes (the flag is consumed by
  the changelog pipe), publishing empty release bodies. Re-enable.
- Remove the Gitea-side GitHub relay step: redundant with
  .github/workflows/mirror-release.yml, which runs on GitHub Actions
  with the built-in token and copies the canonical Gitea assets.
- mirror-release.yml: jq '.body // empty' treats "" as truthy so the
  notes fallback never fired; use select(). Mark rc tags --prerelease.
2026-06-12 16:16:28 -04:00
lerko 4070691407 docs: close pre-release documentation gaps
CI / test (pull_request) Successful in 2m0s
CI / lint (pull_request) Successful in 1m22s
CI / vulncheck (pull_request) Successful in 51s
Release Binaries / release (push) Failing after 8m31s
Release Docker / docker (push) Failing after 2m17s
- Docker compose: ping_group_range sysctl, without which ping monitors
  silently report DOWN in containers
- README: data retention table (1000 checks / 5000 state changes per
  monitor, 200 logs, pruned automatically), group-alert limitation note
- config-as-code: apply is not atomic + re-run convergence, backup
  redaction footgun (/api/backup/export redacts by default), opsgenie
  example (provider count was stale at 9), ntfy auth keys
2026-06-12 15:37:47 -04:00
lerko 6dfd56dcd4 ci(release): skip GitHub relay when GH_MIRROR_TOKEN absent
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m26s
CI / vulncheck (pull_request) Successful in 56s
Relay is on hold until after the rc dress rehearsal — without the
secret the step exits cleanly instead of failing the release run.
Adding the secret later enables it with no workflow change.
2026-06-12 15:31:57 -04:00
lerko 17b5557e23 test(importer): cover malformed Kuma backup input
Importer parses untrusted JSON on the migration onboarding path with no
coverage. Add malformed-input table (truncated, wrong types, null
lists), notification config edge cases, and field-mapping checks.
2026-06-12 15:31:57 -04:00
lerko dc27547ffb docs(monitor): document before-Start contract on engine setters 2026-06-12 15:31:57 -04:00
lerko 83ec6bee42 fix(tui): apply log filter to full log list, not viewport window
viewLogsTab filtered logViewport.View() — the visible window — so the
entry count showed the window size and hidden lines reappeared while
scrolling. Filter and render now happen at content-set time from
engine.GetLogs(); the view only reads stored counts.
2026-06-12 15:31:57 -04:00
lerko d538aad18e ci(release): relay release artifacts to GitHub mirror
GoReleaser publishes to exactly one SCM (Gitea); the push mirror carries
refs but not releases, so GitHub Releases — where the README points —
stayed empty. After the Gitea release, wait for the mirrored tag and
create the GitHub release with the same artifacts and notes.

Needs new Gitea secret GH_MIRROR_TOKEN (GitHub PAT with repo scope).
GITHUB_TOKEN is reserved by Gitea Actions, hence the different name.
2026-06-12 15:31:57 -04:00
lerko ab0a69d06b fix(cluster)!: rename X-Upkeep-Secret header to X-Uptop-Secret
CI / test (pull_request) Successful in 1m57s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 56s
Last upkeep-era name in the wire protocol. Breaking for mixed-version
clusters, but zero installed base exists pre-v0.1.0 — free now, breaking
forever after first tag.
2026-06-12 14:27:44 -04:00
lerko 7bf278e538 docs(cluster): document split-brain limitation in failover
CI / test (pull_request) Successful in 1m56s
CI / lint (pull_request) Successful in 1m16s
CI / vulncheck (pull_request) Successful in 1m1s
No leader fencing exists — during a network partition both nodes run
checks and fire alerts independently. Document the behavior honestly:
duplicate alerts, doubled history, ~15s takeover, converges on heal.
2026-06-12 12:47:03 -04:00
lerko 023234f4c3 fix(alert): email send respects context deadline
smtp.SendMail ignores context entirely — a blackholed SMTP server
hangs the alert goroutine for the OS TCP timeout (minutes), while the
30s context from the engine does nothing.

Replace with sendMailContext: dials with ctx deadline, sets connection
deadlines, handles STARTTLS and AUTH when advertised. Behavioral
parity with smtp.SendMail but cancellation works throughout.
2026-06-12 12:46:45 -04:00
lerko 4328d25f22 fix(security): API import no longer replaces user accounts
Cluster-secret holder could POST a backup with their own admin key to
/api/backup/import, replacing all users — privilege escalation from
cluster-auth to admin. Also, Kuma imports produced zero users but
ImportWipe unconditionally deleted the users table — locking out all
accounts until restart reseeded UPTOP_ADMIN_KEY.

- Server handlers strip data.Users (set nil) before calling ImportData
- ImportData only wipes+replaces users when data.Users != nil
- New ImportWipeUsers dialect method separates user wipe from data wipe
- CLI restore (main.go) unchanged — full import still replaces users
2026-06-12 12:45:16 -04:00
lerko f745dcb21f fix(security): close DNS-rebind TOCTOU on ping/port checks
Pre-check resolved and validated the target IP, then runPingCheck and
runPortCheck re-resolved by hostname — a DNS rebind between the two
lookups could redirect to a private IP, bypassing the SSRF guard.

Resolve once in RunCheck, pin the validated IP, and pass it down:
- runPingCheck: SetIPAddr with pinned IP (skips internal resolve)
- runPortCheck: dial pinned IP literal instead of hostname

HTTP checks are unaffected (SafeDialContext resolves+validates at
dial time). DNS checks validate the server address, not the target.
2026-06-12 12:42:50 -04:00
lerko e99e959b64 ci: switch versioning from CalVer to SemVer
CI / test (pull_request) Successful in 1m59s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 56s
Go module tooling requires v-prefixed semver tags (go install @latest
ignores CalVer tags entirely), GoReleaser errors on non-semver tags,
and zero-padded CalVer months are invalid semver. Old CalVer tags and
releases were deleted due to pre-release security issues; relaunch
tags as v1.0.0.

- Workflow tag triggers: [0-9]* -> v[0-9]* (Gitea + GitHub relay)
- cliff.toml tag_pattern: regex v[0-9].* (was matching everything --
  tag_pattern is regex since git-cliff 1.4, not glob)
- Docker image tags drop the v prefix per registry convention
2026-06-12 11:13:18 -04:00
lerko c3eac80e14 fix(store): chmod SQLite DB files to 0600 on open
CI / test (pull_request) Successful in 1m57s
CI / lint (pull_request) Successful in 1m26s
CI / vulncheck (pull_request) Successful in 1m2s
Bare-metal installs created the DB with process umask (often 022),
making uptop.db, -wal, and -shm world-readable. These files contain
alert credentials and config. Now chmod 0600 after open. Missing
WAL/SHM siblings (not yet created) are silently skipped. Docker
installs were already mitigated by the non-root UID.
2026-06-12 09:51:11 -04:00
lerko 6cf0efed9b fix: seven fixes — token scan, variadic cleanup, TUI layout, compose secrets
CI / test (pull_request) Successful in 1m54s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. UpdateSite handles token-read Scan error instead of ignoring it.
   sql.ErrNoRows (nonexistent site) passes through; real DB errors
   surface.

2. RunCheck allowPrivate changed from variadic to real bool param.
   Dead maxRequestBody duplicate removed from sqlstore.go.

3. Footer help bar documents [Space] for group collapse.

4. adjustCursor unified with clampCursor — one clamping path
   instead of two with different semantics.

5. Compose cluster/probe example files annotate hardcoded secrets
   with "EXAMPLE ONLY — rotate before use".

6. huhForm.WithHeight moved from View() to handleResize — no longer
   mutates form state during render.

7. maxTableRows recalculated on filter enter/exit via recalcLayout()
   — was only recalculated on resize, causing off-by-one when the
   filter bar appeared/disappeared.
2026-06-12 09:36:00 -04:00
lerko 9115ab720c fix: six small fixes — rate limiter leak, DST SLA, probe sort, TUI cleanup
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 56s
1. Rate limiter cleanup goroutine now stoppable via Stop() channel
   instead of looping forever. Prevents goroutine leak in tests.

2. Dead WindowSizeMsg branch in handleFormMsg removed — top-level
   Update handles resize before forms see it.

3. Probe results sorted by node ID — map iteration no longer
   reorders rows every render.

4. fmtAlertConfig takes models.AlertConfig directly instead of an
   anonymous struct the caller builds inline.

5. Backspace no longer aliases delete — d is the documented key.
   Prevents accidental delete-confirm on habitual backspace.

6. SLA daily buckets use time.Date day arithmetic instead of
   Add(-i*24h) — lands on midnight correctly across DST transitions.
2026-06-12 09:18:52 -04:00
lerko edfe6122b1 fix: Kuma import tokens/paused, Docker hardening, migrate-secrets idempotency
CI / test (pull_request) Successful in 1m54s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 56s
1. Kuma import now maps push monitor tokens (generates crypto/rand
   token) and paused state (Active=false → Paused=true). Previously
   push monitors imported with empty token sat DOWN forever, and
   paused Kuma monitors came in unpaused and started alerting.

2. Dockerfile adds HEALTHCHECK against /api/health on port 8080.
   Container orchestrators can now detect unhealthy instances.

3. migrate-secrets sets the encryptor before loading alerts, so
   already-encrypted settings are decrypted correctly on second run
   instead of failing with a JSON unmarshal error.

4. docker-compose.yml adds container hardening: read_only filesystem,
   cap_drop ALL, no-new-privileges, tmpfs for /tmp.
2026-06-12 08:39:30 -04:00
lerko 13637ec216 chore(tui): delete dead braille code, hoist sparkWidth, stop resize flash
CI / test (pull_request) Successful in 1m58s
CI / lint (pull_request) Successful in 1m22s
CI / vulncheck (pull_request) Successful in 1m1s
1. Delete braille.go + braille_test.go — dead code, only referenced
   by its own test. Can be re-added when latency charts are built.

2. Hoist duplicate `const sparkWidth = 40` (update.go + view_detail.go)
   to package-level `detailSparkWidth`. Click-index resolution and
   rendering now share one constant.

3. Remove tea.ClearScreen on every resize — caused full-screen flash
   during continuous resizes. ctrl+l manual clear kept.
2026-06-12 08:03:16 -04:00
lerko 916c963663 fix(engine): apply convergence + push/group check history
CI / test (pull_request) Successful in 1m54s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. Poll loop now fully converges with the DB: updated site configs
   are refreshed via UpdateSiteConfig, and sites removed from the DB
   are evicted from liveState. Previously the loop only added new
   sites — config edits via apply were ignored until restart, and
   pruned sites kept being checked and alerting.

2. Push monitors now record check history on each heartbeat via
   recordCheck. Previously RecordHeartbeat updated state but never
   wrote to check_history — push uptime % and sparklines were empty.

3. Groups record a synthetic check per evaluation tick so they get
   uptime history and sparklines instead of blank displays.
2026-06-11 20:45:30 -04:00
lerko fa56f47f96 fix(tui): track selection by site ID + q means back everywhere
CI / test (pull_request) Successful in 2m5s
CI / lint (pull_request) Successful in 1m26s
CI / vulncheck (pull_request) Successful in 1m2s
Cursor tracked by site ID instead of positional index. When the
list re-sorts every tick (sites change status), the selection stays
on the same monitor instead of silently jumping to whatever now
occupies that index position.

q now means "back" in detail, history, SLA, and alert-detail views
— consistent with muscle memory from navigating deeper views.
Only the dashboard q quits the app. ctrl+c always quits from
anywhere.
2026-06-11 19:34:21 -04:00
lerko f7da69f25f fix(security): SSRF guard gaps + DNS port restriction + metrics auth
CI / test (pull_request) Successful in 1m54s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. SSRF guard now blocks 0.0.0.0/8 (routes to localhost on Linux)
   and 100.64.0.0/10 (CGNAT). Also rejects unspecified, multicast,
   and loopback IPs via net.IP methods for defense in depth.

2. DNS monitor type no longer bypasses SSRF guard. The DNSServer
   address is resolved and validated against isPrivateIP before use.
   Port restricted to 53 — prevents arbitrary internal port probing
   via crafted DNSServer values.

3. /metrics now default-deny when MetricsPublic is false, regardless
   of whether UPTOP_CLUSTER_SECRET is set. Previously, no secret =
   no auth check = metrics exposed to everyone.
2026-06-11 18:57:37 -04:00
lerko 5d2b7a3e66 fix: seven quick-win bug fixes across engine, server, TUI, CLI
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. Alertless monitors no longer spam error logs — triggerAlert
   returns early when alertID <= 0.

2. HTTP response body drained before close — enables connection
   reuse via keep-alive instead of fresh TCP+TLS per check.

3. /api/backup/export enforces GET — was the only endpoint
   accepting any HTTP method.

4. limitStr guards against max < 3 — prevents negative slice
   index panic on very narrow terminals.

5. Filter input accepts multibyte characters — len(msg.Runes)
   instead of len(msg.String()) for proper Unicode support.

6. Startup warning corrected — with no UPTOP_CLUSTER_SECRET,
   endpoints reject (401), not accept. Warning now says so.

7. UPTOP_KEYS file open failure logged — was silently swallowed,
   leaving operators with no admin seeded and no message.
2026-06-11 18:28:32 -04:00
lerko 341d60d2fe refactor: unify logging with log/slog
CI / test (pull_request) Successful in 1m57s
CI / lint (pull_request) Successful in 1m22s
CI / vulncheck (pull_request) Successful in 56s
Replace three uncoordinated logging systems (log.Printf, fmt.Fprintf
to stderr, fmt.Println warnings) with structured slog calls.

68 log calls migrated:
- log.Printf → slog.Error/Warn/Info (45 calls across 5 files)
- fmt.Fprintf(os.Stderr) → slog.Error (23 calls in main.go)

Kept unchanged:
- fmt.Println/Printf for CLI user output (version, banners, import results)
- engine.AddLog for TUI-visible ring buffer (monitoring events)

Store migration diagnostics demoted to slog.Debug (silent at default
info level). HTTP request logging now structured with method/path/
status/duration/ip attributes.
2026-06-11 18:00:19 -04:00
lerko 52ccd7ad91 refactor(models): split Site into SiteConfig + SiteState
CI / test (pull_request) Successful in 1m58s
CI / lint (pull_request) Successful in 1m21s
CI / vulncheck (pull_request) Successful in 1m2s
Site now embeds SiteConfig (22 persistent fields) and SiteState
(11 ephemeral runtime fields). Field access unchanged via promotion
— site.Name and site.Status still work.

Store layer deals exclusively in SiteConfig — the DB never sees
runtime state. Engine's liveState keeps full Site composites.
UpdateSiteConfig reduced from 11-line field-by-field copy to
`existing.SiteConfig = cfg`.

RunCheck takes SiteConfig (only needs config fields). Checker is
now statically prevented from reading/writing runtime state.

Backup.Sites changed to []SiteConfig — exports no longer carry
zero-valued runtime fields. Import backward-compatible (json
ignores unknown fields).
2026-06-11 17:13:09 -04:00
lerko ba4465daa2 refactor(server): extract Server type with named handler methods
CI / test (pull_request) Successful in 1m53s
CI / lint (pull_request) Successful in 1m21s
CI / vulncheck (pull_request) Successful in 1m2s
Replace the 328-line Start() god function with a Server struct +
11 named handler methods. Routes registered in routes(), middleware
applied in one place.

Start() kept as a convenience wrapper (NewServer + Start) so
existing callers don't need to change unless they want the Server
reference.

Each handler is now independently readable and testable without
parsing a 300-line closure nest.
2026-06-11 16:32:38 -04:00
lerko 54790db5c8 refactor(config): consolidate env parsing into appConfig struct
New cmd/uptop/config.go with appConfig struct + parseConfig() that
reads all 25 UPTOP_* env vars in one place with defaults. Replaces
~120 lines of scattered os.Getenv calls in runServe.

runServe now reads cfg := parseConfig() up front. ServerConfig
built via cfg.serverConfig(). Uniform flag > env > default
precedence for port/db-type/dsn via flag defaults from config.
2026-06-11 16:29:47 -04:00
65 changed files with 2545 additions and 1573 deletions
+6 -1
View File
@@ -3,7 +3,7 @@ name: Release Binaries
on:
push:
tags:
- "[0-9]*"
- "v[0-9]*"
jobs:
release:
@@ -49,6 +49,11 @@ jobs:
version: "~> v2"
args: release --clean --release-notes=/tmp/release-notes.md
env:
GORELEASER_CURRENT_TAG: ${{ github.ref_name }}
GORELEASER_FORCE_TOKEN: gitea
GITEA_TOKEN: ${{ secrets.RELEASE_TOKEN }}
GITEA_API_URL: http://gitea:3000/api/v1
# GitHub release relaying is handled by .github/workflows/mirror-release.yml,
# which runs on GitHub Actions when the push mirror delivers the tag and
# copies this run's Gitea release assets — no PAT needed on this side.
+31 -8
View File
@@ -3,11 +3,11 @@ name: Release Docker
on:
push:
tags:
- "[0-9]*"
- "v[0-9]*"
workflow_dispatch:
inputs:
tag:
description: "Image tag (e.g. 2026.06.1). Defaults to latest commit SHA."
description: "Image tag (e.g. 1.0.0, no v prefix). Defaults to latest commit SHA."
required: false
jobs:
@@ -27,14 +27,20 @@ jobs:
TAG="${{ github.sha }}"
fi
else
# Docker convention: git tag v1.2.3 -> image tag 1.2.3
TAG="${{ github.ref_name }}"
TAG="${TAG#v}"
fi
echo "tag=$TAG" >> "$GITHUB_OUTPUT"
TAGS="lerkolabs/uptop:${TAG}"
TAGS="${TAGS},lerkolabs/uptop:sha-${SHORT_SHA}"
# :latest only for real releases — rc rehearsal tags must not move it
if [ "${{ github.ref_type }}" = "tag" ]; then
TAGS="${TAGS},lerkolabs/uptop:latest"
case "$TAG" in
*-*) ;;
*) TAGS="${TAGS},lerkolabs/uptop:latest" ;;
esac
fi
echo "tags=$TAGS" >> "$GITHUB_OUTPUT"
@@ -50,6 +56,26 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Scan must gate the push: build amd64 locally, scan it, and only then run
# the multi-arch push (amd64 layers come from the builder cache, so the
# second build only adds the arm64 work).
- name: Build for scan (amd64, local)
uses: docker/build-push-action@v5
with:
context: .
load: true
platforms: linux/amd64
tags: uptop-scan:${{ steps.meta.outputs.tag }}
build-args: |
VERSION=${{ steps.meta.outputs.tag }}
COMMIT=${{ github.sha }}
BUILD_DATE=${{ github.event.head_commit.timestamp }}
- name: Scan image for CVEs
run: |
curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin v0.114.0
grype uptop-scan:${{ steps.meta.outputs.tag }} --fail-on critical --output table
- name: Build and push
uses: docker/build-push-action@v5
with:
@@ -64,11 +90,6 @@ jobs:
COMMIT=${{ github.sha }}
BUILD_DATE=${{ github.event.head_commit.timestamp }}
- name: Scan image for CVEs
run: |
curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin v0.114.0
grype lerkolabs/uptop:${{ steps.meta.outputs.tag }} --fail-on critical --output table
- name: Update Docker Hub description
uses: peter-evans/dockerhub-description@v4
with:
@@ -79,5 +100,7 @@ jobs:
- name: Cleanup Docker artifacts
if: always()
run: |
# the scan image is tagged, so image prune won't catch it
docker image rm "uptop-scan:${{ steps.meta.outputs.tag }}" 2>/dev/null || true
docker image prune -f
docker builder prune -f --keep-storage=2GB
+20 -8
View File
@@ -3,7 +3,7 @@ name: Mirror Release to GitHub
on:
push:
tags:
- "[0-9]*"
- "v[0-9]*"
permissions:
contents: write
@@ -19,26 +19,35 @@ jobs:
run: |
API="https://gitea.lerkolabs.com/api/v1/repos/lerkolabs/uptop/releases/tags/${TAG}"
for i in $(seq 1 20); do
# 40 x 30s = 20 min: the Gitea release can queue behind the ~18-min
# Docker job on the single runner. Asset count must hold steady for
# two consecutive polls — GoReleaser uploads one file at a time, and
# mirroring mid-upload would publish a partial asset set.
PREV_COUNT=0
ASSET_COUNT=0
for i in $(seq 1 40); do
if RESPONSE=$(curl -sf "$API" 2>/dev/null); then
ASSET_COUNT=$(echo "$RESPONSE" | jq '.assets | length')
if [ "$ASSET_COUNT" -gt 0 ]; then
echo "Found release with $ASSET_COUNT assets"
if [ "$ASSET_COUNT" -gt 0 ] && [ "$ASSET_COUNT" -eq "$PREV_COUNT" ]; then
echo "Found release with $ASSET_COUNT assets (stable)"
break
fi
echo "Release exists but no assets yet... attempt $i/20"
echo "Release has $ASSET_COUNT assets (was $PREV_COUNT)... attempt $i/40"
PREV_COUNT="$ASSET_COUNT"
else
echo "Waiting for Gitea release... attempt $i/20"
echo "Waiting for Gitea release... attempt $i/40"
fi
sleep 30
done
if [ -z "$RESPONSE" ] || [ "$ASSET_COUNT" -eq 0 ]; then
echo "::error::Gitea release for ${TAG} not found or has no assets after 10 minutes"
echo "::error::Gitea release for ${TAG} not found or has no assets after 20 minutes"
exit 1
fi
echo "$RESPONSE" | jq -r '.body // empty' > /tmp/release-notes.md
# select() so an empty-string body produces an empty file — `// empty`
# treats "" as truthy and wrote a blank line, defeating this fallback.
echo "$RESPONSE" | jq -r '.body | select(. != null and . != "")' > /tmp/release-notes.md
if [ ! -s /tmp/release-notes.md ]; then
echo "Release ${TAG} from [Gitea](https://gitea.lerkolabs.com/lerkolabs/uptop/releases/tag/${TAG})" > /tmp/release-notes.md
@@ -62,8 +71,11 @@ jobs:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
TAG: ${{ github.ref_name }}
run: |
PRERELEASE=""
case "$TAG" in *-*) PRERELEASE="--prerelease" ;; esac
gh release create "$TAG" \
--repo "$GITHUB_REPOSITORY" \
--title "$TAG" \
--notes-file /tmp/release-notes.md \
$PRERELEASE \
/tmp/assets/*
+5 -2
View File
@@ -8,6 +8,7 @@ release:
gitea:
owner: lerkolabs
name: uptop
prerelease: auto
builds:
- main: ./cmd/uptop
@@ -58,5 +59,7 @@ nfpms:
dst: /usr/share/doc/uptop/LICENSE
type: doc
changelog:
disable: true
# Changelog generation must stay enabled: the --release-notes flag is consumed
# by the changelog pipe, so disabling it silently drops the git-cliff notes
# (empty release body on v0.1.0-rc.1). With --release-notes set, GoReleaser
# skips its own generation and uses the file.
+8 -3
View File
@@ -1,6 +1,11 @@
ignore:
# CVE-2026-41589: SCP path traversal in charmbracelet/wish.
# SCP path traversal in charmbracelet/wish — same flaw, two ids: grype has
# matched it as CVE-2026-41589 and as GHSA-xjvp-7243-rg9h depending on db
# version, and ignore matching is exact-id, so both stay listed.
# We only import wish/bubbletea for the SSH TUI server — the vulnerable
# scp.Middleware / scp.NewFileSystemHandler symbols are never compiled in.
# No fix available for wish v1; v2 (charm.land/wish/v2) patched in 2.0.1.
# scp.Middleware / scp.NewFileSystemHandler symbols are never compiled in
# (govulncheck reachability agrees). No fix for wish v1; v2
# (charm.land/wish/v2 >= 2.0.1) requires the bubbletea-v2 stack migration,
# tracked in issue #126. Remove both entries when that lands.
- vulnerability: CVE-2026-41589
- vulnerability: GHSA-xjvp-7243-rg9h
+175 -121
View File
@@ -1,129 +1,183 @@
# Changelog
## [2026.06.2] — 2026-06-02 (infrastructure)
## [Unreleased]
### Added
- initial commit — uptime monitor (forked from go-upkeep)
- enhanced dashboard with lipgloss tables, huh forms, mouse support, and animations
- upgrade users tab with lipgloss table, edit support, role select
- upgrade alerts tab with lipgloss table, click zones, colored types
- widen Site struct and DB schema for ping, port, dns, group monitor types
- add ping, port, and DNS check routines
- add ntfy notification provider with TUI support
- add Uptime Kuma backup converter with CLI and API
- add mouse wheel scrolling for all tabs
- add per-site pause, fix viewport, polish status page
- add monitor groups with collapse/expand and tree view
- add Telegram, PagerDuty, Pushover, Gotify providers
- add Prometheus /metrics endpoint
- expose HTTP method and accepted status codes in monitor form
- add config-as-code YAML import/export
- add distributed probing foundation — schema, models, and probe APIs
- add probe execution mode, check extraction, and result aggregation
- add region affinity, Nodes TUI tab, and probe metrics
- add status bar, tab badges, and detail panel
- bordered modals, welcome state, and dynamic name width
- DOWN-first sort, health pulse, and site filter
- split available width evenly between NAME and HISTORY columns
- add type icons to sites table
- persist logs to DB, load on startup
- add incident management and maintenance windows
- zebra striping, detail breadcrumb, sparkline stats, collapse persistence
- add --version flag with build metadata injection
- add theme system with 4 curated palettes
- swap light theme for Tokyo Night and Gruvbox
- seed SSH users from env var and authorized_keys file (#31)
- show error reason when monitors go DOWN
- proper push monitor lifecycle — PENDING, LATE, DOWN states
- logs tab overhaul — severity tags, filtering, recovery durations
- alert channel health indicator + test alerts
- add GitHub release relay workflow
- classify error reasons on DOWN monitors
- add state change history view with outage duration
- add Opsgenie provider
- add STALE state for push monitors
- add SLA reporting view
- overhaul latency sparkline scaling, color, and layout
- auto-prune expired maintenance windows
- click-to-inspect sparkline tooltips in detail view
### Changed
- Split release pipeline into separate binary and Docker workflows (#45)
- Pin Docker base images by digest (#45)
- Add GitHub release relay — mirrors Gitea releases to GitHub (#49)
- Add Grype CVE scanning to Docker pipeline (#45)
- Make CVE scan non-blocking for non-exploitable wish SCP vulnerability (#48)
- replace database ID column with row counter
- unify SQLite and Postgres into dialect-based SQLStore
- add error returns to all Store interface methods
- remove store global singleton, thread store explicitly
- extract shared HTTPProvider for webhook-based alerts
- extract shared table rendering, fix cursor bounds
- encapsulate engine state, add graceful shutdown and tests
- split release pipeline, add nfpm/homebrew/git-cliff
- decompose god files into single-concern modules
- consistent chrome across all views
- status icons, clean STATUS column, relative time
- extract magic numbers into named constants
- check all discarded errors in sqlstore_test.go
- overhaul tab bar — consistent counts, active highlight, colored alerts
- responsive column hiding — 3-tier priority-based layout
- swap mattn/go-sqlite3 for modernc.org/sqlite
- propagate context.Context through all Store methods
- typed Status constants with IsBroken() predicate
- schema_version migration table + DeleteAlert FK fix
- shared storetest.BaseMock replaces 5 duplicated mocks
- consolidate env parsing into appConfig struct
- extract Server type with named handler methods
- split Site into SiteConfig + SiteState
- unify logging with log/slog
- restructure site form to 2 type-aware pages
### Fixed
- git-cliff install in CI — resolve download URL dynamically, extract to /tmp (#46, #47)
## [2026.06.1] — 2026-06-01
- forward all msg types to huh forms, improve row selection UX
- harden TLS, timeouts, validation, logging, and token generation
- add delete confirm, input validation, XSS fix, history persistence
- correctness and robustness fixes across all subsystems
- make status bar and tab badges visible
- use stable sort to prevent site list shuffling each tick
- sort children by ID before status to prevent map-order shuffling
- sparkline now spans full column width
- sparkline right-aligned — current time at right edge, dots fill left
- increase history buffer to 60 so sparkline fills completely
- compute uptime from windowed statuses, not running counters
- seed status and latency from DB history on startup
- strip push tokens from /status/json response
- correct viewport sizing and dynamic chrome calculation
- constrain form height to terminal and forward resize events
- skip children in maintenance when computing group status
- exclude maintenance'd monitors from down count and pulse
- group selection highlight, layout constants, group history graphs
- stable monitor count and universal group icons
- replace panic with error return, handle unmarshal errors
- add context to Provider.Send, log alert failures
- constant-time secret comparison, request size limits
- graceful shutdown for HTTP, SSH servers and database
- add jitter to check intervals and stagger startup
- use sh instead of bash for runner compatibility
- enable CGO for race detector, use lint-action v7
- install gcc for race detector support
- skip irrelevant field validation by monitor type
- guard max retries validator for group type
- tighten zebra row contrast for Tokyo Night and Gruvbox
- phase 1 critical fixes for public release
- phase 2 high-severity hardening
- phase 3 medium reliability and hardening
- phase 4 code quality and low-severity fixes
- rename GITEA_TOKEN to RELEASE_TOKEN
- remove explicit container, use sh shell
- bump golang.org/x/crypto v0.47.0 → v0.52.0
- install git and gcc for GoReleaser in release pipeline
- use internal Gitea URL for GoReleaser API calls
- use docker-builder runner for Docker image builds
- patch Docker Scout CVEs and remove unused openssh-client (#41)
- non-root user, supply chain attestations, build cleanup
- move SSH host key path into /data for non-root user
- create .ssh dir explicitly, ensure entrypoint is executable
- resolve git-cliff download URL dynamically
- extract git-cliff to /tmp to avoid dirty worktree
- make Grype CVE scan non-blocking for known wish vuln
- bump Go 1.26.3 → 1.26.4
- remove error truncation from detail panel
- classify safedial "failed to connect" as TCP
- resolve staticcheck lint errors in history view
- trigger immediate recheck after site config edit
- broken tick chain after form/dialog + retries off-by-one
- wire up [e] edit key in detail panel
- show push token and URL in detail panel
- show correct push heartbeat curl command in detail panel
- propagate STALE/LATE child status to group
- quick wins batch — version footer, column widths, zebra, sparkline
- logs tab use viewport for scrollable content
- pin footer to bottom of terminal
- normalize content whitespace for consistent footer position
- clip overflowing content to keep footer pinned
- remove extra blank lines above footer
- expand log viewport to fill content area
- log STALE recovery in push heartbeat handler
- check fmt.Sscanf return value (errcheck lint)
- inject time into ComputeDailyBreakdown for testability
- cascade delete related rows when removing a site
- merge check results into live state, never overwrite
- serialize DB writes through a single drained writer
- close XFF bypass and three secret-leak paths
- move blocking DB IO out of Update/View into tea.Cmds
- move theme styles onto the Model to end cross-session races
- finish moving keypress DB reads into tea.Cmds
- move all store writes out of Update into tea.Cmds
- mask alert secrets in the TUI detail panel and table
- serve /status/json through a public DTO
- make SSH key revocation fail closed
- six correctness fixes for the state machine
- migrate Postgres timestamps to TIMESTAMPTZ
- seven quick-win bug fixes across engine, server, TUI, CLI
- SSRF guard gaps + DNS port restriction + metrics auth
- track selection by site ID + q means back everywhere
- apply convergence + push/group check history
- Kuma import tokens/paused, Docker hardening, migrate-secrets idempotency
- six small fixes — rate limiter leak, DST SLA, probe sort, TUI cleanup
- seven fixes — token scan, variadic cleanup, TUI layout, compose secrets
- chmod SQLite DB files to 0600 on open
- close DNS-rebind TOCTOU on ping/port checks
- API import no longer replaces user accounts
- email send respects context deadline
- rename X-Upkeep-Secret header to X-Uptop-Secret
- apply log filter to full log list, not viewport window
- repair pipeline defects found in v0.1.0-rc.1 rehearsal
- suppress wish GHSA alias in grype, fold rc tags into launch notes
- scan gates docker push, rc tags spare :latest, mirror waits for stable assets
- remove tagged scan image in cleanup step
- exclude rc tags from cliff tag_pattern so launch notes span full history
- fall back to embedded build info when ldflags absent
- drop body-grep Security grouping, map polish type in cliff
- sync selectedID on click so refreshLive doesn't revert cursor
- resolve 4 tag-blocking issues for v0.1.0
### Changed
- Container runs as non-root user `uptop` (UID/GID 1000) instead of root (#44)
- SSH host key relocated to `/data/.ssh/id_ed25519` for non-root compatibility (#44)
- Release workflow prunes dangling images and build cache after Docker push (#44)
### Added
- SBOM and provenance attestations on Docker images for supply chain compliance (#44)
- Entrypoint script with volume writability check and migration guidance (#44)
### Breaking
- Existing Docker volumes with root-owned files require migration before upgrading:
`docker run --rm -v <volume>:/data alpine chown -R 1000:1000 /data`
## [2026.05.6] — 2026-05-30 (infrastructure)
### Changed
- Sync README to Docker Hub on release (#43)
### Security
- Patch Docker Scout CVEs, remove unused openssh-client (#41)
## [2026.05.5] — 2026-05-29
### Added
- Error reason display when monitors go DOWN (#33)
- Push monitor lifecycle — PENDING, LATE, DOWN states (#34)
- Logs tab overhaul — severity tags, filtering, recovery durations (#35)
- Alert channel health indicator and test alerts (#36)
- TUI screenshots in `assets/` (#32)
- CI status badge in README
### Changed
- Visual polish — detail sections, column headers, alert detail (#37)
- README rewritten with hero image, badges, collapsible install sections (#32)
- Changelog rewritten to match actual CalVer tag history
- Migrated to `lerkolabs` org namespace (#38)
- Docker-compose files moved to `deploy/`
## [2026.05.4] — 2026-05-27
### Added
- SSH user seeding from `UPTOP_ADMIN_KEY` env var and `UPTOP_KEYS` file (#31)
- GoReleaser for binary releases
- govulncheck in CI pipeline
- Multi-arch Docker builds (amd64 + arm64)
### Changed
- CI overhaul — Go 1.26, build caching, streamlined pipeline (#30)
- Bumped golang.org/x/crypto v0.47.0 → v0.52.0
- Bumped Alpine 3.21 → 3.23
### Security
- Phase 1: SSRF protection, input validation, safe dial (#26)
- Phase 2: TLS hardening, auth bypass fixes, rate limiting (#27)
- Phase 3: Graceful degradation, connection limits, timeout enforcement (#28)
- Phase 4: Code quality, error handling, linter fixes (#29)
## [2026.05.3] — 2026-05-25
### Added
- Theme system with 5 dark palettes — Default, Dracula, Nord, Tokyo Night, Gruvbox (#24)
- `--version` flag with build metadata injection
- Gitea Actions CI pipeline — test + lint (#20)
- golangci-lint configuration
- Comprehensive test suite — 94 tests across monitor, server, cluster (#19)
- CONTRIBUTING.md and SECURITY.md
### Changed
- Renamed project from go-upkeep to uptop (#25)
- Updated LICENSE with dual copyright for independent fork
### Fixed
- Form validators scoped to relevant monitor types (#23)
- Graceful shutdown for HTTP, SSH servers and database (#19)
- Constant-time secret comparison, request size limits (#19)
- Check interval jitter to prevent thundering herd (#19)
- TUI visual polish — zebra striping, group icons, sparkline stats (#18)
## [2026.05.2] — 2026-05-22
### Added
- Incident management and maintenance windows (#17)
- Production docker-compose.yml
### Fixed
- Viewport sizing and dynamic chrome calculation (#16)
- Form height constrained to terminal with resize forwarding
- Maintenance'd monitors excluded from down count and pulse
- Group status correctly skips children in maintenance
## [2026.05.1] — 2026-05-16
### Added
- Distributed probing with leader + probe nodes
- Config-as-code — YAML apply/export with dry-run and prune
- TUI polish — status bar, tab badges, detail panel, modals
- DOWN-first sort, health pulse, site filter
- Type icons in sites table
- Sparkline history graphs
- Persistent state — uptime, status, latency, and logs survive restarts
- Push token stripping from /status/json response
## [2026.04.1] — 2026-04-01
### Added
- SSH-accessible TUI built on Bubble Tea + Wish
- 6 check types — HTTP, Push, Ping, Port, DNS, Group
- 9 alert providers — Discord, Slack, Email, Ntfy, Telegram, PagerDuty, Pushover, Gotify, Webhook
- SQLite and PostgreSQL support
- HA clustering with automatic failover
- Prometheus /metrics endpoint
- Public status page (HTML + JSON)
- Uptime Kuma backup import
+1 -1
View File
@@ -3,7 +3,7 @@
## Development
```sh
go run cmd/uptop/main.go -demo # starts with sample data
go run ./cmd/uptop -demo # starts with sample data
ssh -p 23234 localhost # connect to TUI
```
+3 -1
View File
@@ -1,5 +1,5 @@
# --- Stage 1: Builder ---
FROM golang:1.26-alpine3.23@sha256:91eda9776261207ea25fd06b5b7fed8d397dd2c0a283e77f2ab6e91bfa71079d AS builder
FROM golang:1.26.4-alpine3.23@sha256:f23e8b227fb4493eabe03bede4d5a32d04092da71962f1fb79b5f7d1e6c2a17f AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
@@ -31,6 +31,8 @@ ENV UPTOP_SSH_HOST_KEY=/data/.ssh/id_ed25519
ENV UPTOP_PORT=23234
EXPOSE 23234
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD wget -qO- http://localhost:8080/api/health || exit 1
USER uptop
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["./uptop"]
+27 -6
View File
@@ -19,6 +19,8 @@ An uptime monitor you manage entirely from the terminal. It runs as a server, ex
Built on [RDGames/go-upkeep](https://github.com/RDGames/go-upkeep). Rewritten for clustering, config-as-code, and a proper dashboard.
Canonical repo: [gitea.lerkolabs.com/lerkolabs/uptop](https://gitea.lerkolabs.com/lerkolabs/uptop) — [GitHub](https://github.com/lerkolabs/uptop) is a mirror; releases are published to both.
## Features
- **6 check types** — HTTP, Push (heartbeat), Ping, Port, DNS, Groups
@@ -30,6 +32,8 @@ Built on [RDGames/go-upkeep](https://github.com/RDGames/go-upkeep). Rewritten fo
- **SQLite or Postgres** — SQLite for single-node, Postgres for production
- **Uptime Kuma import** — migrate from Kuma with one command
> Group monitors roll up child status for display but don't fire their own alerts yet — attach alerts to the children.
## Screenshots
<table>
@@ -49,14 +53,14 @@ Built on [RDGames/go-upkeep](https://github.com/RDGames/go-upkeep). Rewritten fo
## Quick start
```bash
go run cmd/uptop/main.go
UPTOP_ADMIN_KEY="$(cat ~/.ssh/id_ed25519.pub)" go run ./cmd/uptop
ssh -p 23234 localhost
```
Want some data to look at first:
```bash
go run cmd/uptop/main.go -demo
UPTOP_ADMIN_KEY="$(cat ~/.ssh/id_ed25519.pub)" go run ./cmd/uptop -demo
```
## Install
@@ -79,16 +83,20 @@ services:
# - UPTOP_ADMIN_KEY=ssh-ed25519 AAAA... you@host
volumes:
- ./data:/data
sysctls:
- net.ipv4.ping_group_range=0 2147483647
```
First run: set `UPTOP_ADMIN_KEY` to your SSH public key, or attach to the container and add it in the Users tab.
First run: set `UPTOP_ADMIN_KEY` to your SSH public key.
The `sysctls` line enables unprivileged ICMP inside the container — without it, ping monitors get no response and silently report DOWN.
</details>
<details>
<summary><strong>Binary (Linux amd64)</strong></summary>
<summary><strong>Binary (Linux, macOS, Windows)</strong></summary>
Download from [Releases](https://github.com/lerkolabs/uptop/releases).
Download from [Releases](https://github.com/lerkolabs/uptop/releases) — amd64 and arm64 tarballs (zip for Windows), plus `.deb`/`.rpm` packages and `checksums.txt`.
</details>
@@ -162,6 +170,19 @@ Set `UPTOP_ENCRYPTION_KEY` to encrypt alert credentials (SMTP passwords, webhook
Without this, credentials are stored as plaintext in the database. uptop warns on startup if unset. To encrypt credentials on an existing install, run `uptop migrate-secrets` with the key set.
### Data retention
uptop prunes its own history in the background — no external cleanup jobs needed:
| Data | Kept |
|---|---|
| Check history | newest 1,000 checks per monitor |
| State changes (UP/DOWN transitions) | newest 5,000 per monitor |
| Logs | newest 200 entries |
| Maintenance windows | 7 days after they end (configurable) |
Sparklines, uptime percentages, and SLA reports are computed from these windows, so very long-horizon stats aren't retained. Export to Prometheus via `/metrics` if you need unlimited history.
## Clustering
uptop supports three modes: **leader** (default single node), **follower** (HA failover — takes over if the leader goes down), and **probe** (stateless distributed checks from multiple regions).
@@ -174,7 +195,7 @@ Export your Kuma backup JSON, then:
```bash
curl -X POST http://localhost:8080/api/import/kuma \
-H "X-Upkeep-Secret: your-secret" \
-H "X-Uptop-Secret: your-secret" \
-H "Content-Type: application/json" \
-d @kuma-backup.json
```
+8 -2
View File
@@ -23,7 +23,13 @@ filter_unconventional = true
split_commits = false
protect_breaking_commits = false
filter_commits = false
tag_pattern = "[0-9]*"
# Only final tags count as releases — rc rehearsal tags must not become
# section boundaries, or the final tag's notes would cover only
# commits-since-last-rc (v0.1.0 rendered 0 commits with ignore_tags, which
# drops rc-tagged commits instead of folding them forward). With rc tags
# outside the pattern, finals render the full span and rc tags render
# [Unreleased] with everything pending. Verified empirically on both.
tag_pattern = 'v[0-9]+\.[0-9]+\.[0-9]+$'
topo_order = false
sort_commits = "oldest"
@@ -33,7 +39,7 @@ commit_parsers = [
{ message = "^perf", group = "Changed" },
{ message = "^refactor", group = "Changed" },
{ message = "^security", group = "Security" },
{ body = ".*security", group = "Security" },
{ message = "^polish", group = "Changed" },
{ body = "BREAKING", group = "Breaking" },
{ footer = "BREAKING.CHANGE", group = "Breaking" },
{ message = "^docs", skip = true },
+133
View File
@@ -0,0 +1,133 @@
package main
import (
"net"
"os"
"strconv"
"time"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/server"
)
type appConfig struct {
Port int
SSHHostKey string
DBType string
DBDSN string
HTTPPort int
TLSCert string
TLSKey string
StatusEnabled bool
StatusTitle string
ClusterMode string
ClusterSecret string
PeerURL string
NodeID string
NodeName string
NodeRegion string
AggStrategy string
AllowPrivateTargets bool
InsecureSkipVerify bool
MaintRetention time.Duration
EncryptionKey string
MetricsPublic bool
CORSOrigin string
TrustedProxies []*net.IPNet
AdminKey string
KeysFile string
}
func parseConfig() appConfig {
cfg := appConfig{
Port: 23234,
SSHHostKey: ".ssh/id_ed25519",
DBType: "sqlite",
DBDSN: "uptop.db",
HTTPPort: 8080,
StatusTitle: "System Status",
ClusterMode: "leader",
MaintRetention: 7 * 24 * time.Hour,
}
if v := os.Getenv("UPTOP_PORT"); v != "" {
if n, err := strconv.Atoi(v); err == nil {
cfg.Port = n
}
}
if v := os.Getenv("UPTOP_DB_TYPE"); v != "" {
cfg.DBType = v
}
if v := os.Getenv("UPTOP_DB_DSN"); v != "" {
cfg.DBDSN = v
}
if v := os.Getenv("UPTOP_HTTP_PORT"); v != "" {
if n, err := strconv.Atoi(v); err == nil {
cfg.HTTPPort = n
}
}
if os.Getenv("UPTOP_STATUS_ENABLED") == "true" {
cfg.StatusEnabled = true
}
if v := os.Getenv("UPTOP_STATUS_TITLE"); v != "" {
cfg.StatusTitle = v
}
if v := os.Getenv("UPTOP_CLUSTER_MODE"); v != "" {
cfg.ClusterMode = v
}
if v := os.Getenv("UPTOP_PEER_URL"); v != "" {
cfg.PeerURL = v
}
if v := os.Getenv("UPTOP_CLUSTER_SECRET"); v != "" {
cfg.ClusterSecret = v
}
cfg.NodeID = os.Getenv("UPTOP_NODE_ID")
cfg.NodeName = os.Getenv("UPTOP_NODE_NAME")
cfg.NodeRegion = os.Getenv("UPTOP_NODE_REGION")
cfg.AggStrategy = os.Getenv("UPTOP_AGG_STRATEGY")
cfg.AllowPrivateTargets = os.Getenv("UPTOP_ALLOW_PRIVATE_TARGETS") == "true"
cfg.InsecureSkipVerify = os.Getenv("UPTOP_INSECURE_SKIP_VERIFY") == "true"
cfg.MetricsPublic = os.Getenv("UPTOP_METRICS_PUBLIC") == "true"
cfg.EncryptionKey = os.Getenv("UPTOP_ENCRYPTION_KEY")
cfg.TLSCert = os.Getenv("UPTOP_TLS_CERT")
cfg.TLSKey = os.Getenv("UPTOP_TLS_KEY")
cfg.CORSOrigin = os.Getenv("UPTOP_CORS_ORIGIN")
cfg.TrustedProxies = parseTrustedProxies(os.Getenv("UPTOP_TRUSTED_PROXIES"))
cfg.SSHHostKey = envOrDefault("UPTOP_SSH_HOST_KEY", cfg.SSHHostKey)
cfg.AdminKey = os.Getenv("UPTOP_ADMIN_KEY")
cfg.KeysFile = os.Getenv("UPTOP_KEYS")
if v := os.Getenv("UPTOP_MAINT_RETENTION"); v != "" {
if d, err := time.ParseDuration(v); err == nil && d > 0 {
cfg.MaintRetention = d
}
}
return cfg
}
func (c appConfig) serverConfig(quietHTTPLog bool) server.ServerConfig {
return server.ServerConfig{
Port: c.HTTPPort,
EnableStatus: c.StatusEnabled,
Title: c.StatusTitle,
ClusterKey: c.ClusterSecret,
TLSCert: c.TLSCert,
TLSKey: c.TLSKey,
ClusterMode: c.ClusterMode,
MetricsPublic: c.MetricsPublic,
CORSOrigin: c.CORSOrigin,
TrustedProxies: c.TrustedProxies,
QuietHTTPLog: quietHTTPLog,
}
}
+113 -148
View File
@@ -6,13 +6,13 @@ import (
"errors"
"flag"
"fmt"
"log"
"log/slog"
"net"
"net/url"
"os"
"os/signal"
"path/filepath"
"strconv"
"runtime/debug"
"strings"
"sync"
"syscall"
@@ -40,8 +40,34 @@ var (
date = "unknown"
)
// GoReleaser stamps the vars above via ldflags, but `go install module@tag`
// compiles without them and would report "dev". The module version and any
// vcs stamps are embedded in every binary, so fall back to those.
func init() {
if version != "dev" {
return
}
info, ok := debug.ReadBuildInfo()
if !ok {
return
}
if mv := info.Main.Version; mv != "" && mv != "(devel)" {
version = strings.TrimPrefix(mv, "v")
}
for _, s := range info.Settings {
switch s.Key {
case "vcs.revision":
commit = s.Value
case "vcs.time":
date = s.Value
}
}
}
func main() {
log.SetOutput(os.Stderr)
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: slog.LevelInfo,
})))
if len(os.Args) >= 2 {
switch os.Args[1] {
@@ -63,11 +89,18 @@ func main() {
}
func printVersion() {
if version == "dev" {
fmt.Println("uptop dev")
} else {
fmt.Printf("uptop %s (%s, %s)\n", version, commit, date)
out := "uptop " + version
var meta []string
if commit != "none" {
meta = append(meta, commit)
}
if date != "unknown" {
meta = append(meta, date)
}
if len(meta) > 0 {
out += " (" + strings.Join(meta, ", ") + ")"
}
fmt.Println(out)
}
func envOrDefault(key, fallback string) string {
@@ -111,7 +144,7 @@ func parseTrustedProxies(raw string) []*net.IPNet {
}
_, ipnet, err := net.ParseCIDR(part)
if err != nil {
fmt.Fprintf(os.Stderr, "WARNING: ignoring invalid UPTOP_TRUSTED_PROXIES entry %q: %v\n", part, err)
slog.Warn("ignoring invalid UPTOP_TRUSTED_PROXIES entry", "entry", part, "err", err) //nolint:gosec // structured slog, not format string
continue
}
cidrs = append(cidrs, ipnet)
@@ -128,21 +161,21 @@ func openStore(dbType, dsn string) store.Store {
ss, err = store.NewSQLiteStore(dsn)
}
if err != nil {
fmt.Fprintf(os.Stderr, "database error: %v\n", err)
slog.Error("database connection failed", "err", err)
os.Exit(1)
}
if encKey := os.Getenv("UPTOP_ENCRYPTION_KEY"); encKey != "" {
enc, err := store.NewEncryptor(encKey)
if err != nil {
fmt.Fprintf(os.Stderr, "encryption key error: %v\n", err)
slog.Error("encryption key invalid", "err", err)
os.Exit(1)
}
ss.SetEncryptor(enc)
} else {
fmt.Println("WARNING: No UPTOP_ENCRYPTION_KEY set. Alert credentials stored unencrypted.")
slog.Warn("no UPTOP_ENCRYPTION_KEY set, alert credentials stored unencrypted")
}
if err := ss.Init(context.Background()); err != nil {
fmt.Fprintf(os.Stderr, "database init error: %v\n", err)
slog.Error("database init failed", "err", err)
os.Exit(1)
}
return ss
@@ -167,7 +200,7 @@ func runApply(args []string) {
f, err := config.LoadFile(*filePath)
if err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
slog.Error("config load failed", "err", err)
os.Exit(1)
}
@@ -176,7 +209,7 @@ func runApply(args []string) {
Prune: *prune,
})
if err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
slog.Error("config apply failed", "err", err)
os.Exit(1)
}
@@ -194,12 +227,12 @@ func runExport(args []string) {
f, err := config.Export(context.Background(), s)
if err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
slog.Error("export failed", "err", err)
os.Exit(1)
}
if err := config.WriteFile(f, *outPath); err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
slog.Error("export write failed", "err", err)
os.Exit(1)
}
}
@@ -217,7 +250,7 @@ func runMigrateSecrets(args []string) {
}
enc, err := store.NewEncryptor(encKey)
if err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
slog.Error("encryption key invalid", "err", err)
os.Exit(1)
}
@@ -228,25 +261,25 @@ func runMigrateSecrets(args []string) {
ss, err = store.NewSQLiteStore(*dsn)
}
if err != nil {
fmt.Fprintf(os.Stderr, "database error: %v\n", err)
slog.Error("database connection failed", "err", err)
os.Exit(1)
}
if err := ss.Init(context.Background()); err != nil {
fmt.Fprintf(os.Stderr, "database init error: %v\n", err)
os.Exit(1)
}
alerts, err := ss.GetAllAlerts(context.Background())
if err != nil {
fmt.Fprintf(os.Stderr, "error loading alerts: %v\n", err)
slog.Error("database init failed", "err", err)
os.Exit(1)
}
ss.SetEncryptor(enc)
alerts, err := ss.GetAllAlerts(context.Background())
if err != nil {
slog.Error("failed to load alerts", "err", err)
os.Exit(1)
}
migrated := 0
for _, a := range alerts {
if err := ss.UpdateAlert(context.Background(), a.ID, a.Name, a.Type, a.Settings); err != nil {
fmt.Fprintf(os.Stderr, "error migrating alert %q: %v\n", a.Name, err)
slog.Error("alert migration failed", "alert", a.Name, "err", err)
os.Exit(1)
}
migrated++
@@ -255,64 +288,19 @@ func runMigrateSecrets(args []string) {
}
func runServe(args []string) {
portVal := 23234
dbType := "sqlite"
dbDSN := "uptop.db"
httpPort := 8080
enableStatus := false
statusTitle := "System Status"
clusterMode := "leader"
clusterPeer := ""
clusterKey := ""
cfg := parseConfig()
if v := os.Getenv("UPTOP_PORT"); v != "" {
if p, err := strconv.Atoi(v); err == nil {
portVal = p
}
}
if v := os.Getenv("UPTOP_DB_TYPE"); v != "" {
dbType = v
}
if v := os.Getenv("UPTOP_DB_DSN"); v != "" {
dbDSN = v
}
if v := os.Getenv("UPTOP_HTTP_PORT"); v != "" {
if p, err := strconv.Atoi(v); err == nil {
httpPort = p
}
}
if v := os.Getenv("UPTOP_STATUS_ENABLED"); v == "true" {
enableStatus = true
}
if v := os.Getenv("UPTOP_STATUS_TITLE"); v != "" {
statusTitle = v
}
if v := os.Getenv("UPTOP_CLUSTER_MODE"); v != "" {
clusterMode = v
}
if v := os.Getenv("UPTOP_PEER_URL"); v != "" {
clusterPeer = v
}
if v := os.Getenv("UPTOP_CLUSTER_SECRET"); v != "" {
clusterKey = v
}
nodeID := os.Getenv("UPTOP_NODE_ID")
nodeName := os.Getenv("UPTOP_NODE_NAME")
nodeRegion := os.Getenv("UPTOP_NODE_REGION")
aggStrategy := os.Getenv("UPTOP_AGG_STRATEGY")
if clusterMode == "probe" {
if nodeID == "" {
if cfg.ClusterMode == "probe" {
if cfg.NodeID == "" {
fmt.Fprintln(os.Stderr, "UPTOP_NODE_ID is required for probe mode")
os.Exit(1)
}
if clusterPeer == "" {
if cfg.PeerURL == "" {
fmt.Fprintln(os.Stderr, "UPTOP_PEER_URL is required for probe mode")
os.Exit(1)
}
fmt.Printf("Cluster: Running as PROBE (node=%s, region=%s)\n", nodeID, nodeRegion)
fmt.Printf("Cluster: Running as PROBE (node=%s, region=%s)\n", cfg.NodeID, cfg.NodeRegion)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
@@ -323,29 +311,28 @@ func runServe(args []string) {
cancel()
}()
probeAllowPrivate := os.Getenv("UPTOP_ALLOW_PRIVATE_TARGETS") == "true"
if probeAllowPrivate {
fmt.Println("WARNING: Private target blocking disabled. Monitor URLs can reach internal networks.")
if cfg.AllowPrivateTargets {
slog.Warn("private target blocking disabled, monitor URLs can reach internal networks")
}
if err := cluster.RunProbe(ctx, cluster.ProbeConfig{
NodeID: nodeID,
NodeName: nodeName,
Region: nodeRegion,
LeaderURL: clusterPeer,
SharedKey: clusterKey,
NodeID: cfg.NodeID,
NodeName: cfg.NodeName,
Region: cfg.NodeRegion,
LeaderURL: cfg.PeerURL,
SharedKey: cfg.ClusterSecret,
Interval: 30,
AllowPrivateTargets: probeAllowPrivate,
AllowPrivateTargets: cfg.AllowPrivateTargets,
}); err != nil {
fmt.Fprintf(os.Stderr, "Probe error: %v\n", err)
slog.Error("probe failed", "err", err)
}
return
}
fs := flag.NewFlagSet("serve", flag.ExitOnError)
port := fs.Int("port", portVal, "SSH Port")
flagDBType := fs.String("db-type", dbType, "Database type")
flagDSN := fs.String("dsn", dbDSN, "Database DSN")
port := fs.Int("port", cfg.Port, "SSH Port")
flagDBType := fs.String("db-type", cfg.DBType, "Database type")
flagDSN := fs.String("dsn", cfg.DBDSN, "Database DSN")
demo := fs.Bool("demo", false, "Seed demo data")
importKuma := fs.String("import-kuma", "", "Import Uptime Kuma backup JSON file")
_ = fs.Parse(args) // ExitOnError: parse errors exit before returning
@@ -354,32 +341,32 @@ func runServe(args []string) {
var dbErr error
if *flagDBType == "postgres" {
ss, dbErr = store.NewPostgresStore(*flagDSN)
fmt.Printf("Using PostgreSQL: %s\n", redactDSN(*flagDSN))
slog.Info("database connected", "type", "postgres", "dsn", redactDSN(*flagDSN))
} else {
ss, dbErr = store.NewSQLiteStore(*flagDSN)
fmt.Printf("Using SQLite: %s\n", *flagDSN)
slog.Info("database connected", "type", "sqlite", "dsn", *flagDSN)
}
if dbErr != nil {
fmt.Fprintf(os.Stderr, "database connection error: %v\n", dbErr)
slog.Error("database connection failed", "err", dbErr)
os.Exit(1)
}
defer ss.Close()
if encKey := os.Getenv("UPTOP_ENCRYPTION_KEY"); encKey != "" {
enc, err := store.NewEncryptor(encKey)
if cfg.EncryptionKey != "" {
enc, err := store.NewEncryptor(cfg.EncryptionKey)
if err != nil {
fmt.Fprintf(os.Stderr, "encryption key error: %v\n", err)
slog.Error("encryption key invalid", "err", err)
os.Exit(1)
}
ss.SetEncryptor(enc)
} else {
fmt.Println("WARNING: No UPTOP_ENCRYPTION_KEY set. Alert credentials stored unencrypted.")
slog.Warn("no UPTOP_ENCRYPTION_KEY set, alert credentials stored unencrypted")
}
kc := newKeyCache(ss)
var s store.Store = &userInvalidatingStore{Store: ss, kc: kc}
if err := s.Init(context.Background()); err != nil {
fmt.Fprintf(os.Stderr, "database init error: %v\n", err)
slog.Error("database init failed", "err", err)
os.Exit(1)
}
if *demo {
@@ -391,34 +378,29 @@ func runServe(args []string) {
if *importKuma != "" {
kb, err := importer.LoadKumaFile(*importKuma)
if err != nil {
fmt.Fprintf(os.Stderr, "kuma import error: %v\n", err)
slog.Error("kuma import failed", "err", err)
os.Exit(1)
}
backup := importer.ConvertKuma(kb)
if err := s.ImportData(context.Background(), backup); err != nil {
fmt.Fprintf(os.Stderr, "import failed: %v\n", err)
slog.Error("import failed", "err", err)
os.Exit(1)
}
fmt.Printf("Imported %d monitors and %d alerts from Uptime Kuma v%s\n", len(backup.Sites), len(backup.Alerts), kb.Version)
}
allowPrivate := os.Getenv("UPTOP_ALLOW_PRIVATE_TARGETS") == "true"
if allowPrivate {
fmt.Println("WARNING: Private target blocking disabled. Monitor URLs can reach internal networks.")
if cfg.AllowPrivateTargets {
slog.Warn("private target blocking disabled, monitor URLs can reach internal networks")
}
eng := monitor.NewEngineWithOpts(s, allowPrivate)
if os.Getenv("UPTOP_INSECURE_SKIP_VERIFY") == "true" {
eng := monitor.NewEngineWithOpts(s, cfg.AllowPrivateTargets)
if cfg.InsecureSkipVerify {
eng.SetInsecureSkipVerify(true)
}
if aggStrategy != "" {
eng.SetAggStrategy(monitor.AggregationStrategy(aggStrategy))
}
if v := os.Getenv("UPTOP_MAINT_RETENTION"); v != "" {
if d, err := time.ParseDuration(v); err == nil && d > 0 {
eng.SetMaintRetention(d)
}
if cfg.AggStrategy != "" {
eng.SetAggStrategy(monitor.AggregationStrategy(cfg.AggStrategy))
}
eng.SetMaintRetention(cfg.MaintRetention)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
@@ -428,31 +410,14 @@ func runServe(args []string) {
eng.InitAlertHealth()
eng.Start(ctx)
tlsCert := os.Getenv("UPTOP_TLS_CERT")
tlsKey := os.Getenv("UPTOP_TLS_KEY")
// When the local TUI owns the terminal, per-request HTTP logs to stderr
// would scribble over the alt screen.
localTUI := isatty.IsTerminal(os.Stdout.Fd()) || isatty.IsCygwinTerminal(os.Stdout.Fd())
httpSrv := server.Start(server.ServerConfig{
Port: httpPort,
EnableStatus: enableStatus,
Title: statusTitle,
ClusterKey: clusterKey,
TLSCert: tlsCert,
TLSKey: tlsKey,
ClusterMode: clusterMode,
MetricsPublic: os.Getenv("UPTOP_METRICS_PUBLIC") == "true",
CORSOrigin: os.Getenv("UPTOP_CORS_ORIGIN"),
TrustedProxies: parseTrustedProxies(os.Getenv("UPTOP_TRUSTED_PROXIES")),
QuietHTTPLog: localTUI,
}, s, eng)
httpSrv := server.Start(cfg.serverConfig(localTUI), s, eng)
cluster.Start(ctx, cluster.Config{
Mode: clusterMode,
PeerURL: clusterPeer,
SharedKey: clusterKey,
Mode: cfg.ClusterMode,
PeerURL: cfg.PeerURL,
SharedKey: cfg.ClusterSecret,
}, eng)
sshSrv := startSSHServer(*port, s, eng, kc)
@@ -460,7 +425,7 @@ func runServe(args []string) {
if localTUI {
p := tea.NewProgram(tui.InitialModel(true, s, eng, version), tea.WithAltScreen(), tea.WithMouseCellMotion())
if _, err := p.Run(); err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
slog.Error("TUI failed", "err", err)
}
} else {
fmt.Println("uptop running in HEADLESS mode")
@@ -471,20 +436,18 @@ func runServe(args []string) {
}
cancel()
// Drain pending DB writes before the deferred ss.Close() runs, so no
// write races a closed database.
eng.Stop()
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()
if httpSrv != nil {
if err := httpSrv.Shutdown(shutdownCtx); err != nil {
log.Printf("HTTP shutdown error: %v", err)
slog.Error("HTTP shutdown failed", "err", err)
}
}
if sshSrv != nil {
if err := sshSrv.Shutdown(shutdownCtx); err != nil {
log.Printf("SSH shutdown error: %v", err)
slog.Error("SSH shutdown failed", "err", err)
}
}
}
@@ -503,12 +466,12 @@ func startSSHServer(port int, db store.Store, eng *monitor.Engine, kc *keyCache)
),
)
if err != nil {
fmt.Fprintf(os.Stderr, "SSH server error: %v\n", err)
slog.Error("SSH server failed", "err", err)
return nil
}
go func() {
if err := s.ListenAndServe(); err != nil && !errors.Is(err, ssh.ErrServerClosed) {
log.Printf("SSH server error: %v", err)
slog.Error("SSH server failed", "err", err)
}
}()
return s
@@ -523,11 +486,11 @@ func seedDemoData(s store.Store) {
fmt.Println("Seeding demo data...")
if err := s.AddAlert(ctx, "Discord Ops", "discord", map[string]string{"url": "https://discord.com/api/webhooks/demo/token"}); err != nil {
log.Printf("demo seed: add alert: %v", err)
slog.Error("demo seed failed", "step", "add alert", "err", err)
return
}
if err := s.AddAlert(ctx, "Slack Infra", "slack", map[string]string{"url": "https://hooks.slack.com/services/DEMO/WEBHOOK"}); err != nil {
log.Printf("demo seed: add alert: %v", err)
slog.Error("demo seed failed", "step", "add alert", "err", err)
return
}
if err := s.AddAlert(ctx, "Email Oncall", "email", map[string]string{
@@ -535,7 +498,7 @@ func seedDemoData(s store.Store) {
"user": "oncall@example.com", "pass": "replace-me",
"from": "oncall@example.com", "to": "team@example.com",
}); err != nil {
log.Printf("demo seed: add alert: %v", err)
slog.Error("demo seed failed", "step", "add alert", "err", err)
return
}
@@ -545,7 +508,7 @@ func seedDemoData(s store.Store) {
alertID = alerts[0].ID
}
demoSites := []models.Site{
demoSites := []models.SiteConfig{
{Name: "Google", URL: "https://www.google.com", Type: "http", Interval: 30, AlertID: alertID, CheckSSL: true, ExpiryThreshold: 14, MaxRetries: 2},
{Name: "GitHub", URL: "https://github.com", Type: "http", Interval: 30, AlertID: alertID, CheckSSL: true, ExpiryThreshold: 7, MaxRetries: 3},
{Name: "Cloudflare DNS", URL: "https://1.1.1.1", Type: "http", Interval: 60, AlertID: alertID, ExpiryThreshold: 7, MaxRetries: 1},
@@ -559,7 +522,7 @@ func seedDemoData(s store.Store) {
}
for _, site := range demoSites {
if err := s.AddSite(ctx, site); err != nil {
log.Printf("demo seed: add site %q: %v", site.Name, err)
slog.Error("demo seed failed", "step", "add site", "site", site.Name, "err", err)
}
}
}
@@ -582,7 +545,7 @@ func (c *keyCache) refresh() {
// Keep the previous key set: a transient DB error must not lock every
// admin out. Revocation still fails closed because Invalidate clears
// the set immediately.
log.Printf("SSH key cache refresh failed: %v", err)
slog.Error("SSH key cache refresh failed", "err", err)
return
}
keys := make([]ssh.PublicKey, 0, len(users))
@@ -672,7 +635,9 @@ func seedKeysFromEnv(s store.Store) {
if path := os.Getenv("UPTOP_KEYS"); path != "" {
f, err := os.Open(filepath.Clean(path))
if err == nil {
if err != nil {
slog.Warn("failed to open UPTOP_KEYS file", "path", path, "err", err) //nolint:gosec // structured slog, not format string
} else {
scanner := bufio.NewScanner(f)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
@@ -691,7 +656,7 @@ func seedKeysFromEnv(s store.Store) {
existing, err := s.GetAllUsers(ctx)
if err != nil {
fmt.Fprintf(os.Stderr, "warning: could not check existing users: %v\n", err)
slog.Warn("could not check existing users", "err", err)
return
}
@@ -708,7 +673,7 @@ func seedKeysFromEnv(s store.Store) {
username := usernameFromKey(key, i, len(existing)+added)
if err := s.AddUser(ctx, username, key, "admin"); err != nil {
fmt.Fprintf(os.Stderr, "warning: failed to seed user %q: %v\n", username, err)
slog.Warn("failed to seed user", "user", username, "err", err) //nolint:gosec // structured slog, not format string
continue
}
fmt.Printf("Seeded admin user %q from %s\n", username, seedSource(i, len(keys), os.Getenv("UPTOP_ADMIN_KEY") != ""))
+4 -4
View File
@@ -3,7 +3,7 @@ services:
# LEADER NODE
# -------------------------
leader:
build: .
image: lerkolabs/uptop:latest
container_name: uptop-leader
ports:
- "23234:23234" # SSH
@@ -18,7 +18,7 @@ services:
# Cluster Config
- UPTOP_CLUSTER_MODE=leader
- UPTOP_CLUSTER_SECRET=mysecret
- UPTOP_CLUSTER_SECRET=mysecret # EXAMPLE ONLY — rotate before use
depends_on:
- leader-db
stdin_open: true
@@ -38,7 +38,7 @@ services:
# FOLLOWER NODE
# -------------------------
follower:
build: .
image: lerkolabs/uptop:latest
container_name: uptop-follower
ports:
- "23233:23234" # SSH (Mapped to different host port)
@@ -53,7 +53,7 @@ services:
# Cluster Config
- UPTOP_CLUSTER_MODE=follower
- UPTOP_CLUSTER_SECRET=mysecret
- UPTOP_CLUSTER_SECRET=mysecret # EXAMPLE ONLY — rotate before use
# IMPORTANT: Uses the Service Name "leader" to connect internally
- UPTOP_PEER_URL=http://leader:8080
depends_on:
+1 -1
View File
@@ -2,7 +2,7 @@ services:
# The Application
app:
build:
context: .
context: ..
dockerfile: Dockerfile
container_name: uptop-dev
ports:
+6 -6
View File
@@ -1,9 +1,9 @@
services:
leader:
build: .
image: lerkolabs/uptop:latest
environment:
- UPTOP_CLUSTER_MODE=leader
- UPTOP_CLUSTER_SECRET=changeme
- UPTOP_CLUSTER_SECRET=changeme # EXAMPLE ONLY — rotate before use
- UPTOP_AGG_STRATEGY=any-down
- UPTOP_STATUS_ENABLED=true
ports:
@@ -11,25 +11,25 @@ services:
- "23234:23234"
probe-us-east:
build: .
image: lerkolabs/uptop:latest
environment:
- UPTOP_CLUSTER_MODE=probe
- UPTOP_NODE_ID=us-east-1
- UPTOP_NODE_NAME=US East Probe
- UPTOP_NODE_REGION=us-east
- UPTOP_PEER_URL=http://leader:8080
- UPTOP_CLUSTER_SECRET=changeme
- UPTOP_CLUSTER_SECRET=changeme # EXAMPLE ONLY — rotate before use
depends_on:
- leader
probe-eu-west:
build: .
image: lerkolabs/uptop:latest
environment:
- UPTOP_CLUSTER_MODE=probe
- UPTOP_NODE_ID=eu-west-1
- UPTOP_NODE_NAME=EU West Probe
- UPTOP_NODE_REGION=eu-west
- UPTOP_PEER_URL=http://leader:8080
- UPTOP_CLUSTER_SECRET=changeme
- UPTOP_CLUSTER_SECRET=changeme # EXAMPLE ONLY — rotate before use
depends_on:
- leader
+8 -3
View File
@@ -1,10 +1,15 @@
services:
app:
build:
context: .
dockerfile: Dockerfile
image: lerkolabs/uptop:latest
container_name: uptop
restart: unless-stopped
read_only: true
cap_drop:
- ALL
security_opt:
- no-new-privileges:true
tmpfs:
- /tmp
ports:
- "23234:23234"
- "8080:8080"
+6 -1
View File
@@ -16,6 +16,11 @@ A follower is a standby replica that takes over if the leader goes down.
- When the leader recovers, the follower detects it and goes back to standby
- Both nodes have their own database — they do not share state
**Limitations:**
- During a network partition where both nodes are healthy, both will run checks and fire alerts independently. There is no leader fencing — the follower has no way to confirm the leader is actually down vs. unreachable from its perspective. This window lasts until the partition heals, at which point the follower detects the leader and steps down.
- Expect duplicate alerts and doubled check history entries during a split-brain event. Alerts are idempotent for most providers (a second "site is down" notification is noisy but not harmful).
- Failover takeover time is ~15 seconds (3 missed polls × 5 second interval). This is not configurable.
**Required env vars:**
| Node | Variable | Value |
@@ -76,5 +81,5 @@ Set via `UPTOP_AGG_STRATEGY` on the leader.
## Security
- Set `UPTOP_CLUSTER_SECRET` on all nodes. Without it, cluster API endpoints are unauthenticated.
- Secrets are sent in HTTP headers (`X-Upkeep-Secret`). Use TLS or a reverse proxy for production.
- Secrets are sent in HTTP headers (`X-Uptop-Secret`). Use TLS or a reverse proxy for production.
- uptop warns on startup if the cluster secret is missing or if cluster mode is active without TLS.
+31 -2
View File
@@ -122,7 +122,7 @@ Groups can't nest inside other groups. A group is healthy when all its children
## Alert types
All 9 providers work in the YAML. The `settings` map is different per type.
All 10 providers work in the YAML. The `settings` map is different per type.
```yaml
# Discord / Slack / Generic Webhook — just a URL
@@ -149,6 +149,9 @@ All 9 providers work in the YAML. The `settings` map is different per type.
url: https://ntfy.sh
topic: my-alerts
priority: "4"
# for protected topics:
# username: user
# password: pass
# Telegram
- name: Telegram Ops
@@ -178,6 +181,14 @@ All 9 providers work in the YAML. The `settings` map is different per type.
url: https://gotify.example.com
token: app-token
priority: "8"
# Opsgenie
- name: Opsgenie
type: opsgenie
settings:
api_key: your-api-key
priority: P2 # P1P5, default P3
# eu: "true" # use the EU API endpoint
```
## Commands
@@ -224,7 +235,25 @@ Monitors and alerts are matched by **name**. Names must be unique across the ent
Apply is idempotent. Run it twice with the same file, second run changes nothing.
If something fails mid-apply, just fix the issue and run it again. It picks up where it left off.
Apply is **not atomic** — items are written one at a time, so an error mid-apply (bad value, lost DB connection, ctrl-C) leaves the items already written in place. That's safe to recover from: apply diffs against the database by name, so fix the issue and run it again — it converges the rest. Just don't run two applies against the same database at once.
## Backups and secrets
`uptop export` writes alert credentials (SMTP passwords, API tokens, webhook URLs) into the YAML in clear text — that's what makes the file restorable. Treat it like a secrets file.
The HTTP export endpoint redacts those same fields **by default**:
```bash
# secrets show as ***REDACTED*** — fine for sharing or review
curl -H "X-Uptop-Secret: your-secret" \
"http://localhost:8080/api/backup/export"
# full backup you can actually restore from
curl -H "X-Uptop-Secret: your-secret" \
"http://localhost:8080/api/backup/export?redact_secrets=false"
```
Restoring a redacted export imports the literal string `***REDACTED***` as your credentials. For real backups, pass `redact_secrets=false` or run `uptop export` on the host.
## Typical workflow
+63 -2
View File
@@ -3,9 +3,11 @@ package alert
import (
"bytes"
"context"
"crypto/tls"
"encoding/json"
"errors"
"fmt"
"net"
"net/http"
"net/smtp"
"net/url"
@@ -244,7 +246,6 @@ func (e *EmailProvider) Send(ctx context.Context, title, message string) error {
return ctx.Err()
default:
}
auth := smtp.PlainAuth("", e.User, e.Pass, e.Host)
to := sanitizeHeader(e.To)
from := sanitizeHeader(e.From)
subject := sanitizeHeader(title)
@@ -256,7 +257,67 @@ func (e *EmailProvider) Send(ctx context.Context, title, message string) error {
"Content-Type: text/plain; charset=utf-8\r\n" +
"\r\n" +
body + "\r\n")
return smtp.SendMail(e.Host+":"+e.Port, auth, from, []string{to}, msg)
return sendMailContext(ctx, e.Host, e.Port, e.User, e.Pass, from, []string{to}, msg)
}
// sendMailContext is a ctx-aware replacement for smtp.SendMail.
// smtp.SendMail ignores context entirely — a blackholed SMTP server hangs for
// the OS TCP timeout (minutes). This dials with the context deadline and sets
// connection deadlines so cancellation is respected throughout.
func sendMailContext(ctx context.Context, host, port, user, pass, from string, rcpt []string, msg []byte) error {
addr := host + ":" + port
dialer := net.Dialer{}
conn, err := dialer.DialContext(ctx, "tcp", addr)
if err != nil {
return fmt.Errorf("smtp dial: %w", err)
}
if deadline, ok := ctx.Deadline(); ok {
_ = conn.SetDeadline(deadline)
}
c, err := smtp.NewClient(conn, host)
if err != nil {
_ = conn.Close()
return fmt.Errorf("smtp client: %w", err)
}
defer c.Close()
if ok, _ := c.Extension("STARTTLS"); ok {
if err := c.StartTLS(&tls.Config{ServerName: host}); err != nil {
return fmt.Errorf("smtp starttls: %w", err)
}
}
if user != "" || pass != "" {
auth := smtp.PlainAuth("", user, pass, host)
if err := c.Auth(auth); err != nil {
return fmt.Errorf("smtp auth: %w", err)
}
}
if err := c.Mail(from); err != nil {
return fmt.Errorf("smtp mail: %w", err)
}
for _, r := range rcpt {
if err := c.Rcpt(r); err != nil {
return fmt.Errorf("smtp rcpt: %w", err)
}
}
w, err := c.Data()
if err != nil {
return fmt.Errorf("smtp data: %w", err)
}
if _, err := w.Write(msg); err != nil {
return fmt.Errorf("smtp write: %w", err)
}
if err := w.Close(); err != nil {
return fmt.Errorf("smtp data close: %w", err)
}
return c.Quit()
}
type NtfyProvider struct {
+117
View File
@@ -1,14 +1,18 @@
package alert
import (
"bufio"
"context"
"encoding/json"
"errors"
"fmt"
"net"
"net/http"
"net/http/httptest"
"net/url"
"strings"
"testing"
"time"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
@@ -330,3 +334,116 @@ func TestSanitizeError(t *testing.T) {
t.Error("nil should stay nil")
}
}
func TestEmailProvider_ContextTimeout(t *testing.T) {
// Listener that accepts but never speaks — simulates a blackholed SMTP server.
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
defer ln.Close()
go func() {
for {
conn, err := ln.Accept()
if err != nil {
return
}
// Hold connection open, never send banner.
go func(c net.Conn) {
time.Sleep(30 * time.Second)
c.Close()
}(conn)
}
}()
_, portStr, _ := net.SplitHostPort(ln.Addr().String())
provider := &EmailProvider{
Host: "127.0.0.1", Port: portStr,
From: "test@test.com", To: "dest@test.com",
}
ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
defer cancel()
start := time.Now()
err = provider.Send(ctx, "test", "body")
elapsed := time.Since(start)
if err == nil {
t.Fatal("expected error from stalled SMTP")
}
if elapsed > 2*time.Second {
t.Errorf("Send took %v — context deadline not respected", elapsed)
}
}
func TestSendMailContext_HappyPath(t *testing.T) {
// Minimal fake SMTP server that accepts one message.
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
defer ln.Close()
received := make(chan string, 1)
go func() {
conn, err := ln.Accept()
if err != nil {
return
}
defer conn.Close()
fmt.Fprintf(conn, "220 localhost ESMTP\r\n")
scanner := bufio.NewScanner(conn)
var dataMode bool
var body strings.Builder
for scanner.Scan() {
line := scanner.Text()
if dataMode {
if line == "." {
dataMode = false
fmt.Fprintf(conn, "250 OK\r\n")
continue
}
body.WriteString(line + "\n")
continue
}
switch {
case strings.HasPrefix(line, "EHLO"):
fmt.Fprintf(conn, "250-localhost\r\n250 OK\r\n")
case strings.HasPrefix(line, "MAIL FROM"):
fmt.Fprintf(conn, "250 OK\r\n")
case strings.HasPrefix(line, "RCPT TO"):
fmt.Fprintf(conn, "250 OK\r\n")
case line == "DATA":
fmt.Fprintf(conn, "354 Go ahead\r\n")
dataMode = true
case line == "QUIT":
fmt.Fprintf(conn, "221 Bye\r\n")
received <- body.String()
return
default:
fmt.Fprintf(conn, "250 OK\r\n")
}
}
}()
_, portStr, _ := net.SplitHostPort(ln.Addr().String())
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
err = sendMailContext(ctx, "127.0.0.1", portStr, "", "", "from@test.com", []string{"to@test.com"}, []byte("Subject: test\r\n\r\nhello"))
if err != nil {
t.Fatalf("sendMailContext: %v", err)
}
select {
case body := <-received:
if !strings.Contains(body, "hello") {
t.Errorf("expected body to contain 'hello', got: %s", body)
}
case <-time.After(5 * time.Second):
t.Fatal("timed out waiting for fake SMTP to receive message")
}
}
+1 -1
View File
@@ -52,7 +52,7 @@ func runFollowerLoop(ctx context.Context, cfg Config, eng *monitor.Engine) {
req, _ := http.NewRequest("GET", cfg.PeerURL+"/api/health", nil)
if cfg.SharedKey != "" {
req.Header.Set("X-Upkeep-Secret", cfg.SharedKey)
req.Header.Set("X-Uptop-Secret", cfg.SharedKey)
}
resp, err := client.Do(req)
+5 -5
View File
@@ -113,7 +113,7 @@ func TestFollowerLoop_SendsSecret(t *testing.T) {
var receivedSecret string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
mu.Lock()
receivedSecret = r.Header.Get("X-Upkeep-Secret")
receivedSecret = r.Header.Get("X-Uptop-Secret")
mu.Unlock()
w.WriteHeader(200)
w.Write([]byte("OK"))
@@ -203,7 +203,7 @@ func TestProbeRegister_Failure(t *testing.T) {
func TestProbeFetchAssignments_Success(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
json.NewEncoder(w).Encode(map[string][]models.Site{
"sites": {{ID: 1, Name: "s1", Type: "http", URL: "http://example.com"}},
"sites": {{SiteConfig: models.SiteConfig{ID: 1, Name: "s1", Type: "http", URL: "http://example.com"}}},
})
}))
defer srv.Close()
@@ -240,8 +240,8 @@ func TestProbeExecuteChecks(t *testing.T) {
defer srv.Close()
sites := []models.Site{
{ID: 1, Type: "http", URL: srv.URL},
{ID: 2, Type: "http", URL: srv.URL},
{SiteConfig: models.SiteConfig{ID: 1, Type: "http", URL: srv.URL}},
{SiteConfig: models.SiteConfig{ID: 2, Type: "http", URL: srv.URL}},
}
strict := &http.Client{}
@@ -277,7 +277,7 @@ func TestProbeExecuteChecks_Concurrency(t *testing.T) {
var sites []models.Site
for i := 0; i < 20; i++ {
sites = append(sites, models.Site{ID: i + 1, Type: "http", URL: srv.URL})
sites = append(sites, models.Site{SiteConfig: models.SiteConfig{ID: i + 1, Type: "http", URL: srv.URL}})
}
results := probeExecuteChecks(context.Background(), sites, &http.Client{}, &http.Client{}, true)
+9 -9
View File
@@ -6,7 +6,7 @@ import (
"crypto/tls"
"encoding/json"
"fmt"
"log"
"log/slog"
"net/http"
"net/url"
"sync"
@@ -47,7 +47,7 @@ func RunProbe(ctx context.Context, cfg ProbeConfig) error {
}
if err := probeRegister(ctx, apiClient, cfg); err != nil {
log.Printf("Probe: initial registration failed: %v (will retry)", err)
slog.Error("probe initial registration failed", "err", err)
}
for {
@@ -59,7 +59,7 @@ func RunProbe(ctx context.Context, cfg ProbeConfig) error {
sites, err := probeFetchAssignments(ctx, apiClient, cfg)
if err != nil {
log.Printf("Probe: failed to fetch assignments: %v", err)
slog.Error("probe failed to fetch assignments", "err", err)
sleepCtx(ctx, 10*time.Second)
continue
}
@@ -73,7 +73,7 @@ func RunProbe(ctx context.Context, cfg ProbeConfig) error {
if len(results) > 0 {
if err := probeReportResults(ctx, apiClient, cfg, results); err != nil {
log.Printf("Probe: failed to report results: %v", err)
slog.Error("probe failed to report results", "err", err)
}
}
@@ -90,7 +90,7 @@ func probeRegister(ctx context.Context, client *http.Client, cfg ProbeConfig) er
return err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Upkeep-Secret", cfg.SharedKey)
req.Header.Set("X-Uptop-Secret", cfg.SharedKey)
resp, err := client.Do(req)
if err != nil {
return err
@@ -108,7 +108,7 @@ func probeFetchAssignments(ctx context.Context, client *http.Client, cfg ProbeCo
if err != nil {
return nil, err
}
req.Header.Set("X-Upkeep-Secret", cfg.SharedKey)
req.Header.Set("X-Uptop-Secret", cfg.SharedKey)
resp, err := client.Do(req)
if err != nil {
return nil, err
@@ -152,7 +152,7 @@ loop:
defer wg.Done()
defer func() { <-sem }()
cr := monitor.RunCheck(ctx, s, strict, insecure, false, allowPrivate)
cr := monitor.RunCheck(ctx, s.SiteConfig, strict, insecure, false, allowPrivate)
mu.Lock()
results = append(results, probeResultItem{
SiteID: s.ID,
@@ -180,7 +180,7 @@ func probeReportResults(ctx context.Context, client *http.Client, cfg ProbeConfi
return err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Upkeep-Secret", cfg.SharedKey)
req.Header.Set("X-Uptop-Secret", cfg.SharedKey)
resp, err := client.Do(req)
if err != nil {
return err
@@ -189,7 +189,7 @@ func probeReportResults(ctx context.Context, client *http.Client, cfg ProbeConfi
if resp.StatusCode != 200 {
return fmt.Errorf("results returned %d", resp.StatusCode)
}
fmt.Printf("Probe: reported %d check results\n", len(results))
slog.Info("probe reported check results", "count", len(results))
return nil
}
+13 -6
View File
@@ -42,7 +42,7 @@ func Apply(ctx context.Context, s store.Store, f *File, opts ApplyOpts) ([]Chang
existingAlertsByName[a.Name] = a
}
existingSitesByName := make(map[string]models.Site, len(existingSites))
existingSitesByName := make(map[string]models.SiteConfig, len(existingSites))
for _, s := range existingSites {
existingSitesByName[s.Name] = s
}
@@ -54,6 +54,7 @@ func Apply(ctx context.Context, s store.Store, f *File, opts ApplyOpts) ([]Chang
alertMap[ea.Name] = ea.ID
}
nextPlaceholderID := -1
desiredAlertNames := make(map[string]bool, len(f.Alerts))
for _, a := range f.Alerts {
desiredAlertNames[a.Name] = true
@@ -66,6 +67,9 @@ func Apply(ctx context.Context, s store.Store, f *File, opts ApplyOpts) ([]Chang
return changes, fmt.Errorf("create alert %q: %w", a.Name, err)
}
alertMap[a.Name] = id
} else {
alertMap[a.Name] = nextPlaceholderID
nextPlaceholderID--
}
} else {
alertMap[a.Name] = existing.ID
@@ -109,6 +113,9 @@ func Apply(ctx context.Context, s store.Store, f *File, opts ApplyOpts) ([]Chang
return changes, fmt.Errorf("create group %q: %w", g.Name, err)
}
groupMap[g.Name] = id
} else {
groupMap[g.Name] = nextPlaceholderID
nextPlaceholderID--
}
} else {
groupMap[g.Name] = existing.ID
@@ -181,7 +188,7 @@ func Apply(ctx context.Context, s store.Store, f *File, opts ApplyOpts) ([]Chang
return changes, nil
}
func applyMonitor(ctx context.Context, s store.Store, m Monitor, alertMap map[string]int, existing map[string]models.Site, parentID int, dryRun bool) ([]Change, error) {
func applyMonitor(ctx context.Context, s store.Store, m Monitor, alertMap map[string]int, existing map[string]models.SiteConfig, parentID int, dryRun bool) ([]Change, error) {
alertID, err := resolveAlertID(alertMap, m.Alert)
if err != nil {
return nil, fmt.Errorf("monitor %q: %w", m.Name, err)
@@ -222,8 +229,8 @@ func resolveAlertID(alertMap map[string]int, name string) (int, error) {
return id, nil
}
func monitorToSite(m Monitor, alertID, parentID int) models.Site {
s := models.Site{
func monitorToSite(m Monitor, alertID, parentID int) models.SiteConfig {
s := models.SiteConfig{
Name: m.Name,
Type: m.Type,
URL: m.URL,
@@ -269,7 +276,7 @@ func collectMonitorNames(monitors []Monitor, names map[string]bool) {
}
}
func normalizeSite(s models.Site) models.Site {
func normalizeSite(s models.SiteConfig) models.SiteConfig {
if s.Method == "" {
s.Method = "GET"
}
@@ -293,7 +300,7 @@ func diffAlert(existing models.AlertConfig, desired Alert) string {
return strings.Join(diffs, ", ")
}
func diffSite(existing, desired models.Site) string {
func diffSite(existing, desired models.SiteConfig) string {
var diffs []string
if existing.URL != desired.URL {
diffs = append(diffs, fmt.Sprintf("url: %s -> %s", existing.URL, desired.URL))
+71 -3
View File
@@ -114,8 +114,8 @@ func TestApplyUpdate(t *testing.T) {
func TestApplyPrune(t *testing.T) {
s := newTestStore(t)
s.AddSite(context.Background(), models.Site{Name: "Keep", URL: "https://keep.com", Type: "http", Interval: 30, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.Site{Name: "Remove", URL: "https://remove.com", Type: "http", Interval: 30, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.SiteConfig{Name: "Keep", URL: "https://keep.com", Type: "http", Interval: 30, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.SiteConfig{Name: "Remove", URL: "https://remove.com", Type: "http", Interval: 30, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
f := &File{
Monitors: []Monitor{
@@ -191,7 +191,7 @@ func TestApplyGroupHierarchy(t *testing.T) {
}
sites, _ := s.GetSites(context.Background())
var group models.Site
var group models.SiteConfig
for _, s := range sites {
if s.Type == "group" {
group = s
@@ -266,6 +266,74 @@ func TestApplyDuplicateNames(t *testing.T) {
}
}
func TestApplyDryRunNewAlertAndMonitor(t *testing.T) {
s := newTestStore(t)
f := &File{
Alerts: []Alert{
{Name: "Discord", Type: "discord", Settings: map[string]string{"url": "https://example.com"}},
},
Monitors: []Monitor{
{Name: "Web", Type: "http", URL: "https://example.com", Interval: 30, Alert: "Discord"},
},
}
changes, err := Apply(context.Background(), s, f, ApplyOpts{DryRun: true})
if err != nil {
t.Fatalf("dry-run with new alert+monitor should not error: %v", err)
}
creates := 0
for _, c := range changes {
if c.Action == "create" {
creates++
}
}
if creates != 2 {
t.Fatalf("expected 2 creates (alert+monitor), got %d: %+v", creates, changes)
}
sites, _ := s.GetSites(context.Background())
alerts, _ := s.GetAllAlerts(context.Background())
if len(sites) != 0 {
t.Fatalf("dry-run should not persist sites, got %d", len(sites))
}
if len(alerts) != 0 {
t.Fatalf("dry-run should not persist alerts, got %d", len(alerts))
}
}
func TestApplyDryRunNewGroupWithChildren(t *testing.T) {
s := newTestStore(t)
f := &File{
Alerts: []Alert{
{Name: "Slack", Type: "slack", Settings: map[string]string{"url": "https://hooks.example.com"}},
},
Monitors: []Monitor{
{
Name: "Prod", Type: "group", Alert: "Slack",
Monitors: []Monitor{
{Name: "API", Type: "http", URL: "https://api.example.com", Interval: 15, Alert: "Slack"},
},
},
},
}
changes, err := Apply(context.Background(), s, f, ApplyOpts{DryRun: true})
if err != nil {
t.Fatalf("dry-run with new group+alert should not error: %v", err)
}
creates := 0
for _, c := range changes {
if c.Action == "create" {
creates++
}
}
if creates != 3 {
t.Fatalf("expected 3 creates (alert+group+child), got %d: %+v", creates, changes)
}
}
func TestApplyExistingAlertReference(t *testing.T) {
s := newTestStore(t)
s.AddAlert(context.Background(), "Existing", "webhook", map[string]string{"url": "https://example.com"})
+4 -4
View File
@@ -34,9 +34,9 @@ func Export(ctx context.Context, s store.Store) (*File, error) {
})
}
groups := make(map[int]models.Site)
children := make(map[int][]models.Site)
var topLevel []models.Site
groups := make(map[int]models.SiteConfig)
children := make(map[int][]models.SiteConfig)
var topLevel []models.SiteConfig
for _, s := range dbSites {
switch {
@@ -76,7 +76,7 @@ func Export(ctx context.Context, s store.Store) (*File, error) {
return &File{Alerts: yamlAlerts, Monitors: yamlMonitors}, nil
}
func siteToMonitor(s models.Site, alertIDToName map[int]string) Monitor {
func siteToMonitor(s models.SiteConfig, alertIDToName map[int]string) Monitor {
m := Monitor{
Name: s.Name,
Type: s.Type,
+7 -7
View File
@@ -22,7 +22,7 @@ func TestExportAlertNames(t *testing.T) {
s := newTestStore(t)
s.AddAlert(context.Background(), "Discord", "discord", map[string]string{"url": "https://example.com"})
alerts, _ := s.GetAllAlerts(context.Background())
s.AddSite(context.Background(), models.Site{Name: "Web", URL: "https://example.com", Type: "http", Interval: 30, AlertID: alerts[0].ID, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.SiteConfig{Name: "Web", URL: "https://example.com", Type: "http", Interval: 30, AlertID: alerts[0].ID, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
f, err := Export(context.Background(), s)
if err != nil {
@@ -39,9 +39,9 @@ func TestExportAlertNames(t *testing.T) {
func TestExportGroupHierarchy(t *testing.T) {
s := newTestStore(t)
groupID, _ := s.AddSiteReturningID(context.Background(), models.Site{Name: "Prod", Type: "group", ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.Site{Name: "Prod Web", URL: "https://prod.example.com", Type: "http", Interval: 15, ParentID: groupID, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.Site{Name: "Top Level", URL: "https://example.com", Type: "http", Interval: 30, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
groupID, _ := s.AddSiteReturningID(context.Background(), models.SiteConfig{Name: "Prod", Type: "group", ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.SiteConfig{Name: "Prod Web", URL: "https://prod.example.com", Type: "http", Interval: 15, ParentID: groupID, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s.AddSite(context.Background(), models.SiteConfig{Name: "Top Level", URL: "https://example.com", Type: "http", Interval: 30, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
f, err := Export(context.Background(), s)
if err != nil {
@@ -72,7 +72,7 @@ func TestExportGroupHierarchy(t *testing.T) {
func TestExportOmitsDefaults(t *testing.T) {
s := newTestStore(t)
s.AddSite(context.Background(), models.Site{
s.AddSite(context.Background(), models.SiteConfig{
Name: "Web", URL: "https://example.com", Type: "http", Interval: 30,
Method: "GET", AcceptedCodes: "200-299", ExpiryThreshold: 7,
})
@@ -98,8 +98,8 @@ func TestExportRoundTrip(t *testing.T) {
s1 := newTestStore(t)
s1.AddAlert(context.Background(), "Discord", "discord", map[string]string{"url": "https://example.com"})
alerts, _ := s1.GetAllAlerts(context.Background())
s1.AddSite(context.Background(), models.Site{Name: "Web", URL: "https://example.com", Type: "http", Interval: 30, AlertID: alerts[0].ID, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s1.AddSite(context.Background(), models.Site{Name: "Ping", Type: "ping", Hostname: "10.0.0.1", Interval: 60, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s1.AddSite(context.Background(), models.SiteConfig{Name: "Web", URL: "https://example.com", Type: "http", Interval: 30, AlertID: alerts[0].ID, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
s1.AddSite(context.Background(), models.SiteConfig{Name: "Ping", Type: "ping", Hostname: "10.0.0.1", Interval: 60, ExpiryThreshold: 7, Method: "GET", AcceptedCodes: "200-299"})
exported, err := Export(context.Background(), s1)
if err != nil {
+15 -4
View File
@@ -1,11 +1,14 @@
package importer
import (
"crypto/rand"
"encoding/hex"
"encoding/json"
"fmt"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"os"
"strings"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
type KumaBackup struct {
@@ -80,7 +83,7 @@ func ConvertKuma(kb *KumaBackup) models.Backup {
}
}
var sites []models.Site
var sites []models.SiteConfig
for _, m := range kb.MonitorList {
site := convertKumaMonitor(m, kumaToUpkeepAlert)
sites = append(sites, site)
@@ -132,8 +135,8 @@ func convertKumaNotifications(entries []KumaNotifEntry) map[int]models.AlertConf
return result
}
func convertKumaMonitor(m KumaMonitor, alertMap map[int]int) models.Site {
site := models.Site{
func convertKumaMonitor(m KumaMonitor, alertMap map[int]int) models.SiteConfig {
site := models.SiteConfig{
ID: m.ID,
Name: m.Name,
Description: m.Description,
@@ -155,10 +158,18 @@ func convertKumaMonitor(m KumaMonitor, alertMap map[int]int) models.Site {
site.DNSResolveType = m.DNSResolveType
site.DNSServer = m.DNSResolveServer
site.Paused = !m.Active
switch m.Type {
case "http":
site.URL = m.URL
site.CheckSSL = m.ExpiryNotif
case "push":
site.Type = "push"
b := make([]byte, 16)
if _, err := rand.Read(b); err == nil {
site.Token = hex.EncodeToString(b)
}
case "ping":
if m.Hostname != "" {
site.Hostname = m.Hostname
+210
View File
@@ -0,0 +1,210 @@
package importer
import (
"os"
"path/filepath"
"strings"
"testing"
)
func writeTemp(t *testing.T, content string) string {
t.Helper()
path := filepath.Join(t.TempDir(), "backup.json")
if err := os.WriteFile(path, []byte(content), 0o600); err != nil {
t.Fatal(err)
}
return path
}
func TestLoadKumaFileMissingFile(t *testing.T) {
_, err := LoadKumaFile(filepath.Join(t.TempDir(), "nope.json"))
if err == nil {
t.Fatal("expected error for missing file")
}
}
func TestLoadKumaFileMalformedInput(t *testing.T) {
cases := []struct {
name string
body string
}{
{"empty file", ""},
{"truncated JSON", `{"version": "1.23", "monitorList": [`},
{"not JSON", "definitely not json"},
{"wrong root type", `[1, 2, 3]`},
{"monitorList wrong type", `{"monitorList": {"a": 1}}`},
{"monitor field wrong type", `{"monitorList": [{"id": "not-an-int"}]}`},
{"notificationList wrong type", `{"notificationList": "oops"}`},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
_, err := LoadKumaFile(writeTemp(t, tc.body))
if err == nil {
t.Fatalf("expected parse error for %s", tc.name)
}
if !strings.Contains(err.Error(), "parse JSON") {
t.Fatalf("expected wrapped parse error, got: %v", err)
}
})
}
}
func TestLoadKumaFileNullLists(t *testing.T) {
kb, err := LoadKumaFile(writeTemp(t, `{"version": "1.23", "monitorList": null, "notificationList": null}`))
if err != nil {
t.Fatal(err)
}
backup := ConvertKuma(kb)
if len(backup.Sites) != 0 || len(backup.Alerts) != 0 {
t.Fatalf("expected empty backup, got %d sites %d alerts", len(backup.Sites), len(backup.Alerts))
}
}
func TestConvertKumaSkipsMalformedNotificationConfig(t *testing.T) {
kb := &KumaBackup{
NotificationList: []KumaNotifEntry{
{ID: 1, Name: "broken", Config: "{not json"},
{ID: 2, Name: "good", Config: `{"type": "discord", "ntfyserverurl": "https://example.com/hook"}`},
},
MonitorList: []KumaMonitor{
{ID: 10, Name: "site", Type: "http", URL: "https://example.com", NotificationIDs: map[string]bool{"1": true}},
},
}
backup := ConvertKuma(kb)
if len(backup.Alerts) != 1 {
t.Fatalf("expected broken notification skipped, got %d alerts", len(backup.Alerts))
}
if backup.Alerts[0].Type != "discord" {
t.Fatalf("expected discord alert, got %q", backup.Alerts[0].Type)
}
if backup.Sites[0].AlertID != 0 {
t.Fatalf("site referencing skipped notification should keep AlertID 0, got %d", backup.Sites[0].AlertID)
}
}
func TestConvertKumaNtfyNotification(t *testing.T) {
kb := &KumaBackup{
NotificationList: []KumaNotifEntry{
{ID: 3, Name: "ntfy", Config: `{
"type": "ntfy",
"ntfyserverurl": "https://ntfy.example.com/",
"ntfytopic": "uptime",
"ntfyPriority": 4,
"ntfyAuthenticationMethod": "usernamePassword",
"ntfyusername": "u",
"ntfypassword": "p"
}`},
},
}
backup := ConvertKuma(kb)
if len(backup.Alerts) != 1 {
t.Fatalf("expected 1 alert, got %d", len(backup.Alerts))
}
a := backup.Alerts[0]
if a.Type != "ntfy" {
t.Fatalf("expected ntfy, got %q", a.Type)
}
if a.Settings["url"] != "https://ntfy.example.com" {
t.Fatalf("expected trailing slash trimmed, got %q", a.Settings["url"])
}
if a.Settings["topic"] != "uptime" || a.Settings["priority"] != "4" {
t.Fatalf("unexpected settings: %v", a.Settings)
}
if a.Settings["username"] != "u" || a.Settings["password"] != "p" {
t.Fatalf("expected credentials mapped, got %v", a.Settings)
}
}
func TestConvertKumaUnknownNotificationFallsBackToWebhook(t *testing.T) {
kb := &KumaBackup{
NotificationList: []KumaNotifEntry{
{ID: 4, Name: "matrix", Config: `{"type": "matrix", "ntfyserverurl": "https://example.com/hook"}`},
},
}
backup := ConvertKuma(kb)
if len(backup.Alerts) != 1 || backup.Alerts[0].Type != "webhook" {
t.Fatalf("expected webhook fallback, got %+v", backup.Alerts)
}
}
func TestConvertKumaHTTPMonitor(t *testing.T) {
kb := &KumaBackup{
NotificationList: []KumaNotifEntry{
{ID: 1, Name: "hook", Config: `{"type": "slack", "ntfyserverurl": "https://example.com/hook"}`},
},
MonitorList: []KumaMonitor{{
ID: 7,
Name: "web",
Type: "http",
URL: "https://example.com",
Interval: 60,
Timeout: 30,
MaxRetries: 2,
Method: "GET",
AcceptedCodes: []string{"200", "301"},
IgnoreTLS: true,
ExpiryNotif: true,
Active: false,
NotificationIDs: map[string]bool{"1": true},
}},
}
backup := ConvertKuma(kb)
if len(backup.Sites) != 1 {
t.Fatalf("expected 1 site, got %d", len(backup.Sites))
}
s := backup.Sites[0]
if s.URL != "https://example.com" || !s.CheckSSL || !s.IgnoreTLS {
t.Fatalf("http fields not mapped: %+v", s)
}
if !s.Paused {
t.Fatal("inactive monitor should import paused")
}
if s.AcceptedCodes != "200,301" {
t.Fatalf("expected joined accepted codes, got %q", s.AcceptedCodes)
}
if s.AlertID != 1 {
t.Fatalf("expected alert mapped, got %d", s.AlertID)
}
}
func TestConvertKumaPushMonitorGetsToken(t *testing.T) {
kb := &KumaBackup{
MonitorList: []KumaMonitor{{ID: 1, Name: "push", Type: "push", Active: true}},
}
backup := ConvertKuma(kb)
token := backup.Sites[0].Token
if len(token) != 32 {
t.Fatalf("expected 32-char hex token, got %q", token)
}
}
func TestConvertKumaNonNumericNotificationID(t *testing.T) {
kb := &KumaBackup{
MonitorList: []KumaMonitor{{
ID: 1,
Name: "site",
Type: "http",
NotificationIDs: map[string]bool{"abc": true},
}},
}
backup := ConvertKuma(kb)
if backup.Sites[0].AlertID != 0 {
t.Fatalf("non-numeric notification ID should not map, got %d", backup.Sites[0].AlertID)
}
}
func TestConvertKumaGroupAndChildren(t *testing.T) {
kb := &KumaBackup{
MonitorList: []KumaMonitor{
{ID: 1, Name: "grp", Type: "group", Active: true},
{ID: 2, Name: "ping", Type: "ping", Hostname: "10.0.0.1", Parent: 1, Active: true},
},
}
backup := ConvertKuma(kb)
if backup.Sites[0].Type != "group" {
t.Fatalf("expected group type, got %q", backup.Sites[0].Type)
}
if backup.Sites[1].ParentID != 1 || backup.Sites[1].Hostname != "10.0.0.1" {
t.Fatalf("child not mapped: %+v", backup.Sites[1])
}
}
+3 -3
View File
@@ -15,16 +15,16 @@ import (
type mockStore struct {
storetest.BaseMock
sites []models.Site
sites []models.SiteConfig
}
func (m *mockStore) GetSites(_ context.Context) ([]models.Site, error) {
func (m *mockStore) GetSites(_ context.Context) ([]models.SiteConfig, error) {
return m.sites, nil
}
func TestMetricsHandler(t *testing.T) {
ms := &mockStore{
sites: []models.Site{
sites: []models.SiteConfig{
{ID: 1, Name: "Example", URL: "https://example.com", Type: "http", Interval: 30},
{ID: 2, Name: "DNS Check", Type: "dns", Interval: 60},
},
+9 -2
View File
@@ -2,7 +2,7 @@ package models
import "time"
type Site struct {
type SiteConfig struct {
ID int
Name string
URL string
@@ -26,7 +26,9 @@ type Site struct {
IgnoreTLS bool
Paused bool
Regions string
}
type SiteState struct {
FailureCount int
Status Status
StatusCode int
@@ -40,6 +42,11 @@ type Site struct {
LastSuccessAt time.Time
}
type Site struct {
SiteConfig
SiteState
}
type StateChange struct {
ID int
SiteID int
@@ -103,7 +110,7 @@ type MaintenanceWindow struct {
}
type Backup struct {
Sites []Site `json:"sites"`
Sites []SiteConfig `json:"sites"`
Alerts []AlertConfig `json:"alerts"`
Users []User `json:"users"`
MaintenanceWindows []MaintenanceWindow `json:"maintenance_windows,omitempty"`
+50 -20
View File
@@ -3,6 +3,7 @@ package monitor
import (
"context"
"fmt"
"io"
"net"
"net/http"
"strconv"
@@ -35,22 +36,27 @@ type CheckResult struct {
ErrorReason string
}
func RunCheck(ctx context.Context, site models.Site, strict, insecure *http.Client, globalInsecure bool, allowPrivate ...bool) CheckResult {
private := len(allowPrivate) > 0 && allowPrivate[0]
if site.Type != "http" && site.Type != "dns" && !private {
func RunCheck(ctx context.Context, site models.SiteConfig, strict, insecure *http.Client, globalInsecure, allowPrivate bool) CheckResult {
// Resolve + validate once for non-HTTP types to prevent DNS-rebind TOCTOU:
// a second resolve in the check function could return a different (private) IP.
// HTTP is safe — SafeDialContext resolves and validates at dial time.
var pinnedIP net.IP
if site.Type != "http" && site.Type != "dns" && !allowPrivate {
host := site.Hostname
if host == "" {
host = site.URL
}
if host != "" {
if ips, err := net.LookupIP(host); err == nil {
for _, ip := range ips {
if isPrivateIP(ip) {
return CheckResult{SiteID: site.ID, Status: string(models.StatusDown), ErrorReason: "target resolves to private IP"}
}
ips, err := net.LookupIP(host)
if err != nil {
return CheckResult{SiteID: site.ID, Status: string(models.StatusDown), ErrorReason: "resolve failed: " + err.Error()}
}
for _, ip := range ips {
if isPrivateIP(ip) {
return CheckResult{SiteID: site.ID, Status: string(models.StatusDown), ErrorReason: "target resolves to private IP"}
}
}
pinnedIP = ips[0]
}
}
@@ -58,17 +64,17 @@ func RunCheck(ctx context.Context, site models.Site, strict, insecure *http.Clie
case "http":
return runHTTPCheck(ctx, site, strict, insecure, globalInsecure)
case "ping":
return runPingCheck(ctx, site)
return runPingCheck(ctx, site, pinnedIP)
case "port":
return runPortCheck(ctx, site)
return runPortCheck(ctx, site, pinnedIP)
case "dns":
return runDNSCheck(ctx, site)
return runDNSCheck(ctx, site, allowPrivate)
default:
return CheckResult{SiteID: site.ID, Status: string(models.StatusDown), ErrorReason: "unsupported monitor type: " + site.Type}
}
}
func runHTTPCheck(ctx context.Context, site models.Site, strict, insecure *http.Client, globalInsecure bool) CheckResult {
func runHTTPCheck(ctx context.Context, site models.SiteConfig, strict, insecure *http.Client, globalInsecure bool) CheckResult {
method := site.Method
if method == "" {
method = "GET"
@@ -103,7 +109,10 @@ func runHTTPCheck(ctx context.Context, site models.Site, strict, insecure *http.
result.ErrorReason = truncateError(err.Error(), maxErrorLength)
return result
}
defer resp.Body.Close()
defer func() {
_, _ = io.Copy(io.Discard, resp.Body)
_ = resp.Body.Close()
}()
result.StatusCode = resp.StatusCode
if !isCodeAccepted(resp.StatusCode, site.AcceptedCodes) {
@@ -128,7 +137,7 @@ func runHTTPCheck(ctx context.Context, site models.Site, strict, insecure *http.
return result
}
func runPingCheck(_ context.Context, site models.Site) CheckResult {
func runPingCheck(_ context.Context, site models.SiteConfig, pinnedIP net.IP) CheckResult {
host := site.Hostname
if host == "" {
host = site.URL
@@ -138,6 +147,9 @@ func runPingCheck(_ context.Context, site models.Site) CheckResult {
if err != nil {
return CheckResult{SiteID: site.ID, Status: string(models.StatusDown), ErrorReason: "ping setup: " + err.Error()}
}
if pinnedIP != nil {
pinger.SetIPAddr(&net.IPAddr{IP: pinnedIP})
}
pinger.Count = 1
pinger.Timeout = siteTimeout(site)
pinger.SetPrivileged(false)
@@ -157,11 +169,14 @@ func runPingCheck(_ context.Context, site models.Site) CheckResult {
return CheckResult{SiteID: site.ID, Status: string(models.StatusUp), LatencyNs: stats.AvgRtt.Nanoseconds()}
}
func runPortCheck(_ context.Context, site models.Site) CheckResult {
func runPortCheck(_ context.Context, site models.SiteConfig, pinnedIP net.IP) CheckResult {
host := site.Hostname
if host == "" {
host = site.URL
}
if pinnedIP != nil {
host = pinnedIP.String()
}
addr := net.JoinHostPort(host, strconv.Itoa(site.Port))
timeout := siteTimeout(site)
@@ -176,7 +191,7 @@ func runPortCheck(_ context.Context, site models.Site) CheckResult {
return CheckResult{SiteID: site.ID, Status: string(models.StatusUp), LatencyNs: latency.Nanoseconds()}
}
func runDNSCheck(_ context.Context, site models.Site) CheckResult {
func runDNSCheck(_ context.Context, site models.SiteConfig, allowPrivate bool) CheckResult {
host := site.Hostname
if host == "" {
host = site.URL
@@ -186,9 +201,24 @@ func runDNSCheck(_ context.Context, site models.Site) CheckResult {
if server == "" {
server = defaultDNSServer
}
if _, _, err := net.SplitHostPort(server); err != nil {
server = net.JoinHostPort(server, defaultDNSPort)
serverHost, serverPort, err := net.SplitHostPort(server)
if err != nil {
serverHost = server
serverPort = defaultDNSPort
}
if !allowPrivate {
if serverPort != defaultDNSPort {
return CheckResult{SiteID: site.ID, Status: string(models.StatusDown), ErrorReason: "DNS server port must be 53"}
}
if ips, err := net.LookupIP(serverHost); err == nil {
for _, ip := range ips {
if isPrivateIP(ip) {
return CheckResult{SiteID: site.ID, Status: string(models.StatusDown), ErrorReason: "DNS server resolves to private address"}
}
}
}
}
server = net.JoinHostPort(serverHost, serverPort)
qtype := dns.TypeA
switch site.DNSResolveType {
@@ -229,7 +259,7 @@ func runDNSCheck(_ context.Context, site models.Site) CheckResult {
return CheckResult{SiteID: site.ID, Status: string(models.StatusUp), LatencyNs: latency.Nanoseconds()}
}
func siteTimeout(site models.Site) time.Duration {
func siteTimeout(site models.SiteConfig) time.Duration {
if site.Timeout > 0 {
return time.Duration(site.Timeout) * time.Second
}
+57 -20
View File
@@ -19,8 +19,8 @@ func TestRunCheck_HTTP_Success(t *testing.T) {
}))
defer srv.Close()
site := models.Site{ID: 1, Type: "http", URL: srv.URL}
result := RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false)
site := models.SiteConfig{ID: 1, Type: "http", URL: srv.URL}
result := RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false, false)
if result.Status != "UP" {
t.Errorf("expected UP, got %s", result.Status)
@@ -39,8 +39,8 @@ func TestRunCheck_HTTP_ServerError(t *testing.T) {
}))
defer srv.Close()
site := models.Site{ID: 1, Type: "http", URL: srv.URL}
result := RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false)
site := models.SiteConfig{ID: 1, Type: "http", URL: srv.URL}
result := RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false, false)
if result.Status != "DOWN" {
t.Errorf("expected DOWN, got %s", result.Status)
@@ -60,8 +60,8 @@ func TestRunCheck_HTTP_CustomAcceptedCodes(t *testing.T) {
return http.ErrUseLastResponse
}}
site := models.Site{ID: 1, Type: "http", URL: srv.URL, AcceptedCodes: "200-399"}
result := RunCheck(context.Background(), site, client, client, false)
site := models.SiteConfig{ID: 1, Type: "http", URL: srv.URL, AcceptedCodes: "200-399"}
result := RunCheck(context.Background(), site, client, client, false, false)
if result.Status != "UP" {
t.Errorf("expected UP with accepted 200-399, got %s", result.Status)
@@ -76,8 +76,8 @@ func TestRunCheck_HTTP_MethodRespected(t *testing.T) {
}))
defer srv.Close()
site := models.Site{ID: 1, Type: "http", URL: srv.URL, Method: "HEAD"}
RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false)
site := models.SiteConfig{ID: 1, Type: "http", URL: srv.URL, Method: "HEAD"}
RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false, false)
if receivedMethod != "HEAD" {
t.Errorf("expected HEAD, got %s", receivedMethod)
@@ -91,8 +91,8 @@ func TestRunCheck_HTTP_Timeout(t *testing.T) {
}))
defer srv.Close()
site := models.Site{ID: 1, Type: "http", URL: srv.URL, Timeout: 1}
result := RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false)
site := models.SiteConfig{ID: 1, Type: "http", URL: srv.URL, Timeout: 1}
result := RunCheck(context.Background(), site, http.DefaultClient, http.DefaultClient, false, false)
if result.Status != "DOWN" {
t.Errorf("expected DOWN on timeout, got %s", result.Status)
@@ -109,8 +109,8 @@ func TestRunCheck_HTTP_SSLFields(t *testing.T) {
Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
}
site := models.Site{ID: 1, Type: "http", URL: srv.URL, CheckSSL: true, IgnoreTLS: true}
result := RunCheck(context.Background(), site, http.DefaultClient, insecureClient, false)
site := models.SiteConfig{ID: 1, Type: "http", URL: srv.URL, CheckSSL: true, IgnoreTLS: true}
result := RunCheck(context.Background(), site, http.DefaultClient, insecureClient, false, false)
if result.Status != "UP" {
t.Errorf("expected UP, got %s", result.Status)
@@ -133,7 +133,7 @@ func TestRunCheck_Port_Open(t *testing.T) {
_, portStr, _ := net.SplitHostPort(ln.Addr().String())
port, _ := strconv.Atoi(portStr)
site := models.Site{ID: 1, Type: "port", Hostname: "127.0.0.1", Port: port, Timeout: 2}
site := models.SiteConfig{ID: 1, Type: "port", Hostname: "127.0.0.1", Port: port, Timeout: 2}
result := RunCheck(context.Background(), site, nil, nil, false, true)
if result.Status != "UP" {
@@ -153,7 +153,7 @@ func TestRunCheck_Port_Closed(t *testing.T) {
port, _ := strconv.Atoi(portStr)
ln.Close()
site := models.Site{ID: 1, Type: "port", Hostname: "127.0.0.1", Port: port, Timeout: 1}
site := models.SiteConfig{ID: 1, Type: "port", Hostname: "127.0.0.1", Port: port, Timeout: 1}
result := RunCheck(context.Background(), site, nil, nil, false, true)
if result.Status != "DOWN" {
@@ -161,6 +161,43 @@ func TestRunCheck_Port_Closed(t *testing.T) {
}
}
func TestRunPortCheck_UsesPinnedIP(t *testing.T) {
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
defer ln.Close()
_, portStr, _ := net.SplitHostPort(ln.Addr().String())
port, _ := strconv.Atoi(portStr)
// Pass a pinned IP — runPortCheck should dial it instead of resolving Hostname.
site := models.SiteConfig{ID: 1, Type: "port", Hostname: "will-not-resolve.invalid", Port: port, Timeout: 2}
result := runPortCheck(context.Background(), site, net.ParseIP("127.0.0.1"))
if result.Status != "UP" {
t.Errorf("expected UP when pinned IP used, got %s: %s", result.Status, result.ErrorReason)
}
}
func TestRunPortCheck_NilPinnedIP_UsesHostname(t *testing.T) {
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
defer ln.Close()
_, portStr, _ := net.SplitHostPort(ln.Addr().String())
port, _ := strconv.Atoi(portStr)
site := models.SiteConfig{ID: 1, Type: "port", Hostname: "127.0.0.1", Port: port, Timeout: 2}
result := runPortCheck(context.Background(), site, nil)
if result.Status != "UP" {
t.Errorf("expected UP with nil pinnedIP fallback, got %s: %s", result.Status, result.ErrorReason)
}
}
func TestRunCheck_Port_BlocksPrivateByDefault(t *testing.T) {
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
@@ -171,8 +208,8 @@ func TestRunCheck_Port_BlocksPrivateByDefault(t *testing.T) {
_, portStr, _ := net.SplitHostPort(ln.Addr().String())
port, _ := strconv.Atoi(portStr)
site := models.Site{ID: 1, Type: "port", Hostname: "127.0.0.1", Port: port, Timeout: 2}
result := RunCheck(context.Background(), site, nil, nil, false)
site := models.SiteConfig{ID: 1, Type: "port", Hostname: "127.0.0.1", Port: port, Timeout: 2}
result := RunCheck(context.Background(), site, nil, nil, false, false)
if result.Status != "DOWN" {
t.Errorf("expected DOWN when private targets blocked, got %s", result.Status)
@@ -180,8 +217,8 @@ func TestRunCheck_Port_BlocksPrivateByDefault(t *testing.T) {
}
func TestRunCheck_UnknownType(t *testing.T) {
site := models.Site{ID: 1, Type: "invalid"}
result := RunCheck(context.Background(), site, nil, nil, false)
site := models.SiteConfig{ID: 1, Type: "invalid"}
result := RunCheck(context.Background(), site, nil, nil, false, false)
if result.Status != "DOWN" {
t.Errorf("expected DOWN for unknown type, got %s", result.Status)
@@ -214,10 +251,10 @@ func TestIsCodeAccepted(t *testing.T) {
}
func TestSiteTimeout(t *testing.T) {
if got := siteTimeout(models.Site{Timeout: 0}); got != 5*time.Second {
if got := siteTimeout(models.SiteConfig{Timeout: 0}); got != 5*time.Second {
t.Errorf("expected 5s default, got %v", got)
}
if got := siteTimeout(models.Site{Timeout: 10}); got != 10*time.Second {
if got := siteTimeout(models.SiteConfig{Timeout: 10}); got != 10*time.Second {
t.Errorf("expected 10s, got %v", got)
}
}
+48 -29
View File
@@ -115,10 +115,14 @@ func newEngine(s store.Store, allowPrivateTargets bool) *Engine {
}
}
// SetInsecureSkipVerify must be called before Start: the field is read by
// checker goroutines without synchronization.
func (e *Engine) SetInsecureSkipVerify(skip bool) {
e.insecureSkipVerify = skip
}
// SetMaintRetention must be called before Start: the field is read by the
// maintenance prune goroutine without synchronization.
func (e *Engine) SetMaintRetention(d time.Duration) {
e.maintRetention = d
}
@@ -375,6 +379,8 @@ func (e *Engine) RecordHeartbeat(token string) bool {
go e.triggerAlert(alertID, "✅ RECOVERY", fmt.Sprintf("Push Monitor '%s' is receiving heartbeats.%s", name, downDur))
}
e.recordCheck(targetID, 0, true)
if prevStatus != models.StatusUp && prevStatus != models.StatusPending {
e.enqueueWrite(writeStateChange{siteID: targetID, fromStatus: string(prevStatus), toStatus: string(models.StatusUp)})
}
@@ -418,7 +424,7 @@ func (e *Engine) Start(ctx context.Context) {
e.refreshMaintenanceCache(ctx)
sites, err := e.db.GetSites(ctx)
configs, err := e.db.GetSites(ctx)
if err != nil {
e.AddLog(fmt.Sprintf("Failed to load sites: %v", err))
select {
@@ -428,34 +434,51 @@ func (e *Engine) Start(ctx context.Context) {
}
continue
}
for _, s := range sites {
dbIDs := make(map[int]bool, len(configs))
for _, cfg := range configs {
dbIDs[cfg.ID] = true
e.mu.RLock()
_, exists := e.liveState[s.ID]
existing, exists := e.liveState[cfg.ID]
e.mu.RUnlock()
if !exists {
e.mu.Lock()
s.Status = models.StatusPending
if h, ok := e.GetHistory(s.ID); ok && len(h.Statuses) > 0 {
site := models.Site{SiteConfig: cfg, SiteState: models.SiteState{Status: models.StatusPending}}
if h, ok := e.GetHistory(cfg.ID); ok && len(h.Statuses) > 0 {
if h.Statuses[len(h.Statuses)-1] {
s.Status = models.StatusUp
site.Status = models.StatusUp
} else {
s.Status = models.StatusDown
site.Status = models.StatusDown
}
if len(h.Latencies) > 0 {
s.Latency = h.Latencies[len(h.Latencies)-1]
site.Latency = h.Latencies[len(h.Latencies)-1]
}
}
e.liveState[s.ID] = s
e.addToTokenIndex(s)
e.liveState[cfg.ID] = site
e.addToTokenIndex(site)
e.mu.Unlock()
e.checkerWG.Add(1)
go func(id int) {
defer e.checkerWG.Done()
e.monitorRoutine(ctx, id)
}(s.ID)
}(cfg.ID)
} else if existing.SiteConfig != cfg {
e.UpdateSiteConfig(cfg)
}
}
e.mu.RLock()
var vanished []int
for id := range e.liveState {
if !dbIDs[id] {
vanished = append(vanished, id)
}
}
e.mu.RUnlock()
for _, id := range vanished {
e.RemoveSite(id)
e.AddLog(fmt.Sprintf("Monitor removed (no longer in DB): ID %d", id))
}
select {
case <-time.After(pollInterval):
case <-ctx.Done():
@@ -498,27 +521,17 @@ func (e *Engine) pruneMaintenanceWindows(ctx context.Context) {
}
}
func (e *Engine) UpdateSiteConfig(site models.Site) {
func (e *Engine) UpdateSiteConfig(cfg models.SiteConfig) {
e.mu.Lock()
if existing, ok := e.liveState[site.ID]; ok {
e.removeFromTokenIndex(site.ID)
site.Status = existing.Status
site.StatusCode = existing.StatusCode
site.Latency = existing.Latency
site.CertExpiry = existing.CertExpiry
site.HasSSL = existing.HasSSL
site.LastCheck = existing.LastCheck
site.SentSSLWarning = existing.SentSSLWarning
site.FailureCount = existing.FailureCount
site.LastError = existing.LastError
site.StatusChangedAt = existing.StatusChangedAt
site.LastSuccessAt = existing.LastSuccessAt
e.liveState[site.ID] = site
e.addToTokenIndex(site)
if existing, ok := e.liveState[cfg.ID]; ok {
e.removeFromTokenIndex(cfg.ID)
existing.SiteConfig = cfg
e.liveState[cfg.ID] = existing
e.addToTokenIndex(existing)
}
e.mu.Unlock()
e.signalRecheck(site.ID)
e.signalRecheck(cfg.ID)
}
func (e *Engine) getRecheckChan(id int) chan struct{} {
@@ -675,7 +688,7 @@ func (e *Engine) checkByID(ctx context.Context, id int) {
case "group":
e.checkGroup(ctx, site)
default:
result := RunCheck(ctx, site, e.strictClient, e.insecureClient, e.insecureSkipVerify, e.allowPrivateTargets)
result := RunCheck(ctx, site.SiteConfig, e.strictClient, e.insecureClient, e.insecureSkipVerify, e.allowPrivateTargets)
updatedSite := site
updatedSite.HasSSL = result.HasSSL
updatedSite.CertExpiry = result.CertExpiry
@@ -874,6 +887,9 @@ func (e *Engine) handleStatusChange(snap models.Site, rawStatus string, code int
}
func (e *Engine) triggerAlert(alertID int, title, message string) {
if alertID <= 0 {
return
}
cfg, err := e.db.GetAlert(context.Background(), alertID)
if err != nil {
e.AddLog(fmt.Sprintf("Failed to load alert config %d: %v", alertID, err))
@@ -1024,12 +1040,15 @@ func (e *Engine) checkGroup(_ context.Context, site models.Site) {
e.applyState(site.ID, func(s *models.Site) {
s.Status = status
})
e.recordCheck(site.ID, 0, !status.IsBroken())
}
func (e *Engine) EnqueueProbeCheck(siteID int, nodeID string, latencyNs int64, isUp bool) {
e.enqueueWrite(writeProbeCheck{siteID: siteID, nodeID: nodeID, latencyNs: latencyNs, isUp: isUp})
}
// SetAggStrategy must be called before Start: the field is read by the probe
// aggregation path without synchronization.
func (e *Engine) SetAggStrategy(strategy AggregationStrategy) {
e.aggStrategy = strategy
}
+211 -82
View File
@@ -22,7 +22,7 @@ type savedCheck struct {
type mockStore struct {
storetest.BaseMock
mu sync.Mutex
sites []models.Site
sites []models.SiteConfig
alerts map[int]models.AlertConfig
maintenance map[int]bool
logs []string
@@ -40,7 +40,7 @@ func newMockStore() *mockStore {
}
}
func (m *mockStore) GetSites(context.Context) ([]models.Site, error) { return m.sites, nil }
func (m *mockStore) GetSites(context.Context) ([]models.SiteConfig, error) { return m.sites, nil }
func (m *mockStore) GetActiveMaintenanceWindows(context.Context) ([]models.MaintenanceWindow, error) {
m.mu.Lock()
@@ -148,7 +148,10 @@ func (m *mockStore) getAlertCallsSnapshot() []int {
func TestHandleStatusChange_PendingToUp(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "PENDING", MaxRetries: 3, AlertID: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 3, AlertID: 1},
SiteState: models.SiteState{Status: "PENDING"},
}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 10*time.Millisecond, "")
@@ -169,7 +172,10 @@ func TestHandleStatusChange_PendingToUp(t *testing.T) {
func TestHandleStatusChange_UpIncrementFailure(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 3, FailureCount: 0}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 3},
SiteState: models.SiteState{Status: "UP", FailureCount: 0},
}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 500, 0, "test error")
@@ -187,7 +193,10 @@ func TestHandleStatusChange_UpToDown_ExceedsRetries(t *testing.T) {
ms := newMockStore()
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "discord", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 2, FailureCount: 2, AlertID: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 2, AlertID: 1},
SiteState: models.SiteState{Status: "UP", FailureCount: 2},
}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 500, 0, "test error")
@@ -210,7 +219,10 @@ func TestHandleStatusChange_UpToDown_ZeroRetries(t *testing.T) {
ms := newMockStore()
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0, FailureCount: 0, AlertID: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 0, AlertID: 1},
SiteState: models.SiteState{Status: "UP", FailureCount: 0},
}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 0, 0, "test error")
@@ -229,7 +241,10 @@ func TestHandleStatusChange_DownToUp_Recovery(t *testing.T) {
ms := newMockStore()
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "DOWN", FailureCount: 4, AlertID: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", AlertID: 1},
SiteState: models.SiteState{Status: "DOWN", FailureCount: 4},
}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 5*time.Millisecond, "")
@@ -250,7 +265,10 @@ func TestHandleStatusChange_DownToUp_Recovery(t *testing.T) {
func TestHandleStatusChange_DownStaysDown(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "DOWN", MaxRetries: 2, FailureCount: 3}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 2},
SiteState: models.SiteState{Status: "DOWN", FailureCount: 3},
}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 0, 0, "test error")
@@ -269,7 +287,10 @@ func TestHandleStatusChange_SSLExpired(t *testing.T) {
ms := newMockStore()
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0, AlertID: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 0, AlertID: 1},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.handleStatusChange(site, "SSL EXP", 0, 0, "SSL certificate expired")
@@ -289,7 +310,10 @@ func TestHandleStatusChange_AlertSuppressedMaintenance(t *testing.T) {
ms.maintenance[1] = true
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0, AlertID: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 0, AlertID: 1},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.refreshMaintenanceCache(context.Background())
@@ -321,7 +345,10 @@ func TestHandleStatusChange_RecoverySuppressedMaintenance(t *testing.T) {
ms.maintenance[1] = true
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "DOWN", AlertID: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", AlertID: 1},
SiteState: models.SiteState{Status: "DOWN"},
}
injectSite(e, site)
e.refreshMaintenanceCache(context.Background())
@@ -342,10 +369,8 @@ func TestHandleStatusChange_SSLWarning(t *testing.T) {
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "test", Status: "UP", Type: "http",
CheckSSL: true, HasSSL: true, ExpiryThreshold: 30,
SentSSLWarning: false, AlertID: 1,
CertExpiry: time.Now().Add(15 * 24 * time.Hour),
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "http", CheckSSL: true, ExpiryThreshold: 30, AlertID: 1},
SiteState: models.SiteState{Status: "UP", HasSSL: true, SentSSLWarning: false, CertExpiry: time.Now().Add(15 * 24 * time.Hour)},
}
injectSite(e, site)
@@ -365,10 +390,8 @@ func TestHandleStatusChange_SSLWarningNotRepeated(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "test", Status: "UP", Type: "http",
CheckSSL: true, HasSSL: true, ExpiryThreshold: 30,
SentSSLWarning: true, AlertID: 1,
CertExpiry: time.Now().Add(15 * 24 * time.Hour),
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "http", CheckSSL: true, ExpiryThreshold: 30, AlertID: 1},
SiteState: models.SiteState{Status: "UP", HasSSL: true, SentSSLWarning: true, CertExpiry: time.Now().Add(15 * 24 * time.Hour)},
}
injectSite(e, site)
@@ -384,10 +407,8 @@ func TestHandleStatusChange_SSLWarningReset(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "test", Status: "UP", Type: "http",
CheckSSL: true, HasSSL: true, ExpiryThreshold: 30,
SentSSLWarning: true,
CertExpiry: time.Now().Add(60 * 24 * time.Hour),
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "http", CheckSSL: true, ExpiryThreshold: 30},
SiteState: models.SiteState{Status: "UP", HasSSL: true, SentSSLWarning: true, CertExpiry: time.Now().Add(60 * 24 * time.Hour)},
}
injectSite(e, site)
@@ -405,10 +426,8 @@ func TestHandleStatusChange_SSLWarningSuppressedMaint(t *testing.T) {
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "test", Status: "UP", Type: "http",
CheckSSL: true, HasSSL: true, ExpiryThreshold: 30,
SentSSLWarning: false, AlertID: 1,
CertExpiry: time.Now().Add(15 * 24 * time.Hour),
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "http", CheckSSL: true, ExpiryThreshold: 30, AlertID: 1},
SiteState: models.SiteState{Status: "UP", HasSSL: true, SentSSLWarning: false, CertExpiry: time.Now().Add(15 * 24 * time.Hour)},
}
injectSite(e, site)
e.refreshMaintenanceCache(context.Background())
@@ -428,7 +447,10 @@ func TestHandleStatusChange_SSLWarningSuppressedMaint(t *testing.T) {
func TestHandleStatusChange_InactiveEngine(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 0},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.SetActive(false)
@@ -445,7 +467,10 @@ func TestHandleStatusChange_InactiveEngine(t *testing.T) {
func TestRecordHeartbeat_ValidToken(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "push-test", Type: "push", Token: "abc123", Status: "UP"}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "push-test", Type: "push", Token: "abc123"},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
if !e.RecordHeartbeat("abc123") {
@@ -465,7 +490,10 @@ func TestRecordHeartbeat_RecoveryFromDown(t *testing.T) {
ms := newMockStore()
ms.alerts[1] = models.AlertConfig{ID: 1, Name: "test", Type: "webhook", Settings: map[string]string{"url": "http://example.com"}}
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "push-test", Type: "push", Token: "abc123", Status: "DOWN", AlertID: 1, FailureCount: 3}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "push-test", Type: "push", Token: "abc123", AlertID: 1},
SiteState: models.SiteState{Status: "DOWN", FailureCount: 3},
}
injectSite(e, site)
if !e.RecordHeartbeat("abc123") {
@@ -497,7 +525,10 @@ func TestRecordHeartbeat_UnknownToken(t *testing.T) {
func TestRecordHeartbeat_InactiveEngine(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Type: "push", Token: "abc123", Status: "UP"}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Type: "push", Token: "abc123"},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.SetActive(false)
@@ -512,9 +543,8 @@ func TestCheckPush_DeadlineMissed(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "UP",
Interval: 10, MaxRetries: 0,
LastCheck: time.Now().Add(-120 * time.Second),
SiteConfig: models.SiteConfig{ID: 1, Name: "push", Type: "push", Interval: 10, MaxRetries: 0},
SiteState: models.SiteState{Status: "UP", LastCheck: time.Now().Add(-120 * time.Second)},
}
injectSite(e, site)
@@ -530,9 +560,8 @@ func TestCheckPush_OverdueBecomesLate(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "UP",
Interval: 300,
LastCheck: time.Now().Add(-310 * time.Second),
SiteConfig: models.SiteConfig{ID: 1, Name: "push", Type: "push", Interval: 300},
SiteState: models.SiteState{Status: "UP", LastCheck: time.Now().Add(-310 * time.Second)},
}
injectSite(e, site)
@@ -550,9 +579,8 @@ func TestCheckPush_OverdueBecomesStale(t *testing.T) {
// interval=300, grace=150 (300/2), staleMark=overdue+75
// at 380s: past staleMark(375) but before graceEnd(450)
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "UP",
Interval: 300,
LastCheck: time.Now().Add(-380 * time.Second),
SiteConfig: models.SiteConfig{ID: 1, Name: "push", Type: "push", Interval: 300},
SiteState: models.SiteState{Status: "UP", LastCheck: time.Now().Add(-380 * time.Second)},
}
injectSite(e, site)
@@ -568,8 +596,8 @@ func TestCheckPush_WithinDeadline(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "UP",
Interval: 60, LastCheck: time.Now(),
SiteConfig: models.SiteConfig{ID: 1, Name: "push", Type: "push", Interval: 60},
SiteState: models.SiteState{Status: "UP", LastCheck: time.Now()},
}
injectSite(e, site)
@@ -585,8 +613,8 @@ func TestCheckPush_PendingStaysPending(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "PENDING",
Interval: 60,
SiteConfig: models.SiteConfig{ID: 1, Name: "push", Type: "push", Interval: 60},
SiteState: models.SiteState{Status: "PENDING"},
}
injectSite(e, site)
@@ -603,9 +631,18 @@ func TestCheckPush_PendingStaysPending(t *testing.T) {
func TestCheckGroup_AllChildrenUp(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
group := models.Site{ID: 1, Name: "group", Type: "group", Status: "PENDING"}
child1 := models.Site{ID: 2, Name: "child1", Type: "http", ParentID: 1, Status: "UP"}
child2 := models.Site{ID: 3, Name: "child2", Type: "http", ParentID: 1, Status: "UP"}
group := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "group", Type: "group"},
SiteState: models.SiteState{Status: "PENDING"},
}
child1 := models.Site{
SiteConfig: models.SiteConfig{ID: 2, Name: "child1", Type: "http", ParentID: 1},
SiteState: models.SiteState{Status: "UP"},
}
child2 := models.Site{
SiteConfig: models.SiteConfig{ID: 3, Name: "child2", Type: "http", ParentID: 1},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, group)
injectSite(e, child1)
injectSite(e, child2)
@@ -621,9 +658,18 @@ func TestCheckGroup_AllChildrenUp(t *testing.T) {
func TestCheckGroup_OneChildDown(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
group := models.Site{ID: 1, Name: "group", Type: "group", Status: "UP"}
child1 := models.Site{ID: 2, Name: "child1", Type: "http", ParentID: 1, Status: "UP"}
child2 := models.Site{ID: 3, Name: "child2", Type: "http", ParentID: 1, Status: "DOWN"}
group := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "group", Type: "group"},
SiteState: models.SiteState{Status: "UP"},
}
child1 := models.Site{
SiteConfig: models.SiteConfig{ID: 2, Name: "child1", Type: "http", ParentID: 1},
SiteState: models.SiteState{Status: "UP"},
}
child2 := models.Site{
SiteConfig: models.SiteConfig{ID: 3, Name: "child2", Type: "http", ParentID: 1},
SiteState: models.SiteState{Status: "DOWN"},
}
injectSite(e, group)
injectSite(e, child1)
injectSite(e, child2)
@@ -639,9 +685,17 @@ func TestCheckGroup_OneChildDown(t *testing.T) {
func TestCheckGroup_PausedChildIgnored(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
group := models.Site{ID: 1, Name: "group", Type: "group"}
child1 := models.Site{ID: 2, Name: "child1", Type: "http", ParentID: 1, Status: "UP"}
child2 := models.Site{ID: 3, Name: "child2", Type: "http", ParentID: 1, Status: "DOWN", Paused: true}
group := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "group", Type: "group"},
}
child1 := models.Site{
SiteConfig: models.SiteConfig{ID: 2, Name: "child1", Type: "http", ParentID: 1},
SiteState: models.SiteState{Status: "UP"},
}
child2 := models.Site{
SiteConfig: models.SiteConfig{ID: 3, Name: "child2", Type: "http", ParentID: 1, Paused: true},
SiteState: models.SiteState{Status: "DOWN"},
}
injectSite(e, group)
injectSite(e, child1)
injectSite(e, child2)
@@ -658,9 +712,17 @@ func TestCheckGroup_MaintenanceChildIgnored(t *testing.T) {
ms := newMockStore()
ms.maintenance[3] = true
e := newTestEngine(ms)
group := models.Site{ID: 1, Name: "group", Type: "group"}
child1 := models.Site{ID: 2, Name: "child1", Type: "http", ParentID: 1, Status: "UP"}
child2 := models.Site{ID: 3, Name: "child2", Type: "http", ParentID: 1, Status: "DOWN"}
group := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "group", Type: "group"},
}
child1 := models.Site{
SiteConfig: models.SiteConfig{ID: 2, Name: "child1", Type: "http", ParentID: 1},
SiteState: models.SiteState{Status: "UP"},
}
child2 := models.Site{
SiteConfig: models.SiteConfig{ID: 3, Name: "child2", Type: "http", ParentID: 1},
SiteState: models.SiteState{Status: "DOWN"},
}
injectSite(e, group)
injectSite(e, child1)
injectSite(e, child2)
@@ -677,7 +739,10 @@ func TestCheckGroup_MaintenanceChildIgnored(t *testing.T) {
func TestCheckGroup_NoChildren(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
group := models.Site{ID: 1, Name: "group", Type: "group", Status: "UP"}
group := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "group", Type: "group"},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, group)
e.checkGroup(context.Background(), group)
@@ -772,10 +837,13 @@ func TestInitHistory_LoadsFromDB(t *testing.T) {
func TestUpdateSiteConfig_PreservesRuntime(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", URL: "http://old.com", Status: "DOWN", FailureCount: 3, Latency: 100 * time.Millisecond}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", URL: "http://old.com"},
SiteState: models.SiteState{Status: "DOWN", FailureCount: 3, Latency: 100 * time.Millisecond},
}
injectSite(e, site)
updated := models.Site{ID: 1, Name: "test", URL: "http://new.com", Interval: 60}
updated := models.SiteConfig{ID: 1, Name: "test", URL: "http://new.com", Interval: 60}
e.UpdateSiteConfig(updated)
s, _ := getSite(e, 1)
@@ -796,7 +864,10 @@ func TestUpdateSiteConfig_PreservesRuntime(t *testing.T) {
func TestRemoveSite_CleansUp(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Type: "push", Token: "tok1", Status: "UP"}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "push", Token: "tok1"},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.recordCheck(1, 5*time.Millisecond, true)
@@ -816,7 +887,10 @@ func TestRemoveSite_CleansUp(t *testing.T) {
func TestToggleSitePause(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP"}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test"},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
paused := e.ToggleSitePause(1)
@@ -845,8 +919,14 @@ func TestToggleSitePause_NonexistentSite(t *testing.T) {
func TestGetAllSites_ReturnsCopy(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
injectSite(e, models.Site{ID: 1, Name: "s1", Status: "UP"})
injectSite(e, models.Site{ID: 2, Name: "s2", Status: "DOWN"})
injectSite(e, models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "s1"},
SiteState: models.SiteState{Status: "UP"},
})
injectSite(e, models.Site{
SiteConfig: models.SiteConfig{ID: 2, Name: "s2"},
SiteState: models.SiteState{Status: "DOWN"},
})
sites := e.GetAllSites()
if len(sites) != 2 {
@@ -865,10 +945,13 @@ func TestGetAllSites_ReturnsCopy(t *testing.T) {
func TestGetLiveState_ReturnsCopy(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
injectSite(e, models.Site{ID: 1, Name: "s1", Status: "UP"})
injectSite(e, models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "s1"},
SiteState: models.SiteState{Status: "UP"},
})
state := e.GetLiveState()
state[1] = models.Site{Name: "mutated"}
state[1] = models.Site{SiteConfig: models.SiteConfig{Name: "mutated"}}
fresh := e.GetLiveState()
if fresh[1].Name == "mutated" {
@@ -984,7 +1067,8 @@ func TestConcurrent_RecordHeartbeat(t *testing.T) {
e := newTestEngine(ms)
for i := 0; i < 10; i++ {
injectSite(e, models.Site{
ID: i + 1, Type: "push", Token: fmt.Sprintf("tok-%d", i+1), Status: "UP",
SiteConfig: models.SiteConfig{ID: i + 1, Type: "push", Token: fmt.Sprintf("tok-%d", i+1)},
SiteState: models.SiteState{Status: "UP"},
})
}
@@ -1002,7 +1086,10 @@ func TestConcurrent_RecordHeartbeat(t *testing.T) {
func TestConcurrent_HandleStatusChangeAndGetState(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 100}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 100},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
var wg sync.WaitGroup
@@ -1055,7 +1142,10 @@ func TestConcurrent_RecordCheckAndGetHistory(t *testing.T) {
func TestHandleStatusChange_PauseDuringCheckSurvives(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 0},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
// `site` is the stale snapshot the check ran against (Paused=false).
@@ -1079,11 +1169,14 @@ func TestHandleStatusChange_PauseDuringCheckSurvives(t *testing.T) {
func TestHandleStatusChange_ConfigEditDuringCheckSurvives(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", URL: "http://old.com", Type: "http", Status: "UP", MaxRetries: 0, Interval: 30}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", URL: "http://old.com", Type: "http", MaxRetries: 0, Interval: 30},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
// Config changes mid-check.
e.UpdateSiteConfig(models.Site{ID: 1, Name: "test", URL: "http://new.com", Type: "http", Interval: 60})
e.UpdateSiteConfig(models.SiteConfig{ID: 1, Name: "test", URL: "http://new.com", Type: "http", Interval: 60})
// Stale check (ran against http://old.com) folds its result in.
e.handleStatusChange(site, "UP", 200, 5*time.Millisecond, "")
@@ -1105,7 +1198,10 @@ func TestHandleStatusChange_HeartbeatNotOverwrittenByStaleDown(t *testing.T) {
e := newTestEngine(ms)
// Snapshot the engine would have taken before evaluating staleness:
// LastCheck is old, so checkPush decided "DOWN".
snap := models.Site{ID: 1, Name: "push", Type: "push", Token: "tok", Status: "UP", Interval: 10, LastCheck: time.Now().Add(-120 * time.Second)}
snap := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "push", Type: "push", Token: "tok", Interval: 10},
SiteState: models.SiteState{Status: "UP", LastCheck: time.Now().Add(-120 * time.Second)},
}
injectSite(e, snap)
// A heartbeat lands first, advancing LastCheck and confirming UP.
@@ -1126,7 +1222,10 @@ func TestHandleStatusChange_HeartbeatNotOverwrittenByStaleDown(t *testing.T) {
func TestHandleStatusChange_RemovedSiteDropped(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", MaxRetries: 0},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.RemoveSite(1)
@@ -1189,9 +1288,18 @@ func TestEngineStop_Idempotent(t *testing.T) {
func TestCheckGroup_AllPausedNoAutoFreeze(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
group := models.Site{ID: 1, Name: "group", Type: "group", Status: "UP"}
child1 := models.Site{ID: 2, Name: "child1", Type: "http", ParentID: 1, Status: "UP", Paused: true}
child2 := models.Site{ID: 3, Name: "child2", Type: "http", ParentID: 1, Status: "UP", Paused: true}
group := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "group", Type: "group"},
SiteState: models.SiteState{Status: "UP"},
}
child1 := models.Site{
SiteConfig: models.SiteConfig{ID: 2, Name: "child1", Type: "http", ParentID: 1, Paused: true},
SiteState: models.SiteState{Status: "UP"},
}
child2 := models.Site{
SiteConfig: models.SiteConfig{ID: 3, Name: "child2", Type: "http", ParentID: 1, Paused: true},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, group)
injectSite(e, child1)
injectSite(e, child2)
@@ -1208,7 +1316,10 @@ func TestCheckGroup_AllPausedNoAutoFreeze(t *testing.T) {
func TestHandleStatusChange_PendingRetriesBeforeDown(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "new-monitor", Status: "PENDING", MaxRetries: 2}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "new-monitor", MaxRetries: 2},
SiteState: models.SiteState{Status: "PENDING"},
}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 0, 0, "timeout")
@@ -1237,7 +1348,10 @@ func TestHandleStatusChange_PendingRetriesBeforeDown(t *testing.T) {
func TestHandleStatusChange_LateRetriesBeforeDown(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "push-mon", Status: "LATE", MaxRetries: 1}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "push-mon", MaxRetries: 1},
SiteState: models.SiteState{Status: "LATE"},
}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 0, 0, "missed heartbeat")
@@ -1257,7 +1371,10 @@ func TestHandleStatusChange_LateRetriesBeforeDown(t *testing.T) {
func TestIngestProbeResult_ExpiresStaleProbes(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Type: "http", Status: "UP", Interval: 30}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "http", Interval: 30},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.probeResultsMu.Lock()
@@ -1289,7 +1406,10 @@ func TestIngestProbeResult_ExpiresStaleProbes(t *testing.T) {
func TestRemoveSite_CleansProbeResults(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Type: "http", Status: "UP"}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "http"},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.probeResultsMu.Lock()
@@ -1312,8 +1432,14 @@ func TestIsInMaintenance_UsesCache(t *testing.T) {
ms := newMockStore()
ms.maintenance[10] = true // direct maintenance on group
e := newTestEngine(ms)
group := models.Site{ID: 10, Name: "group", Type: "group", Status: "UP"}
child := models.Site{ID: 20, Name: "child", Type: "http", ParentID: 10, Status: "UP"}
group := models.Site{
SiteConfig: models.SiteConfig{ID: 10, Name: "group", Type: "group"},
SiteState: models.SiteState{Status: "UP"},
}
child := models.Site{
SiteConfig: models.SiteConfig{ID: 20, Name: "child", Type: "http", ParentID: 10},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, group)
injectSite(e, child)
e.refreshMaintenanceCache(context.Background())
@@ -1334,7 +1460,10 @@ func TestIsInMaintenance_GlobalMaintenance(t *testing.T) {
ms := newMockStore()
ms.maintenance[0] = true
e := newTestEngine(ms)
site := models.Site{ID: 1, Name: "test", Type: "http", Status: "UP"}
site := models.Site{
SiteConfig: models.SiteConfig{ID: 1, Name: "test", Type: "http"},
SiteState: models.SiteState{Status: "UP"},
}
injectSite(e, site)
e.refreshMaintenanceCache(context.Background())
+5
View File
@@ -11,9 +11,11 @@ var privateRanges []*net.IPNet
func init() {
cidrs := []string{
"0.0.0.0/8",
"127.0.0.0/8",
"::1/128",
"10.0.0.0/8",
"100.64.0.0/10",
"172.16.0.0/12",
"192.168.0.0/16",
"169.254.0.0/16",
@@ -27,6 +29,9 @@ func init() {
}
func isPrivateIP(ip net.IP) bool {
if ip.IsUnspecified() || ip.IsMulticast() || ip.IsLoopback() {
return true
}
for _, network := range privateRanges {
if network.Contains(ip) {
return true
+2 -5
View File
@@ -131,14 +131,11 @@ func ComputeDailyBreakdown(changes []models.StateChange, currentStatus models.St
reports := make([]DayReport, days)
for i := 0; i < days; i++ {
dayEnd := time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, now.Location()).Add(-time.Duration(i) * 24 * time.Hour)
dayStart := time.Date(now.Year(), now.Month(), now.Day()-i, 0, 0, 0, 0, now.Location())
dayEnd := time.Date(now.Year(), now.Month(), now.Day()-i+1, 0, 0, 0, 0, now.Location())
if i == 0 {
dayEnd = now
}
dayStart := time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, now.Location()).Add(-time.Duration(i) * 24 * time.Hour)
if i > 0 {
dayEnd = dayStart.Add(24 * time.Hour)
}
windowChanges := filterChangesForWindow(changes, dayStart, dayEnd)
+19 -7
View File
@@ -25,6 +25,7 @@ type RateLimiter struct {
rate float64
burst float64
trusted []*net.IPNet
stop chan struct{}
}
func NewRateLimiter(requestsPerMinute int, trusted []*net.IPNet) *RateLimiter {
@@ -33,11 +34,16 @@ func NewRateLimiter(requestsPerMinute int, trusted []*net.IPNet) *RateLimiter {
rate: float64(requestsPerMinute) / 60.0,
burst: float64(requestsPerMinute),
trusted: trusted,
stop: make(chan struct{}),
}
go rl.cleanup()
return rl
}
func (rl *RateLimiter) Stop() {
close(rl.stop)
}
func (rl *RateLimiter) Allow(ip string) bool {
rl.mu.Lock()
defer rl.mu.Unlock()
@@ -84,16 +90,22 @@ func (rl *RateLimiter) evictOldest() {
}
func (rl *RateLimiter) cleanup() {
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for {
time.Sleep(5 * time.Minute)
rl.mu.Lock()
cutoff := time.Now().Add(-10 * time.Minute)
for ip, v := range rl.visitors {
if v.lastSeen.Before(cutoff) {
delete(rl.visitors, ip)
select {
case <-ticker.C:
rl.mu.Lock()
cutoff := time.Now().Add(-10 * time.Minute)
for ip, v := range rl.visitors {
if v.lastSeen.Before(cutoff) {
delete(rl.visitors, ip)
}
}
rl.mu.Unlock()
case <-rl.stop:
return
}
rl.mu.Unlock()
}
}
+463 -423
View File
@@ -5,7 +5,7 @@ import (
"encoding/json"
"fmt"
"html/template"
"log"
"log/slog"
"net"
"net/http"
"sort"
@@ -21,6 +21,395 @@ import (
const maxRequestBody = 1 << 20
type ServerConfig struct {
Port int
EnableStatus bool
Title string
ClusterKey string
TLSCert string
TLSKey string
ClusterMode string
MetricsPublic bool
CORSOrigin string
TrustedProxies []*net.IPNet
QuietHTTPLog bool
}
type Server struct {
cfg ServerConfig
store store.Store
eng *monitor.Engine
pushRL *RateLimiter
probeRL *RateLimiter
backupRL *RateLimiter
statusRL *RateLimiter
}
func NewServer(cfg ServerConfig, s store.Store, eng *monitor.Engine) *Server {
return &Server{
cfg: cfg,
store: s,
eng: eng,
pushRL: NewRateLimiter(60, cfg.TrustedProxies),
probeRL: NewRateLimiter(30, cfg.TrustedProxies),
backupRL: NewRateLimiter(10, cfg.TrustedProxies),
statusRL: NewRateLimiter(120, cfg.TrustedProxies),
}
}
func Start(cfg ServerConfig, s store.Store, eng *monitor.Engine) *http.Server {
srv := NewServer(cfg, s, eng)
return srv.Start()
}
func (s *Server) Start() *http.Server {
if s.cfg.ClusterKey == "" {
slog.Warn("no UPTOP_CLUSTER_SECRET set, cluster API endpoints will reject all requests")
}
if s.cfg.ClusterMode != "" && s.cfg.ClusterMode != "leader" && s.cfg.TLSCert == "" {
slog.Warn("cluster mode active without TLS, secrets transmitted in cleartext")
}
handler := s.routes()
addr := fmt.Sprintf(":%d", s.cfg.Port)
httpSrv := &http.Server{
Addr: addr,
Handler: handler,
ReadHeaderTimeout: 10 * time.Second,
ReadTimeout: 30 * time.Second,
WriteTimeout: 60 * time.Second,
IdleTimeout: 120 * time.Second,
}
go func() {
if s.cfg.TLSCert != "" && s.cfg.TLSKey != "" {
slog.Info("HTTPS server listening", "addr", addr)
if err := httpSrv.ListenAndServeTLS(s.cfg.TLSCert, s.cfg.TLSKey); err != nil && err != http.ErrServerClosed {
slog.Error("HTTPS server failed", "err", err)
}
} else {
slog.Info("HTTP server listening", "addr", addr)
if err := httpSrv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
slog.Error("HTTP server failed", "err", err)
}
}
}()
return httpSrv
}
func (s *Server) routes() http.Handler {
mux := http.NewServeMux()
mux.HandleFunc("/api/push", RateLimit(s.pushRL, s.handlePush))
mux.HandleFunc("/api/health", s.handleHealth)
mux.HandleFunc("/api/backup/export", RateLimit(s.backupRL, s.handleExport))
mux.HandleFunc("/api/backup/import", RateLimit(s.backupRL, s.handleImport))
mux.HandleFunc("/api/import/kuma", RateLimit(s.backupRL, s.handleKumaImport))
mux.HandleFunc("/api/probe/register", RateLimit(s.probeRL, s.handleProbeRegister))
mux.HandleFunc("/api/probe/assignments", RateLimit(s.probeRL, s.handleProbeAssignments))
mux.HandleFunc("/api/probe/results", RateLimit(s.probeRL, s.handleProbeResults))
mux.HandleFunc("/metrics", s.handleMetrics)
if s.cfg.EnableStatus {
mux.HandleFunc("/status", RateLimit(s.statusRL, s.handleStatus))
mux.HandleFunc("/status/json", RateLimit(s.statusRL, s.handleStatusJSON))
}
handler := securityHeadersMiddleware(mux)
if !s.cfg.QuietHTTPLog {
handler = loggingMiddleware(s.cfg.TrustedProxies, handler)
}
if s.cfg.TLSCert != "" {
handler = hstsMiddleware(handler)
}
return handler
}
func (s *Server) requireAuth(r *http.Request) bool {
return s.cfg.ClusterKey != "" && checkSecret(r.Header.Get("X-Uptop-Secret"), s.cfg.ClusterKey)
}
func (s *Server) handlePush(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet && r.Method != http.MethodPost {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
token := extractBearerToken(r)
if token == "" {
if qt := r.URL.Query().Get("token"); qt != "" {
token = qt
slog.Warn("push token in query string is deprecated, use Authorization: Bearer header")
}
}
if token == "" {
http.Error(w, "Missing token", http.StatusBadRequest)
return
}
if s.eng.RecordHeartbeat(token) {
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("OK"))
} else {
http.Error(w, "Invalid Token", http.StatusNotFound)
}
}
func (s *Server) handleHealth(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
if s.cfg.ClusterKey != "" && !checkSecret(r.Header.Get("X-Uptop-Secret"), s.cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("OK"))
}
func (s *Server) handleExport(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
if !s.requireAuth(r) {
http.Error(w, "Unauthorized: UPTOP_CLUSTER_SECRET required", http.StatusUnauthorized)
return
}
data, err := s.store.ExportData(r.Context())
if err != nil {
slog.Error("export failed", "err", err)
http.Error(w, "Export failed", http.StatusInternalServerError)
return
}
if r.URL.Query().Get("redact_secrets") != "false" {
for i := range data.Alerts {
data.Alerts[i].Settings = models.RedactAlertSettings(data.Alerts[i].Type, data.Alerts[i].Settings)
}
}
_ = json.NewEncoder(w).Encode(data) //nolint:errcheck
}
func (s *Server) handleImport(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if !s.requireAuth(r) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var data models.Backup
if err := json.NewDecoder(r.Body).Decode(&data); err != nil {
http.Error(w, "Invalid JSON", http.StatusBadRequest)
return
}
// API import never modifies users — cluster-secret holder shouldn't be
// able to replace admin accounts. CLI restore still does full import.
data.Users = nil
if err := s.store.ImportData(r.Context(), data); err != nil {
slog.Error("import failed", "err", err)
http.Error(w, "Import failed", http.StatusInternalServerError)
return
}
_, _ = w.Write([]byte("Import Successful (users excluded — manage via CLI or UPTOP_KEYS)"))
}
func (s *Server) handleKumaImport(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if !s.requireAuth(r) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var kb importer.KumaBackup
if err := json.NewDecoder(r.Body).Decode(&kb); err != nil {
slog.Error("invalid Kuma JSON", "err", err)
http.Error(w, "Invalid Kuma JSON", http.StatusBadRequest)
return
}
backup := importer.ConvertKuma(&kb)
if err := s.store.ImportData(r.Context(), backup); err != nil {
slog.Error("Kuma import failed", "err", err)
http.Error(w, "Import failed", http.StatusInternalServerError)
return
}
fmt.Fprintf(w, "Imported %d monitors, %d alerts from Kuma v%s", len(backup.Sites), len(backup.Alerts), kb.Version)
}
func (s *Server) handleProbeRegister(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if !s.requireAuth(r) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var req struct {
ID string `json:"id"`
Name string `json:"name"`
Region string `json:"region"`
Version string `json:"version"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid JSON", http.StatusBadRequest)
return
}
if req.ID == "" {
http.Error(w, "id is required", http.StatusBadRequest)
return
}
if err := s.store.RegisterNode(r.Context(), models.ProbeNode{
ID: req.ID, Name: req.Name, Region: req.Region, Version: req.Version,
}); err != nil {
slog.Error("probe registration failed", "err", err)
http.Error(w, "Registration failed", http.StatusInternalServerError)
return
}
_ = json.NewEncoder(w).Encode(map[string]bool{"ok": true}) //nolint:errcheck
}
func (s *Server) handleProbeAssignments(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
if !s.requireAuth(r) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
nodeID := r.URL.Query().Get("node_id")
var nodeRegion string
if nodeID != "" {
if node, err := s.store.GetNode(r.Context(), nodeID); err == nil {
nodeRegion = node.Region
}
}
sites := s.eng.GetAllSites()
var assigned []models.Site
for _, site := range sites {
if site.Paused || site.Type == "push" || site.Type == "group" {
continue
}
if site.Regions != "" && nodeRegion != "" {
matched := false
for _, reg := range strings.Split(site.Regions, ",") {
if strings.TrimSpace(reg) == nodeRegion {
matched = true
break
}
}
if !matched {
continue
}
}
assigned = append(assigned, site)
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(map[string][]models.Site{"sites": assigned}) //nolint:errcheck
}
func (s *Server) handleProbeResults(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if !s.requireAuth(r) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var req struct {
NodeID string `json:"node_id"`
Results []struct {
SiteID int `json:"site_id"`
LatencyNs int64 `json:"latency_ns"`
IsUp bool `json:"is_up"`
ErrorReason string `json:"error_reason"`
} `json:"results"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid JSON", http.StatusBadRequest)
return
}
if req.NodeID == "" {
http.Error(w, "node_id is required", http.StatusBadRequest)
return
}
for _, result := range req.Results {
s.eng.EnqueueProbeCheck(result.SiteID, req.NodeID, result.LatencyNs, result.IsUp)
s.eng.IngestProbeResult(req.NodeID, result.SiteID, result.LatencyNs, result.IsUp, result.ErrorReason)
}
if err := s.store.UpdateNodeLastSeen(r.Context(), req.NodeID); err != nil {
slog.Error("node last-seen update failed", "err", err)
}
_ = json.NewEncoder(w).Encode(map[string]bool{"ok": true}) //nolint:errcheck
}
func (s *Server) handleMetrics(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
if !s.cfg.MetricsPublic {
if !s.requireAuth(r) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
}
metrics.Handler(s.eng)(w, r)
}
func (s *Server) handleStatus(w http.ResponseWriter, _ *http.Request) {
renderStatusPage(w, s.cfg.Title, s.eng)
}
func (s *Server) handleStatusJSON(w http.ResponseWriter, r *http.Request) {
state := s.eng.GetLiveState()
activeWindows, _ := s.store.GetActiveMaintenanceWindows(r.Context())
maintSet := make(map[int]bool)
allInMaint := false
for _, mw := range activeWindows {
if mw.Type != "maintenance" {
continue
}
if mw.MonitorID == 0 {
allInMaint = true
} else {
maintSet[mw.MonitorID] = true
}
}
public := make(map[int]statusSite, len(state))
for id, site := range state {
displayStatus := string(site.Status)
if allInMaint || maintSet[site.ID] || (site.ParentID > 0 && maintSet[site.ParentID]) {
displayStatus = "MAINT"
}
public[id] = statusSite{
Name: site.Name,
Type: site.Type,
URL: site.URL,
Status: displayStatus,
Paused: site.Paused,
LastCheck: site.LastCheck,
Latency: site.Latency,
}
}
if s.cfg.CORSOrigin != "" {
w.Header().Set("Access-Control-Allow-Origin", s.cfg.CORSOrigin)
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(public) //nolint:errcheck
}
// --- Helpers ---
func checkSecret(got, want string) bool {
return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}
@@ -33,8 +422,79 @@ func extractBearerToken(r *http.Request) string {
return ""
}
// Alert-settings redaction policy lives in models.RedactAlertSettings so the
// TUI detail panel and this export path share one allowlist.
// statusSite is the public DTO for /status/json.
type statusSite struct {
Name string
Type string
URL string
Status string
Paused bool
LastCheck time.Time
Latency time.Duration
}
// --- Middleware ---
type statusWriter struct {
http.ResponseWriter
code int
}
func (w *statusWriter) WriteHeader(code int) {
w.code = code
w.ResponseWriter.WriteHeader(code)
}
func loggingMiddleware(trusted []*net.IPNet, next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
sw := &statusWriter{ResponseWriter: w, code: 200}
next.ServeHTTP(sw, r)
path := strings.ReplaceAll(strings.ReplaceAll(r.URL.Path, "\n", ""), "\r", "")
slog.Info("http request", "method", r.Method, "path", path, "status", sw.code, "duration", time.Since(start).Round(time.Millisecond), "ip", clientIP(r, trusted)) //nolint:gosec // structured slog, not format string
})
}
func securityHeadersMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Header().Set("X-Frame-Options", "DENY")
w.Header().Set("Referrer-Policy", "no-referrer")
w.Header().Set("Content-Security-Policy", "default-src 'self'; script-src 'unsafe-inline'; style-src 'unsafe-inline'")
next.ServeHTTP(w, r)
})
}
func hstsMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Strict-Transport-Security", "max-age=63072000; includeSubDomains")
next.ServeHTTP(w, r)
})
}
func renderStatusPage(w http.ResponseWriter, title string, eng *monitor.Engine) {
sites := eng.GetAllSites()
sort.Slice(sites, func(i, j int) bool {
if sites[i].Status != sites[j].Status {
if sites[i].Status == models.StatusDown {
return true
}
if sites[j].Status == models.StatusDown {
return false
}
}
return sites[i].Name < sites[j].Name
})
data := struct {
Title string
Sites []models.Site
}{Title: title, Sites: sites}
if err := statusTpl.Execute(w, data); err != nil {
slog.Error("status page render failed", "err", err)
}
}
var statusTpl = template.Must(template.New("status").Parse(`
<!DOCTYPE html>
@@ -167,423 +627,3 @@ var statusTpl = template.Must(template.New("status").Parse(`
</script>
</body>
</html>`))
type ServerConfig struct {
Port int
EnableStatus bool
Title string
ClusterKey string
TLSCert string
TLSKey string
ClusterMode string
MetricsPublic bool
CORSOrigin string
TrustedProxies []*net.IPNet
// QuietHTTPLog disables per-request stderr logging. Set when the local
// TUI owns the terminal — request logs would scribble over the alt screen.
QuietHTTPLog bool
}
// statusSite is the public DTO for /status/json. models.Site must never be
// serialized raw here: it carries internal fields (LastError, Hostname, Port,
// DNSServer, AlertID, Token, ...) and every field added to it would become
// public by default. Field names match what the status page JS reads.
type statusSite struct {
Name string
Type string
URL string
Status string
Paused bool
LastCheck time.Time
Latency time.Duration
}
func Start(cfg ServerConfig, s store.Store, eng *monitor.Engine) *http.Server {
if cfg.ClusterKey == "" {
fmt.Println("WARNING: No UPTOP_CLUSTER_SECRET set. Cluster API endpoints are unauthenticated.")
}
pushRL := NewRateLimiter(60, cfg.TrustedProxies)
probeRL := NewRateLimiter(30, cfg.TrustedProxies)
backupRL := NewRateLimiter(10, cfg.TrustedProxies)
statusRL := NewRateLimiter(120, cfg.TrustedProxies)
mux := http.NewServeMux()
// 1. Push Heartbeat
mux.HandleFunc("/api/push", RateLimit(pushRL, func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet && r.Method != http.MethodPost {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
token := extractBearerToken(r)
if token == "" {
if qt := r.URL.Query().Get("token"); qt != "" {
token = qt
log.Printf("DEPRECATED: push token in query string — use Authorization: Bearer header instead")
}
}
if token == "" {
http.Error(w, "Missing token", http.StatusBadRequest)
return
}
if eng.RecordHeartbeat(token) {
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("OK"))
} else {
http.Error(w, "Invalid Token", http.StatusNotFound)
}
}))
// 2. Health Check (For Cluster Follower)
mux.HandleFunc("/api/health", func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
if cfg.ClusterKey != "" && !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("OK"))
})
// 3. Config Export
mux.HandleFunc("/api/backup/export", RateLimit(backupRL, func(w http.ResponseWriter, r *http.Request) {
if cfg.ClusterKey == "" || !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized: UPTOP_CLUSTER_SECRET required", http.StatusUnauthorized)
return
}
data, err := s.ExportData(r.Context())
if err != nil {
log.Printf("Export failed: %v", err)
http.Error(w, "Export failed", http.StatusInternalServerError)
return
}
if r.URL.Query().Get("redact_secrets") != "false" {
for i := range data.Alerts {
data.Alerts[i].Settings = models.RedactAlertSettings(data.Alerts[i].Type, data.Alerts[i].Settings)
}
}
_ = json.NewEncoder(w).Encode(data) //nolint:errcheck
}))
// 4. Config Import
mux.HandleFunc("/api/backup/import", RateLimit(backupRL, func(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if cfg.ClusterKey == "" || !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var data models.Backup
if err := json.NewDecoder(r.Body).Decode(&data); err != nil {
http.Error(w, "Invalid JSON", http.StatusBadRequest)
return
}
if err := s.ImportData(r.Context(), data); err != nil {
log.Printf("Import failed: %v", err)
http.Error(w, "Import failed", http.StatusInternalServerError)
return
}
_, _ = w.Write([]byte("Import Successful"))
}))
// 5. Kuma Import
mux.HandleFunc("/api/import/kuma", RateLimit(backupRL, func(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if cfg.ClusterKey == "" || !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var kb importer.KumaBackup
if err := json.NewDecoder(r.Body).Decode(&kb); err != nil {
log.Printf("Invalid Kuma JSON: %v", err)
http.Error(w, "Invalid Kuma JSON", http.StatusBadRequest)
return
}
backup := importer.ConvertKuma(&kb)
if err := s.ImportData(r.Context(), backup); err != nil {
log.Printf("Kuma import failed: %v", err)
http.Error(w, "Import failed", http.StatusInternalServerError)
return
}
fmt.Fprintf(w, "Imported %d monitors, %d alerts from Kuma v%s", len(backup.Sites), len(backup.Alerts), kb.Version)
}))
// 6. Probe Registration
mux.HandleFunc("/api/probe/register", RateLimit(probeRL, func(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if cfg.ClusterKey == "" || !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var req struct {
ID string `json:"id"`
Name string `json:"name"`
Region string `json:"region"`
Version string `json:"version"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid JSON", http.StatusBadRequest)
return
}
if req.ID == "" {
http.Error(w, "id is required", http.StatusBadRequest)
return
}
if err := s.RegisterNode(r.Context(), models.ProbeNode{
ID: req.ID, Name: req.Name, Region: req.Region, Version: req.Version,
}); err != nil {
log.Printf("Probe register failed: %v", err)
http.Error(w, "Registration failed", http.StatusInternalServerError)
return
}
_ = json.NewEncoder(w).Encode(map[string]bool{"ok": true}) //nolint:errcheck
}))
// 7. Probe Assignment Fetch
mux.HandleFunc("/api/probe/assignments", RateLimit(probeRL, func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
if cfg.ClusterKey == "" || !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
nodeID := r.URL.Query().Get("node_id")
var nodeRegion string
if nodeID != "" {
if node, err := s.GetNode(r.Context(), nodeID); err == nil {
nodeRegion = node.Region
}
}
sites := eng.GetAllSites()
var assigned []models.Site
for _, site := range sites {
if site.Paused || site.Type == "push" || site.Type == "group" {
continue
}
if site.Regions != "" && nodeRegion != "" {
matched := false
for _, r := range strings.Split(site.Regions, ",") {
if strings.TrimSpace(r) == nodeRegion {
matched = true
break
}
}
if !matched {
continue
}
}
assigned = append(assigned, site)
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(map[string][]models.Site{"sites": assigned}) //nolint:errcheck
}))
// 8. Probe Result Submission
mux.HandleFunc("/api/probe/results", RateLimit(probeRL, func(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "POST required", http.StatusMethodNotAllowed)
return
}
if cfg.ClusterKey == "" || !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBody)
var req struct {
NodeID string `json:"node_id"`
Results []struct {
SiteID int `json:"site_id"`
LatencyNs int64 `json:"latency_ns"`
IsUp bool `json:"is_up"`
ErrorReason string `json:"error_reason"`
} `json:"results"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid JSON", http.StatusBadRequest)
return
}
if req.NodeID == "" {
http.Error(w, "node_id is required", http.StatusBadRequest)
return
}
for _, result := range req.Results {
eng.EnqueueProbeCheck(result.SiteID, req.NodeID, result.LatencyNs, result.IsUp)
eng.IngestProbeResult(req.NodeID, result.SiteID, result.LatencyNs, result.IsUp, result.ErrorReason)
}
if err := s.UpdateNodeLastSeen(r.Context(), req.NodeID); err != nil {
log.Printf("Failed to update node last seen: %v", err)
}
_ = json.NewEncoder(w).Encode(map[string]bool{"ok": true}) //nolint:errcheck
}))
// 9. Prometheus Metrics
mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
if !cfg.MetricsPublic && cfg.ClusterKey != "" {
if !checkSecret(r.Header.Get("X-Upkeep-Secret"), cfg.ClusterKey) {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
}
metrics.Handler(eng)(w, r)
})
// 10. Status Page
if cfg.EnableStatus {
mux.HandleFunc("/status", RateLimit(statusRL, func(w http.ResponseWriter, r *http.Request) { renderStatusPage(w, cfg.Title, eng) }))
mux.HandleFunc("/status/json", RateLimit(statusRL, func(w http.ResponseWriter, r *http.Request) {
state := eng.GetLiveState()
activeWindows, _ := s.GetActiveMaintenanceWindows(r.Context())
maintSet := make(map[int]bool)
allInMaint := false
for _, mw := range activeWindows {
if mw.Type != "maintenance" {
continue
}
if mw.MonitorID == 0 {
allInMaint = true
} else {
maintSet[mw.MonitorID] = true
}
}
public := make(map[int]statusSite, len(state))
for id, site := range state {
displayStatus := string(site.Status)
if allInMaint || maintSet[site.ID] || (site.ParentID > 0 && maintSet[site.ParentID]) {
displayStatus = "MAINT"
}
public[id] = statusSite{
Name: site.Name,
Type: site.Type,
URL: site.URL,
Status: displayStatus,
Paused: site.Paused,
LastCheck: site.LastCheck,
Latency: site.Latency,
}
}
if cfg.CORSOrigin != "" {
w.Header().Set("Access-Control-Allow-Origin", cfg.CORSOrigin)
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(public) //nolint:errcheck
}))
}
if cfg.ClusterMode != "" && cfg.ClusterMode != "leader" && cfg.TLSCert == "" {
fmt.Println("WARNING: Cluster mode active without TLS. Secrets transmitted in cleartext.")
}
handler := securityHeadersMiddleware(mux)
if !cfg.QuietHTTPLog {
handler = loggingMiddleware(cfg.TrustedProxies, handler)
}
if cfg.TLSCert != "" {
handler = hstsMiddleware(handler)
}
addr := fmt.Sprintf(":%d", cfg.Port)
srv := &http.Server{
Addr: addr,
Handler: handler,
ReadHeaderTimeout: 10 * time.Second,
ReadTimeout: 30 * time.Second,
WriteTimeout: 60 * time.Second,
IdleTimeout: 120 * time.Second,
}
go func() {
if cfg.TLSCert != "" && cfg.TLSKey != "" {
fmt.Printf("HTTPS Server listening on %s\n", addr)
if err := srv.ListenAndServeTLS(cfg.TLSCert, cfg.TLSKey); err != nil && err != http.ErrServerClosed {
log.Printf("HTTPS server error: %v", err)
}
} else {
fmt.Printf("HTTP Server listening on %s\n", addr)
if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
log.Printf("HTTP server error: %v", err)
}
}
}()
return srv
}
type statusWriter struct {
http.ResponseWriter
code int
}
func (w *statusWriter) WriteHeader(code int) {
w.code = code
w.ResponseWriter.WriteHeader(code)
}
func loggingMiddleware(trusted []*net.IPNet, next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
sw := &statusWriter{ResponseWriter: w, code: 200}
next.ServeHTTP(sw, r)
path := strings.ReplaceAll(strings.ReplaceAll(r.URL.Path, "\n", ""), "\r", "")
log.Printf("%s %s %d %s %s", r.Method, path, sw.code, time.Since(start).Round(time.Millisecond), clientIP(r, trusted)) //nolint:gosec // path sanitized above
})
}
func securityHeadersMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Header().Set("X-Frame-Options", "DENY")
w.Header().Set("Referrer-Policy", "no-referrer")
w.Header().Set("Content-Security-Policy", "default-src 'self'; script-src 'unsafe-inline'; style-src 'unsafe-inline'")
next.ServeHTTP(w, r)
})
}
func hstsMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Strict-Transport-Security", "max-age=63072000; includeSubDomains")
next.ServeHTTP(w, r)
})
}
func renderStatusPage(w http.ResponseWriter, title string, eng *monitor.Engine) {
sites := eng.GetAllSites()
sort.Slice(sites, func(i, j int) bool {
if sites[i].Status != sites[j].Status {
if sites[i].Status == models.StatusDown {
return true
}
if sites[j].Status == models.StatusDown {
return false
}
}
return sites[i].Name < sites[j].Name
})
data := struct {
Title string
Sites []models.Site
}{Title: title, Sites: sites}
if err := statusTpl.Execute(w, data); err != nil {
log.Printf("Failed to render status page: %v", err)
}
}
+7 -7
View File
@@ -21,7 +21,7 @@ import (
type mockStore struct {
storetest.BaseMock
mu sync.Mutex
sites []models.Site
sites []models.SiteConfig
alerts []models.AlertConfig
nodes map[string]models.ProbeNode
importedData *models.Backup
@@ -35,7 +35,7 @@ func newMockStore() *mockStore {
}
}
func (m *mockStore) GetSites(_ context.Context) ([]models.Site, error) { return m.sites, nil }
func (m *mockStore) GetSites(_ context.Context) ([]models.SiteConfig, error) { return m.sites, nil }
func (m *mockStore) GetAllAlerts(_ context.Context) ([]models.AlertConfig, error) {
return m.alerts, nil
}
@@ -141,7 +141,7 @@ func authReq(method, url, secret string, body []byte) (*http.Response, error) {
return nil, err
}
if secret != "" {
req.Header.Set("X-Upkeep-Secret", secret)
req.Header.Set("X-Uptop-Secret", secret)
}
return http.DefaultClient.Do(req)
}
@@ -252,7 +252,7 @@ func TestExport_Unauthorized_WrongKey(t *testing.T) {
func TestExport_Success(t *testing.T) {
ts := newTestServer(t, "secret", false)
ts.store.sites = []models.Site{{ID: 1, Name: "example", URL: "http://example.com"}}
ts.store.sites = []models.SiteConfig{{ID: 1, Name: "example", URL: "http://example.com"}}
resp, err := authReq("GET", ts.baseURL+"/api/backup/export", "secret", nil)
if err != nil {
@@ -299,7 +299,7 @@ func TestImport_Unauthorized(t *testing.T) {
func TestImport_Success(t *testing.T) {
ts := newTestServer(t, "secret", false)
backup := models.Backup{
Sites: []models.Site{{Name: "imported", URL: "http://example.com"}},
Sites: []models.SiteConfig{{Name: "imported", URL: "http://example.com"}},
}
body, _ := json.Marshal(backup)
resp, err := authReq("POST", ts.baseURL+"/api/backup/import", "secret", body)
@@ -437,9 +437,9 @@ func TestStatusJSON_PublicDTOOnly(t *testing.T) {
// take. The old version of this test injected via UpdateSiteConfig, which
// no-ops for unknown IDs, so it asserted over zero sites and passed
// against a server that leaked tokens.
ts.store.sites = []models.Site{{
ts.store.sites = []models.SiteConfig{{
ID: 1, Name: "test", Type: "push", Token: "secret-token",
Hostname: "internal-host", LastError: "internal failure detail", AlertID: 3,
Hostname: "internal-host", AlertID: 3,
}}
ctx, cancel := context.WithCancel(context.Background())
ts.engine.Start(ctx)
+1
View File
@@ -18,6 +18,7 @@ type Dialect interface {
BoolFalse() string
ResetSequenceOnEmpty(db *sql.DB, table string)
ImportWipe(tx *sql.Tx)
ImportWipeUsers(tx *sql.Tx)
ImportResetSequences(tx *sql.Tx)
UpsertNodeSQL() string
UpsertAlertHealthSQL() string
+17 -14
View File
@@ -2,7 +2,7 @@ package store
import (
"database/sql"
"log"
"log/slog"
_ "github.com/lib/pq"
)
@@ -133,39 +133,42 @@ func (d *PostgresDialect) ResetSequenceOnEmpty(db *sql.DB, table string) {}
func (d *PostgresDialect) ImportWipe(tx *sql.Tx) {
if _, err := tx.Exec("TRUNCATE TABLE sites RESTART IDENTITY CASCADE"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "sites", "err", err)
}
if _, err := tx.Exec("TRUNCATE TABLE alerts RESTART IDENTITY CASCADE"); err != nil {
log.Printf("import wipe error: %v", err)
}
if _, err := tx.Exec("TRUNCATE TABLE users RESTART IDENTITY CASCADE"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "alerts", "err", err)
}
if _, err := tx.Exec("TRUNCATE TABLE maintenance_windows RESTART IDENTITY CASCADE"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "maintenance_windows", "err", err)
}
if _, err := tx.Exec("TRUNCATE TABLE check_history RESTART IDENTITY CASCADE"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "check_history", "err", err)
}
if _, err := tx.Exec("TRUNCATE TABLE state_changes RESTART IDENTITY CASCADE"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "state_changes", "err", err)
}
if _, err := tx.Exec("TRUNCATE TABLE alert_health RESTART IDENTITY CASCADE"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "alert_health", "err", err)
}
}
func (d *PostgresDialect) ImportWipeUsers(tx *sql.Tx) {
if _, err := tx.Exec("TRUNCATE TABLE users RESTART IDENTITY CASCADE"); err != nil {
slog.Debug("import wipe failed", "table", "users", "err", err)
}
}
func (d *PostgresDialect) ImportResetSequences(tx *sql.Tx) {
if _, err := tx.Exec("SELECT setval('sites_id_seq', (SELECT COALESCE(MAX(id), 1) FROM sites))"); err != nil {
log.Printf("sequence reset error: %v", err)
slog.Debug("sequence reset failed", "table", "sites", "err", err)
}
if _, err := tx.Exec("SELECT setval('alerts_id_seq', (SELECT COALESCE(MAX(id), 1) FROM alerts))"); err != nil {
log.Printf("sequence reset error: %v", err)
slog.Debug("sequence reset failed", "table", "alerts", "err", err)
}
if _, err := tx.Exec("SELECT setval('users_id_seq', (SELECT COALESCE(MAX(id), 1) FROM users))"); err != nil {
log.Printf("sequence reset error: %v", err)
slog.Debug("sequence reset failed", "table", "users", "err", err)
}
if _, err := tx.Exec("SELECT setval('maintenance_windows_id_seq', (SELECT COALESCE(MAX(id), 1) FROM maintenance_windows))"); err != nil {
log.Printf("sequence reset error: %v", err)
slog.Debug("sequence reset failed", "table", "maintenance_windows", "err", err)
}
}
+28 -17
View File
@@ -3,7 +3,8 @@ package store
import (
"database/sql"
"fmt"
"log"
"log/slog"
"os"
_ "modernc.org/sqlite"
)
@@ -25,6 +26,13 @@ func NewSQLiteStore(path string) (*SQLStore, error) {
if err != nil {
return nil, err
}
if path != ":memory:" {
for _, suffix := range []string{"", "-wal", "-shm"} {
if err := os.Chmod(path+suffix, 0600); err != nil && !os.IsNotExist(err) {
slog.Warn("failed to chmod database file", "path", path+suffix, "err", err)
}
}
}
return s, nil
}
@@ -141,44 +149,47 @@ func (d *SQLiteDialect) ResetSequenceOnEmpty(db *sql.DB, table string) {
_ = db.QueryRow("SELECT COUNT(*) FROM " + table).Scan(&count) //nolint:errcheck
if count == 0 {
if _, err := db.Exec("DELETE FROM sqlite_sequence WHERE name=?", table); err != nil {
log.Printf("sequence cleanup error: %v", err)
slog.Debug("sequence cleanup failed", "table", table, "err", err)
}
}
}
func (d *SQLiteDialect) ImportWipe(tx *sql.Tx) {
if _, err := tx.Exec("DELETE FROM sites"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "sites", "err", err)
}
if _, err := tx.Exec("DELETE FROM sqlite_sequence WHERE name='sites'"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "sqlite_sequence(sites)", "err", err)
}
if _, err := tx.Exec("DELETE FROM alerts"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "alerts", "err", err)
}
if _, err := tx.Exec("DELETE FROM sqlite_sequence WHERE name='alerts'"); err != nil {
log.Printf("import wipe error: %v", err)
}
if _, err := tx.Exec("DELETE FROM users"); err != nil {
log.Printf("import wipe error: %v", err)
}
if _, err := tx.Exec("DELETE FROM sqlite_sequence WHERE name='users'"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "sqlite_sequence(alerts)", "err", err)
}
if _, err := tx.Exec("DELETE FROM maintenance_windows"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "maintenance_windows", "err", err)
}
if _, err := tx.Exec("DELETE FROM sqlite_sequence WHERE name='maintenance_windows'"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "sqlite_sequence(maintenance_windows)", "err", err)
}
if _, err := tx.Exec("DELETE FROM check_history"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "check_history", "err", err)
}
if _, err := tx.Exec("DELETE FROM state_changes"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "state_changes", "err", err)
}
if _, err := tx.Exec("DELETE FROM alert_health"); err != nil {
log.Printf("import wipe error: %v", err)
slog.Debug("import wipe failed", "table", "alert_health", "err", err)
}
}
func (d *SQLiteDialect) ImportWipeUsers(tx *sql.Tx) {
if _, err := tx.Exec("DELETE FROM users"); err != nil {
slog.Debug("import wipe failed", "table", "users", "err", err)
}
if _, err := tx.Exec("DELETE FROM sqlite_sequence WHERE name='users'"); err != nil {
slog.Debug("import wipe failed", "table", "sqlite_sequence(users)", "err", err)
}
}
+19 -13
View File
@@ -17,7 +17,6 @@ const (
maxLogRows = 200
maxStateChangesPerSite = 5000
maxMaintenanceExport = 1000
maxRequestBody = 1 << 20
)
type SQLStore struct {
@@ -112,7 +111,7 @@ func (s *SQLStore) Init(ctx context.Context) error {
return nil
}
func (s *SQLStore) GetSites(ctx context.Context) ([]models.Site, error) {
func (s *SQLStore) GetSites(ctx context.Context) ([]models.SiteConfig, error) {
bf := s.dialect.BoolFalse()
query := fmt.Sprintf( //nolint:gosec // bf is a dialect boolean literal, not user input
"SELECT id, COALESCE(name, url), url, COALESCE(type, 'http'), COALESCE(token, ''), interval, alert_id, check_ssl, threshold, max_retries, COALESCE(hostname, ''), COALESCE(port, 0), COALESCE(timeout, 0), COALESCE(method, 'GET'), COALESCE(description, ''), COALESCE(parent_id, 0), COALESCE(accepted_codes, '200-299'), COALESCE(dns_resolve_type, ''), COALESCE(dns_server, ''), COALESCE(ignore_tls, %s), COALESCE(paused, %s), COALESCE(regions, '') FROM sites",
@@ -123,9 +122,9 @@ func (s *SQLStore) GetSites(ctx context.Context) ([]models.Site, error) {
return nil, err
}
defer rows.Close()
var sites []models.Site
var sites []models.SiteConfig
for rows.Next() {
var st models.Site
var st models.SiteConfig
if err := rows.Scan(&st.ID, &st.Name, &st.URL, &st.Type, &st.Token, &st.Interval, &st.AlertID,
&st.CheckSSL, &st.ExpiryThreshold, &st.MaxRetries, &st.Hostname, &st.Port, &st.Timeout,
&st.Method, &st.Description, &st.ParentID, &st.AcceptedCodes, &st.DNSResolveType,
@@ -137,7 +136,7 @@ func (s *SQLStore) GetSites(ctx context.Context) ([]models.Site, error) {
return sites, rows.Err()
}
func (s *SQLStore) AddSite(ctx context.Context, site models.Site) error {
func (s *SQLStore) AddSite(ctx context.Context, site models.SiteConfig) error {
token := ""
if site.Type == "push" {
var err error
@@ -152,9 +151,11 @@ func (s *SQLStore) AddSite(ctx context.Context, site models.Site) error {
return err
}
func (s *SQLStore) UpdateSite(ctx context.Context, site models.Site) error {
func (s *SQLStore) UpdateSite(ctx context.Context, site models.SiteConfig) error {
var existingToken string
_ = s.db.QueryRowContext(ctx, s.q("SELECT token FROM sites WHERE id=?"), site.ID).Scan(&existingToken) //nolint:errcheck
if err := s.db.QueryRowContext(ctx, s.q("SELECT token FROM sites WHERE id=?"), site.ID).Scan(&existingToken); err != nil && err != sql.ErrNoRows {
return fmt.Errorf("read existing token: %w", err)
}
if site.Type == "push" && existingToken == "" {
var err error
existingToken, err = generateToken()
@@ -198,13 +199,13 @@ func (s *SQLStore) DeleteSite(ctx context.Context, id int) error {
return nil
}
func (s *SQLStore) GetSiteByName(ctx context.Context, name string) (models.Site, error) {
func (s *SQLStore) GetSiteByName(ctx context.Context, name string) (models.SiteConfig, error) {
bf := s.dialect.BoolFalse()
query := fmt.Sprintf( //nolint:gosec // bf is a dialect boolean literal, not user input
"SELECT id, COALESCE(name, url), url, COALESCE(type, 'http'), COALESCE(token, ''), interval, alert_id, check_ssl, threshold, max_retries, COALESCE(hostname, ''), COALESCE(port, 0), COALESCE(timeout, 0), COALESCE(method, 'GET'), COALESCE(description, ''), COALESCE(parent_id, 0), COALESCE(accepted_codes, '200-299'), COALESCE(dns_resolve_type, ''), COALESCE(dns_server, ''), COALESCE(ignore_tls, %s), COALESCE(paused, %s), COALESCE(regions, '') FROM sites WHERE name = %s",
bf, bf, s.q("?"),
)
var st models.Site
var st models.SiteConfig
err := s.db.QueryRowContext(ctx, query, name).Scan(&st.ID, &st.Name, &st.URL, &st.Type, &st.Token, &st.Interval, &st.AlertID,
&st.CheckSSL, &st.ExpiryThreshold, &st.MaxRetries, &st.Hostname, &st.Port, &st.Timeout,
&st.Method, &st.Description, &st.ParentID, &st.AcceptedCodes, &st.DNSResolveType,
@@ -246,7 +247,7 @@ func (s *SQLStore) GetAlertByName(ctx context.Context, name string) (models.Aler
return a, nil
}
func (s *SQLStore) AddSiteReturningID(ctx context.Context, site models.Site) (int, error) {
func (s *SQLStore) AddSiteReturningID(ctx context.Context, site models.SiteConfig) (int, error) {
token := ""
if site.Type == "push" {
var err error
@@ -741,9 +742,14 @@ func (s *SQLStore) ImportData(ctx context.Context, data models.Backup) error {
s.dialect.ImportWipe(tx)
for _, u := range data.Users {
if _, err := tx.ExecContext(ctx, s.q("INSERT INTO users (username, public_key, role) VALUES (?, ?, ?)"), u.Username, u.PublicKey, u.Role); err != nil {
return err
// Only wipe+replace users when callers explicitly provide them (CLI
// full restore). API/Kuma imports pass nil — existing users preserved.
if data.Users != nil {
s.dialect.ImportWipeUsers(tx)
for _, u := range data.Users {
if _, err := tx.ExecContext(ctx, s.q("INSERT INTO users (username, public_key, role) VALUES (?, ?, ?)"), u.Username, u.PublicKey, u.Role); err != nil {
return err
}
}
}
for _, a := range data.Alerts {
+31 -6
View File
@@ -33,7 +33,7 @@ func TestSiteCRUD(t *testing.T) {
t.Fatalf("expected 0 sites, got %d", len(sites))
}
if err := s.AddSite(context.Background(), models.Site{Name: "Test", URL: "https://example.com", Type: "http", Interval: 30}); err != nil {
if err := s.AddSite(context.Background(), models.SiteConfig{Name: "Test", URL: "https://example.com", Type: "http", Interval: 30}); err != nil {
t.Fatalf("AddSite: %v", err)
}
@@ -174,7 +174,7 @@ func TestUserCRUD(t *testing.T) {
func TestPushTokenGeneration(t *testing.T) {
s := newTestStore(t)
if err := s.AddSite(context.Background(), models.Site{Name: "Push Monitor", Type: "push", Interval: 60}); err != nil {
if err := s.AddSite(context.Background(), models.SiteConfig{Name: "Push Monitor", Type: "push", Interval: 60}); err != nil {
t.Fatalf("AddSite: %v", err)
}
@@ -199,7 +199,7 @@ func TestImportExport(t *testing.T) {
if err := s.AddAlert(context.Background(), "Test Alert", "webhook", map[string]string{"url": "https://example.com"}); err != nil {
t.Fatalf("AddAlert: %v", err)
}
if err := s.AddSite(context.Background(), models.Site{Name: "Site1", URL: "https://example.com", Type: "http", Interval: 30}); err != nil {
if err := s.AddSite(context.Background(), models.SiteConfig{Name: "Site1", URL: "https://example.com", Type: "http", Interval: 30}); err != nil {
t.Fatalf("AddSite: %v", err)
}
if err := s.AddUser(context.Background(), "user1", "ssh-ed25519 KEY", "user"); err != nil {
@@ -239,7 +239,7 @@ func TestImportExport(t *testing.T) {
func TestImportData_WipesHistory(t *testing.T) {
s := newTestStore(t)
if err := s.AddSite(context.Background(), models.Site{Name: "OldSite", URL: "https://old.com", Type: "http", Interval: 30}); err != nil {
if err := s.AddSite(context.Background(), models.SiteConfig{Name: "OldSite", URL: "https://old.com", Type: "http", Interval: 30}); err != nil {
t.Fatalf("AddSite: %v", err)
}
if err := s.SaveCheck(context.Background(), 1, 5000, true); err != nil {
@@ -253,7 +253,7 @@ func TestImportData_WipesHistory(t *testing.T) {
}
backup := models.Backup{
Sites: []models.Site{{ID: 1, Name: "NewSite", URL: "https://new.com", Type: "http", Interval: 60}},
Sites: []models.SiteConfig{{ID: 1, Name: "NewSite", URL: "https://new.com", Type: "http", Interval: 60}},
}
if err := s.ImportData(context.Background(), backup); err != nil {
t.Fatalf("ImportData: %v", err)
@@ -276,6 +276,31 @@ func TestImportData_WipesHistory(t *testing.T) {
}
}
func TestImportData_NilUsersPreservesExisting(t *testing.T) {
s := newTestStore(t)
if err := s.AddUser(context.Background(), "admin", "ssh-ed25519 ADMINKEY", "admin"); err != nil {
t.Fatalf("AddUser: %v", err)
}
backup := models.Backup{
Sites: []models.SiteConfig{{ID: 1, Name: "New", URL: "https://new.com", Type: "http", Interval: 30}},
Alerts: []models.AlertConfig{{ID: 1, Name: "a", Type: "webhook", Settings: map[string]string{"url": "https://h.com"}}},
Users: nil,
}
if err := s.ImportData(context.Background(), backup); err != nil {
t.Fatalf("ImportData: %v", err)
}
users, err := s.GetAllUsers(context.Background())
if err != nil {
t.Fatalf("GetAllUsers: %v", err)
}
if len(users) != 1 || users[0].Username != "admin" {
t.Errorf("expected existing admin user preserved, got %d users", len(users))
}
}
func TestCheckHistory(t *testing.T) {
s := newTestStore(t)
@@ -314,7 +339,7 @@ func TestCheckHistory(t *testing.T) {
func TestDeleteSiteCascade(t *testing.T) {
s := newTestStore(t)
site := models.Site{Name: "Cascade Test", URL: "https://example.com", Interval: 30}
site := models.SiteConfig{Name: "Cascade Test", URL: "https://example.com", Interval: 30}
if err := s.AddSite(context.Background(), site); err != nil {
t.Fatalf("AddSite: %v", err)
}
+5 -5
View File
@@ -11,9 +11,9 @@ type Store interface {
Init(ctx context.Context) error
// Sites
GetSites(ctx context.Context) ([]models.Site, error)
AddSite(ctx context.Context, site models.Site) error
UpdateSite(ctx context.Context, site models.Site) error
GetSites(ctx context.Context) ([]models.SiteConfig, error)
AddSite(ctx context.Context, site models.SiteConfig) error
UpdateSite(ctx context.Context, site models.SiteConfig) error
UpdateSitePaused(ctx context.Context, id int, paused bool) error
DeleteSite(ctx context.Context, id int) error
@@ -25,9 +25,9 @@ type Store interface {
DeleteAlert(ctx context.Context, id int) error
// Declarative config support
GetSiteByName(ctx context.Context, name string) (models.Site, error)
GetSiteByName(ctx context.Context, name string) (models.SiteConfig, error)
GetAlertByName(ctx context.Context, name string) (models.AlertConfig, error)
AddSiteReturningID(ctx context.Context, site models.Site) (int, error)
AddSiteReturningID(ctx context.Context, site models.SiteConfig) (int, error)
AddAlertReturningID(ctx context.Context, name, aType string, settings map[string]string) (int, error)
// Users
+11 -9
View File
@@ -11,9 +11,9 @@ import (
// mocks and override only the methods you need via the exported Func fields or
// by shadowing the method on the embedding struct.
type BaseMock struct {
GetSitesFunc func(ctx context.Context) ([]models.Site, error)
AddSiteFunc func(ctx context.Context, site models.Site) error
UpdateSiteFunc func(ctx context.Context, site models.Site) error
GetSitesFunc func(ctx context.Context) ([]models.SiteConfig, error)
AddSiteFunc func(ctx context.Context, site models.SiteConfig) error
UpdateSiteFunc func(ctx context.Context, site models.SiteConfig) error
GetAllAlertsFunc func(ctx context.Context) ([]models.AlertConfig, error)
GetAlertFunc func(ctx context.Context, id int) (models.AlertConfig, error)
GetAllUsersFunc func(ctx context.Context) ([]models.User, error)
@@ -41,21 +41,21 @@ type BaseMock struct {
func (m *BaseMock) Init(_ context.Context) error { return nil }
func (m *BaseMock) Close() error { return nil }
func (m *BaseMock) GetSites(ctx context.Context) ([]models.Site, error) {
func (m *BaseMock) GetSites(ctx context.Context) ([]models.SiteConfig, error) {
if m.GetSitesFunc != nil {
return m.GetSitesFunc(ctx)
}
return nil, nil
}
func (m *BaseMock) AddSite(ctx context.Context, site models.Site) error {
func (m *BaseMock) AddSite(ctx context.Context, site models.SiteConfig) error {
if m.AddSiteFunc != nil {
return m.AddSiteFunc(ctx, site)
}
return nil
}
func (m *BaseMock) UpdateSite(ctx context.Context, site models.Site) error {
func (m *BaseMock) UpdateSite(ctx context.Context, site models.SiteConfig) error {
if m.UpdateSiteFunc != nil {
return m.UpdateSiteFunc(ctx, site)
}
@@ -90,15 +90,17 @@ func (m *BaseMock) UpdateAlert(_ context.Context, _ int, _ string, _ string, _ m
func (m *BaseMock) DeleteAlert(_ context.Context, _ int) error { return nil }
func (m *BaseMock) GetSiteByName(_ context.Context, _ string) (models.Site, error) {
return models.Site{}, nil
func (m *BaseMock) GetSiteByName(_ context.Context, _ string) (models.SiteConfig, error) {
return models.SiteConfig{}, nil
}
func (m *BaseMock) GetAlertByName(_ context.Context, _ string) (models.AlertConfig, error) {
return models.AlertConfig{}, nil
}
func (m *BaseMock) AddSiteReturningID(_ context.Context, _ models.Site) (int, error) { return 0, nil }
func (m *BaseMock) AddSiteReturningID(_ context.Context, _ models.SiteConfig) (int, error) {
return 0, nil
}
func (m *BaseMock) AddAlertReturningID(_ context.Context, _ string, _ string, _ map[string]string) (int, error) {
return 0, nil
-103
View File
@@ -1,103 +0,0 @@
package tui
// braillePlane is a subpixel canvas where each terminal cell maps to a 2×4
// dot grid, rendered via Unicode braille (U+2800..U+28FF).
type braillePlane struct {
wCells, hCells int
wDots, hDots int
dots []bool
}
func newBraillePlane(wCells, hCells int) *braillePlane {
wd, hd := wCells*2, hCells*4
return &braillePlane{
wCells: wCells, hCells: hCells,
wDots: wd, hDots: hd,
dots: make([]bool, wd*hd),
}
}
func (p *braillePlane) set(dx, dy int) {
if dx < 0 || dy < 0 || dx >= p.wDots || dy >= p.hDots {
return
}
p.dots[dy*p.wDots+dx] = true
}
// line draws a Bresenham line between two dot coordinates.
func (p *braillePlane) line(x0, y0, x1, y1 int) {
dx := intAbs(x1 - x0)
sx := 1
if x0 >= x1 {
sx = -1
}
dy := -intAbs(y1 - y0)
sy := 1
if y0 >= y1 {
sy = -1
}
err := dx + dy
for {
p.set(x0, y0)
if x0 == x1 && y0 == y1 {
return
}
e2 := 2 * err
if e2 >= dy {
err += dy
x0 += sx
}
if e2 <= dx {
err += dx
y0 += sy
}
}
}
// fillBelow fills all dots below the topmost lit dot in each column,
// producing an area-chart effect.
func (p *braillePlane) fillBelow() {
for x := 0; x < p.wDots; x++ {
topY := -1
for y := 0; y < p.hDots; y++ {
if p.dots[y*p.wDots+x] {
topY = y
break
}
}
if topY >= 0 {
for y := topY + 1; y < p.hDots; y++ {
p.dots[y*p.wDots+x] = true
}
}
}
}
// cellMask builds the U+2800-relative bitmask for one terminal cell.
func (p *braillePlane) cellMask(cx, cy int) byte {
type bit struct {
dx, dy int
m byte
}
bits := [...]bit{
{0, 0, 0x01}, {0, 1, 0x02}, {0, 2, 0x04},
{1, 0, 0x08}, {1, 1, 0x10}, {1, 2, 0x20},
{0, 3, 0x40}, {1, 3, 0x80},
}
var mask byte
for _, b := range bits {
dx := cx*2 + b.dx
dy := cy*4 + b.dy
if dx >= 0 && dx < p.wDots && dy >= 0 && dy < p.hDots && p.dots[dy*p.wDots+dx] {
mask |= b.m
}
}
return mask
}
func intAbs(n int) int {
if n < 0 {
return -n
}
return n
}
-64
View File
@@ -1,64 +0,0 @@
package tui
import "testing"
func TestBraillePlane_Set(t *testing.T) {
p := newBraillePlane(2, 1)
if p.wDots != 4 || p.hDots != 4 {
t.Fatalf("expected 4x4 dots, got %dx%d", p.wDots, p.hDots)
}
p.set(0, 0)
if !p.dots[0] {
t.Error("dot at (0,0) should be set")
}
p.set(-1, 0) // out of bounds, should not panic
p.set(0, 99) // out of bounds, should not panic
}
func TestBraillePlane_CellMask(t *testing.T) {
p := newBraillePlane(1, 1)
// Set bottom-left dot
p.set(0, 3)
mask := p.cellMask(0, 0)
if mask != 0x40 {
t.Errorf("bottom-left dot should be 0x40, got 0x%02x", mask)
}
// Set all dots
for y := 0; y < 4; y++ {
for x := 0; x < 2; x++ {
p.set(x, y)
}
}
mask = p.cellMask(0, 0)
if mask != 0xFF {
t.Errorf("all dots should be 0xFF, got 0x%02x", mask)
}
}
func TestBraillePlane_Line(t *testing.T) {
p := newBraillePlane(3, 1)
p.line(0, 2, 5, 2) // horizontal line
for x := 0; x <= 5; x++ {
if !p.dots[2*p.wDots+x] {
t.Errorf("dot at (%d, 2) should be set", x)
}
}
}
func TestBraillePlane_FillBelow(t *testing.T) {
p := newBraillePlane(1, 1)
p.set(0, 1) // set dot at row 1
p.fillBelow()
if !p.dots[1*p.wDots+0] {
t.Error("original dot should still be set")
}
if !p.dots[2*p.wDots+0] {
t.Error("row 2 should be filled")
}
if !p.dots[3*p.wDots+0] {
t.Error("row 3 should be filled")
}
if p.dots[0*p.wDots+0] {
t.Error("row 0 above the dot should not be filled")
}
}
+16 -1
View File
@@ -104,10 +104,25 @@ func (m *Model) refreshLive() {
ordered = filterSites(ordered, m.filterText)
}
m.sites = ordered
m.logViewport.SetContent(strings.Join(m.engine.GetLogs(), "\n"))
m.refreshLogContent()
if m.currentTab == 0 && m.selectedID != 0 {
for i, s := range m.sites {
if s.ID == m.selectedID {
m.cursor = i
break
}
}
}
m.clampCursor()
}
func (m *Model) syncSelectedID() {
if m.currentTab == 0 && m.cursor < len(m.sites) {
m.selectedID = m.sites[m.cursor].ID
}
}
// clampCursor keeps the cursor and scroll offset within the current tab's list.
func (m *Model) clampCursor() {
listLen := m.currentListLen()
+13 -13
View File
@@ -8,9 +8,9 @@ import (
func TestSortSitesForDisplay_GroupsFirst(t *testing.T) {
sites := []models.Site{
{ID: 3, Name: "ungrouped", Type: "http", Status: "UP"},
{ID: 1, Name: "group-a", Type: "group", Status: "UP"},
{ID: 2, Name: "child", Type: "http", Status: "UP", ParentID: 1},
{SiteConfig: models.SiteConfig{ID: 3, Name: "ungrouped", Type: "http"}, SiteState: models.SiteState{Status: "UP"}},
{SiteConfig: models.SiteConfig{ID: 1, Name: "group-a", Type: "group"}, SiteState: models.SiteState{Status: "UP"}},
{SiteConfig: models.SiteConfig{ID: 2, Name: "child", Type: "http", ParentID: 1}, SiteState: models.SiteState{Status: "UP"}},
}
result := sortSitesForDisplay(sites, nil)
if len(result) != 3 {
@@ -29,9 +29,9 @@ func TestSortSitesForDisplay_GroupsFirst(t *testing.T) {
func TestSortSitesForDisplay_CollapsedHidesChildren(t *testing.T) {
sites := []models.Site{
{ID: 1, Name: "group-a", Type: "group", Status: "UP"},
{ID: 2, Name: "child-1", Type: "http", Status: "UP", ParentID: 1},
{ID: 3, Name: "child-2", Type: "http", Status: "UP", ParentID: 1},
{SiteConfig: models.SiteConfig{ID: 1, Name: "group-a", Type: "group"}, SiteState: models.SiteState{Status: "UP"}},
{SiteConfig: models.SiteConfig{ID: 2, Name: "child-1", Type: "http", ParentID: 1}, SiteState: models.SiteState{Status: "UP"}},
{SiteConfig: models.SiteConfig{ID: 3, Name: "child-2", Type: "http", ParentID: 1}, SiteState: models.SiteState{Status: "UP"}},
}
collapsed := map[int]bool{1: true}
result := sortSitesForDisplay(sites, collapsed)
@@ -45,9 +45,9 @@ func TestSortSitesForDisplay_CollapsedHidesChildren(t *testing.T) {
func TestSortSitesForDisplay_StatusOrdering(t *testing.T) {
sites := []models.Site{
{ID: 1, Name: "up-site", Type: "http", Status: "UP"},
{ID: 2, Name: "down-site", Type: "http", Status: "DOWN"},
{ID: 3, Name: "late-site", Type: "http", Status: "LATE"},
{SiteConfig: models.SiteConfig{ID: 1, Name: "up-site", Type: "http"}, SiteState: models.SiteState{Status: "UP"}},
{SiteConfig: models.SiteConfig{ID: 2, Name: "down-site", Type: "http"}, SiteState: models.SiteState{Status: "DOWN"}},
{SiteConfig: models.SiteConfig{ID: 3, Name: "late-site", Type: "http"}, SiteState: models.SiteState{Status: "LATE"}},
}
result := sortSitesForDisplay(sites, nil)
if result[0].Status != "DOWN" {
@@ -63,9 +63,9 @@ func TestSortSitesForDisplay_StatusOrdering(t *testing.T) {
func TestFilterSites(t *testing.T) {
sites := []models.Site{
{Name: "Production API"},
{Name: "Staging API"},
{Name: "Database"},
{SiteConfig: models.SiteConfig{Name: "Production API"}},
{SiteConfig: models.SiteConfig{Name: "Staging API"}},
{SiteConfig: models.SiteConfig{Name: "Database"}},
}
tests := []struct {
@@ -87,7 +87,7 @@ func TestFilterSites(t *testing.T) {
}
func TestFilterSites_EmptyNeedle(t *testing.T) {
sites := []models.Site{{Name: "a"}, {Name: "b"}}
sites := []models.Site{{SiteConfig: models.SiteConfig{Name: "a"}}, {SiteConfig: models.SiteConfig{Name: "b"}}}
got := filterSites(sites, "")
if len(got) != 2 {
t.Errorf("empty needle should return all, got %d", len(got))
+3
View File
@@ -34,6 +34,9 @@ func (m Model) emptyState(message, hint string) string {
}
func limitStr(text string, max int) string {
if max < 3 {
return text
}
runes := []rune(text)
if len(runes) > max {
return string(runes[:max-3]) + "..."
+7 -7
View File
@@ -38,13 +38,13 @@ func TestSiteOrder(t *testing.T) {
site models.Site
want int
}{
{"down", models.Site{Status: "DOWN"}, 0},
{"ssl exp", models.Site{Status: "SSL EXP"}, 0},
{"late", models.Site{Status: "LATE"}, 1},
{"up", models.Site{Status: "UP"}, 2},
{"pending", models.Site{Status: "PENDING"}, 3},
{"paused up", models.Site{Status: "UP", Paused: true}, 3},
{"paused down", models.Site{Status: "DOWN", Paused: true}, 3},
{"down", models.Site{SiteState: models.SiteState{Status: "DOWN"}}, 0},
{"ssl exp", models.Site{SiteState: models.SiteState{Status: "SSL EXP"}}, 0},
{"late", models.Site{SiteState: models.SiteState{Status: "LATE"}}, 1},
{"up", models.Site{SiteState: models.SiteState{Status: "UP"}}, 2},
{"pending", models.Site{SiteState: models.SiteState{Status: "PENDING"}}, 3},
{"paused up", models.Site{SiteConfig: models.SiteConfig{Paused: true}, SiteState: models.SiteState{Status: "UP"}}, 3},
{"paused down", models.Site{SiteConfig: models.SiteConfig{Paused: true}, SiteState: models.SiteState{Status: "DOWN"}}, 3},
}
for _, tt := range tests {
got := siteOrder(tt.site)
+22 -10
View File
@@ -17,6 +17,17 @@ func parseHex(hex string) (r, g, b uint8) {
return
}
func trueColorHex(c lipgloss.TerminalColor) string {
switch v := c.(type) {
case lipgloss.CompleteColor:
return v.TrueColor
case lipgloss.Color:
return string(v)
default:
return ""
}
}
func dimColor(hex string, brightness float64) lipgloss.Color {
r, g, b := parseHex(hex)
f := 0.3 + brightness*0.7
@@ -27,35 +38,36 @@ func dimColor(hex string, brightness float64) lipgloss.Color {
))
}
func withBg(s lipgloss.Style, bg lipgloss.Color) lipgloss.Style {
if bg != "" {
func withBg(s lipgloss.Style, bg lipgloss.TerminalColor) lipgloss.Style {
if bg != nil {
return s.Background(bg)
}
return s
}
func (m Model) latencyStyle(ms int64, bg lipgloss.Color) lipgloss.Style {
var hex string
func (m Model) latencyStyle(ms int64, bg lipgloss.TerminalColor) lipgloss.Style {
var base lipgloss.TerminalColor
var t float64
switch {
case ms < 200:
hex = m.st.sparkSuccess
base = m.st.sparkSuccess
t = float64(ms) / 200
case ms < 500:
hex = m.st.sparkWarning
base = m.st.sparkWarning
t = float64(ms-200) / 300
default:
hex = m.st.sparkDanger
base = m.st.sparkDanger
t = float64(ms-500) / 1500
if t > 1 {
t = 1
}
}
hex := trueColorHex(base)
s := lipgloss.NewStyle().Foreground(dimColor(hex, t))
return withBg(s, bg)
}
func (m Model) latencySparkline(latencies []time.Duration, statuses []bool, width int, bg lipgloss.Color) string {
func (m Model) latencySparkline(latencies []time.Duration, statuses []bool, width int, bg lipgloss.TerminalColor) string {
if len(latencies) == 0 {
return withBg(m.st.subtleStyle, bg).Render(strings.Repeat("·", width))
}
@@ -103,7 +115,7 @@ func (m Model) latencySparkline(latencies []time.Duration, statuses []bool, widt
return sb.String()
}
func (m Model) heartbeatSparkline(statuses []bool, width int, bg lipgloss.Color) string {
func (m Model) heartbeatSparkline(statuses []bool, width int, bg lipgloss.TerminalColor) string {
if len(statuses) == 0 {
return withBg(m.st.subtleStyle, bg).Render(strings.Repeat("·", width))
}
@@ -143,7 +155,7 @@ func resolveSparklineIndex(x, sparkWidth, dataLen int) int {
return offset + (x - padding)
}
func (m Model) groupSparkline(groupID int, width int, bg lipgloss.Color) string {
func (m Model) groupSparkline(groupID int, width int, bg lipgloss.TerminalColor) string {
allSites := m.engine.GetAllSites()
var childStatuses [][]bool
for _, s := range allSites {
+19 -17
View File
@@ -5,10 +5,12 @@ import (
"testing"
"time"
"unicode/utf8"
"github.com/charmbracelet/lipgloss"
)
func TestLatencySparkline_Empty(t *testing.T) {
got := styledModel.latencySparkline(nil, nil, 10, "")
got := styledModel.latencySparkline(nil, nil, 10, nil)
if !strings.Contains(got, "··········") {
t.Errorf("empty sparkline should be dots, got %q", got)
}
@@ -17,7 +19,7 @@ func TestLatencySparkline_Empty(t *testing.T) {
func TestLatencySparkline_SingleValue(t *testing.T) {
latencies := []time.Duration{100 * time.Millisecond}
statuses := []bool{true}
got := styledModel.latencySparkline(latencies, statuses, 5, "")
got := styledModel.latencySparkline(latencies, statuses, 5, nil)
if len(got) == 0 {
t.Error("sparkline should not be empty")
}
@@ -33,7 +35,7 @@ func TestLatencySparkline_WidthTruncation(t *testing.T) {
latencies[i] = time.Duration(i*50) * time.Millisecond
statuses[i] = true
}
got := styledModel.latencySparkline(latencies, statuses, 5, "")
got := styledModel.latencySparkline(latencies, statuses, 5, nil)
if len(got) == 0 {
t.Error("sparkline should not be empty")
}
@@ -45,7 +47,7 @@ func TestLatencySparkline_WidthTruncation(t *testing.T) {
func TestLatencySparkline_RelativeHeight(t *testing.T) {
latencies := []time.Duration{10 * time.Millisecond, 50 * time.Millisecond, 10 * time.Millisecond}
statuses := []bool{true, true, true}
out := stripANSI(styledModel.latencySparkline(latencies, statuses, 3, ""))
out := stripANSI(styledModel.latencySparkline(latencies, statuses, 3, nil))
runes := []rune(out)
if len(runes) < 3 {
t.Fatalf("expected 3 runes, got %d", len(runes))
@@ -57,14 +59,14 @@ func TestLatencySparkline_RelativeHeight(t *testing.T) {
func TestLatencyStyle_BandsProduceDifferentColors(t *testing.T) {
st := newStyles(themeFlexokiDark)
st.sparkSuccess = "#00ff00"
st.sparkWarning = "#ffff00"
st.sparkDanger = "#ff0000"
st.sparkSuccess = lipgloss.Color("#00ff00")
st.sparkWarning = lipgloss.Color("#ffff00")
st.sparkDanger = lipgloss.Color("#ff0000")
m := Model{st: st}
green := m.latencyStyle(50, "")
yellow := m.latencyStyle(300, "")
red := m.latencyStyle(800, "")
green := m.latencyStyle(50, nil)
yellow := m.latencyStyle(300, nil)
red := m.latencyStyle(800, nil)
gfg := green.GetForeground()
yfg := yellow.GetForeground()
@@ -77,11 +79,11 @@ func TestLatencyStyle_BandsProduceDifferentColors(t *testing.T) {
func TestLatencyStyle_BrightnessVariesWithinBand(t *testing.T) {
st := newStyles(themeFlexokiDark)
st.sparkSuccess = "#00ff00"
st.sparkSuccess = lipgloss.Color("#00ff00")
m := Model{st: st}
dim := m.latencyStyle(10, "")
bright := m.latencyStyle(190, "")
dim := m.latencyStyle(10, nil)
bright := m.latencyStyle(190, nil)
if dim.GetForeground() == bright.GetForeground() {
t.Error("10ms and 190ms should have different brightness within green band")
@@ -91,7 +93,7 @@ func TestLatencyStyle_BrightnessVariesWithinBand(t *testing.T) {
func TestLatencySparkline_OutputWidth(t *testing.T) {
latencies := []time.Duration{100 * time.Millisecond, 200 * time.Millisecond, 300 * time.Millisecond}
statuses := []bool{true, true, true}
got := styledModel.latencySparkline(latencies, statuses, 5, "")
got := styledModel.latencySparkline(latencies, statuses, 5, nil)
count := utf8.RuneCountInString(stripANSI(got))
if count != 5 {
t.Errorf("expected 5 rune-width output, got %d from %q", count, got)
@@ -116,7 +118,7 @@ func stripANSI(s string) string {
}
func TestHeartbeatSparkline_Empty(t *testing.T) {
got := styledModel.heartbeatSparkline(nil, 10, "")
got := styledModel.heartbeatSparkline(nil, 10, nil)
if !strings.Contains(got, "··········") {
t.Errorf("empty heartbeat should be dots, got %q", got)
}
@@ -124,7 +126,7 @@ func TestHeartbeatSparkline_Empty(t *testing.T) {
func TestHeartbeatSparkline_Mixed(t *testing.T) {
statuses := []bool{true, false, true, true, false}
got := styledModel.heartbeatSparkline(statuses, 5, "")
got := styledModel.heartbeatSparkline(statuses, 5, nil)
if len(got) == 0 {
t.Error("heartbeat sparkline should not be empty")
}
@@ -132,7 +134,7 @@ func TestHeartbeatSparkline_Mixed(t *testing.T) {
func TestHeartbeatSparkline_PaddedWidth(t *testing.T) {
statuses := []bool{true, true}
got := styledModel.heartbeatSparkline(statuses, 5, "")
got := styledModel.heartbeatSparkline(statuses, 5, nil)
if !strings.Contains(got, "···") {
t.Errorf("should have dot padding for width > data, got %q", got)
}
+3 -9
View File
@@ -75,10 +75,7 @@ func fmtAlertType(t string) string {
}
}
func (m Model) fmtAlertConfig(alert struct {
Type string
Settings map[string]string
}) string {
func (m Model) fmtAlertConfig(alert models.AlertConfig) string {
switch alert.Type {
case "email":
host := alert.Settings["host"]
@@ -201,10 +198,7 @@ func (m Model) viewAlertsTab() string {
m.fmtAlertHealth(h),
m.zones.Mark(fmt.Sprintf("alert-%d", i), limitStr(a.Name, nameW-2)),
fmtAlertType(a.Type),
limitStr(m.fmtAlertConfig(struct {
Type string
Settings map[string]string
}{a.Type, a.Settings}), cfgW-2),
limitStr(m.fmtAlertConfig(a), cfgW-2),
m.fmtAlertLastSent(h),
})
}
@@ -271,7 +265,7 @@ func (m Model) viewAlertDetailPanel() string {
}
b.WriteString(m.divider() + "\n")
b.WriteString(m.st.subtleStyle.Render(" [i/Esc] Back [e] Edit [t] Test [q] Quit"))
b.WriteString(m.st.subtleStyle.Render(" [q/Esc] Back [e] Edit [t] Test"))
return lipgloss.NewStyle().Padding(1, 2).Render(b.String())
}
+2 -8
View File
@@ -44,10 +44,7 @@ func TestAlertDetailPanel_MasksSecretsStableOrder(t *testing.T) {
func TestFmtAlertConfig_MasksSecrets(t *testing.T) {
m := newTestModel(&tuiMockStore{})
webhook := m.fmtAlertConfig(struct {
Type string
Settings map[string]string
}{"discord", map[string]string{"url": "https://discord.com/api/webhooks/123456/SeCrEtToKeN"}})
webhook := m.fmtAlertConfig(models.AlertConfig{Type: "discord", Settings: map[string]string{"url": "https://discord.com/api/webhooks/123456/SeCrEtToKeN"}})
if strings.Contains(webhook, "SeCrEtToKeN") || strings.Contains(webhook, "123456") {
t.Errorf("webhook URL path (the credential) rendered in table: %q", webhook)
}
@@ -55,10 +52,7 @@ func TestFmtAlertConfig_MasksSecrets(t *testing.T) {
t.Errorf("webhook host missing from table config: %q", webhook)
}
pd := m.fmtAlertConfig(struct {
Type string
Settings map[string]string
}{"pagerduty", map[string]string{"routing_key": "R0123456789ABCDEFGHIJ"}})
pd := m.fmtAlertConfig(models.AlertConfig{Type: "pagerduty", Settings: map[string]string{"routing_key": "R0123456789ABCDEFGHIJ"}})
if strings.Contains(pd, "R0123456789ABCDEFGHIJ") {
t.Errorf("pagerduty routing key rendered raw in table: %q", pd)
}
+18 -12
View File
@@ -82,18 +82,15 @@ func (m Model) renderLogLine(line string) string {
return fmt.Sprintf(" %s %s", tag, msg)
}
func (m Model) viewLogsTab() string {
content := m.logViewport.View()
if strings.TrimSpace(content) == "" || content == "Waiting for logs..." {
return m.emptyState("No log entries yet.", "Logs appear as monitors run checks")
}
lines := strings.Split(content, "\n")
// refreshLogContent rebuilds the log viewport from the full engine log list,
// filtering before windowing so the entry count and "(n hidden)" reflect all
// logs, not just the visible viewport slice.
func (m *Model) refreshLogContent() {
var rendered []string
total := 0
shown := 0
for _, line := range lines {
for _, line := range m.engine.GetLogs() {
if strings.TrimSpace(line) == "" {
continue
}
@@ -106,18 +103,27 @@ func (m Model) viewLogsTab() string {
rendered = append(rendered, m.renderLogLine(line))
}
m.logTotal = total
m.logShown = shown
m.logViewport.SetContent(strings.Join(rendered, "\n"))
}
func (m Model) viewLogsTab() string {
if m.logTotal == 0 {
return m.emptyState("No log entries yet.", "Logs appear as monitors run checks")
}
filterLabel := "All"
if m.logFilterImportant {
filterLabel = "Important"
}
header := m.st.subtleStyle.Render(fmt.Sprintf(
" %d entries Filter: %s", shown, filterLabel))
" %d entries Filter: %s", m.logShown, filterLabel))
if m.logFilterImportant && shown < total {
header += m.st.subtleStyle.Render(fmt.Sprintf(" (%d hidden)", total-shown))
if m.logFilterImportant && m.logShown < m.logTotal {
header += m.st.subtleStyle.Render(fmt.Sprintf(" (%d hidden)", m.logTotal-m.logShown))
}
m.logViewport.SetContent(strings.Join(rendered, "\n"))
return "\n" + header + "\n\n" + m.logViewport.View()
}
+168 -146
View File
@@ -204,7 +204,7 @@ func (m Model) viewSitesTab() string {
for i := start; i < end; i++ {
site := m.sites[i]
rowIdx := i - start
var rowBg lipgloss.Color
var rowBg lipgloss.TerminalColor
if i == m.cursor {
rowBg = m.theme.SelectedBg
} else if rowIdx%2 == 1 {
@@ -326,101 +326,104 @@ func (m *Model) initSiteHuhForm() tea.Cmd {
}
}
// m.alerts is the tab-data cache (≤5s stale) — no store IO in Update.
alertOpts := []huh.Option[string]{huh.NewOption("None", "0")}
return m.rebuildSiteForm()
}
func (m *Model) rebuildSiteForm() tea.Cmd {
groups := m.buildSiteFormGroups()
m.huhForm = huh.NewForm(groups...).WithTheme(m.theme.HuhTheme())
if m.termWidth > 0 {
m.huhForm.WithWidth(m.termWidth)
}
formHeight := m.termHeight - 7
if formHeight < 5 {
formHeight = 5
}
m.huhForm.WithHeight(formHeight)
m.lastSiteType = m.siteFormData.SiteType
return m.huhForm.Init()
}
func (m *Model) siteFormOptions() (alertOpts, groupOpts []huh.Option[string]) {
alertOpts = []huh.Option[string]{huh.NewOption("None", "0")}
for _, a := range m.alerts {
alertOpts = append(alertOpts, huh.NewOption(
fmt.Sprintf("%s (%s)", a.Name, a.Type),
strconv.Itoa(a.ID),
))
}
groupOpts := []huh.Option[string]{huh.NewOption("None", "0")}
groupOpts = []huh.Option[string]{huh.NewOption("None", "0")}
for _, s := range m.sites {
if s.Type == "group" && s.ID != m.editID {
groupOpts = append(groupOpts, huh.NewOption(s.Name, strconv.Itoa(s.ID)))
}
}
return
}
m.huhForm = huh.NewForm(
huh.NewGroup(
huh.NewInput().Title("Monitor Name").
Placeholder("My Service").
Value(&m.siteFormData.Name).
Validate(func(s string) error {
if s == "" {
return fmt.Errorf("name is required")
}
return nil
}),
huh.NewSelect[string]().Title("Monitor Type").
Options(
huh.NewOption("HTTP/HTTPS", "http"),
huh.NewOption("Push / Heartbeat", "push"),
huh.NewOption("Ping (ICMP)", "ping"),
huh.NewOption("TCP Port", "port"),
huh.NewOption("DNS", "dns"),
huh.NewOption("Group", "group"),
).Value(&m.siteFormData.SiteType),
huh.NewSelect[string]().Title("Alert Channel").
Options(alertOpts...).
Value(&m.siteFormData.AlertID),
).Title("Monitor Settings"),
huh.NewGroup(
huh.NewInput().Title("URL").
Placeholder("https://example.com").
Description("Required for HTTP monitors").
Value(&m.siteFormData.URL).
Validate(func(s string) error {
if m.siteFormData.SiteType != "http" {
return nil
}
if s == "" {
return fmt.Errorf("URL is required for HTTP monitors")
}
u, err := url.Parse(s)
if err != nil {
return fmt.Errorf("invalid URL")
}
if u.Scheme != "http" && u.Scheme != "https" {
return fmt.Errorf("URL must start with http:// or https://")
}
if u.Host == "" {
return fmt.Errorf("URL must include a host")
}
return nil
}),
huh.NewInput().Title("Check Interval (seconds)").
Placeholder("60").
Value(&m.siteFormData.Interval).
Validate(func(s string) error {
if m.siteFormData.SiteType == "group" {
return nil
}
v, err := strconv.Atoi(s)
if err != nil {
return fmt.Errorf("must be a number")
}
if v < 5 {
return fmt.Errorf("minimum interval is 5 seconds")
}
return nil
}),
huh.NewSelect[string]().Title("Parent Group").
Options(groupOpts...).
Value(&m.siteFormData.GroupID),
func (m *Model) buildSiteFormGroups() []*huh.Group {
d := m.siteFormData
alertOpts, groupOpts := m.siteFormOptions()
// Page 1 — Monitor Setup: core fields + type-specific target
setup := []huh.Field{
huh.NewInput().Title("Monitor Name").
Placeholder("My Service").
Value(&d.Name).
Validate(func(s string) error {
if s == "" {
return fmt.Errorf("name is required")
}
return nil
}),
huh.NewSelect[string]().Title("Monitor Type").
Options(
huh.NewOption("HTTP/HTTPS", "http"),
huh.NewOption("Push / Heartbeat", "push"),
huh.NewOption("Ping (ICMP)", "ping"),
huh.NewOption("TCP Port", "port"),
huh.NewOption("DNS", "dns"),
huh.NewOption("Group", "group"),
).Value(&d.SiteType),
huh.NewSelect[string]().Title("Alert Channel").
Options(alertOpts...).
Value(&d.AlertID),
}
switch d.SiteType {
case "http":
setup = append(setup, huh.NewInput().Title("URL").
Placeholder("https://example.com").
Value(&d.URL).
Validate(func(s string) error {
if s == "" {
return fmt.Errorf("URL is required")
}
u, err := url.Parse(s)
if err != nil {
return fmt.Errorf("invalid URL")
}
if u.Scheme != "http" && u.Scheme != "https" {
return fmt.Errorf("URL must start with http:// or https://")
}
if u.Host == "" {
return fmt.Errorf("URL must include a host")
}
return nil
}))
case "ping", "dns":
setup = append(setup, huh.NewInput().Title("Hostname / IP").
Placeholder("10.0.0.1").
Value(&d.Hostname))
case "port":
setup = append(setup,
huh.NewInput().Title("Hostname / IP").
Placeholder("10.0.0.1").
Description("Target for ping/port/DNS monitors").
Value(&m.siteFormData.Hostname),
Value(&d.Hostname),
huh.NewInput().Title("Port").
Placeholder("0").
Description("Target port for TCP port monitors").
Value(&m.siteFormData.Port).
Placeholder("443").
Value(&d.Port).
Validate(func(s string) error {
if m.siteFormData.SiteType != "port" {
return nil
}
v, err := strconv.Atoi(s)
if err != nil {
return fmt.Errorf("must be a number")
@@ -429,34 +432,20 @@ func (m *Model) initSiteHuhForm() tea.Cmd {
return fmt.Errorf("port must be 1-65535")
}
return nil
}),
huh.NewInput().Title("Timeout (seconds)").
Placeholder("5").
Value(&m.siteFormData.Timeout).
Validate(func(s string) error {
if m.siteFormData.SiteType == "group" {
return nil
}
v, err := strconv.Atoi(s)
if err != nil {
return fmt.Errorf("must be a number")
}
if v < 1 || v > 300 {
return fmt.Errorf("timeout must be 1-300 seconds")
}
return nil
}),
huh.NewInput().Title("Description").
Placeholder("Optional description").
Value(&m.siteFormData.Description),
huh.NewInput().Title("Probe Regions").
Placeholder("us-east, eu-west (empty = all)").
Description("Comma-separated regions for distributed probing").
Value(&m.siteFormData.Regions),
).Title("Connection").WithHideFunc(func() bool {
return m.siteFormData.SiteType == "group"
}),
huh.NewGroup(
}))
}
groups := []*huh.Group{huh.NewGroup(setup...).Title("Monitor Setup")}
if d.SiteType == "group" {
return groups
}
// Page 2 — Configuration: type-specific options + shared defaults
var config []huh.Field
if d.SiteType == "http" {
config = append(config,
huh.NewSelect[string]().Title("HTTP Method").
Options(
huh.NewOption("GET", "GET"),
@@ -466,22 +455,75 @@ func (m *Model) initSiteHuhForm() tea.Cmd {
huh.NewOption("DELETE", "DELETE"),
huh.NewOption("HEAD", "HEAD"),
huh.NewOption("OPTIONS", "OPTIONS"),
).Value(&m.siteFormData.Method),
).Value(&d.Method),
huh.NewInput().Title("Accepted Status Codes").
Placeholder("200-299").
Description("Ranges (200-299) and singles (301) separated by commas").
Value(&m.siteFormData.AcceptedCodes),
).Title("HTTP Settings").WithHideFunc(func() bool {
return m.siteFormData.SiteType != "http"
}),
huh.NewGroup(
Value(&d.AcceptedCodes),
)
}
config = append(config,
huh.NewInput().Title("Check Interval (seconds)").
Placeholder("60").
Value(&d.Interval).
Validate(func(s string) error {
v, err := strconv.Atoi(s)
if err != nil {
return fmt.Errorf("must be a number")
}
if v < 5 {
return fmt.Errorf("minimum interval is 5 seconds")
}
return nil
}),
huh.NewInput().Title("Timeout (seconds)").
Placeholder("5").
Value(&d.Timeout).
Validate(func(s string) error {
v, err := strconv.Atoi(s)
if err != nil {
return fmt.Errorf("must be a number")
}
if v < 1 || v > 300 {
return fmt.Errorf("timeout must be 1-300 seconds")
}
return nil
}),
huh.NewInput().Title("Max Retries Before Alert").
Placeholder("0").
Value(&d.Retries).
Validate(func(s string) error {
v, err := strconv.Atoi(s)
if err != nil {
return fmt.Errorf("must be a number")
}
if v < 0 {
return fmt.Errorf("retries cannot be negative")
}
return nil
}),
huh.NewSelect[string]().Title("Parent Group").
Options(groupOpts...).
Value(&d.GroupID),
huh.NewInput().Title("Description").
Placeholder("Optional description").
Value(&d.Description),
huh.NewInput().Title("Probe Regions").
Placeholder("us-east, eu-west (empty = all)").
Description("Comma-separated regions for distributed probing").
Value(&d.Regions),
)
if d.SiteType == "http" {
config = append(config,
huh.NewConfirm().Title("Monitor SSL Certificate?").
Value(&m.siteFormData.CheckSSL),
Value(&d.CheckSSL),
huh.NewInput().Title("SSL Warning Threshold (days)").
Placeholder("7").
Value(&m.siteFormData.Threshold).
Value(&d.Threshold).
Validate(func(s string) error {
if !m.siteFormData.CheckSSL {
if !d.CheckSSL {
return nil
}
v, err := strconv.Atoi(s)
@@ -493,30 +535,13 @@ func (m *Model) initSiteHuhForm() tea.Cmd {
}
return nil
}),
huh.NewInput().Title("Max Retries Before Alert").
Placeholder("0").
Value(&m.siteFormData.Retries).
Validate(func(s string) error {
if m.siteFormData.SiteType == "group" {
return nil
}
v, err := strconv.Atoi(s)
if err != nil {
return fmt.Errorf("must be a number")
}
if v < 0 {
return fmt.Errorf("retries cannot be negative")
}
return nil
}),
huh.NewConfirm().Title("Ignore TLS Errors?").
Value(&m.siteFormData.IgnoreTLS),
).Title("Advanced").WithHideFunc(func() bool {
return m.siteFormData.SiteType == "group"
}),
).WithTheme(m.theme.HuhTheme())
Value(&d.IgnoreTLS),
)
}
return m.huhForm.Init()
groups = append(groups, huh.NewGroup(config...).Title("Configuration"))
return groups
}
func (m *Model) submitSiteForm() tea.Cmd {
@@ -535,7 +560,7 @@ func (m *Model) submitSiteForm() tea.Cmd {
threshold = 7
}
site := models.Site{
cfg := models.SiteConfig{
ID: m.editID,
Name: d.Name,
URL: d.URL,
@@ -559,11 +584,8 @@ func (m *Model) submitSiteForm() tea.Cmd {
st := m.store
m.state = stateDashboard
if m.editID > 0 {
// The engine's in-memory config updates immediately; the DB write
// follows in the Cmd. New sites enter the engine via its poll loop
// once the insert lands.
m.engine.UpdateSiteConfig(site)
return writeCmd("Update site", func() error { return st.UpdateSite(context.Background(), site) })
m.engine.UpdateSiteConfig(cfg)
return writeCmd("Update site", func() error { return st.UpdateSite(context.Background(), cfg) })
}
return writeCmd("Add site", func() error { return st.AddSite(context.Background(), site) })
return writeCmd("Add site", func() error { return st.AddSite(context.Background(), cfg) })
}
+110 -102
View File
@@ -5,35 +5,43 @@ import (
"github.com/charmbracelet/lipgloss"
)
func cc(hex, ansi string) lipgloss.CompleteColor {
return lipgloss.CompleteColor{
TrueColor: hex,
ANSI256: hex,
ANSI: ansi,
}
}
type Theme struct {
Name string
// Base layers
Bg lipgloss.Color
Surface lipgloss.Color
Panel lipgloss.Color
Border lipgloss.Color
Bg lipgloss.TerminalColor
Surface lipgloss.TerminalColor
Panel lipgloss.TerminalColor
Border lipgloss.TerminalColor
// Text
Fg lipgloss.Color
Muted lipgloss.Color
Subtle lipgloss.Color
Fg lipgloss.TerminalColor
Muted lipgloss.TerminalColor
Subtle lipgloss.TerminalColor
// Semantic
Success lipgloss.Color
Warning lipgloss.Color
Stale lipgloss.Color
Danger lipgloss.Color
Info lipgloss.Color
Accent lipgloss.Color
Purple lipgloss.Color
Success lipgloss.TerminalColor
Warning lipgloss.TerminalColor
Stale lipgloss.TerminalColor
Danger lipgloss.TerminalColor
Info lipgloss.TerminalColor
Accent lipgloss.TerminalColor
Purple lipgloss.TerminalColor
// Table
ZebraBg lipgloss.Color
ZebraBg lipgloss.TerminalColor
// Selection
SelectedFg lipgloss.Color
SelectedBg lipgloss.Color
SelectedFg lipgloss.TerminalColor
SelectedBg lipgloss.TerminalColor
}
var themes = []Theme{
@@ -46,107 +54,107 @@ var themes = []Theme{
var themeFlexokiDark = Theme{
Name: "Flexoki Dark",
Bg: "#1C1B1A",
Surface: "#282726",
Panel: "#343331",
Border: "#575653",
Fg: "#CECDC3",
Muted: "#878580",
Subtle: "#6F6E69",
Success: "#879A39",
Warning: "#D0A215",
Stale: "#DA702C",
Danger: "#D14D41",
Info: "#4385BE",
Accent: "#3AA99F",
Purple: "#8B7EC8",
ZebraBg: "#222120",
SelectedFg: "#FFFCF0",
SelectedBg: "#403E3C",
Bg: cc("#1C1B1A", ""),
Surface: cc("#282726", ""),
Panel: cc("#343331", ""),
Border: cc("#575653", "8"),
Fg: cc("#CECDC3", "15"),
Muted: cc("#878580", "7"),
Subtle: cc("#6F6E69", "7"),
Success: cc("#879A39", "10"),
Warning: cc("#D0A215", "11"),
Stale: cc("#DA702C", "3"),
Danger: cc("#D14D41", "9"),
Info: cc("#4385BE", "12"),
Accent: cc("#3AA99F", "14"),
Purple: cc("#8B7EC8", "13"),
ZebraBg: cc("#222120", ""),
SelectedFg: cc("#FFFCF0", "15"),
SelectedBg: cc("#403E3C", "4"),
}
var themeTokyoNight = Theme{
Name: "Tokyo Night",
Bg: "#1a1b26",
Surface: "#24283b",
Panel: "#292e42",
Border: "#3b4261",
Fg: "#c0caf5",
Muted: "#a9b1d6",
Subtle: "#565f89",
Success: "#9ece6a",
Warning: "#e0af68",
Stale: "#ff9e64",
Danger: "#f7768e",
Info: "#7aa2f7",
Accent: "#7dcfff",
Purple: "#bb9af7",
ZebraBg: "#1c1d28",
SelectedFg: "#c0caf5",
SelectedBg: "#292e42",
Bg: cc("#1a1b26", ""),
Surface: cc("#24283b", ""),
Panel: cc("#292e42", ""),
Border: cc("#3b4261", "8"),
Fg: cc("#c0caf5", "15"),
Muted: cc("#a9b1d6", "7"),
Subtle: cc("#565f89", "7"),
Success: cc("#9ece6a", "10"),
Warning: cc("#e0af68", "11"),
Stale: cc("#ff9e64", "3"),
Danger: cc("#f7768e", "9"),
Info: cc("#7aa2f7", "12"),
Accent: cc("#7dcfff", "14"),
Purple: cc("#bb9af7", "13"),
ZebraBg: cc("#1c1d28", ""),
SelectedFg: cc("#c0caf5", "15"),
SelectedBg: cc("#292e42", "4"),
}
var themeGruvbox = Theme{
Name: "Gruvbox",
Bg: "#282828",
Surface: "#3c3836",
Panel: "#504945",
Border: "#665c54",
Fg: "#ebdbb2",
Muted: "#bdae93",
Subtle: "#7c6f64",
Success: "#b8bb26",
Warning: "#fabd2f",
Stale: "#fe8019",
Danger: "#fb4934",
Info: "#83a598",
Accent: "#8ec07c",
Purple: "#d3869b",
ZebraBg: "#2a2a2a",
SelectedFg: "#fbf1c7",
SelectedBg: "#504945",
Bg: cc("#282828", ""),
Surface: cc("#3c3836", ""),
Panel: cc("#504945", ""),
Border: cc("#665c54", "8"),
Fg: cc("#ebdbb2", "15"),
Muted: cc("#bdae93", "7"),
Subtle: cc("#7c6f64", "7"),
Success: cc("#b8bb26", "10"),
Warning: cc("#fabd2f", "11"),
Stale: cc("#fe8019", "3"),
Danger: cc("#fb4934", "9"),
Info: cc("#83a598", "12"),
Accent: cc("#8ec07c", "14"),
Purple: cc("#d3869b", "13"),
ZebraBg: cc("#2a2a2a", ""),
SelectedFg: cc("#fbf1c7", "15"),
SelectedBg: cc("#504945", "4"),
}
var themeCatppuccinMocha = Theme{
Name: "Catppuccin Mocha",
Bg: "#1e1e2e",
Surface: "#313244",
Panel: "#45475a",
Border: "#585b70",
Fg: "#cdd6f4",
Muted: "#a6adc8",
Subtle: "#6c7086",
Success: "#a6e3a1",
Warning: "#f9e2af",
Stale: "#fab387",
Danger: "#f38ba8",
Info: "#89b4fa",
Accent: "#94e2d5",
Purple: "#cba6f7",
ZebraBg: "#232334",
SelectedFg: "#cdd6f4",
SelectedBg: "#45475a",
Bg: cc("#1e1e2e", ""),
Surface: cc("#313244", ""),
Panel: cc("#45475a", ""),
Border: cc("#585b70", "8"),
Fg: cc("#cdd6f4", "15"),
Muted: cc("#a6adc8", "7"),
Subtle: cc("#6c7086", "7"),
Success: cc("#a6e3a1", "10"),
Warning: cc("#f9e2af", "11"),
Stale: cc("#fab387", "3"),
Danger: cc("#f38ba8", "9"),
Info: cc("#89b4fa", "12"),
Accent: cc("#94e2d5", "14"),
Purple: cc("#cba6f7", "13"),
ZebraBg: cc("#232334", ""),
SelectedFg: cc("#cdd6f4", "15"),
SelectedBg: cc("#45475a", "4"),
}
var themeNord = Theme{
Name: "Nord",
Bg: "#2e3440",
Surface: "#3b4252",
Panel: "#434c5e",
Border: "#4c566a",
Fg: "#d8dee9",
Muted: "#d8dee9",
Subtle: "#4c566a",
Success: "#a3be8c",
Warning: "#ebcb8b",
Stale: "#d08770",
Danger: "#bf616a",
Info: "#81a1c1",
Accent: "#88c0d0",
Purple: "#b48ead",
ZebraBg: "#323845",
SelectedFg: "#eceff4",
SelectedBg: "#434c5e",
Bg: cc("#2e3440", ""),
Surface: cc("#3b4252", ""),
Panel: cc("#434c5e", ""),
Border: cc("#4c566a", "8"),
Fg: cc("#d8dee9", "15"),
Muted: cc("#d8dee9", "7"),
Subtle: cc("#4c566a", "7"),
Success: cc("#a3be8c", "10"),
Warning: cc("#ebcb8b", "11"),
Stale: cc("#d08770", "3"),
Danger: cc("#bf616a", "9"),
Info: cc("#81a1c1", "12"),
Accent: cc("#88c0d0", "14"),
Purple: cc("#b48ead", "13"),
ZebraBg: cc("#323845", ""),
SelectedFg: cc("#eceff4", "15"),
SelectedBg: cc("#434c5e", "4"),
}
func (t Theme) HuhTheme() *huh.Theme {
+18 -12
View File
@@ -30,9 +30,9 @@ type styles struct {
activeTab lipgloss.Style
inactiveTab lipgloss.Style
sparkSuccess string
sparkWarning string
sparkDanger string
sparkSuccess lipgloss.TerminalColor
sparkWarning lipgloss.TerminalColor
sparkDanger lipgloss.TerminalColor
tableHeaderStyle lipgloss.Style
tableCellStyle lipgloss.Style
@@ -46,23 +46,23 @@ type styles struct {
func newStyles(t Theme) *styles {
return &styles{
subtleStyle: lipgloss.NewStyle().Foreground(t.Subtle),
subtleStyle: lipgloss.NewStyle().Foreground(t.Subtle).Faint(true),
specialStyle: lipgloss.NewStyle().Foreground(t.Success),
warnStyle: lipgloss.NewStyle().Foreground(t.Warning),
staleStyle: lipgloss.NewStyle().Foreground(t.Stale),
dangerStyle: lipgloss.NewStyle().Foreground(t.Danger),
warnStyle: lipgloss.NewStyle().Foreground(t.Warning).Bold(true),
staleStyle: lipgloss.NewStyle().Foreground(t.Stale).Faint(true),
dangerStyle: lipgloss.NewStyle().Foreground(t.Danger).Bold(true),
titleStyle: lipgloss.NewStyle().Foreground(t.Accent).Bold(true),
activeTab: lipgloss.NewStyle().Background(t.Surface).Foreground(t.Accent).Bold(true).Padding(0, 1),
inactiveTab: lipgloss.NewStyle().Padding(0, 1).Foreground(t.Muted),
inactiveTab: lipgloss.NewStyle().Padding(0, 1).Foreground(t.Muted).Faint(true),
sparkSuccess: string(t.Success),
sparkWarning: string(t.Warning),
sparkDanger: string(t.Danger),
sparkSuccess: t.Success,
sparkWarning: t.Warning,
sparkDanger: t.Danger,
tableHeaderStyle: lipgloss.NewStyle().Foreground(t.Accent).Bold(true).Padding(0, 1),
tableCellStyle: lipgloss.NewStyle().Padding(0, 1),
tableSelectedStyle: lipgloss.NewStyle().Padding(0, 1).Bold(true).Foreground(t.SelectedFg).Background(t.SelectedBg),
tableBorderStyle: lipgloss.NewStyle().Foreground(t.Border),
tableBorderStyle: lipgloss.NewStyle().Foreground(t.Border).Faint(true),
tableZebraStyle: lipgloss.NewStyle().Padding(0, 1).Background(t.ZebraBg),
siteGroupStyle: lipgloss.NewStyle().Padding(0, 1).Bold(true).Foreground(t.Accent),
@@ -80,6 +80,8 @@ const (
chromeFooter = 2 // footer: "\n" prefix + text line
chromeTable = 3 // renderTable "\n" prefix + top border + header + bottom border (lipgloss collapses two into three rendered lines)
chromeBase = chromePadV + chromeHeader + chromeGaps + chromeFooter + chromeTable
detailSparkWidth = 40
)
type sessionState int
@@ -103,6 +105,7 @@ type Model struct {
state sessionState
currentTab int
cursor int
selectedID int
tableOffset int
maxTableRows int
termWidth int
@@ -112,12 +115,15 @@ type Model struct {
huhForm *huh.Form
siteFormData *siteFormData
lastSiteType string
alertFormData *alertFormData
userFormData *userFormData
maintFormData *maintFormData
logViewport viewport.Model
logFilterImportant bool
logTotal int
logShown int
historyViewport viewport.Model
historyChanges []models.StateChange
+40 -30
View File
@@ -110,10 +110,6 @@ func (m *Model) handleConfirmDelete(msg tea.Msg) (tea.Model, tea.Cmd) {
}
func (m *Model) handleFormMsg(msg tea.Msg) (tea.Model, tea.Cmd) {
if wsm, ok := msg.(tea.WindowSizeMsg); ok {
m.termWidth = wsm.Width
m.termHeight = wsm.Height
}
if keyMsg, ok := msg.(tea.KeyMsg); ok {
if keyMsg.String() == "ctrl+c" {
return m, tea.Quit
@@ -132,6 +128,13 @@ func (m *Model) handleFormMsg(msg tea.Msg) (tea.Model, tea.Cmd) {
if f, ok := form.(*huh.Form); ok {
m.huhForm = f
}
if m.state == stateFormSite && m.siteFormData != nil &&
m.siteFormData.SiteType != m.lastSiteType {
rebuildCmd := m.rebuildSiteForm()
// Advance to Type select — user just changed it.
skipName := m.huhForm.NextField()
return m, tea.Batch(rebuildCmd, skipName)
}
if m.huhForm.State == huh.StateCompleted {
// The store write runs in the returned Cmd; its writeDoneMsg
// triggers the tab-data reload once the row actually exists.
@@ -145,24 +148,35 @@ func (m *Model) handleFormMsg(msg tea.Msg) (tea.Model, tea.Cmd) {
return m, nil
}
func (m *Model) handleResize(msg tea.WindowSizeMsg) (tea.Model, tea.Cmd) {
m.termWidth = msg.Width
m.termHeight = msg.Height
func (m *Model) recalcLayout() {
chrome := chromeBase
if m.filterMode || m.filterText != "" {
chrome++
}
m.maxTableRows = msg.Height - chrome
m.maxTableRows = m.termHeight - chrome
if m.maxTableRows < 1 {
m.maxTableRows = 1
}
}
func (m *Model) handleResize(msg tea.WindowSizeMsg) (tea.Model, tea.Cmd) {
m.termWidth = msg.Width
m.termHeight = msg.Height
m.recalcLayout()
m.logViewport.Width = msg.Width - chromePadH
m.logViewport.Height = msg.Height - (chromePadV + chromeHeader + chromeFooter + 2)
m.historyViewport.Width = msg.Width - chromePadH
m.historyViewport.Height = msg.Height - 10
m.slaViewport.Width = msg.Width - chromePadH
m.slaViewport.Height = msg.Height - 16
return m, tea.ClearScreen
if m.huhForm != nil {
formHeight := msg.Height - 7
if formHeight < 5 {
formHeight = 5
}
m.huhForm.WithHeight(formHeight)
}
return m, nil
}
func (m *Model) handleTick(t time.Time) (tea.Model, tea.Cmd) {
@@ -286,6 +300,7 @@ func (m *Model) handleMouse(msg tea.MouseMsg) (tea.Model, tea.Cmd) {
}
}
}
m.syncSelectedID()
return m, nil
}
@@ -323,9 +338,11 @@ func (m *Model) handleFilterKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
m.filterText = ""
m.cursor = 0
m.tableOffset = 0
m.recalcLayout()
m.refreshLive()
case "enter":
m.filterMode = false
m.recalcLayout()
case "backspace":
if len(m.filterText) > 0 {
m.filterText = m.filterText[:len(m.filterText)-1]
@@ -336,8 +353,8 @@ func (m *Model) handleFilterKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
case "ctrl+c":
return m, tea.Quit
default:
if len(msg.String()) == 1 {
m.filterText += msg.String()
if len(msg.Runes) == 1 {
m.filterText += string(msg.Runes)
m.cursor = 0
m.tableOffset = 0
m.refreshLive()
@@ -379,7 +396,7 @@ func (m *Model) handleDetailKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
return m, m.openSLAView(m.sites[m.cursor])
}
case "q":
return m, tea.Quit
m.state = stateDashboard
}
return m, nil
}
@@ -391,16 +408,14 @@ func (m *Model) handleSparklineClick(msg tea.MouseMsg) (tea.Model, tea.Cmd) {
site := m.sites[m.cursor]
hist, _ := m.engine.GetHistory(site.ID)
const sparkWidth = 40
if zi := m.zones.Get("spark-latency"); zi != nil && !zi.IsZero() && zi.InBounds(msg) {
x, _ := zi.Pos(msg)
m.sparkTooltipIdx = resolveSparklineIndex(x, sparkWidth, len(hist.Latencies))
m.sparkTooltipIdx = resolveSparklineIndex(x, detailSparkWidth, len(hist.Latencies))
return m, nil
}
if zi := m.zones.Get("spark-heartbeat"); zi != nil && !zi.IsZero() && zi.InBounds(msg) {
x, _ := zi.Pos(msg)
m.sparkTooltipIdx = resolveSparklineIndex(x, sparkWidth, len(hist.Statuses))
m.sparkTooltipIdx = resolveSparklineIndex(x, detailSparkWidth, len(hist.Statuses))
return m, nil
}
@@ -499,10 +514,8 @@ func (m *Model) handleHistoryKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
func (m *Model) handleAlertDetailKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
switch msg.String() {
case "i", "esc":
case "q", "i", "esc":
m.state = stateDashboard
case "q":
return m, tea.Quit
}
return m, nil
}
@@ -515,11 +528,13 @@ func (m *Model) handleDashboardKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
case "/":
if m.currentTab == 0 {
m.filterMode = true
m.recalcLayout()
return m, nil
}
case "f":
if m.state == stateLogs {
m.logFilterImportant = !m.logFilterImportant
m.refreshLogContent()
return m, nil
}
case "tab":
@@ -537,6 +552,7 @@ func (m *Model) handleDashboardKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
if m.cursor < m.tableOffset {
m.tableOffset = m.cursor
}
m.syncSelectedID()
}
case "down", "j":
if m.state == stateLogs {
@@ -548,6 +564,7 @@ func (m *Model) handleDashboardKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
if m.cursor >= m.tableOffset+m.maxTableRows {
m.tableOffset++
}
m.syncSelectedID()
}
}
case "n":
@@ -610,7 +627,7 @@ func (m *Model) handleDashboardKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
return m, writeCmd("Save theme", func() error {
return st.SetPreference(context.Background(), "theme", name)
})
case "d", "backspace":
case "d":
return m.handleDeleteItem()
}
return m, nil
@@ -717,6 +734,7 @@ func (m *Model) handleClick(msg tea.MouseMsg) (tea.Model, tea.Cmd) {
for i := m.tableOffset; i < end; i++ {
if m.zones.Get(fmt.Sprintf("%s-%d", prefix, i)).InBounds(msg) {
m.cursor = i
m.syncSelectedID()
return m, nil
}
}
@@ -745,16 +763,8 @@ func (m *Model) switchTab(idx int) {
}
}
func (m *Model) adjustCursor(newLen int) {
if m.cursor >= newLen && m.cursor > 0 {
m.cursor--
}
if m.cursor < m.tableOffset {
m.tableOffset = m.cursor
if m.tableOffset < 0 {
m.tableOffset = 0
}
}
func (m *Model) adjustCursor(_ int) {
m.clampCursor()
}
func (m *Model) submitForm() tea.Cmd {
+5 -5
View File
@@ -116,7 +116,7 @@ func (*stubErr) Error() string { return "boom" }
func TestDetailLoad_CachesAndViewDoesNoIO(t *testing.T) {
ms := &tuiMockStore{stateChanges: []models.StateChange{{FromStatus: "UP", ToStatus: "DOWN"}}}
m := newTestModel(ms)
m.sites = []models.Site{{ID: 1, Name: "site", Status: "DOWN"}}
m.sites = []models.Site{{SiteConfig: models.SiteConfig{ID: 1, Name: "site"}, SiteState: models.SiteState{Status: "DOWN"}}}
m.cursor = 0
m.state = stateDetail
m.termWidth = 120
@@ -201,7 +201,7 @@ func TestHandleTabData_DropsStaleSeq(t *testing.T) {
func TestHistoryKey_LoadsOffUIGoroutine(t *testing.T) {
ms := &tuiMockStore{stateChanges: []models.StateChange{{FromStatus: "UP", ToStatus: "DOWN"}}}
m := newTestModel(ms)
m.sites = []models.Site{{ID: 7, Name: "site"}}
m.sites = []models.Site{{SiteConfig: models.SiteConfig{ID: 7, Name: "site"}}}
m.state = stateDetail
m.termWidth, m.termHeight = 120, 40
@@ -240,7 +240,7 @@ func TestHistoryKey_LoadsOffUIGoroutine(t *testing.T) {
func TestSLAData_DropsStaleReply(t *testing.T) {
m := newTestModel(&tuiMockStore{})
m.termWidth, m.termHeight = 120, 40
m.sites = []models.Site{{ID: 3, Status: "UP"}}
m.sites = []models.Site{{SiteConfig: models.SiteConfig{ID: 3}, SiteState: models.SiteState{Status: "UP"}}}
if cmd := (&m).openSLAView(m.sites[0]); cmd == nil {
t.Fatal("openSLAView should return a load Cmd")
@@ -264,7 +264,7 @@ func TestSLAData_DropsStaleReply(t *testing.T) {
func TestConfirmDelete_WritesOffUIGoroutine(t *testing.T) {
ms := &tuiMockStore{}
m := newTestModel(ms)
m.sites = []models.Site{{ID: 4, Name: "s"}}
m.sites = []models.Site{{SiteConfig: models.SiteConfig{ID: 4, Name: "s"}}}
m.state = stateConfirmDelete
m.deleteTab = 0
m.deleteID = 4
@@ -312,7 +312,7 @@ func TestWriteDoneMsg_LogsErrorAndReloads(t *testing.T) {
func TestDetailRefreshCmd_OnlyWhileDetailOpen(t *testing.T) {
ms := &tuiMockStore{stateChanges: []models.StateChange{{FromStatus: "UP", ToStatus: "DOWN"}}}
m := newTestModel(ms)
m.sites = []models.Site{{ID: 5, Name: "site"}}
m.sites = []models.Site{{SiteConfig: models.SiteConfig{ID: 5, Name: "site"}}}
m.state = stateDashboard
if (&m).detailRefreshCmd() != nil {
+1 -6
View File
@@ -85,11 +85,6 @@ func (m Model) View() string {
case stateFormMaint:
title = "New Maintenance Window"
}
formHeight := m.termHeight - 7
if formHeight < 5 {
formHeight = 5
}
m.huhForm.WithHeight(formHeight)
header := m.st.titleStyle.Render(title)
footer := m.st.subtleStyle.Render("\n[Esc] Cancel")
return lipgloss.NewStyle().Padding(1, 2).Render(header + "\n\n" + m.huhForm.View() + "\n" + footer)
@@ -270,7 +265,7 @@ func (m Model) renderFooter(stats dashboardStats) string {
var keys string
switch m.currentTab {
case 0:
keys = "[/]Filter [n]New [e]Edit [i]Info [d]Del [p]Pause [T]Theme [Tab]Switch [q]Quit"
keys = "[/]Filter [n]New [e]Edit [i]Info [d]Del [p]Pause [Space]Collapse [T]Theme [Tab]Switch [q]Quit"
case 1:
keys = "[n]New [e]Edit [i]Info [d]Del [t]Test [T]Theme [Tab]Switch [q]Quit"
case 2:
+12 -6
View File
@@ -2,6 +2,7 @@ package tui
import (
"fmt"
"sort"
"strconv"
"strings"
"time"
@@ -163,8 +164,14 @@ func (m Model) viewDetailPanel() string {
probeResults := m.engine.GetProbeResults(site.ID)
if len(probeResults) > 0 {
nodeIDs := make([]string, 0, len(probeResults))
for id := range probeResults {
nodeIDs = append(nodeIDs, id)
}
sort.Strings(nodeIDs)
b.WriteString("\n" + m.st.subtleStyle.Render(" PROBE RESULTS") + "\n")
for nodeID, result := range probeResults {
for _, nodeID := range nodeIDs {
result := probeResults[nodeID]
status := m.st.specialStyle.Render("UP")
if !result.IsUp {
status = m.st.dangerStyle.Render("DN")
@@ -207,9 +214,8 @@ func (m Model) viewDetailPanel() string {
}
b.WriteString(m.divider() + "\n")
const sparkWidth = 40
if site.Type == "push" {
b.WriteString(" " + m.zones.Mark("spark-heartbeat", m.heartbeatSparkline(hist.Statuses, sparkWidth, "")))
b.WriteString(" " + m.zones.Mark("spark-heartbeat", m.heartbeatSparkline(hist.Statuses, detailSparkWidth, nil)))
if len(hist.Statuses) > 0 {
up := 0
for _, s := range hist.Statuses {
@@ -222,7 +228,7 @@ func (m Model) viewDetailPanel() string {
up, len(hist.Statuses))
}
} else {
b.WriteString(" " + m.zones.Mark("spark-latency", m.latencySparkline(hist.Latencies, hist.Statuses, sparkWidth, "")))
b.WriteString(" " + m.zones.Mark("spark-latency", m.latencySparkline(hist.Latencies, hist.Statuses, detailSparkWidth, nil)))
var minL, maxL, total time.Duration
count := 0
for i, l := range hist.Latencies {
@@ -249,12 +255,12 @@ func (m Model) viewDetailPanel() string {
}
if m.sparkTooltipIdx >= 0 {
b.WriteString("\n" + m.renderSparkTooltip(site, hist, sparkWidth))
b.WriteString("\n" + m.renderSparkTooltip(site, hist, detailSparkWidth))
}
b.WriteString("\n")
b.WriteString(m.divider() + "\n")
b.WriteString(m.st.subtleStyle.Render(" [i/Esc] Back [e] Edit [h] History [s] SLA [click] Inspect [q] Quit"))
b.WriteString(m.st.subtleStyle.Render(" [q/Esc] Back [e] Edit [h] History [s] SLA [click] Inspect"))
return lipgloss.NewStyle().Padding(1, 2).Render(b.String())
}