Replaces the stale CalVer changelog (last updated 2026-06-02, all
referenced tags since deleted) with git-cliff output simulated against
the v0.1.0 tag — matches what release-binaries.yml will publish as
release notes.
The body=".*security" parser ran before the docs/chore skip rules, so
any commit merely mentioning security in its body landed in the
Security section (e.g. a docs commit and the CalVer->SemVer ci commit).
Real security fixes never reached it — ^fix matches first — so the rule
only ever produced miscategorized entries. Removed.
polish(...) commits had no parser, and git-cliff defaults the group to
the raw type — rendering a stray lowercase "polish" section. Mapped to
Changed.
- go run cmd/uptop/main.go broke when PR #108 split config.go out of
main.go — use ./cmd/uptop package path (same fix#104 applied to
Dockerfile and goreleaser)
- binary section claimed Linux amd64 only; goreleaser ships
linux/darwin/windows x amd64/arm64 + deb/rpm since #104
- state canonical repo (Gitea) vs mirror (GitHub) up front
- UPTOP_ADMIN_KEY is the only documented first-run path
go install module@tag compiles without GoReleaser's ldflags, so
--version reported "dev" and the TUI footer matched. The module
version and vcs stamps are embedded in every binary; read them via
debug.ReadBuildInfo when the ldflags defaults are untouched. Release
builds unchanged — ldflags still win.
ignore_tags drops rc-tagged commits from the final tag's section instead
of folding them forward — a simulated v0.1.0 rendered zero commits.
Excluding rc tags from tag_pattern makes finals span back to the last
real tag (full history for v0.1.0, verified 8.8KB in a scratch clone)
and rc tags render [Unreleased] with everything pending.
rc.2 proved the grype gate was decorative — buildx pushed before the
scan ran, so a red run still shipped the image (and rc tags moved
:latest). Build amd64 locally, scan that, then run the multi-arch push
from the warm builder cache. :latest now only moves on non-rc tags.
mirror-release: poll until the Gitea asset count is stable across two
polls (GoReleaser uploads sequentially — assets>0 could mirror a partial
set) and stretch the timeout to 20 min since the release run can queue
behind the Docker job on the single runner.
The existing .grype.yaml ignore listed the wish SCP traversal only by CVE
id; grype's db now matches it as GHSA-xjvp-7243-rg9h and ignores are
exact-id, so the rc.2 scan gate tripped on an already-triaged finding.
List both ids. Vulnerable SCP middleware is never compiled in; real fix
is the charm v2 stack migration (#126).
cliff.toml ignore_tags folds rc tags into the next real release so
v0.1.0's notes cover full history instead of commits-since-rc.2.
Four defects from the rc.1 dress rehearsal:
- Dockerfile pinned golang:1.26-alpine3.23 at a 1.26.3 digest while
go.mod requires 1.26.4; golang images set GOTOOLCHAIN=local, so the
build hard-fails. Pin 1.26.4-alpine3.23 explicitly.
- changelog.disable swallowed --release-notes (the flag is consumed by
the changelog pipe), publishing empty release bodies. Re-enable.
- Remove the Gitea-side GitHub relay step: redundant with
.github/workflows/mirror-release.yml, which runs on GitHub Actions
with the built-in token and copies the canonical Gitea assets.
- mirror-release.yml: jq '.body // empty' treats "" as truthy so the
notes fallback never fired; use select(). Mark rc tags --prerelease.
- Docker compose: ping_group_range sysctl, without which ping monitors
silently report DOWN in containers
- README: data retention table (1000 checks / 5000 state changes per
monitor, 200 logs, pruned automatically), group-alert limitation note
- config-as-code: apply is not atomic + re-run convergence, backup
redaction footgun (/api/backup/export redacts by default), opsgenie
example (provider count was stale at 9), ntfy auth keys
Relay is on hold until after the rc dress rehearsal — without the
secret the step exits cleanly instead of failing the release run.
Adding the secret later enables it with no workflow change.
viewLogsTab filtered logViewport.View() — the visible window — so the
entry count showed the window size and hidden lines reappeared while
scrolling. Filter and render now happen at content-set time from
engine.GetLogs(); the view only reads stored counts.
GoReleaser publishes to exactly one SCM (Gitea); the push mirror carries
refs but not releases, so GitHub Releases — where the README points —
stayed empty. After the Gitea release, wait for the mirrored tag and
create the GitHub release with the same artifacts and notes.
Needs new Gitea secret GH_MIRROR_TOKEN (GitHub PAT with repo scope).
GITHUB_TOKEN is reserved by Gitea Actions, hence the different name.
Last upkeep-era name in the wire protocol. Breaking for mixed-version
clusters, but zero installed base exists pre-v0.1.0 — free now, breaking
forever after first tag.
No leader fencing exists — during a network partition both nodes run
checks and fire alerts independently. Document the behavior honestly:
duplicate alerts, doubled history, ~15s takeover, converges on heal.
smtp.SendMail ignores context entirely — a blackholed SMTP server
hangs the alert goroutine for the OS TCP timeout (minutes), while the
30s context from the engine does nothing.
Replace with sendMailContext: dials with ctx deadline, sets connection
deadlines, handles STARTTLS and AUTH when advertised. Behavioral
parity with smtp.SendMail but cancellation works throughout.
Cluster-secret holder could POST a backup with their own admin key to
/api/backup/import, replacing all users — privilege escalation from
cluster-auth to admin. Also, Kuma imports produced zero users but
ImportWipe unconditionally deleted the users table — locking out all
accounts until restart reseeded UPTOP_ADMIN_KEY.
- Server handlers strip data.Users (set nil) before calling ImportData
- ImportData only wipes+replaces users when data.Users != nil
- New ImportWipeUsers dialect method separates user wipe from data wipe
- CLI restore (main.go) unchanged — full import still replaces users
Pre-check resolved and validated the target IP, then runPingCheck and
runPortCheck re-resolved by hostname — a DNS rebind between the two
lookups could redirect to a private IP, bypassing the SSRF guard.
Resolve once in RunCheck, pin the validated IP, and pass it down:
- runPingCheck: SetIPAddr with pinned IP (skips internal resolve)
- runPortCheck: dial pinned IP literal instead of hostname
HTTP checks are unaffected (SafeDialContext resolves+validates at
dial time). DNS checks validate the server address, not the target.
Go module tooling requires v-prefixed semver tags (go install @latest
ignores CalVer tags entirely), GoReleaser errors on non-semver tags,
and zero-padded CalVer months are invalid semver. Old CalVer tags and
releases were deleted due to pre-release security issues; relaunch
tags as v1.0.0.
- Workflow tag triggers: [0-9]* -> v[0-9]* (Gitea + GitHub relay)
- cliff.toml tag_pattern: regex v[0-9].* (was matching everything --
tag_pattern is regex since git-cliff 1.4, not glob)
- Docker image tags drop the v prefix per registry convention
Bare-metal installs created the DB with process umask (often 022),
making uptop.db, -wal, and -shm world-readable. These files contain
alert credentials and config. Now chmod 0600 after open. Missing
WAL/SHM siblings (not yet created) are silently skipped. Docker
installs were already mitigated by the non-root UID.
1. UpdateSite handles token-read Scan error instead of ignoring it.
sql.ErrNoRows (nonexistent site) passes through; real DB errors
surface.
2. RunCheck allowPrivate changed from variadic to real bool param.
Dead maxRequestBody duplicate removed from sqlstore.go.
3. Footer help bar documents [Space] for group collapse.
4. adjustCursor unified with clampCursor — one clamping path
instead of two with different semantics.
5. Compose cluster/probe example files annotate hardcoded secrets
with "EXAMPLE ONLY — rotate before use".
6. huhForm.WithHeight moved from View() to handleResize — no longer
mutates form state during render.
7. maxTableRows recalculated on filter enter/exit via recalcLayout()
— was only recalculated on resize, causing off-by-one when the
filter bar appeared/disappeared.
1. Rate limiter cleanup goroutine now stoppable via Stop() channel
instead of looping forever. Prevents goroutine leak in tests.
2. Dead WindowSizeMsg branch in handleFormMsg removed — top-level
Update handles resize before forms see it.
3. Probe results sorted by node ID — map iteration no longer
reorders rows every render.
4. fmtAlertConfig takes models.AlertConfig directly instead of an
anonymous struct the caller builds inline.
5. Backspace no longer aliases delete — d is the documented key.
Prevents accidental delete-confirm on habitual backspace.
6. SLA daily buckets use time.Date day arithmetic instead of
Add(-i*24h) — lands on midnight correctly across DST transitions.
1. Kuma import now maps push monitor tokens (generates crypto/rand
token) and paused state (Active=false → Paused=true). Previously
push monitors imported with empty token sat DOWN forever, and
paused Kuma monitors came in unpaused and started alerting.
2. Dockerfile adds HEALTHCHECK against /api/health on port 8080.
Container orchestrators can now detect unhealthy instances.
3. migrate-secrets sets the encryptor before loading alerts, so
already-encrypted settings are decrypted correctly on second run
instead of failing with a JSON unmarshal error.
4. docker-compose.yml adds container hardening: read_only filesystem,
cap_drop ALL, no-new-privileges, tmpfs for /tmp.
1. Delete braille.go + braille_test.go — dead code, only referenced
by its own test. Can be re-added when latency charts are built.
2. Hoist duplicate `const sparkWidth = 40` (update.go + view_detail.go)
to package-level `detailSparkWidth`. Click-index resolution and
rendering now share one constant.
3. Remove tea.ClearScreen on every resize — caused full-screen flash
during continuous resizes. ctrl+l manual clear kept.
1. Poll loop now fully converges with the DB: updated site configs
are refreshed via UpdateSiteConfig, and sites removed from the DB
are evicted from liveState. Previously the loop only added new
sites — config edits via apply were ignored until restart, and
pruned sites kept being checked and alerting.
2. Push monitors now record check history on each heartbeat via
recordCheck. Previously RecordHeartbeat updated state but never
wrote to check_history — push uptime % and sparklines were empty.
3. Groups record a synthetic check per evaluation tick so they get
uptime history and sparklines instead of blank displays.
Cursor tracked by site ID instead of positional index. When the
list re-sorts every tick (sites change status), the selection stays
on the same monitor instead of silently jumping to whatever now
occupies that index position.
q now means "back" in detail, history, SLA, and alert-detail views
— consistent with muscle memory from navigating deeper views.
Only the dashboard q quits the app. ctrl+c always quits from
anywhere.
1. SSRF guard now blocks 0.0.0.0/8 (routes to localhost on Linux)
and 100.64.0.0/10 (CGNAT). Also rejects unspecified, multicast,
and loopback IPs via net.IP methods for defense in depth.
2. DNS monitor type no longer bypasses SSRF guard. The DNSServer
address is resolved and validated against isPrivateIP before use.
Port restricted to 53 — prevents arbitrary internal port probing
via crafted DNSServer values.
3. /metrics now default-deny when MetricsPublic is false, regardless
of whether UPTOP_CLUSTER_SECRET is set. Previously, no secret =
no auth check = metrics exposed to everyone.
1. Alertless monitors no longer spam error logs — triggerAlert
returns early when alertID <= 0.
2. HTTP response body drained before close — enables connection
reuse via keep-alive instead of fresh TCP+TLS per check.
3. /api/backup/export enforces GET — was the only endpoint
accepting any HTTP method.
4. limitStr guards against max < 3 — prevents negative slice
index panic on very narrow terminals.
5. Filter input accepts multibyte characters — len(msg.Runes)
instead of len(msg.String()) for proper Unicode support.
6. Startup warning corrected — with no UPTOP_CLUSTER_SECRET,
endpoints reject (401), not accept. Warning now says so.
7. UPTOP_KEYS file open failure logged — was silently swallowed,
leaving operators with no admin seeded and no message.
Replace three uncoordinated logging systems (log.Printf, fmt.Fprintf
to stderr, fmt.Println warnings) with structured slog calls.
68 log calls migrated:
- log.Printf → slog.Error/Warn/Info (45 calls across 5 files)
- fmt.Fprintf(os.Stderr) → slog.Error (23 calls in main.go)
Kept unchanged:
- fmt.Println/Printf for CLI user output (version, banners, import results)
- engine.AddLog for TUI-visible ring buffer (monitoring events)
Store migration diagnostics demoted to slog.Debug (silent at default
info level). HTTP request logging now structured with method/path/
status/duration/ip attributes.
Site now embeds SiteConfig (22 persistent fields) and SiteState
(11 ephemeral runtime fields). Field access unchanged via promotion
— site.Name and site.Status still work.
Store layer deals exclusively in SiteConfig — the DB never sees
runtime state. Engine's liveState keeps full Site composites.
UpdateSiteConfig reduced from 11-line field-by-field copy to
`existing.SiteConfig = cfg`.
RunCheck takes SiteConfig (only needs config fields). Checker is
now statically prevented from reading/writing runtime state.
Backup.Sites changed to []SiteConfig — exports no longer carry
zero-valued runtime fields. Import backward-compatible (json
ignores unknown fields).
Replace the 328-line Start() god function with a Server struct +
11 named handler methods. Routes registered in routes(), middleware
applied in one place.
Start() kept as a convenience wrapper (NewServer + Start) so
existing callers don't need to change unless they want the Server
reference.
Each handler is now independently readable and testable without
parsing a 300-line closure nest.
New cmd/uptop/config.go with appConfig struct + parseConfig() that
reads all 25 UPTOP_* env vars in one place with defaults. Replaces
~120 lines of scattered os.Getenv calls in runServe.
runServe now reads cfg := parseConfig() up front. ServerConfig
built via cfg.serverConfig(). Uniform flag > env > default
precedence for port/db-type/dsn via flag defaults from config.
New internal/store/storetest/mock.go provides BaseMock implementing
the full Store interface with no-op defaults and optional Func field
overrides. Each test file embeds BaseMock and shadows only the methods
it needs.
Removes ~400 lines of duplicated stub methods across 6 test files.
Adding a Store method now requires one addition (BaseMock) instead
of editing 6 files.
Replace the error-string-matching migration runner with a proper
schema_version table. Migrations are now numbered and recorded;
only unapplied versions run. Fresh databases seed at baseline
version (CREATE TABLE already includes all columns).
CREATE TABLE statements updated to include regions (sites) and
node_id (check_history) — previously only added via ALTER.
DeleteAlert now nulls sites.alert_id before deleting, preventing
dangling references that caused every incident to hit the error
path instead of alerting.
Replace ~150 bare status string comparisons with typed models.Status
constants (StatusUp, StatusDown, StatusPending, StatusLate, StatusStale,
StatusSSLExp). Single IsBroken() method replaces the duplicated
isBroken lambda in monitor.go and isDown function in sla.go.
Adding a new status value (e.g. DEGRADED) now requires one constant
definition instead of grep-and-pray across 16 files.
CheckResult.Status stays string — the checker is the boundary between
raw protocol results and typed status. Cast happens at the edge in
handleStatusChange.
All 8 TIMESTAMP columns in Postgres CREATE TABLE statements changed to
TIMESTAMPTZ. Migration ALTER TYPE statements added for existing databases
(converts assuming stored values are UTC).
Prevents timezone-shifted instants on non-UTC Postgres servers, which
would skew SLA math and maintenance-window checks. SQLite unaffected —
DATETIME is typeless.
Every Store interface method (except Close) now takes context.Context
as first parameter. All 54 db.Query/Exec/QueryRow calls in SQLStore
replaced with their *Context variants. DB operations now respect
cancellation and deadlines.
Context sources by caller:
- Engine dbWriter/poll/pruner: engine ctx from Start()
- HTTP handlers: r.Context()
- config.Apply/Export: caller-provided ctx
- TUI/main.go init: context.Background()
RunCheck and all sub-checks (HTTP/ping/port/DNS) accept parent ctx.
HTTP checks now inherit shutdown cancellation instead of rooting in
context.Background(). dbWrite.exec takes ctx so the writer goroutine
can cancel stuck DB operations.
DeleteSite/ImportData use BeginTx(ctx) instead of Begin().
1. Group auto-pause trap: remove the one-way Paused=true mutation
from checkGroup — monitorRoutine skipped paused groups, so they
could never re-evaluate or auto-unpause.
2. Retry logic: apply MaxRetries to all →DOWN transitions, not just
UP→DOWN. New monitors (PENDING) no longer alert on first transient
failure when retries are configured.
3. Shutdown drain hole: track checker goroutines with checkerWG so
Stop() waits for in-flight checks before draining the write queue.
Final drainWrites() catches any writes enqueued after the writer's
own drain.
4. Probe-ingest writer bypass: route SaveCheckFromNode through the
engine's serialized dbWriter instead of writing directly to the
store from the HTTP handler.
5. Dead-probe expiry: expire stale probe results (>3× site interval)
before aggregation so a dead probe can't poison status forever.
Also clean probeResults in RemoveSite.
6. Maintenance-cache N+1: replace per-check DB query with a
fully-resolved in-memory cache refreshed every poll cycle. One
GetActiveMaintenanceWindows() call instead of N IsMonitorInMaintenance.
ImportData now wipes check_history, state_changes, and alert_health
so re-inserted IDs don't inherit stale history from prior occupants.
Un-neuter grype CVE gate (was || echo, now fails on critical).
Add .grype.yaml with ignore for CVE-2026-41589 (wish SCP —
unreachable, we only import wish/bubbletea).
Pin: grype v0.114.0, git-cliff v2.13.1, govulncheck v1.1.4.
Tag `latest` only on tag push, not workflow_dispatch.
Build path ./cmd/uptop (survives a main.go split).
Add dist/ and uptop to .dockerignore.
keyCache.Invalidate existed but had zero callers, and refresh silently
swallowed store errors — a revoked key kept working off the stale
cache for as long as the DB stayed down.
Invalidate now clears the key set (not just the timestamp) and is
wired through userInvalidatingStore, a decorator at the composition
root that drops the cache on AddUser/UpdateUser/DeleteUser/ImportData.
Transient refresh errors still retain the previous key set so a DB
blip can't lock every admin out, but a post-revocation refresh failure
denies. Refresh errors are logged. First tests for the SSH auth gate.
Also suppresses per-request HTTP logging when the local TUI owns the
terminal — request logs scribbled over the alt screen.
The handler serialized raw models.Site — LastError internals,
Hostname, Port, DNSServer, AlertID, intervals all public, and every
future Site field public the day it's added. statusSite now exposes
exactly what the status page renders: Name, Type, URL, Status, Paused,
LastCheck, Latency.
Replaces the vacuous TestStatusJSON_TokensStripped, which injected via
UpdateSiteConfig (a no-op for unknown IDs) and asserted over zero
sites. The new test seeds the store, starts the engine, waits for live
state, and asserts internal fields are absent from the raw JSON.
The alert detail panel dumped a.Settings raw — SMTP passwords, bot
tokens, API keys on screen and into any recording or screen share. The
table view leaked the PagerDuty routing key, Pushover user key, and
full discord/slack/webhook URLs (the URL path is the credential).
The redaction allowlist moves from internal/server to
models.RedactAlertSettings so the backup export and the TUI render
through one policy. Panel keys are sorted so rows stop reshuffling
every tick; webhook URLs show scheme+host only; keys show
first4…last4.
Deletes, pause toggles, maintenance end, theme/collapse prefs, and all
four form submits wrote to the store synchronously on the UI goroutine;
with busy_timeout=5000 a contended DB froze input for up to 5s.
Writes now run through a writeCmd helper returning writeDoneMsg. The
in-memory engine/model mutations stay in Update so rows react
instantly; the reply logs failures and reloads tab data, so the UI
converges on what was actually written. Closures capture snapshotted
values only — never the model.
The #101 refactor stopped at the tick path; 'h' history and the SLA
view still queried state changes synchronously in Update, freezing the
UI for up to busy_timeout on a contended DB. Both now load through
Cmds with loading placeholders.
Also closes the remaining staleness holes in the async data flow:
- tabDataMsg carries a sequence number; out-of-order replies from
slower earlier loads are dropped instead of overwriting newer data
- history/SLA replies are dropped when the user has navigated to a
different site or period
- the open detail panel refreshes on the tab-data cadence instead of
loading once on entry and going stale
- initSiteHuhForm reads the m.alerts cache instead of hitting the store
applyTheme mutated ~18 package-global lipgloss styles while every SSH
session's tea.Program read them concurrently from its own goroutine.
Pressing T or opening a new connection raced other sessions' View and
bled themes across users.
Styles now live in an immutable per-Model struct built by newStyles;
free formatter helpers that consumed the globals became Model methods.
The TUI ran database queries on the UI goroutine: handleTick called
refreshData every second, which issued four blocking SQLite queries
(GetAllAlerts/GetAllUsers/GetAllNodes/GetAllMaintenanceWindows) and
swallowed their errors; viewDetailPanel ran GetStateChanges — a DB query
— inside View(), on every render (tick, keypress, mouse). A slow disk
stalled input and animation.
Split refreshData into refreshLive() (in-memory engine copies only —
sites + logs — safe every tick) and loadTabDataCmd(), a tea.Cmd that
loads the four DB-backed tables off the UI goroutine and returns a
tabDataMsg. handleTick now refreshes live state every tick but dispatches
the tab-data load only when older than tabRefreshTTL (5s), so tab-bar
counts stay fresh without a per-second query storm. Errors surface to the
log instead of being dropped, and a transient failure keeps the previous
data rather than blanking the view.
The detail panel's state-change history is loaded once on enter via
loadDetailCmd and cached on the model; viewDetailPanel reads the cache,
so View no longer touches the database. Init kicks an initial load so the
dashboard isn't empty on the first frame, and the bare time.Time tick
message is now a named tickMsg (no cross-message collision). The
test-alert handler's raw goroutine becomes a tea.Cmd.
Adds the package's first Update()-driven tests: tab-data load + apply,
error-keeps-previous-data, detail cache with a store-hit counter proving
View does zero IO across repeated renders, and the handleTick throttle.
Full suite green under -race; golangci-lint clean.
Four fixes hardening the secrets and rate-limit posture a prior audit
left or that regressed:
X-Forwarded-For rate-limit bypass + memory DoS (ratelimit.go): clientIP
returned the raw XFF header, so an attacker rotating it minted unlimited
distinct limiter keys — never tripping the limit and growing the visitors
map without bound. XFF is now honored only when the immediate peer is a
configured trusted proxy (UPTOP_TRUSTED_PROXIES, CIDRs or bare IPs), using
the right-most non-trusted hop; otherwise the key is the real RemoteAddr.
The visitors map is bounded with LRU eviction as defense in depth.
Export redaction denylist -> per-provider allowlist (server.go): the old
six-key denylist missed the actual credentials — the webhook URL for
discord/slack/webhook/ntfy/gotify and api_key for opsgenie — exporting
them in the clear. redactByProvider keeps only known-safe keys per
provider type and redacts everything else, so unknown/new keys fail safe.
ImportData plaintext secrets (sqlstore.go): import inserted raw
json.Marshal(settings), bypassing the encryption AddAlert/UpdateAlert
use. It now routes through marshalSettings, so a restore with
UPTOP_ENCRYPTION_KEY set stores enc:-prefixed ciphertext, not plaintext.
Alert error credential leak (alert.go): provider Send returned the raw
*url.Error, whose URL carries the secret (Telegram bot token in the path,
webhook secrets in the URL); it was persisted to AlertHealth.LastError
and shown in the TUI. sanitizeError strips the URL, keeping the operation
and underlying cause.
Tests cover trusted/untrusted XFF + spoofed-bypass + map bound, the
allowlist per provider, encrypted-on-import round-trip, and URL-stripped
errors. README documents UPTOP_TRUSTED_PROXIES. Full suite green under
-race; golangci-lint clean.