The TUI ran database queries on the UI goroutine: handleTick called
refreshData every second, which issued four blocking SQLite queries
(GetAllAlerts/GetAllUsers/GetAllNodes/GetAllMaintenanceWindows) and
swallowed their errors; viewDetailPanel ran GetStateChanges — a DB query
— inside View(), on every render (tick, keypress, mouse). A slow disk
stalled input and animation.
Split refreshData into refreshLive() (in-memory engine copies only —
sites + logs — safe every tick) and loadTabDataCmd(), a tea.Cmd that
loads the four DB-backed tables off the UI goroutine and returns a
tabDataMsg. handleTick now refreshes live state every tick but dispatches
the tab-data load only when older than tabRefreshTTL (5s), so tab-bar
counts stay fresh without a per-second query storm. Errors surface to the
log instead of being dropped, and a transient failure keeps the previous
data rather than blanking the view.
The detail panel's state-change history is loaded once on enter via
loadDetailCmd and cached on the model; viewDetailPanel reads the cache,
so View no longer touches the database. Init kicks an initial load so the
dashboard isn't empty on the first frame, and the bare time.Time tick
message is now a named tickMsg (no cross-message collision). The
test-alert handler's raw goroutine becomes a tea.Cmd.
Adds the package's first Update()-driven tests: tab-data load + apply,
error-keeps-previous-data, detail cache with a store-hit counter proving
View does zero IO across repeated renders, and the handleTick throttle.
Full suite green under -race; golangci-lint clean.
Four fixes hardening the secrets and rate-limit posture a prior audit
left or that regressed:
X-Forwarded-For rate-limit bypass + memory DoS (ratelimit.go): clientIP
returned the raw XFF header, so an attacker rotating it minted unlimited
distinct limiter keys — never tripping the limit and growing the visitors
map without bound. XFF is now honored only when the immediate peer is a
configured trusted proxy (UPTOP_TRUSTED_PROXIES, CIDRs or bare IPs), using
the right-most non-trusted hop; otherwise the key is the real RemoteAddr.
The visitors map is bounded with LRU eviction as defense in depth.
Export redaction denylist -> per-provider allowlist (server.go): the old
six-key denylist missed the actual credentials — the webhook URL for
discord/slack/webhook/ntfy/gotify and api_key for opsgenie — exporting
them in the clear. redactByProvider keeps only known-safe keys per
provider type and redacts everything else, so unknown/new keys fail safe.
ImportData plaintext secrets (sqlstore.go): import inserted raw
json.Marshal(settings), bypassing the encryption AddAlert/UpdateAlert
use. It now routes through marshalSettings, so a restore with
UPTOP_ENCRYPTION_KEY set stores enc:-prefixed ciphertext, not plaintext.
Alert error credential leak (alert.go): provider Send returned the raw
*url.Error, whose URL carries the secret (Telegram bot token in the path,
webhook secrets in the URL); it was persisted to AlertHealth.LastError
and shown in the TUI. sanitizeError strips the URL, keeping the operation
and underlying cause.
Tests cover trusted/untrusted XFF + spoofed-bypass + map bound, the
allowlist per provider, encrypted-on-import round-trip, and URL-stripped
errors. README documents UPTOP_TRUSTED_PROXIES. Full suite green under
-race; golangci-lint clean.
Every check spawned `go e.db.Save*(...)` with the error discarded: a
fire-and-forget goroutine per log line, check, state change, and alert
health update. SaveLog ran a full-table prune DELETE on every insert and
SaveCheck a COUNT + conditional prune on every check, so the hot path
amplified each write into several statements. Nothing tracked these
goroutines, so at shutdown they raced the store's Close() — writes to a
closing DB, silently swallowed.
Introduce a single writer goroutine that drains a buffered channel of
typed dbWrite values (log/check/state-change/alert-health). Writes are
enqueued non-blocking; a saturated queue drops and notes it in the
in-memory log rather than blocking the check loop. Write errors are now
logged instead of discarded. Retention moves off the hot path: SaveLog
and SaveCheck become plain INSERTs, and PruneLogs/PruneCheckHistory/
PruneStateChanges run on a 10-minute timer inside the writer (single
keep-newest-N-per-site pass via a window function). state_changes was
previously never pruned — now bounded.
Add Engine.Stop(): cancels the engine's context, then waits for the
writer to drain every buffered write before returning. main wires it in
before the deferred store Close() so no write races a closed DB.
SQLite gains busy_timeout=5000 and synchronous=NORMAL, applied via the
DSN so every pooled connection inherits them (a post-open PRAGMA only
touches one connection); WAL moves to the DSN too. :memory: test DBs are
left as-is.
Tests: writer drains on Stop, Stop is idempotent, and the prune queries
keep newest-N per site / N logs on real SQLite. Full suite green under
-race.
checkByID snapshotted a Site under RLock, ran a network check for
seconds, then handleStatusChange wrote the entire stale struct back into
liveState. Any concurrent mutation during the check — a user pause, a
config edit, or a push heartbeat — was silently reverted. Worst case: a
heartbeat set UP and an in-flight checkPush overwrote it with a stale
DOWN, firing a false alert.
Introduce applyState(id, mutate): a single read-modify-write helper that
runs the mutator against the CURRENT live entry under the write lock, so
config and Paused are preserved automatically and status transitions are
computed from the true current status. Route handleStatusChange,
RecordHeartbeat, ToggleSitePause and checkGroup through it. Logs and
alerts now fire after the lock is released, off the critical section.
Push false-DOWN is closed by a guard: a non-UP result whose snapshot
LastCheck predates the live LastCheck is dropped, since a heartbeat (or
newer check) superseded it. HTTP/probe stamp LastCheck=now before the
call, so they are unaffected (and serial per site anyway).
Also fixes a latent bug where RecordHeartbeat read StatusChangedAt after
overwriting it, always reporting "was down 0s"; downSince is now captured
before mutation.
Adds regression tests for pause/config-edit/heartbeat-during-check and
removed-site-dropped. Full suite green under -race.
Click any sparkline character to see data point details — approximate
time, latency, and up/down status. Esc dismisses tooltip without
leaving detail view. Uses existing BubbleZone infrastructure with
zone-relative coordinate math for index resolution.
Background goroutine runs every 15 minutes, deletes maintenance windows
that expired beyond the retention period (default 7 days). Configurable
via UPTOP_MAINT_RETENTION env var (Go duration format).
Closes#72
DeleteSite now removes maintenance_windows, check_history, and
state_changes for the site within a transaction before deleting
the site itself. Prevents orphaned rows.
Closes#71
Replace misleading relative-only sparkline with dual-channel design:
bar height uses relative scaling (shows stability and anomalies),
color+brightness uses absolute thresholds (shows fast vs slow).
- Add brightness gradient within color bands (dim→bright as latency
increases toward the next threshold)
- Pass row background through sparkline rendering so zebra stripes
and selection highlights carry through ANSI sequences
- Cap sparkline width to 60 (matches maxHistoryLen) and column
width to 62 to eliminate trailing dead space
- Quiet group sparkline: subtle dots for healthy, bold red for down
- Add braille subpixel canvas (ported from meridian) for future
multi-row graph use
- STATUS column shows icon + clean state only (▲ UP, ▼ DOWN, ◆ LATE,
◆ STALE, ◇ PAUSED, ◼ MAINT, ○ PENDING). Error classification
(DNS/TLS/TMO) removed from STATUS — stays in NAME inline hint.
- Detail panel Last Check shows relative time ("12s ago") instead of
absolute timestamp.
- Extract shared fmtTimeAgo() to format.go, consolidate duplicate
formatters in tab_alerts.go and tab_nodes.go.
Sites table with many rows exceeded the fixed content height,
pushing footer down. MaxHeight now clips content that overflows
while Height still pads shorter content upward.
Each tab returned different leading newlines (Sites/tables: 1,
Logs: 3, empty states: varies). TrimSpace content before layout
so JoinVertical controls all spacing. Remove leading \n from
footer since JoinVertical handles gaps.
Replace string concatenation layout with lipgloss.JoinVertical
and fixed-height content area. Footer now stays at the same
vertical position regardless of tab content height. Uses
lipgloss.Height() to dynamically measure header/footer and
fill remaining space.
Logs were dumping all lines directly, pushing the dashboard
footer off screen. Now uses logViewport with proper height
accounting so footer stays visible and scrolling works.
- Extract divider() and emptyState() helpers to format.go
- All empty states now use bordered box with accent color
- Detail and alert detail panels get header/section dividers
- SLA label width 14→16 to match detail/alert panels
- Logs key hints moved from content to dashboard footer
- History/SLA panels use shared divider helper
- Show version in dashboard footer (wired from goreleaser ldflags)
- Cap name column at 35, raise sparkline minimum to 15 chars
- Preserve zebra background on group rows (was lost by style override)
- Group sparkline uses bullet • instead of heavy circle ●
Full-screen SLA report accessible via [s] from detail panel.
Computes uptime%, downtime, outage count, longest outage, MTTR,
and MTBF from state_changes table. Includes daily breakdown with
bar chart, switchable time periods (24h/7d/30d/90d), and
scrollable viewport. LATE/STALE treated as UP for SLA purposes.
checkGroup only checked for DOWN/SSL EXP and PENDING. Groups
now reflect STALE and LATE children with proper priority:
DOWN > STALE > LATE > PENDING > UP.
New intermediate state between LATE and DOWN at the midpoint of
the grace period. Gives operators earlier warning that a push
monitor has gone quiet. Includes dedicated orange theme color
across all 5 themes and proper styling in dashboard, detail
panel, and history view.
Support Opsgenie Alert API v2 with US/EU endpoint selection,
configurable priority (P1-P5), and GenieKey auth. TUI form
includes API key, priority picker, and EU instance toggle.
Update() routed form and confirm-delete states before handling
time.Time ticks, so those handlers swallowed the tick command and
permanently broke the refresh loop. After opening any form or
delete dialog, the TUI stopped auto-refreshing until restarted.
Move time.Time and WindowSizeMsg handling before state dispatch
so ticks always fire regardless of view state.
Also fix fmtRetries off-by-one: FailureCount=1 displayed as 0/N
instead of 1/N due to an erroneous subtract-one.
Monitor goroutine slept for the full check interval after a config
edit, so hostname/URL changes wouldn't take effect until the next
scheduled check. Added per-site recheck channel that wakes the
goroutine immediately when UpdateSiteConfig is called.
- Replace deprecated LineUp/LineDown/HalfViewUp/HalfViewDown with
ScrollUp/ScrollDown/HalfPageUp/HalfPageDown
- Use tagged switch for mouse button dispatch
- Use fmt.Fprintf instead of WriteString(Sprintf)
Full-screen scrollable history view accessible via [h] from detail
panel. Shows all state transitions with computed outage durations,
event density sparkline for flapping detection, and summary stats.
- Detail panel STATE CHANGES now shows outage duration per recovery
- Event density sparkline highlights flapping periods
- Summary footer: event count, outage count, avg outage duration
- Vim-style navigation (j/k/g/G) + mouse scroll in history view
Categorize raw error strings into DNS/TCP/TLS/HTTP/ICMP/TMO/PRIV
so users get instant triage from the monitor list without opening
the detail panel.
- Status column shows DOWN:DNS, DOWN:TLS, DOWN:HTTP, etc.
- Inline NAME column errors prefixed with category tag [DNS], [TLS]
- Detail panel shows connection chain checklist for HTTP monitors
(✓ DNS → ✓ TCP → ✗ TLS → · HTTP) pinpointing failure layer
- All display-side only — no database or model changes
tui.go (1032→164) and tab_sites.go (993→482) violated "small functions"
and "testable in isolation" standards. Extracted 6 new files by concern:
- format.go: pure formatting functions (fmtLatency, fmtUptime, etc.)
- sparkline.go: sparkline rendering (latency, heartbeat, group)
- update.go: Update method decomposed into 15 named handlers
- view_dashboard.go: View, dashboard composition, tab bar, footer
- view_detail.go: site detail panel
- data.go: data refresh with extracted sortSitesForDisplay/filterSites
Added 17 unit tests for the newly-testable pure functions covering
format, sparkline, sort ordering, and filter logic. No behavioral
changes — strict move-and-extract refactor.
GitHub is the public storefront. Binary releases link to GitHub
where the relay workflow publishes them. Clarify Linux amd64 only
for binary downloads.