Commit Graph

22 Commits

Author SHA1 Message Date
lerko 83ec6bee42 fix(tui): apply log filter to full log list, not viewport window
viewLogsTab filtered logViewport.View() — the visible window — so the
entry count showed the window size and hidden lines reappeared while
scrolling. Filter and render now happen at content-set time from
engine.GetLogs(); the view only reads stored counts.
2026-06-12 15:31:57 -04:00
lerko 6cf0efed9b fix: seven fixes — token scan, variadic cleanup, TUI layout, compose secrets
CI / test (pull_request) Successful in 1m54s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. UpdateSite handles token-read Scan error instead of ignoring it.
   sql.ErrNoRows (nonexistent site) passes through; real DB errors
   surface.

2. RunCheck allowPrivate changed from variadic to real bool param.
   Dead maxRequestBody duplicate removed from sqlstore.go.

3. Footer help bar documents [Space] for group collapse.

4. adjustCursor unified with clampCursor — one clamping path
   instead of two with different semantics.

5. Compose cluster/probe example files annotate hardcoded secrets
   with "EXAMPLE ONLY — rotate before use".

6. huhForm.WithHeight moved from View() to handleResize — no longer
   mutates form state during render.

7. maxTableRows recalculated on filter enter/exit via recalcLayout()
   — was only recalculated on resize, causing off-by-one when the
   filter bar appeared/disappeared.
2026-06-12 09:36:00 -04:00
lerko 9115ab720c fix: six small fixes — rate limiter leak, DST SLA, probe sort, TUI cleanup
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 56s
1. Rate limiter cleanup goroutine now stoppable via Stop() channel
   instead of looping forever. Prevents goroutine leak in tests.

2. Dead WindowSizeMsg branch in handleFormMsg removed — top-level
   Update handles resize before forms see it.

3. Probe results sorted by node ID — map iteration no longer
   reorders rows every render.

4. fmtAlertConfig takes models.AlertConfig directly instead of an
   anonymous struct the caller builds inline.

5. Backspace no longer aliases delete — d is the documented key.
   Prevents accidental delete-confirm on habitual backspace.

6. SLA daily buckets use time.Date day arithmetic instead of
   Add(-i*24h) — lands on midnight correctly across DST transitions.
2026-06-12 09:18:52 -04:00
lerko 13637ec216 chore(tui): delete dead braille code, hoist sparkWidth, stop resize flash
CI / test (pull_request) Successful in 1m58s
CI / lint (pull_request) Successful in 1m22s
CI / vulncheck (pull_request) Successful in 1m1s
1. Delete braille.go + braille_test.go — dead code, only referenced
   by its own test. Can be re-added when latency charts are built.

2. Hoist duplicate `const sparkWidth = 40` (update.go + view_detail.go)
   to package-level `detailSparkWidth`. Click-index resolution and
   rendering now share one constant.

3. Remove tea.ClearScreen on every resize — caused full-screen flash
   during continuous resizes. ctrl+l manual clear kept.
2026-06-12 08:03:16 -04:00
lerko fa56f47f96 fix(tui): track selection by site ID + q means back everywhere
CI / test (pull_request) Successful in 2m5s
CI / lint (pull_request) Successful in 1m26s
CI / vulncheck (pull_request) Successful in 1m2s
Cursor tracked by site ID instead of positional index. When the
list re-sorts every tick (sites change status), the selection stays
on the same monitor instead of silently jumping to whatever now
occupies that index position.

q now means "back" in detail, history, SLA, and alert-detail views
— consistent with muscle memory from navigating deeper views.
Only the dashboard q quits the app. ctrl+c always quits from
anywhere.
2026-06-11 19:34:21 -04:00
lerko 5d2b7a3e66 fix: seven quick-win bug fixes across engine, server, TUI, CLI
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. Alertless monitors no longer spam error logs — triggerAlert
   returns early when alertID <= 0.

2. HTTP response body drained before close — enables connection
   reuse via keep-alive instead of fresh TCP+TLS per check.

3. /api/backup/export enforces GET — was the only endpoint
   accepting any HTTP method.

4. limitStr guards against max < 3 — prevents negative slice
   index panic on very narrow terminals.

5. Filter input accepts multibyte characters — len(msg.Runes)
   instead of len(msg.String()) for proper Unicode support.

6. Startup warning corrected — with no UPTOP_CLUSTER_SECRET,
   endpoints reject (401), not accept. Warning now says so.

7. UPTOP_KEYS file open failure logged — was silently swallowed,
   leaving operators with no admin seeded and no message.
2026-06-11 18:28:32 -04:00
lerko f00acbc280 refactor(models): typed Status constants with IsBroken() predicate
Replace ~150 bare status string comparisons with typed models.Status
constants (StatusUp, StatusDown, StatusPending, StatusLate, StatusStale,
StatusSSLExp). Single IsBroken() method replaces the duplicated
isBroken lambda in monitor.go and isDown function in sla.go.

Adding a new status value (e.g. DEGRADED) now requires one constant
definition instead of grep-and-pray across 16 files.

CheckResult.Status stays string — the checker is the boundary between
raw protocol results and typed status. Cast happens at the edge in
handleStatusChange.
2026-06-11 15:56:51 -04:00
lerko 70a83a1da9 refactor(store): propagate context.Context through all Store methods
Every Store interface method (except Close) now takes context.Context
as first parameter. All 54 db.Query/Exec/QueryRow calls in SQLStore
replaced with their *Context variants. DB operations now respect
cancellation and deadlines.

Context sources by caller:
- Engine dbWriter/poll/pruner: engine ctx from Start()
- HTTP handlers: r.Context()
- config.Apply/Export: caller-provided ctx
- TUI/main.go init: context.Background()

RunCheck and all sub-checks (HTTP/ping/port/DNS) accept parent ctx.
HTTP checks now inherit shutdown cancellation instead of rooting in
context.Background(). dbWrite.exec takes ctx so the writer goroutine
can cancel stuck DB operations.

DeleteSite/ImportData use BeginTx(ctx) instead of Begin().
2026-06-11 14:40:30 -04:00
lerko a3711c652c fix(tui): move all store writes out of Update into tea.Cmds
CI / test (pull_request) Successful in 2m35s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Deletes, pause toggles, maintenance end, theme/collapse prefs, and all
four form submits wrote to the store synchronously on the UI goroutine;
with busy_timeout=5000 a contended DB froze input for up to 5s.

Writes now run through a writeCmd helper returning writeDoneMsg. The
in-memory engine/model mutations stay in Update so rows react
instantly; the reply logs failures and reloads tab data, so the UI
converges on what was actually written. Closures capture snapshotted
values only — never the model.
2026-06-11 11:39:15 -04:00
lerko 634c3ee03c fix(tui): finish moving keypress DB reads into tea.Cmds
The #101 refactor stopped at the tick path; 'h' history and the SLA
view still queried state changes synchronously in Update, freezing the
UI for up to busy_timeout on a contended DB. Both now load through
Cmds with loading placeholders.

Also closes the remaining staleness holes in the async data flow:
- tabDataMsg carries a sequence number; out-of-order replies from
  slower earlier loads are dropped instead of overwriting newer data
- history/SLA replies are dropped when the user has navigated to a
  different site or period
- the open detail panel refreshes on the tab-data cadence instead of
  loading once on entry and going stale
- initSiteHuhForm reads the m.alerts cache instead of hitting the store
2026-06-11 11:35:03 -04:00
lerko 274f0081e2 fix(tui): move theme styles onto the Model to end cross-session races
applyTheme mutated ~18 package-global lipgloss styles while every SSH
session's tea.Program read them concurrently from its own goroutine.
Pressing T or opening a new connection raced other sessions' View and
bled themes across users.

Styles now live in an immutable per-Model struct built by newStyles;
free formatter helpers that consumed the globals became Model methods.
2026-06-11 11:23:16 -04:00
lerko f349d0dfd1 fix(tui): move blocking DB IO out of Update/View into tea.Cmds
CI / test (pull_request) Successful in 2m38s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
The TUI ran database queries on the UI goroutine: handleTick called
refreshData every second, which issued four blocking SQLite queries
(GetAllAlerts/GetAllUsers/GetAllNodes/GetAllMaintenanceWindows) and
swallowed their errors; viewDetailPanel ran GetStateChanges — a DB query
— inside View(), on every render (tick, keypress, mouse). A slow disk
stalled input and animation.

Split refreshData into refreshLive() (in-memory engine copies only —
sites + logs — safe every tick) and loadTabDataCmd(), a tea.Cmd that
loads the four DB-backed tables off the UI goroutine and returns a
tabDataMsg. handleTick now refreshes live state every tick but dispatches
the tab-data load only when older than tabRefreshTTL (5s), so tab-bar
counts stay fresh without a per-second query storm. Errors surface to the
log instead of being dropped, and a transient failure keeps the previous
data rather than blanking the view.

The detail panel's state-change history is loaded once on enter via
loadDetailCmd and cached on the model; viewDetailPanel reads the cache,
so View no longer touches the database. Init kicks an initial load so the
dashboard isn't empty on the first frame, and the bare time.Time tick
message is now a named tickMsg (no cross-message collision). The
test-alert handler's raw goroutine becomes a tea.Cmd.

Adds the package's first Update()-driven tests: tab-data load + apply,
error-keeps-previous-data, detail cache with a store-hit counter proving
View does zero IO across repeated renders, and the handleTick throttle.
Full suite green under -race; golangci-lint clean.
2026-06-10 21:14:47 -04:00
lerko f97ea3d66b feat(tui): click-to-inspect sparkline tooltips in detail view
CI / test (pull_request) Successful in 2m47s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 46s
Click any sparkline character to see data point details — approximate
time, latency, and up/down status. Esc dismisses tooltip without
leaving detail view. Uses existing BubbleZone infrastructure with
zone-relative coordinate math for index resolution.
2026-06-10 11:28:29 -04:00
lerko c471a72ff5 fix(monitor): inject time into ComputeDailyBreakdown for testability
CI / test (pull_request) Successful in 2m30s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Test failed near midnight when outage events fell in previous day's
bucket. Accept a now parameter instead of calling time.Now() internally.
2026-06-04 21:29:03 -04:00
lerko 33a3ff9bcb fix(tui): expand log viewport to fill content area
CI / test (pull_request) Successful in 2m36s
CI / lint (pull_request) Successful in 1m6s
CI / vulncheck (pull_request) Successful in 51s
Previous + 3 over-restricted viewport height, leaving blank
lines at the bottom of the logs tab.
2026-06-04 16:13:40 -04:00
lerko e0f189efe9 fix(tui): logs tab use viewport for scrollable content
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Logs were dumping all lines directly, pushing the dashboard
footer off screen. Now uses logViewport with proper height
accounting so footer stays visible and scrolling works.
2026-06-04 15:36:21 -04:00
lerko 60592ef810 feat(tui): add SLA reporting view
CI / test (pull_request) Successful in 2m35s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 41s
Full-screen SLA report accessible via [s] from detail panel.
Computes uptime%, downtime, outage count, longest outage, MTTR,
and MTBF from state_changes table. Includes daily breakdown with
bar chart, switchable time periods (24h/7d/30d/90d), and
scrollable viewport. LATE/STALE treated as UP for SLA purposes.
2026-06-04 14:24:39 -04:00
lerko 9e15b369d3 fix(tui): wire up [e] edit key in detail panel
CI / test (pull_request) Successful in 2m38s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 46s
CI / test (push) Successful in 2m40s
CI / lint (push) Successful in 56s
CI / vulncheck (push) Successful in 51s
The detail panel footer showed [e] Edit but handleDetailKey had no
case for it. Route to handleEditItem() like the dashboard does.
2026-06-04 12:36:24 -04:00
lerko 5b39be8eb2 fix(tui): broken tick chain after form/dialog + retries off-by-one
CI / test (pull_request) Successful in 2m43s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Update() routed form and confirm-delete states before handling
time.Time ticks, so those handlers swallowed the tick command and
permanently broke the refresh loop. After opening any form or
delete dialog, the TUI stopped auto-refreshing until restarted.

Move time.Time and WindowSizeMsg handling before state dispatch
so ticks always fire regardless of view state.

Also fix fmtRetries off-by-one: FailureCount=1 displayed as 0/N
instead of 1/N due to an erroneous subtract-one.
2026-06-04 12:34:10 -04:00
lerko 1d1f5d0ee4 fix(tui): resolve staticcheck lint errors in history view
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Successful in 51s
CI / vulncheck (pull_request) Successful in 51s
- Replace deprecated LineUp/LineDown/HalfViewUp/HalfViewDown with
  ScrollUp/ScrollDown/HalfPageUp/HalfPageDown
- Use tagged switch for mouse button dispatch
- Use fmt.Fprintf instead of WriteString(Sprintf)
2026-06-03 20:22:41 -04:00
lerko bc661f5207 feat(tui): add state change history view with outage duration
CI / test (pull_request) Successful in 2m30s
CI / lint (pull_request) Failing after 51s
CI / vulncheck (pull_request) Successful in 46s
Full-screen scrollable history view accessible via [h] from detail
panel. Shows all state transitions with computed outage durations,
event density sparkline for flapping detection, and summary stats.

- Detail panel STATE CHANGES now shows outage duration per recovery
- Event density sparkline highlights flapping periods
- Summary footer: event count, outage count, avg outage duration
- Vim-style navigation (j/k/g/G) + mouse scroll in history view
2026-06-03 19:49:10 -04:00
lerko 5d362fdbe6 refactor(tui): decompose god files into single-concern modules
CI / test (pull_request) Successful in 2m34s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
CI / test (push) Successful in 2m32s
CI / lint (push) Successful in 56s
CI / vulncheck (push) Successful in 51s
tui.go (1032→164) and tab_sites.go (993→482) violated "small functions"
and "testable in isolation" standards. Extracted 6 new files by concern:

- format.go: pure formatting functions (fmtLatency, fmtUptime, etc.)
- sparkline.go: sparkline rendering (latency, heartbeat, group)
- update.go: Update method decomposed into 15 named handlers
- view_dashboard.go: View, dashboard composition, tab bar, footer
- view_detail.go: site detail panel
- data.go: data refresh with extracted sortSitesForDisplay/filterSites

Added 17 unit tests for the newly-testable pure functions covering
format, sparkline, sort ordering, and filter logic. No behavioral
changes — strict move-and-extract refactor.
2026-06-02 21:06:30 -04:00