Commit Graph

98 Commits

Author SHA1 Message Date
lerko 83ec6bee42 fix(tui): apply log filter to full log list, not viewport window
viewLogsTab filtered logViewport.View() — the visible window — so the
entry count showed the window size and hidden lines reappeared while
scrolling. Filter and render now happen at content-set time from
engine.GetLogs(); the view only reads stored counts.
2026-06-12 15:31:57 -04:00
lerko 6cf0efed9b fix: seven fixes — token scan, variadic cleanup, TUI layout, compose secrets
CI / test (pull_request) Successful in 1m54s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. UpdateSite handles token-read Scan error instead of ignoring it.
   sql.ErrNoRows (nonexistent site) passes through; real DB errors
   surface.

2. RunCheck allowPrivate changed from variadic to real bool param.
   Dead maxRequestBody duplicate removed from sqlstore.go.

3. Footer help bar documents [Space] for group collapse.

4. adjustCursor unified with clampCursor — one clamping path
   instead of two with different semantics.

5. Compose cluster/probe example files annotate hardcoded secrets
   with "EXAMPLE ONLY — rotate before use".

6. huhForm.WithHeight moved from View() to handleResize — no longer
   mutates form state during render.

7. maxTableRows recalculated on filter enter/exit via recalcLayout()
   — was only recalculated on resize, causing off-by-one when the
   filter bar appeared/disappeared.
2026-06-12 09:36:00 -04:00
lerko 9115ab720c fix: six small fixes — rate limiter leak, DST SLA, probe sort, TUI cleanup
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 56s
1. Rate limiter cleanup goroutine now stoppable via Stop() channel
   instead of looping forever. Prevents goroutine leak in tests.

2. Dead WindowSizeMsg branch in handleFormMsg removed — top-level
   Update handles resize before forms see it.

3. Probe results sorted by node ID — map iteration no longer
   reorders rows every render.

4. fmtAlertConfig takes models.AlertConfig directly instead of an
   anonymous struct the caller builds inline.

5. Backspace no longer aliases delete — d is the documented key.
   Prevents accidental delete-confirm on habitual backspace.

6. SLA daily buckets use time.Date day arithmetic instead of
   Add(-i*24h) — lands on midnight correctly across DST transitions.
2026-06-12 09:18:52 -04:00
lerko 13637ec216 chore(tui): delete dead braille code, hoist sparkWidth, stop resize flash
CI / test (pull_request) Successful in 1m58s
CI / lint (pull_request) Successful in 1m22s
CI / vulncheck (pull_request) Successful in 1m1s
1. Delete braille.go + braille_test.go — dead code, only referenced
   by its own test. Can be re-added when latency charts are built.

2. Hoist duplicate `const sparkWidth = 40` (update.go + view_detail.go)
   to package-level `detailSparkWidth`. Click-index resolution and
   rendering now share one constant.

3. Remove tea.ClearScreen on every resize — caused full-screen flash
   during continuous resizes. ctrl+l manual clear kept.
2026-06-12 08:03:16 -04:00
lerko fa56f47f96 fix(tui): track selection by site ID + q means back everywhere
CI / test (pull_request) Successful in 2m5s
CI / lint (pull_request) Successful in 1m26s
CI / vulncheck (pull_request) Successful in 1m2s
Cursor tracked by site ID instead of positional index. When the
list re-sorts every tick (sites change status), the selection stays
on the same monitor instead of silently jumping to whatever now
occupies that index position.

q now means "back" in detail, history, SLA, and alert-detail views
— consistent with muscle memory from navigating deeper views.
Only the dashboard q quits the app. ctrl+c always quits from
anywhere.
2026-06-11 19:34:21 -04:00
lerko 5d2b7a3e66 fix: seven quick-win bug fixes across engine, server, TUI, CLI
CI / test (pull_request) Successful in 1m55s
CI / lint (pull_request) Successful in 1m27s
CI / vulncheck (pull_request) Successful in 1m1s
1. Alertless monitors no longer spam error logs — triggerAlert
   returns early when alertID <= 0.

2. HTTP response body drained before close — enables connection
   reuse via keep-alive instead of fresh TCP+TLS per check.

3. /api/backup/export enforces GET — was the only endpoint
   accepting any HTTP method.

4. limitStr guards against max < 3 — prevents negative slice
   index panic on very narrow terminals.

5. Filter input accepts multibyte characters — len(msg.Runes)
   instead of len(msg.String()) for proper Unicode support.

6. Startup warning corrected — with no UPTOP_CLUSTER_SECRET,
   endpoints reject (401), not accept. Warning now says so.

7. UPTOP_KEYS file open failure logged — was silently swallowed,
   leaving operators with no admin seeded and no message.
2026-06-11 18:28:32 -04:00
lerko 52ccd7ad91 refactor(models): split Site into SiteConfig + SiteState
CI / test (pull_request) Successful in 1m58s
CI / lint (pull_request) Successful in 1m21s
CI / vulncheck (pull_request) Successful in 1m2s
Site now embeds SiteConfig (22 persistent fields) and SiteState
(11 ephemeral runtime fields). Field access unchanged via promotion
— site.Name and site.Status still work.

Store layer deals exclusively in SiteConfig — the DB never sees
runtime state. Engine's liveState keeps full Site composites.
UpdateSiteConfig reduced from 11-line field-by-field copy to
`existing.SiteConfig = cfg`.

RunCheck takes SiteConfig (only needs config fields). Checker is
now statically prevented from reading/writing runtime state.

Backup.Sites changed to []SiteConfig — exports no longer carry
zero-valued runtime fields. Import backward-compatible (json
ignores unknown fields).
2026-06-11 17:13:09 -04:00
lerko 2b357341c8 refactor(store): shared storetest.BaseMock replaces 5 duplicated mocks
CI / test (pull_request) Successful in 1m57s
CI / lint (pull_request) Successful in 1m16s
CI / vulncheck (pull_request) Successful in 1m1s
New internal/store/storetest/mock.go provides BaseMock implementing
the full Store interface with no-op defaults and optional Func field
overrides. Each test file embeds BaseMock and shadows only the methods
it needs.

Removes ~400 lines of duplicated stub methods across 6 test files.
Adding a Store method now requires one addition (BaseMock) instead
of editing 6 files.
2026-06-11 16:09:29 -04:00
lerko f00acbc280 refactor(models): typed Status constants with IsBroken() predicate
Replace ~150 bare status string comparisons with typed models.Status
constants (StatusUp, StatusDown, StatusPending, StatusLate, StatusStale,
StatusSSLExp). Single IsBroken() method replaces the duplicated
isBroken lambda in monitor.go and isDown function in sla.go.

Adding a new status value (e.g. DEGRADED) now requires one constant
definition instead of grep-and-pray across 16 files.

CheckResult.Status stays string — the checker is the boundary between
raw protocol results and typed status. Cast happens at the edge in
handleStatusChange.
2026-06-11 15:56:51 -04:00
lerko 70a83a1da9 refactor(store): propagate context.Context through all Store methods
Every Store interface method (except Close) now takes context.Context
as first parameter. All 54 db.Query/Exec/QueryRow calls in SQLStore
replaced with their *Context variants. DB operations now respect
cancellation and deadlines.

Context sources by caller:
- Engine dbWriter/poll/pruner: engine ctx from Start()
- HTTP handlers: r.Context()
- config.Apply/Export: caller-provided ctx
- TUI/main.go init: context.Background()

RunCheck and all sub-checks (HTTP/ping/port/DNS) accept parent ctx.
HTTP checks now inherit shutdown cancellation instead of rooting in
context.Background(). dbWrite.exec takes ctx so the writer goroutine
can cancel stuck DB operations.

DeleteSite/ImportData use BeginTx(ctx) instead of Begin().
2026-06-11 14:40:30 -04:00
lerko a1ab276bc5 fix(security): mask alert secrets in the TUI detail panel and table
The alert detail panel dumped a.Settings raw — SMTP passwords, bot
tokens, API keys on screen and into any recording or screen share. The
table view leaked the PagerDuty routing key, Pushover user key, and
full discord/slack/webhook URLs (the URL path is the credential).

The redaction allowlist moves from internal/server to
models.RedactAlertSettings so the backup export and the TUI render
through one policy. Panel keys are sorted so rows stop reshuffling
every tick; webhook URLs show scheme+host only; keys show
first4…last4.
2026-06-11 12:26:40 -04:00
lerko a3711c652c fix(tui): move all store writes out of Update into tea.Cmds
CI / test (pull_request) Successful in 2m35s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Deletes, pause toggles, maintenance end, theme/collapse prefs, and all
four form submits wrote to the store synchronously on the UI goroutine;
with busy_timeout=5000 a contended DB froze input for up to 5s.

Writes now run through a writeCmd helper returning writeDoneMsg. The
in-memory engine/model mutations stay in Update so rows react
instantly; the reply logs failures and reloads tab data, so the UI
converges on what was actually written. Closures capture snapshotted
values only — never the model.
2026-06-11 11:39:15 -04:00
lerko 634c3ee03c fix(tui): finish moving keypress DB reads into tea.Cmds
The #101 refactor stopped at the tick path; 'h' history and the SLA
view still queried state changes synchronously in Update, freezing the
UI for up to busy_timeout on a contended DB. Both now load through
Cmds with loading placeholders.

Also closes the remaining staleness holes in the async data flow:
- tabDataMsg carries a sequence number; out-of-order replies from
  slower earlier loads are dropped instead of overwriting newer data
- history/SLA replies are dropped when the user has navigated to a
  different site or period
- the open detail panel refreshes on the tab-data cadence instead of
  loading once on entry and going stale
- initSiteHuhForm reads the m.alerts cache instead of hitting the store
2026-06-11 11:35:03 -04:00
lerko 274f0081e2 fix(tui): move theme styles onto the Model to end cross-session races
applyTheme mutated ~18 package-global lipgloss styles while every SSH
session's tea.Program read them concurrently from its own goroutine.
Pressing T or opening a new connection raced other sessions' View and
bled themes across users.

Styles now live in an immutable per-Model struct built by newStyles;
free formatter helpers that consumed the globals became Model methods.
2026-06-11 11:23:16 -04:00
lerko f349d0dfd1 fix(tui): move blocking DB IO out of Update/View into tea.Cmds
CI / test (pull_request) Successful in 2m38s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
The TUI ran database queries on the UI goroutine: handleTick called
refreshData every second, which issued four blocking SQLite queries
(GetAllAlerts/GetAllUsers/GetAllNodes/GetAllMaintenanceWindows) and
swallowed their errors; viewDetailPanel ran GetStateChanges — a DB query
— inside View(), on every render (tick, keypress, mouse). A slow disk
stalled input and animation.

Split refreshData into refreshLive() (in-memory engine copies only —
sites + logs — safe every tick) and loadTabDataCmd(), a tea.Cmd that
loads the four DB-backed tables off the UI goroutine and returns a
tabDataMsg. handleTick now refreshes live state every tick but dispatches
the tab-data load only when older than tabRefreshTTL (5s), so tab-bar
counts stay fresh without a per-second query storm. Errors surface to the
log instead of being dropped, and a transient failure keeps the previous
data rather than blanking the view.

The detail panel's state-change history is loaded once on enter via
loadDetailCmd and cached on the model; viewDetailPanel reads the cache,
so View no longer touches the database. Init kicks an initial load so the
dashboard isn't empty on the first frame, and the bare time.Time tick
message is now a named tickMsg (no cross-message collision). The
test-alert handler's raw goroutine becomes a tea.Cmd.

Adds the package's first Update()-driven tests: tab-data load + apply,
error-keeps-previous-data, detail cache with a store-hit counter proving
View does zero IO across repeated renders, and the handleTick throttle.
Full suite green under -race; golangci-lint clean.
2026-06-10 21:14:47 -04:00
lerko f97ea3d66b feat(tui): click-to-inspect sparkline tooltips in detail view
CI / test (pull_request) Successful in 2m47s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 46s
Click any sparkline character to see data point details — approximate
time, latency, and up/down status. Esc dismisses tooltip without
leaving detail view. Uses existing BubbleZone infrastructure with
zone-relative coordinate math for index resolution.
2026-06-10 11:28:29 -04:00
lerko 33dc84449b polish(tui): responsive column hiding — 3-tier priority-based layout
CI / test (pull_request) Successful in 2m29s
CI / lint (pull_request) Successful in 57s
CI / vulncheck (pull_request) Successful in 46s
Columns drop progressively as terminal narrows:
- Compact (<90): #, NAME, STATUS, LATENCY
- Medium (90-119): + TYPE, UPTIME, HISTORY
- Wide (120+): + SSL, RETRIES

Column definitions are data-driven via siteColumns slice.
Row builder uses pickCols() helper so headers and cells can't drift.

Closes #68
2026-06-05 17:50:57 -04:00
lerko 69a8e0f7ba polish(tui): overhaul tab bar — consistent counts, active highlight, colored alerts
CI / test (pull_request) Successful in 2m33s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Closes #89
2026-06-05 17:14:07 -04:00
lerko c471a72ff5 fix(monitor): inject time into ComputeDailyBreakdown for testability
CI / test (pull_request) Successful in 2m30s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Test failed near midnight when outage events fell in previous day's
bucket. Accept a now parameter instead of calling time.Now() internally.
2026-06-04 21:29:03 -04:00
lerko 7bff79b09c fix(tui): check fmt.Sscanf return value (errcheck lint)
CI / test (pull_request) Successful in 2m37s
CI / lint (pull_request) Successful in 57s
CI / vulncheck (pull_request) Successful in 51s
2026-06-04 19:56:05 -04:00
lerko 986681ef8a feat(tui): overhaul latency sparkline scaling, color, and layout
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Failing after 56s
CI / vulncheck (pull_request) Successful in 51s
Replace misleading relative-only sparkline with dual-channel design:
bar height uses relative scaling (shows stability and anomalies),
color+brightness uses absolute thresholds (shows fast vs slow).

- Add brightness gradient within color bands (dim→bright as latency
  increases toward the next threshold)
- Pass row background through sparkline rendering so zebra stripes
  and selection highlights carry through ANSI sequences
- Cap sparkline width to 60 (matches maxHistoryLen) and column
  width to 62 to eliminate trailing dead space
- Quiet group sparkline: subtle dots for healthy, bold red for down
- Add braille subpixel canvas (ported from meridian) for future
  multi-row graph use
2026-06-04 19:40:34 -04:00
lerko fb709b34c5 refactor(tui): status icons, clean STATUS column, relative time
CI / test (pull_request) Successful in 2m30s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
- STATUS column shows icon + clean state only (▲ UP, ▼ DOWN, ◆ LATE,
  ◆ STALE, ◇ PAUSED, ◼ MAINT, ○ PENDING). Error classification
  (DNS/TLS/TMO) removed from STATUS — stays in NAME inline hint.
- Detail panel Last Check shows relative time ("12s ago") instead of
  absolute timestamp.
- Extract shared fmtTimeAgo() to format.go, consolidate duplicate
  formatters in tab_alerts.go and tab_nodes.go.
2026-06-04 17:03:04 -04:00
lerko 33a3ff9bcb fix(tui): expand log viewport to fill content area
CI / test (pull_request) Successful in 2m36s
CI / lint (pull_request) Successful in 1m6s
CI / vulncheck (pull_request) Successful in 51s
Previous + 3 over-restricted viewport height, leaving blank
lines at the bottom of the logs tab.
2026-06-04 16:13:40 -04:00
lerko d099740f33 fix(tui): remove extra blank lines above footer
CI / test (pull_request) Successful in 2m37s
CI / lint (pull_request) Successful in 57s
CI / vulncheck (pull_request) Successful in 51s
JoinVertical adds no gap lines between sections. The - 2
subtraction was over-reserving space, leaving 2 blank lines
between content and footer.
2026-06-04 16:12:14 -04:00
lerko cdb8c356e9 fix(tui): clip overflowing content to keep footer pinned
CI / test (pull_request) Successful in 2m38s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Sites table with many rows exceeded the fixed content height,
pushing footer down. MaxHeight now clips content that overflows
while Height still pads shorter content upward.
2026-06-04 16:07:44 -04:00
lerko d4a2e9dd53 fix(tui): normalize content whitespace for consistent footer position
CI / test (pull_request) Successful in 2m43s
CI / lint (pull_request) Successful in 1m1s
CI / vulncheck (pull_request) Successful in 51s
Each tab returned different leading newlines (Sites/tables: 1,
Logs: 3, empty states: varies). TrimSpace content before layout
so JoinVertical controls all spacing. Remove leading \n from
footer since JoinVertical handles gaps.
2026-06-04 16:03:57 -04:00
lerko aae6e6e65e fix(tui): pin footer to bottom of terminal
CI / test (pull_request) Successful in 2m45s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Replace string concatenation layout with lipgloss.JoinVertical
and fixed-height content area. Footer now stays at the same
vertical position regardless of tab content height. Uses
lipgloss.Height() to dynamically measure header/footer and
fill remaining space.
2026-06-04 15:59:32 -04:00
lerko e0f189efe9 fix(tui): logs tab use viewport for scrollable content
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Logs were dumping all lines directly, pushing the dashboard
footer off screen. Now uses logViewport with proper height
accounting so footer stays visible and scrolling works.
2026-06-04 15:36:21 -04:00
lerko ba75be194d refactor(tui): consistent chrome across all views
CI / test (pull_request) Successful in 2m36s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
- Extract divider() and emptyState() helpers to format.go
- All empty states now use bordered box with accent color
- Detail and alert detail panels get header/section dividers
- SLA label width 14→16 to match detail/alert panels
- Logs key hints moved from content to dashboard footer
- History/SLA panels use shared divider helper
2026-06-04 19:23:12 +00:00
lerko e0cb0adebd fix(tui): quick wins batch — version footer, column widths, zebra, sparkline
CI / test (pull_request) Successful in 2m34s
CI / lint (pull_request) Successful in 57s
CI / vulncheck (pull_request) Successful in 51s
- Show version in dashboard footer (wired from goreleaser ldflags)
- Cap name column at 35, raise sparkline minimum to 15 chars
- Preserve zebra background on group rows (was lost by style override)
- Group sparkline uses bullet • instead of heavy circle ●
2026-06-04 14:56:01 -04:00
lerko 60592ef810 feat(tui): add SLA reporting view
CI / test (pull_request) Successful in 2m35s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 41s
Full-screen SLA report accessible via [s] from detail panel.
Computes uptime%, downtime, outage count, longest outage, MTTR,
and MTBF from state_changes table. Includes daily breakdown with
bar chart, switchable time periods (24h/7d/30d/90d), and
scrollable viewport. LATE/STALE treated as UP for SLA purposes.
2026-06-04 14:24:39 -04:00
lerko 66b1c662c9 fix(tui): show correct push heartbeat curl command in detail panel 2026-06-04 18:23:57 +00:00
lerko 10c6ec348e fix(tui): show push token and URL in detail panel
Push monitors were missing token/endpoint info in the detail
view, making it impossible to know where to send heartbeats.
2026-06-04 18:23:57 +00:00
lerko ca43621c44 feat(monitor): add STALE state for push monitors
New intermediate state between LATE and DOWN at the midpoint of
the grace period. Gives operators earlier warning that a push
monitor has gone quiet. Includes dedicated orange theme color
across all 5 themes and proper styling in dashboard, detail
panel, and history view.
2026-06-04 18:23:57 +00:00
lerko f23014ab12 feat(alert): add Opsgenie provider
CI / test (pull_request) Successful in 2m37s
CI / lint (pull_request) Successful in 57s
CI / vulncheck (pull_request) Successful in 51s
Support Opsgenie Alert API v2 with US/EU endpoint selection,
configurable priority (P1-P5), and GenieKey auth. TUI form
includes API key, priority picker, and EU instance toggle.
2026-06-04 13:32:14 -04:00
lerko 9e15b369d3 fix(tui): wire up [e] edit key in detail panel
CI / test (pull_request) Successful in 2m38s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 46s
CI / test (push) Successful in 2m40s
CI / lint (push) Successful in 56s
CI / vulncheck (push) Successful in 51s
The detail panel footer showed [e] Edit but handleDetailKey had no
case for it. Route to handleEditItem() like the dashboard does.
2026-06-04 12:36:24 -04:00
lerko 5b39be8eb2 fix(tui): broken tick chain after form/dialog + retries off-by-one
CI / test (pull_request) Successful in 2m43s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Update() routed form and confirm-delete states before handling
time.Time ticks, so those handlers swallowed the tick command and
permanently broke the refresh loop. After opening any form or
delete dialog, the TUI stopped auto-refreshing until restarted.

Move time.Time and WindowSizeMsg handling before state dispatch
so ticks always fire regardless of view state.

Also fix fmtRetries off-by-one: FailureCount=1 displayed as 0/N
instead of 1/N due to an erroneous subtract-one.
2026-06-04 12:34:10 -04:00
lerko 1d1f5d0ee4 fix(tui): resolve staticcheck lint errors in history view
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Successful in 51s
CI / vulncheck (pull_request) Successful in 51s
- Replace deprecated LineUp/LineDown/HalfViewUp/HalfViewDown with
  ScrollUp/ScrollDown/HalfPageUp/HalfPageDown
- Use tagged switch for mouse button dispatch
- Use fmt.Fprintf instead of WriteString(Sprintf)
2026-06-03 20:22:41 -04:00
lerko bc661f5207 feat(tui): add state change history view with outage duration
CI / test (pull_request) Successful in 2m30s
CI / lint (pull_request) Failing after 51s
CI / vulncheck (pull_request) Successful in 46s
Full-screen scrollable history view accessible via [h] from detail
panel. Shows all state transitions with computed outage durations,
event density sparkline for flapping detection, and summary stats.

- Detail panel STATE CHANGES now shows outage duration per recovery
- Event density sparkline highlights flapping periods
- Summary footer: event count, outage count, avg outage duration
- Vim-style navigation (j/k/g/G) + mouse scroll in history view
2026-06-03 19:49:10 -04:00
lerko c0ad51af9c fix(tui): classify safedial "failed to connect" as TCP
CI / test (pull_request) Successful in 2m29s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
CI / test (push) Successful in 2m41s
CI / lint (push) Successful in 56s
CI / vulncheck (push) Successful in 51s
Error from safedial.go fell through to ErrCatUnknown, showing plain
DOWN instead of DOWN:TCP.
2026-06-03 17:24:31 -04:00
lerko c25614c098 fix(tui): remove error truncation from detail panel
CI / test (pull_request) Successful in 2m29s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
Error row now word-wraps to terminal width instead of hard-truncating.
Probe results and state change errors show full text.
2026-06-03 16:57:27 -04:00
lerko 3d7ab5a49e feat(tui): classify error reasons on DOWN monitors
CI / test (pull_request) Successful in 2m30s
CI / lint (pull_request) Successful in 1m7s
CI / vulncheck (pull_request) Successful in 46s
Categorize raw error strings into DNS/TCP/TLS/HTTP/ICMP/TMO/PRIV
so users get instant triage from the monitor list without opening
the detail panel.

- Status column shows DOWN:DNS, DOWN:TLS, DOWN:HTTP, etc.
- Inline NAME column errors prefixed with category tag [DNS], [TLS]
- Detail panel shows connection chain checklist for HTTP monitors
  (✓ DNS → ✓ TCP → ✗ TLS → · HTTP) pinpointing failure layer
- All display-side only — no database or model changes
2026-06-03 16:33:12 -04:00
lerko 5d362fdbe6 refactor(tui): decompose god files into single-concern modules
CI / test (pull_request) Successful in 2m34s
CI / lint (pull_request) Successful in 56s
CI / vulncheck (pull_request) Successful in 51s
CI / test (push) Successful in 2m32s
CI / lint (push) Successful in 56s
CI / vulncheck (push) Successful in 51s
tui.go (1032→164) and tab_sites.go (993→482) violated "small functions"
and "testable in isolation" standards. Extracted 6 new files by concern:

- format.go: pure formatting functions (fmtLatency, fmtUptime, etc.)
- sparkline.go: sparkline rendering (latency, heartbeat, group)
- update.go: Update method decomposed into 15 named handlers
- view_dashboard.go: View, dashboard composition, tab bar, footer
- view_detail.go: site detail panel
- data.go: data refresh with extracted sortSitesForDisplay/filterSites

Added 17 unit tests for the newly-testable pure functions covering
format, sparkline, sort ordering, and filter logic. No behavioral
changes — strict move-and-extract refactor.
2026-06-02 21:06:30 -04:00
lerko 8f17deba67 chore: migrate module path to lerkolabs org
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Successful in 1m6s
CI / vulncheck (pull_request) Successful in 46s
Move Go module from gitea.lerkolabs.com/lerko/uptop to
gitea.lerkolabs.com/lerkolabs/uptop. Updates all imports,
go.mod, goreleaser owner, and README links.
2026-05-29 14:22:49 -04:00
lerko 026e969b74 chore: TUI screenshots, README polish, changelog rewrite (#32)
CI / test (push) Successful in 2m41s
CI / lint (push) Successful in 1m11s
CI / vulncheck (push) Successful in 56s
- Add 6 TUI screenshots to assets/ (monitors, alerts, logs, nodes, detail, theme)
- Rewrite README with hero image, badges, collapsible install sections
- Rewrite changelog to match actual CalVer tag history
- VHS tooling extracted to lerko/uptop-vhs

Reviewed-on: lerko/uptop#32
2026-05-29 17:45:31 +00:00
lerko cfbf01274d chore(tui): visual polish — detail sections, column headers, alert detail (#37)
CI / test (push) Successful in 2m40s
CI / lint (push) Successful in 1m2s
CI / vulncheck (push) Successful in 51s
Release / docker (push) Has been cancelled
Release / release (push) Has been cancelled
## Summary

Bundled remaining UX polish items from the screenshot review.

### Changes

**Detail panel sections (#5)**
- Fields grouped into ENDPOINT, TIMING, HTTP, CONFIG sections with subtle headers
- Matches existing PROBE RESULTS and STATE CHANGES section pattern
- Cleaner visual hierarchy without box-drawing clutter

**Omit unconfigured fields (#6)**
- Timeout hidden when 0 (unconfigured)
- Method hidden when default GET
- AcceptedCodes shows "200-299" explicitly when empty

**Column header (#7)**
- `LATENCY` → `LAT` (design short, never truncate — htop/btop pattern)

**Alert detail view (#8)**
- `i` key on Alerts tab opens full detail panel
- Shows: type, health status, last sent time, send/fail counts, last error
- Full config key:value pairs (untruncated)
- Keybinding: `[i/Esc] Back  [e] Edit  [t] Test  [q] Quit`

### Files (3)
- `internal/tui/tab_sites.go` — section headers, field omission, LAT header
- `internal/tui/tab_alerts.go` — viewAlertDetailPanel()
- `internal/tui/tui.go` — stateAlertDetail, key handler, render routing

Reviewed-on: lerko/uptop#37
2026-05-28 20:40:29 +00:00
lerko 0aa2f9cd8a feat: alert channel health indicator + test alerts
CI / test (pull_request) Successful in 2m46s
CI / lint (pull_request) Successful in 1m1s
CI / vulncheck (pull_request) Successful in 51s
Track alert delivery health at runtime:
- AlertHealth struct: LastSendAt, LastSendOK, LastError, SendCount, FailCount
- triggerAlert records success/failure after each Send()
- Health data exposed via GetAlertHealth() for TUI

Alerts tab enriched:
- Health dot column: green (OK), red (failed), gray (never sent)
- LAST SENT column: relative time ("2m ago", "never")
- [t] key sends test notification through selected channel

Inspired by Grafana's contact point health columns.
2026-05-27 21:23:06 -04:00
lerko b14d5e19db feat: logs tab overhaul — severity tags, filtering, recovery durations
CI / test (pull_request) Successful in 2m36s
CI / lint (pull_request) Successful in 1m1s
CI / vulncheck (pull_request) Successful in 51s
Logs tab visual overhaul:
- Severity-classified entries: DOWN (red), UP (green), WARN (amber),
  SYS (cyan), info (gray) — rendered as inline tags, not whole-line color
- Column-aligned format: [timestamp] [severity tag] [message]
- Filter toggle (f key): All vs Important only (hides retry noise)
- Header shows entry count, filter state, hidden count

Engine log improvements:
- Recovery messages include downtime duration ("was down 14m")
- LATE transition logged ("heartbeat overdue")
- Push monitor recovery includes downtime duration
2026-05-27 20:14:43 -04:00
lerko 5dc31108f8 feat: proper push monitor lifecycle — PENDING, LATE, DOWN states
CI / test (pull_request) Successful in 2m41s
CI / lint (pull_request) Successful in 1m7s
CI / vulncheck (pull_request) Successful in 46s
Push monitors no longer lie about status:

- PENDING stays until first heartbeat (no auto-promote to UP)
- LATE state (amber) when overdue but within grace period
- DOWN only after grace period expires
- Grace period = interval/2, minimum 60s

RecordHeartbeat now handles all transitions:
- PENDING → UP (first heartbeat, logged)
- LATE → UP (late arrival, logged)
- DOWN → UP (recovery, alert + state change persisted)

TUI updates:
- LATE rendered in amber/warning color
- Status bar shows LATE count separately
- Tab badge shows ⚠ for late monitors
- Sort order: DOWN > LATE > UP > PENDING > PAUSED
- Detail panel shows error for LATE monitors

Inspired by Healthchecks.io state machine (new/up/grace/down).
2026-05-27 19:56:50 -04:00
lerko bc3a44beac feat: show error reason when monitors go DOWN
CI / test (pull_request) Successful in 2m42s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
Propagate check failure reasons through the entire stack:
- Checker captures specific errors (DNS, timeout, HTTP status, SSL, etc.)
- Engine tracks LastError, StatusChangedAt, LastSuccessAt per monitor
- State transitions persisted to new state_changes table
- Detail panel shows error reason, HTTP code, state duration, last
  success time, and last 5 state change events
- Monitor table shows inline error preview for DOWN services
- Alert messages include error reason
- Probe nodes forward error reasons to leader

15 files changed across models, checker, engine, store, TUI, and probes.
2026-05-27 19:32:30 -04:00