Replace continuous surplus distribution with two fixed layouts per table.
Breakpoint at 120 columns — matches how btop/k9s do it.
Compact (<120): short headers (LAT, UP%, RT, ST, MON, SENT, VER),
tight fixed widths, no surplus guessing.
Wide (≥120): full headers (LATENCY, UPTIME, RETRIES, STATUS, MONITORS,
LAST SENT, VERSION), generous widths.
Sites tab keeps content-aware NAME sizing + sparkline flex.
All other tabs (Alerts, Maint, Nodes, Users) use simple fixed tiers.
Removed old computeTableLayout/colDef/tierCol/pickTier — no longer needed.
Extract shared computeTableLayout() into table_helpers.go — takes column
definitions with short/full headers, min/max widths, and a flex column
that absorbs surplus space. All tabs now use it:
- Alerts: CONFIG column is flex, NAME/TYPE/SENT expand with width
- Maint: TITLE column is flex, TYPE/MONITORS/STATUS/dates expand
- Nodes: NAME column is flex, REGION/LAST SEEN/VERSION expand
- Users: PUBLIC KEY column is flex, USERNAME expands
- Sites: uses same colDef type (keeps special dual-flex for NAME+HISTORY)
Headers auto-switch short/full based on available width across all tabs.
Was hardcoded to 30 — actual overhead is 2 (borders) + numCols-1
(separators) = 10 for 9 columns. The 20-char gap was being
redistributed by lipgloss into columns like # making them too wide.
lipgloss table with Width(tableWidth) redistributes surplus space across
all columns. Adding MaxWidth() caps each column to its computed width.
Also dump any remaining surplus into the HISTORY sparkline column.
Was width=0 (auto) which let lipgloss over-allocate the column,
causing visible empty space between truncated names and TYPE column.
Now set to nameW explicitly so column width = truncation limit.
Replace hardcoded column widths with dynamic layout system:
- Each column has short/full header and min/max width
- At narrow terminals: LAT, UP%, RT, compact widths
- At wide terminals: LATENCY, UPTIME, RETRIES, expanded widths
- Surplus space distributed left-to-right across expandable columns
- Headers switch between short/full based on actual column width
Column definitions:
# (4-6) TYPE (8-10) STATUS (8-10) LAT/LATENCY (5-10)
UP%/UPTIME (5-8) SSL (5-7) RT/RETRIES (5-9)
Detail panel:
- Grouped fields into sections (ENDPOINT, TIMING, HTTP, CONFIG)
- Omit Timeout when 0 (unconfigured)
- Omit Method when default GET
- Show explicit "200-299" when AcceptedCodes empty
Table:
- LATENCY header → LAT (design short, never truncate)
Alerts:
- Press [i] for alert detail panel: full config, health status,
send counts, last error
- Keybinding display updated with [i]Info
Bundled remaining UX polish items from screenshot review.
Push monitors no longer lie about status:
- PENDING stays until first heartbeat (no auto-promote to UP)
- LATE state (amber) when overdue but within grace period
- DOWN only after grace period expires
- Grace period = interval/2, minimum 60s
RecordHeartbeat now handles all transitions:
- PENDING → UP (first heartbeat, logged)
- LATE → UP (late arrival, logged)
- DOWN → UP (recovery, alert + state change persisted)
TUI updates:
- LATE rendered in amber/warning color
- Status bar shows LATE count separately
- Tab badge shows ⚠ for late monitors
- Sort order: DOWN > LATE > UP > PENDING > PAUSED
- Detail panel shows error for LATE monitors
Inspired by Healthchecks.io state machine (new/up/grace/down).
Propagate check failure reasons through the entire stack:
- Checker captures specific errors (DNS, timeout, HTTP status, SSL, etc.)
- Engine tracks LastError, StatusChangedAt, LastSuccessAt per monitor
- State transitions persisted to new state_changes table
- Detail panel shows error reason, HTTP code, state duration, last
success time, and last 5 state change events
- Monitor table shows inline error preview for DOWN services
- Alert messages include error reason
- Probe nodes forward error reasons to leader
15 files changed across models, checker, engine, store, TUI, and probes.
Flexoki Dark (default), Flexoki Light, Catppuccin Mocha, Nord.
Press T to cycle themes; selection persists in preferences.
All hardcoded colors replaced with theme-driven values.
Dedicated ZebraBg per theme for subtle row striping.
Consistent with interval/timeout validators that already skip for
group monitors. Prevents potential validation block if field is
cleared while editing.
URL, SSL threshold, and port validators blocked form progression
when editing monitors that don't use those fields (e.g. ping monitors
failing URL validation, non-SSL sites failing threshold check).
Scope each validator to fire only for its relevant monitor type.
Site count in tab label and footer now reflects total monitors
(excluding groups) regardless of collapse state. Down count also
computed from all sites so collapsed groups with down children
still surface in the badge. Replaced Nerd Font folder glyphs
with standard Unicode triangles for cross-font compatibility.
Add alternating row backgrounds for easier table scanning. Detail panel
now shows breadcrumb path (Sites > Group > Name) and min/avg/max latency
stats below the sparkline. Group collapse state persists across restarts
via new preferences table in both SQLite and Postgres.
Group rows now show selection background when navigated to. Layout
chrome extracted to named constants to prevent viewport drift. Groups
display aggregate history as dot sparkline (●) distinct from site
bar sparklines, with uptime computed from active children only.
Paused and maintenance children excluded from group aggregates.
Maintenance windows suppress alerts during planned downtime while checks
continue running. Incidents provide informational tracking. Supports
targeting all monitors, single monitor, or group (applies to children).
New Maint tab in TUI with create/end/delete. Status page, JSON API, and
Prometheus metrics all reflect maintenance state.
Arrow-style icons per monitor type plus Nerd Font folder icons for
groups (closed when collapsed, open when expanded):
→ http, ↓ push, ↔ ping, ⊡ port, ◆ dns, / group
- Delete confirmation wrapped in rounded border box with danger color
- Empty sites view shows styled welcome box with onboarding hint
- NAME column width scales with terminal width (13-40 chars)
Polish pass for TUI professionalism:
- Status bar replaces generic footer with live stats (UP/DOWN count,
online probes) plus contextual key hints
- Tab badges show DOWN count on Sites tab and offline count on Nodes tab
- Detail panel (press i) shows full monitor info: URL, latency, uptime,
SSL, probe results, sparkline — without entering edit mode
Phase 3 of distributed probing:
- Add regions column to sites table for per-monitor probe affinity
- Region-filtered probe assignments (empty regions = all probes)
- New Nodes TUI tab showing connected probes with status/region/last-seen
- Regions input field in site form for configuring probe affinity
- Config-as-code support for regions (export/import/diff)
- Prometheus upkeep_probe_up metric with per-node labels
- Reindex TUI tabs: Sites, Alerts, Logs, Nodes, Users
DB fields existed but were never surfaced in the TUI. Adds an HTTP
Settings form group with method select (7 methods) and accepted
codes input, visible only for HTTP monitors.
Replace all monitor package-level mutable state with Engine struct.
All state (liveState, logStore, histories, tokenIndex, HTTP clients)
is now encapsulated in Engine, created via NewEngine(store).
Key changes:
- Engine struct holds all monitor state with proper mutex protection
- Engine.Start(ctx) and monitorRoutine respect context cancellation
for graceful shutdown — no more leaked goroutines
- cluster.runFollowerLoop also respects context for clean exit
- Token index (map[string]int) for O(1) push heartbeat lookup,
replacing O(n) linear scan through LiveState
- UpdateSiteConfig preserves 8 runtime fields instead of copying
17 config fields individually
- triggerAlert goroutines get 30s timeout context
- All consumers (TUI, server, cluster, main) receive *Engine via
constructor/parameter — no package-level state access
- main.go creates context.WithCancel, passes to engine and cluster
First test suite: 12 tests across store and alert packages
- Store: CRUD for sites/alerts/users, push token generation,
import/export round-trip, check history persistence
- Alert: Discord/Slack/Webhook payload format, HTTP 4xx error
propagation, Ntfy headers, unknown provider returns nil
- New table_helpers.go with renderTable() and shared styles
- Remove 4 duplicated style blocks (header/cell/selected/border)
from tab_alerts.go and tab_users.go
- All 3 tab views now use renderTable() for offset/end calc,
selected row highlighting, and table construction
- Sites tab keeps siteGroupStyle via StyleOverride callback
- Clamp cursor to list length at end of refreshData() to prevent
index-out-of-bounds after concurrent list changes
- Fix off-by-one in tab click handler (i <= maxTabs → i < tabCount)
Remove store.Get()/SetGlobal()/Current. Store is now passed explicitly
to all consumers via constructor parameters and function arguments.
- TUI Model holds store field, set via InitialModel(isAdmin, store)
- monitor.StartEngine(s) and InitHistoryFromStore(s) accept store
- server.Start(cfg, s) closes over store in HTTP handlers
- main.go threads store to SSH server, TUI, monitor, server
- isKeyAllowed receives store as parameter
No more hidden dependency on package-level mutable state in store pkg.
Monitor package still uses package-level state (LiveState, etc.) — will
be encapsulated into Engine struct in Phase 7.
Every Store method now returns an error. Callers handle errors
gracefully — TUI logs to event log, server returns HTTP 500,
monitor engine logs and retries. All rows.Scan() errors are now
checked in sqlstore.go instead of silently appending corrupt data.
- GetSites, GetAllAlerts, GetAllUsers return ([]T, error)
- GetAlert returns (AlertConfig, error) instead of (AlertConfig, bool)
- AddSite, UpdateSite, DeleteSite, etc. all return error
- SaveCheck, LoadAllHistory, ExportData return error
- ~25 caller sites updated across tui, server, monitor, main
Use lipgloss StyleFunc to set per-column widths, with NAME as
the flex column absorbing remaining space. History column tied
to sparkWidth for consistency.
Groups act as visual organizers in the sites table. Monitors can be
assigned to a parent group via the form. Group rows show aggregated
worst-child status, children render with tree chars (├/└), and Space
toggles collapse/expand. Group form hides irrelevant connection and
advanced sections.
Prevent accidental deletes with y/n confirmation dialog. Validate all
numeric form inputs (interval, port, timeout, threshold, retries) with
range checks instead of silently defaulting to zero. Escape user-supplied
data in status page JavaScript to close XSS via monitor names. Persist
check history to new check_history table so sparklines and uptime
percentages survive restarts.
Per-site pause: [p] key toggles pause for selected monitor in TUI.
Paused monitors skip checks, persist to DB, show on status page.
Status page: replace full-page reload with fetch-based DOM updates
to eliminate scroll-jump on refresh. Add summary bar (UP/DOWN/PAUSED
counts), stale-data indicator, and fix SSL EXP CSS class bug.
TUI: constrain tables to terminal width via lipgloss .Width() to
prevent row wrapping that pushed header off-screen. Add MaxHeight
safety net. Bump subtle style from #383838 to #565f89 for
readability on dark terminals.
Add Hostname, Port, Timeout, Method, Description, ParentID, AcceptedCodes,
DNSResolveType, DNSServer, and IgnoreTLS fields. Refactor AddSite/UpdateSite
to accept models.Site instead of individual params. Includes DB migrations
for existing databases, per-monitor timeout/TLS in the engine, new type
options in TUI forms, and TYPE column in the sites table.