release: 2026.05.1 — distributed probing, config-as-code, TUI polish #15

2026-05-16T20:00:16Z

lerko commented

Release 2026.05.1

Major release bringing distributed probing, config-as-code, and significant TUI improvements.

Features

Distributed probing — stateless probe nodes execute checks and report to leader with configurable aggregation (any-down, majority-down, all-down), region affinity, Nodes TUI tab
Config-as-code — YAML export/import with goupkeep export and goupkeep apply, dry-run, prune mode
9 alert providers — added Telegram, PagerDuty, Pushover, Gotify
Prometheus /metrics endpoint
HTTP method + accepted status codes exposed in monitor form
Monitor groups with collapse/expand tree view
Per-site pause
TUI polish — status bar, tab badges, detail panel (i), filter (/), type icons, bordered modals, welcome state, dynamic column widths, health-aware pulse, DOWN-first sort

Fixes

Push tokens stripped from public /status/json (security)
XSS fix in status page and import error responses
Uptime computed from windowed checks, not unbounded counters
Status/latency/logs persist across restarts
Sparkline right-aligned with current time at right edge
Stable sort prevents list shuffling

Refactors

Engine struct encapsulates all monitor state
Store singleton removed, threaded explicitly
SQLite/Postgres unified via Dialect interface
Alert providers share HTTPProvider base
Check logic extracted for probe reuse

lerko added 47 commits 2026-05-16 20:00:17 +00:00

feat(tui,status): add per-site pause, fix viewport, polish status page d5ab3a18a4

Per-site pause: [p] key toggles pause for selected monitor in TUI.
Paused monitors skip checks, persist to DB, show on status page.

Status page: replace full-page reload with fetch-based DOM updates
to eliminate scroll-jump on refresh. Add summary bar (UP/DOWN/PAUSED
counts), stale-data indicator, and fix SSL EXP CSS class bug.

TUI: constrain tables to terminal width via lipgloss .Width() to
prevent row wrapping that pushed header off-screen. Add MaxHeight
safety net. Bump subtle style from #383838 to #565f89 for
readability on dark terminals.

Merge pull request 'feat(tui,status): add per-site pause, fix viewport, polish status page' (#1) from feat/pause into develop 2f8de35d4b

Reviewed-on: lerko/uptime#1

fix(tui,status,store): add delete confirm, input validation, XSS fix, history persistence e97780ad38

Prevent accidental deletes with y/n confirmation dialog. Validate all
numeric form inputs (interval, port, timeout, threshold, retries) with
range checks instead of silently defaulting to zero. Escape user-supplied
data in status page JavaScript to close XSS via monitor names. Persist
check history to new check_history table so sparklines and uptime
percentages survive restarts.

refactor(tui): replace database ID column with row counter cfcd71dabe

Display sequential # instead of internal database IDs in sites, alerts,
and users tables for a cleaner view without gaps from deleted records.

feat(tui): add monitor groups with collapse/expand and tree view c480f519c4

Groups act as visual organizers in the sites table. Monitors can be
assigned to a parent group via the form. Group rows show aggregated
worst-child status, children render with tree chars (├/└), and Space
toggles collapse/expand. Group form hides irrelevant connection and
advanced sections.

Merge pull request 'fix/feat: UX polish, security fixes, groups' (#2) from fix/polish-ux-safety into develop 41a8a90bed

Reviewed-on: lerko/uptime#2

style(tui): add fixed column widths to sites table 77fa6324f2

Use lipgloss StyleFunc to set per-column widths, with NAME as
the flex column absorbing remaining space. History column tied
to sparkWidth for consistency.

fix(core): correctness and robustness fixes across all subsystems 4d5116644f

- Move status page template to package-level template.Must (panic on
  parse error at init instead of nil deref at runtime)
- Fix XSS in import error responses (log detail server-side, return
  generic message to client)
- Handle ListenAndServe errors in HTTP and SSH servers
- Use defer resp.Body.Close() in all alert providers, check
  json.Marshal errors
- Share HTTP clients across checks instead of creating per-request
- Use http.NewRequestWithContext for per-site timeout control
- Support HTTP method field (was always GET despite DB storing method)
- Implement AcceptedCodes validation (was hardcoded >= 400 despite DB
  storing accepted code ranges)
- Add defer tx.Rollback() to ImportData for transaction safety
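
The AcceptedCodes item above could look roughly like the following. This is a minimal sketch, not the project's code: the `codeAccepted` name, the `"200-299,301"` spec syntax, and the fallback for an empty spec are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// codeAccepted reports whether status matches any entry in spec.
// The "200-299,301" spec syntax is assumed for illustration only.
func codeAccepted(spec string, status int) (bool, error) {
	if strings.TrimSpace(spec) == "" {
		// Assumed fallback when nothing is configured: the old ">= 400 fails" rule.
		return status < 400, nil
	}
	for _, part := range strings.Split(spec, ",") {
		first, last, isRange := strings.Cut(strings.TrimSpace(part), "-")
		low, err := strconv.Atoi(strings.TrimSpace(first))
		if err != nil {
			return false, fmt.Errorf("invalid status code %q: %w", part, err)
		}
		high := low // a single code is treated as a range of one
		if isRange {
			if high, err = strconv.Atoi(strings.TrimSpace(last)); err != nil {
				return false, fmt.Errorf("invalid status range %q: %w", part, err)
			}
		}
		if status >= low && status <= high {
			return true, nil
		}
	}
	return false, nil
}
```

With this shape, a monitor configured with `200-299,301` treats a 204 as UP and a 404 as DOWN, and a malformed entry surfaces as an error instead of silently passing.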

refactor(store): unify SQLite and Postgres into dialect-based SQLStore ab75f61c6b

Extract shared SQLStore with Dialect interface for the ~5% that
differs between backends (DDL, placeholders, sequence resets).

- New dialect.go: Dialect interface + placeholder rewriter (? → $N)
- New sqlstore.go: single implementation of all 19 Store methods
- sqlite.go: reduced from 286 to 83 lines (SQLiteDialect only)
- postgres.go: reduced from 266 to 78 lines (PostgresDialect only)
- main.go: use NewSQLiteStore/NewPostgresStore constructors

Zero CRUD logic duplication. Every future schema change is written once.
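
For readers unfamiliar with the `? → $N` rewrite, here is one way such a placeholder rewriter might work. It is a sketch under assumptions: the function name is hypothetical, and the only quoting it understands is single-quoted SQL string literals.

```go
package main

import (
	"strconv"
	"strings"
)

// rewritePlaceholders converts ?-style placeholders into $1, $2, ...
// Question marks inside single-quoted string literals are left alone.
// Hypothetical helper; the real rewriter in dialect.go may differ.
func rewritePlaceholders(query string) string {
	var b strings.Builder
	n := 0
	inQuote := false
	for _, r := range query {
		switch {
		case r == '\'':
			// A doubled quote ('') toggles twice, so the state stays correct.
			inQuote = !inQuote
			b.WriteRune(r)
		case r == '?' && !inQuote:
			n++
			b.WriteByte('$')
			b.WriteString(strconv.Itoa(n))
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}
```

Queries can then be written once in the `?` style and rewritten only when the Postgres dialect is active.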

refactor(store): add error returns to all Store interface methods d4f4012c8a

Every Store method now returns an error. Callers handle errors
gracefully — TUI logs to event log, server returns HTTP 500,
monitor engine logs and retries. All rows.Scan() errors are now
checked in sqlstore.go instead of silently appending corrupt data.

- GetSites, GetAllAlerts, GetAllUsers return ([]T, error)
- GetAlert returns (AlertConfig, error) instead of (AlertConfig, bool)
- AddSite, UpdateSite, DeleteSite, etc. all return error
- SaveCheck, LoadAllHistory, ExportData return error
- ~25 caller sites updated across tui, server, monitor, main

refactor(core): remove store global singleton, thread store explicitly a6bb9a7aff

Remove store.Get()/SetGlobal()/Current. Store is now passed explicitly
to all consumers via constructor parameters and function arguments.

- TUI Model holds store field, set via InitialModel(isAdmin, store)
- monitor.StartEngine(s) and InitHistoryFromStore(s) accept store
- server.Start(cfg, s) closes over store in HTTP handlers
- main.go threads store to SSH server, TUI, monitor, server
- isKeyAllowed receives store as parameter

No more hidden dependency on package-level mutable state in store pkg.
Monitor package still uses package-level state (LiveState, etc.) — will
be encapsulated into Engine struct in Phase 7.

refactor(alert): extract shared HTTPProvider for webhook-based alerts d6f33a4d1f

Discord, Slack, and Webhook providers now use a single HTTPProvider
struct with a PayloadFunc for the only part that differs. Centralizes
response body handling and adds HTTP status code checking (4xx/5xx
now return errors instead of being silently ignored).

Email and Ntfy keep separate implementations (different protocols).
Adding a new HTTP-based alert provider is now a one-line PayloadFunc.
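
The shared-provider idea can be sketched as below. `HTTPProvider` and `PayloadFunc` are names from the commit message, but the field layout, the `Send` signature, and the `statusError` helper are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// PayloadFunc builds the provider-specific request body (signature assumed).
type PayloadFunc func(title, message string) ([]byte, error)

// HTTPProvider is a hypothetical reconstruction of the shared base.
type HTTPProvider struct {
	URL     string
	Client  *http.Client
	Payload PayloadFunc
}

// statusError turns 4xx/5xx responses into errors instead of ignoring them.
func statusError(code int) error {
	if code >= 400 {
		return fmt.Errorf("alert endpoint returned HTTP %d", code)
	}
	return nil
}

// Send posts the payload as JSON and checks the response status.
func (p HTTPProvider) Send(ctx context.Context, title, message string) error {
	body, err := p.Payload(title, message)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, p.URL, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := p.Client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return statusError(resp.StatusCode)
}

// slackPayload is the only Slack-specific part: incoming webhooks accept {"text": ...}.
func slackPayload(title, message string) ([]byte, error) {
	return json.Marshal(map[string]string{"text": title + ": " + message})
}
```

A provider would then be assembled as `HTTPProvider{URL: webhookURL, Client: http.DefaultClient, Payload: slackPayload}`, where `webhookURL` stands in for the configured endpoint.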

refactor(tui): extract shared table rendering, fix cursor bounds 0e6dc774cb

- New table_helpers.go with renderTable() and shared styles
- Remove 4 duplicated style blocks (header/cell/selected/border)
  from tab_alerts.go and tab_users.go
- All 3 tab views now use renderTable() for offset/end calc,
  selected row highlighting, and table construction
- Sites tab keeps siteGroupStyle via StyleOverride callback
- Clamp cursor to list length at end of refreshData() to prevent
  index-out-of-bounds after concurrent list changes
- Fix off-by-one in tab click handler (i <= maxTabs → i < tabCount)

refactor(monitor): encapsulate engine state, add graceful shutdown and tests f023e38fdc

Replace all monitor package-level mutable state with Engine struct.
All state (liveState, logStore, histories, tokenIndex, HTTP clients)
is now encapsulated in Engine, created via NewEngine(store).

Key changes:
- Engine struct holds all monitor state with proper mutex protection
- Engine.Start(ctx) and monitorRoutine respect context cancellation
  for graceful shutdown — no more leaked goroutines
- cluster.runFollowerLoop also respects context for clean exit
- Token index (map[string]int) for O(1) push heartbeat lookup,
  replacing O(n) linear scan through LiveState
- UpdateSiteConfig preserves 8 runtime fields instead of copying
  17 config fields individually
- triggerAlert goroutines get 30s timeout context
- All consumers (TUI, server, cluster, main) receive *Engine via
  constructor/parameter — no package-level state access
- main.go creates context.WithCancel, passes to engine and cluster

First test suite: 12 tests across store and alert packages
- Store: CRUD for sites/alerts/users, push token generation,
  import/export round-trip, check history persistence
- Alert: Discord/Slack/Webhook payload format, HTTP 4xx error
  propagation, Ntfy headers, unknown provider returns nil
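
The token index mentioned above trades a linear scan for a map lookup. The following is a simplified sketch with hypothetical type and method names; the real Engine takes a store and holds far more state.

```go
package main

import "sync"

// pushSite is a trimmed-down stand-in for the real monitor type.
type pushSite struct {
	ID       int
	Token    string
	LastBeat int64 // unix seconds of the last heartbeat
}

// engineSketch holds state behind a mutex, as the commit describes for Engine.
type engineSketch struct {
	mu         sync.RWMutex
	sites      map[int]*pushSite
	tokenIndex map[string]int // push token -> site ID
}

func newEngineSketch() *engineSketch {
	return &engineSketch{
		sites:      make(map[int]*pushSite),
		tokenIndex: make(map[string]int),
	}
}

// addSite registers a site and indexes its token, if it has one.
func (e *engineSketch) addSite(s pushSite) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.sites[s.ID] = &s
	if s.Token != "" {
		e.tokenIndex[s.Token] = s.ID
	}
}

// heartbeat resolves the token in O(1) and records the beat time.
// It returns false for unknown tokens.
func (e *engineSketch) heartbeat(token string, now int64) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	id, ok := e.tokenIndex[token]
	if !ok {
		return false
	}
	e.sites[id].LastBeat = now
	return true
}
```

The index has to be kept in sync whenever a site is added, removed, or has its token regenerated; only the add path is shown here.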

feat(alert): add Telegram, PagerDuty, Pushover, Gotify providers 52a54f9c5c

Expand alert provider count from 5 to 9. All new providers use
the shared HTTPProvider with closure-based payload functions.
Includes TUI form support and tests for each provider.

feat(metrics): add Prometheus /metrics endpoint b7b8aa6f03

Zero-dependency Prometheus text exposition format. Exposes monitor
up/down, latency, status code, check timestamps, pause state,
SSL cert expiry, and check counters — all from in-memory state.
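
As a rough illustration of what a dependency-free exposition writer involves, the sketch below renders a single gauge. The metric name `upkeep_monitor_up`, the `name` label, and the types are assumptions; the commit does not list the actual metric names.

```go
package main

import (
	"fmt"
	"strings"
)

// monitorState is a hypothetical slice of the engine's in-memory state.
type monitorState struct {
	Name string
	Up   bool
}

// labelEscaper applies the escaping required for label values in the
// Prometheus text format: backslash, double quote, and newline.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// renderUpMetric writes one gauge sample per monitor.
func renderUpMetric(monitors []monitorState) string {
	var b strings.Builder
	b.WriteString("# HELP upkeep_monitor_up 1 if the last check succeeded, 0 otherwise.\n")
	b.WriteString("# TYPE upkeep_monitor_up gauge\n")
	for _, m := range monitors {
		value := 0
		if m.Up {
			value = 1
		}
		fmt.Fprintf(&b, "upkeep_monitor_up{name=\"%s\"} %d\n", labelEscaper.Replace(m.Name), value)
	}
	return b.String()
}
```

An HTTP handler would write this string with the `text/plain; version=0.0.4` content type that Prometheus scrapers expect.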

Merge pull request 'feat(metrics): add Prometheus /metrics endpoint' (#5) from feat/prometheus-metrics into feat/next 079270274f

Reviewed-on: lerko/uptime#5

Merge pull request 'feat/next: alert providers, prometheus metrics, core refactors' (#6) from feat/next into develop 4ebba64ba1

Reviewed-on: lerko/uptime#6

feat(tui): expose HTTP method and accepted status codes in monitor form 9e5bb74c5c

DB fields existed but were never surfaced in the TUI. Adds an HTTP
Settings form group with method select (7 methods) and accepted
codes input, visible only for HTTP monitors.

Merge pull request 'feat(tui): expose HTTP method and accepted status codes' (#7) from feat/expose-http-method-codes into develop 5a52f738db

Reviewed-on: lerko/uptime#7

feat(config): add config-as-code YAML import/export 5b01b9ee30

Add declarative config-as-code support via YAML files. Monitors and
alerts can be exported, version controlled, and applied across instances.

- goupkeep export [-o file.yaml] dumps current state
- goupkeep apply -f file.yaml creates/updates to match desired state
- --dry-run shows planned changes without applying
- --prune deletes monitors/alerts not in the YAML
- Matching by name, alert references by name, nested group children
- CLI refactored to subcommands (apply, export, serve) with backward compat
- 24 tests covering apply, export, validation, round-trip idempotency
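
The name-based matching, `--dry-run`, and `--prune` behaviour listed above suggest a plan-then-apply flow. The sketch below computes only the plan; the `Monitor` fields, `Plan` type, and `planApply` function are hypothetical and much simpler than a real monitor config.

```go
package main

// Monitor is a hypothetical, heavily reduced monitor definition.
type Monitor struct {
	Name     string
	URL      string
	Interval int
}

// Plan lists the monitor names that would be created, updated, or deleted.
type Plan struct {
	Create []string
	Update []string
	Delete []string
}

// planApply compares current state with desired state, matching by name.
// Deletions are planned only when prune is set.
func planApply(current, desired []Monitor, prune bool) Plan {
	byName := make(map[string]Monitor, len(current))
	for _, m := range current {
		byName[m.Name] = m
	}
	var p Plan
	seen := make(map[string]bool, len(desired))
	for _, d := range desired {
		seen[d.Name] = true
		cur, exists := byName[d.Name]
		switch {
		case !exists:
			p.Create = append(p.Create, d.Name)
		case cur != d:
			p.Update = append(p.Update, d.Name)
		}
	}
	if prune {
		for _, c := range current {
			if !seen[c.Name] {
				p.Delete = append(p.Delete, c.Name)
			}
		}
	}
	return p
}
```

In this model a dry run prints the plan and stops, and applying the same file twice yields an empty plan the second time, which is the idempotency the round-trip tests are described as covering.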

Merge pull request 'feat(config): add config-as-code YAML import/export' (#8) from feat/config-as-code into develop 6cbbd4849a

Reviewed-on: lerko/uptime#8

docs: rewrite README, remove upstream references c80ef44256

Replace old README that referenced rdgames1000 Docker images and
goupkeep.org docs. New README reflects current feature set and
credits the original project as the fork source.

feat(cluster): add distributed probing foundation — schema, models, and probe APIs ca9faa0acd

Add node-aware check history and probe registration infrastructure:
- ProbeNode model and nodes table (SQLite + Postgres)
- node_id column on check_history for multi-source tracking
- Store interface: RegisterNode, GetNode, GetAllNodes, DeleteNode, SaveCheckFromNode
- Dialect: UpsertNodeSQL (INSERT OR REPLACE / ON CONFLICT)
- API endpoints: POST /api/probe/register, GET /api/probe/assignments, POST /api/probe/results
- Backward compatible: existing SaveCheck wraps SaveCheckFromNode with empty node_id

feat(cluster): add probe execution mode, check extraction, and result aggregation ca5a42314f

Phase 2 of distributed probing:
- Extract check logic into standalone RunCheck() for use by probes
- Add probe cluster mode: stateless nodes that fetch assignments, execute
  checks, and report results to the leader
- Add multi-node result aggregation with configurable strategy
  (any-down, majority-down, all-down)
- Leader ingests probe results into engine live state and triggers alerts
- New env vars: UPKEEP_NODE_ID, UPKEEP_NODE_NAME, UPKEEP_NODE_REGION,
  UPKEEP_AGG_STRATEGY
- Example docker-compose.probe.yml with leader + 2 regional probes
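
The three aggregation strategies can be illustrated with a small function. The strategy strings come from the commit message; the function name, the handling of zero reports, the tie rule for majority-down, and the fallback for unknown strategies are assumptions.

```go
package main

// Strategy names match the values listed for the aggregation setting.
type Strategy string

const (
	AnyDown      Strategy = "any-down"
	MajorityDown Strategy = "majority-down"
	AllDown      Strategy = "all-down"
)

// aggregateDown decides whether a monitor counts as DOWN.
// probeDown[i] is true when probe i reported the target as down.
func aggregateDown(strategy Strategy, probeDown []bool) bool {
	total := len(probeDown)
	if total == 0 {
		return false // assumed: no reports means no DOWN verdict
	}
	down := 0
	for _, d := range probeDown {
		if d {
			down++
		}
	}
	switch strategy {
	case AllDown:
		return down == total
	case MajorityDown:
		return down*2 > total // assumed: a tie is not a majority
	default:
		return down > 0 // any-down, also the assumed fallback
	}
}
```

With one leader and two regional probes, a single failing probe flips the monitor under any-down, while majority-down needs two of the three reports to agree.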

feat(cluster): add region affinity, Nodes TUI tab, and probe metrics 0396acdc59

Phase 3 of distributed probing:
- Add regions column to sites table for per-monitor probe affinity
- Region-filtered probe assignments (empty regions = all probes)
- New Nodes TUI tab showing connected probes with status/region/last-seen
- Regions input field in site form for configuring probe affinity
- Config-as-code support for regions (export/import/diff)
- Prometheus upkeep_probe_up metric with per-node labels
- Reindex TUI tabs: Sites, Alerts, Logs, Nodes, Users
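
The "empty regions = all probes" rule could be expressed as a small predicate. This is a sketch: the comma-separated storage format, the case-insensitive comparison, and the function name are assumptions.

```go
package main

import "strings"

// probeEligible reports whether a probe in probeRegion should be assigned a
// monitor whose regions field is siteRegions (comma-separated, assumed).
func probeEligible(siteRegions, probeRegion string) bool {
	if strings.TrimSpace(siteRegions) == "" {
		return true // empty regions = all probes
	}
	probeRegion = strings.TrimSpace(probeRegion)
	for _, r := range strings.Split(siteRegions, ",") {
		r = strings.TrimSpace(r)
		// Skip empty entries so a trailing comma never matches a region-less probe.
		if r != "" && strings.EqualFold(r, probeRegion) {
			return true
		}
	}
	return false
}
```

The assignments endpoint would then return only the monitors for which this predicate holds for the requesting probe.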

Merge pull request 'feat(cluster): add distributed probing foundation' (#9) from feat/distributed-probing-foundation into develop 4ac4973eaf

feat(tui): add status bar, tab badges, and detail panel 769954c8f5

Polish pass for TUI professionalism:
- Status bar replaces generic footer with live stats (UP/DOWN count,
  online probes) plus contextual key hints
- Tab badges show DOWN count on Sites tab and offline count on Nodes tab
- Detail panel (press i) shows full monitor info: URL, latency, uptime,
  SSL, probe results, sparkline — without entering edit mode

fix(tui): make status bar and tab badges visible 3bc8e31b89

- Tab badges now always show count (Sites (12)), not just on failure
- Status bar UP count uses green/red coloring instead of subtle gray

feat(tui): bordered modals, welcome state, and dynamic name width f2ea0dc758

- Delete confirmation wrapped in rounded border box with danger color
- Empty sites view shows styled welcome box with onboarding hint
- NAME column width scales with terminal width (13-40 chars)

Merge pull request 'feat(tui): polish pass — status bar, badges, detail panel, modals' (#10 ) from feat/tui-polish into develop 95d43e33f0

feat(tui): DOWN-first sort, health pulse, and site filter 22c6022121

- DOWN/SSL EXP monitors float to top of sites list
- Pulse indicator turns red when any monitor is down, green when healthy
- Press / to filter sites by name, Enter to lock filter, Esc to clear
- Active filter shown in status bar

fix(tui): use stable sort to prevent site list shuffling each tick 426c38ea94

fix(tui): sort children by ID before status to prevent map-order shuffling cc9dc24892
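
The two sort fixes above fit a common pattern: establish a deterministic base order first, then apply a stable sort on status so equal rows never swap between ticks. A minimal sketch with a hypothetical `row` type:

```go
package main

import "sort"

// row is a hypothetical stand-in for an entry in the sites table.
type row struct {
	ID   int
	Name string
	Down bool
}

// sortRows orders rows DOWN-first while keeping a fixed order among equals.
func sortRows(rows []row) {
	// Step 1: deterministic base order, independent of map iteration order.
	sort.Slice(rows, func(i, j int) bool { return rows[i].ID < rows[j].ID })
	// Step 2: stable sort by status; rows with equal status keep their ID order.
	sort.SliceStable(rows, func(i, j int) bool {
		return rows[i].Down && !rows[j].Down
	})
}
```

Using `sort.Slice` alone for the status pass would be enough to reintroduce shuffling, because it makes no guarantee about the relative order of equal elements.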

feat(tui): split available width evenly between NAME and HISTORY columns f01533080f

fix(tui): sparkline now spans full column width 1917540731

fix(tui): sparkline right-aligned — current time at right edge, dots fill left fc7b6f72e1
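
Right alignment here means the newest sample always sits in the last cell and missing history is padded on the left. The sketch below shows one way to do that; the block glyphs, the dot used for padding, and the scaling rule are assumptions rather than the TUI's actual rendering.

```go
package main

import "strings"

// sparkBlocks are the eight bar heights, lowest first.
var sparkBlocks = []rune("▁▂▃▄▅▆▇█")

// renderSparkline draws the newest samples in exactly width cells.
// The most recent sample lands at the right edge; dots fill the left.
func renderSparkline(samples []int, width int) string {
	if width <= 0 {
		return ""
	}
	if len(samples) > width {
		samples = samples[len(samples)-width:] // keep only the newest
	}
	peak := 0
	for _, v := range samples {
		if v > peak {
			peak = v
		}
	}
	var b strings.Builder
	b.WriteString(strings.Repeat("·", width-len(samples)))
	for _, v := range samples {
		level := 0
		if peak > 0 && v > 0 {
			level = v * (len(sparkBlocks) - 1) / peak
		}
		b.WriteRune(sparkBlocks[level])
	}
	return b.String()
}
```

Because the output always has `width` cells, the column stays aligned even before the history buffer has filled.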

fix(tui): increase history buffer to 60 so sparkline fills completely adf46a1654

Merge pull request 'feat(tui): DOWN-first sort, health pulse, filter, and sparkline fixes' (#11) from feat/tui-polish-2 into develop 1b223b9725

feat(tui): add type icons to sites table 1eddb851b0

Arrow-style icons per monitor type plus Nerd Font folder icons for
groups (closed when collapsed, open when expanded):
  → http, ↓ push, ↔ ping, ⊡ port, ◆ dns, / group

Merge pull request 'feat(tui): add type icons to sites table' (#12) from feat/tui-type-icons into develop f65ff40a2d

fix(tui): compute uptime from windowed statuses, not running counters 52c85b11b8
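
The difference between windowed statuses and running counters is that old failures eventually age out of the percentage. A minimal sketch, with hypothetical function names and a caller-chosen window size:

```go
package main

// pushWindow appends the newest check result and drops the oldest ones
// once the window holds more than size entries.
func pushWindow(window []bool, success bool, size int) []bool {
	window = append(window, success)
	if len(window) > size {
		window = window[len(window)-size:]
	}
	return window
}

// uptimePercent returns the share of successful checks in the window.
// ok is false for an empty window so callers can render "n/a" instead of 0%.
func uptimePercent(window []bool) (pct float64, ok bool) {
	if len(window) == 0 {
		return 0, false
	}
	up := 0
	for _, success := range window {
		if success {
			up++
		}
	}
	return float64(up) * 100 / float64(len(window)), true
}
```

With running counters, two early failures would drag the figure down indefinitely; with a window of three, they disappear after three newer checks.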

fix: seed status and latency from DB history on startup 4d375cf874

feat: persist logs to DB, load on startup ed082e4080

Merge pull request 'fix: persistent state — uptime, status, latency, and logs survive restarts' (#13 ) from fix/uptime-percentage into develop fa1042a2ec

fix(security): strip push tokens from /status/json response 025b1b61d0

The public status JSON endpoint was serializing full Site structs
including heartbeat tokens. An attacker could extract tokens and
forge heartbeats to suppress DOWN alerts. Now tokens are stripped
before encoding. Backup/export endpoint is unaffected.
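
One defensive way to keep secrets out of a public endpoint is to encode a separate view type that has no token field at all. Note that this is a swapped-in technique, not what the commit describes (it strips tokens from the existing structs before encoding), and every type and field name below is hypothetical.

```go
package main

import "encoding/json"

// Site is a reduced stand-in for the internal model, including the secret.
type Site struct {
	ID    int    `json:"id"`
	Name  string `json:"name"`
	Token string `json:"token"` // push heartbeat secret
}

// publicSite is an allowlist of fields that are safe to publish.
type publicSite struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// statusJSON encodes only the public view, so a field added to Site later
// is not exposed unless it is also added to publicSite.
func statusJSON(sites []Site) ([]byte, error) {
	public := make([]publicSite, 0, len(sites))
	for _, s := range sites {
		public = append(public, publicSite{ID: s.ID, Name: s.Name})
	}
	return json.Marshal(public)
}
```

The backup/export path can keep encoding the full `Site` type, which matches the note that it is unaffected.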

Merge pull request 'fix(security): strip push tokens from /status/json response' (#14) from fix/status-json-token-exposure into develop 887b8240f8

lerko merged commit b13b1f18b1 into main

2026-05-16 20:03:54 +00:00

lerko referenced this issue from a commit

2026-05-16 20:03:54 +00:00

Merge pull request 'release: 2026.05.1 — distributed probing, config-as-code, TUI polish' (#15) from develop into main

lerko referenced this pull request

2026-06-27 15:00:36 +00:00

feat(store): detect overlapping maintenance windows #151
