Commit Graph

73 Commits

Author SHA1 Message Date
lerko ae141c62ba fix(store): replace panic with error return, handle unmarshal errors
generateToken() now returns (string, error) instead of panicking on
crypto/rand failure. All json.Unmarshal calls for alert settings now
check and propagate errors instead of silently ignoring them.

Adds Close() to Store interface for graceful shutdown support.
Skips malformed notification entries during Kuma import.
2026-05-23 13:15:39 -04:00
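A minimal sketch of the error-returning shape described in ae141c62ba. Only the `generateToken() (string, error)` signature comes from the commit message; the 16-byte length, the hex encoding, and the `generateTokenFrom` helper are assumptions added so the failure path can be exercised.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
)

// generateTokenFrom reads n random bytes from r and hex-encodes them.
// Taking the reader as a parameter is an illustrative choice (hypothetical,
// not from the repository) that makes the error path testable.
func generateTokenFrom(r io.Reader, n int) (string, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		// Propagate the failure to the caller instead of panicking.
		return "", fmt.Errorf("generate token: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

// generateToken mirrors the (string, error) signature named in the commit.
// The 16-byte token length is an assumption.
func generateToken() (string, error) {
	return generateTokenFrom(rand.Reader, 16)
}

// failingReader always errors, standing in for a crypto/rand failure.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) { return 0, io.ErrUnexpectedEOF }

func main() {
	tok, err := generateToken()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(tok)) // 32 hex characters for 16 random bytes
}
```

Callers can then decide whether a failed token generation should abort the request or be retried, which a panic would not allow.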
lerko ba53845193 Merge pull request 'fix(tui): visual polish and layout improvements' (#18) from fix/tui-visual-polish into main
Reviewed-on: lerko/uptime#18
2026-05-23 16:12:57 +00:00
lerko fb11e9ba85 fix(tui): stable monitor count and universal group icons
The site count in the tab label and footer now reflects total monitors
(excluding groups) regardless of collapse state. The down count is also
computed from all sites, so collapsed groups with down children
still surface in the badge. Replaced Nerd Font folder glyphs
with standard Unicode triangles for cross-font compatibility.
2026-05-23 11:01:34 -04:00
lerko e84b64f8ed feat(tui): zebra striping, detail breadcrumb, sparkline stats, collapse persistence
Add alternating row backgrounds for easier table scanning. Detail panel
now shows breadcrumb path (Sites > Group > Name) and min/avg/max latency
stats below the sparkline. Group collapse state persists across restarts
via new preferences table in both SQLite and Postgres.
2026-05-22 20:53:23 -04:00
lerko 88e4f0ed69 fix(tui): group selection highlight, layout constants, group history graphs
Group rows now show selection background when navigated to. Layout
chrome extracted to named constants to prevent viewport drift. Groups
display aggregate history as dot sparkline (●) distinct from site
bar sparklines, with uptime computed from active children only.
Paused and maintenance children excluded from group aggregates.
2026-05-22 20:26:49 -04:00
lerko 8e948bf187 Merge pull request 'feat: incident management and maintenance windows' (#17) from feat/incident-management into main
Reviewed-on: lerko/uptime#17
Tag: 2026.05.2
2026-05-22 23:34:16 +00:00
lerko dc672d6cba fix(tui): exclude maintenance'd monitors from down count and pulse
Sites badge, status line, and pulse indicator now skip monitors under
maintenance when counting DOWN — consistent with group behavior.
2026-05-22 19:25:27 -04:00
lerko a89584dac1 fix(engine): skip children in maintenance when computing group status
Group status now treats children under maintenance like paused ones:
they are excluded from the UP/DOWN calculation. This prevents a group from
showing DOWN when its only failing child is under maintenance.
2026-05-22 19:19:08 -04:00
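The rule in a89584dac1 can be sketched as follows. The `Child` type, its field names, and the choice to report "not down" for a group with no active children are assumptions for illustration, not the engine's actual model.

```go
package main

import "fmt"

// Child is a hypothetical stand-in for a monitor inside a group.
type Child struct {
	Name          string
	Down          bool
	Paused        bool
	InMaintenance bool
}

// groupDown reports whether a group should show DOWN. Paused children and
// children under maintenance are skipped, so they cannot pull the group down.
// A group with no active children is reported as not down (an assumption).
func groupDown(children []Child) bool {
	for _, c := range children {
		if c.Paused || c.InMaintenance {
			continue // excluded from the UP/DOWN calculation
		}
		if c.Down {
			return true
		}
	}
	return false
}

func main() {
	group := []Child{
		{Name: "api"},
		{Name: "db", Down: true, InMaintenance: true},
	}
	fmt.Println(groupDown(group)) // false: the only failing child is under maintenance
}
```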
lerko d437f54797 fix(tui): constrain form height to terminal and forward resize events
Forms overflowed past the terminal because huh didn't know about the
surrounding chrome (header, footer, padding). The TUI now sets WithHeight()
on every render and forwards WindowSizeMsg while a form is active.
2026-05-22 19:06:27 -04:00
lerko b146f34d19 feat: add incident management and maintenance windows
Maintenance windows suppress alerts during planned downtime while checks
continue running. Incidents provide informational tracking. Supports
targeting all monitors, single monitor, or group (applies to children).

New Maint tab in TUI with create/end/delete. Status page, JSON API, and
Prometheus metrics all reflect maintenance state.
2026-05-22 18:45:02 -04:00
lerko 5de834465f Merge pull request 'fix(tui): correct viewport sizing and dynamic chrome calculation' (#16) from fix/tui-viewport-sizing into main
Reviewed-on: lerko/uptime#16
2026-05-22 22:22:10 +00:00
lerko ea401136a9 fix(tui): correct viewport sizing and dynamic chrome calculation
Replace hardcoded row offset with counted chrome lines, account for
filter bar, and fix log viewport dimensions.
2026-05-22 18:19:08 -04:00
lerko 5a9b19b3e8 chore: add production docker-compose.yml
Single-container SQLite deploy for `docker compose up -d`.
2026-05-22 15:00:09 -04:00
lerko b13b1f18b1 Merge pull request 'release: 2026.05.1 — distributed probing, config-as-code, TUI polish' (#15) from develop into main
Reviewed-on: lerko/uptime#15
Tag: 2026.05.1
2026-05-16 20:03:53 +00:00
lerko 887b8240f8 Merge pull request 'fix(security): strip push tokens from /status/json response' (#14) from fix/status-json-token-exposure into develop 2026-05-16 19:57:41 +00:00
lerko 025b1b61d0 fix(security): strip push tokens from /status/json response
The public status JSON endpoint was serializing full Site structs
including heartbeat tokens. An attacker could extract tokens and
forge heartbeats to suppress DOWN alerts. Now tokens are stripped
before encoding. Backup/export endpoint is unaffected.
2026-05-16 15:45:09 -04:00
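One way to strip tokens before encoding, as described in 025b1b61d0. The `Site` fields, JSON tags, and function names here are assumptions; the commit only says that tokens are removed from the public response while the backup/export path keeps them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Site is a hypothetical, reduced version of the real struct.
type Site struct {
	ID     int    `json:"id"`
	Name   string `json:"name"`
	Status string `json:"status"`
	Token  string `json:"token,omitempty"` // push/heartbeat token: must not leak
}

// publicSites returns a copy of sites with every token cleared.
// Working on a copy leaves the caller's slice (and its tokens) untouched.
func publicSites(sites []Site) []Site {
	out := make([]Site, len(sites))
	copy(out, sites)
	for i := range out {
		out[i].Token = ""
	}
	return out
}

// statusJSON encodes the sanitized view for the public endpoint.
func statusJSON(sites []Site) ([]byte, error) {
	return json.Marshal(publicSites(sites))
}

func main() {
	sites := []Site{{ID: 1, Name: "cron", Status: "UP", Token: "secret"}}
	b, err := statusJSON(sites)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(b)) // [{"id":1,"name":"cron","status":"UP"}]
}
```

A stricter alternative is a dedicated response type that has no token field at all, so a newly added sensitive field cannot be serialized by accident.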
lerko fa1042a2ec Merge pull request 'fix: persistent state — uptime, status, latency, and logs survive restarts' (#13) from fix/uptime-percentage into develop 2026-05-16 19:27:24 +00:00
lerko ed082e4080 feat: persist logs to DB, load on startup 2026-05-16 15:25:08 -04:00
lerko 4d375cf874 fix: seed status and latency from DB history on startup 2026-05-16 15:05:28 -04:00
lerko 52c85b11b8 fix(tui): compute uptime from windowed statuses, not running counters 2026-05-16 14:58:34 -04:00
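A small sketch of the windowed calculation named in 52c85b11b8: uptime is derived from the statuses currently in the history window rather than from counters that grow for the life of the process. The function name and the 0% result for an empty window are assumptions.

```go
package main

import "fmt"

// uptimePercent returns the percentage of successful checks in the window.
// window[i] is true when check i succeeded. An empty window yields 0
// (an assumption; the real TUI may render this case differently).
func uptimePercent(window []bool) float64 {
	if len(window) == 0 {
		return 0
	}
	up := 0
	for _, ok := range window {
		if ok {
			up++
		}
	}
	return 100 * float64(up) / float64(len(window))
}

func main() {
	fmt.Println(uptimePercent([]bool{true, true, false, true})) // 75
}
```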
lerko f65ff40a2d Merge pull request 'feat(tui): add type icons to sites table' (#12) from feat/tui-type-icons into develop 2026-05-16 18:41:11 +00:00
lerko 1eddb851b0 feat(tui): add type icons to sites table
Arrow-style icons per monitor type plus Nerd Font folder icons for
groups (closed when collapsed, open when expanded):
  → http, ↓ push, ↔ ping, ⊡ port, ◆ dns, [closed folder]/[open folder] group
2026-05-16 14:35:38 -04:00
lerko 1b223b9725 Merge pull request 'feat(tui): DOWN-first sort, health pulse, filter, and sparkline fixes' (#11) from feat/tui-polish-2 into develop 2026-05-16 18:27:17 +00:00
lerko adf46a1654 fix(tui): increase history buffer to 60 so sparkline fills completely 2026-05-16 14:01:25 -04:00
lerko fc7b6f72e1 fix(tui): sparkline right-aligned — current time at right edge, dots fill left 2026-05-16 13:57:41 -04:00
lerko 1917540731 fix(tui): sparkline now spans full column width 2026-05-16 13:49:20 -04:00
lerko f01533080f feat(tui): split available width evenly between NAME and HISTORY columns 2026-05-16 13:43:34 -04:00
lerko cc9dc24892 fix(tui): sort children by ID before status to prevent map-order shuffling 2026-05-16 13:36:49 -04:00
lerko 426c38ea94 fix(tui): use stable sort to prevent site list shuffling each tick 2026-05-16 13:33:20 -04:00
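The two ordering fixes above (426c38ea94 and cc9dc24892) can be combined in a sketch like this: first impose a deterministic order by ID, then apply a stable sort by status so equal-status rows keep that order on every tick. The `Site` type and `sortSites` name are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Site is a hypothetical row in the sites list.
type Site struct {
	ID   int
	Down bool
}

// sortSites orders DOWN sites first without reshuffling equal-status rows.
func sortSites(sites []Site) {
	// Map iteration order is random in Go, so establish a fixed base order.
	sort.Slice(sites, func(i, j int) bool { return sites[i].ID < sites[j].ID })
	// SliceStable keeps the ID order among rows with the same status.
	sort.SliceStable(sites, func(i, j int) bool {
		return sites[i].Down && !sites[j].Down
	})
}

func main() {
	sites := []Site{{ID: 3}, {ID: 1, Down: true}, {ID: 2}, {ID: 4, Down: true}}
	sortSites(sites)
	for _, s := range sites {
		fmt.Print(s.ID, " ") // 1 4 2 3
	}
	fmt.Println()
}
```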
lerko 22c6022121 feat(tui): DOWN-first sort, health pulse, and site filter
- DOWN/SSL EXP monitors float to top of sites list
- Pulse indicator turns red when any monitor is down, green when healthy
- Press / to filter sites by name, Enter to lock filter, Esc to clear
- Active filter shown in status bar
2026-05-16 13:28:37 -04:00
lerko 95d43e33f0 Merge pull request 'feat(tui): polish pass — status bar, badges, detail panel, modals' (#10) from feat/tui-polish into develop 2026-05-16 17:04:26 +00:00
lerko f2ea0dc758 feat(tui): bordered modals, welcome state, and dynamic name width
- Delete confirmation wrapped in rounded border box with danger color
- Empty sites view shows styled welcome box with onboarding hint
- NAME column width scales with terminal width (13-40 chars)
2026-05-16 12:56:09 -04:00
lerko 3bc8e31b89 fix(tui): make status bar and tab badges visible
- Tab badges now always show count (Sites (12)), not just on failure
- Status bar UP count uses green/red coloring instead of subtle gray
2026-05-16 12:28:09 -04:00
lerko 769954c8f5 feat(tui): add status bar, tab badges, and detail panel
Polish pass for TUI professionalism:
- Status bar replaces generic footer with live stats (UP/DOWN count,
  online probes) plus contextual key hints
- Tab badges show DOWN count on Sites tab and offline count on Nodes tab
- Detail panel (press i) shows full monitor info: URL, latency, uptime,
  SSL, probe results, sparkline — without entering edit mode
2026-05-16 12:25:46 -04:00
lerko 4ac4973eaf Merge pull request 'feat(cluster): add distributed probing foundation' (#9) from feat/distributed-probing-foundation into develop 2026-05-16 16:01:16 +00:00
lerko 0396acdc59 feat(cluster): add region affinity, Nodes TUI tab, and probe metrics
Phase 3 of distributed probing:
- Add regions column to sites table for per-monitor probe affinity
- Region-filtered probe assignments (empty regions = all probes)
- New Nodes TUI tab showing connected probes with status/region/last-seen
- Regions input field in site form for configuring probe affinity
- Config-as-code support for regions (export/import/diff)
- Prometheus upkeep_probe_up metric with per-node labels
- Reindex TUI tabs: Sites, Alerts, Logs, Nodes, Users
2026-05-16 11:50:16 -04:00
lerko ca5a42314f feat(cluster): add probe execution mode, check extraction, and result aggregation
Phase 2 of distributed probing:
- Extract check logic into standalone RunCheck() for use by probes
- Add probe cluster mode: stateless nodes that fetch assignments, execute
  checks, and report results to the leader
- Add multi-node result aggregation with configurable strategy
  (any-down, majority-down, all-down)
- Leader ingests probe results into engine live state and triggers alerts
- New env vars: UPKEEP_NODE_ID, UPKEEP_NODE_NAME, UPKEEP_NODE_REGION,
  UPKEEP_AGG_STRATEGY
- Example docker-compose.probe.yml with leader + 2 regional probes
2026-05-16 11:19:57 -04:00
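The three aggregation strategies named in ca5a42314f might behave as sketched below. The strategy strings come from the commit message; the function name, the strict-majority tie rule, and the handling of empty or unknown input are assumptions.

```go
package main

import "fmt"

// Strategy selects how per-probe results are combined.
type Strategy string

const (
	AnyDown      Strategy = "any-down"
	MajorityDown Strategy = "majority-down"
	AllDown      Strategy = "all-down"
)

// aggregateDown reports whether the monitor counts as DOWN.
// probeDown[i] is true when probe i saw the target as down.
func aggregateDown(s Strategy, probeDown []bool) (bool, error) {
	total := len(probeDown)
	if total == 0 {
		return false, nil // no results yet: treated as not down (assumption)
	}
	down := 0
	for _, d := range probeDown {
		if d {
			down++
		}
	}
	switch s {
	case AnyDown:
		return down > 0, nil
	case MajorityDown:
		return down*2 > total, nil // strict majority; a tie is not DOWN (assumption)
	case AllDown:
		return down == total, nil
	default:
		return false, fmt.Errorf("unknown aggregation strategy %q", s)
	}
}

func main() {
	results := []bool{false, true, false} // one of three probes reports down
	for _, s := range []Strategy{AnyDown, MajorityDown, AllDown} {
		down, err := aggregateDown(s, results)
		fmt.Println(s, down, err)
	}
}
```

any-down alerts fastest but is most exposed to a single flaky probe; all-down is the most conservative.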
lerko ca9faa0acd feat(cluster): add distributed probing foundation — schema, models, and probe APIs
Add node-aware check history and probe registration infrastructure:
- ProbeNode model and nodes table (SQLite + Postgres)
- node_id column on check_history for multi-source tracking
- Store interface: RegisterNode, GetNode, GetAllNodes, DeleteNode, SaveCheckFromNode
- Dialect: UpsertNodeSQL (INSERT OR REPLACE / ON CONFLICT)
- API endpoints: POST /api/probe/register, GET /api/probe/assignments, POST /api/probe/results
- Backward compatible: existing SaveCheck wraps SaveCheckFromNode with empty node_id
2026-05-16 11:05:06 -04:00
lerko c80ef44256 docs: rewrite README, remove upstream references
Replace old README that referenced rdgames1000 Docker images and
goupkeep.org docs. New README reflects current feature set and
credits the original project as the fork source.
2026-05-15 22:02:48 -04:00
lerko 6cbbd4849a Merge pull request 'feat(config): add config-as-code YAML import/export' (#8) from feat/config-as-code into develop
Reviewed-on: lerko/uptime#8
2026-05-16 01:10:29 +00:00
lerko 5b01b9ee30 feat(config): add config-as-code YAML import/export
Add declarative config-as-code support via YAML files. Monitors and
alerts can be exported, version controlled, and applied across instances.

- goupkeep export [-o file.yaml] dumps current state
- goupkeep apply -f file.yaml creates/updates to match desired state
- --dry-run shows planned changes without applying
- --prune deletes monitors/alerts not in the YAML
- Matching by name, alert references by name, nested group children
- CLI refactored to subcommands (apply, export, serve) with backward compat
- 24 tests covering apply, export, validation, round-trip idempotency
2026-05-15 20:40:49 -04:00
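A sketch of the name-based reconciliation described in 5b01b9ee30. It only computes a plan, which is roughly what `--dry-run` would print; `--prune` adds deletions. The `Monitor` and `Action` types and the `plan` function are hypothetical, and the real implementation also handles alerts and nested groups.

```go
package main

import "fmt"

// Monitor is a hypothetical, reduced monitor definition.
type Monitor struct {
	Name string
	URL  string
}

// Action is one planned change: "create", "update", or "delete".
type Action struct {
	Kind string
	Name string
}

// plan compares current state with desired state, matching by name.
// Deletions are only planned when prune is true.
func plan(current, desired []Monitor, prune bool) []Action {
	byName := make(map[string]Monitor, len(current))
	for _, m := range current {
		byName[m.Name] = m
	}
	wanted := make(map[string]bool, len(desired))
	var actions []Action
	for _, d := range desired {
		wanted[d.Name] = true
		cur, exists := byName[d.Name]
		switch {
		case !exists:
			actions = append(actions, Action{Kind: "create", Name: d.Name})
		case cur != d:
			actions = append(actions, Action{Kind: "update", Name: d.Name})
		}
	}
	if prune {
		for _, c := range current {
			if !wanted[c.Name] {
				actions = append(actions, Action{Kind: "delete", Name: c.Name})
			}
		}
	}
	return actions
}

func main() {
	current := []Monitor{{"a", "https://a"}, {"b", "https://b"}, {"c", "https://c"}}
	desired := []Monitor{{"a", "https://a"}, {"b", "https://b2"}, {"d", "https://d"}}
	for _, act := range plan(current, desired, true) {
		fmt.Println(act.Kind, act.Name)
	}
}
```

Applying the same file twice should yield an empty plan the second time, which is the round-trip idempotency the tests in the commit refer to.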
lerko 5a52f738db Merge pull request 'feat(tui): expose HTTP method and accepted status codes' (#7) from feat/expose-http-method-codes into develop
Reviewed-on: lerko/uptime#7
2026-05-15 19:43:28 +00:00
lerko 9e5bb74c5c feat(tui): expose HTTP method and accepted status codes in monitor form
DB fields existed but were never surfaced in the TUI. Adds an HTTP
Settings form group with method select (7 methods) and accepted
codes input, visible only for HTTP monitors.
2026-05-15 15:42:51 -04:00
lerko 4ebba64ba1 Merge pull request 'feat/next: alert providers, prometheus metrics, core refactors' (#6) from feat/next into develop
Reviewed-on: lerko/uptime#6
2026-05-15 19:35:12 +00:00
lerko 079270274f Merge pull request 'feat(metrics): add Prometheus /metrics endpoint' (#5) from feat/prometheus-metrics into feat/next
Reviewed-on: lerko/uptime#5
2026-05-15 15:35:03 +00:00
lerko b7b8aa6f03 feat(metrics): add Prometheus /metrics endpoint
Zero-dependency Prometheus text exposition format. Exposes monitor
up/down, latency, status code, check timestamps, pause state,
SSL cert expiry, and check counters — all from in-memory state.
2026-05-15 11:26:21 -04:00
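For readers unfamiliar with the format in b7b8aa6f03, a dependency-free gauge writer could look like this. The metric name `upkeep_monitor_up`, the `monitor` label, and all function names are assumptions; the help text is assumed to contain no backslashes or newlines.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// Sample is one labeled gauge value (hypothetical type).
type Sample struct {
	Monitor string
	Value   float64
}

// labelEscaper applies the escapes the text format requires in label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

func escapeLabel(v string) string { return labelEscaper.Replace(v) }

// writeGauge emits HELP and TYPE lines followed by one line per sample.
func writeGauge(w io.Writer, name, help string, samples []Sample) {
	fmt.Fprintf(w, "# HELP %s %s\n", name, help)
	fmt.Fprintf(w, "# TYPE %s gauge\n", name)
	for _, s := range samples {
		fmt.Fprintf(w, "%s{monitor=\"%s\"} %s\n",
			name, escapeLabel(s.Monitor), strconv.FormatFloat(s.Value, 'g', -1, 64))
	}
}

// render is a convenience wrapper that returns the exposition as a string.
func render(name, help string, samples []Sample) string {
	var b strings.Builder
	writeGauge(&b, name, help, samples)
	return b.String()
}

func main() {
	writeGauge(os.Stdout, "upkeep_monitor_up", "Monitor status (1 = up, 0 = down).",
		[]Sample{{Monitor: "api", Value: 1}, {Monitor: "db", Value: 0}})
}
```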
lerko 52a54f9c5c feat(alert): add Telegram, PagerDuty, Pushover, Gotify providers
Expand alert provider count from 5 to 9. All new providers use
the shared HTTPProvider with closure-based payload functions.
Includes TUI form support and tests for each provider.
2026-05-15 10:53:38 -04:00
lerko f023e38fdc refactor(monitor): encapsulate engine state, add graceful shutdown and tests
Replace all monitor package-level mutable state with Engine struct.
All state (liveState, logStore, histories, tokenIndex, HTTP clients)
is now encapsulated in Engine, created via NewEngine(store).

Key changes:
- Engine struct holds all monitor state with proper mutex protection
- Engine.Start(ctx) and monitorRoutine respect context cancellation
  for graceful shutdown — no more leaked goroutines
- cluster.runFollowerLoop also respects context for clean exit
- Token index (map[string]int) for O(1) push heartbeat lookup,
  replacing O(n) linear scan through LiveState
- UpdateSiteConfig preserves 8 runtime fields instead of copying
  17 config fields individually
- triggerAlert goroutines get 30s timeout context
- All consumers (TUI, server, cluster, main) receive *Engine via
  constructor/parameter — no package-level state access
- main.go creates context.WithCancel, passes to engine and cluster

First test suite: 12 tests across store and alert packages
- Store: CRUD for sites/alerts/users, push token generation,
  import/export round-trip, check history persistence
- Alert: Discord/Slack/Webhook payload format, HTTP 4xx error
  propagation, Ntfy headers, unknown provider returns nil
2026-05-15 08:21:17 -04:00
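The token index from f023e38fdc, sketched in isolation. The commit names a `map[string]int` guarded by a mutex inside `Engine`; the standalone `tokenIndex` type and its method names below are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenIndex maps push tokens to site IDs for O(1) heartbeat lookup,
// replacing a linear scan over all live sites.
type tokenIndex struct {
	mu      sync.RWMutex
	byToken map[string]int
}

func newTokenIndex() *tokenIndex {
	return &tokenIndex{byToken: make(map[string]int)}
}

// Set registers or updates the site that owns a token.
func (t *tokenIndex) Set(token string, siteID int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.byToken[token] = siteID
}

// Remove drops a token, for example when a site is deleted.
func (t *tokenIndex) Remove(token string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.byToken, token)
}

// Lookup returns the site ID for a token, if the token is known.
func (t *tokenIndex) Lookup(token string) (int, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	id, ok := t.byToken[token]
	return id, ok
}

func main() {
	idx := newTokenIndex()
	idx.Set("abc123", 7)
	id, ok := idx.Lookup("abc123")
	fmt.Println(id, ok) // 7 true
}
```

The index has to be kept in sync whenever a site is created, deleted, or has its token regenerated; otherwise stale tokens would still resolve.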
lerko 0e6dc774cb refactor(tui): extract shared table rendering, fix cursor bounds
- New table_helpers.go with renderTable() and shared styles
- Remove 4 duplicated style blocks (header/cell/selected/border)
  from tab_alerts.go and tab_users.go
- All 3 tab views now use renderTable() for offset/end calc,
  selected row highlighting, and table construction
- Sites tab keeps siteGroupStyle via StyleOverride callback
- Clamp cursor to list length at end of refreshData() to prevent
  index-out-of-bounds after concurrent list changes
- Fix off-by-one in tab click handler (i <= maxTabs → i < tabCount)
2026-05-15 00:49:14 -04:00
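The cursor clamp from 0e6dc774cb amounts to a bounds check after the list changes. The function name is hypothetical; the commit only states that the cursor is clamped to the list length at the end of refreshData().

```go
package main

import "fmt"

// clampCursor keeps cursor within [0, length-1]. For an empty list it
// returns 0, and callers must still check the length before indexing.
func clampCursor(cursor, length int) int {
	if length <= 0 || cursor < 0 {
		return 0
	}
	if cursor >= length {
		return length - 1
	}
	return cursor
}

func main() {
	// The list shrank from 10 rows to 4 while the cursor sat on row 9.
	fmt.Println(clampCursor(9, 4)) // 3
}
```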
lerko d6f33a4d1f refactor(alert): extract shared HTTPProvider for webhook-based alerts
Discord, Slack, and Webhook providers now use a single HTTPProvider
struct with a PayloadFunc for the only part that differs. Centralizes
response body handling and adds HTTP status code checking (4xx/5xx
now return errors instead of being silently ignored).

Email and Ntfy keep separate implementations (different protocols).
Adding a new HTTP-based alert provider is now a one-line PayloadFunc.
2026-05-15 00:46:05 -04:00
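A sketch of the shared-provider idea in d6f33a4d1f. `HTTPProvider` and `PayloadFunc` are named in the commit, but the field names, the `Send` signature, and the stub-server helper below are assumptions. The Discord payload uses the `content` field that Discord webhooks accept.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// PayloadFunc builds the provider-specific request body.
type PayloadFunc func(title, message string) ([]byte, error)

// HTTPProvider holds everything the webhook-style providers share.
type HTTPProvider struct {
	URL     string
	Client  *http.Client
	Payload PayloadFunc
}

// Send posts the payload and treats 4xx/5xx responses as errors.
func (p *HTTPProvider) Send(ctx context.Context, title, message string) error {
	body, err := p.Payload(title, message)
	if err != nil {
		return fmt.Errorf("build payload: %w", err)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, p.URL, bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := p.Client.Do(req)
	if err != nil {
		return fmt.Errorf("send alert: %w", err)
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	if resp.StatusCode >= 400 {
		return fmt.Errorf("send alert: unexpected HTTP status %d", resp.StatusCode)
	}
	return nil
}

// discordPayload is the only provider-specific part in this sketch.
func discordPayload(title, message string) ([]byte, error) {
	return json.Marshal(map[string]string{"content": title + ": " + message})
}

// sendToStub sends one alert to a local test server that replies with status.
func sendToStub(status int) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
	defer srv.Close()
	p := &HTTPProvider{URL: srv.URL, Client: &http.Client{Timeout: 5 * time.Second}, Payload: discordPayload}
	return p.Send(context.Background(), "api", "DOWN")
}

func main() {
	fmt.Println(sendToStub(http.StatusNoContent))           // <nil>
	fmt.Println(sendToStub(http.StatusInternalServerError)) // non-nil error
}
```

With this shape, a Slack variant would differ only in its payload function, for example one that marshals a `text` field.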