feat: incident management and maintenance windows #17

Merged
lerko merged 4 commits from feat/incident-management into main 2026-05-22 23:34:16 +00:00

Summary

  • New maintenance_windows DB table (SQLite + PostgreSQL) with full CRUD
  • Maintenance windows suppress alerts while checks continue running — uptime history stays accurate
  • Incident type for informational tracking (no alert suppression)
  • Target scope: all monitors, single monitor, or group (propagates to children)
  • New Maint TUI tab with create (n), end early (x), and delete (d)
  • Duration-based form: 1h/2h/4h/8h/indefinite/custom
  • Sites tab shows MAINT status with purple styling for monitors under maintenance
  • Detail panel shows active maintenance window title
  • Status page: .MAINT CSS class, summary counts MAINT separately
  • /status/json overrides status to "MAINT" for affected monitors
  • Prometheus upkeep_monitor_maintenance gauge
  • Backup/restore includes maintenance windows
  • Fixed pre-existing bug: user delete confirmation (tab index mismatch)

Test plan

  • [x] Create maintenance window via TUI → appears in Maint tab as ACTIVE
  • [x] Take monitor DOWN during maintenance → no alert fires, log shows "suppressed"
  • [x] End maintenance via x → next DOWN triggers alert normally
  • [x] Group targeting → window on group, children show MAINT
  • [x] Status page → MAINT status and purple badge render
  • [ ] /status/json → MAINT status in JSON output
  • [x] Backup/restore → export with window, import on fresh DB
  • [x] Indefinite window → stays active until manually ended
  • [x] Prometheus /metrics → upkeep_monitor_maintenance present
lerko added 1 commit 2026-05-22 22:45:21 +00:00
Maintenance windows suppress alerts during planned downtime while checks
continue running. Incidents provide informational tracking. Supports
targeting all monitors, single monitor, or group (applies to children).

New Maint tab in TUI with create/end/delete. Status page, JSON API, and
Prometheus metrics all reflect maintenance state.
lerko added 1 commit 2026-05-22 23:06:31 +00:00
Forms overflowed past the terminal because huh did not account for the
surrounding chrome (header, footer, padding). The form now sets WithHeight()
on every render and receives WindowSizeMsg while in the form state.
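The height budget behind this fix can be sketched as follows. The `Chrome` struct, its row counts, and the `FormHeight` helper are hypothetical; the idea is only that the value handed to the form's `WithHeight()` is the terminal height from the latest `WindowSizeMsg` minus the rows used by the surrounding layout.

```go
package main

import "fmt"

// Chrome describes the rows taken by the layout around the form.
// The fields and values used below are illustrative, not the PR's numbers.
type Chrome struct {
	Header, Footer, Padding int
}

// minFormHeight keeps the form usable on very small terminals (assumed floor).
const minFormHeight = 3

// FormHeight returns the rows left for the form in a terminal that is
// termHeight rows tall. Padding is counted once above and once below.
func FormHeight(termHeight int, c Chrome) int {
	h := termHeight - c.Header - c.Footer - 2*c.Padding
	if h < minFormHeight {
		return minFormHeight
	}
	return h
}

func main() {
	c := Chrome{Header: 3, Footer: 2, Padding: 1}
	for _, rows := range []int{24, 40, 6} {
		fmt.Printf("terminal=%d form=%d\n", rows, FormHeight(rows, c))
	}
}
```

Recomputing this on every render, rather than once at form creation, is what keeps the form inside the viewport after a resize.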
lerko added 1 commit 2026-05-22 23:19:12 +00:00
Group status now treats children under maintenance like paused ones:
they are excluded from the UP/DOWN calculation. This prevents a group from
showing DOWN when its only failing child is under maintenance.
lerko added 1 commit 2026-05-22 23:25:30 +00:00
The Sites badge, status line, and pulse indicator now skip monitors under
maintenance when counting DOWN, consistent with group behavior.
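The counting rule shared by the last two commits can be sketched like this. The `Child` struct and both helpers are hypothetical names; the sketch only assumes that paused and under-maintenance monitors are skipped before DOWN is counted.

```go
package main

import "fmt"

// Child is a hypothetical view of a monitor inside a group.
type Child struct {
	Name          string
	Up            bool
	Paused        bool
	InMaintenance bool
}

// CountDown counts failing children, skipping paused ones and those
// under an active maintenance window.
func CountDown(children []Child) int {
	n := 0
	for _, c := range children {
		if c.Paused || c.InMaintenance {
			continue
		}
		if !c.Up {
			n++
		}
	}
	return n
}

// GroupDown reports whether the group should display DOWN.
func GroupDown(children []Child) bool {
	return CountDown(children) > 0
}

func main() {
	group := []Child{
		{Name: "api", Up: true},
		{Name: "db", Up: false, InMaintenance: true}, // failing, but under maintenance
	}
	fmt.Println(CountDown(group), GroupDown(group))
}
```

With the only failing child under maintenance, the sketch prints `0 false`, which matches the behavior described in the commit messages.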
lerko merged commit 8e948bf187 into main 2026-05-22 23:34:16 +00:00
lerko deleted branch feat/incident-management 2026-05-22 23:34:17 +00:00
Reference: lerkolabs/uptop#17