feat: show error reason when monitors go DOWN #33
Summary
When a monitor goes DOWN, the app now tells you WHY. This closes the #1 UX gap from the screenshot review.
What changed
- Error propagation: the checker now captures a specific failure reason for all 6 check types, for example:
  - `connection refused`, `timeout`, `HTTP 503 (expected 200-299)`, `SSL certificate expired`
  - `no ICMP response`, `ping failed: <reason>`
  - `dial tcp: connection refused`
  - `DNS query failed: <reason>`, `DNS RCODE: NXDOMAIN`
  - `heartbeat missed`
- State tracking: the engine tracks, per monitor:
  - `LastError`: most recent error string
  - `StatusChangedAt`: when the status last changed
  - `LastSuccessAt`: timestamp of the last UP check
- State change persistence: a new `state_changes` table records every UP↔DOWN transition with its timestamp and error, so the history survives restarts.
- TUI detail panel: now shows:
- Monitor table: DOWN rows show the truncated error inline after the name, in subtle gray.
- Alert messages: include the error reason, e.g. `Monitor 'X' is DOWN: connection refused`.
- Probe nodes: forward error reasons to the leader via the `error_reason` JSON field (backward-compatible).

Files touched (15)
- `internal/models/models.go`: Site fields + StateChange struct
- `internal/monitor/checker.go`: ErrorReason on CheckResult
- `internal/monitor/monitor.go`: handleStatusChange, state tracking, state persistence
- `internal/monitor/aggregator.go`: ErrorReason on NodeResult
- `internal/store/store.go`: SaveStateChange + GetStateChanges interface
- `internal/store/sqlstore.go`: implementations
- `internal/store/sqlite.go` + `postgres.go`: state_changes table
- `internal/tui/tab_sites.go`: detail panel, inline error, fmtDuration
- `internal/cluster/probe.go`: error_reason in probe results
- `internal/server/server.go`: accept + forward probe errors
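For the error-propagation change (`ErrorReason` on `CheckResult` in `checker.go`), here is a minimal sketch of a check that returns its failure reason together with the status. Only the `ErrorReason` field name comes from this PR. The `Up` field, `checkHTTPStatus`, and the range parameters are hypothetical, and the sketch covers only the status-code branch of an HTTP check.

```go
package main

import "fmt"

// CheckResult is a hypothetical, trimmed-down result type; only the
// ErrorReason field name is taken from the PR description.
type CheckResult struct {
	Up          bool
	ErrorReason string // empty when the check succeeded
}

// checkHTTPStatus (hypothetical) converts a status code into a result,
// treating any code in [expectedMin, expectedMax] as UP.
func checkHTTPStatus(code, expectedMin, expectedMax int) CheckResult {
	if code < expectedMin || code > expectedMax {
		return CheckResult{
			Up:          false,
			ErrorReason: fmt.Sprintf("HTTP %d (expected %d-%d)", code, expectedMin, expectedMax),
		}
	}
	return CheckResult{Up: true}
}

func main() {
	r := checkHTTPStatus(503, 200, 299)
	fmt.Println(r.Up, r.ErrorReason) // false HTTP 503 (expected 200-299)
}
```

Because the reason is a plain string, it can flow unchanged into `LastError`, alerts, and probe payloads.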
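The monitor table shows a truncated error inline on DOWN rows. This sketch shows rune-safe truncation with an ellipsis. `truncateReason` and the width limit are hypothetical, and the gray styling is left out because the TUI library in use is not named here.

```go
package main

import "fmt"

// truncateReason (hypothetical) shortens s to at most limit runes, ending
// with "…" when it has to cut. Counting runes keeps multi-byte characters
// intact; terminal width can still differ for wide characters.
func truncateReason(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	r := []rune(s)
	if len(r) <= limit {
		return s
	}
	return string(r[:limit-1]) + "…"
}

func main() {
	fmt.Println(truncateReason("dial tcp: connection refused", 12)) // dial tcp: c…
}
```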
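Alert messages now carry the reason, as in `Monitor 'X' is DOWN: connection refused`. Here is a small sketch of building that text. `alertText` is hypothetical, and the reason-less fallback wording is an assumption.

```go
package main

import "fmt"

// alertText (hypothetical) builds the DOWN alert body, appending the
// reason when one is known and using an assumed plain message otherwise.
func alertText(name, reason string) string {
	if reason == "" {
		return fmt.Sprintf("Monitor '%s' is DOWN", name)
	}
	return fmt.Sprintf("Monitor '%s' is DOWN: %s", name, reason)
}

func main() {
	fmt.Println(alertText("X", "connection refused")) // Monitor 'X' is DOWN: connection refused
}
```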