21 Commits

Author SHA1 Message Date
lerko ec898ff943 Merge pull request 'fix(ci): use docker-builder runner for image builds' (#40) from fix/docker-release into main
CI / test (push) Successful in 2m36s
CI / lint (push) Successful in 1m11s
CI / vulncheck (push) Successful in 56s
Release / release (push) Successful in 2m28s
Release / docker (push) Successful in 26m59s
Reviewed-on: #40
2026-05-29 22:38:24 +00:00
lerko 38c7739995 fix(ci): use docker-builder runner for Docker image builds
CI / lint (pull_request) Successful in 2m14s
CI / vulncheck (pull_request) Successful in 51s
CI / test (pull_request) Successful in 3m59s
2026-05-29 18:01:07 -04:00
lerko 5679dffffa fix(ci): use internal Gitea URL for GoReleaser API calls
CI / test (push) Successful in 2m49s
CI / lint (push) Successful in 1m11s
CI / vulncheck (push) Successful in 1m1s
Release / release (push) Successful in 2m18s
Release / docker (push) Failing after 3m38s
2026-05-29 17:26:57 -04:00
lerko 9a4985e355 Merge pull request 'fix(ci): install git and gcc for GoReleaser' (#39) from fix/release-pipeline into main
CI / test (push) Successful in 2m36s
CI / lint (push) Successful in 1m22s
CI / vulncheck (push) Successful in 46s
Release / release (push) Failing after 2m12s
Release / docker (push) Has been skipped
Reviewed-on: #39
2026-05-29 20:13:01 +00:00
lerko 65406ce69c fix(ci): install git and gcc for GoReleaser in release pipeline
CI / test (pull_request) Successful in 2m49s
CI / lint (pull_request) Successful in 1m12s
CI / vulncheck (pull_request) Successful in 56s
2026-05-29 16:02:28 -04:00
lerko 2474b341ad chore: clean up dockerignore
CI / test (push) Successful in 2m36s
CI / lint (push) Successful in 1m11s
CI / vulncheck (push) Successful in 56s
Release / release (push) Failing after 30s
Release / docker (push) Has been skipped
2026-05-29 15:42:51 -04:00
lerko b0762800ac docs: update changelog for 2026.05.5
CI / test (push) Successful in 2m49s
CI / lint (push) Successful in 1m12s
CI / vulncheck (push) Successful in 46s
2026-05-29 15:37:49 -04:00
lerko 08bcdd6481 chore: move docker-compose files to deploy/
CI / test (push) Successful in 2m54s
CI / lint (push) Successful in 1m12s
CI / vulncheck (push) Successful in 56s
2026-05-29 15:30:49 -04:00
lerko ebf8bfb097 chore: add CI status badge to README
CI / test (push) Successful in 2m44s
CI / lint (push) Successful in 1m11s
CI / vulncheck (push) Successful in 46s
2026-05-29 15:17:09 -04:00
lerko b62a721277 Merge pull request 'chore: migrate module path to lerkolabs org' (#38) from chore/org-namespace into main
CI / test (push) Successful in 2m36s
CI / lint (push) Successful in 1m1s
CI / vulncheck (push) Successful in 56s
Reviewed-on: #38
2026-05-29 19:07:06 +00:00
lerko 8f17deba67 chore: migrate module path to lerkolabs org
CI / test (pull_request) Successful in 2m39s
CI / lint (pull_request) Successful in 1m6s
CI / vulncheck (pull_request) Successful in 46s
Move Go module from gitea.lerkolabs.com/lerko/uptop to
gitea.lerkolabs.com/lerkolabs/uptop. Updates all imports,
go.mod, goreleaser owner, and README links.
2026-05-29 14:22:49 -04:00
lerko 026e969b74 chore: TUI screenshots, README polish, changelog rewrite (#32)
CI / test (push) Successful in 2m41s
CI / lint (push) Successful in 1m11s
CI / vulncheck (push) Successful in 56s
- Add 6 TUI screenshots to assets/ (monitors, alerts, logs, nodes, detail, theme)
- Rewrite README with hero image, badges, collapsible install sections
- Rewrite changelog to match actual CalVer tag history
- VHS tooling extracted to lerko/uptop-vhs

Reviewed-on: lerko/uptop#32
2026-05-29 17:45:31 +00:00
lerko cfbf01274d chore(tui): visual polish — detail sections, column headers, alert detail (#37)
CI / test (push) Successful in 2m40s
CI / lint (push) Successful in 1m2s
CI / vulncheck (push) Successful in 51s
Release / docker (push) Has been cancelled
Release / release (push) Has been cancelled
## Summary

Bundled remaining UX polish items from the screenshot review.

### Changes

**Detail panel sections (#5)**
- Fields grouped into ENDPOINT, TIMING, HTTP, CONFIG sections with subtle headers
- Matches existing PROBE RESULTS and STATE CHANGES section pattern
- Cleaner visual hierarchy without box-drawing clutter

**Omit unconfigured fields (#6)**
- Timeout hidden when 0 (unconfigured)
- Method hidden when default GET
- AcceptedCodes shows "200-299" explicitly when empty

**Column header (#7)**
- `LATENCY` → `LAT` (design short, never truncate — htop/btop pattern)

**Alert detail view (#8)**
- `i` key on Alerts tab opens full detail panel
- Shows: type, health status, last sent time, send/fail counts, last error
- Full config key:value pairs (untruncated)
- Keybinding: `[i/Esc] Back  [e] Edit  [t] Test  [q] Quit`

### Files (3)
- `internal/tui/tab_sites.go` — section headers, field omission, LAT header
- `internal/tui/tab_alerts.go` — viewAlertDetailPanel()
- `internal/tui/tui.go` — stateAlertDetail, key handler, render routing

Reviewed-on: lerko/uptop#37
2026-05-28 20:40:29 +00:00
lerko 26e297cbae Merge pull request 'feat: alert channel health indicator + test alerts' (#36) from feat/alert-health into main
CI / test (push) Successful in 2m48s
CI / lint (push) Successful in 1m17s
CI / vulncheck (push) Successful in 1m6s
Reviewed-on: lerko/uptop#36
2026-05-28 01:33:00 +00:00
lerko 0aa2f9cd8a feat: alert channel health indicator + test alerts
CI / test (pull_request) Successful in 2m46s
CI / lint (pull_request) Successful in 1m1s
CI / vulncheck (pull_request) Successful in 51s
Track alert delivery health at runtime:
- AlertHealth struct: LastSendAt, LastSendOK, LastError, SendCount, FailCount
- triggerAlert records success/failure after each Send()
- Health data exposed via GetAlertHealth() for TUI

Alerts tab enriched:
- Health dot column: green (OK), red (failed), gray (never sent)
- LAST SENT column: relative time ("2m ago", "never")
- [t] key sends test notification through selected channel

Inspired by Grafana's contact point health columns.
2026-05-27 21:23:06 -04:00
lerko f17f06a1c6 Merge pull request 'feat: logs tab overhaul — severity tags, filtering, recovery durations' (#35) from feat/logs-overhaul into main
CI / test (push) Successful in 2m47s
CI / lint (push) Successful in 1m16s
CI / vulncheck (push) Successful in 56s
Reviewed-on: lerko/uptop#35
2026-05-28 00:35:24 +00:00
lerko b14d5e19db feat: logs tab overhaul — severity tags, filtering, recovery durations
CI / test (pull_request) Successful in 2m36s
CI / lint (pull_request) Successful in 1m1s
CI / vulncheck (pull_request) Successful in 51s
Logs tab visual overhaul:
- Severity-classified entries: DOWN (red), UP (green), WARN (amber),
  SYS (cyan), info (gray) — rendered as inline tags, not whole-line color
- Column-aligned format: [timestamp] [severity tag] [message]
- Filter toggle (f key): All vs Important only (hides retry noise)
- Header shows entry count, filter state, hidden count

Engine log improvements:
- Recovery messages include downtime duration ("was down 14m")
- LATE transition logged ("heartbeat overdue")
- Push monitor recovery includes downtime duration
2026-05-27 20:14:43 -04:00
lerko a2b38ddc60 Merge pull request 'feat: proper push monitor lifecycle — PENDING, LATE, DOWN' (#34) from feat/push-monitor-states into main
CI / test (push) Successful in 2m48s
CI / lint (push) Successful in 1m17s
CI / vulncheck (push) Successful in 56s
Reviewed-on: lerko/uptop#34
2026-05-28 00:01:56 +00:00
lerko 5dc31108f8 feat: proper push monitor lifecycle — PENDING, LATE, DOWN states
CI / test (pull_request) Successful in 2m41s
CI / lint (pull_request) Successful in 1m7s
CI / vulncheck (pull_request) Successful in 46s
Push monitors no longer lie about status:

- PENDING stays until first heartbeat (no auto-promote to UP)
- LATE state (amber) when overdue but within grace period
- DOWN only after grace period expires
- Grace period = interval/2, minimum 60s

RecordHeartbeat now handles all transitions:
- PENDING → UP (first heartbeat, logged)
- LATE → UP (late arrival, logged)
- DOWN → UP (recovery, alert + state change persisted)

TUI updates:
- LATE rendered in amber/warning color
- Status bar shows LATE count separately
- Tab badge shows ⚠ for late monitors
- Sort order: DOWN > LATE > UP > PENDING > PAUSED
- Detail panel shows error for LATE monitors

Inspired by Healthchecks.io state machine (new/up/grace/down).
2026-05-27 19:56:50 -04:00
lerko 63773b13d0 Merge pull request 'feat: show error reason when monitors go DOWN' (#33) from feat/error-reason into main
CI / test (push) Successful in 2m51s
CI / lint (push) Successful in 1m1s
CI / vulncheck (push) Successful in 56s
Reviewed-on: lerko/uptop#33
2026-05-27 23:38:26 +00:00
lerko bc3a44beac feat: show error reason when monitors go DOWN
CI / test (pull_request) Successful in 2m42s
CI / lint (pull_request) Successful in 1m11s
CI / vulncheck (pull_request) Successful in 51s
Propagate check failure reasons through the entire stack:
- Checker captures specific errors (DNS, timeout, HTTP status, SSL, etc.)
- Engine tracks LastError, StatusChangedAt, LastSuccessAt per monitor
- State transitions persisted to new state_changes table
- Detail panel shows error reason, HTTP code, state duration, last
  success time, and last 5 state change events
- Monitor table shows inline error preview for DOWN services
- Alert messages include error reason
- Probe nodes forward error reasons to leader

15 files changed across models, checker, engine, store, TUI, and probes.
2026-05-27 19:32:30 -04:00
52 changed files with 1279 additions and 372 deletions
+4 -9
View File
@@ -1,15 +1,10 @@
.git
.ssh/
.gitea/
tmp/
vendor/
# Security: keep sensitive/local files out of Docker build context
.ssh/
.claude/
.github/
.gitea/
CLAUDE.md
*.db
*.db-journal
*.local.json
*.local.md
*.local
*.db
*.db-journal
+8 -1
View File
@@ -8,7 +8,13 @@ on:
jobs:
release:
runs-on: ubuntu-latest
defaults:
run:
shell: sh
steps:
- name: Install build tools
run: apk add --no-cache git gcc musl-dev
- uses: actions/checkout@v4
with:
fetch-depth: 0
@@ -34,9 +40,10 @@ jobs:
env:
GORELEASER_FORCE_TOKEN: gitea
GITEA_TOKEN: ${{ secrets.RELEASE_TOKEN }}
GITEA_API_URL: http://gitea:3000/api/v1
docker:
runs-on: ubuntu-latest
runs-on: docker-builder
needs: [release]
steps:
- uses: actions/checkout@v4
+1 -1
View File
@@ -27,7 +27,7 @@ go.work
# End of https://www.toptal.com/developers/gitignore/api/go
/uptop
uptop.db
uptop.db*
.ssh
+2 -2
View File
@@ -1,12 +1,12 @@
version: 2
gitea_urls:
api: https://gitea.lerkolabs.com/api/v1
api: "{{ if index .Env \"GITEA_API_URL\" }}{{ .Env.GITEA_API_URL }}{{ else }}https://gitea.lerkolabs.com/api/v1{{ end }}"
download: https://gitea.lerkolabs.com
release:
gitea:
owner: lerko
owner: lerkolabs
name: uptop
builds:
+78 -30
View File
@@ -1,46 +1,94 @@
# Changelog
## [2026.05.2] — 2026-05-23
## [2026.05.5] — 2026-05-29
### Added
- Comprehensive test suite (94 tests across monitor, server, cluster)
- golangci-lint config with CI enforcement
- Gitea Actions CI pipeline (test + lint)
- Graceful shutdown for HTTP and SSH servers
- Context-aware alert delivery with timeout
- Request size limits on all POST endpoints
- Constant-time secret comparison
- Check interval jitter to prevent thundering herd
- `--version` flag with build metadata injection
- Error reason display when monitors go DOWN (#33)
- Push monitor lifecycle — PENDING, LATE, DOWN states (#34)
- Logs tab overhaul — severity tags, filtering, recovery durations (#35)
- Alert channel health indicator and test alerts (#36)
- TUI screenshots in `assets/` (#32)
- CI status badge in README
### Fixed
- Silent JSON unmarshal failures in alert settings
- Panic on crypto/rand failure replaced with error return
- Alert delivery errors now logged instead of swallowed
- log.Fatalf in goroutines replaced with log.Printf
- Deprecated LineUp/LineDown API calls
### Changed
- Visual polish — detail sections, column headers, alert detail (#37)
- README rewritten with hero image, badges, collapsible install sections (#32)
- Changelog rewritten to match actual CalVer tag history
- Migrated to `lerkolabs` org namespace (#38)
- Docker-compose files moved to `deploy/`
## [2026.05.4] — 2026-05-27
### Added
- SSH user seeding from `UPTOP_ADMIN_KEY` env var and `UPTOP_KEYS` file (#31)
- GoReleaser for binary releases
- govulncheck in CI pipeline
- Multi-arch Docker builds (amd64 + arm64)
### Changed
- CI overhaul — Go 1.26, build caching, streamlined pipeline (#30)
- Bumped golang.org/x/crypto v0.47.0 → v0.52.0
- Bumped Alpine 3.21 → 3.23
### Security
- Cluster secret compared with crypto/subtle (timing-safe)
- http.MaxBytesReader on all JSON endpoints
- ReadHeaderTimeout added to HTTP server
- Phase 1: SSRF protection, input validation, safe dial (#26)
- Phase 2: TLS hardening, auth bypass fixes, rate limiting (#27)
- Phase 3: Graceful degradation, connection limits, timeout enforcement (#28)
- Phase 4: Code quality, error handling, linter fixes (#29)
## [2026.05.1] — 2026-05-14
## [2026.05.3] — 2026-05-25
### Added
- Theme system with 5 dark palettes — Default, Dracula, Nord, Tokyo Night, Gruvbox (#24)
- `--version` flag with build metadata injection
- Gitea Actions CI pipeline — test + lint (#20)
- golangci-lint configuration
- Comprehensive test suite — 94 tests across monitor, server, cluster (#19)
- CONTRIBUTING.md and SECURITY.md
### Changed
- Renamed project from go-upkeep to uptop (#25)
- Updated LICENSE with dual copyright for independent fork
### Fixed
- Form validators scoped to relevant monitor types (#23)
- Graceful shutdown for HTTP, SSH servers and database (#19)
- Constant-time secret comparison, request size limits (#19)
- Check interval jitter to prevent thundering herd (#19)
- TUI visual polish — zebra striping, group icons, sparkline stats (#18)
## [2026.05.2] — 2026-05-22
### Added
- Incident management and maintenance windows (#17)
- Production docker-compose.yml
### Fixed
- Viewport sizing and dynamic chrome calculation (#16)
- Form height constrained to terminal with resize forwarding
- Maintenance'd monitors excluded from down count and pulse
- Group status correctly skips children in maintenance
## [2026.05.1] — 2026-05-16
### Added
- Distributed probing with leader + probe nodes
- Config-as-code (YAML apply/export with dry-run, prune)
- TUI visual polish (zebra striping, sparklines, breadcrumbs)
- Incident management and maintenance windows
- 9 alert providers (Discord, Slack, Email, Ntfy, Telegram, PagerDuty, Pushover, Gotify, Webhook)
- Config-as-code YAML apply/export with dry-run and prune
- TUI polish — status bar, tab badges, detail panel, modals
- DOWN-first sort, health pulse, site filter
- Type icons in sites table
- Sparkline history graphs
- Persistent state — uptime, status, latency, and logs survive restarts
- Push token stripping from /status/json response
## [2026.04.1] — Initial independent fork
## [2026.04.1] — 2026-04-01
### Added
- SSH-accessible TUI (Bubble Tea + Wish)
- 6 check types (HTTP, Push, Ping, Port, DNS, Group)
- SSH-accessible TUI built on Bubble Tea + Wish
- 6 check types HTTP, Push, Ping, Port, DNS, Group
- 9 alert providers — Discord, Slack, Email, Ntfy, Telegram, PagerDuty, Pushover, Gotify, Webhook
- SQLite and PostgreSQL support
- HA clustering with automatic failover
- Prometheus metrics endpoint
- Public status page
- Uptime Kuma import
- Prometheus /metrics endpoint
- Public status page (HTML + JSON)
- Uptime Kuma backup import
+79 -48
View File
@@ -1,19 +1,50 @@
# uptop
<div align="center">
<h1>uptop</h1>
<p>Self-hosted uptime monitoring with a TUI over SSH.</p>
<p>No browser. No client install. Just <code>ssh -p 23234 your-server</code>.</p>
Self-hosted uptime monitor with a TUI you can access over SSH. No browser, no install on the client — just `ssh -p 23234 your-server`.
<p>
<a href="https://gitea.lerkolabs.com/lerkolabs/uptop/actions/workflows/ci.yml"><img src="https://gitea.lerkolabs.com/lerkolabs/uptop/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
<img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License">
<img src="https://img.shields.io/badge/go-1.26-00ADD8?logo=go&logoColor=white" alt="Go 1.26">
<img src="https://img.shields.io/docker/pulls/lerkolabs/uptop" alt="Docker Pulls">
</p>
Built on the foundation of [RDGames/go-upkeep](https://github.com/RDGames/go-upkeep).
<img src="assets/monitors.png" alt="uptop monitors view" width="800">
</div>
## What it does
## What is this
- **6 check types**: HTTP, Push (heartbeat), Ping, Port, DNS, Groups
- **9 alert providers**: Discord, Slack, Email, Ntfy, Webhook, Telegram, PagerDuty, Pushover, Gotify
- **Config as code**: define monitors in YAML, apply declaratively, version control your setup
- **HA clustering**: leader/follower with automatic failover
- **Prometheus metrics**: `/metrics` endpoint for Grafana dashboards
- **Public status page**: HTML + JSON, toggle with an env var
- **SQLite or Postgres**: SQLite for single-node, Postgres for production
- **Uptime Kuma import**: migrate from Kuma with one command
An uptime monitor you manage entirely from the terminal. It runs as a server, exposes an SSH endpoint, and drops you into a full TUI — monitors, alerts, logs, nodes, all there.
Built on [RDGames/go-upkeep](https://github.com/RDGames/go-upkeep). Rewritten for clustering, config-as-code, and a proper dashboard.
## Features
- **6 check types** — HTTP, Push (heartbeat), Ping, Port, DNS, Groups
- **9 alert providers** — Discord, Slack, Email, Ntfy, Webhook, Telegram, PagerDuty, Pushover, Gotify
- **Config as code** — define monitors in YAML, apply declaratively, version control your setup
- **HA clustering** — leader/follower with automatic failover
- **Prometheus metrics** — `/metrics` endpoint, wire it straight to Grafana
- **Public status page** — HTML + JSON, toggle with an env var
- **SQLite or Postgres** — SQLite for single-node, Postgres for production
- **Uptime Kuma import** — migrate from Kuma with one command
## Screenshots
<table>
<tr>
<td><img src="assets/detail.png" alt="detail panel" width="400"></td>
<td><img src="assets/alerts.png" alt="alerts view" width="400"></td>
</tr>
<tr>
<td><img src="assets/logs.png" alt="logs view" width="400"></td>
<td><img src="assets/nodes.png" alt="cluster nodes" width="400"></td>
</tr>
<tr>
<td colspan="2" align="center"><img src="assets/theme.png" alt="theme selection" width="600"></td>
</tr>
</table>
## Quick start
@@ -22,7 +53,7 @@ go run cmd/uptop/main.go
ssh -p 23234 localhost
```
Seed some demo data to see it in action:
Want some data to look at first:
```bash
go run cmd/uptop/main.go -demo
@@ -30,22 +61,45 @@ go run cmd/uptop/main.go -demo
## Install
### From source
<details>
<summary><strong>Docker (recommended)</strong></summary>
```bash
go install gitea.lerkolabs.com/lerko/uptop/cmd/uptop@latest
```yaml
services:
uptop:
image: lerkolabs/uptop:latest
restart: unless-stopped
ports:
- "23234:23234"
- "8080:8080"
environment:
- UPTOP_DB_TYPE=sqlite
- UPTOP_DB_DSN=/data/uptop.db
- UPTOP_STATUS_ENABLED=true
# - UPTOP_ADMIN_KEY=ssh-ed25519 AAAA... you@host
volumes:
- ./data:/data
```
### Docker
First run: set `UPTOP_ADMIN_KEY` to your SSH public key, or attach to the container and add it in the Users tab.
</details>
<details>
<summary><strong>Binary</strong></summary>
Download from [Releases](https://gitea.lerkolabs.com/lerkolabs/uptop/releases).
</details>
<details>
<summary><strong>From source</strong></summary>
```bash
docker pull lerko/uptop:latest
docker run -p 23234:23234 -p 8080:8080 -v ./data:/data lerko/uptop
go install gitea.lerkolabs.com/lerkolabs/uptop/cmd/uptop@latest
```
### Binary
Download from [Releases](https://gitea.lerkolabs.com/lerko/uptop/releases).
</details>
## Config as code
@@ -63,35 +117,11 @@ uptop apply -f monitors.yaml --dry-run # see what would change
uptop apply -f monitors.yaml --prune # delete anything not in the YAML
```
See [docs/config-as-code.md](docs/config-as-code.md) for the full reference.
## Docker
```yaml
services:
monitor:
build: .
restart: unless-stopped
stdin_open: true
tty: true
ports:
- "23234:23234"
- "8080:8080"
volumes:
- ./data:/data
- ./ssh_keys:/app/.ssh
environment:
- UPTOP_DB_TYPE=sqlite
- UPTOP_DB_DSN=/data/uptop.db
- UPTOP_STATUS_ENABLED=true
- UPTOP_CLUSTER_SECRET=change-me
```
First run: attach to the container (`docker attach uptop`), go to the Users tab, add your SSH public key. Then detach with `Ctrl+P, Ctrl+Q` and connect normally over SSH.
Full reference in [docs/config-as-code.md](docs/config-as-code.md).
## Environment variables
| Variable | Default | What it does |
| Variable | Default | Description |
|---|---|---|
| `UPTOP_PORT` | `23234` | SSH server port |
| `UPTOP_HTTP_PORT` | `8080` | HTTP server port (status page, push, metrics) |
@@ -103,6 +133,7 @@ First run: attach to the container (`docker attach uptop`), go to the Users tab,
| `UPTOP_PEER_URL` | | Leader URL for follower nodes |
| `UPTOP_CLUSTER_SECRET` | | Shared key for cluster + API auth |
| `UPTOP_INSECURE_SKIP_VERIFY` | `false` | Skip TLS verification for checks |
| `UPTOP_ADMIN_KEY` | | SSH public key seeded as first admin on startup |
## Migrating from Uptime Kuma
BIN
View File
Binary file not shown.

After

Width:  |  Height:  |  Size: 84 KiB

BIN
View File
Binary file not shown.

After

Width:  |  Height:  |  Size: 78 KiB

BIN
View File
Binary file not shown.

After

Width:  |  Height:  |  Size: 206 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 232 KiB

BIN
View File
Binary file not shown.

After

Width:  |  Height:  |  Size: 57 KiB

BIN
View File
Binary file not shown.

After

Width:  |  Height:  |  Size: 253 KiB

+9 -8
View File
@@ -17,14 +17,14 @@ import (
"syscall"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/cluster"
"gitea.lerkolabs.com/lerko/uptop/internal/config"
"gitea.lerkolabs.com/lerko/uptop/internal/importer"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"gitea.lerkolabs.com/lerko/uptop/internal/server"
"gitea.lerkolabs.com/lerko/uptop/internal/store"
"gitea.lerkolabs.com/lerko/uptop/internal/tui"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/cluster"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/config"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/importer"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/server"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/tui"
tea "github.com/charmbracelet/bubbletea"
"github.com/charmbracelet/ssh"
@@ -385,6 +385,7 @@ func runServe(args []string) {
eng.InitHistory()
eng.InitLogs()
eng.InitAlertHealth()
eng.Start(ctx)
tlsCert := os.Getenv("UPTOP_TLS_CERT")
+1 -1
View File
@@ -1,4 +1,4 @@
module gitea.lerkolabs.com/lerko/uptop
module gitea.lerkolabs.com/lerkolabs/uptop
go 1.26.3
+1 -1
View File
@@ -11,7 +11,7 @@ import (
"strings"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
var alertClient = &http.Client{Timeout: 10 * time.Second}
+1 -1
View File
@@ -7,7 +7,7 @@ import (
"net/http/httptest"
"testing"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
func TestHTTPProviderDiscord(t *testing.T) {
+1 -1
View File
@@ -7,7 +7,7 @@ import (
"strings"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
)
type Config struct {
+8 -2
View File
@@ -10,8 +10,8 @@ import (
"testing"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
)
// --- Mock Store (minimal, for monitor.NewEngine) ---
@@ -53,6 +53,10 @@ func (m *mockStore) GetNode(string) (models.ProbeNode, error) { return models.Pr
func (m *mockStore) GetAllNodes() ([]models.ProbeNode, error) { return nil, nil }
func (m *mockStore) UpdateNodeLastSeen(string) error { return nil }
func (m *mockStore) DeleteNode(string) error { return nil }
func (m *mockStore) LoadAlertHealth() (map[int]models.AlertHealthRecord, error) {
return nil, nil
}
func (m *mockStore) SaveAlertHealth(models.AlertHealthRecord) error { return nil }
func (m *mockStore) SaveLog(string) error { return nil }
func (m *mockStore) LoadLogs(int) ([]string, error) { return nil, nil }
func (m *mockStore) GetActiveMaintenanceWindows() ([]models.MaintenanceWindow, error) {
@@ -67,6 +71,8 @@ func (m *mockStore) DeleteMaintenanceWindow(int) error { retur
func (m *mockStore) IsMonitorInMaintenance(int) (bool, error) { return false, nil }
func (m *mockStore) GetPreference(string) (string, error) { return "", nil }
func (m *mockStore) SetPreference(string, string) error { return nil }
func (m *mockStore) SaveStateChange(int, string, string, string) error { return nil }
func (m *mockStore) GetStateChanges(int, int) ([]models.StateChange, error) { return nil, nil }
func (m *mockStore) Close() error { return nil }
// --- Cluster Start Tests ---
+4 -2
View File
@@ -12,8 +12,8 @@ import (
"sync"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
)
type ProbeConfig struct {
@@ -130,6 +130,7 @@ type probeResultItem struct {
SiteID int `json:"site_id"`
LatencyNs int64 `json:"latency_ns"`
IsUp bool `json:"is_up"`
ErrorReason string `json:"error_reason,omitempty"`
}
func probeExecuteChecks(ctx context.Context, sites []models.Site, strict, insecure *http.Client, allowPrivate bool) []probeResultItem {
@@ -157,6 +158,7 @@ loop:
SiteID: s.ID,
LatencyNs: cr.LatencyNs,
IsUp: cr.Status == "UP",
ErrorReason: cr.ErrorReason,
})
mu.Unlock()
}(site)
+2 -2
View File
@@ -2,8 +2,8 @@ package config
import (
"fmt"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/store"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
"reflect"
"strings"
)
+2 -2
View File
@@ -1,8 +1,8 @@
package config
import (
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/store"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
"strings"
"testing"
)
+2 -2
View File
@@ -5,8 +5,8 @@ import (
"os"
"sort"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/store"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
"gopkg.in/yaml.v3"
)
+1 -1
View File
@@ -1,7 +1,7 @@
package config
import (
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"testing"
)
+1 -1
View File
@@ -3,7 +3,7 @@ package importer
import (
"encoding/json"
"fmt"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"os"
"strings"
)
+2 -2
View File
@@ -2,8 +2,8 @@ package metrics
import (
"fmt"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
"net/http"
"sort"
"strings"
+9 -2
View File
@@ -2,13 +2,14 @@ package metrics
import (
"context"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
)
type mockStore struct {
@@ -50,6 +51,10 @@ func (m *mockStore) GetNode(string) (models.ProbeNode, error) { return m
func (m *mockStore) GetAllNodes() ([]models.ProbeNode, error) { return nil, nil }
func (m *mockStore) UpdateNodeLastSeen(string) error { return nil }
func (m *mockStore) DeleteNode(string) error { return nil }
func (m *mockStore) LoadAlertHealth() (map[int]models.AlertHealthRecord, error) {
return nil, nil
}
func (m *mockStore) SaveAlertHealth(models.AlertHealthRecord) error { return nil }
func (m *mockStore) SaveLog(string) error { return nil }
func (m *mockStore) LoadLogs(int) ([]string, error) { return nil, nil }
func (m *mockStore) GetActiveMaintenanceWindows() ([]models.MaintenanceWindow, error) {
@@ -64,6 +69,8 @@ func (m *mockStore) DeleteMaintenanceWindow(int) error { retur
func (m *mockStore) IsMonitorInMaintenance(int) (bool, error) { return false, nil }
func (m *mockStore) GetPreference(string) (string, error) { return "", nil }
func (m *mockStore) SetPreference(string, string) error { return nil }
func (m *mockStore) SaveStateChange(int, string, string, string) error { return nil }
func (m *mockStore) GetStateChanges(int, int) ([]models.StateChange, error) { return nil, nil }
func (m *mockStore) Close() error { return nil }
func TestMetricsHandler(t *testing.T) {
+23
View File
@@ -35,6 +35,18 @@ type Site struct {
HasSSL bool
LastCheck time.Time
SentSSLWarning bool
LastError string
StatusChangedAt time.Time
LastSuccessAt time.Time
}
type StateChange struct {
ID int
SiteID int
FromStatus string
ToStatus string
ErrorReason string
ChangedAt time.Time
}
type AlertConfig struct {
@@ -67,6 +79,17 @@ type ProbeNode struct {
Version string
}
// AlertHealthRecord is the persisted send health of an alert channel. It lets the
// "last sent" / health indicators survive restarts instead of resetting to "never".
type AlertHealthRecord struct {
AlertID int
LastSendAt time.Time
LastSendOK bool
LastError string
SendCount int
FailCount int
}
type MaintenanceWindow struct {
ID int
MonitorID int
+1
View File
@@ -15,6 +15,7 @@ type NodeResult struct {
IsUp bool
LatencyNs int64
CheckedAt time.Time
ErrorReason string
}
func AggregateStatus(results []NodeResult, strategy AggregationStrategy) (isUp bool, avgLatencyNs int64) {
+29 -10
View File
@@ -2,13 +2,14 @@ package monitor
import (
"context"
"fmt"
"net"
"net/http"
"strconv"
"strings"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"github.com/miekg/dns"
probing "github.com/prometheus-community/pro-bing"
@@ -21,6 +22,7 @@ type CheckResult struct {
LatencyNs int64
HasSSL bool
CertExpiry time.Time
ErrorReason string
}
func RunCheck(site models.Site, strict, insecure *http.Client, globalInsecure bool, allowPrivate ...bool) CheckResult {
@@ -35,7 +37,7 @@ func RunCheck(site models.Site, strict, insecure *http.Client, globalInsecure bo
if ips, err := net.LookupIP(host); err == nil {
for _, ip := range ips {
if isPrivateIP(ip) {
return CheckResult{SiteID: site.ID, Status: "DOWN"}
return CheckResult{SiteID: site.ID, Status: "DOWN", ErrorReason: "target resolves to private IP"}
}
}
}
@@ -52,7 +54,7 @@ func RunCheck(site models.Site, strict, insecure *http.Client, globalInsecure bo
case "dns":
return runDNSCheck(site)
default:
return CheckResult{SiteID: site.ID, Status: "DOWN"}
return CheckResult{SiteID: site.ID, Status: "DOWN", ErrorReason: "unsupported monitor type: " + site.Type}
}
}
@@ -68,7 +70,7 @@ func runHTTPCheck(site models.Site, strict, insecure *http.Client, globalInsecur
req, err := http.NewRequestWithContext(ctx, method, site.URL, nil)
if err != nil {
return CheckResult{SiteID: site.ID, Status: "DOWN"}
return CheckResult{SiteID: site.ID, Status: "DOWN", ErrorReason: "invalid request: " + err.Error()}
}
client := strict
@@ -88,6 +90,7 @@ func runHTTPCheck(site models.Site, strict, insecure *http.Client, globalInsecur
if err != nil {
result.Status = "DOWN"
result.ErrorReason = truncateError(err.Error(), 256)
return result
}
defer resp.Body.Close()
@@ -95,6 +98,11 @@ func runHTTPCheck(site models.Site, strict, insecure *http.Client, globalInsecur
result.StatusCode = resp.StatusCode
if !isCodeAccepted(resp.StatusCode, site.AcceptedCodes) {
result.Status = "DOWN"
expected := site.AcceptedCodes
if expected == "" {
expected = "200-299"
}
result.ErrorReason = fmt.Sprintf("HTTP %d (expected %s)", resp.StatusCode, expected)
}
if site.CheckSSL && resp.TLS != nil && len(resp.TLS.PeerCertificates) > 0 {
@@ -103,6 +111,7 @@ func runHTTPCheck(site models.Site, strict, insecure *http.Client, globalInsecur
result.CertExpiry = cert.NotAfter
if time.Now().After(cert.NotAfter) {
result.Status = "SSL EXP"
result.ErrorReason = "SSL certificate expired"
}
}
@@ -117,7 +126,7 @@ func runPingCheck(site models.Site) CheckResult {
pinger, err := probing.NewPinger(host)
if err != nil {
return CheckResult{SiteID: site.ID, Status: "DOWN"}
return CheckResult{SiteID: site.ID, Status: "DOWN", ErrorReason: "ping setup: " + err.Error()}
}
pinger.Count = 1
pinger.Timeout = siteTimeout(site)
@@ -127,8 +136,11 @@ func runPingCheck(site models.Site) CheckResult {
err = pinger.Run()
latency := time.Since(start)
if err != nil || pinger.Statistics().PacketsRecv == 0 {
return CheckResult{SiteID: site.ID, Status: "DOWN", LatencyNs: latency.Nanoseconds()}
if err != nil {
return CheckResult{SiteID: site.ID, Status: "DOWN", LatencyNs: latency.Nanoseconds(), ErrorReason: "ping failed: " + err.Error()}
}
if pinger.Statistics().PacketsRecv == 0 {
return CheckResult{SiteID: site.ID, Status: "DOWN", LatencyNs: latency.Nanoseconds(), ErrorReason: "no ICMP response"}
}
stats := pinger.Statistics()
@@ -148,7 +160,7 @@ func runPortCheck(site models.Site) CheckResult {
latency := time.Since(start)
if err != nil {
return CheckResult{SiteID: site.ID, Status: "DOWN", LatencyNs: latency.Nanoseconds()}
return CheckResult{SiteID: site.ID, Status: "DOWN", LatencyNs: latency.Nanoseconds(), ErrorReason: truncateError(err.Error(), 256)}
}
_ = conn.Close()
return CheckResult{SiteID: site.ID, Status: "UP", LatencyNs: latency.Nanoseconds()}
@@ -199,10 +211,10 @@ func runDNSCheck(site models.Site) CheckResult {
latency := time.Since(start)
if err != nil {
return CheckResult{SiteID: site.ID, Status: "DOWN", LatencyNs: latency.Nanoseconds()}
return CheckResult{SiteID: site.ID, Status: "DOWN", LatencyNs: latency.Nanoseconds(), ErrorReason: "DNS query failed: " + err.Error()}
}
if r.Rcode != dns.RcodeSuccess {
return CheckResult{SiteID: site.ID, Status: "DOWN", StatusCode: r.Rcode, LatencyNs: latency.Nanoseconds()}
return CheckResult{SiteID: site.ID, Status: "DOWN", StatusCode: r.Rcode, LatencyNs: latency.Nanoseconds(), ErrorReason: "DNS RCODE: " + dns.RcodeToString[r.Rcode]}
}
return CheckResult{SiteID: site.ID, Status: "UP", LatencyNs: latency.Nanoseconds()}
}
@@ -235,3 +247,10 @@ func isCodeAccepted(code int, accepted string) bool {
}
return false
}
func truncateError(s string, max int) string {
if len(s) <= max {
return s
}
return s[:max-3] + "..."
}
+1 -1
View File
@@ -9,7 +9,7 @@ import (
"testing"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
func TestRunCheck_HTTP_Success(t *testing.T) {
+212 -24
View File
@@ -11,18 +11,26 @@ import (
"sync"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/alert"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/store"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/alert"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
)
const (
maxLogEntries = 100
pollInterval = 5 * time.Second
pushGracePeriod = 5 * time.Second
minCheckInterval = 5
minPushGrace = 60 * time.Second
)
type AlertHealth struct {
LastSendAt time.Time
LastSendOK bool
LastError string
SendCount int
FailCount int
}
type Engine struct {
mu sync.RWMutex
liveState map[int]models.Site
@@ -42,6 +50,9 @@ type Engine struct {
probeResults map[int]map[string]NodeResult
aggStrategy AggregationStrategy
alertHealthMu sync.RWMutex
alertHealth map[int]AlertHealth
db store.Store
insecureSkipVerify bool
allowPrivateTargets bool
@@ -64,6 +75,7 @@ func newEngine(s store.Store, allowPrivateTargets bool) *Engine {
histories: make(map[int]*SiteHistory),
tokenIndex: make(map[string]int),
probeResults: make(map[int]map[string]NodeResult),
alertHealth: make(map[int]AlertHealth),
aggStrategy: AggAnyDown,
isActive: true,
allowPrivateTargets: allowPrivateTargets,
@@ -96,6 +108,19 @@ func sanitizeLog(s string) string {
return s
}
func fmtDurationShort(d time.Duration) string {
if d < time.Minute {
return fmt.Sprintf("%ds", int(d.Seconds()))
}
if d < time.Hour {
return fmt.Sprintf("%dm", int(d.Minutes()))
}
if d < 24*time.Hour {
return fmt.Sprintf("%dh %dm", int(d.Hours()), int(d.Minutes())%60)
}
return fmt.Sprintf("%dd %dh", int(d.Hours())/24, int(d.Hours())%24)
}
func (e *Engine) AddLog(msg string) {
e.logMu.Lock()
defer e.logMu.Unlock()
@@ -121,6 +146,26 @@ func (e *Engine) InitLogs() {
e.logStore = logs
}
// InitAlertHealth restores persisted alert send health so the dashboard shows real
// "last sent" / health state on startup instead of resetting every channel to "never".
func (e *Engine) InitAlertHealth() {
records, err := e.db.LoadAlertHealth()
if err != nil {
return
}
e.alertHealthMu.Lock()
defer e.alertHealthMu.Unlock()
for id, r := range records {
e.alertHealth[id] = AlertHealth{
LastSendAt: r.LastSendAt,
LastSendOK: r.LastSendOK,
LastError: r.LastError,
SendCount: r.SendCount,
FailCount: r.FailCount,
}
}
}
func (e *Engine) GetLogs() []string {
e.logMu.RLock()
defer e.logMu.RUnlock()
@@ -186,17 +231,38 @@ func (e *Engine) RecordHeartbeat(token string) bool {
return false
}
prevStatus := site.Status
site.LastCheck = time.Now()
wasDown := site.Status == "DOWN"
site.Status = "UP"
site.FailureCount = 0
site.Latency = 0
site.LastError = ""
site.LastSuccessAt = time.Now()
if prevStatus != "UP" {
site.StatusChangedAt = time.Now()
}
e.liveState[targetID] = site
if wasDown {
e.AddLog(fmt.Sprintf("Push Monitor '%s' recovered", site.Name))
e.triggerAlert(site.AlertID, "✅ RECOVERY", fmt.Sprintf("Push Monitor '%s' is receiving heartbeats.", site.Name))
switch prevStatus {
case "PENDING":
e.AddLog(fmt.Sprintf("Push Monitor '%s' received first heartbeat", site.Name))
case "LATE":
e.AddLog(fmt.Sprintf("Push Monitor '%s' heartbeat arrived (was late)", site.Name))
case "DOWN":
downDur := ""
if !site.StatusChangedAt.IsZero() {
downDur = fmt.Sprintf(" (was down %s)", fmtDurationShort(time.Since(site.StatusChangedAt)))
}
e.AddLog(fmt.Sprintf("Push Monitor '%s' recovered%s", site.Name, downDur))
go e.triggerAlert(site.AlertID, "✅ RECOVERY", fmt.Sprintf("Push Monitor '%s' is receiving heartbeats.%s", site.Name, downDur))
}
if prevStatus != "UP" && prevStatus != "PENDING" {
go func() { _ = e.db.SaveStateChange(targetID, prevStatus, "UP", "") }()
}
return true
}
@@ -241,9 +307,6 @@ func (e *Engine) Start(ctx context.Context) {
if !exists {
e.mu.Lock()
s.Status = "PENDING"
if s.Type == "push" {
s.LastCheck = time.Now()
}
if h, ok := e.GetHistory(s.ID); ok && len(h.Statuses) > 0 {
if h.Statuses[len(h.Statuses)-1] {
s.Status = "UP"
@@ -283,6 +346,9 @@ func (e *Engine) UpdateSiteConfig(site models.Site) {
site.LastCheck = existing.LastCheck
site.SentSSLWarning = existing.SentSSLWarning
site.FailureCount = existing.FailureCount
site.LastError = existing.LastError
site.StatusChangedAt = existing.StatusChangedAt
site.LastSuccessAt = existing.LastSuccessAt
e.liveState[site.ID] = site
e.addToTokenIndex(site)
}
@@ -393,33 +459,62 @@ func (e *Engine) checkByID(id int) {
updatedSite.CertExpiry = result.CertExpiry
updatedSite.Latency = time.Duration(result.LatencyNs)
updatedSite.LastCheck = time.Now()
e.handleStatusChange(updatedSite, result.Status, result.StatusCode, time.Duration(result.LatencyNs))
e.handleStatusChange(updatedSite, result.Status, result.StatusCode, time.Duration(result.LatencyNs), result.ErrorReason)
}
}
func (e *Engine) checkPush(site models.Site) {
deadline := site.LastCheck.Add(time.Duration(site.Interval) * time.Second).Add(pushGracePeriod)
if time.Now().After(deadline) {
e.handleStatusChange(site, "DOWN", 0, 0)
} else if site.Status != "UP" {
e.handleStatusChange(site, "UP", 200, 0)
if site.Status == "PENDING" {
return
}
interval := time.Duration(site.Interval) * time.Second
grace := interval / 2
if grace < minPushGrace {
grace = minPushGrace
}
overdue := site.LastCheck.Add(interval)
graceEnd := overdue.Add(grace)
now := time.Now()
if now.After(graceEnd) {
if site.Status != "DOWN" {
e.handleStatusChange(site, "DOWN", 0, 0, "heartbeat missed")
}
} else if now.After(overdue) {
if site.Status != "LATE" {
e.handleStatusChange(site, "LATE", 0, 0, "heartbeat overdue")
}
}
}
func (e *Engine) handleStatusChange(site models.Site, rawStatus string, code int, latency time.Duration) {
func (e *Engine) handleStatusChange(site models.Site, rawStatus string, code int, latency time.Duration, errorReason string) {
if !e.IsActive() {
return
}
newState := site
newState.StatusCode = code
newState.LastError = errorReason
if rawStatus == "UP" {
newState.LastSuccessAt = time.Now()
newState.LastError = ""
} else {
newState.LastSuccessAt = site.LastSuccessAt
}
if site.Status == "UP" && rawStatus != "UP" {
newState.FailureCount++
if newState.FailureCount > site.MaxRetries {
newState.Status = rawStatus
newState.FailureCount = site.MaxRetries + 1
if errorReason != "" {
e.AddLog(fmt.Sprintf("Monitor '%s' confirmed DOWN: %s", site.Name, errorReason))
} else {
e.AddLog(fmt.Sprintf("Monitor '%s' confirmed DOWN", site.Name))
}
} else {
e.AddLog(fmt.Sprintf("Monitor '%s' failed check %d/%d", site.Name, newState.FailureCount, site.MaxRetries))
}
@@ -431,6 +526,14 @@ func (e *Engine) handleStatusChange(site models.Site, rawStatus string, code int
newState.FailureCount = site.MaxRetries + 1
}
if newState.Status != site.Status && site.Status != "PENDING" {
newState.StatusChangedAt = time.Now()
} else if site.StatusChangedAt.IsZero() && newState.Status != "PENDING" {
newState.StatusChangedAt = time.Now()
} else {
newState.StatusChangedAt = site.StatusChangedAt
}
inMaint := e.isInMaintenance(site.ID)
if site.Type == "http" && site.CheckSSL && site.HasSSL {
@@ -455,12 +558,24 @@ func (e *Engine) handleStatusChange(site models.Site, rawStatus string, code int
e.recordCheck(site.ID, latency, rawStatus == "UP")
if newState.Status != site.Status && site.Status != "PENDING" {
go func() { _ = e.db.SaveStateChange(site.ID, site.Status, newState.Status, errorReason) }()
}
isBroken := func(s string) bool { return s == "DOWN" || s == "SSL EXP" }
if site.Status == "UP" && newState.Status == "LATE" {
e.AddLog(fmt.Sprintf("Monitor '%s' heartbeat overdue", site.Name))
}
if !isBroken(site.Status) && isBroken(newState.Status) && newState.Status != "PENDING" {
if inMaint {
e.AddLog(fmt.Sprintf("Monitor '%s' is DOWN (alerts suppressed — maintenance)", site.Name))
} else {
msg := fmt.Sprintf("Monitor '%s' is DOWN (%s)", site.Name, rawStatus)
if errorReason != "" {
msg = fmt.Sprintf("Monitor '%s' is DOWN: %s", site.Name, errorReason)
}
if site.Type == "push" {
msg = fmt.Sprintf("Push Monitor '%s' missed heartbeat.", site.Name)
}
@@ -468,11 +583,17 @@ func (e *Engine) handleStatusChange(site models.Site, rawStatus string, code int
}
}
if isBroken(site.Status) && newState.Status == "UP" {
if !inMaint {
e.triggerAlert(site.AlertID, "✅ RECOVERY", fmt.Sprintf("Monitor '%s' is UP", site.Name))
} else {
e.AddLog(fmt.Sprintf("Monitor '%s' recovered (maintenance active, alert suppressed)", site.Name))
downDur := ""
if !site.StatusChangedAt.IsZero() {
downDur = fmt.Sprintf(" (was down %s)", fmtDurationShort(time.Since(site.StatusChangedAt)))
}
e.AddLog(fmt.Sprintf("Monitor '%s' recovered%s", site.Name, downDur))
if !inMaint {
e.triggerAlert(site.AlertID, "✅ RECOVERY", fmt.Sprintf("Monitor '%s' is UP%s", site.Name, downDur))
}
}
if site.Status == "LATE" && newState.Status == "UP" && !isBroken(site.Status) {
e.AddLog(fmt.Sprintf("Monitor '%s' heartbeat arrived (was late)", site.Name))
}
}
@@ -489,11 +610,69 @@ func (e *Engine) triggerAlert(alertID int, title, message string) {
defer cancel()
if err := provider.Send(ctx, title, message); err != nil {
e.AddLog(fmt.Sprintf("Alert send failed (%s): %v", cfg.Name, err))
e.recordAlertResult(alertID, false, err.Error())
} else {
e.recordAlertResult(alertID, true, "")
}
}()
}
}
func (e *Engine) recordAlertResult(alertID int, ok bool, errMsg string) {
e.alertHealthMu.Lock()
defer e.alertHealthMu.Unlock()
h := e.alertHealth[alertID]
h.LastSendAt = time.Now()
h.LastSendOK = ok
h.SendCount++
if ok {
h.LastError = ""
} else {
h.LastError = errMsg
h.FailCount++
}
e.alertHealth[alertID] = h
// Persist best-effort so health survives restarts; DB IO off the alert path.
go func(rec models.AlertHealthRecord) {
_ = e.db.SaveAlertHealth(rec)
}(models.AlertHealthRecord{
AlertID: alertID,
LastSendAt: h.LastSendAt,
LastSendOK: h.LastSendOK,
LastError: h.LastError,
SendCount: h.SendCount,
FailCount: h.FailCount,
})
}
func (e *Engine) GetAlertHealth(alertID int) AlertHealth {
e.alertHealthMu.RLock()
defer e.alertHealthMu.RUnlock()
return e.alertHealth[alertID]
}
func (e *Engine) TestAlert(alertID int) error {
cfg, err := e.db.GetAlert(alertID)
if err != nil {
return fmt.Errorf("failed to load alert: %w", err)
}
provider := alert.GetProvider(cfg)
if provider == nil {
return fmt.Errorf("no provider for type %q", cfg.Type)
}
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
err = provider.Send(ctx, "🧪 Test Alert", fmt.Sprintf("Test notification from uptop for channel '%s'.", cfg.Name))
if err != nil {
e.recordAlertResult(alertID, false, err.Error())
return err
}
e.recordAlertResult(alertID, true, "")
e.AddLog(fmt.Sprintf("Test alert sent to '%s'", cfg.Name))
return nil
}
func (e *Engine) isInMaintenance(monitorID int) bool {
inMaint, err := e.db.IsMonitorInMaintenance(monitorID)
if err != nil {
@@ -554,7 +733,7 @@ func (e *Engine) SetAggStrategy(strategy AggregationStrategy) {
e.aggStrategy = strategy
}
func (e *Engine) IngestProbeResult(nodeID string, siteID int, latencyNs int64, isUp bool) {
func (e *Engine) IngestProbeResult(nodeID string, siteID int, latencyNs int64, isUp bool, errorReason string) {
e.probeResultsMu.Lock()
if e.probeResults[siteID] == nil {
e.probeResults[siteID] = make(map[string]NodeResult)
@@ -564,6 +743,7 @@ func (e *Engine) IngestProbeResult(nodeID string, siteID int, latencyNs int64, i
IsUp: isUp,
LatencyNs: latencyNs,
CheckedAt: time.Now(),
ErrorReason: errorReason,
}
results := make([]NodeResult, 0, len(e.probeResults[siteID]))
for _, r := range e.probeResults[siteID] {
@@ -588,7 +768,7 @@ func (e *Engine) IngestProbeResult(nodeID string, siteID int, latencyNs int64, i
updatedSite := site
updatedSite.Latency = time.Duration(avgLatency)
updatedSite.LastCheck = time.Now()
e.handleStatusChange(updatedSite, rawStatus, 0, time.Duration(avgLatency))
e.handleStatusChange(updatedSite, rawStatus, 0, time.Duration(avgLatency), errorReason)
}
func (e *Engine) GetProbeResults(siteID int) map[string]NodeResult {
@@ -601,3 +781,11 @@ func (e *Engine) GetProbeResults(siteID int) map[string]NodeResult {
}
return cp
}
func (e *Engine) GetStateChanges(siteID int, limit int) []models.StateChange {
changes, err := e.db.GetStateChanges(siteID, limit)
if err != nil {
return nil
}
return changes
}
+46 -21
View File
@@ -2,10 +2,11 @@ package monitor
import (
"fmt"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"sync"
"testing"
"time"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
// --- Mock Store ---
@@ -62,6 +63,10 @@ func (m *mockStore) GetNode(string) (models.ProbeNode, error) { return m
func (m *mockStore) GetAllNodes() ([]models.ProbeNode, error) { return nil, nil }
func (m *mockStore) UpdateNodeLastSeen(string) error { return nil }
func (m *mockStore) DeleteNode(string) error { return nil }
func (m *mockStore) LoadAlertHealth() (map[int]models.AlertHealthRecord, error) {
return nil, nil
}
func (m *mockStore) SaveAlertHealth(models.AlertHealthRecord) error { return nil }
func (m *mockStore) GetActiveMaintenanceWindows() ([]models.MaintenanceWindow, error) {
return nil, nil
}
@@ -73,6 +78,8 @@ func (m *mockStore) EndMaintenanceWindow(int) error { retur
func (m *mockStore) DeleteMaintenanceWindow(int) error { return nil }
func (m *mockStore) GetPreference(string) (string, error) { return "", nil }
func (m *mockStore) SetPreference(string, string) error { return nil }
func (m *mockStore) SaveStateChange(int, string, string, string) error { return nil }
func (m *mockStore) GetStateChanges(int, int) ([]models.StateChange, error) { return nil, nil }
func (m *mockStore) Close() error { return nil }
func (m *mockStore) GetAllAlerts() ([]models.AlertConfig, error) {
@@ -174,7 +181,7 @@ func TestHandleStatusChange_PendingToUp(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "PENDING", MaxRetries: 3, AlertID: 1}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 10*time.Millisecond)
e.handleStatusChange(site, "UP", 200, 10*time.Millisecond, "")
s, _ := getSite(e, 1)
if s.Status != "UP" {
@@ -195,7 +202,7 @@ func TestHandleStatusChange_UpIncrementFailure(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 3, FailureCount: 0}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 500, 0)
e.handleStatusChange(site, "DOWN", 500, 0, "test error")
s, _ := getSite(e, 1)
if s.Status != "UP" {
@@ -213,7 +220,7 @@ func TestHandleStatusChange_UpToDown_ExceedsRetries(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 2, FailureCount: 2, AlertID: 1}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 500, 0)
e.handleStatusChange(site, "DOWN", 500, 0, "test error")
s, _ := getSite(e, 1)
if s.Status != "DOWN" {
@@ -236,7 +243,7 @@ func TestHandleStatusChange_UpToDown_ZeroRetries(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0, FailureCount: 0, AlertID: 1}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 0, 0)
e.handleStatusChange(site, "DOWN", 0, 0, "test error")
s, _ := getSite(e, 1)
if s.Status != "DOWN" {
@@ -255,7 +262,7 @@ func TestHandleStatusChange_DownToUp_Recovery(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "DOWN", FailureCount: 4, AlertID: 1}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 5*time.Millisecond)
e.handleStatusChange(site, "UP", 200, 5*time.Millisecond, "")
s, _ := getSite(e, 1)
if s.Status != "UP" {
@@ -276,7 +283,7 @@ func TestHandleStatusChange_DownStaysDown(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "DOWN", MaxRetries: 2, FailureCount: 3}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 0, 0)
e.handleStatusChange(site, "DOWN", 0, 0, "test error")
s, _ := getSite(e, 1)
if s.Status != "DOWN" {
@@ -295,7 +302,7 @@ func TestHandleStatusChange_SSLExpired(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0, AlertID: 1}
injectSite(e, site)
e.handleStatusChange(site, "SSL EXP", 0, 0)
e.handleStatusChange(site, "SSL EXP", 0, 0, "SSL certificate expired")
s, _ := getSite(e, 1)
if s.Status != "SSL EXP" {
@@ -315,7 +322,7 @@ func TestHandleStatusChange_AlertSuppressedMaintenance(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "UP", MaxRetries: 0, AlertID: 1}
injectSite(e, site)
e.handleStatusChange(site, "DOWN", 0, 0)
e.handleStatusChange(site, "DOWN", 0, 0, "test error")
s, _ := getSite(e, 1)
if s.Status != "DOWN" {
@@ -346,7 +353,7 @@ func TestHandleStatusChange_RecoverySuppressedMaintenance(t *testing.T) {
site := models.Site{ID: 1, Name: "test", Status: "DOWN", AlertID: 1}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 0)
e.handleStatusChange(site, "UP", 200, 0, "")
s, _ := getSite(e, 1)
if s.Status != "UP" {
@@ -370,7 +377,7 @@ func TestHandleStatusChange_SSLWarning(t *testing.T) {
}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 0)
e.handleStatusChange(site, "UP", 200, 0, "")
s, _ := getSite(e, 1)
if !s.SentSSLWarning {
@@ -393,7 +400,7 @@ func TestHandleStatusChange_SSLWarningNotRepeated(t *testing.T) {
}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 0)
e.handleStatusChange(site, "UP", 200, 0, "")
waitAsync()
if len(ms.getAlertCallsSnapshot()) != 0 {
@@ -412,7 +419,7 @@ func TestHandleStatusChange_SSLWarningReset(t *testing.T) {
}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 0)
e.handleStatusChange(site, "UP", 200, 0, "")
s, _ := getSite(e, 1)
if s.SentSSLWarning {
@@ -433,7 +440,7 @@ func TestHandleStatusChange_SSLWarningSuppressedMaint(t *testing.T) {
}
injectSite(e, site)
e.handleStatusChange(site, "UP", 200, 0)
e.handleStatusChange(site, "UP", 200, 0, "")
s, _ := getSite(e, 1)
if !s.SentSSLWarning {
@@ -452,7 +459,7 @@ func TestHandleStatusChange_InactiveEngine(t *testing.T) {
injectSite(e, site)
e.SetActive(false)
e.handleStatusChange(site, "DOWN", 0, 0)
e.handleStatusChange(site, "DOWN", 0, 0, "test error")
s, _ := getSite(e, 1)
if s.Status != "UP" {
@@ -534,7 +541,7 @@ func TestCheckPush_DeadlineMissed(t *testing.T) {
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "UP",
Interval: 10, MaxRetries: 0,
LastCheck: time.Now().Add(-20 * time.Second),
LastCheck: time.Now().Add(-120 * time.Second),
}
injectSite(e, site)
@@ -546,6 +553,24 @@ func TestCheckPush_DeadlineMissed(t *testing.T) {
}
}
func TestCheckPush_OverdueBecomesLate(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "UP",
Interval: 300,
LastCheck: time.Now().Add(-310 * time.Second),
}
injectSite(e, site)
e.checkPush(site)
s, _ := getSite(e, 1)
if s.Status != "LATE" {
t.Errorf("expected LATE when overdue but within grace, got %s", s.Status)
}
}
func TestCheckPush_WithinDeadline(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
@@ -563,20 +588,20 @@ func TestCheckPush_WithinDeadline(t *testing.T) {
}
}
func TestCheckPush_PendingToUp(t *testing.T) {
func TestCheckPush_PendingStaysPending(t *testing.T) {
ms := newMockStore()
e := newTestEngine(ms)
site := models.Site{
ID: 1, Name: "push", Type: "push", Status: "PENDING",
Interval: 60, LastCheck: time.Now(),
Interval: 60,
}
injectSite(e, site)
e.checkPush(site)
s, _ := getSite(e, 1)
if s.Status != "UP" {
t.Errorf("expected UP, got %s", s.Status)
if s.Status != "PENDING" {
t.Errorf("expected PENDING to stay until first heartbeat, got %s", s.Status)
}
}
@@ -991,7 +1016,7 @@ func TestConcurrent_HandleStatusChangeAndGetState(t *testing.T) {
wg.Add(2)
go func() {
defer wg.Done()
e.handleStatusChange(site, "DOWN", 500, 0)
e.handleStatusChange(site, "DOWN", 500, 0, "test error")
}()
go func() {
defer wg.Done()
+8 -6
View File
@@ -11,11 +11,11 @@ import (
"strings"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/importer"
"gitea.lerkolabs.com/lerko/uptop/internal/metrics"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"gitea.lerkolabs.com/lerko/uptop/internal/store"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/importer"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/metrics"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
)
const maxRequestBody = 1 << 20
@@ -67,6 +67,7 @@ var statusTpl = template.Must(template.New("status").Parse(`
.UP { background: #9ece6a; color: #1a1b26; }
.DOWN { background: #f7768e; color: #1a1b26; }
.PENDING { background: #e0af68; color: #1a1b26; }
.LATE { background: #e0af68; color: #1a1b26; }
.SSL-EXP { background: #e0af68; color: #1a1b26; }
.PAUSED { background: #565f89; color: #c0caf5; }
.MAINT { background: #bb9af7; color: #1a1b26; }
@@ -406,6 +407,7 @@ func Start(cfg ServerConfig, s store.Store, eng *monitor.Engine) *http.Server {
SiteID int `json:"site_id"`
LatencyNs int64 `json:"latency_ns"`
IsUp bool `json:"is_up"`
ErrorReason string `json:"error_reason"`
} `json:"results"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
@@ -420,7 +422,7 @@ func Start(cfg ServerConfig, s store.Store, eng *monitor.Engine) *http.Server {
if err := s.SaveCheckFromNode(result.SiteID, req.NodeID, result.LatencyNs, result.IsUp); err != nil {
log.Printf("Failed to save probe result: %v", err)
}
eng.IngestProbeResult(req.NodeID, result.SiteID, result.LatencyNs, result.IsUp)
eng.IngestProbeResult(req.NodeID, result.SiteID, result.LatencyNs, result.IsUp, result.ErrorReason)
}
if err := s.UpdateNodeLastSeen(req.NodeID); err != nil {
log.Printf("Failed to update node last seen: %v", err)
+9 -2
View File
@@ -4,13 +4,14 @@ import (
"bytes"
"encoding/json"
"fmt"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"net"
"net/http"
"sync"
"testing"
"time"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
)
// --- Mock Store ---
@@ -64,6 +65,10 @@ func (m *mockStore) AddAlertReturningID(string, string, map[string]string) (int,
func (m *mockStore) GetAllNodes() ([]models.ProbeNode, error) { return nil, nil }
func (m *mockStore) UpdateNodeLastSeen(string) error { return nil }
func (m *mockStore) DeleteNode(string) error { return nil }
func (m *mockStore) LoadAlertHealth() (map[int]models.AlertHealthRecord, error) {
return nil, nil
}
func (m *mockStore) SaveAlertHealth(models.AlertHealthRecord) error { return nil }
func (m *mockStore) SaveLog(string) error { return nil }
func (m *mockStore) LoadLogs(int) ([]string, error) { return nil, nil }
func (m *mockStore) GetAllMaintenanceWindows(int) ([]models.MaintenanceWindow, error) {
@@ -75,6 +80,8 @@ func (m *mockStore) DeleteMaintenanceWindow(int) error { retur
func (m *mockStore) IsMonitorInMaintenance(int) (bool, error) { return false, nil }
func (m *mockStore) GetPreference(string) (string, error) { return "", nil }
func (m *mockStore) SetPreference(string, string) error { return nil }
func (m *mockStore) SaveStateChange(int, string, string, string) error { return nil }
func (m *mockStore) GetStateChanges(int, int) ([]models.StateChange, error) { return nil, nil }
func (m *mockStore) Close() error { return nil }
func (m *mockStore) ExportData() (models.Backup, error) {
+1
View File
@@ -14,6 +14,7 @@ type Dialect interface {
ImportWipe(tx *sql.Tx)
ImportResetSequences(tx *sql.Tx)
UpsertNodeSQL() string
UpsertAlertHealthSQL() string
}
func rewritePlaceholders(query string, dollarStyle bool) string {
+21
View File
@@ -72,6 +72,23 @@ func (d *PostgresDialect) CreateTablesSQL() []string {
key TEXT PRIMARY KEY,
value TEXT NOT NULL
)`,
`CREATE TABLE IF NOT EXISTS state_changes (
id SERIAL PRIMARY KEY,
site_id INTEGER NOT NULL,
from_status TEXT NOT NULL,
to_status TEXT NOT NULL,
error_reason TEXT DEFAULT '',
changed_at TIMESTAMP DEFAULT NOW()
)`,
`CREATE INDEX IF NOT EXISTS idx_state_changes_site ON state_changes(site_id, changed_at DESC)`,
`CREATE TABLE IF NOT EXISTS alert_health (
alert_id INTEGER PRIMARY KEY,
last_send_at TIMESTAMP,
last_send_ok BOOLEAN DEFAULT FALSE,
last_error TEXT DEFAULT '',
send_count INTEGER DEFAULT 0,
fail_count INTEGER DEFAULT 0
)`,
}
}
@@ -97,6 +114,10 @@ func (d *PostgresDialect) UpsertNodeSQL() string {
return "INSERT INTO nodes (id, name, region, last_seen, version) VALUES ($1, $2, $3, NOW(), $4) ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, region = EXCLUDED.region, last_seen = NOW(), version = EXCLUDED.version"
}
func (d *PostgresDialect) UpsertAlertHealthSQL() string {
return "INSERT INTO alert_health (alert_id, last_send_at, last_send_ok, last_error, send_count, fail_count) VALUES ($1, $2, $3, $4, $5, $6) ON CONFLICT (alert_id) DO UPDATE SET last_send_at = EXCLUDED.last_send_at, last_send_ok = EXCLUDED.last_send_ok, last_error = EXCLUDED.last_error, send_count = EXCLUDED.send_count, fail_count = EXCLUDED.fail_count"
}
func (d *PostgresDialect) ResetSequenceOnEmpty(db *sql.DB, table string) {}
func (d *PostgresDialect) ImportWipe(tx *sql.Tx) {
+21
View File
@@ -79,6 +79,23 @@ func (d *SQLiteDialect) CreateTablesSQL() []string {
key TEXT PRIMARY KEY,
value TEXT NOT NULL
)`,
`CREATE TABLE IF NOT EXISTS state_changes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
site_id INTEGER NOT NULL,
from_status TEXT NOT NULL,
to_status TEXT NOT NULL,
error_reason TEXT DEFAULT '',
changed_at DATETIME DEFAULT CURRENT_TIMESTAMP
)`,
`CREATE INDEX IF NOT EXISTS idx_state_changes_site ON state_changes(site_id, changed_at DESC)`,
`CREATE TABLE IF NOT EXISTS alert_health (
alert_id INTEGER PRIMARY KEY,
last_send_at DATETIME,
last_send_ok BOOLEAN DEFAULT 0,
last_error TEXT DEFAULT '',
send_count INTEGER DEFAULT 0,
fail_count INTEGER DEFAULT 0
)`,
}
}
@@ -104,6 +121,10 @@ func (d *SQLiteDialect) UpsertNodeSQL() string {
return "INSERT OR REPLACE INTO nodes (id, name, region, last_seen, version) VALUES (?, ?, ?, CURRENT_TIMESTAMP, ?)"
}
func (d *SQLiteDialect) UpsertAlertHealthSQL() string {
return "INSERT OR REPLACE INTO alert_health (alert_id, last_send_at, last_send_ok, last_error, send_count, fail_count) VALUES (?, ?, ?, ?, ?, ?)"
}
func (d *SQLiteDialect) ResetSequenceOnEmpty(db *sql.DB, table string) {
var count int
_ = db.QueryRow("SELECT COUNT(*) FROM " + table).Scan(&count) //nolint:errcheck
+55 -1
View File
@@ -9,7 +9,7 @@ import (
"strings"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
const (
@@ -347,6 +347,29 @@ func (s *SQLStore) DeleteUser(id int) error {
return err
}
func (s *SQLStore) SaveStateChange(siteID int, fromStatus, toStatus, errorReason string) error {
_, err := s.db.Exec(s.q("INSERT INTO state_changes (site_id, from_status, to_status, error_reason) VALUES (?, ?, ?, ?)"),
siteID, fromStatus, toStatus, errorReason)
return err
}
func (s *SQLStore) GetStateChanges(siteID int, limit int) ([]models.StateChange, error) {
rows, err := s.db.Query(s.q("SELECT id, site_id, from_status, to_status, error_reason, changed_at FROM state_changes WHERE site_id = ? ORDER BY changed_at DESC LIMIT ?"), siteID, limit)
if err != nil {
return nil, err
}
defer rows.Close()
var changes []models.StateChange
for rows.Next() {
var sc models.StateChange
if err := rows.Scan(&sc.ID, &sc.SiteID, &sc.FromStatus, &sc.ToStatus, &sc.ErrorReason, &sc.ChangedAt); err != nil {
return changes, err
}
changes = append(changes, sc)
}
return changes, rows.Err()
}
func (s *SQLStore) SaveCheck(siteID int, latencyNs int64, isUp bool) error {
return s.SaveCheckFromNode(siteID, "", latencyNs, isUp)
}
@@ -407,6 +430,37 @@ func (s *SQLStore) DeleteNode(id string) error {
return err
}
func (s *SQLStore) LoadAlertHealth() (map[int]models.AlertHealthRecord, error) {
rows, err := s.db.Query("SELECT alert_id, last_send_at, last_send_ok, last_error, send_count, fail_count FROM alert_health")
if err != nil {
return nil, err
}
defer rows.Close()
out := make(map[int]models.AlertHealthRecord)
for rows.Next() {
var r models.AlertHealthRecord
var lastSend sql.NullTime
if err := rows.Scan(&r.AlertID, &lastSend, &r.LastSendOK, &r.LastError, &r.SendCount, &r.FailCount); err != nil {
return out, err
}
if lastSend.Valid {
r.LastSendAt = lastSend.Time
}
out[r.AlertID] = r
}
return out, rows.Err()
}
func (s *SQLStore) SaveAlertHealth(h models.AlertHealthRecord) error {
var lastSend interface{}
if !h.LastSendAt.IsZero() {
lastSend = h.LastSendAt
}
_, err := s.db.Exec(s.dialect.UpsertAlertHealthSQL(),
h.AlertID, lastSend, h.LastSendOK, h.LastError, h.SendCount, h.FailCount)
return err
}
func (s *SQLStore) SaveLog(message string) error {
_, err := s.db.Exec(s.q("INSERT INTO logs (message) VALUES (?)"), message)
if err != nil {
+1 -1
View File
@@ -1,7 +1,7 @@
package store
import (
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"testing"
)
+9 -1
View File
@@ -1,7 +1,7 @@
package store
import (
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
)
type Store interface {
@@ -38,6 +38,10 @@ type Store interface {
SaveCheckFromNode(siteID int, nodeID string, latencyNs int64, isUp bool) error
LoadAllHistory(limit int) (map[int][]models.CheckRecord, error)
// State Changes
SaveStateChange(siteID int, fromStatus, toStatus, errorReason string) error
GetStateChanges(siteID int, limit int) ([]models.StateChange, error)
// Nodes
RegisterNode(node models.ProbeNode) error
GetNode(id string) (models.ProbeNode, error)
@@ -45,6 +49,10 @@ type Store interface {
UpdateNodeLastSeen(id string) error
DeleteNode(id string) error
// Alert Health
LoadAlertHealth() (map[int]models.AlertHealthRecord, error)
SaveAlertHealth(h models.AlertHealthRecord) error
// Logs
SaveLog(message string) error
LoadLogs(limit int) ([]string, error)
+96 -5
View File
@@ -2,7 +2,10 @@ package tui
import (
"fmt"
"strings"
"time"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
tea "github.com/charmbracelet/bubbletea"
"github.com/charmbracelet/huh"
"github.com/charmbracelet/lipgloss"
@@ -113,34 +116,122 @@ func fmtAlertConfig(alert struct {
}
}
func fmtAlertHealth(h monitor.AlertHealth) string {
if h.LastSendAt.IsZero() {
return subtleStyle.Render("●")
}
if h.LastSendOK {
return specialStyle.Render("●")
}
return dangerStyle.Render("●")
}
func fmtAlertLastSent(h monitor.AlertHealth) string {
if h.LastSendAt.IsZero() {
return subtleStyle.Render("never")
}
d := time.Since(h.LastSendAt)
if d < time.Minute {
return fmt.Sprintf("%ds ago", int(d.Seconds()))
}
if d < time.Hour {
return fmt.Sprintf("%dm ago", int(d.Minutes()))
}
if d < 24*time.Hour {
return fmt.Sprintf("%dh ago", int(d.Hours()))
}
return fmt.Sprintf("%dd ago", int(d.Hours())/24)
}
func (m Model) viewAlertsTab() string {
if len(m.alerts) == 0 {
return "\n No alert channels configured. Press [n] to add one."
}
var headers []string
var widths []int
if m.isWide() {
headers = []string{"#", "", "NAME", "TYPE", "CONFIG", "LAST SENT"}
widths = []int{4, 3, 18, 12, 40, 12}
} else {
headers = []string{"#", "", "NAME", "TYPE", "CONFIG", "SENT"}
widths = []int{4, 3, 14, 10, 24, 8}
}
nameW := widths[2]
cfgW := widths[4]
return m.renderTable(
[]string{"#", "NAME", "TYPE", "CONFIG"},
headers,
len(m.alerts),
func(start, end int) [][]string {
var rows [][]string
for i := start; i < end; i++ {
a := m.alerts[i]
h := m.engine.GetAlertHealth(a.ID)
rows = append(rows, []string{
fmt.Sprintf("%d", i+1),
m.zones.Mark(fmt.Sprintf("alert-%d", i), limitStr(a.Name, 15)),
fmtAlertHealth(h),
m.zones.Mark(fmt.Sprintf("alert-%d", i), limitStr(a.Name, nameW-2)),
fmtAlertType(a.Type),
fmtAlertConfig(struct {
limitStr(fmtAlertConfig(struct {
Type string
Settings map[string]string
}{a.Type, a.Settings}),
}{a.Type, a.Settings}), cfgW-2),
fmtAlertLastSent(h),
})
}
return rows
},
nil, nil,
widths, nil,
)
}
func (m Model) viewAlertDetailPanel() string {
if m.cursor >= len(m.alerts) {
return ""
}
a := m.alerts[m.cursor]
h := m.engine.GetAlertHealth(a.ID)
var b strings.Builder
b.WriteString(subtleStyle.Render(" Alerts > ") + titleStyle.Render(a.Name) + "\n\n")
row := func(label, value string) {
fmt.Fprintf(&b, " %-16s %s\n", subtleStyle.Render(label), value)
}
row("Type", fmtAlertType(a.Type))
if h.LastSendAt.IsZero() {
row("Health", subtleStyle.Render("never sent"))
} else if h.LastSendOK {
row("Health", specialStyle.Render("OK"))
} else {
row("Health", dangerStyle.Render("FAILED"))
}
if !h.LastSendAt.IsZero() {
row("Last Sent", h.LastSendAt.Format("2006-01-02 15:04:05")+" ("+fmtAlertLastSent(h)+")")
}
if h.SendCount > 0 {
row("Sends", fmt.Sprintf("%d sent, %d failed", h.SendCount, h.FailCount))
}
if h.LastError != "" {
row("Last Error", dangerStyle.Render(limitStr(h.LastError, 60)))
}
b.WriteString("\n" + subtleStyle.Render(" CONFIGURATION") + "\n")
for k, v := range a.Settings {
row(k, v)
}
b.WriteString("\n\n")
b.WriteString(subtleStyle.Render(" [i/Esc] Back [e] Edit [t] Test [q] Quit"))
return lipgloss.NewStyle().Padding(1, 2).Render(b.String())
}
func (m *Model) initAlertHuhForm() tea.Cmd {
m.alertFormData = &alertFormData{
AlertType: "discord",
+88 -20
View File
@@ -5,27 +5,83 @@ import (
"strings"
)
func colorizeLog(line string) string {
type logSeverity int
const (
severityInfo logSeverity = iota
severityWarn
severityDown
severityUp
severitySystem
)
func classifyLog(line string) logSeverity {
lower := strings.ToLower(line)
switch {
case strings.Contains(lower, "confirmed down"),
strings.Contains(lower, "is down"),
strings.Contains(lower, "missed heartbeat"),
strings.Contains(lower, "failed check"),
strings.Contains(lower, "ssl warning"):
return dangerStyle.Render(line)
strings.Contains(lower, "alert send failed"):
return severityDown
case strings.Contains(lower, "recovered"),
strings.Contains(lower, "is up"),
strings.Contains(lower, "recovery"):
return specialStyle.Render(line)
strings.Contains(lower, "recovery"),
strings.Contains(lower, "first heartbeat"):
return severityUp
case strings.Contains(lower, "failed check"),
strings.Contains(lower, "ssl warning"),
strings.Contains(lower, "overdue"),
strings.Contains(lower, "was late"):
return severityWarn
case strings.Contains(lower, "engine"),
strings.Contains(lower, "cluster"):
return titleStyle.Render(line)
strings.Contains(lower, "cluster"),
strings.Contains(lower, "loaded"),
strings.Contains(lower, "paused"),
strings.Contains(lower, "resumed"):
return severitySystem
default:
return line
return severityInfo
}
}
func isImportantLog(sev logSeverity) bool {
return sev == severityDown || sev == severityUp || sev == severitySystem
}
func renderLogTag(sev logSeverity) string {
switch sev {
case severityDown:
return dangerStyle.Render(" DOWN ")
case severityUp:
return specialStyle.Render(" UP ")
case severityWarn:
return warnStyle.Render(" WARN ")
case severitySystem:
return titleStyle.Render(" SYS ")
default:
return subtleStyle.Render(" info ")
}
}
func renderLogLine(line string) string {
sev := classifyLog(line)
tag := renderLogTag(sev)
ts := ""
msg := line
if len(line) > 10 && line[0] == '[' {
if idx := strings.Index(line, "]"); idx > 0 && idx < 12 {
ts = subtleStyle.Render(line[1:idx])
msg = strings.TrimSpace(line[idx+1:])
}
}
if ts != "" {
return fmt.Sprintf(" %s %s %s", ts, tag, msg)
}
return fmt.Sprintf(" %s %s", tag, msg)
}
func (m Model) viewLogsTab() string {
content := m.logViewport.View()
if strings.TrimSpace(content) == "" || content == "Waiting for logs..." {
@@ -33,22 +89,34 @@ func (m Model) viewLogsTab() string {
}
lines := strings.Split(content, "\n")
var colored []string
var rendered []string
total := 0
shown := 0
for _, line := range lines {
if line == "" {
colored = append(colored, line)
if strings.TrimSpace(line) == "" {
continue
}
colored = append(colored, colorizeLog(line))
total++
sev := classifyLog(line)
if m.logFilterImportant && !isImportantLog(sev) {
continue
}
shown++
rendered = append(rendered, renderLogLine(line))
}
count := 0
for _, l := range lines {
if strings.TrimSpace(l) != "" {
count++
}
filterLabel := "All"
if m.logFilterImportant {
filterLabel = "Important"
}
header := subtleStyle.Render(fmt.Sprintf(" %d entries [↑/↓] Scroll [PgUp/PgDn] Page", count))
return "\n" + header + "\n\n" + strings.Join(colored, "\n")
header := subtleStyle.Render(fmt.Sprintf(
" %d entries [↑/↓] Scroll [PgUp/PgDn] Page [f] Filter: %s", shown, filterLabel))
if m.logFilterImportant && shown < total {
header += subtleStyle.Render(fmt.Sprintf(" (%d hidden)", total-shown))
}
return "\n" + header + "\n\n" + strings.Join(rendered, "\n")
}
+27 -10
View File
@@ -2,10 +2,11 @@ package tui
import (
"fmt"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"strconv"
"time"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
tea "github.com/charmbracelet/bubbletea"
"github.com/charmbracelet/huh"
"github.com/charmbracelet/lipgloss"
@@ -40,19 +41,19 @@ func fmtMaintType(t string) string {
return maintStyle.Render("maintenance")
}
func fmtMaintMonitor(monitorID int, sites []models.Site) string {
func fmtMaintMonitorW(monitorID int, sites []models.Site, maxW int) string {
if monitorID == 0 {
return "All"
}
for _, s := range sites {
if s.ID == monitorID {
return limitStr(s.Name, 18)
return limitStr(s.Name, maxW)
}
}
return fmt.Sprintf("#%d", monitorID)
}
func fmtMaintTime(t time.Time) string {
func fmtMaintTime(t time.Time, colW int) string {
if t.IsZero() {
return subtleStyle.Render("—")
}
@@ -60,8 +61,11 @@ func fmtMaintTime(t time.Time) string {
if t.Year() == now.Year() && t.YearDay() == now.YearDay() {
return t.Format("15:04")
}
if colW >= 14 {
return t.Format("15:04 Jan 02")
}
return t.Format("Jan 02")
}
func (m Model) isMonitorInMaintenance(monitorID int) bool {
for _, mw := range m.maintenanceWindows {
@@ -92,8 +96,21 @@ func (m Model) viewMaintTab() string {
return "\n No maintenance windows or incidents. Press [n] to create one."
}
var headers []string
var widths []int
if m.isWide() {
headers = []string{"#", "TITLE", "TYPE", "MONITORS", "STATUS", "STARTED", "ENDS"}
widths = []int{4, 24, 14, 22, 12, 16, 16}
} else {
headers = []string{"#", "TITLE", "TYPE", "MON", "ST", "START", "ENDS"}
widths = []int{4, 14, 13, 14, 11, 14, 14}
}
titleW := widths[1]
monW := widths[3]
timeW := widths[5]
return m.renderTable(
[]string{"#", "TITLE", "TYPE", "MONITORS", "STATUS", "STARTED", "ENDS"},
headers,
len(m.maintenanceWindows),
func(start, end int) [][]string {
var rows [][]string
@@ -102,17 +119,17 @@ func (m Model) viewMaintTab() string {
mw := m.maintenanceWindows[i]
rows = append(rows, []string{
strconv.Itoa(i + 1),
m.zones.Mark(fmt.Sprintf("maint-%d", i), limitStr(mw.Title, 24)),
m.zones.Mark(fmt.Sprintf("maint-%d", i), limitStr(mw.Title, titleW-2)),
fmtMaintType(mw.Type),
fmtMaintMonitor(mw.MonitorID, allSites),
fmtMaintMonitorW(mw.MonitorID, allSites, monW-2),
fmtMaintStatus(mw),
fmtMaintTime(mw.StartTime),
fmtMaintTime(mw.EndTime),
fmtMaintTime(mw.StartTime, timeW),
fmtMaintTime(mw.EndTime, timeW),
})
}
return rows
},
[]int{6, 0, 14, 20, 12, 16, 16},
widths,
nil,
)
}
+13 -4
View File
@@ -10,16 +10,25 @@ func (m Model) viewNodesTab() string {
return "\n No probe nodes connected."
}
colWidths := []int{0, 12, 20, 10, 8}
var headers []string
var widths []int
if m.isWide() {
headers = []string{"NAME", "REGION", "LAST SEEN", "VERSION", "STATUS"}
widths = []int{24, 14, 16, 12, 10}
} else {
headers = []string{"NAME", "REGION", "SEEN", "VER", "STATUS"}
widths = []int{16, 10, 10, 8, 8}
}
nameW := widths[0]
return m.renderTable(
[]string{"NAME", "REGION", "LAST SEEN", "VERSION", "STATUS"},
headers,
len(m.nodes),
func(start, end int) [][]string {
var rows [][]string
for i := start; i < end; i++ {
node := m.nodes[i]
name := limitStr(node.Name, 20)
name := limitStr(node.Name, nameW-2)
if name == "" {
name = node.ID
}
@@ -37,7 +46,7 @@ func (m Model) viewNodesTab() string {
}
return rows
},
colWidths,
widths,
nil,
)
}
+194 -36
View File
@@ -7,7 +7,7 @@ import (
"strings"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
tea "github.com/charmbracelet/bubbletea"
"github.com/charmbracelet/huh"
@@ -60,14 +60,18 @@ type siteFormData struct {
Regions string
}
func latencySparkline(latencies []time.Duration, width int) string {
func latencySparkline(latencies []time.Duration, statuses []bool, width int) string {
if len(latencies) == 0 {
return subtleStyle.Render(strings.Repeat("·", width))
}
samples := latencies
sampledStatuses := statuses
if len(samples) > width {
samples = samples[len(samples)-width:]
if len(sampledStatuses) > width {
sampledStatuses = sampledStatuses[len(sampledStatuses)-width:]
}
}
minL, maxL := samples[0], samples[0]
@@ -85,7 +89,7 @@ func latencySparkline(latencies []time.Duration, width int) string {
sb.WriteString(subtleStyle.Render(strings.Repeat("·", remaining)))
}
spread := maxL - minL
for _, l := range samples {
for i, l := range samples {
idx := 0
if spread > 0 {
idx = int(float64(l-minL) / float64(spread) * 7)
@@ -94,6 +98,10 @@ func latencySparkline(latencies []time.Duration, width int) string {
}
}
ch := string(sparkChars[idx])
isDown := i < len(sampledStatuses) && !sampledStatuses[i]
if isDown {
sb.WriteString(dangerStyle.Render(ch))
} else {
ms := l.Milliseconds()
if ms < 200 {
sb.WriteString(specialStyle.Render(ch))
@@ -103,6 +111,7 @@ func latencySparkline(latencies []time.Duration, width int) string {
sb.WriteString(dangerStyle.Render(ch))
}
}
}
return sb.String()
}
@@ -302,6 +311,8 @@ func fmtStatus(status string, paused bool, inMaint bool) string {
switch status {
case "DOWN", "SSL EXP":
return dangerStyle.Render(status)
case "LATE":
return warnStyle.Render(status)
case "PENDING":
return subtleStyle.Render(status)
default:
@@ -309,28 +320,94 @@ func fmtStatus(status string, paused bool, inMaint bool) string {
}
}
func (m Model) dynamicWidths() (nameW, sparkW int) {
fixed := 6 + 10 + 10 + 8 + 8 + 7 + 9 // #, TYPE, STATUS, LATENCY, UPTIME, SSL, RETRY
overhead := 30 // cell padding + borders
avail := m.termWidth - chromePadH - 2 - fixed - overhead
if avail < 30 {
avail = 30
func fmtDuration(d time.Duration) string {
if d < time.Minute {
return fmt.Sprintf("%ds", int(d.Seconds()))
}
if d < time.Hour {
return fmt.Sprintf("%dm", int(d.Minutes()))
}
if d < 24*time.Hour {
h := int(d.Hours())
m := int(d.Minutes()) % 60
if m > 0 {
return fmt.Sprintf("%dh %dm", h, m)
}
return fmt.Sprintf("%dh", h)
}
days := int(d.Hours()) / 24
hours := int(d.Hours()) % 24
if hours > 0 {
return fmt.Sprintf("%dd %dh", days, hours)
}
return fmt.Sprintf("%dd", days)
}
type tableLayout struct {
nameW, sparkW int
headers []string
colWidths []int
}
func (m Model) computeLayout() tableLayout {
wide := m.isWide()
var fixed int
var headers []string
var widths []int
if wide {
// # NAME TYPE STATUS LATENCY UPTIME HISTORY SSL RETRIES
headers = []string{"#", "NAME", "TYPE", "STATUS", "LATENCY", "UPTIME", "HISTORY", "SSL", "RETRIES"}
widths = []int{4, 0, 10, 10, 10, 8, 0, 7, 9}
fixed = 4 + 10 + 10 + 10 + 8 + 7 + 9
} else {
// # NAME TYPE STATUS LAT UP% HISTORY SSL RT
headers = []string{"#", "NAME", "TYPE", "STATUS", "LAT", "UP%", "HISTORY", "SSL", "RT"}
widths = []int{4, 0, 8, 8, 7, 8, 0, 5, 5}
fixed = 4 + 8 + 8 + 7 + 8 + 5 + 5
}
numCols := len(headers)
borderOverhead := 2 + (numCols - 1)
avail := m.termWidth - chromePadH - 2 - borderOverhead - fixed
if avail < 20 {
avail = 20
}
maxName := 0
for _, s := range m.sites {
if n := len([]rune(s.Name)); n > maxName {
maxName = n
}
}
maxName += 4
nameW := avail / 2
if nameW > maxName {
nameW = maxName
}
nameW = avail / 2
sparkW = avail - nameW - 2 // -2 for spark column padding
if nameW < 13 {
nameW = 13
}
if nameW > 40 {
nameW = 40
}
sparkW := avail - nameW
if sparkW < 10 {
sparkW = 10
}
if sparkW > 60 {
sparkW = 60
widths[1] = nameW
widths[6] = sparkW
return tableLayout{
nameW: nameW,
sparkW: sparkW,
headers: headers,
colWidths: widths,
}
return
}
func (m Model) viewSitesTab() string {
@@ -348,12 +425,16 @@ func (m Model) viewSitesTab() string {
return "\n" + welcome
}
nameW, sparkWidth := m.dynamicWidths()
colWidths := []int{6, 0, 10, 10, 8, 8, sparkWidth + 2, 7, 9}
layout := m.computeLayout()
nameW := layout.nameW
sparkWidth := layout.sparkW - 2
if sparkWidth < 8 {
sparkWidth = 8
}
var groupRows map[int]bool
return m.renderTable(
[]string{"#", "NAME", "TYPE", "STATUS", "LATENCY", "UPTIME", "HISTORY", "SSL", "RETRY"},
layout.headers,
len(m.sites),
func(start, end int) [][]string {
groupRows = make(map[int]bool)
@@ -366,7 +447,7 @@ func (m Model) viewSitesTab() string {
icon := typeIcon("group", m.collapsed[site.ID])
rows = append(rows, []string{
strconv.Itoa(i + 1),
m.zones.Mark(fmt.Sprintf("site-%d", i), icon+" "+limitStr(site.Name, nameW-2)),
m.zones.Mark(fmt.Sprintf("site-%d", i), icon+" "+limitStr(site.Name, nameW-4)),
"group",
fmtStatus(site.Status, site.Paused, m.isMonitorInMaintenance(site.ID)),
subtleStyle.Render("—"),
@@ -384,9 +465,17 @@ func (m Model) viewSitesTab() string {
if i+1 >= len(m.sites) || m.sites[i+1].ParentID != site.ParentID {
prefix = "└"
}
name = prefix + " " + limitStr(name, nameW-2)
name = prefix + " " + limitStr(name, nameW-4)
} else {
name = limitStr(name, nameW)
name = limitStr(name, nameW-2)
}
if (site.Status == "DOWN" || site.Status == "SSL EXP" || site.Status == "LATE") && site.LastError != "" {
nameLen := len([]rune(name))
errSpace := nameW - nameLen - 3
if errSpace > 10 {
name = name + " " + subtleStyle.Render(limitStr(site.LastError, errSpace))
}
}
hist, _ := m.engine.GetHistory(site.ID)
@@ -394,7 +483,7 @@ func (m Model) viewSitesTab() string {
if site.Type == "push" {
spark = heartbeatSparkline(hist.Statuses, sparkWidth)
} else {
spark = latencySparkline(hist.Latencies, sparkWidth)
spark = latencySparkline(hist.Latencies, hist.Statuses, sparkWidth)
}
rows = append(rows, []string{
@@ -411,7 +500,7 @@ func (m Model) viewSitesTab() string {
}
return rows
},
colWidths,
layout.colWidths,
func(row, col int) *lipgloss.Style {
if groupRows[row] {
s := siteGroupStyle
@@ -731,7 +820,30 @@ func (m Model) viewDetailPanel() string {
fmt.Fprintf(&b, " %-16s %s\n", subtleStyle.Render(label), value)
}
section := func(label string) {
b.WriteString("\n" + subtleStyle.Render(" "+label) + "\n")
}
row("Status", fmtStatus(site.Status, site.Paused, m.isMonitorInMaintenance(site.ID)))
if (site.Status == "DOWN" || site.Status == "SSL EXP" || site.Status == "LATE") && site.LastError != "" {
row("Error", dangerStyle.Render(limitStr(site.LastError, 60)))
}
if site.Type == "http" && site.StatusCode > 0 {
row("HTTP Code", strconv.Itoa(site.StatusCode))
}
if !site.StatusChangedAt.IsZero() {
dur := time.Since(site.StatusChangedAt)
row("State Since", site.StatusChangedAt.Format("2006-01-02 15:04:05")+" ("+fmtDuration(dur)+")")
}
if !site.LastSuccessAt.IsZero() {
ago := time.Since(site.LastSuccessAt)
row("Last Success", site.LastSuccessAt.Format("15:04:05")+" ("+fmtDuration(ago)+" ago)")
}
if m.isMonitorInMaintenance(site.ID) {
for _, mw := range m.maintenanceWindows {
if mw.Type == "maintenance" && (mw.MonitorID == 0 || mw.MonitorID == site.ID || mw.MonitorID == site.ParentID) {
@@ -740,6 +852,8 @@ func (m Model) viewDetailPanel() string {
}
}
}
section("ENDPOINT")
row("Type", site.Type)
if site.URL != "" {
row("URL", site.URL)
@@ -750,20 +864,36 @@ func (m Model) viewDetailPanel() string {
if site.Port > 0 {
row("Port", strconv.Itoa(site.Port))
}
section("TIMING")
row("Interval", fmt.Sprintf("%ds", site.Interval))
if site.Timeout > 0 {
row("Timeout", fmt.Sprintf("%ds", site.Timeout))
}
row("Latency", fmtLatency(site.Latency))
row("Uptime", fmtUptime(hist.Statuses))
if !site.LastCheck.IsZero() {
row("Last Check", site.LastCheck.Format("15:04:05"))
}
if site.Type == "http" {
section("HTTP")
if site.Method != "" && site.Method != "GET" {
row("Method", site.Method)
row("Codes", site.AcceptedCodes)
}
codes := site.AcceptedCodes
if codes == "" {
codes = "200-299"
}
row("Codes", codes)
row("SSL", fmtSSL(site))
if site.IgnoreTLS {
row("TLS Verify", dangerStyle.Render("disabled"))
}
}
if site.MaxRetries > 0 || site.Regions != "" || site.Description != "" {
section("CONFIG")
if site.MaxRetries > 0 {
row("Retries", fmtRetries(site))
}
@@ -773,8 +903,6 @@ func (m Model) viewDetailPanel() string {
if site.Description != "" {
row("Description", site.Description)
}
if !site.LastCheck.IsZero() {
row("Last Check", site.LastCheck.Format("15:04:05"))
}
probeResults := m.engine.GetProbeResults(site.ID)
@@ -787,7 +915,30 @@ func (m Model) viewDetailPanel() string {
}
latency := time.Duration(result.LatencyNs).Milliseconds()
ago := time.Since(result.CheckedAt).Truncate(time.Second)
fmt.Fprintf(&b, " %-14s %s %dms %s ago\n", nodeID, status, latency, ago)
line := fmt.Sprintf(" %-14s %s %dms %s ago", nodeID, status, latency, ago)
if !result.IsUp && result.ErrorReason != "" {
line += " " + dangerStyle.Render(limitStr(result.ErrorReason, 30))
}
b.WriteString(line + "\n")
}
}
stateChanges := m.engine.GetStateChanges(site.ID, 5)
if len(stateChanges) > 0 {
b.WriteString("\n" + subtleStyle.Render(" STATE CHANGES") + "\n")
for _, sc := range stateChanges {
ago := fmtDuration(time.Since(sc.ChangedAt))
arrow := subtleStyle.Render(sc.FromStatus) + " → "
if sc.ToStatus == "UP" {
arrow += specialStyle.Render(sc.ToStatus)
} else {
arrow += dangerStyle.Render(sc.ToStatus)
}
line := fmt.Sprintf(" %s %s", arrow, subtleStyle.Render(ago+" ago"))
if sc.ErrorReason != "" && sc.ToStatus != "UP" {
line += " " + dangerStyle.Render(limitStr(sc.ErrorReason, 40))
}
b.WriteString(line + "\n")
}
}
@@ -807,20 +958,27 @@ func (m Model) viewDetailPanel() string {
up, len(hist.Statuses))
}
} else {
b.WriteString(" " + latencySparkline(hist.Latencies, sparkWidth))
if len(hist.Latencies) > 0 {
minL, maxL := hist.Latencies[0], hist.Latencies[0]
var total time.Duration
for _, l := range hist.Latencies {
total += l
if l < minL {
minL = l
b.WriteString(" " + latencySparkline(hist.Latencies, hist.Statuses, sparkWidth))
// Stats over successful checks only — a failed check is stored as 0ns latency
// and would otherwise drag Min to 0ms and skew the average.
var minL, maxL, total time.Duration
count := 0
for i, l := range hist.Latencies {
if i < len(hist.Statuses) && !hist.Statuses[i] {
continue
}
if l > maxL {
if count == 0 {
minL, maxL = l, l
} else if l < minL {
minL = l
} else if l > maxL {
maxL = l
}
total += l
count++
}
avg := total / time.Duration(len(hist.Latencies))
if count > 0 {
avg := total / time.Duration(count)
fmt.Fprintf(&b, "\n %s %dms %s %dms %s %dms",
subtleStyle.Render("Min"), minL.Milliseconds(),
subtleStyle.Render("Avg"), avg.Milliseconds(),
+14 -3
View File
@@ -32,8 +32,19 @@ func (m Model) viewUsersTab() string {
return "\n No users configured. Press [n] to add one."
}
var headers []string
var widths []int
if m.isWide() {
headers = []string{"#", "USERNAME", "ROLE", "PUBLIC KEY"}
widths = []int{4, 18, 10, 50}
} else {
headers = []string{"#", "USER", "ROLE", "KEY"}
widths = []int{4, 14, 8, 30}
}
userW := widths[1]
return m.renderTable(
[]string{"#", "USERNAME", "ROLE", "PUBLIC KEY"},
headers,
len(m.users),
func(start, end int) [][]string {
var rows [][]string
@@ -41,14 +52,14 @@ func (m Model) viewUsersTab() string {
u := m.users[i]
rows = append(rows, []string{
fmt.Sprintf("%d", i+1),
m.zones.Mark(fmt.Sprintf("user-%d", i), limitStr(u.Username, 15)),
m.zones.Mark(fmt.Sprintf("user-%d", i), limitStr(u.Username, userW-2)),
fmtRole(u.Role),
fmtKey(u.PublicKey),
})
}
return rows
},
nil, nil,
widths, nil,
)
}
+23 -4
View File
@@ -15,6 +15,12 @@ var (
type StyleOverride func(row, col int) *lipgloss.Style
const wideBreakpoint = 120
func (m Model) isWide() bool {
return m.termWidth >= wideBreakpoint
}
func (m Model) renderTable(headers []string, items int, buildRows func(start, end int) [][]string, colWidths []int, styleOverride StyleOverride) string {
if items == 0 {
return ""
@@ -28,7 +34,16 @@ func (m Model) renderTable(headers []string, items int, buildRows func(start, en
selectedVisual := m.cursor - m.tableOffset
rows := buildRows(m.tableOffset, end)
tableWidth := m.termWidth - chromePadH - 2
colTotal := 0
for _, w := range colWidths {
colTotal += w
}
borderOverhead := 2 + len(colWidths) - 1
tableWidth := colTotal + borderOverhead
maxWidth := m.termWidth - chromePadH - 2
if tableWidth > maxWidth {
tableWidth = maxWidth
}
if tableWidth < 40 {
tableWidth = 40
}
@@ -41,7 +56,11 @@ func (m Model) renderTable(headers []string, items int, buildRows func(start, en
Rows(rows...).
StyleFunc(func(row, col int) lipgloss.Style {
if row == table.HeaderRow {
return tableHeaderStyle
h := tableHeaderStyle
if col < len(colWidths) && colWidths[col] > 0 {
h = h.Width(colWidths[col]).MaxWidth(colWidths[col])
}
return h
}
isSelected := row == selectedVisual
if styleOverride != nil {
@@ -51,7 +70,7 @@ func (m Model) renderTable(headers []string, items int, buildRows func(start, en
style = tableSelectedStyle.Foreground(s.GetForeground())
}
if col < len(colWidths) && colWidths[col] > 0 {
style = style.Width(colWidths[col])
style = style.Width(colWidths[col]).MaxWidth(colWidths[col])
}
return style
}
@@ -64,7 +83,7 @@ func (m Model) renderTable(headers []string, items int, buildRows func(start, en
base = tableSelectedStyle
}
if col < len(colWidths) && colWidths[col] > 0 {
base = base.Width(colWidths[col])
base = base.Width(colWidths[col]).MaxWidth(colWidths[col])
}
return base
})
+81 -14
View File
@@ -4,13 +4,14 @@ import (
"encoding/json"
"fmt"
"math"
"os"
"sort"
"strings"
"time"
"gitea.lerkolabs.com/lerko/uptop/internal/models"
"gitea.lerkolabs.com/lerko/uptop/internal/monitor"
"gitea.lerkolabs.com/lerko/uptop/internal/store"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/models"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/monitor"
"gitea.lerkolabs.com/lerkolabs/uptop/internal/store"
"github.com/charmbracelet/bubbles/viewport"
tea "github.com/charmbracelet/bubbletea"
@@ -68,6 +69,7 @@ const (
stateLogs
stateUsers
stateDetail
stateAlertDetail
stateFormSite
stateFormAlert
stateFormUser
@@ -93,6 +95,7 @@ type Model struct {
maintFormData *maintFormData
logViewport viewport.Model
logFilterImportant bool
isAdmin bool
zones *zone.Manager
@@ -120,6 +123,10 @@ type Model struct {
filterMode bool
filterText string
// demoMode renders a stable status dot instead of the animated pulse so
// screenshots/recordings don't capture the spinner mid-frame. Set via UPTOP_DEMO=1.
demoMode bool
}
func InitialModel(isAdmin bool, s store.Store, eng *monitor.Engine) Model {
@@ -153,6 +160,7 @@ func InitialModel(isAdmin bool, s store.Store, eng *monitor.Engine) Model {
collapsed: collapsed,
theme: theme,
themeIndex: themeIdx,
demoMode: os.Getenv("UPTOP_DEMO") == "1",
}
}
@@ -383,6 +391,14 @@ func (m Model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
return m, tea.Quit
}
return m, nil
case stateAlertDetail:
switch msg.String() {
case "i", "esc":
m.state = stateDashboard
case "q":
return m, tea.Quit
}
return m, nil
case stateDashboard, stateLogs, stateUsers:
switch msg.String() {
case "q":
@@ -392,6 +408,11 @@ func (m Model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
m.filterMode = true
return m, nil
}
case "f":
if m.state == stateLogs {
m.logFilterImportant = !m.logFilterImportant
return m, nil
}
case "tab":
m.switchTab(m.currentTab + 1)
case "pgup", "pgdown":
@@ -463,6 +484,16 @@ func (m Model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
m.state = stateFormUser
return m, m.initUserHuhForm()
}
case "t":
if m.currentTab == 1 && len(m.alerts) > 0 {
a := m.alerts[m.cursor]
go func() {
if err := m.engine.TestAlert(a.ID); err != nil {
m.engine.AddLog(fmt.Sprintf("Test alert failed (%s): %v", a.Name, err))
}
}()
return m, nil
}
case " ":
if m.currentTab == 0 && len(m.sites) > 0 && m.sites[m.cursor].Type == "group" {
gid := m.sites[m.cursor].ID
@@ -481,6 +512,8 @@ func (m Model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
case "i":
if m.currentTab == 0 && len(m.sites) > 0 {
m.state = stateDetail
} else if m.currentTab == 1 && len(m.alerts) > 0 {
m.state = stateAlertDetail
}
case "x":
if m.currentTab == 4 && len(m.maintenanceWindows) > 0 {
@@ -727,11 +760,6 @@ func (m *Model) submitForm() {
}
func (m Model) pulseIndicator() string {
frame := m.tickCount % len(pulseFrames)
brightness := int(m.pulsePos*155) + 100
if brightness > 255 {
brightness = 255
}
hasDown := false
for _, s := range m.sites {
if !s.Paused && !m.isMonitorInMaintenance(s.ID) && (s.Status == "DOWN" || s.Status == "SSL EXP") {
@@ -739,6 +767,19 @@ func (m Model) pulseIndicator() string {
break
}
}
// Stills can't show animation: render a stable status dot in demo mode.
if m.demoMode {
c := m.theme.Success
if hasDown {
c = m.theme.Danger
}
return lipgloss.NewStyle().Foreground(c).Render("●")
}
frame := m.tickCount % len(pulseFrames)
brightness := int(m.pulsePos*155) + 100
if brightness > 255 {
brightness = 255
}
var color string
if hasDown {
color = fmt.Sprintf("#%02x%02x%02x", brightness, brightness/4, brightness/4)
@@ -802,6 +843,8 @@ func (m Model) View() string {
return ""
case stateDetail:
return m.viewDetailPanel()
case stateAlertDetail:
return m.viewAlertDetailPanel()
default:
return m.zones.Scan(m.viewDashboard())
}
@@ -811,13 +854,20 @@ func (m Model) viewDashboard() string {
allSites := m.engine.GetAllSites()
totalMonitors := 0
downCount := 0
lateCount := 0
for _, s := range allSites {
if s.Type == "group" {
continue
}
totalMonitors++
if !s.Paused && !m.isMonitorInMaintenance(s.ID) && (s.Status == "DOWN" || s.Status == "SSL EXP") {
if s.Paused || m.isMonitorInMaintenance(s.ID) {
continue
}
switch s.Status {
case "DOWN", "SSL EXP":
downCount++
case "LATE":
lateCount++
}
}
offlineNodes := 0
@@ -830,6 +880,8 @@ func (m Model) viewDashboard() string {
var sitesLabel string
if downCount > 0 {
sitesLabel = fmt.Sprintf("Sites (%d↓)", downCount)
} else if lateCount > 0 {
sitesLabel = fmt.Sprintf("Sites (%d⚠)", lateCount)
} else if totalMonitors > 0 {
sitesLabel = fmt.Sprintf("Sites (%d)", totalMonitors)
} else {
@@ -895,14 +947,19 @@ func (m Model) viewDashboard() string {
}
}
upCount := totalMonitors - downCount
upCount := totalMonitors - downCount - lateCount
var upStr string
if downCount > 0 {
upStr = dangerStyle.Render(fmt.Sprintf("%d/%d UP", upCount, totalMonitors))
} else if lateCount > 0 {
upStr = warnStyle.Render(fmt.Sprintf("%d/%d UP", upCount, totalMonitors))
} else {
upStr = specialStyle.Render(fmt.Sprintf("%d/%d UP", upCount, totalMonitors))
}
statusParts := []string{upStr}
if lateCount > 0 {
statusParts = append(statusParts, warnStyle.Render(fmt.Sprintf("%d LATE", lateCount)))
}
if len(m.nodes) > 0 {
online := 0
for _, n := range m.nodes {
@@ -910,7 +967,11 @@ func (m Model) viewDashboard() string {
online++
}
}
statusParts = append(statusParts, fmt.Sprintf("%d probes", online))
probeLabel := "probes"
if online == 1 {
probeLabel = "probe"
}
statusParts = append(statusParts, fmt.Sprintf("%d %s", online, probeLabel))
}
statusLine := strings.Join(statusParts, subtleStyle.Render(" · "))
@@ -923,6 +984,10 @@ func (m Model) viewDashboard() string {
switch m.currentTab {
case 0:
keys = "[/]Filter [n]New [e]Edit [i]Info [d]Del [p]Pause [T]Theme [Tab]Switch [q]Quit"
case 1:
keys = "[n]New [e]Edit [i]Info [d]Del [t]Test [T]Theme [Tab]Switch [q]Quit"
case 2:
keys = "[f]Filter [T]Theme [Tab]Switch [q]Quit"
case 4:
keys = "[n]New [x]End [d]Del [T]Theme [Tab]Switch [q]Quit"
case 5:
@@ -949,10 +1014,12 @@ func siteOrder(s models.Site) int {
switch s.Status {
case "DOWN", "SSL EXP":
return 0
case "PENDING":
return 2
default:
case "LATE":
return 1
case "PENDING":
return 3
default:
return 2
}
}