feat(cluster): add distributed probing foundation #9

Merged
lerko merged 3 commits from feat/distributed-probing-foundation into develop 2026-05-16 16:01:16 +00:00

Summary

  • Add ProbeNode model and nodes table (SQLite + Postgres) for probe registration
  • Add node_id column to check_history for multi-source check tracking
  • Extend Store interface with node CRUD and SaveCheckFromNode
  • Add UpsertNodeSQL to Dialect interface (INSERT OR REPLACE for SQLite, ON CONFLICT for Postgres)
  • Add 3 new API endpoints: POST /api/probe/register, GET /api/probe/assignments, POST /api/probe/results
  • Fully backward compatible: the existing SaveCheck wraps SaveCheckFromNode with an empty node_id
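
The backward-compatibility point can be pictured with a minimal Go sketch. Only the names SaveCheck and SaveCheckFromNode come from this PR; the CheckResult struct, the method signatures, and the in-memory memStore are assumptions made for illustration, not the project's real types.

```go
package main

import (
	"fmt"
	"time"
)

// CheckResult is a hypothetical shape for one check outcome; the PR does not
// show the real struct.
type CheckResult struct {
	SiteID  int
	Latency time.Duration
	IsUp    bool
}

// row pairs a stored result with the probe that reported it.
type row struct {
	NodeID string
	Result CheckResult
}

// memStore is an in-memory stand-in for the SQL-backed Store (assumption).
type memStore struct {
	rows []row
}

// SaveCheckFromNode records a check together with the reporting node's ID.
func (s *memStore) SaveCheckFromNode(nodeID string, r CheckResult) error {
	s.rows = append(s.rows, row{NodeID: nodeID, Result: r})
	return nil
}

// SaveCheck keeps the older call shape and delegates with an empty node_id,
// so callers that know nothing about probes keep working unchanged.
func (s *memStore) SaveCheck(r CheckResult) error {
	return s.SaveCheckFromNode("", r)
}

func main() {
	s := &memStore{}
	_ = s.SaveCheck(CheckResult{SiteID: 1, Latency: 5 * time.Millisecond, IsUp: true})
	_ = s.SaveCheckFromNode("test", CheckResult{SiteID: 1, Latency: 7 * time.Millisecond, IsUp: true})
	for _, r := range s.rows {
		fmt.Printf("node=%q site=%d up=%v\n", r.NodeID, r.Result.SiteID, r.Result.IsUp)
	}
}
```

In this sketch, rows written through the old path carry an empty node ID, which is how locally produced checks would be told apart from probe-reported ones.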

Phase 1 of 3 — Distributed Probing

Foundation layer only. Results are stored to DB but not yet aggregated into engine state. Probe execution mode comes in Phase 2.

Test plan

  • [x] go test ./... passes
  • [ ] Start app with existing DB and verify migrations add node_id column and nodes table without data loss
  • [ ] curl -X POST -H "X-Upkeep-Secret: <key>" -d '{"id":"test","name":"Test","region":"local"}' /api/probe/register returns 200
  • [ ] curl -H "X-Upkeep-Secret: <key>" /api/probe/assignments returns non-paused, non-push/group sites
  • [ ] curl -X POST -H "X-Upkeep-Secret: <key>" -d '{"node_id":"test","results":[{"site_id":1,"latency_ns":5000000,"is_up":true}]}' /api/probe/results returns 200
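
The results-submission step of the test plan can also be exercised from Go. The sketch below is hedged: the JSON field names, the X-Upkeep-Secret header, and the /api/probe/results path are taken from the curl commands above, while postResults, fakeLeader, the Content-Type header, and the 400/401/405 error codes are assumptions standing in for the real leader, which this PR page does not show.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probeResult and resultsRequest mirror the JSON body used in the test plan.
type probeResult struct {
	SiteID    int   `json:"site_id"`
	LatencyNS int64 `json:"latency_ns"`
	IsUp      bool  `json:"is_up"`
}

type resultsRequest struct {
	NodeID  string        `json:"node_id"`
	Results []probeResult `json:"results"`
}

// postResults is a hypothetical client helper: it sends one batch of results
// and returns the HTTP status code.
func postResults(baseURL, secret string, body resultsRequest) (int, error) {
	buf, err := json.Marshal(body)
	if err != nil {
		return 0, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/probe/results", bytes.NewReader(buf))
	if err != nil {
		return 0, err
	}
	req.Header.Set("X-Upkeep-Secret", secret)
	req.Header.Set("Content-Type", "application/json") // assumption; the curl examples set no content type
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// fakeLeader is a test double, not the real server. Its error codes are guesses.
func fakeLeader(secret string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/probe/results", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if r.Header.Get("X-Upkeep-Secret") != secret {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		var in resultsRequest
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil || in.NodeID == "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakeLeader("s3cret")
	defer srv.Close()
	code, err := postResults(srv.URL, "s3cret", resultsRequest{
		NodeID:  "test",
		Results: []probeResult{{SiteID: 1, LatencyNS: 5000000, IsUp: true}},
	})
	fmt.Println(code, err)
}
```

Pointing postResults at a running instance instead of the fake server would reproduce the last curl check, provided the real handler accepts a JSON content type.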
lerko added 1 commit 2026-05-16 15:06:25 +00:00
Add node-aware check history and probe registration infrastructure:
- ProbeNode model and nodes table (SQLite + Postgres)
- node_id column on check_history for multi-source tracking
- Store interface: RegisterNode, GetNode, GetAllNodes, DeleteNode, SaveCheckFromNode
- Dialect: UpsertNodeSQL (INSERT OR REPLACE / ON CONFLICT)
- API endpoints: POST /api/probe/register, GET /api/probe/assignments, POST /api/probe/results
- Backward compatible: existing SaveCheck wraps SaveCheckFromNode with empty node_id
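
As a rough illustration of the UpsertNodeSQL item in this commit, the sketch below shows how one method on a dialect interface can return different upsert statements per database. The cut-down Dialect interface, the two implementing types, and the column list (id, name, region, last_seen) are assumptions; only the method name, the nodes table, and the two SQL forms are taken from the commit message.

```go
package main

import "fmt"

// Dialect is a cut-down, hypothetical version of the interface named in the
// commit; the real one presumably has many more methods.
type Dialect interface {
	UpsertNodeSQL() string
}

type sqliteDialect struct{}
type postgresDialect struct{}

// SQLite: INSERT OR REPLACE resolves a key conflict by deleting the existing
// row and inserting the new one. Column names are assumed.
func (sqliteDialect) UpsertNodeSQL() string {
	return "INSERT OR REPLACE INTO nodes (id, name, region, last_seen) VALUES (?, ?, ?, ?)"
}

// Postgres: ON CONFLICT ... DO UPDATE rewrites the existing row in place.
func (postgresDialect) UpsertNodeSQL() string {
	return "INSERT INTO nodes (id, name, region, last_seen) VALUES ($1, $2, $3, $4) " +
		"ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, " +
		"region = EXCLUDED.region, last_seen = EXCLUDED.last_seen"
}

func main() {
	for _, d := range []Dialect{sqliteDialect{}, postgresDialect{}} {
		fmt.Println(d.UpsertNodeSQL())
	}
}
```

Keeping the statement behind the interface lets the store code stay identical for both backends; only the placeholder style and conflict clause differ.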
lerko added 1 commit 2026-05-16 15:20:03 +00:00
Phase 2 of distributed probing:
- Extract check logic into standalone RunCheck() for use by probes
- Add probe cluster mode: stateless nodes that fetch assignments, execute
  checks, and report results to the leader
- Add multi-node result aggregation with configurable strategy
  (any-down, majority-down, all-down)
- Leader ingests probe results into engine live state and triggers alerts
- New env vars: UPKEEP_NODE_ID, UPKEEP_NODE_NAME, UPKEEP_NODE_REGION,
  UPKEEP_AGG_STRATEGY
- Example docker-compose.probe.yml with leader + 2 regional probes
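
The three aggregation strategies named in this commit can be sketched as a single decision function. This is an illustration under stated assumptions, not the merged code: the Strategy type, the IsDown name, the strict-majority rule (a tie counts as up), the handling of an empty report set, and the fallback for unknown strategy strings are all guesses.

```go
package main

import "fmt"

// Strategy names match the values listed in the commit message.
type Strategy string

const (
	AnyDown      Strategy = "any-down"
	MajorityDown Strategy = "majority-down"
	AllDown      Strategy = "all-down"
)

// IsDown decides whether a site counts as down given one is-up report per node.
func IsDown(s Strategy, ups []bool) bool {
	if len(ups) == 0 {
		return false // assumption: no reports means no evidence of an outage
	}
	down := 0
	for _, up := range ups {
		if !up {
			down++
		}
	}
	switch s {
	case AnyDown:
		return down > 0
	case MajorityDown:
		return down*2 > len(ups) // strict majority; a tie counts as up (assumption)
	case AllDown:
		return down == len(ups)
	default:
		return down > 0 // assumption: unknown strategies fall back to any-down
	}
}

func main() {
	reports := []bool{true, false, false} // one node sees the site up, two see it down
	for _, s := range []Strategy{AnyDown, MajorityDown, AllDown} {
		fmt.Printf("%s: down=%v\n", s, IsDown(s, reports))
	}
}
```

With two of three probes reporting failure, any-down and majority-down flag the site while all-down does not, which is the trade-off between alert sensitivity and tolerance for a single flaky region.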
lerko added 1 commit 2026-05-16 15:50:23 +00:00
Phase 3 of distributed probing:
- Add regions column to sites table for per-monitor probe affinity
- Region-filtered probe assignments (empty regions = all probes)
- New Nodes TUI tab showing connected probes with status/region/last-seen
- Regions input field in site form for configuring probe affinity
- Config-as-code support for regions (export/import/diff)
- Prometheus upkeep_probe_up metric with per-node labels
- Reindex TUI tabs: Sites, Alerts, Logs, Nodes, Users
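
The region-affinity rule in this commit (empty regions = all probes) can be sketched as a filter over sites. The Site struct, the AssignedTo function, and exact string matching on region names are assumptions for illustration; the paused-site exclusion follows the test plan's description of /api/probe/assignments.

```go
package main

import "fmt"

// Site is a hypothetical, trimmed-down monitor definition.
type Site struct {
	ID      int
	Name    string
	Regions []string // empty means any probe may check this site
	Paused  bool
}

// hasRegion reports whether region appears in list (exact match, assumed).
func hasRegion(list []string, region string) bool {
	for _, r := range list {
		if r == region {
			return true
		}
	}
	return false
}

// AssignedTo returns the sites a probe in nodeRegion should check.
func AssignedTo(nodeRegion string, sites []Site) []Site {
	var out []Site
	for _, s := range sites {
		if s.Paused {
			continue // paused sites are never handed out
		}
		if len(s.Regions) == 0 || hasRegion(s.Regions, nodeRegion) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	sites := []Site{
		{ID: 1, Name: "global"},
		{ID: 2, Name: "eu-only", Regions: []string{"eu"}},
		{ID: 3, Name: "paused", Paused: true},
	}
	for _, s := range AssignedTo("us", sites) {
		fmt.Println(s.ID, s.Name)
	}
}
```

A probe in a region that no site names still receives every site with an empty regions list, so adding a new region does not require touching existing monitors.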
lerko merged commit 4ac4973eaf into develop 2026-05-16 16:01:16 +00:00

Reference: lerkolabs/uptop#9