feat(cluster): add probe execution mode, check extraction, and result aggregation

Phase 2 of distributed probing:
- Extract check logic into standalone RunCheck() for use by probes
- Add probe cluster mode: stateless nodes that fetch assignments, execute
  checks, and report results to the leader
- Add multi-node result aggregation with configurable strategy
  (any-down, majority-down, all-down)
- Leader ingests probe results into engine live state and triggers alerts
- New env vars: UPKEEP_NODE_ID, UPKEEP_NODE_NAME, UPKEEP_NODE_REGION,
  UPKEEP_AGG_STRATEGY
- Example docker-compose.probe.yml with leader + 2 regional probes
This commit is contained in:
2026-05-16 11:19:57 -04:00
parent ca9faa0acd
commit ca5a42314f
8 changed files with 592 additions and 215 deletions
+35
View File
@@ -0,0 +1,35 @@
services:
leader:
build: .
environment:
- UPKEEP_CLUSTER_MODE=leader
- UPKEEP_CLUSTER_SECRET=changeme
- UPKEEP_AGG_STRATEGY=any-down
- UPKEEP_STATUS_ENABLED=true
ports:
- "8080:8080"
- "23234:23234"
probe-us-east:
build: .
environment:
- UPKEEP_CLUSTER_MODE=probe
- UPKEEP_NODE_ID=us-east-1
- UPKEEP_NODE_NAME=US East Probe
- UPKEEP_NODE_REGION=us-east
- UPKEEP_PEER_URL=http://leader:8080
- UPKEEP_CLUSTER_SECRET=changeme
depends_on:
- leader
probe-eu-west:
build: .
environment:
- UPKEEP_CLUSTER_MODE=probe
- UPKEEP_NODE_ID=eu-west-1
- UPKEEP_NODE_NAME=EU West Probe
- UPKEEP_NODE_REGION=eu-west
- UPKEEP_PEER_URL=http://leader:8080
- UPKEEP_CLUSTER_SECRET=changeme
depends_on:
- leader