feat(cluster): add probe execution mode, check extraction, and result aggregation

Phase 2 of distributed probing:
- Extract check logic into standalone RunCheck() for use by probes
- Add probe cluster mode: stateless nodes that fetch assignments, execute
  checks, and report results to the leader
- Add multi-node result aggregation with configurable strategy
  (any-down, majority-down, all-down)
- Leader ingests probe results into engine live state and triggers alerts
- New env vars: UPKEEP_NODE_ID, UPKEEP_NODE_NAME, UPKEEP_NODE_REGION,
  UPKEEP_AGG_STRATEGY
- Example docker-compose.probe.yml with leader + 2 regional probes
This commit is contained in:
2026-05-16 11:19:57 -04:00
parent ca9faa0acd
commit ca5a42314f
8 changed files with 592 additions and 215 deletions
+2
View File
@@ -33,6 +33,8 @@ func Start(ctx context.Context, cfg Config, eng *monitor.Engine) {
eng.SetActive(false)
go runFollowerLoop(ctx, cfg, eng)
}
// "probe" mode is handled directly in main.go before cluster.Start is called
}
func runFollowerLoop(ctx context.Context, cfg Config, eng *monitor.Engine) {