upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

author: DanConwayDev <DanConwayDev@protonmail.com> 2026-01-09 14:23:44 +0000
committer: DanConwayDev <DanConwayDev@protonmail.com> 2026-01-09 14:23:44 +0000
commit: 208ea60836cfc98857cf3359a73d8874ed5d935a
tree: e653189912f8e749170ef6645a85f9d6c907b3e6 /docs/explanation/monitoring.md
parent: 93a1684f068603b354ba3c05957a25459c73de05
refactor(sync): rename ConnectedDegraded to ConnectedHistoricSyncFailures

Resolves naming conflict with RelayHealthState::Degraded by using a more explicit name that clearly indicates the connection status relates to historic sync failures, not connection health degradation.

Changes:
- ConnectionStatus::ConnectedDegraded → ConnectedHistoricSyncFailures
- Updated all documentation and comments
- Updated Prometheus metric descriptions
- Metric value remains 4 for backward compatibility

This makes it clear that:
- ConnectedHistoricSyncFailures = connection lifecycle (missing historic data)
- RelayHealthState::Degraded = connection health (reliability issues)

These are orthogonal concerns - a relay can be ConnectedHistoricSyncFailures but Healthy, or Connected but Degraded.
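
The orthogonality described in the commit message can be illustrated with a minimal Rust sketch. It is not the repository's actual code: only `ConnectionStatus`, `ConnectedHistoricSyncFailures`, `RelayHealthState`, `Degraded`, and the numeric values are taken from this page. The remaining variant names are inferred from the metric descriptions, and `metric_value` and `describe` are hypothetical helpers introduced for illustration.

```rust
/// Connection lifecycle, using the values documented for `ngit_sync_relay_connected`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnectionStatus {
    Disconnected,
    Connecting,
    Syncing,
    Connected,
    /// Renamed from `ConnectedDegraded`; the gauge value stays 4.
    ConnectedHistoricSyncFailures,
}

impl ConnectionStatus {
    /// Hypothetical helper: the gauge value exported for this status.
    fn metric_value(self) -> i64 {
        match self {
            Self::Disconnected => 0,
            Self::Connecting => 1,
            Self::Syncing => 2,
            Self::Connected => 3,
            Self::ConnectedHistoricSyncFailures => 4,
        }
    }
}

/// Connection health, using the values documented for `ngit_sync_relay_status`.
/// Variant names other than `Healthy` and `Degraded` are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelayHealthState {
    Healthy,
    Disconnected,
    Degraded,
    Dead,
    RateLimited,
}

impl RelayHealthState {
    /// Hypothetical helper: the gauge value exported for this health state.
    fn metric_value(self) -> i64 {
        match self {
            Self::Healthy => 1,
            Self::Disconnected => 2,
            Self::Degraded => 3,
            Self::Dead => 4,
            Self::RateLimited => 5,
        }
    }
}

/// Hypothetical helper: the two dimensions are tracked independently,
/// so any lifecycle status can be paired with any health state.
fn describe(conn: ConnectionStatus, health: RelayHealthState) -> String {
    format!(
        "connection={} health={}",
        conn.metric_value(),
        health.metric_value()
    )
}

fn main() {
    // Missing historic data, but the connection itself is reliable.
    println!(
        "{}",
        describe(
            ConnectionStatus::ConnectedHistoricSyncFailures,
            RelayHealthState::Healthy
        )
    );
    // Fully synced, but the connection is unreliable.
    println!(
        "{}",
        describe(ConnectionStatus::Connected, RelayHealthState::Degraded)
    );
}
```

Keeping the two enums separate means neither name has to encode the other concern, which is the ambiguity the rename removes.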
Diffstat (limited to 'docs/explanation/monitoring.md')
-rw-r--r-- docs/explanation/monitoring.md 10
1 files changed, 5 insertions, 5 deletions
diff --git a/docs/explanation/monitoring.md b/docs/explanation/monitoring.md
index cc164ab..7520813 100644
--- a/docs/explanation/monitoring.md
+++ b/docs/explanation/monitoring.md
@@ -98,7 +98,7 @@ When GRASP-02 proactive sync is implemented, the following metrics will be added
 
 | Metric | Type | Labels | Description |
 |--------|------|--------|-------------|
-| `ngit_sync_relay_connected` | Gauge | relay | Connection status (0=disconnected, 1=connecting, 2=syncing, 3=connected, 4=connected_degraded) |
+| `ngit_sync_relay_connected` | Gauge | relay | Connection status (0=disconnected, 1=connecting, 2=syncing, 3=connected, 4=connected_historic_sync_failures) |
 | `ngit_sync_connection_attempts_total` | Counter | relay, result | Connection attempt outcomes |
 | `ngit_sync_relay_status` | Gauge | relay | Health status (1=healthy, 2=disconnected, 3=degraded, 4=dead, 5=rate_limited) |
 | `ngit_sync_relay_failures` | Gauge | relay | Current consecutive failure count |
@@ -115,9 +115,9 @@ The `ngit_sync_relay_connected` metric tracks the connection lifecycle:
 - `1` = **Connecting** - Connection attempt in progress
 - `2` = **Syncing** - Connected, historic sync in progress
 - `3` = **Connected** - Connected, historic sync complete, live sync active
-- `4` = **ConnectedDegraded** - Connected, historic sync failed, live sync active, partial data
+- `4` = **ConnectedHistoricSyncFailures** - Connected, historic sync had failures, live sync active, partial data
 
-This allows operators to distinguish between "connected but still catching up" (Syncing) vs "fully synced and live" (Connected) vs "degraded - missing historic data" (ConnectedDegraded).
+This allows operators to distinguish between "connected but still catching up" (Syncing) vs "fully synced and live" (Connected) vs "historic sync failures - missing historic data" (ConnectedHistoricSyncFailures).
 
 ### Relay Health States
 
@@ -137,12 +137,12 @@ sum by (relay) (ngit_sync_relay_connected == 0) # Disconnected
 sum by (relay) (ngit_sync_relay_connected == 1) # Connecting
 sum by (relay) (ngit_sync_relay_connected == 2) # Syncing
 sum by (relay) (ngit_sync_relay_connected == 3) # Connected
-sum by (relay) (ngit_sync_relay_connected == 4) # ConnectedDegraded
+sum by (relay) (ngit_sync_relay_connected == 4) # ConnectedHistoricSyncFailures
 
 # Relays still syncing (not yet fully caught up)
 count(ngit_sync_relay_connected == 2)
 
-# Relays with degraded sync (missing historic data)
+# Relays with historic sync failures (missing historic data)
 count(ngit_sync_relay_connected == 4)
 
 # Connection success rate over last hour