upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

Diffstat (limited to 'docs/explanation/monitoring.md')
-rw-r--r-- docs/explanation/monitoring.md | 10
1 file changed, 5 insertions, 5 deletions
diff --git a/docs/explanation/monitoring.md b/docs/explanation/monitoring.md
index cc164ab..7520813 100644
--- a/docs/explanation/monitoring.md
+++ b/docs/explanation/monitoring.md
@@ -98,7 +98,7 @@ When GRASP-02 proactive sync is implemented, the following metrics will be added
 
 | Metric | Type | Labels | Description |
 |--------|------|--------|-------------|
-| `ngit_sync_relay_connected` | Gauge | relay | Connection status (0=disconnected, 1=connecting, 2=syncing, 3=connected, 4=connected_degraded) |
+| `ngit_sync_relay_connected` | Gauge | relay | Connection status (0=disconnected, 1=connecting, 2=syncing, 3=connected, 4=connected_historic_sync_failures) |
 | `ngit_sync_connection_attempts_total` | Counter | relay, result | Connection attempt outcomes |
 | `ngit_sync_relay_status` | Gauge | relay | Health status (1=healthy, 2=disconnected, 3=degraded, 4=dead, 5=rate_limited) |
 | `ngit_sync_relay_failures` | Gauge | relay | Current consecutive failure count |
@@ -115,9 +115,9 @@ The `ngit_sync_relay_connected` metric tracks the connection lifecycle:
 - `1` = **Connecting** - Connection attempt in progress
 - `2` = **Syncing** - Connected, historic sync in progress
 - `3` = **Connected** - Connected, historic sync complete, live sync active
-- `4` = **ConnectedDegraded** - Connected, historic sync failed, live sync active, partial data
+- `4` = **ConnectedHistoricSyncFailures** - Connected, historic sync had failures, live sync active, partial data
 
-This allows operators to distinguish between "connected but still catching up" (Syncing) vs "fully synced and live" (Connected) vs "degraded - missing historic data" (ConnectedDegraded).
+This allows operators to distinguish between "connected but still catching up" (Syncing) vs "fully synced and live" (Connected) vs "historic sync failures - missing historic data" (ConnectedHistoricSyncFailures).
 
 ### Relay Health States
 
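Reviewer note: the renamed state is easier to reason about when the gauge values are written out as an enumeration. The following is a minimal sketch in Python, not code from the repository. The names `ConnectionState`, `has_live_sync`, and `has_complete_history` are hypothetical; only the numeric values and their meanings come from the documentation changed in this diff.

```python
from enum import IntEnum


class ConnectionState(IntEnum):
    """Hypothetical mirror of the `ngit_sync_relay_connected` gauge values."""

    DISCONNECTED = 0
    CONNECTING = 1
    SYNCING = 2
    CONNECTED = 3
    CONNECTED_HISTORIC_SYNC_FAILURES = 4  # called ConnectedDegraded before this change


def has_live_sync(gauge_value: float) -> bool:
    """True for the two states the doc describes as 'live sync active'."""
    state = ConnectionState(int(gauge_value))
    return state in (
        ConnectionState.CONNECTED,
        ConnectionState.CONNECTED_HISTORIC_SYNC_FAILURES,
    )


def has_complete_history(gauge_value: float) -> bool:
    """True only when historic sync completed without failures."""
    return ConnectionState(int(gauge_value)) is ConnectionState.CONNECTED
```

Under these assumptions, a relay reporting `4` has live sync but incomplete history, which is the distinction the new name spells out.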
@@ -137,12 +137,12 @@ sum by (relay) (ngit_sync_relay_connected == 0) # Disconnected
 sum by (relay) (ngit_sync_relay_connected == 1) # Connecting
 sum by (relay) (ngit_sync_relay_connected == 2) # Syncing
 sum by (relay) (ngit_sync_relay_connected == 3) # Connected
-sum by (relay) (ngit_sync_relay_connected == 4) # ConnectedDegraded
+sum by (relay) (ngit_sync_relay_connected == 4) # ConnectedHistoricSyncFailures
 
 # Relays still syncing (not yet fully caught up)
 count(ngit_sync_relay_connected == 2)
 
-# Relays with degraded sync (missing historic data)
+# Relays with historic sync failures (missing historic data)
 count(ngit_sync_relay_connected == 4)
 
 # Connection success rate over last hour