| author | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-09 14:23:44 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-09 14:23:44 +0000 |
| commit | 208ea60836cfc98857cf3359a73d8874ed5d935a (patch) | |
| tree | e653189912f8e749170ef6645a85f9d6c907b3e6 /docs/explanation/monitoring.md | |
| parent | 93a1684f068603b354ba3c05957a25459c73de05 (diff) | |
refactor(sync): rename ConnectedDegraded to ConnectedHistoricSyncFailures

Resolves naming conflict with RelayHealthState::Degraded by using a more
explicit name that clearly indicates the connection status relates to
historic sync failures, not connection health degradation.

Changes:
- ConnectionStatus::ConnectedDegraded → ConnectedHistoricSyncFailures
- Updated all documentation and comments
- Updated Prometheus metric descriptions
- Metric value remains 4 for backward compatibility

This makes it clear that:
- ConnectedHistoricSyncFailures = connection lifecycle (missing historic data)
- RelayHealthState::Degraded = connection health (reliability issues)

These are orthogonal concerns - a relay can be ConnectedHistoricSyncFailures
but Healthy, or Connected but Degraded.
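
The split between the two enums can be sketched in Rust roughly as follows. This is a minimal illustration, not the project's actual code: only `ConnectionStatus::ConnectedHistoricSyncFailures` and `RelayHealthState::Degraded` are named in the commit message. The remaining variant names are inferred from the metric descriptions in `monitoring.md`, and the `metric_value` and `has_partial_history` helpers are hypothetical.

```rust
/// Connection lifecycle, exported as the `ngit_sync_relay_connected` gauge.
/// Variant names other than `ConnectedHistoricSyncFailures` are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionStatus {
    Disconnected,
    Connecting,
    Syncing,
    Connected,
    ConnectedHistoricSyncFailures,
}

impl ConnectionStatus {
    /// Hypothetical helper: the numeric gauge value documented in monitoring.md.
    /// The renamed variant keeps the value 4, so existing dashboards still work.
    pub fn metric_value(self) -> i64 {
        match self {
            ConnectionStatus::Disconnected => 0,
            ConnectionStatus::Connecting => 1,
            ConnectionStatus::Syncing => 2,
            ConnectionStatus::Connected => 3,
            ConnectionStatus::ConnectedHistoricSyncFailures => 4,
        }
    }

    /// Hypothetical helper: true when live sync is active but historic data is missing.
    pub fn has_partial_history(self) -> bool {
        self == ConnectionStatus::ConnectedHistoricSyncFailures
    }
}

/// Connection health, exported as the `ngit_sync_relay_status` gauge.
/// Variant names other than `Degraded` are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayHealthState {
    Healthy,
    Disconnected,
    Degraded,
    Dead,
    RateLimited,
}

impl RelayHealthState {
    /// Hypothetical helper: the numeric gauge value documented in monitoring.md.
    pub fn metric_value(self) -> i64 {
        match self {
            RelayHealthState::Healthy => 1,
            RelayHealthState::Disconnected => 2,
            RelayHealthState::Degraded => 3,
            RelayHealthState::Dead => 4,
            RelayHealthState::RateLimited => 5,
        }
    }
}
```

Because the two enums are independent, a relay can report `ConnectedHistoricSyncFailures` (gauge value 4) while its health is `Healthy` (1), or `Connected` (3) while its health is `Degraded` (3).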
Diffstat (limited to 'docs/explanation/monitoring.md')
| -rw-r--r-- | docs/explanation/monitoring.md | 10 |
1 files changed, 5 insertions, 5 deletions
diff --git a/docs/explanation/monitoring.md b/docs/explanation/monitoring.md
index cc164ab..7520813 100644
--- a/docs/explanation/monitoring.md
+++ b/docs/explanation/monitoring.md
@@ -98,7 +98,7 @@ When GRASP-02 proactive sync is implemented, the following metrics will be added
 
 | Metric | Type | Labels | Description |
 |--------|------|--------|-------------|
-| `ngit_sync_relay_connected` | Gauge | relay | Connection status (0=disconnected, 1=connecting, 2=syncing, 3=connected, 4=connected_degraded) |
+| `ngit_sync_relay_connected` | Gauge | relay | Connection status (0=disconnected, 1=connecting, 2=syncing, 3=connected, 4=connected_historic_sync_failures) |
 | `ngit_sync_connection_attempts_total` | Counter | relay, result | Connection attempt outcomes |
 | `ngit_sync_relay_status` | Gauge | relay | Health status (1=healthy, 2=disconnected, 3=degraded, 4=dead, 5=rate_limited) |
 | `ngit_sync_relay_failures` | Gauge | relay | Current consecutive failure count |
@@ -115,9 +115,9 @@ The `ngit_sync_relay_connected` metric tracks the connection lifecycle:
 - `1` = **Connecting** - Connection attempt in progress
 - `2` = **Syncing** - Connected, historic sync in progress
 - `3` = **Connected** - Connected, historic sync complete, live sync active
-- `4` = **ConnectedDegraded** - Connected, historic sync failed, live sync active, partial data
+- `4` = **ConnectedHistoricSyncFailures** - Connected, historic sync had failures, live sync active, partial data
 
-This allows operators to distinguish between "connected but still catching up" (Syncing) vs "fully synced and live" (Connected) vs "degraded - missing historic data" (ConnectedDegraded).
+This allows operators to distinguish between "connected but still catching up" (Syncing) vs "fully synced and live" (Connected) vs "historic sync failures - missing historic data" (ConnectedHistoricSyncFailures).
 
 ### Relay Health States
 
@@ -137,12 +137,12 @@ sum by (relay) (ngit_sync_relay_connected == 0) # Disconnected
 sum by (relay) (ngit_sync_relay_connected == 1) # Connecting
 sum by (relay) (ngit_sync_relay_connected == 2) # Syncing
 sum by (relay) (ngit_sync_relay_connected == 3) # Connected
-sum by (relay) (ngit_sync_relay_connected == 4) # ConnectedDegraded
+sum by (relay) (ngit_sync_relay_connected == 4) # ConnectedHistoricSyncFailures
 
 # Relays still syncing (not yet fully caught up)
 count(ngit_sync_relay_connected == 2)
 
-# Relays with degraded sync (missing historic data)
+# Relays with historic sync failures (missing historic data)
 count(ngit_sync_relay_connected == 4)
 
 # Connection success rate over last hour