From 208ea60836cfc98857cf3359a73d8874ed5d935a Mon Sep 17 00:00:00 2001
From: DanConwayDev <DanConwayDev@protonmail.com>
Date: Fri, 9 Jan 2026 14:23:44 +0000
Subject: refactor(sync): rename ConnectedDegraded to
 ConnectedHistoricSyncFailures
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Resolves naming conflict with RelayHealthState::Degraded by using a more
explicit name that clearly indicates the connection status relates to
historic sync failures, not connection health degradation.

Changes:
- ConnectionStatus::ConnectedDegraded → ConnectedHistoricSyncFailures
- Updated all documentation and comments
- Updated Prometheus metric descriptions
- ngit_sync_relay_connected value for the renamed state remains 4 (backward compatible)

This makes it clear that:
- ConnectedHistoricSyncFailures = connection lifecycle (missing historic data)
- RelayHealthState::Degraded = connection health (reliability issues)

These are orthogonal concerns: a relay can be ConnectedHistoricSyncFailures
but Healthy, or Connected but Degraded.
---
 docs/explanation/grasp-02-proactive-sync.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

(limited to 'docs/explanation/grasp-02-proactive-sync.md')
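
Reviewer note: the commit message describes the renamed state, its unchanged metric value, and its independence from relay health. The sketch below illustrates that under stated assumptions. The variant names and the 0-4 values come from the patched documentation; `metric_value`, `status_after_historic_sync`, and `RelayView` are hypothetical names introduced only for illustration, and `RelayHealthState` lists only the two variants the message mentions. It is not the project's actual implementation.

```rust
/// Connection lifecycle, using the variant names from the patched docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionStatus {
    Disconnected,
    Connecting,
    Syncing,
    Connected,
    ConnectedHistoricSyncFailures,
}

impl ConnectionStatus {
    /// Hypothetical helper: the value reported by `ngit_sync_relay_connected`.
    pub fn metric_value(self) -> i64 {
        match self {
            ConnectionStatus::Disconnected => 0,
            ConnectionStatus::Connecting => 1,
            ConnectionStatus::Syncing => 2,
            ConnectionStatus::Connected => 3,
            // Renamed from ConnectedDegraded; the value stays 4 so existing
            // dashboards and alerts keep working.
            ConnectionStatus::ConnectedHistoricSyncFailures => 4,
        }
    }
}

/// Connection health. Only the two variants named in the commit message
/// are shown; the real enum may have more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayHealthState {
    Healthy,
    Degraded,
}

/// Hypothetical helper: the status a relay leaves `Syncing` with once no
/// batches are pending.
pub fn status_after_historic_sync(any_batch_failed: bool) -> ConnectionStatus {
    if any_batch_failed {
        ConnectionStatus::ConnectedHistoricSyncFailures
    } else {
        ConnectionStatus::Connected
    }
}

/// Hypothetical pairing: lifecycle and health are tracked independently,
/// so any combination of the two is representable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelayView {
    pub status: ConnectionStatus,
    pub health: RelayHealthState,
}
```

With these definitions, a `RelayView` holding `ConnectedHistoricSyncFailures` together with `Healthy` is a valid value, which is the combination the old `ConnectedDegraded` name made confusing.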
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index b17b8bf..e983316 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
@@ -79,8 +79,8 @@ pub enum ConnectionStatus {
     Syncing,
     /// Successfully connected, historic sync completed
     Connected,
-    /// Successfully connected, historic sync failed but live sync active
-    ConnectedDegraded,
+    /// Successfully connected, historic sync had failures but live sync active
+    ConnectedHistoricSyncFailures,
 }
 
 /// Complete state for a single relay - combines sync needs with connection lifecycle
@@ -210,18 +210,18 @@ stateDiagram-v2
     Connecting --> Syncing: success → handle_connect_or_reconnect
     Connecting --> Disconnected: failure + record in health tracker
     Syncing --> Connected: all batches succeed → check_and_complete_historic_sync
-    Syncing --> ConnectedDegraded: any batch failed → check_and_complete_historic_sync
+    Syncing --> ConnectedHistoricSyncFailures: any batch failed → check_and_complete_historic_sync
     Syncing --> Disconnected: connection lost → handle_disconnect
     Connected --> Disconnected: connection lost → handle_disconnect
-    ConnectedDegraded --> Disconnected: connection lost → handle_disconnect
+    ConnectedHistoricSyncFailures --> Disconnected: connection lost → handle_disconnect
     Connected --> [*]: intentional disconnect via check_disconnects
-    ConnectedDegraded --> [*]: intentional disconnect via check_disconnects
+    ConnectedHistoricSyncFailures --> [*]: intentional disconnect via check_disconnects
 
     note right of Disconnected: disconnected_at set for 15min rule<br/>RelayConnection kept in HashMap
     note right of Connecting: connection attempt with timeout
     note right of Syncing: historic sync in progress<br/>event loop spawned here
     note right of Connected: historic sync complete<br/>last_connected tracked for since filter
-    note right of ConnectedDegraded: historic sync failed (missing events)<br/>live sync active, partial data
+    note right of ConnectedHistoricSyncFailures: historic sync had failures (missing events)<br/>live sync active, partial data
 ```
 
 ### Connection Flow Methods
@@ -252,22 +252,22 @@ Each layer creates one or more `PendingBatch` entries tracked in `PendingSyncInd
   2. Wait 6 seconds (batch window + buffer) for self-subscriber to process in-flight events
   3. Second check: Are there still no pending batches? If yes, return early
   4. If no pending batches after wait:
-     - If any batch failed: transition `Syncing` → `ConnectedDegraded`
+     - If any batch failed: transition `Syncing` → `ConnectedHistoricSyncFailures`
      - If all batches succeeded: transition `Syncing` → `Connected`
      - Set `historic_sync_completed = true`
 
 **Why the double-check?** There's an async gap between receiving EOSE and the self-subscriber processing events to create Layer 2/3 filters. The 6-second wait (5s batch window + 1s buffer) ensures we don't prematurely mark sync complete while Layer 2/3 batches are being created.
 
-**Batch Failure Tracking**: When negentropy retry protection triggers (relay returns zero requested events on retry), the batch is marked as `failed = true`. This causes the relay to transition to `ConnectedDegraded` instead of `Connected`, signaling that live sync is active but historic sync is incomplete.
+**Batch Failure Tracking**: When negentropy retry protection triggers (relay returns zero requested events on retry), the batch is marked as `failed = true`. This causes the relay to transition to `ConnectedHistoricSyncFailures` instead of `Connected`, signaling that live sync is active but historic sync is incomplete.
 
 **Metrics tracking**: The `ngit_sync_relay_connected` metric shows:
 - `0` = Disconnected
 - `1` = Connecting
 - `2` = Syncing (historic sync in progress)
 - `3` = Connected (historic sync complete, live sync active)
-- `4` = ConnectedDegraded (historic sync failed, live sync active, partial data)
+- `4` = ConnectedHistoricSyncFailures (historic sync had failures, live sync active, partial data)
 
-This allows operators to monitor sync progress and distinguish between "connected but still catching up" vs "fully synced and live" vs "degraded (missing historic data)".
+This allows operators to monitor sync progress and distinguish between "connected but still catching up" vs "fully synced and live" vs "historic sync failures (missing historic data)".
 
 ### Event Loop Lifecycle
 
-- 
cgit v1.2.3