upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

author: DanConwayDev <DanConwayDev@protonmail.com> 2026-01-09 14:23:44 +0000
committer: DanConwayDev <DanConwayDev@protonmail.com> 2026-01-09 14:23:44 +0000
commit: 208ea60836cfc98857cf3359a73d8874ed5d935a
tree: e653189912f8e749170ef6645a85f9d6c907b3e6 /docs/explanation/grasp-02-proactive-sync.md
parent: 93a1684f068603b354ba3c05957a25459c73de05
refactor(sync): rename ConnectedDegraded to ConnectedHistoricSyncFailures

Resolves naming conflict with RelayHealthState::Degraded by using a more explicit name that clearly indicates the connection status relates to historic sync failures, not connection health degradation.

Changes:
- ConnectionStatus::ConnectedDegraded → ConnectedHistoricSyncFailures
- Updated all documentation and comments
- Updated Prometheus metric descriptions
- Metric value remains 4 for backward compatibility

This makes it clear that:
- ConnectedHistoricSyncFailures = connection lifecycle (missing historic data)
- RelayHealthState::Degraded = connection health (reliability issues)

These are orthogonal concerns - a relay can be ConnectedHistoricSyncFailures but Healthy, or Connected but Degraded.
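The orthogonality described in the commit message can be sketched as two independent enums. This is a minimal illustration, not the repository's code: the `ConnectionStatus` variants and the metric values 0 to 4 come from the documentation changed in this commit, `RelayHealthState` is reduced to the two states the message names (the real enum may have more), and `metric_value` is a hypothetical helper name.

```rust
/// Connection lifecycle, using the variant names from this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionStatus {
    Disconnected,
    Connecting,
    Syncing,
    Connected,
    ConnectedHistoricSyncFailures,
}

/// Connection health. Only `Healthy` and `Degraded` are named in the
/// commit message; this reduced enum is an assumption for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayHealthState {
    Healthy,
    Degraded,
}

/// Hypothetical helper: the gauge value reported for a status.
/// The value 4 is unchanged by the rename, so existing dashboards keep working.
pub fn metric_value(status: ConnectionStatus) -> i64 {
    match status {
        ConnectionStatus::Disconnected => 0,
        ConnectionStatus::Connecting => 1,
        ConnectionStatus::Syncing => 2,
        ConnectionStatus::Connected => 3,
        ConnectionStatus::ConnectedHistoricSyncFailures => 4,
    }
}

/// Hypothetical pairing of the two concerns for a single relay.
/// Any lifecycle status can be combined with any health state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelaySnapshot {
    pub status: ConnectionStatus,
    pub health: RelayHealthState,
}
```

Under these assumptions, a `RelaySnapshot` with `status: ConnectedHistoricSyncFailures` and `health: Healthy` is a valid combination, as is `status: Connected` with `health: Degraded`.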
Diffstat (limited to 'docs/explanation/grasp-02-proactive-sync.md')
-rw-r--r-- docs/explanation/grasp-02-proactive-sync.md 20
1 files changed, 10 insertions, 10 deletions
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index b17b8bf..e983316 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
@@ -79,8 +79,8 @@ pub enum ConnectionStatus {
     Syncing,
     /// Successfully connected, historic sync completed
     Connected,
-    /// Successfully connected, historic sync failed but live sync active
-    ConnectedDegraded,
+    /// Successfully connected, historic sync had failures but live sync active
+    ConnectedHistoricSyncFailures,
 }
 
 /// Complete state for a single relay - combines sync needs with connection lifecycle
@@ -210,18 +210,18 @@ stateDiagram-v2
     Connecting --> Syncing: success → handle_connect_or_reconnect
     Connecting --> Disconnected: failure + record in health tracker
     Syncing --> Connected: all batches succeed → check_and_complete_historic_sync
-    Syncing --> ConnectedDegraded: any batch failed → check_and_complete_historic_sync
+    Syncing --> ConnectedHistoricSyncFailures: any batch failed → check_and_complete_historic_sync
     Syncing --> Disconnected: connection lost → handle_disconnect
     Connected --> Disconnected: connection lost → handle_disconnect
-    ConnectedDegraded --> Disconnected: connection lost → handle_disconnect
+    ConnectedHistoricSyncFailures --> Disconnected: connection lost → handle_disconnect
     Connected --> [*]: intentional disconnect via check_disconnects
-    ConnectedDegraded --> [*]: intentional disconnect via check_disconnects
+    ConnectedHistoricSyncFailures --> [*]: intentional disconnect via check_disconnects
 
     note right of Disconnected: disconnected_at set for 15min rule<br/>RelayConnection kept in HashMap
     note right of Connecting: connection attempt with timeout
     note right of Syncing: historic sync in progress<br/>event loop spawned here
     note right of Connected: historic sync complete<br/>last_connected tracked for since filter
-    note right of ConnectedDegraded: historic sync failed (missing events)<br/>live sync active, partial data
+    note right of ConnectedHistoricSyncFailures: historic sync had failures (missing events)<br/>live sync active, partial data
 ```
 
 ### Connection Flow Methods
@@ -252,22 +252,22 @@ Each layer creates one or more `PendingBatch` entries tracked in `PendingSyncInd
 2. Wait 6 seconds (batch window + buffer) for self-subscriber to process in-flight events
 3. Second check: Are there still no pending batches? If yes, return early
 4. If no pending batches after wait:
-   - If any batch failed: transition `Syncing` → `ConnectedDegraded`
+   - If any batch failed: transition `Syncing` → `ConnectedHistoricSyncFailures`
    - If all batches succeeded: transition `Syncing` → `Connected`
    - Set `historic_sync_completed = true`
 
 **Why the double-check?** There's an async gap between receiving EOSE and the self-subscriber processing events to create Layer 2/3 filters. The 6-second wait (5s batch window + 1s buffer) ensures we don't prematurely mark sync complete while Layer 2/3 batches are being created.
 
-**Batch Failure Tracking**: When negentropy retry protection triggers (relay returns zero requested events on retry), the batch is marked as `failed = true`. This causes the relay to transition to `ConnectedDegraded` instead of `Connected`, signaling that live sync is active but historic sync is incomplete.
+**Batch Failure Tracking**: When negentropy retry protection triggers (relay returns zero requested events on retry), the batch is marked as `failed = true`. This causes the relay to transition to `ConnectedHistoricSyncFailures` instead of `Connected`, signaling that live sync is active but historic sync is incomplete.
 
 **Metrics tracking**: The `ngit_sync_relay_connected` metric shows:
 - `0` = Disconnected
 - `1` = Connecting
 - `2` = Syncing (historic sync in progress)
 - `3` = Connected (historic sync complete, live sync active)
-- `4` = ConnectedDegraded (historic sync failed, live sync active, partial data)
+- `4` = ConnectedHistoricSyncFailures (historic sync had failures, live sync active, partial data)
 
-This allows operators to monitor sync progress and distinguish between "connected but still catching up" vs "fully synced and live" vs "degraded (missing historic data)".
+This allows operators to monitor sync progress and distinguish between "connected but still catching up" vs "fully synced and live" vs "historic sync failures (missing historic data)".
 
 ### Event Loop Lifecycle
 
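The outcome of the final step in the hunk above (choosing between `Connected` and `ConnectedHistoricSyncFailures` once no pending batches remain) might be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: `settle_wait` and `status_after_historic_sync` are hypothetical names, and per-batch failure flags are modelled as a plain slice of `bool` instead of the `PendingBatch` entries the documentation mentions.

```rust
use std::time::Duration;

// Values taken from the documented "5s batch window + 1s buffer".
const BATCH_WINDOW: Duration = Duration::from_secs(5);
const BUFFER: Duration = Duration::from_secs(1);

/// Hypothetical helper: how long to wait between the two pending-batch checks.
pub fn settle_wait() -> Duration {
    BATCH_WINDOW + BUFFER
}

/// Only the two statuses reachable from `Syncing` when historic sync ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionStatus {
    Connected,
    ConnectedHistoricSyncFailures,
}

/// Hypothetical helper: decide the status once no pending batches remain.
/// Each entry in `batch_failed` stands in for one batch's `failed` flag.
pub fn status_after_historic_sync(batch_failed: &[bool]) -> ConnectionStatus {
    if batch_failed.iter().any(|&failed| failed) {
        // At least one batch hit negentropy retry protection: live sync is
        // active, but some historic events may be missing.
        ConnectionStatus::ConnectedHistoricSyncFailures
    } else {
        ConnectionStatus::Connected
    }
}
```

In this sketch a single failed batch is enough to select `ConnectedHistoricSyncFailures`, which matches the "any batch failed" wording of the state diagram.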