| author | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-09 13:28:11 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-09 13:28:11 +0000 |
| commit | c34492069abacae67482af4c8356241958a524f7 (patch) | |
| tree | fd9b8ca3c26a96742bad4e9e359a20fc37c998aa /docs/explanation/grasp-02-proactive-sync.md | |
| parent | eb10e85f199266affd3bca0a3d4cd934f74f3e7f (diff) | |
feat(sync): add Syncing connection status to track historic sync progress
- Add ConnectionStatus::Syncing state between Connecting and Connected
- Track historic_sync_completed and historic_sync_completed_at in RelayState
- Auto-detect sync completion via check_and_complete_historic_sync()
- Update metrics: ngit_sync_relay_connected now shows 0-3 (disconnected/connecting/syncing/connected)
- Update Prometheus metric documentation with new status values
- Add state machine diagram showing Syncing transition
- Operators can now distinguish 'connected but catching up' vs 'fully synced'
Diffstat (limited to 'docs/explanation/grasp-02-proactive-sync.md')
| -rw-r--r-- | docs/explanation/grasp-02-proactive-sync.md | 54 |
1 files changed, 44 insertions, 10 deletions
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index 461bde7..e1fb367 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
| @@ -75,7 +75,9 @@ pub enum ConnectionStatus { | |||
| 75 | Disconnected, | 75 | Disconnected, |
| 76 | /// Connection attempt in progress | 76 | /// Connection attempt in progress |
| 77 | Connecting, | 77 | Connecting, |
| 78 | /// Successfully connected and subscribed | 78 | /// Successfully connected, historic sync in progress |
| 79 | Syncing, | ||
| 80 | /// Successfully connected, historic sync completed | ||
| 79 | Connected, | 81 | Connected, |
| 80 | } | 82 | } |
| 81 | 83 | ||
| @@ -97,6 +99,11 @@ pub struct RelayState { | |||
| 97 | /// Whether announcement filter historic sync has completed for this relay | 99 | /// Whether announcement filter historic sync has completed for this relay |
| 98 | /// Used to determine if we can use `since` filter on reconnect for Layer 1 | 100 | /// Used to determine if we can use `since` filter on reconnect for Layer 1 |
| 99 | pub announcements_synced: bool, | 101 | pub announcements_synced: bool, |
| 102 | /// Whether initial historic sync has fully completed (all layers) | ||
| 103 | /// Used to transition from Syncing -> Connected status | ||
| 104 | pub historic_sync_completed: bool, | ||
| 105 | /// When historic sync completed (None if never completed or cleared on fresh_start) | ||
| 106 | pub historic_sync_completed_at: Option<Timestamp>, | ||
| 100 | } | 107 | } |
| 101 | 108 | ||
| 102 | impl RelayState { | 109 | impl RelayState { |
| @@ -198,25 +205,52 @@ When a relay doesn't support NIP-77 Negentropy, historic sync falls back to trad | |||
| 198 | stateDiagram-v2 | 205 | stateDiagram-v2 |
| 199 | [*] --> Disconnected: discover relay → register_relay() | 206 | [*] --> Disconnected: discover relay → register_relay() |
| 200 | Disconnected --> Connecting: retry_disconnected_relays → try_connect_relay | 207 | Disconnected --> Connecting: retry_disconnected_relays → try_connect_relay |
| 201 | Connecting --> Connected: success → handle_connect_or_reconnect | 208 | Connecting --> Syncing: success → handle_connect_or_reconnect |
| 202 | Connecting --> Disconnected: failure + record in health tracker | 209 | Connecting --> Disconnected: failure + record in health tracker |
| 210 | Syncing --> Connected: all historic batches complete → check_and_complete_historic_sync | ||
| 211 | Syncing --> Disconnected: connection lost → handle_disconnect | ||
| 203 | Connected --> Disconnected: connection lost → handle_disconnect | 212 | Connected --> Disconnected: connection lost → handle_disconnect |
| 204 | Connected --> [*]: intentional disconnect via check_disconnects | 213 | Connected --> [*]: intentional disconnect via check_disconnects |
| 205 | 214 | ||
| 206 | note right of Disconnected: disconnected_at set for 15min rule<br/>RelayConnection kept in HashMap | 215 | note right of Disconnected: disconnected_at set for 15min rule<br/>RelayConnection kept in HashMap |
| 207 | note right of Connected: last_connected tracked for since filter<br/>Event loop spawned here | ||
| 208 | note right of Connecting: connection attempt with timeout | 216 | note right of Connecting: connection attempt with timeout |
| 217 | note right of Syncing: historic sync in progress<br/>event loop spawned here | ||
| 218 | note right of Connected: historic sync complete<br/>last_connected tracked for since filter | ||
| 209 | ``` | 219 | ``` |
| 210 | 220 | ||
| 211 | ### Connection Flow Methods | 221 | ### Connection Flow Methods |
| 212 | 222 | ||
| 213 | | Method | Purpose | When Called | Actions | | 223 | | Method | Purpose | When Called | Actions | |
| 214 | | ------------------------------- | ------------------------- | --------------------------------- | --------------------------------------------------------------- | | 224 | | ----------------------------------- | ---------------------------- | --------------------------------- | --------------------------------------------------------------- | |
| 215 | | `register_relay()` | Initialize relay tracking | Discovery via RepoSyncIndex | Creates RelayConnection, stores in HashMap, returns immediately | | 225 | | `register_relay()` | Initialize relay tracking | Discovery via RepoSyncIndex | Creates RelayConnection, stores in HashMap, returns immediately | |
| 216 | | `try_connect_relay()` | Attempt connection | Health tracker allows retry | Calls connection.connect(), sends notification on success | | 226 | | `try_connect_relay()` | Attempt connection | Health tracker allows retry | Calls connection.connect(), sends notification on success | |
| 217 | | `handle_connect_or_reconnect()` | Setup after connection | ConnectNotification received | Spawns event loop, updates state, decides sync strategy | | 227 | | `handle_connect_or_reconnect()` | Setup after connection | ConnectNotification received | Spawns event loop, sets Syncing, decides sync strategy | |
| 218 | | `handle_disconnect()` | Cleanup after disconnect | DisconnectNotification received | Updates state, clears pending, KEEPS RelayConnection | | 228 | | `check_and_complete_historic_sync()` | Detect sync completion | After each batch confirmation | Transitions Syncing → Connected when no pending batches | |
| 219 | | `retry_disconnected_relays()` | Periodic reconnection | Every 2s (health & metrics timer) | For each ready relay: try_connect_relay() | | 229 | | `handle_disconnect()` | Cleanup after disconnect | DisconnectNotification received | Updates state, clears pending, KEEPS RelayConnection | |
| 230 | | `retry_disconnected_relays()` | Periodic reconnection | Every 2s (health & metrics timer) | For each ready relay: try_connect_relay() | | ||
| 231 | |||
| 232 | ### Historic Sync Completion | ||
| 233 | |||
| 234 | When a relay first connects, it enters the **Syncing** state and begins historic sync: | ||
| 235 | |||
| 236 | 1. **Layer 1 (Announcements)**: Generic filter for all repository announcements | ||
| 237 | 2. **Layer 2 (Repo Events)**: Filters for events tagging discovered repositories | ||
| 238 | 3. **Layer 3 (Root Events)**: Filters for events tagging discovered PRs/Issues/Patches | ||
| 239 | |||
| 240 | Each layer creates one or more `PendingBatch` entries tracked in `PendingSyncIndex`. As EOSE messages arrive: | ||
| 241 | |||
| 242 | - `handle_eose()` confirms each batch via `confirm_batch()` | ||
| 243 | - `confirm_batch()` moves items to confirmed state and calls `check_and_complete_historic_sync()` | ||
| 244 | - `check_and_complete_historic_sync()` checks if `PendingSyncIndex` is empty for this relay | ||
| 245 | - When empty: transitions `Syncing` → `Connected`, sets `historic_sync_completed = true` | ||
| 246 | |||
| 247 | **Metrics tracking**: The `ngit_sync_relay_connected` metric shows: | ||
| 248 | - `0` = Disconnected | ||
| 249 | - `1` = Connecting | ||
| 250 | - `2` = Syncing (historic sync in progress) | ||
| 251 | - `3` = Connected (historic sync complete, live sync active) | ||
| 252 | |||
| 253 | This allows operators to monitor sync progress and distinguish between "connected but still catching up" vs "fully synced and live". | ||
| 220 | 254 | ||
| 221 | ### Event Loop Lifecycle | 255 | ### Event Loop Lifecycle |
| 222 | 256 | ||
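
The Syncing → Connected transition and the 0-3 metric mapping described in this diff can be sketched as a small standalone model. This is an illustrative sketch only, not the code from the commit: the `metric_value()` helper, the per-relay batch counter inside `PendingSyncIndex`, the `add_batch`/`is_empty_for` methods, the free-function signature of `check_and_complete_historic_sync`, and the use of `u64` Unix seconds in place of `Timestamp` are all assumptions made to keep the example self-contained. In the documented flow `confirm_batch()` itself calls the completion check; here the two steps are kept separate for clarity.

```rust
use std::collections::HashMap;

/// Connection states as listed in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionStatus {
    Disconnected,
    Connecting,
    Syncing,
    Connected,
}

impl ConnectionStatus {
    /// Hypothetical helper: the value documented for `ngit_sync_relay_connected`.
    pub fn metric_value(self) -> u8 {
        match self {
            ConnectionStatus::Disconnected => 0,
            ConnectionStatus::Connecting => 1,
            ConnectionStatus::Syncing => 2,
            ConnectionStatus::Connected => 3,
        }
    }
}

/// Reduced stand-in for `RelayState`; `u64` replaces `Timestamp` (assumption).
#[derive(Debug)]
pub struct RelayState {
    pub status: ConnectionStatus,
    pub historic_sync_completed: bool,
    pub historic_sync_completed_at: Option<u64>,
}

impl RelayState {
    pub fn new() -> Self {
        RelayState {
            status: ConnectionStatus::Disconnected,
            historic_sync_completed: false,
            historic_sync_completed_at: None,
        }
    }
}

/// Simplified stand-in: counts outstanding batches per relay URL.
#[derive(Default)]
pub struct PendingSyncIndex {
    pending: HashMap<String, usize>,
}

impl PendingSyncIndex {
    /// Record one more outstanding historic batch for `relay`.
    pub fn add_batch(&mut self, relay: &str) {
        *self.pending.entry(relay.to_string()).or_insert(0) += 1;
    }

    /// Mark one batch as confirmed (for example after an EOSE arrives).
    pub fn confirm_batch(&mut self, relay: &str) {
        let remaining = match self.pending.get_mut(relay) {
            Some(n) => {
                *n = n.saturating_sub(1);
                *n
            }
            None => return,
        };
        if remaining == 0 {
            self.pending.remove(relay);
        }
    }

    /// True when no batches are outstanding for `relay`.
    pub fn is_empty_for(&self, relay: &str) -> bool {
        !self.pending.contains_key(relay)
    }
}

/// Moves a relay from Syncing to Connected once nothing is pending.
/// Returns true only when the transition happened on this call.
pub fn check_and_complete_historic_sync(
    state: &mut RelayState,
    index: &PendingSyncIndex,
    relay: &str,
    now: u64,
) -> bool {
    if state.status == ConnectionStatus::Syncing && index.is_empty_for(relay) {
        state.status = ConnectionStatus::Connected;
        state.historic_sync_completed = true;
        state.historic_sync_completed_at = Some(now);
        true
    } else {
        false
    }
}
```

In this model a relay with two outstanding batches keeps reporting `2` after the first confirmation and switches to `3` only once the second batch is confirmed, which is the "connected but catching up" versus "fully synced" distinction the commit message describes.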