diff options
| author | DanConwayDev <DanConwayDev@protonmail.com> | 2025-12-08 20:31:18 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2025-12-08 20:31:18 +0000 |
| commit | 103ede51485601892af1df6dab9f96f232b10f49 (patch) | |
| tree | f4c7f7d4f5afc235d54dfe4437310c3c96cc97c8 /docs/explanation/grasp-02-proactive-sync-v2.md | |
| parent | 20388e4e76864195860dfab81a5b80725184e7c9 (diff) | |
redsign sync - architecture tweaks
Diffstat (limited to 'docs/explanation/grasp-02-proactive-sync-v2.md')
| -rw-r--r-- | docs/explanation/grasp-02-proactive-sync-v2.md | 72 |
1 files changed, 69 insertions, 3 deletions
diff --git a/docs/explanation/grasp-02-proactive-sync-v2.md b/docs/explanation/grasp-02-proactive-sync-v2.md index faa9255..d1bbe14 100644 --- a/docs/explanation/grasp-02-proactive-sync-v2.md +++ b/docs/explanation/grasp-02-proactive-sync-v2.md | |||
| @@ -11,6 +11,34 @@ This document presents a simplified redesign of the proactive sync module. The k | |||
| 11 | 3. **Efficiency**: Minimize connections and bandwidth through filter consolidation | 11 | 3. **Efficiency**: Minimize connections and bandwidth through filter consolidation |
| 12 | 4. **Consistency**: Use unified filters for both live sync and catchup | 12 | 4. **Consistency**: Use unified filters for both live sync and catchup |
| 13 | 13 | ||
| 14 | ## Scale Targets & Upper Bounds | ||
| 15 | |||
| 16 | This design targets the following scale: | ||
| 17 | |||
| 18 | | Metric | Target | Notes | | ||
| 19 | |--------|--------|-------| | ||
| 20 | | **Repositories** | 1,000 | Repos we host/track | | ||
| 21 | | **Root events per repo** | 50 (avg) | PRs, Issues, Patches per repo | | ||
| 22 | | **Total relays in ecosystem** | 100 | Unique relays across all repos | | ||
| 23 | | **Relays per repo** | 5 (avg) | Relays listed in each repo's announcement | | ||
| 24 | | **Total root events** | ~50,000 | 1,000 repos × 50 events | | ||
| 25 | | **Sync connections** | ~50-100 | Based on relay overlap | | ||
| 26 | |||
| 27 | **Memory Estimate (in-memory HashMaps):** | ||
| 28 | |||
| 29 | - `FollowingRepoRootEvents`: ~1,000 entries × 50 EventIds = ~3-5 MB | ||
| 30 | - `SyncRelays`: ~100 entries × varying repo counts = ~2-3 MB | ||
| 31 | - **Total in-memory state**: ~10 MB (well within acceptable limits) | ||
| 32 | |||
| 33 | **Upper Bounds (redesign triggers):** | ||
| 34 | |||
| 35 | - **10,000+ repos**: Consider database-backed state instead of in-memory HashMaps | ||
| 36 | - **500+ sync relays**: Consider connection pooling or relay prioritization | ||
| 37 | - **500+ root events per repo**: Consider per-repo pagination in Layer 3 filters | ||
| 38 | - **Sustained >100 events/second**: Consider write batching to database | ||
| 39 | |||
| 40 | Beyond these limits, the in-memory HashMap model may need to evolve to a database-backed approach with lazy loading. | ||
| 41 | |||
| 14 | ## Core Data Structures | 42 | ## Core Data Structures |
| 15 | 43 | ||
| 16 | The entire sync filter state is captured in two HashMaps, initialized from database queries at startup: | 44 | The entire sync filter state is captured in two HashMaps, initialized from database queries at startup: |
| @@ -216,6 +244,8 @@ A single self-subscriber watches for new events from **our own relay** and updat | |||
| 216 | 244 | ||
| 217 | The batch timer **starts only when the first event arrives**, not on a fixed interval. This prevents the scenario where an event arriving at second 4 of a 5-second interval only gets 1 second before the batch fires. | 245 | The batch timer **starts only when the first event arrives**, not on a fixed interval. This prevents the scenario where an event arriving at second 4 of a 5-second interval only gets 1 second before the batch fires. |
| 218 | 246 | ||
| 247 | **Important:** Once the batch timer starts, it does NOT reset when additional events arrive. The batch will fire exactly 5 seconds after the first event, regardless of how many subsequent events are queued. This ensures predictable latency and prevents indefinite batching during high-activity periods. | ||
| 248 | |||
| 219 | ```rust | 249 | ```rust |
| 220 | impl SelfSubscriber { | 250 | impl SelfSubscriber { |
| 221 | async fn run(&self) { | 251 | async fn run(&self) { |
| @@ -326,6 +356,9 @@ impl SelfSubscriber { | |||
| 326 | let current_filter_count = self.get_filter_count_for_relay(relay_url).await; | 356 | let current_filter_count = self.get_filter_count_for_relay(relay_url).await; |
| 327 | let new_filter_count = current_filter_count + incremental_filters.len(); | 357 | let new_filter_count = current_filter_count + incremental_filters.len(); |
| 328 | 358 | ||
| 359 | // Note: 70 is a conservative threshold that may need tuning based on | ||
| 360 | // production observations. It was chosen to trigger consolidation earlier | ||
| 361 | // than v1's 150, but optimal value depends on relay behavior. | ||
| 329 | if new_filter_count > 70 { | 362 | if new_filter_count > 70 { |
| 330 | // Consolidate: add incremental filters first (no since), wait for EOSE, | 363 | // Consolidate: add incremental filters first (no since), wait for EOSE, |
| 331 | // then close all and resubscribe with consolidated filters (with since) | 364 | // then close all and resubscribe with consolidated filters (with since) |
| @@ -425,6 +458,37 @@ async fn consolidate_relay_subscription( | |||
| 425 | } | 458 | } |
| 426 | ``` | 459 | ``` |
| 427 | 460 | ||
| 461 | ## Daily Full Catchup | ||
| 462 | |||
| 463 | To capture events that may have taken longer than 15 minutes to propagate through the nostr network, we perform a daily full catchup: | ||
| 464 | |||
| 465 | ```rust | ||
| 466 | impl SyncManager { | ||
| 467 | /// Runs approximately every 24 hours per relay connection | ||
| 468 | async fn daily_catchup(&mut self, relay_url: &str) { | ||
| 469 | // Close all current subscriptions for this relay | ||
| 470 | self.close_all_subscriptions(relay_url).await; | ||
| 471 | |||
| 472 | // Rebuild fresh filters from current HashMap state | ||
| 473 | let filters = self.build_three_layer_filters_for_relay(relay_url).await; | ||
| 474 | |||
| 475 | // Subscribe WITHOUT since filter to get full historical sync | ||
| 476 | for filter in filters { | ||
| 477 | self.subscribe_to_relay(relay_url, filter).await; | ||
| 478 | } | ||
| 479 | |||
| 480 | // After EOSE, switch back to live mode with since filter | ||
| 481 | self.wait_for_eose(relay_url).await; | ||
| 482 | |||
| 483 | // Re-add since filter for ongoing live sync | ||
| 484 | let since = Timestamp::now() - 900; // 15 minutes ago | ||
| 485 | self.resubscribe_with_since(relay_url, since).await; | ||
| 486 | } | ||
| 487 | } | ||
| 488 | ``` | ||
| 489 | |||
| 490 | **Rationale:** The 15-minute reconnection window is standard for nostr event propagation, but some events may take longer. Rather than increasing the window (which would cause more duplicate processing), we do a daily full catchup to ensure nothing is missed. This adds minimal complexity while providing comprehensive coverage. | ||
| 491 | |||
| 428 | ## Sync Relay Connections | 492 | ## Sync Relay Connections |
| 429 | 493 | ||
| 430 | Each sync relay connection uses the three-layer filter strategy: | 494 | Each sync relay connection uses the three-layer filter strategy: |
| @@ -628,16 +692,18 @@ src/sync/ | |||
| 628 | 692 | ||
| 629 | 1. **Single Source of Truth**: Two HashMaps represent all sync state, initialized from database | 693 | 1. **Single Source of Truth**: Two HashMaps represent all sync state, initialized from database |
| 630 | 2. **Event-Driven Updates**: Self-subscriber updates HashMaps; relay connections read from them | 694 | 2. **Event-Driven Updates**: Self-subscriber updates HashMaps; relay connections read from them |
| 631 | 3. **Batched Filter Updates**: 5-second window that starts on first event (not fixed interval) | 695 | 3. **Batched Filter Updates**: 5-second window that starts on first event (timer does NOT reset on subsequent events) |
| 632 | 4. **Uniform Reconnection**: Always use `since = last_successful - 15min` | 696 | 4. **Uniform Reconnection**: Always use `since = last_successful - 15min` |
| 633 | 5. **No Jitter**: Trade-offs not worth it - orphan filters and inefficiency outweigh thundering herd concerns | 697 | 5. **No Jitter**: Trade-offs not worth it - orphan filters and inefficiency outweigh thundering herd concerns |
| 634 | 6. **Bootstrap Relay Protected**: Never removed from sync_relays; subscribes with Layer 1 even if empty | 698 | 6. **Bootstrap Relay Protected**: Never removed from sync_relays - ensures at least one sync connection exists even when no repositories currently list our service (cold start / recovery scenario) |
| 635 | 7. **30618 Remote-Only**: Maintainer state synced from remote relays, not self-subscribed | 699 | 7. **30618 Remote-Only**: Maintainer state synced from remote relays, not self-subscribed |
| 636 | 8. **70 Filter Consolidation Threshold**: Lower than v1's 150 for earlier consolidation | 700 | 8. **70 Filter Consolidation Threshold**: Lower than v1's 150 for earlier consolidation (conservative value that may need tuning based on production observation) |
| 637 | 9. **100-Tag Batching**: Consistent batch size for Layer 2 and Layer 3 filters | 701 | 9. **100-Tag Batching**: Consistent batch size for Layer 2 and Layer 3 filters |
| 638 | 10. **Layer 2 All Kinds**: Subscribe to ALL events with 'a' tags, not just NIP-34 kinds | 702 | 10. **Layer 2 All Kinds**: Subscribe to ALL events with 'a' tags, not just NIP-34 kinds |
| 639 | 11. **Two-Phase Consolidation**: Incremental filters WITHOUT since first, then consolidated WITH since | 703 | 11. **Two-Phase Consolidation**: Incremental filters WITHOUT since first, then consolidated WITH since |
| 640 | 12. **Multiple Repo Refs**: Handle events that tag multiple repos correctly | 704 | 12. **Multiple Repo Refs**: Handle events that tag multiple repos correctly |
| 705 | 13. **Daily Full Catchup**: Periodic sync restart without `since` filter (~24h) to catch slow-propagating events | ||
| 706 | 14. **Dual DB Queries at Startup**: Separate queries for root events and announcements. Could be combined into a single query, but some relays cache by kind which may make separate queries more efficient. Trade-off deferred for future optimization. | ||
| 641 | 707 | ||
| 642 | --- | 708 | --- |
| 643 | 709 | ||