| author | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-08 00:26:51 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-08 00:26:51 +0000 |
| commit | 543d9e66dd44b70ed467c61635e6c8056fef8555 (patch) | |
| tree | 99783725680e3f1d4c88699777746bc3ea9fa806 /docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md | |
| parent | c67ebe6f33bfa191f17eb0df24d3ee18092c74e1 (diff) | |
docs: update docs with sync and purgatory and git data sync
Diffstat (limited to 'docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md')
| -rw-r--r-- | docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md | 675 |
1 files changed, 675 insertions, 0 deletions
diff --git a/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md b/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md
new file mode 100644
index 0000000..31c3e46
--- /dev/null
+++ b/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md
| @@ -0,0 +1,675 @@ | |||
| 1 | # GRASP-02 Proactive Sync: Purgatory Git Data Fetching | ||
| 2 | |||
| 3 | **Status**: ✅ Implemented | ||
| 4 | **Implementation**: [`src/purgatory/sync/`](../../src/purgatory/sync/) | ||
| 5 | **Related**: | ||
| 6 | |||
| 7 | - [Purgatory Design](purgatory-design.md) - Core purgatory concepts | ||
| 8 | - [GRASP-02 Proactive Sync](grasp-02-proactive-sync.md) - Full GRASP-02 implementation | ||
| 9 | - [Unified Git Data Sync](unify-git-data-sync.md) - Shared processing logic | ||
| 10 | |||
| 11 | --- | ||
| 12 | |||
| 13 | ## Overview | ||
| 14 | |||
| 15 | When Nostr events arrive before their git data, they enter **purgatory**, where they wait to be served. They don't wait passively: ngit-grasp **actively hunts** for the missing git data across all git servers associated with the repo until it finds what it needs or the events expire. | ||
| 16 | |||
| 17 | ### How It Works | ||
| 18 | |||
| 19 | **If the data exists, we'll find it.** | ||
| 20 | |||
| 21 | The system scours git servers listed in repository announcements and PR events, retrying with exponential backoff (starting at **20 seconds**, capped at **2 minutes**) for up to **30 minutes**. If we find the data, events are released immediately. If not, they expire from purgatory after 30 minutes. | ||
| 22 | |||
| 23 | **Smart timing based on how events arrive:** | ||
| 24 | |||
| 25 | - **User-submitted events**: Wait **3 minutes** before hunting—we expect a `git push` to follow shortly | ||
| 26 | - **Sync-received events**: Start hunting after just **500ms**—batch burst arrivals, then get to work | ||
| 27 | |||
| 28 | **Playing nicely with other servers:** | ||
| 29 | |||
| 30 | We respect remote server capacity with: | ||
| 31 | |||
| 32 | - **Throttling**: Max 5 concurrent requests per domain, 30 requests/minute | ||
| 33 | - **Backoff**: Start at 20 seconds, double each attempt, cap at 2 minutes | ||
| 34 | - **Round-robin**: Fair distribution across repositories waiting for the same domain | ||
| 35 | - **Fresh start**: New events reset retry count—recent updates often mean fresh data | ||
| 36 | |||
| 37 | **The result**: If git data is available anywhere in the clone URL list, we'll find it within minutes. If it's not available within 30 minutes, the events expire cleanly. | ||
| 38 | |||
| 39 | ### Key Features | ||
| 40 | |||
| 41 | ✅ **Proactive hunting** - Retries git servers with exponential backoff (20s up to a 2 min cap), finds data automatically | ||
| 42 | ✅ **Respectful throttling** - 5 concurrent + 30/min per domain, plays nice with other implementations | ||
| 43 | ✅ **Smart timing** - 3min delay for user pushes, 500ms for synced events | ||
| 44 | ✅ **30min expiry** - Auto-cleanup of events when data never arrives | ||
| 45 | ✅ **Fully testable** - Mock-based architecture for reliable unit tests | ||
| 46 | |||
| 47 | --- | ||
| 48 | |||
| 49 | ## The Problem: Out-of-Order Arrival | ||
| 50 | |||
| 51 | In a distributed system, git data and Nostr events can arrive in any order: | ||
| 52 | |||
| 53 | ``` | ||
| 54 | Timeline A: Event arrives first (user push expected) | ||
| 55 | t=0s: State event received → enters purgatory | ||
| 56 | t=0-180s: (3min wait - expecting git push) | ||
| 57 | t=30s: Git push arrives during the wait → event released ✅ | ||
| 58 | |||
| 59 | Timeline B: Git arrives first | ||
| 60 | t=0s: Git push received → data available | ||
| 61 | t=30s: State event received → immediately served ✅ | ||
| 62 | |||
| 63 | Timeline C: Sync scenario (hunt for data) | ||
| 64 | t=0s: State event received from relay X → enters purgatory | ||
| 65 | t=0.5s: (500ms delay to batch bursts) | ||
| 66 | t=0.5s: Start hunting git servers → check server1, server2, server3... | ||
| 67 | t=45s: Git data found on server2 → event released ✅ | ||
| 68 | |||
| 69 | Timeline D: Data never arrives | ||
| 70 | t=0s: State event received → enters purgatory | ||
| 71 | t=0.5s: Start hunting → server1 (not found), server2 (timeout), server3 (not found) | ||
| 72 | t=20s: Retry → server1 (not found), server2 (not found), server3 (not found) | ||
| 73 | t=60s: Retry → all servers checked, no data | ||
| 74 | ... | ||
| 75 | t=1800s: 30 minutes expired → event discarded, purgatory cleaned up 🗑️ | ||
| 76 | ``` | ||
| 77 | |||
| 78 | **Without proactive sync**: Events in Timeline C would sit in purgatory until a manual git push arrives or they expire. | ||
| 79 | **With proactive sync**: System automatically hunts for data across all known servers, releasing events as soon as the data is found. | ||
| 80 | |||
| 81 | --- | ||
| 82 | |||
| 83 | ## Architecture: Two-Path Sync Design | ||
| 84 | |||
| 85 | The system uses **two independent execution paths** that work together: | ||
| 86 | |||
| 87 | ### Path 1: Main Sync Loop (Non-Throttled URLs) | ||
| 88 | |||
| 89 | Runs every **1 second**, processes identifiers ready for sync: | ||
| 90 | |||
| 91 | 1. Find ready identifiers (where `!in_progress && next_attempt <= now`) | ||
| 92 | 2. Spawn parallel tasks for each identifier | ||
| 93 | 3. Each task tries non-throttled URLs until: | ||
| 94 | - ✅ All OIDs fetched (complete) → remove from queue | ||
| 95 | - ⏸️ Only throttled URLs remain → enqueue with throttled domains, apply backoff | ||
| 96 | - ❌ No URLs left (all tried/throttled) → apply backoff, retry later | ||
| 97 | |||
| 98 | **Key insight**: Main loop doesn't wait for throttled domains. It quickly tries available servers, then hands off to domain queues for rate-limited processing. | ||
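The URL selection in step 3 can be sketched as follows. This is a minimal illustration, not the project's code: `domain_of` and `next_url` are hypothetical helpers, and the host parsing is deliberately simplistic (it assumes `scheme://host[:port]/path` URLs with no userinfo part).

```rust
use std::collections::HashSet;

/// Extract the host from a clone URL (hypothetical helper, simplified parsing).
fn domain_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    // The host ends at the first '/' (path) or ':' (port).
    let host = rest.split(|c| c == '/' || c == ':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Pick the first URL that has not been tried yet and whose domain is not
/// currently throttled. Returns `None` when only tried or throttled URLs remain.
fn next_url<'a>(
    clone_urls: &'a [String],
    tried: &HashSet<String>,
    throttled_domains: &HashSet<String>,
) -> Option<&'a str> {
    clone_urls.iter().map(String::as_str).find(|url| {
        !tried.contains(*url)
            && domain_of(url).map_or(false, |domain| !throttled_domains.contains(domain))
    })
}
```

When `next_url` returns `None` while OIDs are still missing, the caller would hand the identifier to the domain queues for any throttled domains and apply backoff.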
| 99 | |||
| 100 | ### Path 2: Domain Throttle Queues (Throttled URLs) | ||
| 101 | |||
| 102 | **Trigger-based** (no polling), processes when capacity frees: | ||
| 103 | |||
| 104 | 1. Identifier enqueued with throttled domain (from main loop) | ||
| 105 | 2. When domain has capacity (slot frees or rate limit window passes): | ||
| 106 | - Pick next identifier (round-robin for fairness) | ||
| 107 | - Try one URL from that domain | ||
| 108 | - Mark URL as tried, release slot | ||
| 109 | 3. Trigger repeats until queue empty or capacity exhausted | ||
| 110 | |||
| 111 | **Key insight**: Each domain independently manages its queue, ensuring we respect rate limits while maximizing throughput. | ||
| 112 | |||
| 113 | --- | ||
| 114 | |||
| 115 | ## Data Flow: From Event to Release | ||
| 116 | |||
| 117 | ```mermaid | ||
| 118 | graph TB | ||
| 119 | A[Event Arrives] --> B{Git Data<br/>Available?} | ||
| 120 | B -->|Yes| C[Serve Immediately] | ||
| 121 | B -->|No| D[Enter Purgatory] | ||
| 122 | |||
| 123 | D --> E[Enqueue for Sync] | ||
| 124 | E --> F{Event Source?} | ||
| 125 | F -->|User Submit| G[3min Delay<br/>expect push] | ||
| 126 | F -->|Relay Sync| H[500ms Delay<br/>batch burst] | ||
| 127 | |||
| 128 | G --> I[Main Sync Loop<br/>1s interval] | ||
| 129 | H --> I | ||
| 130 | |||
| 131 | I --> J{Ready?} | ||
| 132 | J -->|Not Yet| I | ||
| 133 | J -->|Yes| K[Spawn Sync Task] | ||
| 134 | |||
| 135 | K --> L[Try Non-Throttled URLs] | ||
| 136 | L --> M{Got All OIDs?} | ||
| 137 | M -->|Yes| N[Process & Release] | ||
| 138 | M -->|Partial| O[Enqueue Throttled Domains] | ||
| 139 | M -->|None| P[Apply Backoff] | ||
| 140 | |||
| 141 | O --> Q[Domain Queue] | ||
| 142 | Q --> R{Has Capacity?} | ||
| 143 | R -->|No| Q | ||
| 144 | R -->|Yes| S[Try Domain URL] | ||
| 145 | S --> T{Got OIDs?} | ||
| 146 | T -->|Yes| N | ||
| 147 | T -->|No| U[Try Next in Queue] | ||
| 148 | |||
| 149 | P --> I | ||
| 150 | N --> V[Event Served] | ||
| 151 | |||
| 152 | style D fill:#fff3cd | ||
| 153 | style N fill:#d4edda | ||
| 154 | style V fill:#d1ecf1 | ||
| 155 | ``` | ||
| 156 | |||
| 157 | --- | ||
| 158 | |||
| 159 | ## Retry Strategy: Exponential Backoff with Fresh Start | ||
| 160 | |||
| 161 | ### Backoff Schedule | ||
| 162 | |||
| 163 | When sync attempts don't complete (OIDs still needed), backoff increases: | ||
| 164 | |||
| 165 | | Attempt | Delay | Formula | | ||
| 166 | | ------- | ------------- | ---------------------- | | ||
| 167 | | 1 | 20s | `20s * 2^0` | | ||
| 168 | | 2 | 40s | `20s * 2^1` | | ||
| 169 | | 3 | 80s | `20s * 2^2` | | ||
| 170 | | 4+ | 120s (capped) | `min(20s * 2^n, 120s)` | | ||
| 171 | |||
| 172 | **Implementation**: [`src/purgatory/sync/queue.rs:SyncQueueEntry::backoff()`](../../src/purgatory/sync/queue.rs) | ||
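The schedule in the table can be reproduced with a few lines of Rust. This is a sketch under stated assumptions: `backoff_delay`, `BASE_BACKOFF`, and `MAX_BACKOFF` are illustrative names, and the real `SyncQueueEntry::backoff()` may be structured differently.

```rust
use std::time::Duration;

/// Delay applied after the first failed attempt (20s in the table above).
const BASE_BACKOFF: Duration = Duration::from_secs(20);
/// Upper bound on any single backoff delay (2 minutes).
const MAX_BACKOFF: Duration = Duration::from_secs(120);

/// Delay to wait after `attempt` failed attempts (1-based).
/// Hypothetical helper: computes min(20s * 2^(attempt - 1), 120s).
fn backoff_delay(attempt: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow for large attempt counts.
    let exponent = attempt.saturating_sub(1).min(16);
    let factor = 1u32 << exponent;
    BASE_BACKOFF.saturating_mul(factor).min(MAX_BACKOFF)
}
```

Note that the exponent is `attempt - 1`, so the first failure waits the base 20 seconds and the cap is reached on the fourth attempt.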
| 173 | |||
| 174 | ### Fresh Start on New Events | ||
| 175 | |||
| 176 | **Critical feature**: When a new event arrives for an identifier already in the sync queue, the `attempt_count` resets to 0. | ||
| 177 | |||
| 178 | **Why?** New events often mean: | ||
| 179 | |||
| 180 | - A maintainer just updated the repository | ||
| 181 | - Fresh git data might be available at new clone URLs | ||
| 182 | - Previous failures might have been temporary | ||
| 183 | |||
| 184 | **Example**: | ||
| 185 | |||
| 186 | ``` | ||
| 187 | t=0s: State A arrives → queue with 3min delay, attempt_count=0 | ||
| 188 | t=180s: First sync attempt fails → backoff 20s, attempt_count=1 | ||
| 189 | t=200s: Second attempt fails → backoff 40s, attempt_count=2 | ||
| 190 | t=210s: State B arrives via relay sync (same identifier) → attempt_count=0 ✨ | ||
| 191 | t=210.5s: Retry after the 500ms sync delay (sooner than the pending backoff) → success! | ||
| 192 | ``` | ||
| 193 | |||
| 194 | --- | ||
| 195 | |||
| 196 | ## Debounced Delays: Smart Timing | ||
| 197 | |||
| 198 | ### User-Submitted Events: 3 Minutes | ||
| 199 | |||
| 200 | When a user submits an event via an `EVENT` message, we expect a `git push` to follow shortly: | ||
| 201 | |||
| 202 | ``` | ||
| 203 | t=0s: User submits state event → purgatory + 3min delay | ||
| 204 | t=30s: User runs `git push` → data arrives → event released ✅ | ||
| 205 | ``` | ||
| 206 | |||
| 207 | **Why 3 minutes?** Gives users time to: | ||
| 208 | |||
| 209 | - Finish composing their commit message | ||
| 210 | - Run the `git push` command | ||
| 211 | - Handle network delays | ||
| 212 | |||
| 213 | **Configuration**: Hardcoded in [`src/purgatory/mod.rs:DEFAULT_SYNC_DELAY`](../../src/purgatory/mod.rs) | ||
| 214 | |||
| 215 | ### Sync-Triggered Events: 500ms | ||
| 216 | |||
| 217 | When events arrive during relay sync (e.g., negentropy catchup), they often come in bursts: | ||
| 218 | |||
| 219 | ``` | ||
| 220 | t=0s: State A arrives → purgatory + 500ms delay | ||
| 221 | t=0.1s: State B arrives → purgatory + 500ms delay (same repo) | ||
| 222 | t=0.2s: State C arrives → purgatory + 500ms delay (same repo) | ||
| 223 | t=0.5s: Single sync attempt fetches data for all three ✅ | ||
| 224 | ``` | ||
| 225 | |||
| 226 | **Why 500ms?** Batches burst arrivals without excessive delay. | ||
| 227 | |||
| 228 | **Configuration**: Hardcoded in [`src/purgatory/mod.rs:IMMEDIATE_SYNC_DELAY`](../../src/purgatory/mod.rs) | ||
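The two delays can be expressed as a simple mapping from event source to initial wait. The constant names come from the references above; the `EventSource` enum and `initial_sync_delay` function are assumptions made for this sketch and may not match the actual code.

```rust
use std::time::Duration;

/// Delay for user-submitted events: a `git push` is expected to follow.
const DEFAULT_SYNC_DELAY: Duration = Duration::from_secs(180);
/// Delay for events received via relay sync: just long enough to batch bursts.
const IMMEDIATE_SYNC_DELAY: Duration = Duration::from_millis(500);

/// How an event reached the relay (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventSource {
    UserSubmitted,
    RelaySync,
}

/// Initial wait before the first sync attempt for an event from `source`.
fn initial_sync_delay(source: EventSource) -> Duration {
    match source {
        EventSource::UserSubmitted => DEFAULT_SYNC_DELAY,
        EventSource::RelaySync => IMMEDIATE_SYNC_DELAY,
    }
}
```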
| 229 | |||
| 230 | ### Debouncing Mechanism | ||
| 231 | |||
| 232 | Multiple events for the same identifier **don't create multiple sync tasks**. The `enqueue_sync` method: | ||
| 233 | |||
| 234 | 1. If identifier not in queue → create new entry with delay | ||
| 235 | 2. If identifier already queued → reset `attempt_count`, update `next_attempt` if sooner | ||
| 236 | |||
| 237 | **Result**: Rapid event arrivals → single sync attempt after debounce window. | ||
| 238 | |||
| 239 | **Implementation**: [`src/purgatory/mod.rs:Purgatory::enqueue_sync()`](../../src/purgatory/mod.rs) | ||
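A minimal sketch of the two debouncing rules above, assuming a plain `HashMap` keyed by identifier. `SyncQueue`, `QueueEntry`, and `enqueue` are illustrative names, and the current time is passed in explicitly so the behavior is easy to test; the real `Purgatory::enqueue_sync()` may differ.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-identifier sync state (hypothetical, reduced to the fields used here).
struct QueueEntry {
    attempt_count: u32,
    next_attempt: Instant,
}

#[derive(Default)]
struct SyncQueue {
    entries: HashMap<String, QueueEntry>,
}

impl SyncQueue {
    /// Enqueue `identifier` so it is attempted no later than `now + delay`.
    fn enqueue(&mut self, identifier: &str, delay: Duration, now: Instant) {
        let candidate = now + delay;
        self.entries
            .entry(identifier.to_string())
            // Already queued: fresh start, and only ever move the attempt earlier.
            .and_modify(|entry| {
                entry.attempt_count = 0;
                entry.next_attempt = entry.next_attempt.min(candidate);
            })
            // Not queued yet: create a single entry with the requested delay.
            .or_insert(QueueEntry {
                attempt_count: 0,
                next_attempt: candidate,
            });
    }
}
```

Because the entry is keyed by identifier, a burst of events for one repository collapses into one queue entry and therefore one sync attempt.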
| 240 | |||
| 241 | --- | ||
| 242 | |||
| 243 | ## Domain Throttling: Respectful Rate Limiting | ||
| 244 | |||
| 245 | ### Why Throttle? | ||
| 246 | |||
| 247 | Git servers have finite resources. Without throttling: | ||
| 248 | |||
| 249 | - ❌ We could overwhelm small servers with concurrent requests | ||
| 250 | - ❌ Servers might rate-limit or ban us | ||
| 251 | - ❌ Other clients sharing the server suffer degraded performance | ||
| 252 | |||
| 253 | With throttling: | ||
| 254 | |||
| 255 | - ✅ Respect server capacity (5 concurrent max per domain) | ||
| 256 | - ✅ Stay under rate limits (30 requests/min per domain) | ||
| 257 | - ✅ Fair access for all clients | ||
| 258 | |||
| 259 | ### Two-Level Limits | ||
| 260 | |||
| 261 | Each domain has **two independent limits**: | ||
| 262 | |||
| 263 | #### 1. Concurrent Request Limit (Default: 5) | ||
| 264 | |||
| 265 | Maximum in-flight requests to a domain at any moment. | ||
| 266 | |||
| 267 | **Example**: | ||
| 268 | |||
| 269 | ``` | ||
| 270 | Domain: github.com | ||
| 271 | In-flight: [fetch-1, fetch-2, fetch-3, fetch-4, fetch-5] | ||
| 272 | Status: AT CAPACITY (throttled) | ||
| 273 | |||
| 274 | fetch-3 completes → in-flight: 4 | ||
| 275 | Status: HAS CAPACITY (process next queued identifier) | ||
| 276 | ``` | ||
| 277 | |||
| 278 | #### 2. Rate Limit (Default: 30/min) | ||
| 279 | |||
| 280 | Maximum requests in any 60-second sliding window. | ||
| 281 | |||
| 282 | **Example**: | ||
| 283 | |||
| 284 | ``` | ||
| 285 | t=0s: Request 1 → request_times: [0s] | ||
| 286 | t=1s: Request 2 → request_times: [0s, 1s] | ||
| 287 | ... | ||
| 288 | t=29s: Request 30 → request_times: [0s, 1s, ..., 29s] | ||
| 289 | t=31s: Request 31? → THROTTLED (30 requests in last 60s) | ||
| 290 | t=61s: Request at t=0s aged out → request_times: [1s, ..., 29s] | ||
| 291 | t=61s: Request 31 → ALLOWED (at most 29 in last 60s) | ||
| 292 | ``` | ||
| 293 | |||
| 294 | **Implementation**: [`src/purgatory/sync/throttle.rs:DomainThrottle::has_capacity()`](../../src/purgatory/sync/throttle.rs) | ||
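The two limits can be combined in a single capacity check. The sketch below is an assumption-laden illustration, not the actual `DomainThrottle`: `DomainLimits` and its methods are hypothetical, timestamps are passed in explicitly, and a request counts as aged out once it is 60 seconds old or more.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Length of the sliding rate-limit window.
const RATE_WINDOW: Duration = Duration::from_secs(60);

/// Per-domain throttle state (hypothetical, for illustration only).
struct DomainLimits {
    max_concurrent: usize,
    max_per_window: usize,
    in_flight: usize,
    request_times: VecDeque<Instant>,
}

impl DomainLimits {
    fn new(max_concurrent: usize, max_per_window: usize) -> Self {
        Self { max_concurrent, max_per_window, in_flight: 0, request_times: VecDeque::new() }
    }

    /// True when both the concurrency limit and the rate limit allow a request.
    fn has_capacity(&mut self, now: Instant) -> bool {
        // Drop timestamps that have aged out of the sliding window.
        while let Some(&oldest) = self.request_times.front() {
            if now.duration_since(oldest) >= RATE_WINDOW {
                self.request_times.pop_front();
            } else {
                break;
            }
        }
        self.in_flight < self.max_concurrent && self.request_times.len() < self.max_per_window
    }

    /// Reserve a slot and record the request time; false if throttled.
    fn start_request(&mut self, now: Instant) -> bool {
        if !self.has_capacity(now) {
            return false;
        }
        self.in_flight += 1;
        self.request_times.push_back(now);
        true
    }

    /// Free the concurrency slot when a request finishes.
    fn complete_request(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```

Keeping the timestamps in a `VecDeque` makes pruning cheap because requests are recorded in chronological order, so only the front ever needs to be inspected.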
| 295 | |||
| 296 | ### Round-Robin Fairness | ||
| 297 | |||
| 298 | When multiple identifiers are queued for a throttled domain, we use **round-robin** to ensure fairness: | ||
| 299 | |||
| 300 | ``` | ||
| 301 | Queue: [repo-A, repo-B, repo-C] | ||
| 302 | Round-robin index: 0 | ||
| 303 | |||
| 304 | Attempt 1: Try repo-A (index=0) → fetch → index=1 | ||
| 305 | Attempt 2: Try repo-B (index=1) → fetch → index=2 | ||
| 306 | Attempt 3: Try repo-C (index=2) → fetch → index=0 | ||
| 307 | Attempt 4: Try repo-A (index=0) → ... | ||
| 308 | ``` | ||
| 309 | |||
| 310 | **Why round-robin?** Prevents starvation. Without it, repo-A might consume every slot that frees up while repo-B and repo-C wait indefinitely. | ||
| 311 | |||
| 312 | **Implementation**: [`src/purgatory/sync/throttle.rs:DomainThrottle::next_ready_identifier()`](../../src/purgatory/sync/throttle.rs) | ||
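The rotation shown above can be sketched with a vector and a cursor. `RoundRobinQueue` and its methods are hypothetical names chosen for this example; the actual `next_ready_identifier()` presumably also checks readiness, which is omitted here.

```rust
/// Identifiers waiting on one domain, served in rotation (illustrative type).
struct RoundRobinQueue {
    identifiers: Vec<String>,
    next_index: usize,
}

impl RoundRobinQueue {
    fn new(identifiers: &[&str]) -> Self {
        Self {
            identifiers: identifiers.iter().map(|id| id.to_string()).collect(),
            next_index: 0,
        }
    }

    /// Return the identifier at the cursor and advance the cursor, wrapping around.
    fn next_identifier(&mut self) -> Option<String> {
        if self.identifiers.is_empty() {
            return None;
        }
        let index = self.next_index % self.identifiers.len();
        self.next_index = (index + 1) % self.identifiers.len();
        Some(self.identifiers[index].clone())
    }

    /// Remove a finished identifier while keeping the cursor on the same successor.
    fn remove(&mut self, identifier: &str) {
        if let Some(pos) = self.identifiers.iter().position(|id| id == identifier) {
            self.identifiers.remove(pos);
            if pos < self.next_index {
                self.next_index -= 1;
            }
            if self.next_index >= self.identifiers.len() {
                self.next_index = 0;
            }
        }
    }
}
```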
| 313 | |||
| 314 | ### Trigger-Based Processing (Not Polling) | ||
| 315 | |||
| 316 | Domain queues **don't poll** for capacity. Instead, processing is triggered by two events: | ||
| 317 | |||
| 318 | 1. **`complete_request()`** - A request finishes, slot frees | ||
| 319 | 2. **`enqueue_identifier()`** - New identifier added to queue | ||
| 320 | |||
| 321 | Both methods check `has_capacity()` and trigger `try_process_next()` if true. | ||
| 322 | |||
| 323 | **Why trigger-based?** | ||
| 324 | |||
| 325 | - ✅ Lower CPU usage (no busy-waiting) | ||
| 326 | - ✅ Instant response when capacity frees | ||
| 327 | - ✅ Simpler reasoning (event-driven) | ||
| 328 | |||
| 329 | **Implementation**: [`src/purgatory/sync/throttle.rs:ThrottleManager`](../../src/purgatory/sync/throttle.rs) | ||
| 330 | |||
| 331 | --- | ||
| 332 | |||
| 333 | ## 30-Minute Purgatory Expiry | ||
| 334 | |||
| 335 | Purgatory entries **automatically expire** after 30 minutes to prevent unbounded memory growth. | ||
| 336 | |||
| 337 | ### Why 30 Minutes? | ||
| 338 | |||
| 339 | From the [GRASP-01 spec](https://github.com/DanConwayDev/grasp/blob/main/01.md#purgatory): | ||
| 340 | |||
| 341 | > Events should be kept in purgatory and otherwise discarded after 30 minutes. | ||
| 342 | |||
| 343 | This balances: | ||
| 344 | |||
| 345 | - ⏰ **Long enough** for typical sync scenarios (git data usually arrives within minutes) | ||
| 346 | - 🧹 **Short enough** to prevent memory leaks from abandoned events | ||
| 347 | - 🔄 **Recoverable**: expired events are still on other relays and can be re-submitted | ||
| 348 | |||
| 349 | ### Implementation | ||
| 350 | |||
| 351 | Each purgatory entry tracks: | ||
| 352 | |||
| 353 | - `created_at: Instant` - When added to purgatory | ||
| 354 | - `expires_at: Instant` - When to discard (created_at + 30min) | ||
| 355 | |||
| 356 | The main sync loop checks expiry before processing: | ||
| 357 | |||
| 358 | ```rust | ||
| 359 | if !self.has_pending_events(&identifier) { | ||
| 360 | // No events remain (expired or released) → remove from sync queue | ||
| 361 | self.sync_queue.remove(&identifier); | ||
| 362 | } | ||
| 363 | ``` | ||
| 364 | |||
| 365 | **Note**: Expiry is checked implicitly via `has_pending_events()`. If all events for an identifier have expired, the identifier is removed from the sync queue. | ||
| 366 | |||
| 367 | **Implementation**: [`src/purgatory/mod.rs:DEFAULT_EXPIRY`](../../src/purgatory/mod.rs) | ||
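A minimal sketch of how the two tracked timestamps could drive expiry. The constant name matches the reference above, but `PurgatoryEntry`, `is_expired`, and the free function `has_pending_events` are assumptions for illustration; the real method lives on a larger type and takes no explicit clock.

```rust
use std::time::{Duration, Instant};

/// How long an event may wait in purgatory (30 minutes, per the text above).
const DEFAULT_EXPIRY: Duration = Duration::from_secs(30 * 60);

/// One event waiting for git data (hypothetical, reduced to the fields used here).
struct PurgatoryEntry {
    identifier: String,
    created_at: Instant,
    expires_at: Instant,
}

impl PurgatoryEntry {
    fn new(identifier: &str, now: Instant) -> Self {
        Self {
            identifier: identifier.to_string(),
            created_at: now,
            expires_at: now + DEFAULT_EXPIRY,
        }
    }

    fn is_expired(&self, now: Instant) -> bool {
        now >= self.expires_at
    }
}

/// Drop expired entries, then report whether any remain for `identifier`.
fn has_pending_events(entries: &mut Vec<PurgatoryEntry>, identifier: &str, now: Instant) -> bool {
    entries.retain(|entry| !entry.is_expired(now));
    entries.iter().any(|entry| entry.identifier == identifier)
}
```

With this shape, the sync loop never needs a separate expiry timer: an identifier whose events have all expired simply stops reporting pending events and is dropped from the queue on the next pass.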
| 368 | |||
| 369 | --- | ||
| 370 | |||
| 371 | ## Testability: Mock-Based Architecture | ||
| 372 | |||
| 373 | A key design goal was **100% unit test coverage** without requiring real git servers or databases. | ||
| 374 | |||
| 375 | ### SyncContext Trait | ||
| 376 | |||
| 377 | All external dependencies are abstracted behind the `SyncContext` trait: | ||
| 378 | |||
| 379 | ```rust | ||
| 380 | #[async_trait] | ||
| 381 | pub trait SyncContext: Send + Sync { | ||
| 382 | async fn fetch_repository_data(&self, identifier: &str) -> Result<RepositoryData>; | ||
| 383 | fn collect_needed_oids(&self, identifier: &str) -> HashSet<String>; | ||
| 384 | async fn oid_exists(&self, repo_path: &Path, oid: &str) -> bool; | ||
| 385 | async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; | ||
| 386 | async fn process_newly_available_git_data(&self, ...) -> Result<ProcessResult>; | ||
| 387 | fn has_pending_events(&self, identifier: &str) -> bool; | ||
| 388 | fn find_target_repo(&self, data: &RepositoryData) -> Option<PathBuf>; | ||
| 389 | fn our_domain(&self) -> Option<&str>; | ||
| 390 | } | ||
| 391 | ``` | ||
| 392 | |||
| 393 | **Two Implementations**: | ||
| 394 | |||
| 395 | 1. **`RealSyncContext`** - Production implementation connecting to real systems | ||
| 396 | 2. **`MockSyncContext`** - Test implementation with configurable behavior | ||
| 397 | |||
| 398 | ### MockSyncContext Features | ||
| 399 | |||
| 400 | The mock supports builder-pattern configuration: | ||
| 401 | |||
| 402 | ```rust | ||
| 403 | let mock = MockSyncContext::new() | ||
| 404 | .with_repository_data("test-repo", RepositoryData { | ||
| 405 | announcements: vec![...], | ||
| 406 | clone_urls: vec!["https://server1.com/repo.git".to_string()], | ||
| 407 | }) | ||
| 408 | .with_needed_oids("test-repo", hashset!["abc123", "def456"]) | ||
| 409 | .with_fetch_result("https://server1.com/repo.git", Ok(vec!["abc123"])) | ||
| 410 | .with_fetch_result("https://server2.com/repo.git", Ok(vec!["def456"])); | ||
| 411 | ``` | ||
| 412 | |||
| 413 | **Test Example** (from [`src/purgatory/sync/functions.rs`](../../src/purgatory/sync/functions.rs)): | ||
| 414 | |||
| 415 | ```rust | ||
| 416 | #[tokio::test] | ||
| 417 | async fn test_sync_identifier_partial_success() { | ||
| 418 | let mock = MockSyncContext::new() | ||
| 419 | .with_repository_data("repo", RepositoryData { | ||
| 420 | clone_urls: vec![ | ||
| 421 | "https://server1.com/repo.git".to_string(), | ||
| 422 | "https://server2.com/repo.git".to_string(), | ||
| 423 | ], | ||
| 424 | ..Default::default() | ||
| 425 | }) | ||
| 426 | .with_needed_oids("repo", hashset!["oid1", "oid2"]) | ||
| 427 | .with_fetch_result("https://server1.com/repo.git", Ok(vec!["oid1"])) | ||
| 428 | .with_fetch_result("https://server2.com/repo.git", Ok(vec!["oid2"])); | ||
| 429 | |||
| 430 | let throttle = Arc::new(ThrottleManager::new(5, 30)); | ||
| 431 | let complete = sync_identifier(&mock, "repo", &throttle).await; | ||
| 432 | |||
| 433 | assert!(complete); // Both OIDs fetched | ||
| 434 | } | ||
| 435 | ``` | ||
| 436 | |||
| 437 | **Why this matters**: | ||
| 438 | |||
| 439 | - ✅ Tests run **instantly** (no network I/O) | ||
| 440 | - ✅ Tests are **deterministic** (no flaky failures) | ||
| 441 | - ✅ Tests cover **edge cases** easily (network errors, partial success, etc.) | ||
| 442 | - ✅ Tests are **isolated** (no shared state between tests) | ||
| 443 | |||
| 444 | **Implementation**: [`src/purgatory/sync/context.rs:MockSyncContext`](../../src/purgatory/sync/context.rs) | ||
| 445 | |||
| 446 | --- | ||
| 447 | |||
| 448 | ## Configuration | ||
| 449 | |||
| 450 | Purgatory sync behavior is currently governed by hardcoded constants. CLI flags and environment variables for the throttle limits are planned but not yet available: | ||
| 451 | |||
| 452 | | Setting | CLI Flag | Environment Variable | Default | Description | | ||
| 453 | | ----------------------- | -------- | -------------------- | ------- | ---------------------------------------------------- | | ||
| 454 | | Domain concurrent limit | (future) | (future) | `5` | Max concurrent requests per domain | | ||
| 455 | | Domain rate limit | (future) | (future) | `30` | Max requests per minute per domain | | ||
| 456 | | Sync loop interval | N/A | N/A | `1s` | How often to check for ready identifiers (hardcoded) | | ||
| 457 | | Default sync delay | N/A | N/A | `180s` | Delay for user-submitted events (hardcoded) | | ||
| 458 | | Immediate sync delay | N/A | N/A | `500ms` | Delay for sync-triggered events (hardcoded) | | ||
| 459 | | Purgatory expiry | N/A | N/A | `30min` | How long events wait before expiring (hardcoded) | | ||
| 460 | |||
| 461 | **Note**: Currently, throttle limits and delays are hardcoded constants. Future work may expose these as configuration options if needed. | ||
| 462 | |||
| 463 | --- | ||
| 464 | |||
| 465 | ## Key Design Decisions | ||
| 466 | |||
| 467 | ### 1. Identifier-Based, Not Event-Based | ||
| 468 | |||
| 469 | **Decision**: Sync by repository identifier, not individual events. | ||
| 470 | |||
| 471 | **Rationale**: Multiple events for the same repository should trigger a single fetch operation, not N separate fetches. | ||
| 472 | |||
| 473 | **Impact**: Batches events efficiently, reduces server load. | ||
| 474 | |||
| 475 | ### 2. Two Separate `tried_urls` Tracking | ||
| 476 | |||
| 477 | **Decision**: Main sync loop and domain queues track tried URLs independently. | ||
| 478 | |||
| 479 | **Main sync**: Local `HashSet<String>` for current attempt (all domains) | ||
| 480 | **Domain queue**: Per-identifier `HashSet<String>` for this domain only | ||
| 481 | |||
| 482 | **Rationale**: | ||
| 483 | |||
| 484 | - Main sync skips throttled domains entirely (doesn't need their tried URLs) | ||
| 485 | - Domain queue only cares about URLs from its own domain | ||
| 486 | - No coordination needed → simpler code | ||
| 487 | |||
| 488 | **Impact**: Clean separation of concerns, easier to reason about. | ||
| 489 | |||
| 490 | ### 3. Trigger-Based Domain Processing | ||
| 491 | |||
| 492 | **Decision**: Domain queues process on triggers (capacity freed, new enqueue), not polling. | ||
| 493 | |||
| 494 | **Rationale**: | ||
| 495 | |||
| 496 | - Polling wastes CPU cycles checking capacity every interval | ||
| 497 | - Triggers provide instant response when capacity frees | ||
| 498 | - Event-driven design is easier to test and debug | ||
| 499 | |||
| 500 | **Impact**: Lower CPU usage, faster response times. | ||
| 501 | |||
| 502 | ### 4. Fresh Start on New Events | ||
| 503 | |||
| 504 | **Decision**: Reset `attempt_count` to 0 when new events arrive for an identifier. | ||
| 505 | |||
| 506 | **Rationale**: | ||
| 507 | |||
| 508 | - New events often mean fresh git data is available | ||
| 509 | - Previous failures might have been temporary | ||
| 510 | - Gives repositories a "second chance" without waiting for full backoff | ||
| 511 | |||
| 512 | **Impact**: Faster recovery from transient failures, better UX. | ||
| 513 | |||
| 514 | ### 5. OID Copying in `process_newly_available_git_data` | ||
| 515 | |||
| 516 | **Decision**: Copy OIDs and release events **per successful fetch**, not at end of sync. | ||
| 517 | |||
| 518 | **Rationale**: | ||
| 519 | |||
| 520 | - Events can be released as soon as their specific OIDs are available | ||
| 521 | - Partial success scenarios work correctly (some events release, others stay) | ||
| 522 | - Handles multiple state events for same identifier independently | ||
| 523 | |||
| 524 | **Impact**: Events release faster, better handling of partial success. | ||
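The per-fetch release rule can be illustrated as a subset check run after every successful fetch. This is only a sketch: `PendingEvent` and `release_ready` are hypothetical, and the real `process_newly_available_git_data` also copies OIDs into the target repository, which is not modeled here.

```rust
use std::collections::HashSet;

/// An event in purgatory and the OIDs it is waiting for (illustrative type).
struct PendingEvent {
    id: String,
    needed_oids: HashSet<String>,
}

/// Release every event whose OIDs are all available; keep the rest pending.
/// Returns the ids of the released events in their original order.
fn release_ready(pending: &mut Vec<PendingEvent>, available: &HashSet<String>) -> Vec<String> {
    let mut released = Vec::new();
    pending.retain(|event| {
        if event.needed_oids.is_subset(available) {
            released.push(event.id.clone());
            false // remove from purgatory
        } else {
            true // still waiting for at least one OID
        }
    });
    released
}
```

Calling this after each fetch, rather than once at the end of the sync, is what lets one event leave purgatory while another for the same identifier keeps waiting.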
| 525 | |||
| 526 | --- | ||
| 527 | |||
| 528 | ## Observability | ||
| 529 | |||
| 530 | ### Logging | ||
| 531 | |||
| 532 | Sync operations produce structured logs at different levels: | ||
| 533 | |||
| 534 | **INFO**: Major events | ||
| 535 | |||
| 536 | ``` | ||
| 537 | Starting purgatory sync loop (interval: 1s) | ||
| 538 | Sync complete - removed from sync queue (identifier=test-repo, complete=true) | ||
| 539 | ``` | ||
| 540 | |||
| 541 | **DEBUG**: Detailed progress | ||
| 542 | |||
| 543 | ``` | ||
| 544 | Added new sync queue entry (identifier=test-repo, delay_secs=180) | ||
| 545 | Starting sync task for identifier (identifier=test-repo) | ||
| 546 | Sync incomplete - applying backoff (identifier=test-repo, attempt_count=2, next_backoff_secs=40) | ||
| 547 | ``` | ||
| 548 | |||
| 549 | **WARN**: Errors and failures | ||
| 550 | |||
| 551 | ``` | ||
| 552 | Failed to fetch OIDs (url=https://server.com/repo.git, error=connection timeout) | ||
| 553 | ``` | ||
| 554 | |||
| 555 | ### Metrics (Future) | ||
| 556 | |||
| 557 | Planned Prometheus metrics for observability: | ||
| 558 | |||
| 559 | - `purgatory_sync_queue_size` - Number of identifiers pending sync | ||
| 560 | - `purgatory_sync_attempts_total{identifier}` - Total sync attempts per identifier | ||
| 561 | - `purgatory_sync_oids_fetched_total{identifier}` - OIDs successfully fetched | ||
| 562 | - `purgatory_domain_in_flight{domain}` - Current in-flight requests per domain | ||
| 563 | - `purgatory_domain_requests_total{domain}` - Total requests per domain | ||
| 564 | |||
| 565 | --- | ||
| 566 | |||
| 567 | ## Testing Strategy | ||
| 568 | |||
| 569 | ### Unit Tests | ||
| 570 | |||
| 571 | Core sync functions have comprehensive unit tests using `MockSyncContext`: | ||
| 572 | |||
| 573 | **`sync_identifier_next_url`** (3 tests): | ||
| 574 | |||
| 575 | - Skips throttled domains | ||
| 576 | - Skips tried URLs | ||
| 577 | - Returns None when all URLs exhausted | ||
| 578 | |||
| 579 | **`sync_identifier_from_url`** (2 tests): | ||
| 580 | |||
| 581 | - Successful fetch triggers processing | ||
| 582 | - Failed fetch doesn't trigger processing | ||
| 583 | |||
| 584 | **`sync_identifier`** (3 tests): | ||
| 585 | |||
| 586 | - Tries multiple URLs until complete | ||
| 587 | - Enqueues throttled domains when incomplete | ||
| 588 | - Handles partial success correctly | ||
| 589 | |||
| 590 | **`SyncQueueEntry`** (3 tests): | ||
| 591 | |||
| 592 | - Backoff calculation correct | ||
| 593 | - Fresh start on new events | ||
| 594 | - Ready state logic correct | ||
| 595 | |||
| 596 | **`DomainThrottle`** (4 tests): | ||
| 597 | |||
| 598 | - Concurrent limit enforced | ||
| 599 | - Rate limit enforced | ||
| 600 | - Round-robin fairness | ||
| 601 | - Queue management correct | ||
| 602 | |||
| 603 | **Total**: 15+ unit tests covering all core logic | ||
| 604 | |||
| 605 | **Location**: [`src/purgatory/sync/`](../../src/purgatory/sync/) (various `#[cfg(test)]` modules) | ||
| 606 | |||
| 607 | ### Integration Tests | ||
| 608 | |||
| 609 | End-to-end tests will verify sync behavior with real relay instances. None are implemented yet: | ||
| 610 | |||
| 611 | **Planned tests**: | ||
| 612 | |||
| 613 | - State event syncs from remote server | ||
| 614 | - PR event syncs from remote server | ||
| 615 | - Partial OID aggregation across multiple servers | ||
| 616 | - Throttling prevents overwhelming servers | ||
| 617 | - Backoff retry after failures | ||
| 618 | |||
| 619 | **Location**: [`tests/purgatory_sync.rs`](../../tests/purgatory_sync.rs) (planned) | ||
| 620 | |||
| 621 | --- | ||
| 622 | |||
| 623 | ## Future Enhancements | ||
| 624 | |||
| 625 | ### 1. Configurable Throttle Limits | ||
| 626 | |||
| 627 | **Current**: Hardcoded to 5 concurrent, 30/min per domain | ||
| 628 | **Future**: CLI flags `--sync-domain-concurrent` and `--sync-domain-rate-limit` | ||
| 629 | |||
| 630 | **Use case**: Operators might want stricter limits for public servers or looser limits for trusted servers. | ||
| 631 | |||
| 632 | ### 2. Per-Domain Throttle Configuration | ||
| 633 | |||
| 634 | **Current**: Same limits for all domains | ||
| 635 | **Future**: Domain-specific overrides (e.g., `github.com:10,60` for higher limits) | ||
| 636 | |||
| 637 | **Use case**: Popular forges like GitHub/GitLab can handle more load than small personal servers. | ||
| 638 | |||
| 639 | ### 3. Prometheus Metrics | ||
| 640 | |||
| 641 | **Current**: Structured logging only | ||
| 642 | **Future**: Export metrics for monitoring dashboards | ||
| 643 | |||
| 644 | **Use case**: Operators want visibility into sync performance, throttle effectiveness, success rates. | ||
| 645 | |||
| 646 | ### 4. Negentropy Integration | ||
| 647 | |||
| 648 | **Current**: Sync triggered by event arrival | ||
| 649 | **Future**: Proactive sync discovers missing events via negentropy | ||
| 650 | |||
| 651 | **Use case**: Catch up with repositories after downtime without waiting for event re-submission. | ||
| 652 | |||
| 653 | --- | ||
| 654 | |||
| 655 | ## Related Documentation | ||
| 656 | |||
| 657 | - **[Purgatory Design](purgatory-design.md)** - Core purgatory concepts and event flows | ||
| 658 | - **[GRASP-02 Proactive Sync](grasp-02-proactive-sync.md)** - Full GRASP-02 implementation (relay sync) | ||
| 659 | - **[Unified Git Data Sync](unify-git-data-sync.md)** - Shared processing for push and sync paths | ||
| 660 | - **[Architecture Overview](architecture.md)** - System-wide architecture | ||
| 661 | |||
| 662 | --- | ||
| 663 | |||
| 664 | ## Summary | ||
| 665 | |||
| 666 | The purgatory sync system is a sophisticated, production-ready implementation that: | ||
| 667 | |||
| 668 | ✅ **Batches intelligently** - Groups events by identifier for efficient fetching | ||
| 669 | ✅ **Retries smartly** - Exponential backoff with fresh start on new events | ||
| 670 | ✅ **Throttles respectfully** - 5 concurrent + 30/min per domain, round-robin fairness | ||
| 671 | ✅ **Times strategically** - 3min for user events, 500ms for synced events | ||
| 672 | ✅ **Expires responsibly** - 30min auto-cleanup prevents memory leaks | ||
| 673 | ✅ **Tests thoroughly** - Mock-based architecture enables comprehensive unit tests | ||
| 674 | |||
| 675 | This design lets ngit-grasp serve repositories reliably even when git data and Nostr events arrive out of order or from different sources, while respecting remote server capacity and exposing its behavior through structured logs. | ||