Diffstat (limited to 'docs/explanation/purgatory-design.md')
| -rw-r--r-- | docs/explanation/purgatory-design.md | 520 |
1 files changed, 316 insertions, 204 deletions
diff --git a/docs/explanation/purgatory-design.md b/docs/explanation/purgatory-design.md
index b984745..bd792d4 100644
--- a/docs/explanation/purgatory-design.md
+++ b/docs/explanation/purgatory-design.md
| @@ -8,7 +8,11 @@ | |||
| 8 | 8 | ||
| 9 | ## Overview | 9 | ## Overview |
| 10 | 10 | ||
| 11 | Purgatory is an in-memory holding area that solves the **"which arrives first?"** problem in GRASP. Either nostr events or git pushes can arrive in any order: | 11 | Purgatory is an in-memory holding area that solves two related problems in GRASP: |
| 12 | |||
| 13 | ### Problem 1: "Which arrives first?" (State and PR events) | ||
| 14 | |||
| 15 | Either nostr events or git pushes can arrive in any order: | ||
| 12 | 16 | ||
| 13 | - **Event first**: Event waits in purgatory until git data arrives | 17 | - **Event first**: Event waits in purgatory until git data arrives |
| 14 | - **Git first**: Placeholder waits in purgatory until event arrives | 18 | - **Git first**: Placeholder waits in purgatory until event arrives |
| @@ -19,6 +23,18 @@ When both halves arrive, they are processed together and saved to the database. | |||
| 19 | 23 | ||
| 20 | > Accepted repo state announcements, PRs and PR Updates SHOULD be accepted with message "purgatory: won't be served until git data arrives" and kept in purgatory (not served) until the related git data arrives and otherwise discarded after 30 minutes. | 24 | > Accepted repo state announcements, PRs and PR Updates SHOULD be accepted with message "purgatory: won't be served until git data arrives" and kept in purgatory (not served) until the related git data arrives and otherwise discarded after 30 minutes. |
| 21 | 25 | ||
| 26 | ### Problem 2: Misleading empty repository announcements | ||
| 27 | |||
| 28 | When a repository announcement arrives, we must create the bare git repo immediately so pushes can succeed. But if no git data ever arrives, we would serve an empty repo and its announcement indefinitely—clients see the announcement, try to clone, and get nothing. | ||
| 29 | |||
| 30 | **Solution**: New announcements go to **announcement purgatory** instead of being immediately accepted: | ||
| 31 | |||
| 32 | 1. **Announcement arrives** → Create bare repo immediately, add announcement to purgatory | ||
| 33 | 2. **Git data arrives** → Promote announcement from purgatory to active (now served to clients) | ||
| 34 | 3. **No git data before expiry** → Delete bare repo, discard announcement (never served) | ||
| 35 | |||
| 36 | This ensures we only serve announcements for repos that actually have content. | ||
| 37 | |||
| 22 | --- | 38 | --- |
| 23 | 39 | ||
| 24 | ## Key Design Principles | 40 | ## Key Design Principles |
| @@ -31,16 +47,15 @@ Purgatory data is **not persisted** to disk. On restart, all purgatory entries a | |||
| 31 | - Git data can be re-pushed | 47 | - Git data can be re-pushed |
| 32 | - 30-minute expiry means data is transient anyway | 48 | - 30-minute expiry means data is transient anyway |
| 33 | 49 | ||
| 34 | ### 2. Separate Storage for State vs PR Events | 50 | ### 2. Separate Storage for Each Event Type |
| 35 | |||
| 36 | State events (kind 30618) and PR events (kind 1617/1618) have fundamentally different matching patterns: | ||
| 37 | 51 | ||
| 38 | | Event Type | Index | Matching Strategy | | 52 | | Store | Index | Purpose | |
| 39 | |------------|-------|-------------------| | 53 | |-------|-------|---------| |
| 40 | | **State Events** | `identifier` (d tag) | Compare refs at push time | | 54 | | `announcement_purgatory` | `(PublicKey, String)` — `(owner, identifier)` | Announcements awaiting git data | |
| 41 | | **PR Events** | `event_id` (hex string) | Direct match via `refs/nostr/<event-id>` | | 55 | | `state_events` | `identifier` (d tag) | State events awaiting git data | |
| 56 | | `pr_events` | `event_id` (hex string) | PR events awaiting git data | | ||
| 42 | 57 | ||
| 43 | They use **separate DashMap stores** for efficient concurrent access. | 58 | Announcement purgatory uses `(pubkey, identifier)` because identifier alone is not unique across different owners. |
| 44 | 59 | ||
| 45 | ### 3. Late Binding for State Events | 60 | ### 3. Late Binding for State Events |
| 46 | 61 | ||
| @@ -78,7 +93,23 @@ With purgatory checking during authorization: | |||
| 78 | 2. Git push arrives → Checks **database + purgatory** → State found → **AUTHORIZED** ✅ | 93 | 2. Git push arrives → Checks **database + purgatory** → State found → **AUTHORIZED** ✅ |
| 79 | 3. After push succeeds → Save event to database → Remove from purgatory | 94 | 3. After push succeeds → Save event to database → Remove from purgatory |
| 80 | 95 | ||
| 81 | See [`src/git/authorization.rs:51-162`](../../src/git/authorization.rs) for implementation. | 96 | See [`src/git/authorization.rs`](../../src/git/authorization.rs) for implementation. |
| 97 | |||
| 98 | ### 6. Announcement Purgatory: Bare Repo Created Immediately | ||
| 99 | |||
| 100 | **Decision:** Create the bare git repo when announcement enters purgatory. | ||
| 101 | |||
| 102 | **Why:** Git pushes may arrive at any time. Without a repo, pushes fail. | ||
| 103 | |||
| 104 | **Consequence:** We allocate disk space for repos that may expire unused. Must delete repos on expiry. | ||
| 105 | |||
| 106 | ### 7. Replacement Announcements Skip Purgatory | ||
| 107 | |||
| 108 | **Decision:** Announcements replacing an existing active (database) announcement are accepted immediately. | ||
| 109 | |||
| 110 | **Why:** The repository is already proven active with content. | ||
| 111 | |||
| 112 | **How:** Check if active announcement exists for `(pubkey, identifier)` before routing to purgatory. | ||
| 82 | 113 | ||
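The routing decision in principles 6 and 7 can be sketched as below. This is a minimal illustration, not the code in `src/nostr/policy/announcement.rs`: `RepoKey`, `Route`, and `route_announcement` are hypothetical names, a `String` stands in for `PublicKey`, and plain `HashSet`s stand in for the database and the `DashMap` store. The middle case (an entry already in purgatory) follows the New Announcement Flow added elsewhere in this change.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the `(PublicKey, String)` key: (owner, identifier).
type RepoKey = (String, String);

/// Illustrative outcome of routing an incoming announcement.
#[derive(Debug, PartialEq)]
enum Route {
    /// An active (database) announcement exists: accept the replacement immediately.
    AcceptReplacement,
    /// A purgatory entry exists: replace it and extend its expiry.
    ReplaceInPurgatory,
    /// Nothing known yet: create the bare repo and hold the announcement.
    CreateRepoAndHold,
}

/// Decide where an announcement for `key` goes.
/// The active check comes first, so replacements never enter purgatory.
fn route_announcement(
    active: &HashSet<RepoKey>,
    purgatory: &HashSet<RepoKey>,
    key: &RepoKey,
) -> Route {
    if active.contains(key) {
        Route::AcceptReplacement
    } else if purgatory.contains(key) {
        Route::ReplaceInPurgatory
    } else {
        Route::CreateRepoAndHold
    }
}
```

Keying on the pair rather than the identifier alone means that two owners announcing the same identifier are routed independently.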
| 83 | --- | 114 | --- |
| 84 | 115 | ||
| @@ -103,22 +134,54 @@ pub struct RefUpdate { | |||
| 103 | } | 134 | } |
| 104 | ``` | 135 | ``` |
| 105 | 136 | ||
| 137 | ### Announcement Purgatory Entry | ||
| 138 | |||
| 139 | ```rust | ||
| 140 | pub struct AnnouncementPurgatoryEntry { | ||
| 141 | /// The kind 30617 announcement event | ||
| 142 | pub event: Event, | ||
| 143 | |||
| 144 | /// Repository identifier from 'd' tag | ||
| 145 | pub identifier: String, | ||
| 146 | |||
| 147 | /// Event author pubkey | ||
| 148 | pub owner: PublicKey, | ||
| 149 | |||
| 150 | /// Path to the bare git repo on disk (created immediately on entry) | ||
| 151 | pub repo_path: PathBuf, | ||
| 152 | |||
| 153 | /// Relay URLs from 'relays'/'clone' tags — for sync registration | ||
| 154 | pub relays: HashSet<String>, | ||
| 155 | |||
| 156 | /// When added to purgatory | ||
| 157 | pub created_at: Instant, | ||
| 158 | |||
| 159 | /// Expiry deadline (30 min from creation, may be extended) | ||
| 160 | pub expires_at: Instant, | ||
| 161 | |||
| 162 | /// Whether the bare repo has been deleted (soft expiry phase) | ||
| 163 | pub soft_expired: bool, | ||
| 164 | } | ||
| 165 | ``` | ||
| 166 | |||
| 167 | **Indexed by `(pubkey, identifier)`** because identifier is not unique across different owners. | ||
| 168 | |||
| 106 | ### State Purgatory Entry | 169 | ### State Purgatory Entry |
| 107 | 170 | ||
| 108 | ```rust | 171 | ```rust |
| 109 | pub struct StatePurgatoryEntry { | 172 | pub struct StatePurgatoryEntry { |
| 110 | /// The nostr state event (kind 30618) awaiting git data | 173 | /// The nostr state event (kind 30618) awaiting git data |
| 111 | pub event: Event, | 174 | pub event: Event, |
| 112 | 175 | ||
| 113 | /// Repository identifier from 'd' tag | 176 | /// Repository identifier from 'd' tag |
| 114 | pub identifier: String, | 177 | pub identifier: String, |
| 115 | 178 | ||
| 116 | /// Event author pubkey | 179 | /// Event author pubkey |
| 117 | pub author: PublicKey, | 180 | pub author: PublicKey, |
| 118 | 181 | ||
| 119 | /// When added to purgatory | 182 | /// When added to purgatory |
| 120 | pub created_at: Instant, | 183 | pub created_at: Instant, |
| 121 | 184 | ||
| 122 | /// Expiry deadline (30 min from creation, may be extended) | 185 | /// Expiry deadline (30 min from creation, may be extended) |
| 123 | pub expires_at: Instant, | 186 | pub expires_at: Instant, |
| 124 | } | 187 | } |
| @@ -132,14 +195,14 @@ pub struct StatePurgatoryEntry { | |||
| 132 | pub struct PrPurgatoryEntry { | 195 | pub struct PrPurgatoryEntry { |
| 133 | /// The nostr PR event, if received (None = git data arrived first) | 196 | /// The nostr PR event, if received (None = git data arrived first) |
| 134 | pub event: Option<Event>, | 197 | pub event: Option<Event>, |
| 135 | 198 | ||
| 136 | /// Expected commit SHA from 'c' tag (if event exists) | 199 | /// Expected commit SHA from 'c' tag (if event exists) |
| 137 | /// or actual commit pushed (if git arrived first) | 200 | /// or actual commit pushed (if git arrived first) |
| 138 | pub commit: String, | 201 | pub commit: String, |
| 139 | 202 | ||
| 140 | /// When added to purgatory | 203 | /// When added to purgatory |
| 141 | pub created_at: Instant, | 204 | pub created_at: Instant, |
| 142 | 205 | ||
| 143 | /// Expiry deadline (30 min from creation) | 206 | /// Expiry deadline (30 min from creation) |
| 144 | pub expires_at: Instant, | 207 | pub expires_at: Instant, |
| 145 | } | 208 | } |
| @@ -151,24 +214,155 @@ pub struct PrPurgatoryEntry { | |||
| 151 | 214 | ||
| 152 | ```rust | 215 | ```rust |
| 153 | pub struct Purgatory { | 216 | pub struct Purgatory { |
| 217 | /// Announcement events indexed by (owner, identifier) | ||
| 218 | announcement_purgatory: DashMap<(PublicKey, String), AnnouncementPurgatoryEntry>, | ||
| 219 | |||
| 154 | /// State events indexed by identifier (d tag) | 220 | /// State events indexed by identifier (d tag) |
| 155 | /// Multiple state events per identifier allowed (different authors) | 221 | /// Multiple state events per identifier allowed (different authors) |
| 156 | state_events: Arc<DashMap<String, Vec<StatePurgatoryEntry>>>, | 222 | state_events: DashMap<String, Vec<StatePurgatoryEntry>>, |
| 157 | 223 | ||
| 158 | /// PR events indexed by event_id (hex string) | 224 | /// PR events indexed by event_id (hex string) |
| 159 | /// Single entry per event ID | 225 | /// Single entry per event ID |
| 160 | pr_events: Arc<DashMap<String, PrPurgatoryEntry>>, | 226 | pr_events: DashMap<String, PrPurgatoryEntry>, |
| 161 | 227 | ||
| 162 | /// Sync queue for background git data fetching | 228 | /// Sync queue for background git data fetching |
| 163 | sync_queue: Arc<DashMap<String, SyncQueueEntry>>, | 229 | sync_queue: DashMap<String, SyncQueueEntry>, |
| 164 | 230 | ||
| 165 | _git_data_path: PathBuf, | 231 | /// Events that previously expired without git data (prevents re-sync loops) |
| 232 | expired_events: DashMap<EventId, Instant>, | ||
| 166 | } | 233 | } |
| 167 | ``` | 234 | ``` |
| 168 | 235 | ||
| 169 | --- | 236 | --- |
| 170 | 237 | ||
| 171 | ## Event Flows | 238 | ## Announcement Purgatory Flows |
| 239 | |||
| 240 | ### New Announcement Flow | ||
| 241 | |||
| 242 | ``` | ||
| 243 | Announcement arrives | ||
| 244 | | | ||
| 245 | v | ||
| 246 | Is there an active announcement for (pubkey, identifier) in DB? | ||
| 247 | | | ||
| 248 | +-- YES --> Accept immediately (replacement, repo already proven) | ||
| 249 | | | ||
| 250 | +-- NO --> Is there a purgatory entry for (pubkey, identifier)? | ||
| 251 | | | ||
| 252 | +-- YES --> Replace purgatory entry, extend expiry 30 min | ||
| 253 | | Return OK to client (but don't serve) | ||
| 254 | | | ||
| 255 | +-- NO --> Create bare repo | ||
| 256 | Add to purgatory | ||
| 257 | Return OK to client (but don't serve) | ||
| 258 | ``` | ||
| 259 | |||
| 260 | ### Git Data Arrival → Promotion | ||
| 261 | |||
| 262 | ``` | ||
| 263 | Git push/fetch completes with data | ||
| 264 | | | ||
| 265 | v | ||
| 266 | process_purgatory_announcements() called | ||
| 267 | | | ||
| 268 | v | ||
| 269 | Is there a purgatory announcement for (owner, identifier)? | ||
| 270 | | | ||
| 271 | +-- YES --> promote_announcement() removes from purgatory | ||
| 272 | | Save event to database | ||
| 273 | | Notify WebSocket clients | ||
| 274 | | (Sync upgrades to Full automatically via SelfSubscriber) | ||
| 275 | | | ||
| 276 | +-- NO --> Normal processing | ||
| 277 | ``` | ||
| 278 | |||
| 279 | ### State Event Arrival for Purgatory Announcement | ||
| 280 | |||
| 281 | ``` | ||
| 282 | State event arrives | ||
| 283 | | | ||
| 284 | v | ||
| 285 | fetch_repository_data_with_purgatory() checks DB + purgatory | ||
| 286 | | | ||
| 287 | +-- Announcement found in purgatory --> | ||
| 288 | | Validate authorization against purgatory announcement | ||
| 289 | | Extend purgatory announcement expiry (reset 30-min timer) | ||
| 290 | | If soft-expired: recreate bare repo, clear soft_expired flag | ||
| 291 | | Route state event to state purgatory | ||
| 292 | | | ||
| 293 | +-- No announcement anywhere --> Reject | ||
| 294 | ``` | ||
| 295 | |||
| 296 | ### Announcement Expiry (Two-Phase Soft Expiry) | ||
| 297 | |||
| 298 | The protocol specifies 30-minute expiry for announcements. We implement a two-phase soft expiry: | ||
| 299 | |||
| 300 | **Phase 1 — Initial 30-minute expiry (`soft_expired == false`):** | ||
| 301 | - Delete the bare git repo (frees disk space, respects protocol expiry) | ||
| 302 | - Set `soft_expired = true` | ||
| 303 | - Extend `expires_at` by 24 hours (`SOFT_EXPIRY_EXTENDED`) | ||
| 304 | - Continue syncing state events (same as active purgatory) | ||
| 305 | |||
| 306 | **Phase 2 — 24-hour soft expiry (`soft_expired == true`):** | ||
| 307 | - Add event ID to `expired_events` (prevents re-sync loops) | ||
| 308 | - Remove entry completely from `announcement_purgatory` | ||
| 309 | |||
| 310 | **Why soft expiry?** Without it, we'd face a dilemma: | ||
| 311 | |||
| 312 | - Add expired announcements to `failed_events` → permanently reject future state events, losing potential revival when state events arrive late | ||
| 313 | - Re-fetch the announcement event on every sync cycle → wasting bandwidth and creating unnecessary sync traffic | ||
| 314 | |||
| 315 | Soft expiry retains the event for 24 hours so that late-arriving state events (e.g. from a slow sync) can revive the announcement without forcing a full re-announcement flow. | ||
| 316 | |||
| 317 | **Revival:** If a state event arrives for a soft-expired announcement, `extend_announcement_expiry()` recreates the bare repo, clears `soft_expired`, and resets the 30-minute timer. | ||
| 318 | |||
| 319 | ### Expiry Extension Triggers | ||
| 320 | |||
| 321 | The 30-minute purgatory timer is reset (extended) in three scenarios: | ||
| 322 | |||
| 323 | | Trigger | Location | Why | | ||
| 324 | |---------|----------|-----| | ||
| 325 | | State event arrives | `StatePolicy::process_state_event()` | Repo is actively receiving metadata | | ||
| 326 | | Git push authorized against purgatory state | `get_state_authorization_for_specific_owner_repo()` | Repo is actively receiving git data | | ||
| 327 | | Replacement announcement arrives | `AnnouncementPolicy::validate()` | Announcement updated | | ||
| 328 | |||
| 329 | All three call `purgatory.extend_announcement_expiry(owner, identifier, 1800s)`. | ||
| 330 | |||
| 331 | ### Purgatory Lifecycle | ||
| 332 | |||
| 333 | ``` | ||
| 334 | ┌─────────────────────────────────────┐ | ||
| 335 | │ │ | ||
| 336 | v │ | ||
| 337 | Announcement ──> ACTIVE ──────────────────────────────────┤ | ||
| 338 | arrives (bare repo exists) │ | ||
| 339 | │ │ | ||
| 340 | ├── Git data ──> PROMOTED (exit) │ | ||
| 341 | │ │ | ||
| 342 | ├── Deletion ──> REMOVED (exit) │ | ||
| 343 | │ │ | ||
| 344 | v │ | ||
| 345 | SOFT_EXPIRED ──────────────────────────────┘ | ||
| 346 | (bare repo deleted, ^ | ||
| 347 | event retained) │ | ||
| 348 | │ │ | ||
| 349 | ├── State event arrives (revival) | ||
| 350 | │ | ||
| 351 | └── Extended expiry ──> REMOVED (exit) | ||
| 352 | ``` | ||
| 353 | |||
| 354 | | Exit | Trigger | Action | | ||
| 355 | |------|---------|--------| | ||
| 356 | | **Promotion** | Git data arrives | Move to database, sync upgrades to Full | | ||
| 357 | | **Soft expiry** | Initial 30-min timeout | Delete bare repo, retain event, continue sync | | ||
| 358 | | **Full expiry** | 24-hour soft expiry | Add to expired_events, remove from purgatory | | ||
| 359 | | **Deletion** | Kind 5 event | Delete bare repo, remove from purgatory | | ||
| 360 | | **Replacement** | Newer announcement (same pubkey, identifier) | Replace entry, extend expiry | | ||
| 361 | | **Service change** | Newer announcement removes our service | Remove from purgatory | | ||
| 362 | |||
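The two-phase expiry and revival described in this section can be modeled as a small state transition. The sketch below is an assumption-laden illustration, not the body of `cleanup()`: `Entry` keeps only the two expiry fields of `AnnouncementPurgatoryEntry`, `CleanupAction`, `cleanup_step`, and `revive` are hypothetical names, and the 24-hour constant is taken from the prose rather than from the source code.

```rust
use std::time::{Duration, Instant};

/// Assumed value: the text names `SOFT_EXPIRY_EXTENDED` and describes it as 24 hours.
const SOFT_EXPIRY_EXTENDED: Duration = Duration::from_secs(24 * 60 * 60);

/// Reduced stand-in for `AnnouncementPurgatoryEntry` (expiry fields only).
struct Entry {
    expires_at: Instant,
    soft_expired: bool,
}

/// Hypothetical result of one cleanup pass over a single entry.
#[derive(Debug, PartialEq)]
enum CleanupAction {
    /// Deadline not reached; nothing to do.
    Keep,
    /// Phase 1: caller deletes the bare repo, entry is retained.
    SoftExpire,
    /// Phase 2: caller records the event in `expired_events` and removes the entry.
    Remove,
}

/// Apply the two-phase rule to one entry at time `now`.
fn cleanup_step(entry: &mut Entry, now: Instant) -> CleanupAction {
    if now < entry.expires_at {
        return CleanupAction::Keep;
    }
    if entry.soft_expired {
        CleanupAction::Remove
    } else {
        entry.soft_expired = true;
        // "Extend `expires_at` by 24 hours" is read literally here (an assumption).
        entry.expires_at += SOFT_EXPIRY_EXTENDED;
        CleanupAction::SoftExpire
    }
}

/// Revival: clear the flag and restart the timer from `now`.
/// Recreating the bare repo is left to the caller in this sketch.
fn revive(entry: &mut Entry, now: Instant, duration: Duration) {
    entry.soft_expired = false;
    entry.expires_at = now + duration;
}
```

Returning an action instead of touching the filesystem keeps the timing rule testable on its own; the real implementation may well combine the two.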
| 363 | --- | ||
| 364 | |||
| 365 | ## State and PR Event Flows | ||
| 172 | 366 | ||
| 173 | ### State Event Arrival (Kind 30618) | 367 | ### State Event Arrival (Kind 30618) |
| 174 | 368 | ||
| @@ -377,11 +571,12 @@ Purgatory includes a background sync system that fetches git data from remote se | |||
| 377 | ▼ | 571 | ▼ |
| 378 | ┌─────────────────────────────────────────────────────┐ | 572 | ┌─────────────────────────────────────────────────────┐ |
| 379 | │ process_newly_available_git_data(repo, oids) │ | 573 | │ process_newly_available_git_data(repo, oids) │ |
| 380 | │ 1. Find satisfiable state events in purgatory │ | 574 | │ 1. Find satisfiable announcement in purgatory │ |
| 381 | │ 2. Find satisfiable PR events in purgatory │ | 575 | │ 2. Find satisfiable state events in purgatory │ |
| 382 | │ 3. Save events to database │ | 576 | │ 3. Find satisfiable PR events in purgatory │ |
| 383 | │ 4. Sync git data to other owner repos │ | 577 | │ 4. Save events to database │ |
| 384 | │ 5. Remove from purgatory │ | 578 | │ 5. Sync git data to other owner repos │ |
| 579 | │ 6. Remove from purgatory │ | ||
| 385 | └─────────────────────────────────────────────────────┘ | 580 | └─────────────────────────────────────────────────────┘ |
| 386 | ``` | 581 | ``` |
| 387 | 582 | ||
| @@ -402,8 +597,8 @@ pub struct SyncQueueEntry { | |||
| 402 | 597 | ||
| 403 | **Backoff strategy:** | 598 | **Backoff strategy:** |
| 404 | - First attempt: 20 seconds | 599 | - First attempt: 20 seconds |
| 405 | - Second attempt: 2 minutes | 600 | - Second attempt: 40 seconds |
| 406 | - Subsequent attempts: 2 minutes | 601 | - Subsequent attempts: capped at 2 minutes |
| 407 | 602 | ||
| 408 | ### Sync Delays | 603 | ### Sync Delays |
| 409 | 604 | ||
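The revised backoff values (20 seconds, 40 seconds, then capped at 2 minutes) are consistent with a doubling schedule. The sketch below assumes that doubling rule; the diff only states the first two delays and the cap, and `backoff_delay`, `BASE_DELAY`, and `MAX_DELAY` are hypothetical names rather than identifiers from `SyncQueueEntry`.

```rust
use std::time::Duration;

/// First retry delay, as stated in the backoff list.
const BASE_DELAY: Duration = Duration::from_secs(20);
/// Upper bound on any retry delay, as stated in the backoff list.
const MAX_DELAY: Duration = Duration::from_secs(120);

/// Delay before the retry with zero-based index `attempt`.
/// Doubling between attempts is an assumption that matches 20s then 40s.
fn backoff_delay(attempt: u32) -> Duration {
    // Saturating arithmetic keeps very large attempt counts from overflowing.
    let factor = 2u32.saturating_pow(attempt);
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}
```

Under this assumption the third attempt waits 80 seconds and every later attempt waits the full 2 minutes.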
| @@ -428,7 +623,7 @@ pub struct ThrottleManager { | |||
| 428 | ``` | 623 | ``` |
| 429 | 624 | ||
| 430 | **Rate limiting:** | 625 | **Rate limiting:** |
| 431 | - Default: 5 requests per domain per 30 seconds | 626 | - Default: 5 concurrent requests per domain, 30 requests per minute |
| 432 | - Tracks request timestamps in a sliding window | 627 | - Tracks request timestamps in a sliding window |
| 433 | - Queues identifiers when domain is throttled | 628 | - Queues identifiers when domain is throttled |
| 434 | - Processes queue when capacity frees up | 629 | - Processes queue when capacity frees up |
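A per-domain limiter that combines a concurrency cap with a sliding window of request timestamps could look roughly like this. It is a sketch under stated assumptions, not the `ThrottleManager` in `src/purgatory/sync/throttle.rs`: `DomainLimiter` and its methods are hypothetical, it covers a single domain, and queuing of throttled identifiers is omitted.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical limiter for one domain.
struct DomainLimiter {
    max_concurrent: usize,
    max_per_window: usize,
    window: Duration,
    in_flight: usize,
    /// Start times of recent requests, oldest first.
    recent: VecDeque<Instant>,
}

impl DomainLimiter {
    fn new(max_concurrent: usize, max_per_window: usize, window: Duration) -> Self {
        Self { max_concurrent, max_per_window, window, in_flight: 0, recent: VecDeque::new() }
    }

    /// Try to start a request at `now`; returns false when throttled.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Drop timestamps that have slid out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) >= self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.in_flight >= self.max_concurrent || self.recent.len() >= self.max_per_window {
            return false;
        }
        self.in_flight += 1;
        self.recent.push_back(now);
        true
    }

    /// Mark one in-flight request as finished.
    fn release(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```

With the defaults listed above this would be constructed as `DomainLimiter::new(5, 30, Duration::from_secs(60))`; a caller that gets `false` would queue the identifier and retry once capacity frees up.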
| @@ -439,7 +634,47 @@ See [`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs) for | |||
| 439 | 634 | ||
| 440 | ## Purgatory API | 635 | ## Purgatory API |
| 441 | 636 | ||
| 442 | ### Adding Entries | 637 | ### Announcement Purgatory |
| 638 | |||
| 639 | ```rust | ||
| 640 | impl Purgatory { | ||
| 641 | /// Add an announcement to purgatory (bare repo already created by caller) | ||
| 642 | pub fn add_announcement( | ||
| 643 | &self, | ||
| 644 | event: Event, | ||
| 645 | identifier: String, | ||
| 646 | owner: PublicKey, | ||
| 647 | repo_path: PathBuf, | ||
| 648 | relays: HashSet<String>, | ||
| 649 | ); | ||
| 650 | |||
| 651 | /// Promote announcement: remove from purgatory, return event for DB save | ||
| 652 | pub fn promote_announcement( | ||
| 653 | &self, | ||
| 654 | owner: &PublicKey, | ||
| 655 | identifier: &str, | ||
| 656 | ) -> Option<Event>; | ||
| 657 | |||
| 658 | /// Get announcements by identifier (for authorization checks) | ||
| 659 | pub fn get_announcements_by_identifier( | ||
| 660 | &self, | ||
| 661 | identifier: &str, | ||
| 662 | ) -> Vec<AnnouncementPurgatoryEntry>; | ||
| 663 | |||
| 664 | /// Extend expiry (and revive soft-expired entries, recreating bare repo) | ||
| 665 | pub fn extend_announcement_expiry( | ||
| 666 | &self, | ||
| 667 | owner: &PublicKey, | ||
| 668 | identifier: &str, | ||
| 669 | duration: Duration, | ||
| 670 | ); | ||
| 671 | |||
| 672 | /// Get all announcements for sync registration | ||
| 673 | pub fn announcements_for_sync(&self) -> Vec<AnnouncementPurgatoryEntry>; | ||
| 674 | } | ||
| 675 | ``` | ||
| 676 | |||
| 677 | ### State and PR Purgatory | ||
| 443 | 678 | ||
| 444 | ```rust | 679 | ```rust |
| 445 | impl Purgatory { | 680 | impl Purgatory { |
| @@ -453,13 +688,7 @@ impl Purgatory { | |||
| 453 | 688 | ||
| 454 | /// Add a PR placeholder (git-data-first scenario) | 689 | /// Add a PR placeholder (git-data-first scenario) |
| 455 | pub fn add_pr_placeholder(&self, event_id: String, commit: String); | 690 | pub fn add_pr_placeholder(&self, event_id: String, commit: String); |
| 456 | } | ||
| 457 | ``` | ||
| 458 | 691 | ||
| 459 | ### Finding Entries | ||
| 460 | |||
| 461 | ```rust | ||
| 462 | impl Purgatory { | ||
| 463 | /// Find state events waiting for an identifier | 692 | /// Find state events waiting for an identifier |
| 464 | pub fn find_state(&self, identifier: &str) -> Vec<StatePurgatoryEntry>; | 693 | pub fn find_state(&self, identifier: &str) -> Vec<StatePurgatoryEntry>; |
| 465 | 694 | ||
| @@ -476,13 +705,7 @@ impl Purgatory { | |||
| 476 | 705 | ||
| 477 | /// Find a PR placeholder specifically (git-data-first) | 706 | /// Find a PR placeholder specifically (git-data-first) |
| 478 | pub fn find_pr_placeholder(&self, event_id: &str) -> Option<String>; | 707 | pub fn find_pr_placeholder(&self, event_id: &str) -> Option<String>; |
| 479 | } | ||
| 480 | ``` | ||
| 481 | 708 | ||
| 482 | ### Removing Entries | ||
| 483 | |||
| 484 | ```rust | ||
| 485 | impl Purgatory { | ||
| 486 | /// Remove all state events for an identifier | 709 | /// Remove all state events for an identifier |
| 487 | pub fn remove_state(&self, identifier: &str); | 710 | pub fn remove_state(&self, identifier: &str); |
| 488 | 711 | ||
| @@ -499,36 +722,14 @@ impl Purgatory { | |||
| 499 | ```rust | 722 | ```rust |
| 500 | impl Purgatory { | 723 | impl Purgatory { |
| 501 | /// Remove expired entries (called every 60 seconds) | 724 | /// Remove expired entries (called every 60 seconds) |
| 502 | /// Returns (state_removed, pr_removed) | 725 | /// Handles two-phase soft expiry for announcements |
| 503 | pub fn cleanup(&self) -> (usize, usize); | 726 | pub fn cleanup(&self); |
| 504 | 727 | ||
| 505 | /// Extend expiry for entries about to be processed | 728 | /// Extend expiry for state/PR entries about to be processed |
| 506 | /// Ensures at least `duration` remaining | ||
| 507 | pub fn extend_expiry(&self, identifier: &str, event_ids: &[EventId], duration: Duration); | 729 | pub fn extend_expiry(&self, identifier: &str, event_ids: &[EventId], duration: Duration); |
| 508 | 730 | ||
| 509 | /// Get current counts for metrics | 731 | /// Check if an event previously expired (prevents re-sync loops) |
| 510 | pub fn count(&self) -> (usize, usize); | 732 | pub fn is_expired(&self, event_id: &EventId) -> bool; |
| 511 | } | ||
| 512 | ``` | ||
| 513 | |||
| 514 | ### Sync Queue Management | ||
| 515 | |||
| 516 | ```rust | ||
| 517 | impl Purgatory { | ||
| 518 | /// Enqueue identifier for sync with custom delay | ||
| 519 | pub fn enqueue_sync(&self, identifier: &str, delay: Duration); | ||
| 520 | |||
| 521 | /// Enqueue with default delay (3 minutes) | ||
| 522 | pub fn enqueue_sync_default(&self, identifier: &str); | ||
| 523 | |||
| 524 | /// Enqueue with immediate delay (500ms) | ||
| 525 | pub fn enqueue_sync_immediate(&self, identifier: &str); | ||
| 526 | |||
| 527 | /// Check if identifier has pending events | ||
| 528 | pub fn has_pending_events(&self, identifier: &str) -> bool; | ||
| 529 | |||
| 530 | /// Remove identifier from sync queue | ||
| 531 | pub fn remove_from_sync_queue(&self, identifier: &str); | ||
| 532 | } | 733 | } |
| 533 | ``` | 734 | ``` |
| 534 | 735 | ||
| @@ -558,12 +759,6 @@ pub fn can_apply_state( | |||
| 558 | event: &Event, | 759 | event: &Event, |
| 559 | repo_path: &Path, | 760 | repo_path: &Path, |
| 560 | ) -> Result<bool>; | 761 | ) -> Result<bool>; |
| 561 | |||
| 562 | /// Get refs from state that aren't being pushed | ||
| 563 | pub fn get_unpushed_refs( | ||
| 564 | state_refs: &[RefPair], | ||
| 565 | pushed_refs: &[RefPair], | ||
| 566 | ) -> Vec<RefPair>; | ||
| 567 | ``` | 762 | ``` |
| 568 | 763 | ||
| 569 | See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementation. | 764 | See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementation. |
| @@ -572,123 +767,37 @@ See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementat | |||
| 572 | 767 | ||
| 573 | ## Integration Points | 768 | ## Integration Points |
| 574 | 769 | ||
| 575 | ### 1. Event Policy (Nip34WritePolicy) | 770 | ### 1. Announcement Policy (`src/nostr/policy/announcement.rs`) |
| 576 | 771 | ||
| 577 | State and PR events are added to purgatory when git data doesn't exist: | 772 | Routes new announcements to purgatory or accepts replacements: |
| 578 | 773 | ||
| 579 | ```rust | 774 | - If active DB announcement exists for `(pubkey, identifier)` → `Accept` immediately |
| 580 | // From src/nostr/policy/state.rs | 775 | - If purgatory entry exists → replace it, extend expiry, return `Accept` |
| 581 | async fn handle_state(&self, event: &Event) -> WritePolicyResult { | 776 | - Otherwise → return `AcceptPurgatory`, caller calls `add_to_purgatory()` which creates bare repo and adds to purgatory |
| 582 | let identifier = extract_identifier(event)?; | ||
| 583 | |||
| 584 | // Check if we have matching git data | ||
| 585 | if self.has_matching_git_data(&identifier, event).await? { | ||
| 586 | return WritePolicyResult::Accept; | ||
| 587 | } | ||
| 588 | |||
| 589 | // Add to purgatory | ||
| 590 | self.purgatory.add_state( | ||
| 591 | event.clone(), | ||
| 592 | identifier.clone(), | ||
| 593 | event.pubkey, | ||
| 594 | ); | ||
| 595 | |||
| 596 | WritePolicyResult::Reject { | ||
| 597 | status: true, // Client sees OK | ||
| 598 | message: "purgatory: awaiting git data".into() | ||
| 599 | } | ||
| 600 | } | ||
| 601 | ``` | ||
| 602 | 777 | ||
| 603 | ### 2. Git Push Authorization | 778 | ### 2. State Event Policy (`src/nostr/policy/state.rs`) |
| 604 | 779 | ||
| 605 | Authorization checks both database and purgatory: | 780 | Checks purgatory announcements for authorization and extends their expiry: |
| 606 | 781 | ||
| 607 | ```rust | 782 | ```rust |
| 608 | // From src/git/authorization.rs | 783 | // Fetch announcements from both DB and purgatory |
| 609 | pub async fn authorize_push( | 784 | let repo_data = fetch_repository_data_with_purgatory(db, purgatory, identifier).await?; |
| 610 | database: &SharedDatabase, | 785 | |
| 611 | identifier: &str, | 786 | // For each authorized owner with a purgatory announcement, extend expiry |
| 612 | owner_pubkey: &str, | 787 | purgatory.extend_announcement_expiry(&owner_pk, &identifier, Duration::from_secs(1800)); |
| 613 | request_body: &Bytes, | ||
| 614 | purgatory: &Arc<Purgatory>, // Critical! | ||
| 615 | repo_path: &std::path::Path, | ||
| 616 | ) -> anyhow::Result<AuthorizationResult> { | ||
| 617 | // Parse pushed refs | ||
| 618 | let pushed_refs = parse_pushed_refs(request_body); | ||
| 619 | |||
| 620 | // Check database for state events | ||
| 621 | let db_result = get_authorization_from_db(database, identifier).await?; | ||
| 622 | |||
| 623 | if !db_result.authorized { | ||
| 624 | // No state in database - check purgatory | ||
| 625 | let purgatory_result = get_state_authorization_for_specific_owner_repo( | ||
| 626 | database, | ||
| 627 | identifier, | ||
| 628 | owner_pubkey, | ||
| 629 | purgatory, | ||
| 630 | &pushed_refs, | ||
| 631 | repo_path, | ||
| 632 | ).await?; | ||
| 633 | |||
| 634 | return purgatory_result; | ||
| 635 | } | ||
| 636 | |||
| 637 | db_result | ||
| 638 | } | ||
| 639 | ``` | 788 | ``` |
| 640 | 789 | ||
| 641 | ### 3. Post-Push Processing | 790 | ### 3. Git Push Authorization (`src/git/authorization.rs`) |
| 642 | 791 | ||
| 643 | After successful push, events from purgatory are saved to database: | 792 | `fetch_repository_data_with_purgatory()` merges DB announcements with purgatory announcements for authorization. On successful authorization via purgatory state events, also extends announcement expiry. |
| 644 | 793 | ||
| 645 | ```rust | 794 | ### 4. Git Data Processing (`src/git/sync.rs`) |
| 646 | // From src/git/handlers.rs | ||
| 647 | if from_purgatory { | ||
| 648 | if let (Some(db), Some(purg)) = (&database, &purgatory) { | ||
| 649 | // Save state event to database | ||
| 650 | db.save_event(&state.event).await?; | ||
| 651 | |||
| 652 | // Remove from purgatory | ||
| 653 | purg.remove_state_event(identifier, &state.event.id); | ||
| 654 | } | ||
| 655 | } | ||
| 656 | ``` | ||
| 657 | 795 | ||
| 658 | ### 4. Background Sync Loop | 796 | `process_purgatory_announcements()` is called after any git push or background sync fetch. It promotes announcements from purgatory to the database and notifies WebSocket clients. |
| 659 | 797 | ||
| 660 | Started during application initialization: | 798 | ### 5. Sync Registration (`src/sync/`) |
| 661 | 799 | ||
| 662 | ```rust | 800 | A background timer (`run_purgatory_announcement_sync`, every 5 seconds) ensures purgatory announcements are registered in `RepoSyncIndex` with `SyncLevel::StateOnly`. When an announcement is promoted, the `SelfSubscriber` upgrades it to `SyncLevel::Full`. |
| 663 | // From src/main.rs | ||
| 664 | let purgatory = Arc::new(Purgatory::new(git_data_path)); | ||
| 665 | let ctx = Arc::new(RealSyncContext::new( | ||
| 666 | database.clone(), | ||
| 667 | purgatory.clone(), | ||
| 668 | config.domain.clone(), | ||
| 669 | git_data_path.clone(), | ||
| 670 | )); | ||
| 671 | let throttle_manager = Arc::new(ThrottleManager::new(5, 30)); | ||
| 672 | throttle_manager.set_context(ctx.clone()); | ||
| 673 | |||
| 674 | // Start sync loop | ||
| 675 | let sync_handle = purgatory.clone().start_sync_loop(ctx, throttle_manager); | ||
| 676 | |||
| 677 | // Start cleanup task | ||
| 678 | let cleanup_handle = tokio::spawn(async move { | ||
| 679 | let mut interval = tokio::time::interval(Duration::from_secs(60)); | ||
| 680 | loop { | ||
| 681 | interval.tick().await; | ||
| 682 | let (state_removed, pr_removed) = purgatory.cleanup(); | ||
| 683 | if state_removed + pr_removed > 0 { | ||
| 684 | tracing::debug!( | ||
| 685 | "Purgatory cleanup removed {} state, {} PR entries", | ||
| 686 | state_removed, pr_removed | ||
| 687 | ); | ||
| 688 | } | ||
| 689 | } | ||
| 690 | }); | ||
| 691 | ``` | ||
| 692 | 801 | ||
| 693 | --- | 802 | --- |
| 694 | 803 | ||
| @@ -698,7 +807,7 @@ let cleanup_handle = tokio::spawn(async move { | |||
| 698 | src/ | 807 | src/ |
| 699 | ├── purgatory/ | 808 | ├── purgatory/ |
| 700 | │ ├── mod.rs # Main Purgatory struct and API | 809 | │ ├── mod.rs # Main Purgatory struct and API |
| 701 | │ ├── types.rs # RefPair, StatePurgatoryEntry, PrPurgatoryEntry | 810 | │ ├── types.rs # RefPair, AnnouncementPurgatoryEntry, StatePurgatoryEntry, PrPurgatoryEntry |
| 702 | │ ├── helpers.rs # Ref extraction and matching functions | 811 | │ ├── helpers.rs # Ref extraction and matching functions |
| 703 | │ └── sync/ | 812 | │ └── sync/ |
| 704 | │ ├── mod.rs # Sync module exports | 813 | │ ├── mod.rs # Sync module exports |
| @@ -710,9 +819,10 @@ src/ | |||
| 710 | ├── git/ | 819 | ├── git/ |
| 711 | │ ├── authorization.rs # authorize_push with purgatory checking | 820 | │ ├── authorization.rs # authorize_push with purgatory checking |
| 712 | │ ├── handlers.rs # handle_receive_pack with post-push processing | 821 | │ ├── handlers.rs # handle_receive_pack with post-push processing |
| 713 | │ └── sync.rs # process_newly_available_git_data | 822 | │ └── sync.rs # process_newly_available_git_data, process_purgatory_announcements |
| 714 | └── nostr/ | 823 | └── nostr/ |
| 715 | └── policy/ | 824 | └── policy/ |
| 825 | ├── announcement.rs # Route announcements to purgatory | ||
| 716 | ├── state.rs # State event policy with purgatory | 826 | ├── state.rs # State event policy with purgatory |
| 717 | └── pr_event.rs # PR event policy with purgatory | 827 | └── pr_event.rs # PR event policy with purgatory |
| 718 | ``` | 828 | ``` |
| @@ -725,7 +835,7 @@ src/ | |||
| 725 | 835 | ||
| 726 | Located in each module: | 836 | Located in each module: |
| 727 | 837 | ||
| 728 | - **[`src/purgatory/mod.rs`](../../src/purgatory/mod.rs)** - Core purgatory operations | 838 | - **[`src/purgatory/mod.rs`](../../src/purgatory/mod.rs)** - Core purgatory operations including announcement purgatory |
| 729 | - **[`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs)** - Ref matching logic | 839 | - **[`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs)** - Ref matching logic |
| 730 | - **[`src/purgatory/sync/functions.rs`](../../src/purgatory/sync/functions.rs)** - Sync functions with MockSyncContext | 840 | - **[`src/purgatory/sync/functions.rs`](../../src/purgatory/sync/functions.rs)** - Sync functions with MockSyncContext |
| 731 | - **[`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs)** - Throttle manager | 841 | - **[`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs)** - Throttle manager |
| @@ -734,6 +844,9 @@ Located in each module: | |||
| 734 | 844 | ||
| 735 | Located in [`tests/`](../../tests/): | 845 | Located in [`tests/`](../../tests/): |
| 736 | 846 | ||
| 847 | - **Announcement purgatory flow** - Announcement enters purgatory, git data promotes it | ||
| 848 | - **Announcement soft expiry** - Bare repo deleted after 30 min, event retained 24h | ||
| 849 | - **Announcement revival** - State event revives soft-expired announcement | ||
| 737 | - **State event purgatory flow** - Event arrives, git push releases it | 850 | - **State event purgatory flow** - Event arrives, git push releases it |
| 738 | - **PR event purgatory flow** - Event arrives, git push releases it | 851 | - **PR event purgatory flow** - Event arrives, git push releases it |
| 739 | - **Git-data-first flow** - Git push creates placeholder, event completes it | 852 | - **Git-data-first flow** - Git push creates placeholder, event completes it |
| @@ -744,7 +857,19 @@ Located in [`tests/`](../../tests/): | |||
| 744 | 857 | ||
| 745 | ## Key Learnings | 858 | ## Key Learnings |
| 746 | 859 | ||
| 747 | ### 1. Purgatory Authorization is Critical | 860 | ### 1. Announcement Purgatory Prevents Misleading Empty Repos |
| 861 | |||
| 862 | Without announcement purgatory, we'd serve announcements for repos with no content. Clients would see the announcement, try to clone, and get nothing. | ||
| 863 | |||
| 864 | **Solution:** Announcements wait in purgatory until git data proves content exists. | ||
| 865 | |||
| 866 | ### 2. Soft Expiry Avoids Sync Loops | ||
| 867 | |||
| 868 | The protocol's 30-minute expiry creates a problem: without soft expiry, we'd either permanently block repositories or constantly re-sync expired announcement events. | ||
| 869 | |||
| 870 | **Solution:** Soft expiry retains the event for 24 hours after deleting the bare repo, allowing revival without re-fetching. | ||
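
The two time windows can be sketched as a small stage function. This is an illustrative sketch only, not the project's code: `AnnouncementStage`, `stage_for_age`, and both constants are hypothetical names, and it assumes the 24-hour retention is measured from the moment the bare repo is deleted.

```rust
use std::time::Duration;

/// Hypothetical lifecycle stages for an announcement held in purgatory.
#[derive(Debug, PartialEq, Eq)]
enum AnnouncementStage {
    /// Bare repo exists and accepts pushes; the announcement is not served yet.
    Waiting,
    /// Bare repo has been deleted, but the event is retained so it can be revived.
    SoftExpired,
    /// The event itself has been dropped.
    Discarded,
}

/// Assumed from the text: bare repo is deleted after 30 minutes.
const SOFT_EXPIRY: Duration = Duration::from_secs(30 * 60);
/// Assumed from the text: event is kept for 24 hours after the bare repo is deleted.
const EVENT_RETENTION: Duration = Duration::from_secs(24 * 60 * 60);

/// Map the age of a purgatory announcement to its stage.
fn stage_for_age(age: Duration) -> AnnouncementStage {
    if age < SOFT_EXPIRY {
        AnnouncementStage::Waiting
    } else if age < SOFT_EXPIRY + EVENT_RETENTION {
        AnnouncementStage::SoftExpired
    } else {
        AnnouncementStage::Discarded
    }
}
```

In this sketch a state event arriving during `SoftExpired` could revive the announcement without fetching it again, which is the behaviour the soft-expiry window exists to allow.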
| 871 | |||
| 872 | ### 3. Purgatory Authorization is Critical | ||
| 748 | 873 | ||
| 749 | Without checking purgatory during authorization, we have a deadlock: | 874 | Without checking purgatory during authorization, we have a deadlock: |
| 750 | - State event goes to purgatory (no git data) | 875 | - State event goes to purgatory (no git data) |
| @@ -753,7 +878,7 @@ Without checking purgatory during authorization, we have a deadlock: | |||
| 753 | 878 | ||
| 754 | **Solution:** `authorize_push()` checks both database and purgatory. | 879 | **Solution:** `authorize_push()` checks both database and purgatory. |
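
A minimal sketch of that double lookup follows. It is not the real `authorize_push()`: `StateSources`, `authorize_push_sketch`, and the `(repo, ref, commit)` tuple are hypothetical stand-ins chosen to show why consulting purgatory breaks the deadlock.

```rust
use std::collections::HashSet;

/// Hypothetical key: (repo identifier, ref name, commit id) named by a state event.
type RefClaim = (String, String, String);

/// Hypothetical stand-in for the two places a state event can live.
struct StateSources {
    /// Claims from state events already saved to the database.
    database: HashSet<RefClaim>,
    /// Claims from state events still waiting in purgatory for git data.
    purgatory: HashSet<RefClaim>,
}

impl StateSources {
    /// A push is allowed if either source holds a matching claim.
    /// Checking only `database` would reject the very push that would
    /// release the event from purgatory.
    fn authorize_push_sketch(&self, repo: &str, ref_name: &str, commit: &str) -> bool {
        let key = (repo.to_string(), ref_name.to_string(), commit.to_string());
        self.database.contains(&key) || self.purgatory.contains(&key)
    }
}
```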
| 755 | 880 | ||
| 756 | ### 2. Late Binding for State Events | 881 | ### 4. Late Binding for State Events |
| 757 | 882 | ||
| 758 | Extracting refs at event arrival time doesn't work when: | 883 | Extracting refs at event arrival time doesn't work when: |
| 759 | - Multiple state events arrive for same identifier | 884 | - Multiple state events arrive for same identifier |
| @@ -761,7 +886,7 @@ Extracting refs at event arrival time doesn't work when: | |||
| 761 | 886 | ||
| 762 | **Solution:** Extract and match refs at push time via `find_matching_states()`. | 887 | **Solution:** Extract and match refs at push time via `find_matching_states()`. |
| 763 | 888 | ||
| 764 | ### 3. Bidirectional Waiting for PR Events | 889 | ### 5. Bidirectional Waiting for PR Events |
| 765 | 890 | ||
| 766 | PR events can arrive before or after git data: | 891 | PR events can arrive before or after git data: |
| 767 | - Event first → Wait for git push | 892 | - Event first → Wait for git push |
| @@ -769,26 +894,13 @@ PR events can arrive before or after git data: | |||
| 769 | 894 | ||
| 770 | **Solution:** `PrPurgatoryEntry.event: Option<Event>` with `None` = placeholder. | 895 | **Solution:** `PrPurgatoryEntry.event: Option<Event>` with `None` = placeholder. |
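
The placeholder idea can be sketched as follows. Only the `event: Option<Event>` field comes from the text above; `PrEntrySketch`, the simplified `Event`, the `git_data_arrived` flag, and the method names are assumptions made for illustration.

```rust
/// Simplified stand-in for a nostr event (hypothetical; the real type is richer).
#[derive(Debug, Clone, PartialEq)]
struct Event {
    id: String,
}

/// Hypothetical entry that can wait for either half to arrive.
#[derive(Debug)]
struct PrEntrySketch {
    /// `None` means this entry is a placeholder created by a git push.
    event: Option<Event>,
    /// Assumed flag recording whether the git data has been pushed.
    git_data_arrived: bool,
}

impl PrEntrySketch {
    /// Git first: create a placeholder with no event yet.
    fn placeholder_from_push() -> Self {
        Self { event: None, git_data_arrived: true }
    }

    /// Event first: hold the event until the push arrives.
    fn waiting_for_push(event: Event) -> Self {
        Self { event: Some(event), git_data_arrived: false }
    }

    /// Record the event; returns it if the entry is now complete.
    fn on_event(&mut self, event: Event) -> Option<Event> {
        self.event = Some(event);
        self.take_if_complete()
    }

    /// Record the push; returns the held event if the entry is now complete.
    fn on_push(&mut self) -> Option<Event> {
        self.git_data_arrived = true;
        self.take_if_complete()
    }

    /// Release the event only when both halves are present.
    fn take_if_complete(&mut self) -> Option<Event> {
        if self.git_data_arrived { self.event.take() } else { None }
    }
}
```

Whichever half arrives second gets the completed event back and can save it to the database, so both arrival orders end in the same place.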
| 771 | 896 | ||
| 772 | ### 4. Sync Queue Debouncing | ||
| 773 | |||
| 774 | When events arrive in bursts (e.g., negentropy sync), we don't want to spawn a sync task for each event. | ||
| 775 | |||
| 776 | **Solution:** `enqueue_sync()` resets `attempt_count` and updates `next_attempt` if already queued. | ||
| 777 | |||
| 778 | ### 5. Domain Throttling with Queues | ||
| 779 | |||
| 780 | When a domain is throttled, we still want to eventually sync from it. | ||
| 781 | |||
| 782 | **Solution:** `ThrottleManager` maintains per-domain queues and processes them when capacity frees. | ||
| 783 | |||
| 784 | --- | 897 | --- |
| 785 | 898 | ||
| 786 | ## Related Documentation | 899 | ## Related Documentation |
| 787 | 900 | ||
| 788 | - [Inline Authorization](inline-authorization.md) - Why purgatory checking during authorization is essential | ||
| 789 | - [Architecture Overview](architecture.md) - Full system design | 901 | - [Architecture Overview](architecture.md) - Full system design |
| 790 | - [Background Sync](../how-to/purgatory-sync.md) - How to configure and monitor sync | 902 | - [GRASP-02 Proactive Sync](grasp-02-proactive-sync.md) - Relay-to-relay event sync with SyncLevel |
| 791 | - [Test Strategy](../reference/test-strategy.md) - How we test purgatory | 903 | - [GRASP-02 Purgatory Git Data Fetching](grasp-02-proactive-sync-purgatory-git-data.md) - Background git data hunting |
| 792 | 904 | ||
| 793 | --- | 905 | --- |
| 794 | 906 | ||