| -rw-r--r-- | docs/explanation/announcements-purgatory-design.md | 254 | ||||
| -rw-r--r-- | docs/explanation/announcements-purgatory-implementation.md | 296 | ||||
| -rw-r--r-- | docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md | 67 | ||||
| -rw-r--r-- | docs/explanation/grasp-02-proactive-sync.md | 57 | ||||
| -rw-r--r-- | docs/explanation/purgatory-design.md | 520 |
5 files changed, 415 insertions, 779 deletions
diff --git a/docs/explanation/announcements-purgatory-design.md b/docs/explanation/announcements-purgatory-design.md deleted file mode 100644 index 009547b..0000000 --- a/docs/explanation/announcements-purgatory-design.md +++ /dev/null | |||
| @@ -1,254 +0,0 @@ | |||
| 1 | # Announcements Purgatory Design | ||
| 2 | |||
| 3 | ## Problem Statement | ||
| 4 | |||
| 5 | **Primary problem:** Serving announcement events alongside empty bare git repos misleads clients into thinking we host content. | ||
| 6 | |||
| 7 | When an announcement arrives, we must create the bare repo immediately (so git pushes can succeed). But if no git data ever arrives, we serve an empty repo and its announcement indefinitely. Clients see the announcement, try to clone, and get nothing. This is misleading. | ||
| 8 | |||
| 9 | **Secondary problem:** Sync downloads events for repos that may never have content. | ||
| 10 | |||
| 11 | Without purgatory, sync would fetch all L2/L3 events (patches, issues, etc.) for announcements that may never receive git data. This wastes bandwidth and creates orphaned events. | ||
| 12 | |||
| 13 | ## Solution Overview | ||
| 14 | |||
| 15 | New announcements go to **purgatory** instead of being immediately accepted: | ||
| 16 | |||
| 17 | 1. **Announcement arrives** - Create bare repo immediately, add announcement to purgatory | ||
| 18 | 2. **Git data arrives** - Promote announcement from purgatory to active (now served to clients) | ||
| 19 | 3. **No git data before expiry** - Delete bare repo, discard announcement (never served) | ||
| 20 | |||
| 21 | This ensures we only serve announcements for repos that actually have content. | ||
| 22 | |||
| 23 | ## Key Design Decisions | ||
| 24 | |||
| 25 | ### 1. Bare Repo Created Immediately | ||
| 26 | |||
| 27 | **Decision:** Create the bare git repo when announcement enters purgatory. | ||
| 28 | |||
| 29 | **Why:** Git pushes may arrive at any time. Without a repo, pushes fail. | ||
| 30 | |||
| 31 | **Consequence:** We allocate disk space for repos that may expire unused. Must delete repos on expiry. | ||
| 32 | |||
| 33 | ### 2. Git Data Triggers Promotion | ||
| 34 | |||
| 35 | **Decision:** Git data arrival promotes the announcement to active status. | ||
| 36 | |||
| 37 | **Why:** Git data proves the repository has content. State events alone don't prove content exists - they could reference empty repos. | ||
| 38 | |||
| 39 | **Where:** Promotion happens in the git receive path after successful push/fetch with data. | ||
| 40 | |||
| 41 | ### 3. Replacement Announcements Skip Purgatory | ||
| 42 | |||
| 43 | **Decision:** Announcements replacing an existing active announcement are accepted immediately. | ||
| 44 | |||
| 45 | **Why:** The repository is already proven active with content. | ||
| 46 | |||
| 47 | **How:** Check if active announcement exists for `(pubkey, identifier)` before routing to purgatory. | ||
| 48 | |||
| 49 | ### 4. Expiry Extension (Two Places) | ||
| 50 | |||
| 51 | **Decision:** Extend purgatory announcement expiry (reset the 30-minute protocol timer) in two scenarios: | ||
| 52 | |||
| 53 | | Trigger | Location | Why | | ||
| 54 | | ---------------------------- | ------------------------------------ | ----------------------------------- | | ||
| 55 | | State event arrives | `StatePolicy::process_state_event()` | Repo is actively receiving metadata | | ||
| 56 | | Git auth extends state event | `src/git/auth.rs` | Repo is actively receiving git data | | ||
| 57 | |||
| 58 | **Why:** Prevents premature expiry during slow sync operations or multi-step pushes. The protocol's 30-minute expiry is intended for abandoned repositories, not active ones receiving data. | ||
| 59 | |||
| 60 | ### 5. Authorization Must Check Purgatory Announcements | ||
| 61 | |||
| 62 | **Decision:** When validating state events or git operations, check purgatory announcements in addition to the database. | ||
| 63 | |||
| 64 | **Why:** State events and git pushes may arrive before git data promotes the announcement. They still need authorization from the announcement's maintainer set. | ||
| 65 | |||
| 66 | **Where:** `fetch_repository_data()` and related authorization functions must query both DB and purgatory. | ||
| 67 | |||
| 68 | ### 6. Sync Only State Events for Purgatory Announcements | ||
| 69 | |||
| 70 | **Decision:** Purgatory announcements trigger sync for state events only, not other L2/L3 events (patches, issues, PRs, etc.). | ||
| 71 | |||
| 72 | **Why:** Other L2/L3 events would be rejected anyway (no promoted announcement in DB). Syncing them wastes bandwidth and creates work for announcements that may never promote. | ||
| 73 | |||
| 74 | **How:** Sync uses a `SyncLevel` concept - `Full` for promoted repos, `StateOnly` for purgatory. On promotion, upgrade to `Full`. | ||
| 75 | |||
| 76 | ### 7. Soft Expiry Preserves Event Without Bare Repo | ||
| 77 | |||
| 78 | **Decision:** When a purgatory announcement expires (30 minutes per protocol spec), delete the bare repo but retain the announcement event for an extended period (e.g., 24h). | ||
| 79 | |||
| 80 | **Why the protocol specifies 30 minutes:** The grasp protocol defines a 30-minute expiry for announcement events to ensure clients don't indefinitely cache stale repository information. | ||
| 81 | |||
| 82 | **Why we implement soft expiry:** The protocol's 30-minute expiry creates a sync/storage problem. Without soft expiry, we'd either: | ||
| 83 | |||
| 84 | - Add expired announcements to `failed_events` and permanently reject future state events (losing potential revival when state events arrive late) | ||
| 85 | - Re-fetch the announcement event repeatedly on every sync cycle (wasting bandwidth and creating unnecessary sync traffic) | ||
| 86 | |||
| 87 | **Behavior during soft expiry:** | ||
| 88 | |||
| 89 | - Bare repo is deleted (saves disk space, respects protocol expiry) | ||
| 90 | - Announcement event retained in purgatory with `soft_expired` flag | ||
| 91 | - Sync continues requesting state events (same as active purgatory) | ||
| 92 | - If state event arrives: recreate bare repo, clear `soft_expired`, extend expiry | ||
| 93 | - If announcement republished directly to us: treat as fresh arrival | ||
| 94 | - After extended expiry: fully remove from purgatory | ||
| 95 | |||
| 96 | **In summary:** Soft expiry is an implementation optimization that prevents us from constantly re-syncing announcement events or permanently blocking repositories that receive delayed state events. | ||
| 97 | |||
| 98 | ## Data Structure | ||
| 99 | |||
| 100 | ```rust | ||
| 101 | // Key: (owner pubkey, identifier) - identifier alone is NOT unique | ||
| 102 | announcement_purgatory: Arc<DashMap<(PublicKey, String), AnnouncementPurgatoryEntry>> | ||
| 103 | |||
| 104 | pub struct AnnouncementPurgatoryEntry { | ||
| 105 | pub event: Event, | ||
| 106 | pub identifier: String, | ||
| 107 | pub owner: PublicKey, | ||
| 108 | pub repo_path: PathBuf, | ||
| 109 | pub relays: HashSet<String>, // For sync registration | ||
| 110 | pub created_at: Instant, | ||
| 111 | pub expires_at: Instant, | ||
| 112 | pub soft_expired: bool, // Bare repo deleted, event retained | ||
| 113 | } | ||
| 114 | ``` | ||
| 115 | |||
| 116 | **Indexed by `(pubkey, identifier)`** because identifier is not unique across different owners. Lookups are primarily from nostr events which have pubkey and identifier readily available. | ||
| 117 | |||
| 118 | ## Flows | ||
| 119 | |||
| 120 | ### New Announcement Flow | ||
| 121 | |||
| 122 | ``` | ||
| 123 | Announcement arrives | ||
| 124 | | | ||
| 125 | v | ||
| 126 | Is there an active announcement for (pubkey, identifier)? | ||
| 127 | | | ||
| 128 | +-- YES --> Accept immediately (replacement) | ||
| 129 | | | ||
| 130 | +-- NO --> Create bare repo | ||
| 131 | Add to purgatory | ||
| 132 | Return OK to client (but don't serve) | ||
| 133 | ``` | ||
| 134 | |||
| 135 | ### Git Data Arrival Flow | ||
| 136 | |||
| 137 | ``` | ||
| 138 | Git push/fetch completes with data | ||
| 139 | | | ||
| 140 | v | ||
| 141 | Is there a purgatory announcement for (pubkey, identifier)? | ||
| 142 | | | ||
| 143 | +-- YES --> Promote to active (move to database) | ||
| 144 | | Now served to clients | ||
| 145 | | | ||
| 146 | +-- NO --> Normal processing | ||
| 147 | ``` | ||
| 148 | |||
| 149 | ### State Event Arrival Flow | ||
| 150 | |||
| 151 | ``` | ||
| 152 | State event arrives | ||
| 153 | | | ||
| 154 | v | ||
| 155 | Is there an active announcement? | ||
| 156 | | | ||
| 157 | +-- YES --> Normal validation | ||
| 158 | | | ||
| 159 | +-- NO --> Check purgatory for announcement | ||
| 160 | | | ||
| 161 | +-- Found --> Validate against purgatory announcement | ||
| 162 | | Extend purgatory expiry | ||
| 163 | | State event goes to state purgatory | ||
| 164 | | | ||
| 165 | +-- Not found --> Reject or state purgatory | ||
| 166 | ``` | ||
| 167 | |||
| 168 | ## Edge Cases | ||
| 169 | |||
| 170 | | Scenario | Behavior | | ||
| 171 | | ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------- | | ||
| 172 | | Git data before announcement | Push fails (no repo exists) | | ||
| 173 | | Announcement expires, no git data | Delete bare repo, set `soft_expired` flag, retain event for extended period | | ||
| 174 | | Soft-expired announcement fully expires | Remove from purgatory entirely | | ||
| 175 | | State event arrives for soft-expired announcement | Recreate bare repo, clear `soft_expired`, extend expiry | | ||
| 176 | | State expires, announcement in purgatory | Announcement keeps its own expiry | | ||
| 177 | | Multiple owners, same identifier | Each tracked separately by `(pubkey, identifier)` | | ||
| 178 | | **Newer announcement replaces older (same pubkey)** | Replace purgatory entry, extend expiry, and state event expiry | | ||
| 179 | | **Newer announcement changes services (unacceptable)** | Clear older announcement from purgatory, delete bare repo, remove state events from purgatory if exists | | ||
| 180 | | Deletion event for purgatory announcement | Remove from purgatory, delete bare repo | | ||
| 181 | |||
| 182 | ## Purgatory Lifecycle | ||
| 183 | |||
| 184 | An announcement progresses through purgatory states: | ||
| 185 | |||
| 186 | ``` | ||
| 187 | ┌─────────────────────────────────────┐ | ||
| 188 | │ │ | ||
| 189 | v │ | ||
| 190 | Announcement ──> ACTIVE ──────────────────────────────────┤ | ||
| 191 | arrives (bare repo exists) │ | ||
| 192 | │ │ | ||
| 193 | ├── Git data ──> PROMOTED (exit) │ | ||
| 194 | │ │ | ||
| 195 | ├── Deletion ──> REMOVED (exit) │ | ||
| 196 | │ │ | ||
| 197 | v │ | ||
| 198 | SOFT_EXPIRED ──────────────────────────────┘ | ||
| 199 | (bare repo deleted, ^ | ||
| 200 | event retained) │ | ||
| 201 | │ │ | ||
| 202 | ├── State event arrives (revival) | ||
| 203 | │ | ||
| 204 | └── Extended expiry ──> REMOVED (exit) | ||
| 205 | ``` | ||
| 206 | |||
| 207 | | Exit | Trigger | Action | | ||
| 208 | | ------------------ | -------------------------------------------- | --------------------------------------------- | | ||
| 209 | | **Promotion** | Git data arrives | Move to database, upgrade sync to Full | | ||
| 210 | | **Soft expiry** | Initial timeout | Delete bare repo, retain event, continue sync | | ||
| 211 | | **Full expiry** | Extended timeout (soft-expired) | Remove from purgatory entirely | | ||
| 212 | | **Deletion** | Kind 5 event | Delete bare repo, remove from purgatory | | ||
| 213 | | **Replacement** | Newer announcement (same pubkey, identifier) | Replace entry | | ||
| 214 | | **Service change** | Newer announcement removes our service | Remove from purgatory | | ||
| 215 | |||
| 216 | ## Integration Points | ||
| 217 | |||
| 218 | | File | Change | | ||
| 219 | | ---------------------------------- | ---------------------------------------------------------- | | ||
| 220 | | `src/purgatory/mod.rs` | Add `announcement_purgatory` store | | ||
| 221 | | `src/purgatory/types.rs` | Add `AnnouncementPurgatoryEntry` | | ||
| 222 | | `src/nostr/policy/announcement.rs` | Route new announcements to purgatory | | ||
| 223 | | `src/git/receive.rs` | Promote on git data arrival | | ||
| 224 | | `src/git/auth.rs` | Extend purgatory expiry when extending state event expiry | | ||
| 225 | | `src/git/authorization.rs` | Check purgatory announcements for maintainer authorization | | ||
| 226 | | `src/nostr/policy/state.rs` | Check purgatory for authorization | | ||
| 227 | | `src/sync/mod.rs` | Add `SyncLevel` to `RepoSyncNeeds` | | ||
| 228 | | `src/sync/filters.rs` | Respect sync level when building filters | | ||
| 229 | | `src/sync/self_subscriber.rs` | Register purgatory announcements with `StateOnly` level | | ||
| 230 | |||
| 231 | See [announcements-purgatory-implementation.md](./announcements-purgatory-implementation.md) for detailed implementation notes. | ||
| 232 | |||
| 233 | ## Testing | ||
| 234 | |||
| 235 | - Announcement to purgatory, git data promotes it | ||
| 236 | - Announcement soft-expires without git data (repo deleted, event retained) | ||
| 237 | - State event revives soft-expired announcement (repo recreated) | ||
| 238 | - Soft-expired announcement fully expires after extended period | ||
| 239 | - State event extends purgatory expiry | ||
| 240 | - Git auth extends purgatory expiry | ||
| 241 | - Newer announcement replaces older in purgatory | ||
| 242 | - Service change clears purgatory entry | ||
| 243 | - `(pubkey, identifier)` indexing with multiple owners | ||
| 244 | - Sync requests only state events for purgatory announcements | ||
| 245 | - Sync upgrades to full on promotion | ||
| 246 | |||
| 247 | ## Risks | ||
| 248 | |||
| 249 | | Risk | Mitigation | | ||
| 250 | | ------------------------------------ | ------------------------------------------------------ | | ||
| 251 | | Disk exhaustion from purgatory repos | Short expiry, soft expiry deletes repo early | | ||
| 252 | | Race between promotion and expiry | Atomic operations | | ||
| 253 | | Sync re-fetching expired events | Soft expiry retains event; no need for `failed_events` | | ||
| 254 | | Filter explosion from many purgatory | Existing consolidation handles this (threshold at 70) | | ||
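The purgatory lifecycle described in the deleted design doc above (active, soft-expired, revived, removed, promoted) can be sketched as a small state machine. This is a minimal illustration, not the project's code: it uses a plain `HashMap` with string keys instead of `DashMap` and `PublicKey`, and the `Transition` enum, `tick`, `on_state_event`, and `promote` names are hypothetical. The 30-minute value comes from the doc; 24 hours is the doc's "e.g." value.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// 30 minutes is the protocol expiry named in the doc; 24h is the doc's example value.
pub const ACTIVE_EXPIRY: Duration = Duration::from_secs(30 * 60);
pub const SOFT_EXPIRY: Duration = Duration::from_secs(24 * 60 * 60);

/// (owner pubkey, identifier); plain strings stand in for `PublicKey` here.
pub type Key = (String, String);

/// Trimmed, hypothetical stand-in for `AnnouncementPurgatoryEntry`.
#[derive(Debug, Clone)]
pub struct Entry {
    pub expires_at: Instant,
    pub soft_expired: bool,
}

/// What happened to an entry, so the caller knows what to do on disk.
#[derive(Debug, PartialEq, Eq)]
pub enum Transition {
    Unchanged,   // nothing to do
    SoftExpired, // caller deletes the bare repo, event is retained
    Revived,     // caller recreates the bare repo
    Removed,     // entry is gone from purgatory
}

#[derive(Default)]
pub struct Purgatory {
    entries: HashMap<Key, Entry>,
}

impl Purgatory {
    /// New announcement: starts active with the 30-minute timer.
    pub fn add(&mut self, key: Key, now: Instant) {
        let entry = Entry { expires_at: now + ACTIVE_EXPIRY, soft_expired: false };
        self.entries.insert(key, entry);
    }

    /// State event: always resets the timer, and revives a soft-expired entry.
    pub fn on_state_event(&mut self, key: &Key, now: Instant) -> Option<Transition> {
        let entry = self.entries.get_mut(key)?;
        let was_soft_expired = entry.soft_expired;
        entry.soft_expired = false;
        entry.expires_at = now + ACTIVE_EXPIRY;
        Some(if was_soft_expired { Transition::Revived } else { Transition::Unchanged })
    }

    /// Git data arrived: the entry leaves purgatory (the caller stores it in the DB).
    pub fn promote(&mut self, key: &Key) -> bool {
        self.entries.remove(key).is_some()
    }

    /// Expiry check for one entry: first expiry is soft, second removes it.
    pub fn tick(&mut self, key: &Key, now: Instant) -> Transition {
        let (expired, soft_expired) = match self.entries.get(key) {
            Some(entry) => (entry.expires_at <= now, entry.soft_expired),
            None => return Transition::Unchanged,
        };
        if !expired {
            return Transition::Unchanged;
        }
        if soft_expired {
            self.entries.remove(key);
            return Transition::Removed;
        }
        if let Some(entry) = self.entries.get_mut(key) {
            entry.soft_expired = true;
            entry.expires_at = now + SOFT_EXPIRY;
        }
        Transition::SoftExpired
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside each method, keeps the transitions deterministic under test, which fits the "soft-expires", "revives", and "fully expires" cases listed in the Testing section.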
diff --git a/docs/explanation/announcements-purgatory-implementation.md b/docs/explanation/announcements-purgatory-implementation.md deleted file mode 100644 index 263c253..0000000 --- a/docs/explanation/announcements-purgatory-implementation.md +++ /dev/null | |||
| @@ -1,296 +0,0 @@ | |||
| 1 | # Announcements Purgatory Implementation Details | ||
| 2 | |||
| 3 | This document provides detailed implementation notes for the [Announcements Purgatory Design](./announcements-purgatory-design.md). | ||
| 4 | |||
| 5 | ## Sync Integration | ||
| 6 | |||
| 7 | ### Current Sync Architecture | ||
| 8 | |||
| 9 | The sync system uses a two-index approach: | ||
| 10 | |||
| 11 | ```rust | ||
| 12 | // What we WANT to sync - source of truth from self-subscription | ||
| 13 | // Key: repo addressable ref (30617:pubkey:identifier) | ||
| 14 | pub type RepoSyncIndex = Arc<RwLock<HashMap<String, RepoSyncNeeds>>>; | ||
| 15 | |||
| 16 | pub struct RepoSyncNeeds { | ||
| 17 | pub relays: HashSet<String>, // Relay URLs from announcement | ||
| 18 | pub root_events: HashSet<EventId>, // 1617/1618/1621 event IDs | ||
| 19 | } | ||
| 20 | |||
| 21 | // What we have CONFIRMED syncing + connection state | ||
| 22 | // Key: relay URL | ||
| 23 | pub type RelaySyncIndex = Arc<RwLock<HashMap<String, RelayState>>>; | ||
| 24 | ``` | ||
| 25 | |||
| 26 | **Three-Layer Sync Strategy:** | ||
| 27 | 1. **Layer 1:** Announcements (kinds 30617, 10317) | ||
| 28 | 2. **Layer 2:** Repo-tagging events (events with `a`/`A`/`q` tags + kind 30618 by identifier) | ||
| 29 | 3. **Layer 3:** Root-event-tagging events (events with `e`/`E`/`q` tags) | ||
| 30 | |||
| 31 | ### Adding SyncLevel | ||
| 32 | |||
| 33 | Add a `sync_level` field to distinguish purgatory from promoted repos: | ||
| 34 | |||
| 35 | ```rust | ||
| 36 | #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] | ||
| 37 | pub enum SyncLevel { | ||
| 38 | #[default] | ||
| 39 | Full, // L2 + L3 (promoted repos) | ||
| 40 | StateOnly, // Only state events (purgatory announcements) | ||
| 41 | } | ||
| 42 | |||
| 43 | pub struct RepoSyncNeeds { | ||
| 44 | pub relays: HashSet<String>, | ||
| 45 | pub root_events: HashSet<EventId>, | ||
| 46 | pub sync_level: SyncLevel, // NEW | ||
| 47 | } | ||
| 48 | ``` | ||
| 49 | |||
| 50 | ### Filter Building Changes | ||
| 51 | |||
| 52 | In `src/sync/filters.rs`, modify filter building to respect sync level: | ||
| 53 | |||
| 54 | ```rust | ||
| 55 | // For StateOnly repos, only build state event filters | ||
| 56 | pub fn build_layer2_and_layer3_filters( | ||
| 57 | repos: &HashMap<String, RepoSyncNeeds>, | ||
| 58 | // ... | ||
| 59 | ) -> Vec<Filter> { | ||
| 60 | let (full_repos, state_only_repos): (Vec<_>, Vec<_>) = repos | ||
| 61 | .iter() | ||
| 62 | .partition(|(_, needs)| needs.sync_level == SyncLevel::Full); | ||
| 63 | |||
| 64 | let mut filters = Vec::new(); | ||
| 65 | |||
| 66 | // Full repos get all L2/L3 filters | ||
| 67 | if !full_repos.is_empty() { | ||
| 68 | filters.extend(tagged_one_of_our_repo_event_filters(&full_repos)); | ||
| 69 | filters.extend(state_event_filters_for_our_repos(&full_repos)); | ||
| 70 | filters.extend(tagged_one_of_our_root_event_filters(&full_repos)); | ||
| 71 | } | ||
| 72 | |||
| 73 | // StateOnly repos get only state event filters | ||
| 74 | if !state_only_repos.is_empty() { | ||
| 75 | filters.extend(state_event_filters_for_our_repos(&state_only_repos)); | ||
| 76 | } | ||
| 77 | |||
| 78 | filters | ||
| 79 | } | ||
| 80 | ``` | ||
| 81 | |||
| 82 | The existing `state_event_filters_for_our_repos()` function already builds kind 30618 filters with `#d` tags, which is exactly what we need. | ||
| 83 | |||
| 84 | ### Self-Subscriber Changes | ||
| 85 | |||
| 86 | In `src/sync/self_subscriber.rs`, add purgatory announcements to the sync index: | ||
| 87 | |||
| 88 | ```rust | ||
| 89 | // When announcement enters purgatory | ||
| 90 | fn on_announcement_to_purgatory( | ||
| 91 | &self, | ||
| 92 | event: &Event, | ||
| 93 | identifier: &str, | ||
| 94 | relays: HashSet<String>, | ||
| 95 | ) { | ||
| 96 | let key = format!("30617:{}:{}", event.pubkey, identifier); | ||
| 97 | let mut index = self.repo_sync_index.write().unwrap(); | ||
| 98 | index.insert(key, RepoSyncNeeds { | ||
| 99 | relays, | ||
| 100 | root_events: HashSet::new(), | ||
| 101 | sync_level: SyncLevel::StateOnly, | ||
| 102 | }); | ||
| 103 | } | ||
| 104 | |||
| 105 | // When announcement promotes to database | ||
| 106 | fn on_announcement_promoted( | ||
| 107 | &self, | ||
| 108 | event: &Event, | ||
| 109 | identifier: &str, | ||
| 110 | ) { | ||
| 111 | let key = format!("30617:{}:{}", event.pubkey, identifier); | ||
| 112 | let mut index = self.repo_sync_index.write().unwrap(); | ||
| 113 | if let Some(needs) = index.get_mut(&key) { | ||
| 114 | needs.sync_level = SyncLevel::Full; | ||
| 115 | } | ||
| 116 | } | ||
| 117 | ``` | ||
| 118 | |||
| 119 | ### Algorithm Changes | ||
| 120 | |||
| 121 | In `src/sync/algorithms.rs`, preserve sync level when inverting repo->relay: | ||
| 122 | |||
| 123 | ```rust | ||
| 124 | pub fn derive_relay_targets( | ||
| 125 | repo_index: &RepoSyncIndex, | ||
| 126 | ) -> HashMap<String, RelaySyncNeeds> { | ||
| 127 | // ... existing inversion logic ... | ||
| 128 | // Ensure sync_level is preserved/aggregated per relay | ||
| 129 | // A relay gets Full if ANY of its repos are Full | ||
| 130 | } | ||
| 131 | ``` | ||
| 132 | |||
| 133 | ## Authorization Integration | ||
| 134 | |||
| 135 | ### Current Authorization Flow | ||
| 136 | |||
| 137 | Authorization lookups happen in `src/git/authorization.rs`: | ||
| 138 | |||
| 139 | | Function | Purpose | Currently Queries | | ||
| 140 | |----------|---------|-------------------| | ||
| 141 | | `fetch_repository_data()` | Get announcements + states by identifier | DB only | | ||
| 142 | | `collect_authorized_maintainers()` | Build maintainer set from announcements | DB only | | ||
| 143 | | `pubkey_authorised_for_repo_owners()` | Check if pubkey authorized | DB only | | ||
| 144 | |||
| 145 | ### Required Changes | ||
| 146 | |||
| 147 | Modify `fetch_repository_data()` to also query purgatory: | ||
| 148 | |||
| 149 | ```rust | ||
| 150 | pub async fn fetch_repository_data( | ||
| 151 | db: &Database, | ||
| 152 | purgatory: &Purgatory, // NEW parameter | ||
| 153 | identifier: &str, | ||
| 154 | ) -> Result<RepositoryData> { | ||
| 155 | // Existing DB query | ||
| 156 | let db_events = db.query(/* kind 30617, 30618 by identifier */).await?; | ||
| 157 | |||
| 158 | // NEW: Also check purgatory for announcements | ||
| 159 | let purgatory_announcements = purgatory | ||
| 160 | .get_announcements_by_identifier(identifier); | ||
| 161 | |||
| 162 | // Merge results | ||
| 163 | let mut announcements = parse_announcements(db_events); | ||
| 164 | announcements.extend(purgatory_announcements); | ||
| 165 | |||
| 166 | // ... rest of function | ||
| 167 | } | ||
| 168 | ``` | ||
| 169 | |||
| 170 | This affects: | ||
| 171 | - `StatePolicy::process_state_event()` - state event validation | ||
| 172 | - `get_state_authorization_for_specific_owner_repo()` - git push authorization | ||
| 173 | - `AnnouncementPolicy::is_maintainer_in_any_announcement()` - maintainer exception | ||
| 174 | |||
| 175 | ## Purgatory Store Changes | ||
| 176 | |||
| 177 | ### New Fields | ||
| 178 | |||
| 179 | ```rust | ||
| 180 | pub struct AnnouncementPurgatoryEntry { | ||
| 181 | pub event: Event, | ||
| 182 | pub identifier: String, | ||
| 183 | pub owner: PublicKey, | ||
| 184 | pub repo_path: PathBuf, | ||
| 185 | pub relays: HashSet<String>, // For sync registration | ||
| 186 | pub created_at: Instant, | ||
| 187 | pub expires_at: Instant, | ||
| 188 | pub soft_expired: bool, // Bare repo deleted, event retained | ||
| 189 | } | ||
| 190 | ``` | ||
| 191 | |||
| 192 | ### New Methods | ||
| 193 | |||
| 194 | ```rust | ||
| 195 | impl Purgatory { | ||
| 196 | /// Get announcements by identifier (for authorization) | ||
| 197 | pub fn get_announcements_by_identifier( | ||
| 198 | &self, | ||
| 199 | identifier: &str, | ||
| 200 | ) -> Vec<&AnnouncementPurgatoryEntry> { | ||
| 201 | self.announcement_purgatory | ||
| 202 | .iter() | ||
| 203 | .filter(|entry| entry.identifier == identifier) | ||
| 204 | .collect() | ||
| 205 | } | ||
| 206 | |||
| 207 | /// Transition to soft-expired state (protocol's 30min expiry reached) | ||
| 208 | pub fn soft_expire_announcement( | ||
| 209 | &self, | ||
| 210 | key: &(PublicKey, String), | ||
| 211 | ) -> Option<PathBuf> { | ||
| 212 | if let Some(mut entry) = self.announcement_purgatory.get_mut(key) { | ||
| 213 | entry.soft_expired = true; | ||
| 214 | entry.expires_at = Instant::now() + SOFT_EXPIRY_DURATION; // e.g., 24h extended retention | ||
| 215 | Some(entry.repo_path.clone()) // Return path for bare repo deletion | ||
| 216 | } else { | ||
| 217 | None | ||
| 218 | } | ||
| 219 | } | ||
| 220 | |||
| 221 | /// Revive soft-expired announcement when state event arrives | ||
| 222 | /// (caller must recreate bare repo) | ||
| 223 | pub fn revive_announcement( | ||
| 224 | &self, | ||
| 225 | key: &(PublicKey, String), | ||
| 226 | ) -> Option<PathBuf> { | ||
| 227 | if let Some(mut entry) = self.announcement_purgatory.get_mut(key) { | ||
| 228 | if entry.soft_expired { | ||
| 229 | entry.soft_expired = false; | ||
| 230 | entry.expires_at = Instant::now() + ACTIVE_EXPIRY_DURATION; // Reset 30min protocol timer | ||
| 231 | return Some(entry.repo_path.clone()); // Caller recreates bare repo | ||
| 232 | } | ||
| 233 | } | ||
| 234 | None | ||
| 235 | } | ||
| 236 | } | ||
| 237 | ``` | ||
| 238 | |||
| 239 | ## Expiry Cleanup Task | ||
| 240 | |||
| 241 | The existing cleanup task needs to handle the two-phase expiry: | ||
| 242 | |||
| 243 | ```rust | ||
| 244 | async fn cleanup_expired_announcements(&self) { | ||
| 245 | let now = Instant::now(); | ||
| 246 | |||
| 247 | for entry in self.announcement_purgatory.iter() { | ||
| 248 | if entry.expires_at <= now { | ||
| 249 | let key = (entry.owner.clone(), entry.identifier.clone()); | ||
| 250 | |||
| 251 | if entry.soft_expired { | ||
| 252 | // Fully expired - remove entirely | ||
| 253 | self.announcement_purgatory.remove(&key); | ||
| 254 | self.unregister_from_sync(&key); | ||
| 255 | } else { | ||
| 256 | // First expiry - transition to soft-expired | ||
| 257 | if let Some(repo_path) = self.soft_expire_announcement(&key) { | ||
| 258 | delete_bare_repo(&repo_path).await; | ||
| 259 | } | ||
| 260 | // Note: stays in sync index with StateOnly level | ||
| 261 | } | ||
| 262 | } | ||
| 263 | } | ||
| 264 | } | ||
| 265 | ``` | ||
| 266 | |||
| 267 | ## State Event Revival Flow | ||
| 268 | |||
| 269 | When a state event arrives for a soft-expired announcement, the state policy must: | ||
| 270 | |||
| 271 | 1. Check purgatory for a matching announcement (in addition to DB) | ||
| 272 | 2. Validate authorization against the purgatory announcement | ||
| 273 | 3. If soft-expired, call `revive_announcement()` and recreate the bare repo | ||
| 274 | 4. Extend the announcement's expiry (reset the 30-minute protocol timer) | ||
| 275 | 5. Route the state event to state purgatory | ||
| 276 | |||
| 277 | **Why revival is necessary:** Without soft expiry + revival, late-arriving state events would either be permanently rejected (if we added the announcement to `failed_events`) or cause constant re-syncing of the announcement event. Revival allows us to respect the protocol's 30-minute expiry while still handling delayed state events gracefully. | ||
| 278 | |||
| 279 | The exact integration will depend on the current structure of `StatePolicy::process_state_event()` - see implementation phase for details. | ||
| 280 | |||
| 281 | ## File Change Summary | ||
| 282 | |||
| 283 | | File | Estimated Lines | Changes | | ||
| 284 | |------|-----------------|---------| | ||
| 285 | | `src/sync/mod.rs` | ~10 | Add `SyncLevel` enum, field to `RepoSyncNeeds` | | ||
| 286 | | `src/sync/filters.rs` | ~20 | Partition repos by sync level, build appropriate filters | | ||
| 287 | | `src/sync/algorithms.rs` | ~15 | Preserve sync level in relay target derivation | | ||
| 288 | | `src/sync/self_subscriber.rs` | ~40 | Register purgatory announcements, handle promotion | | ||
| 289 | | `src/purgatory/mod.rs` | ~80 | Add announcement store, soft expiry methods | | ||
| 290 | | `src/purgatory/types.rs` | ~20 | Add `AnnouncementPurgatoryEntry` | | ||
| 291 | | `src/git/authorization.rs` | ~30 | Query purgatory in `fetch_repository_data()` | | ||
| 292 | | `src/nostr/policy/state.rs` | ~40 | Handle soft-expired revival | | ||
| 293 | | `src/nostr/policy/announcement.rs` | ~30 | Route to purgatory, check for replacements | | ||
| 294 | | `src/git/receive.rs` | ~20 | Trigger promotion on git data | | ||
| 295 | |||
| 296 | **Total: ~305 lines of changes** | ||
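One point to weigh about the `cleanup_expired_announcements` snippet in the deleted implementation doc above: it calls `remove` and `get_mut` on the map while still iterating it, which with `DashMap` can deadlock when the key lives in the shard the iterator currently holds. A minimal sketch of a two-pass alternative follows, assuming a plain `HashMap` with string keys in place of `DashMap` and `PublicKey`. The `cleanup_expired` function and `CleanupOutcome` type are hypothetical names. Deleting the bare repo and unregistering from sync are left to the caller.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

// The doc gives 24h only as an example value for extended retention.
pub const SOFT_EXPIRY_DURATION: Duration = Duration::from_secs(24 * 60 * 60);

/// (owner pubkey, identifier); plain strings stand in for `PublicKey` here.
pub type Key = (String, String);

/// Trimmed, hypothetical stand-in for `AnnouncementPurgatoryEntry`.
pub struct Entry {
    pub repo_path: PathBuf,
    pub expires_at: Instant,
    pub soft_expired: bool,
}

/// Side effects the caller still has to perform after the sweep.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct CleanupOutcome {
    pub repos_to_delete: Vec<PathBuf>, // first expiry: delete the bare repo
    pub keys_to_unregister: Vec<Key>,  // full expiry: drop from the sync index
}

pub fn cleanup_expired(map: &mut HashMap<Key, Entry>, now: Instant) -> CleanupOutcome {
    // Pass 1: read-only scan that only collects the keys of expired entries.
    let expired: Vec<Key> = map
        .iter()
        .filter(|(_, entry)| entry.expires_at <= now)
        .map(|(key, _)| key.clone())
        .collect();

    // Pass 2: mutate the map once no iterator is alive.
    let mut outcome = CleanupOutcome::default();
    for key in expired {
        let fully_expired = map.get(&key).map_or(false, |entry| entry.soft_expired);
        if fully_expired {
            // Second expiry: remove the entry entirely.
            map.remove(&key);
            outcome.keys_to_unregister.push(key);
        } else if let Some(entry) = map.get_mut(&key) {
            // First expiry: keep the event, schedule the bare repo for deletion.
            entry.soft_expired = true;
            entry.expires_at = now + SOFT_EXPIRY_DURATION;
            outcome.repos_to_delete.push(entry.repo_path.clone());
        }
    }
    outcome
}
```

Returning the side effects instead of performing them inside the sweep also keeps the map free of any held reference while the asynchronous repo deletion runs.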
diff --git a/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md b/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md index 31c3e46..8fb5798 100644 --- a/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md +++ b/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md | |||
| @@ -12,7 +12,13 @@ | |||
| 12 | 12 | ||
| 13 | ## Overview | 13 | ## Overview |
| 14 | 14 | ||
| 15 | When Nostr events arrive before their git data, they enter **purgatory** waiting to be served. But they don't wait passively—ngit-grasp **actively hunts** for the missing git data across all git servers assoicated with the repo until it finds what it needs. | 15 | When Nostr events arrive before their git data, they enter **purgatory** waiting to be served. But they don't wait passively—ngit-grasp **actively hunts** for the missing git data across all git servers associated with the repo until it finds what it needs. |
| 16 | |||
| 17 | This applies to three types of purgatory entries: | ||
| 18 | |||
| 19 | - **Announcement purgatory** — kind 30617 announcements waiting for a git push to prove the repo has content | ||
| 20 | - **State event purgatory** — kind 30618 state events waiting for their referenced git objects | ||
| 21 | - **PR event purgatory** — kind 1617/1618 PR events waiting for their referenced commits | ||
| 16 | 22 | ||
| 17 | ### How It Works | 23 | ### How It Works |
| 18 | 24 | ||
| @@ -42,6 +48,7 @@ We respect remote server capacity with: | |||
| 42 | ✅ **Respectful throttling** - 5 concurrent + 30/min per domain, plays nice with other implementations | 48 | ✅ **Respectful throttling** - 5 concurrent + 30/min per domain, plays nice with other implementations |
| 43 | ✅ **Smart timing** - 3min delay for user pushes, 500ms for synced events | 49 | ✅ **Smart timing** - 3min delay for user pushes, 500ms for synced events |
| 44 | ✅ **30min expiry** - Auto-cleanup of events when data never arrives | 50 | ✅ **30min expiry** - Auto-cleanup of events when data never arrives |
| 51 | ✅ **Soft expiry for announcements** - Bare repo deleted at 30min, event retained 24h to allow revival | ||
| 45 | ✅ **Fully testable** - Mock-based architecture for reliable unit tests | 52 | ✅ **Fully testable** - Mock-based architecture for reliable unit tests |
| 46 | 53 | ||
| 47 | --- | 54 | --- |
| @@ -73,6 +80,16 @@ Timeline D: Data never arrives | |||
| 73 | t=60s: Retry → all servers checked, no data | 80 | t=60s: Retry → all servers checked, no data |
| 74 | ... | 81 | ... |
| 75 | t=1800s: 30 minutes expired → event discarded, purgatory cleaned up 🗑️ | 82 | t=1800s: 30 minutes expired → event discarded, purgatory cleaned up 🗑️ |
| 83 | |||
| 84 | Timeline E: Announcement purgatory (no git data within 30 min) | ||
| 85 | t=0s: Announcement received → bare repo created, enters announcement purgatory | ||
| 86 | t=0.5s: Start hunting git servers for any content | ||
| 87 | ... | ||
| 88 | t=1800s: 30 minutes expired → bare repo deleted, event retained (soft_expired=true) | ||
| 89 | t=3600s: State event arrives (slow sync) → bare repo recreated, expiry reset ✅ | ||
| 90 | t=5400s: Git push arrives → announcement promoted to DB, served to clients ✅ | ||
| 91 | OR | ||
| 92 | t=86400s: 24 hours elapsed, no revival → event added to expired_events, removed 🗑️ | ||
| 76 | ``` | 93 | ``` |
| 77 | 94 | ||
| 78 | **Without proactive sync**: Events in Timeline C would wait indefinitely (or until manual git push). | 95 | **Without proactive sync**: Events in Timeline C would wait indefinitely (or until manual git push). |
| @@ -330,11 +347,11 @@ Both methods check `has_capacity()` and trigger `try_process_next()` if true. | |||
| 330 | 347 | ||
| 331 | --- | 348 | --- |
| 332 | 349 | ||
| 333 | ## 30-Minute Purgatory Expiry | 350 | ## Purgatory Expiry |
| 334 | 351 | ||
| 335 | Purgatory entries **automatically expire** after 30 minutes to prevent unbounded memory growth. | 352 | ### State and PR Events: 30-Minute Hard Expiry |
| 336 | 353 | ||
| 337 | ### Why 30 Minutes? | 354 | State and PR purgatory entries **automatically expire** after 30 minutes. |
| 338 | 355 | ||
| 339 | From the [GRASP-01 spec](https://github.com/DanConwayDev/grasp/blob/main/01.md#purgatory): | 356 | From the [GRASP-01 spec](https://github.com/DanConwayDev/grasp/blob/main/01.md#purgatory): |
| 340 | 357 | ||
| @@ -346,25 +363,40 @@ This balances: | |||
| 346 | - 🧹 **Short enough** to prevent memory leaks from abandoned events | 363 | - 🧹 **Short enough** to prevent memory leaks from abandoned events |
| 347 | - 🔄 **Recoverable** events are still on other relays and can be re-submitted | 364 | - 🔄 **Recoverable** events are still on other relays and can be re-submitted |
| 348 | 365 | ||
| 349 | ### Implementation | 366 | Each entry tracks `expires_at: Instant` (30 min from creation). The sync loop checks expiry before processing via `has_pending_events()`. If all events for an identifier have expired, the identifier is removed from the sync queue. |
| 350 | 367 | ||
| 351 | Each purgatory entry tracks: | 368 | To prevent infinite re-sync loops, expired event IDs are added to an `expired_events` set. If a sync delivers an event that previously expired, it is rejected with `"previously expired from purgatory without git data"`. |
| 352 | 369 | ||
| 353 | - `created_at: Instant` - When added to purgatory | 370 | **Implementation**: [`src/purgatory/mod.rs:DEFAULT_EXPIRY`](../../src/purgatory/mod.rs) |
| 354 | - `expires_at: Instant` - When to discard (created_at + 30min) | ||
| 355 | 371 | ||
| 356 | The main sync loop checks expiry before processing: | 372 | ### Announcement Purgatory: Two-Phase Soft Expiry |
| 357 | 373 | ||
| 358 | ```rust | 374 | Announcements use a different expiry strategy because they have an additional concern: the bare git repo created on arrival must be cleaned up, but we also need to avoid re-syncing the announcement event on every sync cycle. |
| 359 | if !self.has_pending_events(&identifier) { | ||
| 360 | // No events remain (expired or released) → remove from sync queue | ||
| 361 | self.sync_queue.remove(&identifier); | ||
| 362 | } | ||
| 363 | ``` | ||
| 364 | 375 | ||
| 365 | **Note**: Expiry is checked implicitly via `has_pending_events()`. If all events for an identifier have expired, the identifier is removed from the sync queue. | 376 | **Phase 1 — Initial 30-minute expiry:** |
| 366 | 377 | ||
| 367 | **Implementation**: [`src/purgatory/mod.rs:DEFAULT_EXPIRY`](../../src/purgatory/mod.rs) | 378 | - Delete the bare git repo (frees disk space, respects the protocol's 30-minute expiry) |
| 379 | - Set `soft_expired = true` on the entry | ||
| 380 | - Extend `expires_at` by **24 hours** (`SOFT_EXPIRY_EXTENDED`) | ||
| 381 | - Continue syncing state events for this repo (same as active purgatory) | ||
| 382 | |||
| 383 | **Phase 2 — 24-hour soft expiry:** | ||
| 384 | |||
| 385 | - Add event ID to `expired_events` (prevents re-sync loops) | ||
| 386 | - Remove entry completely from `announcement_purgatory` | ||
| 387 | |||
| 388 | **Why not just hard-expire at 30 minutes?** | ||
| 389 | |||
| 390 | The protocol's 30-minute expiry creates a dilemma for announcements: | ||
| 391 | |||
| 392 | - **Option A: Add to `failed_events` at 30 min** → Permanently rejects future state events, losing potential revival when state events arrive late (e.g. from a slow sync) | ||
| 393 | - **Option B: Remove entirely at 30 min** → The announcement gets re-fetched on every subsequent sync cycle, wasting bandwidth indefinitely | ||
| 394 | |||
| 395 | Soft expiry is the solution: the bare repo is deleted at 30 minutes (respecting the protocol), but the event is retained for 24 hours. During this window, a late-arriving state event can **revive** the announcement—`extend_announcement_expiry()` recreates the bare repo, clears `soft_expired`, and resets the 30-minute timer. After 24 hours with no revival, the event is added to `expired_events` and fully removed. | ||
| 396 | |||
| 397 | **Why 24 hours specifically?** This covers the worst-case sync delay. A relay that was offline for up to 24 hours will re-sync state events when it reconnects. The 24-hour window ensures announcements remain revivable throughout that period without permanently occupying disk space. | ||
| 398 | |||
| 399 | **Implementation**: [`src/purgatory/mod.rs:SOFT_EXPIRY_EXTENDED`](../../src/purgatory/mod.rs) | ||
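
The two phases can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the code in `src/purgatory/mod.rs`: the `Entry` struct is a trimmed-down stand-in for `AnnouncementPurgatoryEntry`, the `ExpiryAction` enum and the `check_expiry` and `revive` functions are hypothetical names, and the constant values are taken from the durations described above.

```rust
use std::time::{Duration, Instant};

// Assumed values, based on the 30-minute and 24-hour durations in the text.
const DEFAULT_EXPIRY: Duration = Duration::from_secs(30 * 60);
const SOFT_EXPIRY_EXTENDED: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical, trimmed-down entry: only the fields the expiry logic needs.
#[derive(Debug)]
struct Entry {
    expires_at: Instant,
    soft_expired: bool,
}

/// What the caller should do after an expiry check (illustrative names).
#[derive(Debug, PartialEq, Eq)]
enum ExpiryAction {
    /// Deadline not reached yet.
    Keep,
    /// Phase 1: delete the bare repo but retain the event.
    DeleteRepoAndRetain,
    /// Phase 2: record the event ID in `expired_events` and drop the entry.
    RemoveAndRecordExpired,
}

/// Checks one entry against `now` and advances its expiry phase if due.
fn check_expiry(entry: &mut Entry, now: Instant) -> ExpiryAction {
    if now < entry.expires_at {
        return ExpiryAction::Keep;
    }
    if entry.soft_expired {
        // Second deadline passed with no revival.
        ExpiryAction::RemoveAndRecordExpired
    } else {
        // First deadline passed: enter the soft-expired phase.
        entry.soft_expired = true;
        entry.expires_at += SOFT_EXPIRY_EXTENDED;
        ExpiryAction::DeleteRepoAndRetain
    }
}

/// Revival: a late state event clears the flag and restarts the 30-minute timer.
/// Recreating the bare repo is left to the caller in this sketch.
fn revive(entry: &mut Entry, now: Instant) {
    entry.soft_expired = false;
    entry.expires_at = now + DEFAULT_EXPIRY;
}
```

Returning an action instead of deleting the repo inside the check keeps the timing logic free of filesystem effects, which makes it straightforward to unit test with synthetic `Instant` values.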
| 368 | 400 | ||
| 369 | --- | 401 | --- |
| 370 | 402 | ||
| @@ -670,6 +702,7 @@ The purgatory sync system is a sophisticated, production-ready implementation th | |||
| 670 | ✅ **Throttles respectfully** - 5 concurrent + 30/min per domain, round-robin fairness | 702 | ✅ **Throttles respectfully** - 5 concurrent + 30/min per domain, round-robin fairness |
| 671 | ✅ **Times strategically** - 3min for user events, 500ms for synced events | 703 | ✅ **Times strategically** - 3min for user events, 500ms for synced events |
| 672 | ✅ **Expires responsibly** - 30min auto-cleanup prevents memory leaks | 704 | ✅ **Expires responsibly** - 30min auto-cleanup prevents memory leaks |
| 705 | ✅ **Soft-expires announcements** - Bare repo deleted at 30min, event retained 24h for revival | ||
| 673 | ✅ **Tests thoroughly** - Mock-based architecture enables comprehensive unit tests | 706 | ✅ **Tests thoroughly** - Mock-based architecture enables comprehensive unit tests |
| 674 | 707 | ||
| 675 | This design ensures ngit-grasp can serve repositories reliably even when git data and Nostr events arrive out-of-order or from different sources, while respecting remote server capacity and providing excellent observability. | 708 | This design ensures ngit-grasp can serve repositories reliably even when git data and Nostr events arrive out-of-order or from different sources, while respecting remote server capacity and providing excellent observability. |
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index ed8fdbf..6696e27 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
| @@ -47,20 +47,37 @@ This state starts afresh when the binary loads. | |||
| 47 | ### RepoSyncIndex (Source of Truth) | 47 | ### RepoSyncIndex (Source of Truth) |
| 48 | 48 | ||
| 49 | ```rust | 49 | ```rust |
| 50 | /// What we WANT to sync - derived from events received via self-subscription. | 50 | /// What we WANT to sync - derived from events received via self-subscription |
| 51 | /// Updated immediately when self-subscriber batch fires. | 51 | /// and from purgatory announcements. |
| 52 | /// Updated immediately when self-subscriber batch fires or purgatory sync timer runs. | ||
| 52 | /// Key: repo addressable ref - 30617:pubkey:identifier | 53 | /// Key: repo addressable ref - 30617:pubkey:identifier |
| 53 | pub type RepoSyncIndex = Arc<RwLock<HashMap<String, RepoSyncNeeds>>>; | 54 | pub type RepoSyncIndex = Arc<RwLock<HashMap<String, RepoSyncNeeds>>>; |
| 54 | 55 | ||
| 56 | /// Controls which sync filters are built for a repo | ||
| 57 | #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] | ||
| 58 | pub enum SyncLevel { | ||
| 59 | #[default] | ||
| 60 | Full, // Full L2 + L3 sync (promoted repos with git data) | ||
| 61 | StateOnly, // Only state events (kind 30618) — for purgatory announcements | ||
| 62 | } | ||
| 63 | |||
| 55 | #[derive(Debug, Clone, Default)] | 64 | #[derive(Debug, Clone, Default)] |
| 56 | pub struct RepoSyncNeeds { | 65 | pub struct RepoSyncNeeds { |
| 57 | /// Relay URLs listed in this repo's 30617 announcement | 66 | /// Relay URLs listed in this repo's 30617 announcement |
| 58 | pub relays: HashSet<String>, | 67 | pub relays: HashSet<String>, |
| 59 | /// Root event IDs - 1617/1618/1621 - that reference this repo | 68 | /// Root event IDs - 1617/1618/1621 - that reference this repo |
| 60 | pub root_events: HashSet<EventId>, | 69 | pub root_events: HashSet<EventId>, |
| 70 | /// Controls which filters are built: Full (L2+L3) or StateOnly (kind 30618 only) | ||
| 71 | pub sync_level: SyncLevel, | ||
| 61 | } | 72 | } |
| 62 | ``` | 73 | ``` |
| 63 | 74 | ||
| 75 | **Two sources populate `RepoSyncIndex`:** | ||
| 76 | |||
| 77 | 1. **`SelfSubscriber`** monitors the relay's own event stream for accepted events (kinds 30617, 1617, 1618, 1621) and adds entries with `SyncLevel::Full`. When an announcement is promoted from purgatory to the database, the SelfSubscriber sees it and upgrades the entry to `Full`. | ||
| 78 | |||
| 79 | 2. **Purgatory announcement sync timer** (`run_purgatory_announcement_sync`, every 5 seconds) — iterates `purgatory.announcements_for_sync()` and ensures each purgatory announcement has a `SyncLevel::StateOnly` entry in `RepoSyncIndex`. This is the only registration path for purgatory announcements because they are not saved to the database and therefore never seen by the SelfSubscriber. | ||
| 80 | |||
| 64 | ### RelaySyncIndex (Confirmed State + Connection) | 81 | ### RelaySyncIndex (Confirmed State + Connection) |
| 65 | 82 | ||
| 66 | ```rust | 83 | ```rust |
| @@ -336,7 +353,23 @@ The sync system uses three background tasks that run continuously: | |||
| 336 | 353 | ||
| 337 | 1. Queue events to `PendingUpdates` | 354 | 1. Queue events to `PendingUpdates` |
| 338 | 2. Timer fires (interval, does not reset on events) | 355 | 2. Timer fires (interval, does not reset on events) |
| 339 | 3. Process batch: update RepoSyncIndex → derive targets → send AddFilters to SyncManager | 356 | 3. Process batch: update RepoSyncIndex with `SyncLevel::Full` → derive targets → send AddFilters to SyncManager |
| 357 | |||
| 358 | **Note**: The SelfSubscriber only sees announcements that have been accepted to the database (promoted from purgatory). Purgatory announcements are registered separately by the purgatory sync timer (see below). | ||
| 359 | |||
| 360 | ### 4. Purgatory Announcement Sync Timer (`run_purgatory_announcement_sync`) | ||
| 361 | |||
| 362 | **Purpose**: Register purgatory announcements in `RepoSyncIndex` so state events are synced for them | ||
| 363 | |||
| 364 | **Interval**: Every 5 seconds (200ms in test mode) | ||
| 365 | |||
| 366 | **Flow**: | ||
| 367 | |||
| 368 | 1. Iterate `purgatory.announcements_for_sync()` | ||
| 369 | 2. For each announcement not already in `RepoSyncIndex`: insert with `SyncLevel::StateOnly` | ||
| 370 | 3. When an announcement is promoted (git data arrives), the SelfSubscriber sees the newly accepted event and upgrades the entry to `SyncLevel::Full` | ||
| 371 | |||
| 372 | **Why a separate timer?** Purgatory announcements are never saved to the database, so the SelfSubscriber never sees them. The timer bridges this gap, ensuring state events are synced for repos that may still receive git data. | ||
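
One tick of the timer flow above could look roughly like this. It is a sketch under simplifying assumptions: the index is a plain `HashMap` instead of `Arc<RwLock<...>>`, `RepoSyncNeeds` omits `root_events`, the `(repo_ref, relays)` pairs stand in for whatever `announcements_for_sync()` actually returns, and `register_purgatory_announcements` is a hypothetical name.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum SyncLevel {
    #[default]
    Full,
    StateOnly,
}

/// Trimmed-down version of the struct shown earlier (no `root_events`).
#[derive(Debug, Clone, Default)]
struct RepoSyncNeeds {
    relays: HashSet<String>,
    sync_level: SyncLevel,
}

/// One timer tick: registers purgatory announcements that are not yet in the
/// index as `StateOnly`. Existing entries (including `Full` ones) are left
/// untouched. Returns how many repos were newly registered.
fn register_purgatory_announcements(
    index: &mut HashMap<String, RepoSyncNeeds>,
    announcements: &[(String, HashSet<String>)],
) -> usize {
    let mut added = 0;
    for (repo_ref, relays) in announcements {
        if !index.contains_key(repo_ref) {
            index.insert(
                repo_ref.clone(),
                RepoSyncNeeds {
                    relays: relays.clone(),
                    sync_level: SyncLevel::StateOnly,
                },
            );
            added += 1;
        }
    }
    added
}
```

Because existing entries are skipped, running the tick repeatedly is idempotent and never downgrades a repo that the SelfSubscriber has already upgraded to `Full`.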
| 340 | 373 | ||
| 341 | --- | 374 | --- |
| 342 | 375 | ||
| @@ -602,9 +635,10 @@ flowchart TB | |||
| 602 | 635 | ||
| 603 | - Self-subscriber monitors own relay for 30617, 1617, 1618, 1621 (NOT 1619 or 30618) | 636 | - Self-subscriber monitors own relay for 30617, 1617, 1618, 1621 (NOT 1619 or 30618) |
| 604 | - Batches events in `PendingUpdates` (5 second window via interval timer) | 637 | - Batches events in `PendingUpdates` (5 second window via interval timer) |
| 605 | - `process_batch()` updates RepoSyncIndex, then builds AddFilters **directly** (no compute_actions) | 638 | - `process_batch()` updates RepoSyncIndex with `SyncLevel::Full`, then builds AddFilters **directly** (no compute_actions) |
| 606 | - AddFilters sent via channel to SyncManager, which calls `handle_new_sync_filters()` | 639 | - AddFilters sent via channel to SyncManager, which calls `handle_new_sync_filters()` |
| 607 | - This path does NOT use compute_actions because it's building fresh filters from the updated index | 640 | - This path does NOT use compute_actions because it's building fresh filters from the updated index |
| 641 | - Purgatory announcements (not in DB) are registered separately by the purgatory sync timer with `SyncLevel::StateOnly` | ||
| 608 | 642 | ||
| 609 | --- | 643 | --- |
| 610 | 644 | ||
| @@ -687,16 +721,23 @@ fn compute_actions( | |||
| 687 | - **Tags**: lowercase `a`, uppercase `A`, and `q` tags for comprehensive coverage | 721 | - **Tags**: lowercase `a`, uppercase `A`, and `q` tags for comprehensive coverage |
| 688 | - **Batching**: Per 100 repo refs | 722 | - **Batching**: Per 100 repo refs |
| 689 | - **Function**: `build_repo_tag_filters(repos, since)` | 723 | - **Function**: `build_repo_tag_filters(repos, since)` |
| 724 | - **Only for `SyncLevel::Full` repos** — purgatory announcements (`StateOnly`) skip this layer | ||
| 690 | 725 | ||
| 691 | ### Layer 3: Events Tagging Our Root Events | 726 | ### Layer 3: Events Tagging Our Root Events |
| 692 | 727 | ||
| 693 | - **Tags**: lowercase `e`, uppercase `E`, and `q` tags for comprehensive coverage | 728 | - **Tags**: lowercase `e`, uppercase `E`, and `q` tags for comprehensive coverage |
| 694 | - **Batching**: Per 100 event IDs | 729 | - **Batching**: Per 100 event IDs |
| 695 | - **Function**: `build_root_event_tag_filters(root_events, since)` | 730 | - **Function**: `build_root_event_tag_filters(root_events, since)` |
| 731 | - **Only for `SyncLevel::Full` repos** — purgatory announcements (`StateOnly`) skip this layer | ||
| 732 | |||
| 733 | ### Combined Layer 2+3 (SyncLevel-Aware) | ||
| 734 | |||
| 735 | The `build_sync_level_aware_filters()` function combines both layers, partitioning repos by `SyncLevel`: | ||
| 696 | 736 | ||
| 697 | ### Combined Layer 2+3 | 737 | - **`Full` repos**: state event filters + repo-tag filters + root-event-tag filters |
| 738 | - **`StateOnly` repos**: state event filters only (kind 30618 with `#d` tags) | ||
| 698 | 739 | ||
| 699 | The `build_layer2_and_layer3_filters()` function combines both layers. Used by: | 740 | Used by: |
| 700 | 741 | ||
| 701 | - `recompute_new_sync_filters_for_relay` for new item subscriptions | 742 | - `recompute_new_sync_filters_for_relay` for new item subscriptions |
| 702 | - `reconstruct_filters` for rebuilding from confirmed state | 743 | - `reconstruct_filters` for rebuilding from confirmed state |
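
The partitioning by `SyncLevel` can be illustrated as follows. This sketch only decides which repos feed which filter group; it does not build real Nostr filters, and `FilterPlan` and `plan_filters` are hypothetical names rather than part of `build_sync_level_aware_filters()`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncLevel {
    Full,
    StateOnly,
}

/// Which repo refs feed which filter group (illustrative only).
#[derive(Debug, Default, PartialEq, Eq)]
struct FilterPlan {
    /// Repos that get state event filters (kind 30618): every repo.
    state_filter_repos: Vec<String>,
    /// Repos that also get Layer 2 and Layer 3 tag filters: `Full` only.
    tag_filter_repos: Vec<String>,
}

fn plan_filters(repos: &HashMap<String, SyncLevel>) -> FilterPlan {
    let mut plan = FilterPlan::default();
    for (repo_ref, level) in repos {
        plan.state_filter_repos.push(repo_ref.clone());
        if *level == SyncLevel::Full {
            plan.tag_filter_repos.push(repo_ref.clone());
        }
    }
    // HashMap iteration order is unspecified, so sort for stable output.
    plan.state_filter_repos.sort();
    plan.tag_filter_repos.sort();
    plan
}
```

In this model a promoted repo simply moves from one group to both groups when its level changes to `Full`, so no separate code path is needed for the upgrade.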
| @@ -871,9 +912,9 @@ flowchart TB | |||
| 871 | 912 | ||
| 872 | ``` | 913 | ``` |
| 873 | src/sync/ | 914 | src/sync/ |
| 874 | ├── mod.rs # SyncManager, main loop, data structures | 915 | ├── mod.rs # SyncManager, main loop, data structures, SyncLevel, run_purgatory_announcement_sync |
| 875 | ├── algorithms.rs # derive_relay_targets(), compute_actions() | 916 | ├── algorithms.rs # derive_relay_targets(), compute_actions() |
| 876 | ├── filters.rs # build_announcement_filter(), build_layer2_and_layer3_filters() | 917 | ├── filters.rs # build_announcement_filter(), build_sync_level_aware_filters() |
| 877 | ├── health.rs # RelayHealthTracker with exponential backoff | 918 | ├── health.rs # RelayHealthTracker with exponential backoff |
| 878 | ├── relay_connection.rs # RelayConnection, RelayEvent handling | 919 | ├── relay_connection.rs # RelayConnection, RelayEvent handling |
| 879 | ├── self_subscriber.rs # SelfSubscriber with batching | 920 | ├── self_subscriber.rs # SelfSubscriber with batching |
diff --git a/docs/explanation/purgatory-design.md b/docs/explanation/purgatory-design.md
index b984745..bd792d4 100644
--- a/docs/explanation/purgatory-design.md
+++ b/docs/explanation/purgatory-design.md
| @@ -8,7 +8,11 @@ | |||
| 8 | 8 | ||
| 9 | ## Overview | 9 | ## Overview |
| 10 | 10 | ||
| 11 | Purgatory is an in-memory holding area that solves the **"which arrives first?"** problem in GRASP. Either nostr events or git pushes can arrive in any order: | 11 | Purgatory is an in-memory holding area that solves two related problems in GRASP: |
| 12 | |||
| 13 | ### Problem 1: "Which arrives first?" (State and PR events) | ||
| 14 | |||
| 15 | Either nostr events or git pushes can arrive in any order: | ||
| 12 | 16 | ||
| 13 | - **Event first**: Event waits in purgatory until git data arrives | 17 | - **Event first**: Event waits in purgatory until git data arrives |
| 14 | - **Git first**: Placeholder waits in purgatory until event arrives | 18 | - **Git first**: Placeholder waits in purgatory until event arrives |
| @@ -19,6 +23,18 @@ When both halves arrive, they are processed together and saved to the database. | |||
| 19 | 23 | ||
| 20 | > Accepted repo state announcements, PRs and PR Updates SHOULD be accepted with message "purgatory: won't be served until git data arrives" and kept in purgatory (not served) until the related git data arrives and otherwise discarded after 30 minutes. | 24 | > Accepted repo state announcements, PRs and PR Updates SHOULD be accepted with message "purgatory: won't be served until git data arrives" and kept in purgatory (not served) until the related git data arrives and otherwise discarded after 30 minutes. |
| 21 | 25 | ||
| 26 | ### Problem 2: Misleading empty repository announcements | ||
| 27 | |||
| 28 | When a repository announcement arrives, we must create the bare git repo immediately so pushes can succeed. But if no git data ever arrives, we would serve an empty repo and its announcement indefinitely—clients see the announcement, try to clone, and get nothing. | ||
| 29 | |||
| 30 | **Solution**: New announcements go to **announcement purgatory** instead of being immediately accepted: | ||
| 31 | |||
| 32 | 1. **Announcement arrives** → Create bare repo immediately, add announcement to purgatory | ||
| 33 | 2. **Git data arrives** → Promote announcement from purgatory to active (now served to clients) | ||
| 34 | 3. **No git data before expiry** → Delete bare repo; discard announcement once the soft-expiry window also lapses (never served) | ||
| 35 | |||
| 36 | This ensures we only serve announcements for repos that actually have content. | ||
| 37 | |||
| 22 | --- | 38 | --- |
| 23 | 39 | ||
| 24 | ## Key Design Principles | 40 | ## Key Design Principles |
| @@ -31,16 +47,15 @@ Purgatory data is **not persisted** to disk. On restart, all purgatory entries a | |||
| 31 | - Git data can be re-pushed | 47 | - Git data can be re-pushed |
| 32 | - 30-minute expiry means data is transient anyway | 48 | - 30-minute expiry means data is transient anyway |
| 33 | 49 | ||
| 34 | ### 2. Separate Storage for State vs PR Events | 50 | ### 2. Separate Storage for Each Event Type |
| 35 | |||
| 36 | State events (kind 30618) and PR events (kind 1617/1618) have fundamentally different matching patterns: | ||
| 37 | 51 | ||
| 38 | | Event Type | Index | Matching Strategy | | 52 | | Store | Index | Purpose | |
| 39 | |------------|-------|-------------------| | 53 | |-------|-------|---------| |
| 40 | | **State Events** | `identifier` (d tag) | Compare refs at push time | | 54 | | `announcement_purgatory` | `(PublicKey, String)` — `(owner, identifier)` | Announcements awaiting git data | |
| 41 | | **PR Events** | `event_id` (hex string) | Direct match via `refs/nostr/<event-id>` | | 55 | | `state_events` | `identifier` (d tag) | State events awaiting git data | |
| 56 | | `pr_events` | `event_id` (hex string) | PR events awaiting git data | | ||
| 42 | 57 | ||
| 43 | They use **separate DashMap stores** for efficient concurrent access. | 58 | Announcement purgatory uses `(pubkey, identifier)` because identifier alone is not unique across different owners. |
| 44 | 59 | ||
| 45 | ### 3. Late Binding for State Events | 60 | ### 3. Late Binding for State Events |
| 46 | 61 | ||
| @@ -78,7 +93,23 @@ With purgatory checking during authorization: | |||
| 78 | 2. Git push arrives → Checks **database + purgatory** → State found → **AUTHORIZED** ✅ | 93 | 2. Git push arrives → Checks **database + purgatory** → State found → **AUTHORIZED** ✅ |
| 79 | 3. After push succeeds → Save event to database → Remove from purgatory | 94 | 3. After push succeeds → Save event to database → Remove from purgatory |
| 80 | 95 | ||
| 81 | See [`src/git/authorization.rs:51-162`](../../src/git/authorization.rs) for implementation. | 96 | See [`src/git/authorization.rs`](../../src/git/authorization.rs) for implementation. |
| 97 | |||
| 98 | ### 6. Announcement Purgatory: Bare Repo Created Immediately | ||
| 99 | |||
| 100 | **Decision:** Create the bare git repo when an announcement enters purgatory. | ||
| 101 | |||
| 102 | **Why:** Git pushes may arrive at any time. Without a repo, pushes fail. | ||
| 103 | |||
| 104 | **Consequence:** We allocate disk space for repos that may expire unused. Must delete repos on expiry. | ||
| 105 | |||
| 106 | ### 7. Replacement Announcements Skip Purgatory | ||
| 107 | |||
| 108 | **Decision:** Announcements replacing an existing active (database) announcement are accepted immediately. | ||
| 109 | |||
| 110 | **Why:** The repository is already proven active with content. | ||
| 111 | |||
| 112 | **How:** Check whether an active announcement exists for `(pubkey, identifier)` before routing to purgatory. | ||
| 82 | 113 | ||
| 83 | --- | 114 | --- |
| 84 | 115 | ||
| @@ -103,22 +134,54 @@ pub struct RefUpdate { | |||
| 103 | } | 134 | } |
| 104 | ``` | 135 | ``` |
| 105 | 136 | ||
| 137 | ### Announcement Purgatory Entry | ||
| 138 | |||
| 139 | ```rust | ||
| 140 | pub struct AnnouncementPurgatoryEntry { | ||
| 141 | /// The kind 30617 announcement event | ||
| 142 | pub event: Event, | ||
| 143 | |||
| 144 | /// Repository identifier from 'd' tag | ||
| 145 | pub identifier: String, | ||
| 146 | |||
| 147 | /// Event author pubkey | ||
| 148 | pub owner: PublicKey, | ||
| 149 | |||
| 150 | /// Path to the bare git repo on disk (created immediately on entry) | ||
| 151 | pub repo_path: PathBuf, | ||
| 152 | |||
| 153 | /// Relay URLs from 'relays'/'clone' tags — for sync registration | ||
| 154 | pub relays: HashSet<String>, | ||
| 155 | |||
| 156 | /// When added to purgatory | ||
| 157 | pub created_at: Instant, | ||
| 158 | |||
| 159 | /// Expiry deadline (30 min from creation, may be extended) | ||
| 160 | pub expires_at: Instant, | ||
| 161 | |||
| 162 | /// Whether the bare repo has been deleted (soft expiry phase) | ||
| 163 | pub soft_expired: bool, | ||
| 164 | } | ||
| 165 | ``` | ||
| 166 | |||
| 167 | **Indexed by `(pubkey, identifier)`** because identifier is not unique across different owners. | ||
| 168 | |||
| 106 | ### State Purgatory Entry | 169 | ### State Purgatory Entry |
| 107 | 170 | ||
| 108 | ```rust | 171 | ```rust |
| 109 | pub struct StatePurgatoryEntry { | 172 | pub struct StatePurgatoryEntry { |
| 110 | /// The nostr state event (kind 30618) awaiting git data | 173 | /// The nostr state event (kind 30618) awaiting git data |
| 111 | pub event: Event, | 174 | pub event: Event, |
| 112 | 175 | ||
| 113 | /// Repository identifier from 'd' tag | 176 | /// Repository identifier from 'd' tag |
| 114 | pub identifier: String, | 177 | pub identifier: String, |
| 115 | 178 | ||
| 116 | /// Event author pubkey | 179 | /// Event author pubkey |
| 117 | pub author: PublicKey, | 180 | pub author: PublicKey, |
| 118 | 181 | ||
| 119 | /// When added to purgatory | 182 | /// When added to purgatory |
| 120 | pub created_at: Instant, | 183 | pub created_at: Instant, |
| 121 | 184 | ||
| 122 | /// Expiry deadline (30 min from creation, may be extended) | 185 | /// Expiry deadline (30 min from creation, may be extended) |
| 123 | pub expires_at: Instant, | 186 | pub expires_at: Instant, |
| 124 | } | 187 | } |
| @@ -132,14 +195,14 @@ pub struct StatePurgatoryEntry { | |||
| 132 | pub struct PrPurgatoryEntry { | 195 | pub struct PrPurgatoryEntry { |
| 133 | /// The nostr PR event, if received (None = git data arrived first) | 196 | /// The nostr PR event, if received (None = git data arrived first) |
| 134 | pub event: Option<Event>, | 197 | pub event: Option<Event>, |
| 135 | 198 | ||
| 136 | /// Expected commit SHA from 'c' tag (if event exists) | 199 | /// Expected commit SHA from 'c' tag (if event exists) |
| 137 | /// or actual commit pushed (if git arrived first) | 200 | /// or actual commit pushed (if git arrived first) |
| 138 | pub commit: String, | 201 | pub commit: String, |
| 139 | 202 | ||
| 140 | /// When added to purgatory | 203 | /// When added to purgatory |
| 141 | pub created_at: Instant, | 204 | pub created_at: Instant, |
| 142 | 205 | ||
| 143 | /// Expiry deadline (30 min from creation) | 206 | /// Expiry deadline (30 min from creation) |
| 144 | pub expires_at: Instant, | 207 | pub expires_at: Instant, |
| 145 | } | 208 | } |
| @@ -151,24 +214,155 @@ pub struct PrPurgatoryEntry { | |||
| 151 | 214 | ||
| 152 | ```rust | 215 | ```rust |
| 153 | pub struct Purgatory { | 216 | pub struct Purgatory { |
| 217 | /// Announcement events indexed by (owner, identifier) | ||
| 218 | announcement_purgatory: DashMap<(PublicKey, String), AnnouncementPurgatoryEntry>, | ||
| 219 | |||
| 154 | /// State events indexed by identifier (d tag) | 220 | /// State events indexed by identifier (d tag) |
| 155 | /// Multiple state events per identifier allowed (different authors) | 221 | /// Multiple state events per identifier allowed (different authors) |
| 156 | state_events: Arc<DashMap<String, Vec<StatePurgatoryEntry>>>, | 222 | state_events: DashMap<String, Vec<StatePurgatoryEntry>>, |
| 157 | 223 | ||
| 158 | /// PR events indexed by event_id (hex string) | 224 | /// PR events indexed by event_id (hex string) |
| 159 | /// Single entry per event ID | 225 | /// Single entry per event ID |
| 160 | pr_events: Arc<DashMap<String, PrPurgatoryEntry>>, | 226 | pr_events: DashMap<String, PrPurgatoryEntry>, |
| 161 | 227 | ||
| 162 | /// Sync queue for background git data fetching | 228 | /// Sync queue for background git data fetching |
| 163 | sync_queue: Arc<DashMap<String, SyncQueueEntry>>, | 229 | sync_queue: DashMap<String, SyncQueueEntry>, |
| 164 | 230 | ||
| 165 | _git_data_path: PathBuf, | 231 | /// Events that previously expired without git data (prevents re-sync loops) |
| 232 | expired_events: DashMap<EventId, Instant>, | ||
| 166 | } | 233 | } |
| 167 | ``` | 234 | ``` |
| 168 | 235 | ||
| 169 | --- | 236 | --- |
| 170 | 237 | ||
| 171 | ## Event Flows | 238 | ## Announcement Purgatory Flows |
| 239 | |||
| 240 | ### New Announcement Flow | ||
| 241 | |||
| 242 | ``` | ||
| 243 | Announcement arrives | ||
| 244 | | | ||
| 245 | v | ||
| 246 | Is there an active announcement for (pubkey, identifier) in DB? | ||
| 247 | | | ||
| 248 | +-- YES --> Accept immediately (replacement, repo already proven) | ||
| 249 | | | ||
| 250 | +-- NO --> Is there a purgatory entry for (pubkey, identifier)? | ||
| 251 | | | ||
| 252 | +-- YES --> Replace purgatory entry, extend expiry 30 min | ||
| 253 | | Return OK to client (but don't serve) | ||
| 254 | | | ||
| 255 | +-- NO --> Create bare repo | ||
| 256 | Add to purgatory | ||
| 257 | Return OK to client (but don't serve) | ||
| 258 | ``` | ||
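
The decision tree above reduces to two lookups. A minimal sketch, assuming the caller has already checked the database and purgatory for `(pubkey, identifier)`; the `AnnouncementRoute` enum and `route_announcement` function are hypothetical names, not the actual `AnnouncementPolicy` API.

```rust
/// Outcome of routing an incoming kind 30617 announcement (illustrative).
#[derive(Debug, PartialEq, Eq)]
enum AnnouncementRoute {
    /// An active announcement exists in the database: treat as a replacement.
    AcceptImmediately,
    /// A purgatory entry exists: replace it and extend expiry by 30 minutes.
    ReplacePurgatoryEntry,
    /// Nothing known yet: create the bare repo and add a new purgatory entry.
    CreateRepoAndEnterPurgatory,
}

/// The database check takes precedence over the purgatory check.
fn route_announcement(active_in_db: bool, in_purgatory: bool) -> AnnouncementRoute {
    if active_in_db {
        AnnouncementRoute::AcceptImmediately
    } else if in_purgatory {
        AnnouncementRoute::ReplacePurgatoryEntry
    } else {
        AnnouncementRoute::CreateRepoAndEnterPurgatory
    }
}
```

In both purgatory branches the client still receives OK, but the announcement is not served until git data arrives.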
| 259 | |||
| 260 | ### Git Data Arrival → Promotion | ||
| 261 | |||
| 262 | ``` | ||
| 263 | Git push/fetch completes with data | ||
| 264 | | | ||
| 265 | v | ||
| 266 | process_purgatory_announcements() called | ||
| 267 | | | ||
| 268 | v | ||
| 269 | Is there a purgatory announcement for (owner, identifier)? | ||
| 270 | | | ||
| 271 | +-- YES --> promote_announcement() removes from purgatory | ||
| 272 | | Save event to database | ||
| 273 | | Notify WebSocket clients | ||
| 274 | | (Sync upgrades to Full automatically via SelfSubscriber) | ||
| 275 | | | ||
| 276 | +-- NO --> Normal processing | ||
| 277 | ``` | ||
| 278 | |||
| 279 | ### State Event Arrival for Purgatory Announcement | ||
| 280 | |||
| 281 | ``` | ||
| 282 | State event arrives | ||
| 283 | | | ||
| 284 | v | ||
| 285 | fetch_repository_data_with_purgatory() checks DB + purgatory | ||
| 286 | | | ||
| 287 | +-- Announcement found in purgatory --> | ||
| 288 | | Validate authorization against purgatory announcement | ||
| 289 | | Extend purgatory announcement expiry (reset 30-min timer) | ||
| 290 | | If soft-expired: recreate bare repo, clear soft_expired flag | ||
| 291 | | Route state event to state purgatory | ||
| 292 | | | ||
| 293 | +-- No announcement anywhere --> Reject | ||
| 294 | ``` | ||
| 295 | |||
| 296 | ### Announcement Expiry (Two-Phase Soft Expiry) | ||
| 297 | |||
| 298 | The protocol specifies 30-minute expiry for announcements. We implement a two-phase soft expiry: | ||
| 299 | |||
| 300 | **Phase 1 — Initial 30-minute expiry (`soft_expired == false`):** | ||
| 301 | - Delete the bare git repo (frees disk space, respects protocol expiry) | ||
| 302 | - Set `soft_expired = true` | ||
| 303 | - Extend `expires_at` by 24 hours (`SOFT_EXPIRY_EXTENDED`) | ||
| 304 | - Continue syncing state events (same as active purgatory) | ||
| 305 | |||
| 306 | **Phase 2 — 24-hour soft expiry (`soft_expired == true`):** | ||
| 307 | - Add event ID to `expired_events` (prevents re-sync loops) | ||
| 308 | - Remove entry completely from `announcement_purgatory` | ||
| 309 | |||
| 310 | **Why soft expiry?** Without it, we'd face a dilemma: | ||
| 311 | |||
| 312 | - Add expired announcements to `failed_events` → permanently reject future state events, losing potential revival when state events arrive late | ||
| 313 | - Re-fetch the announcement event on every sync cycle → wasting bandwidth and creating unnecessary sync traffic | ||
| 314 | |||
| 315 | Soft expiry retains the event for 24 hours so that late-arriving state events (e.g. from a slow sync) can revive the announcement without forcing a full re-announcement flow. | ||
| 316 | |||
| 317 | **Revival:** If a state event arrives for a soft-expired announcement, `extend_announcement_expiry()` recreates the bare repo, clears `soft_expired`, and resets the 30-minute timer. | ||
| 318 | |||
| 319 | ### Expiry Extension Triggers | ||
| 320 | |||
| 321 | The 30-minute purgatory timer is reset (extended) in three scenarios: | ||
| 322 | |||
| 323 | | Trigger | Location | Why | | ||
| 324 | |---------|----------|-----| | ||
| 325 | | State event arrives | `StatePolicy::process_state_event()` | Repo is actively receiving metadata | | ||
| 326 | | Git push authorized against purgatory state | `get_state_authorization_for_specific_owner_repo()` | Repo is actively receiving git data | | ||
| 327 | | Replacement announcement arrives | `AnnouncementPolicy::validate()` | Announcement updated | | ||
| 328 | |||
| 329 | All three call `purgatory.extend_announcement_expiry(owner, identifier, 1800s)`. | ||
| 330 | |||
| 331 | ### Purgatory Lifecycle | ||
| 332 | |||
| 333 | ``` | ||
| 334 | ┌─────────────────────────────────────┐ | ||
| 335 | │ │ | ||
| 336 | v │ | ||
| 337 | Announcement ──> ACTIVE ──────────────────────────────────┤ | ||
| 338 | arrives (bare repo exists) │ | ||
| 339 | │ │ | ||
| 340 | ├── Git data ──> PROMOTED (exit) │ | ||
| 341 | │ │ | ||
| 342 | ├── Deletion ──> REMOVED (exit) │ | ||
| 343 | │ │ | ||
| 344 | v │ | ||
| 345 | SOFT_EXPIRED ──────────────────────────────┘ | ||
| 346 | (bare repo deleted, ^ | ||
| 347 | event retained) │ | ||
| 348 | │ │ | ||
| 349 | ├── State event arrives (revival) | ||
| 350 | │ | ||
| 351 | └── Extended expiry ──> REMOVED (exit) | ||
| 352 | ``` | ||
| 353 | |||
| 354 | | Exit | Trigger | Action | | ||
| 355 | |------|---------|--------| | ||
| 356 | | **Promotion** | Git data arrives | Move to database, sync upgrades to Full | | ||
| 357 | | **Soft expiry** | Initial 30-min timeout | Delete bare repo, retain event, continue sync | | ||
| 358 | | **Full expiry** | 24-hour soft expiry | Add to expired_events, remove from purgatory | | ||
| 359 | | **Deletion** | Kind 5 event | Delete bare repo, remove from purgatory | | ||
| 360 | | **Replacement** | Newer announcement (same pubkey, identifier) | Replace entry, extend expiry | | ||
| 361 | | **Service change** | Newer announcement removes our service | Remove from purgatory | | ||
| 362 | |||
| 363 | --- | ||
| 364 | |||
| 365 | ## State and PR Event Flows | ||
| 172 | 366 | ||
| 173 | ### State Event Arrival (Kind 30618) | 367 | ### State Event Arrival (Kind 30618) |
| 174 | 368 | ||
| @@ -377,11 +571,12 @@ Purgatory includes a background sync system that fetches git data from remote se | |||
| 377 | ▼ | 571 | ▼ |
| 378 | ┌─────────────────────────────────────────────────────┐ | 572 | ┌─────────────────────────────────────────────────────┐ |
| 379 | │ process_newly_available_git_data(repo, oids) │ | 573 | │ process_newly_available_git_data(repo, oids) │ |
| 380 | │ 1. Find satisfiable state events in purgatory │ | 574 | │ 1. Find satisfiable announcement in purgatory │ |
| 381 | │ 2. Find satisfiable PR events in purgatory │ | 575 | │ 2. Find satisfiable state events in purgatory │ |
| 382 | │ 3. Save events to database │ | 576 | │ 3. Find satisfiable PR events in purgatory │ |
| 383 | │ 4. Sync git data to other owner repos │ | 577 | │ 4. Save events to database │ |
| 384 | │ 5. Remove from purgatory │ | 578 | │ 5. Sync git data to other owner repos │ |
| 579 | │ 6. Remove from purgatory │ | ||
| 385 | └─────────────────────────────────────────────────────┘ | 580 | └─────────────────────────────────────────────────────┘ |
| 386 | ``` | 581 | ``` |
| 387 | 582 | ||
| @@ -402,8 +597,8 @@ pub struct SyncQueueEntry { | |||
| 402 | 597 | ||
| 403 | **Backoff strategy:** | 598 | **Backoff strategy:** |
| 404 | - First attempt: 20 seconds | 599 | - First attempt: 20 seconds |
| 405 | - Second attempt: 2 minutes | 600 | - Second attempt: 40 seconds |
| 406 | - Subsequent attempts: 2 minutes | 601 | - Subsequent attempts: capped at 2 minutes |
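
One way to read the updated schedule (20 seconds, then 40 seconds, then capped at 2 minutes) is as a doubling delay with a ceiling. The sketch below assumes that interpretation; `backoff_delay` and the two constants are hypothetical names, and the real logic lives with `SyncQueueEntry`.

```rust
use std::time::Duration;

const BASE_DELAY_SECS: u64 = 20; // delay before the first attempt
const MAX_DELAY_SECS: u64 = 120; // 2-minute cap

/// Delay before the given 1-based attempt, assuming the delay doubles each time.
fn backoff_delay(attempt: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow for very large attempt counts.
    let exponent = attempt.saturating_sub(1).min(16);
    let secs = BASE_DELAY_SECS.saturating_mul(1u64 << exponent);
    Duration::from_secs(secs.min(MAX_DELAY_SECS))
}
```

Under this assumption the third attempt waits 80 seconds and every attempt from the fourth onward waits the full 2 minutes.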
| 407 | 602 | ||
| 408 | ### Sync Delays | 603 | ### Sync Delays |
| 409 | 604 | ||
| @@ -428,7 +623,7 @@ pub struct ThrottleManager { | |||
| 428 | ``` | 623 | ``` |
| 429 | 624 | ||
| 430 | **Rate limiting:** | 625 | **Rate limiting:** |
| 431 | - Default: 5 requests per domain per 30 seconds | 626 | - Default: 5 concurrent requests per domain, 30 requests per minute |
| 432 | - Tracks request timestamps in a sliding window | 627 | - Tracks request timestamps in a sliding window |
| 433 | - Queues identifiers when domain is throttled | 628 | - Queues identifiers when domain is throttled |
| 434 | - Processes queue when capacity frees up | 629 | - Processes queue when capacity frees up |
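
The sliding-window part of the rate limit can be sketched as follows. This is a minimal illustration for a single domain, not the `ThrottleManager` implementation: `DomainWindow` and `try_acquire` are hypothetical names, and the concurrency cap and the per-domain queue are omitted.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window request counter for one domain.
struct DomainWindow {
    max_requests: usize,
    window: Duration,
    timestamps: VecDeque<Instant>,
}

impl DomainWindow {
    fn new(max_requests: usize, window: Duration) -> Self {
        Self { max_requests, window, timestamps: VecDeque::new() }
    }

    /// Records a request at `now` if the window has capacity; returns whether it was allowed.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.timestamps.front() {
            if now.duration_since(oldest) >= self.window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        if self.timestamps.len() < self.max_requests {
            self.timestamps.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With the documented default this would be constructed as `DomainWindow::new(30, Duration::from_secs(60))`; a caller that gets `false` back would queue the identifier and retry once capacity frees up.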
| @@ -439,7 +634,47 @@ See [`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs) for | |||
| 439 | 634 | ||
| 440 | ## Purgatory API | 635 | ## Purgatory API |
| 441 | 636 | ||
| 442 | ### Adding Entries | 637 | ### Announcement Purgatory |
| 638 | |||
| 639 | ```rust | ||
| 640 | impl Purgatory { | ||
| 641 | /// Add an announcement to purgatory (bare repo already created by caller) | ||
| 642 | pub fn add_announcement( | ||
| 643 | &self, | ||
| 644 | event: Event, | ||
| 645 | identifier: String, | ||
| 646 | owner: PublicKey, | ||
| 647 | repo_path: PathBuf, | ||
| 648 | relays: HashSet<String>, | ||
| 649 | ); | ||
| 650 | |||
| 651 | /// Promote announcement: remove from purgatory, return event for DB save | ||
| 652 | pub fn promote_announcement( | ||
| 653 | &self, | ||
| 654 | owner: &PublicKey, | ||
| 655 | identifier: &str, | ||
| 656 | ) -> Option<Event>; | ||
| 657 | |||
| 658 | /// Get announcements by identifier (for authorization checks) | ||
| 659 | pub fn get_announcements_by_identifier( | ||
| 660 | &self, | ||
| 661 | identifier: &str, | ||
| 662 | ) -> Vec<AnnouncementPurgatoryEntry>; | ||
| 663 | |||
| 664 | /// Extend expiry (and revive soft-expired entries, recreating bare repo) | ||
| 665 | pub fn extend_announcement_expiry( | ||
| 666 | &self, | ||
| 667 | owner: &PublicKey, | ||
| 668 | identifier: &str, | ||
| 669 | duration: Duration, | ||
| 670 | ); | ||
| 671 | |||
| 672 | /// Get all announcements for sync registration | ||
| 673 | pub fn announcements_for_sync(&self) -> Vec<AnnouncementPurgatoryEntry>; | ||
| 674 | } | ||
| 675 | ``` | ||
| 676 | |||
| 677 | ### State and PR Purgatory | ||
| 443 | 678 | ||
| 444 | ```rust | 679 | ```rust |
| 445 | impl Purgatory { | 680 | impl Purgatory { |
| @@ -453,13 +688,7 @@ impl Purgatory { | |||
| 453 | 688 | ||
| 454 | /// Add a PR placeholder (git-data-first scenario) | 689 | /// Add a PR placeholder (git-data-first scenario) |
| 455 | pub fn add_pr_placeholder(&self, event_id: String, commit: String); | 690 | pub fn add_pr_placeholder(&self, event_id: String, commit: String); |
| 456 | } | ||
| 457 | ``` | ||
| 458 | 691 | ||
| 459 | ### Finding Entries | ||
| 460 | |||
| 461 | ```rust | ||
| 462 | impl Purgatory { | ||
| 463 | /// Find state events waiting for an identifier | 692 | /// Find state events waiting for an identifier |
| 464 | pub fn find_state(&self, identifier: &str) -> Vec<StatePurgatoryEntry>; | 693 | pub fn find_state(&self, identifier: &str) -> Vec<StatePurgatoryEntry>; |
| 465 | 694 | ||
| @@ -476,13 +705,7 @@ impl Purgatory { | |||
| 476 | 705 | ||
| 477 | /// Find a PR placeholder specifically (git-data-first) | 706 | /// Find a PR placeholder specifically (git-data-first) |
| 478 | pub fn find_pr_placeholder(&self, event_id: &str) -> Option<String>; | 707 | pub fn find_pr_placeholder(&self, event_id: &str) -> Option<String>; |
| 479 | } | ||
| 480 | ``` | ||
| 481 | 708 | ||
| 482 | ### Removing Entries | ||
| 483 | |||
| 484 | ```rust | ||
| 485 | impl Purgatory { | ||
| 486 | /// Remove all state events for an identifier | 709 | /// Remove all state events for an identifier |
| 487 | pub fn remove_state(&self, identifier: &str); | 710 | pub fn remove_state(&self, identifier: &str); |
| 488 | 711 | ||
| @@ -499,36 +722,14 @@ impl Purgatory { | |||
| 499 | ```rust | 722 | ```rust |
| 500 | impl Purgatory { | 723 | impl Purgatory { |
| 501 | /// Remove expired entries (called every 60 seconds) | 724 | /// Remove expired entries (called every 60 seconds) |
| 502 | /// Returns (state_removed, pr_removed) | 725 | /// Handles two-phase soft expiry for announcements |
| 503 | pub fn cleanup(&self) -> (usize, usize); | 726 | pub fn cleanup(&self); |
| 504 | 727 | ||
| 505 | /// Extend expiry for entries about to be processed | 728 | /// Extend expiry for state/PR entries about to be processed |
| 506 | /// Ensures at least `duration` remaining | ||
| 507 | pub fn extend_expiry(&self, identifier: &str, event_ids: &[EventId], duration: Duration); | 729 | pub fn extend_expiry(&self, identifier: &str, event_ids: &[EventId], duration: Duration); |
| 508 | 730 | ||
| 509 | /// Get current counts for metrics | 731 | /// Check if an event previously expired (prevents re-sync loops) |
| 510 | pub fn count(&self) -> (usize, usize); | 732 | pub fn is_expired(&self, event_id: &EventId) -> bool; |
| 511 | } | ||
| 512 | ``` | ||
| 513 | |||
| 514 | ### Sync Queue Management | ||
| 515 | |||
| 516 | ```rust | ||
| 517 | impl Purgatory { | ||
| 518 | /// Enqueue identifier for sync with custom delay | ||
| 519 | pub fn enqueue_sync(&self, identifier: &str, delay: Duration); | ||
| 520 | |||
| 521 | /// Enqueue with default delay (3 minutes) | ||
| 522 | pub fn enqueue_sync_default(&self, identifier: &str); | ||
| 523 | |||
| 524 | /// Enqueue with immediate delay (500ms) | ||
| 525 | pub fn enqueue_sync_immediate(&self, identifier: &str); | ||
| 526 | |||
| 527 | /// Check if identifier has pending events | ||
| 528 | pub fn has_pending_events(&self, identifier: &str) -> bool; | ||
| 529 | |||
| 530 | /// Remove identifier from sync queue | ||
| 531 | pub fn remove_from_sync_queue(&self, identifier: &str); | ||
| 532 | } | 733 | } |
| 533 | ``` | 734 | ``` |
| 534 | 735 | ||
| @@ -558,12 +759,6 @@ pub fn can_apply_state( | |||
| 558 | event: &Event, | 759 | event: &Event, |
| 559 | repo_path: &Path, | 760 | repo_path: &Path, |
| 560 | ) -> Result<bool>; | 761 | ) -> Result<bool>; |
| 561 | |||
| 562 | /// Get refs from state that aren't being pushed | ||
| 563 | pub fn get_unpushed_refs( | ||
| 564 | state_refs: &[RefPair], | ||
| 565 | pushed_refs: &[RefPair], | ||
| 566 | ) -> Vec<RefPair>; | ||
| 567 | ``` | 762 | ``` |
| 568 | 763 | ||
| 569 | See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementation. | 764 | See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementation. |
| @@ -572,123 +767,37 @@ See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementat | |||
| 572 | 767 | ||
| 573 | ## Integration Points | 768 | ## Integration Points |
| 574 | 769 | ||
| 575 | ### 1. Event Policy (Nip34WritePolicy) | 770 | ### 1. Announcement Policy (`src/nostr/policy/announcement.rs`) |
| 576 | 771 | ||
| 577 | State and PR events are added to purgatory when git data doesn't exist: | 772 | Routes new announcements to purgatory or accepts replacements: |
| 578 | 773 | ||
| 579 | ```rust | 774 | - If active DB announcement exists for `(pubkey, identifier)` → `Accept` immediately |
| 580 | // From src/nostr/policy/state.rs | 775 | - If purgatory entry exists → replace it, extend expiry, return `Accept` |
| 581 | async fn handle_state(&self, event: &Event) -> WritePolicyResult { | 776 | - Otherwise → return `AcceptPurgatory`, caller calls `add_to_purgatory()` which creates bare repo and adds to purgatory |
| 582 | let identifier = extract_identifier(event)?; | ||
| 583 | |||
| 584 | // Check if we have matching git data | ||
| 585 | if self.has_matching_git_data(&identifier, event).await? { | ||
| 586 | return WritePolicyResult::Accept; | ||
| 587 | } | ||
| 588 | |||
| 589 | // Add to purgatory | ||
| 590 | self.purgatory.add_state( | ||
| 591 | event.clone(), | ||
| 592 | identifier.clone(), | ||
| 593 | event.pubkey, | ||
| 594 | ); | ||
| 595 | |||
| 596 | WritePolicyResult::Reject { | ||
| 597 | status: true, // Client sees OK | ||
| 598 | message: "purgatory: awaiting git data".into() | ||
| 599 | } | ||
| 600 | } | ||
| 601 | ``` | ||
| 602 | 777 | ||
| 603 | ### 2. Git Push Authorization | 778 | ### 2. State Event Policy (`src/nostr/policy/state.rs`) |
| 604 | 779 | ||
| 605 | Authorization checks both database and purgatory: | 780 | Checks purgatory announcements for authorization and extends their expiry: |
| 606 | 781 | ||
| 607 | ```rust | 782 | ```rust |
| 608 | // From src/git/authorization.rs | 783 | // Fetch announcements from both DB and purgatory |
| 609 | pub async fn authorize_push( | 784 | let repo_data = fetch_repository_data_with_purgatory(db, purgatory, identifier).await?; |
| 610 | database: &SharedDatabase, | 785 | |
| 611 | identifier: &str, | 786 | // For each authorized owner with a purgatory announcement, extend expiry |
| 612 | owner_pubkey: &str, | 787 | purgatory.extend_announcement_expiry(&owner_pk, &identifier, Duration::from_secs(1800)); |
| 613 | request_body: &Bytes, | ||
| 614 | purgatory: &Arc<Purgatory>, // Critical! | ||
| 615 | repo_path: &std::path::Path, | ||
| 616 | ) -> anyhow::Result<AuthorizationResult> { | ||
| 617 | // Parse pushed refs | ||
| 618 | let pushed_refs = parse_pushed_refs(request_body); | ||
| 619 | |||
| 620 | // Check database for state events | ||
| 621 | let db_result = get_authorization_from_db(database, identifier).await?; | ||
| 622 | |||
| 623 | if !db_result.authorized { | ||
| 624 | // No state in database - check purgatory | ||
| 625 | let purgatory_result = get_state_authorization_for_specific_owner_repo( | ||
| 626 | database, | ||
| 627 | identifier, | ||
| 628 | owner_pubkey, | ||
| 629 | purgatory, | ||
| 630 | &pushed_refs, | ||
| 631 | repo_path, | ||
| 632 | ).await?; | ||
| 633 | |||
| 634 | return purgatory_result; | ||
| 635 | } | ||
| 636 | |||
| 637 | db_result | ||
| 638 | } | ||
| 639 | ``` | 788 | ``` |
| 640 | 789 | ||
| 641 | ### 3. Post-Push Processing | 790 | ### 3. Git Push Authorization (`src/git/authorization.rs`) |
| 642 | 791 | ||
| 643 | After successful push, events from purgatory are saved to database: | 792 | `fetch_repository_data_with_purgatory()` merges DB announcements with purgatory announcements for authorization. On successful authorization via purgatory state events, also extends announcement expiry. |
| 644 | 793 | ||
| 645 | ```rust | 794 | ### 4. Git Data Processing (`src/git/sync.rs`) |
| 646 | // From src/git/handlers.rs | ||
| 647 | if from_purgatory { | ||
| 648 | if let (Some(db), Some(purg)) = (&database, &purgatory) { | ||
| 649 | // Save state event to database | ||
| 650 | db.save_event(&state.event).await?; | ||
| 651 | |||
| 652 | // Remove from purgatory | ||
| 653 | purg.remove_state_event(identifier, &state.event.id); | ||
| 654 | } | ||
| 655 | } | ||
| 656 | ``` | ||
| 657 | 795 | ||
| 658 | ### 4. Background Sync Loop | 796 | `process_purgatory_announcements()` is called after any git push or background sync fetch. It promotes announcements from purgatory to the database and notifies WebSocket clients. |
| 659 | 797 | ||
| 660 | Started during application initialization: | 798 | ### 5. Sync Registration (`src/sync/`) |
| 661 | 799 | ||
| 662 | ```rust | 800 | A background timer (`run_purgatory_announcement_sync`, every 5 seconds) ensures purgatory announcements are registered in `RepoSyncIndex` with `SyncLevel::StateOnly`. When an announcement is promoted, the `SelfSubscriber` upgrades it to `SyncLevel::Full`. |
| 663 | // From src/main.rs | ||
| 664 | let purgatory = Arc::new(Purgatory::new(git_data_path)); | ||
| 665 | let ctx = Arc::new(RealSyncContext::new( | ||
| 666 | database.clone(), | ||
| 667 | purgatory.clone(), | ||
| 668 | config.domain.clone(), | ||
| 669 | git_data_path.clone(), | ||
| 670 | )); | ||
| 671 | let throttle_manager = Arc::new(ThrottleManager::new(5, 30)); | ||
| 672 | throttle_manager.set_context(ctx.clone()); | ||
| 673 | |||
| 674 | // Start sync loop | ||
| 675 | let sync_handle = purgatory.clone().start_sync_loop(ctx, throttle_manager); | ||
| 676 | |||
| 677 | // Start cleanup task | ||
| 678 | let cleanup_handle = tokio::spawn(async move { | ||
| 679 | let mut interval = tokio::time::interval(Duration::from_secs(60)); | ||
| 680 | loop { | ||
| 681 | interval.tick().await; | ||
| 682 | let (state_removed, pr_removed) = purgatory.cleanup(); | ||
| 683 | if state_removed + pr_removed > 0 { | ||
| 684 | tracing::debug!( | ||
| 685 | "Purgatory cleanup removed {} state, {} PR entries", | ||
| 686 | state_removed, pr_removed | ||
| 687 | ); | ||
| 688 | } | ||
| 689 | } | ||
| 690 | }); | ||
| 691 | ``` | ||
| 692 | 801 | ||
| 693 | --- | 802 | --- |
| 694 | 803 | ||
| @@ -698,7 +807,7 @@ let cleanup_handle = tokio::spawn(async move { | |||
| 698 | src/ | 807 | src/ |
| 699 | ├── purgatory/ | 808 | ├── purgatory/ |
| 700 | │ ├── mod.rs # Main Purgatory struct and API | 809 | │ ├── mod.rs # Main Purgatory struct and API |
| 701 | │ ├── types.rs # RefPair, StatePurgatoryEntry, PrPurgatoryEntry | 810 | │ ├── types.rs # RefPair, AnnouncementPurgatoryEntry, StatePurgatoryEntry, PrPurgatoryEntry |
| 702 | │ ├── helpers.rs # Ref extraction and matching functions | 811 | │ ├── helpers.rs # Ref extraction and matching functions |
| 703 | │ └── sync/ | 812 | │ └── sync/ |
| 704 | │ ├── mod.rs # Sync module exports | 813 | │ ├── mod.rs # Sync module exports |
| @@ -710,9 +819,10 @@ src/ | |||
| 710 | ├── git/ | 819 | ├── git/ |
| 711 | │ ├── authorization.rs # authorize_push with purgatory checking | 820 | │ ├── authorization.rs # authorize_push with purgatory checking |
| 712 | │ ├── handlers.rs # handle_receive_pack with post-push processing | 821 | │ ├── handlers.rs # handle_receive_pack with post-push processing |
| 713 | │ └── sync.rs # process_newly_available_git_data | 822 | │ └── sync.rs # process_newly_available_git_data, process_purgatory_announcements |
| 714 | └── nostr/ | 823 | └── nostr/ |
| 715 | └── policy/ | 824 | └── policy/ |
| 825 | ├── announcement.rs # Route announcements to purgatory | ||
| 716 | ├── state.rs # State event policy with purgatory | 826 | ├── state.rs # State event policy with purgatory |
| 717 | └── pr_event.rs # PR event policy with purgatory | 827 | └── pr_event.rs # PR event policy with purgatory |
| 718 | ``` | 828 | ``` |
| @@ -725,7 +835,7 @@ src/ | |||
| 725 | 835 | ||
| 726 | Located in each module: | 836 | Located in each module: |
| 727 | 837 | ||
| 728 | - **[`src/purgatory/mod.rs`](../../src/purgatory/mod.rs)** - Core purgatory operations | 838 | - **[`src/purgatory/mod.rs`](../../src/purgatory/mod.rs)** - Core purgatory operations including announcement purgatory |
| 729 | - **[`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs)** - Ref matching logic | 839 | - **[`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs)** - Ref matching logic |
| 730 | - **[`src/purgatory/sync/functions.rs`](../../src/purgatory/sync/functions.rs)** - Sync functions with MockSyncContext | 840 | - **[`src/purgatory/sync/functions.rs`](../../src/purgatory/sync/functions.rs)** - Sync functions with MockSyncContext |
| 731 | - **[`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs)** - Throttle manager | 841 | - **[`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs)** - Throttle manager |
| @@ -734,6 +844,9 @@ Located in each module: | |||
| 734 | 844 | ||
| 735 | Located in [`tests/`](../../tests/): | 845 | Located in [`tests/`](../../tests/): |
| 736 | 846 | ||
| 847 | - **Announcement purgatory flow** - Announcement enters purgatory, git data promotes it | ||
| 848 | - **Announcement soft expiry** - Bare repo deleted after 30 min, event retained 24h | ||
| 849 | - **Announcement revival** - State event revives soft-expired announcement | ||
| 737 | - **State event purgatory flow** - Event arrives, git push releases it | 850 | - **State event purgatory flow** - Event arrives, git push releases it |
| 738 | - **PR event purgatory flow** - Event arrives, git push releases it | 851 | - **PR event purgatory flow** - Event arrives, git push releases it |
| 739 | - **Git-data-first flow** - Git push creates placeholder, event completes it | 852 | - **Git-data-first flow** - Git push creates placeholder, event completes it |
| @@ -744,7 +857,19 @@ Located in [`tests/`](../../tests/): | |||
| 744 | 857 | ||
| 745 | ## Key Learnings | 858 | ## Key Learnings |
| 746 | 859 | ||
| 747 | ### 1. Purgatory Authorization is Critical | 860 | ### 1. Announcement Purgatory Prevents Misleading Empty Repos |
| 861 | |||
| 862 | Without announcement purgatory, we'd serve announcements for repos with no content. Clients would see the announcement, try to clone, and get nothing. | ||
| 863 | |||
| 864 | **Solution:** Announcements wait in purgatory until git data proves content exists. | ||
| 865 | |||
| 866 | ### 2. Soft Expiry Avoids Sync Loops | ||
| 867 | |||
| 868 | The protocol's 30-minute expiry creates a problem: without soft expiry, we'd either permanently block repositories or constantly re-sync expired announcement events. | ||
| 869 | |||
| 870 | **Solution:** Soft expiry retains the event for 24 hours after deleting the bare repo, allowing revival without re-fetching. | ||
| 871 | |||
| 872 | ### 3. Purgatory Authorization is Critical | ||
| 748 | 873 | ||
| 749 | Without checking purgatory during authorization, we have a deadlock: | 874 | Without checking purgatory during authorization, we have a deadlock: |
| 750 | - State event goes to purgatory (no git data) | 875 | - State event goes to purgatory (no git data) |
| @@ -753,7 +878,7 @@ Without checking purgatory during authorization, we have a deadlock: | |||
| 753 | 878 | ||
| 754 | **Solution:** `authorize_push()` checks both database and purgatory. | 879 | **Solution:** `authorize_push()` checks both database and purgatory. |
| 755 | 880 | ||
| 756 | ### 2. Late Binding for State Events | 881 | ### 4. Late Binding for State Events |
| 757 | 882 | ||
| 758 | Extracting refs at event arrival time doesn't work when: | 883 | Extracting refs at event arrival time doesn't work when: |
| 759 | - Multiple state events arrive for same identifier | 884 | - Multiple state events arrive for same identifier |
| @@ -761,7 +886,7 @@ Extracting refs at event arrival time doesn't work when: | |||
| 761 | 886 | ||
| 762 | **Solution:** Extract and match refs at push time via `find_matching_states()`. | 887 | **Solution:** Extract and match refs at push time via `find_matching_states()`. |
| 763 | 888 | ||
| 764 | ### 3. Bidirectional Waiting for PR Events | 889 | ### 5. Bidirectional Waiting for PR Events |
| 765 | 890 | ||
| 766 | PR events can arrive before or after git data: | 891 | PR events can arrive before or after git data: |
| 767 | - Event first → Wait for git push | 892 | - Event first → Wait for git push |
| @@ -769,26 +894,13 @@ PR events can arrive before or after git data: | |||
| 769 | 894 | ||
| 770 | **Solution:** `PrPurgatoryEntry.event: Option<Event>` with `None` = placeholder. | 895 | **Solution:** `PrPurgatoryEntry.event: Option<Event>` with `None` = placeholder. |
| 771 | 896 | ||
| 772 | ### 4. Sync Queue Debouncing | ||
| 773 | |||
| 774 | When events arrive in bursts (e.g., negentropy sync), we don't want to spawn a sync task for each event. | ||
| 775 | |||
| 776 | **Solution:** `enqueue_sync()` resets `attempt_count` and updates `next_attempt` if already queued. | ||
| 777 | |||
| 778 | ### 5. Domain Throttling with Queues | ||
| 779 | |||
| 780 | When a domain is throttled, we still want to eventually sync from it. | ||
| 781 | |||
| 782 | **Solution:** `ThrottleManager` maintains per-domain queues and processes them when capacity frees. | ||
| 783 | |||
| 784 | --- | 897 | --- |
| 785 | 898 | ||
| 786 | ## Related Documentation | 899 | ## Related Documentation |
| 787 | 900 | ||
| 788 | - [Inline Authorization](inline-authorization.md) - Why purgatory checking during authorization is essential | ||
| 789 | - [Architecture Overview](architecture.md) - Full system design | 901 | - [Architecture Overview](architecture.md) - Full system design |
| 790 | - [Background Sync](../how-to/purgatory-sync.md) - How to configure and monitor sync | 902 | - [GRASP-02 Proactive Sync](grasp-02-proactive-sync.md) - Relay-to-relay event sync with SyncLevel |
| 791 | - [Test Strategy](../reference/test-strategy.md) - How we test purgatory | 903 | - [GRASP-02 Purgatory Git Data Fetching](grasp-02-proactive-sync-purgatory-git-data.md) - Background git data hunting |
| 792 | 904 | ||
| 793 | --- | 905 | --- |
| 794 | 906 | ||