upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

author: DanConwayDev <DanConwayDev@protonmail.com>, 2026-02-23 14:49:30 +0000
committer: DanConwayDev <DanConwayDev@protonmail.com>, 2026-02-23 14:49:30 +0000
commit: 4848c4029fc58f6f310a2babeae1ee82a7e41656
tree: ccdfdaae41dd2907794a47bbeff562824dd3915b /docs/explanation
parent: f19b424e01fc5a682778c5e2bb194d242efd6987
docs: update purgatory docs to reflect announcements purgatory implementation

Remove the pre-implementation planning docs (announcements-purgatory-design.md and announcements-purgatory-implementation.md) now that the feature is built. Update the three living docs to reflect what was actually implemented:

- purgatory-design.md: expanded to cover all three purgatory stores (announcement, state, PR), including AnnouncementPurgatoryEntry structure, two-phase soft expiry lifecycle, expiry extension triggers, promotion flow, and updated integration points and file structure
- grasp-02-proactive-sync.md: added SyncLevel enum (Full/StateOnly) to RepoSyncNeeds, documented the purgatory announcement sync timer as the registration path for purgatory announcements, updated filter building to describe build_sync_level_aware_filters() and StateOnly behaviour
- grasp-02-proactive-sync-purgatory-git-data.md: expanded to cover announcement purgatory as a third entry type, added Timeline E showing soft-expiry and revival, replaced the single expiry section with separate hard-expiry (state/PR) and two-phase soft-expiry (announcements) sections with full justification for the 24-hour extended retention window

Diffstat (limited to 'docs/explanation')
-rw-r--r-- docs/explanation/announcements-purgatory-design.md | 254
-rw-r--r-- docs/explanation/announcements-purgatory-implementation.md | 296
-rw-r--r-- docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md | 67
-rw-r--r-- docs/explanation/grasp-02-proactive-sync.md | 57
-rw-r--r-- docs/explanation/purgatory-design.md | 520
5 files changed, 415 insertions, 779 deletions
diff --git a/docs/explanation/announcements-purgatory-design.md b/docs/explanation/announcements-purgatory-design.md
deleted file mode 100644
index 009547b..0000000
--- a/docs/explanation/announcements-purgatory-design.md
+++ /dev/null
@@ -1,254 +0,0 @@
1# Announcements Purgatory Design
2
3## Problem Statement
4
5**Primary problem:** Serving announcement events alongside empty bare git repos misleads clients into thinking we host content.
6
7When an announcement arrives, we must create the bare repo immediately (so git pushes can succeed). But if no git data ever arrives, we serve an empty repo and its announcement indefinitely. Clients see the announcement, try to clone, and get nothing. This is misleading.
8
9**Secondary problem:** Sync downloads events for repos that may never have content.
10
11Without purgatory, sync would fetch all L2/L3 events (patches, issues, etc.) for announcements that may never receive git data. This wastes bandwidth and creates orphaned events.
12
13## Solution Overview
14
15New announcements go to **purgatory** instead of being immediately accepted:
16
171. **Announcement arrives** - Create bare repo immediately, add announcement to purgatory
182. **Git data arrives** - Promote announcement from purgatory to active (now served to clients)
193. **No git data before expiry** - Delete bare repo, discard announcement (never served)
20
21This ensures we only serve announcements for repos that actually have content.
22
23## Key Design Decisions
24
25### 1. Bare Repo Created Immediately
26
27**Decision:** Create the bare git repo when announcement enters purgatory.
28
29**Why:** Git pushes may arrive at any time. Without a repo, pushes fail.
30
31**Consequence:** We allocate disk space for repos that may expire unused. Must delete repos on expiry.
32
33### 2. Git Data Triggers Promotion
34
35**Decision:** Git data arrival promotes the announcement to active status.
36
37**Why:** Git data proves the repository has content. State events alone don't prove content exists - they could reference empty repos.
38
39**Where:** Promotion happens in the git receive path after successful push/fetch with data.
40
41### 3. Replacement Announcements Skip Purgatory
42
43**Decision:** Announcements replacing an existing active announcement are accepted immediately.
44
45**Why:** The repository is already proven active with content.
46
47**How:** Check if active announcement exists for `(pubkey, identifier)` before routing to purgatory.
48
49### 4. Expiry Extension (Two Places)
50
51**Decision:** Extend purgatory announcement expiry (reset the 30-minute protocol timer) in two scenarios:
52
53| Trigger | Location | Why |
54| ---------------------------- | ------------------------------------ | ----------------------------------- |
55| State event arrives | `StatePolicy::process_state_event()` | Repo is actively receiving metadata |
56| Git auth extends state event | `src/git/auth.rs` | Repo is actively receiving git data |
57
58**Why:** Prevents premature expiry during slow sync operations or multi-step pushes. The protocol's 30-minute expiry is intended for abandoned repositories, not active ones receiving data.
59
60### 5. Authorization Must Check Purgatory Announcements
61
62**Decision:** When validating state events or git operations, check purgatory announcements in addition to the database.
63
64**Why:** State events and git pushes may arrive before git data promotes the announcement. They still need authorization from the announcement's maintainer set.
65
66**Where:** `fetch_repository_data()` and related authorization functions must query both DB and purgatory.
67
68### 6. Sync Only State Events for Purgatory Announcements
69
70**Decision:** Purgatory announcements trigger sync for state events only, not other L2/L3 events (patches, issues, PRs, etc.).
71
72**Why:** Other L2/L3 events would be rejected anyway (no promoted announcement in DB). Syncing them wastes bandwidth and creates work for announcements that may never promote.
73
74**How:** Sync uses a `SyncLevel` concept - `Full` for promoted repos, `StateOnly` for purgatory. On promotion, upgrade to `Full`.
75
76### 7. Soft Expiry Preserves Event Without Bare Repo
77
78**Decision:** When a purgatory announcement expires (30 minutes per protocol spec), delete the bare repo but retain the announcement event for an extended period (e.g., 24h).
79
80**Why the protocol specifies 30 minutes:** The grasp protocol defines a 30-minute expiry for announcement events to ensure clients don't indefinitely cache stale repository information.
81
82**Why we implement soft expiry:** The protocol's 30-minute expiry creates a sync/storage problem. Without soft expiry, we'd either:
83
84- Add expired announcements to `failed_events` and permanently reject future state events (losing potential revival when state events arrive late)
85- Re-fetch the announcement event repeatedly on every sync cycle (wasting bandwidth and creating unnecessary sync traffic)
86
87**Behavior during soft expiry:**
88
89- Bare repo is deleted (saves disk space, respects protocol expiry)
90- Announcement event retained in purgatory with `soft_expired` flag
91- Sync continues requesting state events (same as active purgatory)
92- If state event arrives: recreate bare repo, clear `soft_expired`, extend expiry
93- If announcement republished directly to us: treat as fresh arrival
94- After extended expiry: fully remove from purgatory
95
96**In summary:** Soft expiry is an implementation optimization that prevents us from constantly re-syncing announcement events or permanently blocking repositories that receive delayed state events.
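
The two-phase behaviour described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the project's code: `Entry`, `ExpiryAction`, `on_tick`, `on_state_event`, and both constants are hypothetical names, and the 24-hour value is the "e.g." figure from the text.

```rust
use std::time::{Duration, Instant};

// Hypothetical constants: 30 minutes per the protocol, 24h as the example retention.
const ACTIVE_EXPIRY: Duration = Duration::from_secs(30 * 60);
const SOFT_EXPIRY: Duration = Duration::from_secs(24 * 60 * 60);

/// What the caller should do after a cleanup tick.
#[derive(Debug, PartialEq)]
enum ExpiryAction {
    Keep,           // not expired yet
    DeleteBareRepo, // first expiry: entry is now soft-expired
    RemoveEntry,    // extended expiry reached: drop the entry
}

struct Entry {
    expires_at: Instant,
    soft_expired: bool,
}

impl Entry {
    fn new(now: Instant) -> Self {
        Entry { expires_at: now + ACTIVE_EXPIRY, soft_expired: false }
    }

    /// Apply the expiry rules at time `now`.
    fn on_tick(&mut self, now: Instant) -> ExpiryAction {
        if now < self.expires_at {
            return ExpiryAction::Keep;
        }
        if self.soft_expired {
            return ExpiryAction::RemoveEntry;
        }
        self.soft_expired = true;
        self.expires_at = now + SOFT_EXPIRY;
        ExpiryAction::DeleteBareRepo
    }

    /// A state event resets the 30-minute timer.
    /// Returns true when the bare repo must be recreated (revival).
    fn on_state_event(&mut self, now: Instant) -> bool {
        let was_soft_expired = self.soft_expired;
        self.soft_expired = false;
        self.expires_at = now + ACTIVE_EXPIRY;
        was_soft_expired
    }
}
```

Keeping the decision separate from the side effects (repo deletion, sync unregistration) makes the lifecycle easy to unit test with synthetic `Instant` values.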
97
98## Data Structure
99
100```rust
101// Key: (owner pubkey, identifier) - identifier alone is NOT unique
102announcement_purgatory: Arc<DashMap<(PublicKey, String), AnnouncementPurgatoryEntry>>
103
104pub struct AnnouncementPurgatoryEntry {
105 pub event: Event,
106 pub identifier: String,
107 pub owner: PublicKey,
108 pub repo_path: PathBuf,
109 pub relays: HashSet<String>, // For sync registration
110 pub created_at: Instant,
111 pub expires_at: Instant,
112 pub soft_expired: bool, // Bare repo deleted, event retained
113}
114```
115
116**Indexed by `(pubkey, identifier)`** because identifier is not unique across different owners. Lookups are primarily from nostr events which have pubkey and identifier readily available.
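
To illustrate why the composite key matters, here is a minimal sketch using a plain `HashMap` instead of `DashMap`. The `Owner` alias (a `String` standing in for `PublicKey`), the trimmed `Entry` struct, and both helper functions are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the nostr `PublicKey` type.
type Owner = String;

/// Trimmed-down entry; the real struct carries the event, paths, and timers.
struct Entry {
    soft_expired: bool,
}

type Store = HashMap<(Owner, String), Entry>;

/// Exact lookup, as when a nostr event supplies both pubkey and identifier.
fn get<'a>(store: &'a Store, owner: &str, identifier: &str) -> Option<&'a Entry> {
    store.get(&(owner.to_string(), identifier.to_string()))
}

/// Scan by identifier alone: several owners may share one identifier.
fn owners_for_identifier(store: &Store, identifier: &str) -> Vec<Owner> {
    let mut owners: Vec<Owner> = store
        .keys()
        .filter(|(_, id)| id.as_str() == identifier)
        .map(|(owner, _)| owner.clone())
        .collect();
    owners.sort(); // deterministic order for callers and tests
    owners
}
```

The exact lookup is a single hash probe, while the identifier-only scan walks every key, which is acceptable as long as purgatory stays small.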
117
118## Flows
119
120### New Announcement Flow
121
122```
123Announcement arrives
124 |
125 v
126Is there an active announcement for (pubkey, identifier)?
127 |
128 +-- YES --> Accept immediately (replacement)
129 |
130 +-- NO --> Create bare repo
131 Add to purgatory
132 Return OK to client (but don't serve)
133```
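
The routing decision in this flow can be sketched as a pure function. This is illustrative only: `Route`, `route_announcement`, and the string-based `RepoKey` are hypothetical names, and creating the bare repo is left to the caller.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the (owner pubkey, identifier) key.
type RepoKey = (String, String);

#[derive(Debug, PartialEq)]
enum Route {
    /// An active announcement already exists: accept as a replacement.
    AcceptReplacement,
    /// No active announcement: create the bare repo and hold in purgatory.
    EnterPurgatory,
}

/// Decide where a newly arrived announcement goes.
fn route_announcement(active: &HashSet<RepoKey>, owner: &str, identifier: &str) -> Route {
    let key = (owner.to_string(), identifier.to_string());
    if active.contains(&key) {
        Route::AcceptReplacement
    } else {
        Route::EnterPurgatory
    }
}
```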
134
135### Git Data Arrival Flow
136
137```
138Git push/fetch completes with data
139 |
140 v
141Is there a purgatory announcement for (pubkey, identifier)?
142 |
143 +-- YES --> Promote to active (move to database)
144 | Now served to clients
145 |
146 +-- NO --> Normal processing
147```
148
149### State Event Arrival Flow
150
151```
152State event arrives
153 |
154 v
155Is there an active announcement?
156 |
157 +-- YES --> Normal validation
158 |
159 +-- NO --> Check purgatory for announcement
160 |
161 +-- Found --> Validate against purgatory announcement
162 | Extend purgatory expiry
163 | State event goes to state purgatory
164 |
165 +-- Not found --> Reject or state purgatory
166```
167
168## Edge Cases
169
170| Scenario | Behavior |
171| ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------- |
172| Git data before announcement | Push fails (no repo exists) |
173| Announcement expires, no git data | Delete bare repo, set `soft_expired` flag, retain event for extended period |
174| Soft-expired announcement fully expires | Remove from purgatory entirely |
175| State event arrives for soft-expired announcement | Recreate bare repo, clear `soft_expired`, extend expiry |
176| State expires, announcement in purgatory | Announcement keeps its own expiry |
177| Multiple owners, same identifier | Each tracked separately by `(pubkey, identifier)` |
178| **Newer announcement replaces older (same pubkey)** | Replace purgatory entry; extend its expiry and the state event expiry |
179| **Newer announcement changes services (unacceptable)** | Clear older announcement from purgatory, delete bare repo, remove state events from purgatory if exists |
180| Deletion event for purgatory announcement | Remove from purgatory, delete bare repo |
181
182## Purgatory Lifecycle
183
184An announcement progresses through purgatory states:
185
186```
187 ┌─────────────────────────────────────┐
188 │ │
189 v │
190Announcement ──> ACTIVE ──────────────────────────────────┤
191 arrives (bare repo exists) │
192 │ │
193 ├── Git data ──> PROMOTED (exit) │
194 │ │
195 ├── Deletion ──> REMOVED (exit) │
196 │ │
197 v │
198 SOFT_EXPIRED ──────────────────────────────┘
199 (bare repo deleted, ^
200 event retained) │
201 │ │
202 ├── State event arrives (revival)
203
204 └── Extended expiry ──> REMOVED (exit)
205```
206
207| Exit | Trigger | Action |
208| ------------------ | -------------------------------------------- | --------------------------------------------- |
209| **Promotion** | Git data arrives | Move to database, upgrade sync to Full |
210| **Soft expiry** | Initial timeout | Delete bare repo, retain event, continue sync |
211| **Full expiry** | Extended timeout (soft-expired) | Remove from purgatory entirely |
212| **Deletion** | Kind 5 event | Delete bare repo, remove from purgatory |
213| **Replacement** | Newer announcement (same pubkey, identifier) | Replace entry |
214| **Service change** | Newer announcement removes our service | Remove from purgatory |
215
216## Integration Points
217
218| File | Change |
219| ---------------------------------- | ---------------------------------------------------------- |
220| `src/purgatory/mod.rs` | Add `announcement_purgatory` store |
221| `src/purgatory/types.rs` | Add `AnnouncementPurgatoryEntry` |
222| `src/nostr/policy/announcement.rs` | Route new announcements to purgatory |
223| `src/git/receive.rs` | Promote on git data arrival |
224| `src/git/auth.rs` | Extend purgatory expiry when extending state event expiry |
225| `src/git/authorization.rs` | Check purgatory announcements for maintainer authorization |
226| `src/nostr/policy/state.rs` | Check purgatory for authorization |
227| `src/sync/mod.rs` | Add `SyncLevel` to `RepoSyncNeeds` |
228| `src/sync/filters.rs` | Respect sync level when building filters |
229| `src/sync/self_subscriber.rs` | Register purgatory announcements with `StateOnly` level |
230
231See [announcements-purgatory-implementation.md](./announcements-purgatory-implementation.md) for detailed implementation notes.
232
233## Testing
234
235- Announcement to purgatory, git data promotes it
236- Announcement soft-expires without git data (repo deleted, event retained)
237- State event revives soft-expired announcement (repo recreated)
238- Soft-expired announcement fully expires after extended period
239- State event extends purgatory expiry
240- Git auth extends purgatory expiry
241- Newer announcement replaces older in purgatory
242- Service change clears purgatory entry
243- `(pubkey, identifier)` indexing with multiple owners
244- Sync requests only state events for purgatory announcements
245- Sync upgrades to full on promotion
246
247## Risks
248
249| Risk | Mitigation |
250| ------------------------------------ | ------------------------------------------------------ |
251| Disk exhaustion from purgatory repos | Short expiry, soft expiry deletes repo early |
252| Race between promotion and expiry | Atomic operations |
253| Sync re-fetching expired events | Soft expiry retains event; no need for `failed_events` |
254| Filter explosion from many purgatory | Existing consolidation handles this (threshold at 70) |
diff --git a/docs/explanation/announcements-purgatory-implementation.md b/docs/explanation/announcements-purgatory-implementation.md
deleted file mode 100644
index 263c253..0000000
--- a/docs/explanation/announcements-purgatory-implementation.md
+++ /dev/null
@@ -1,296 +0,0 @@
1# Announcements Purgatory Implementation Details
2
3This document provides detailed implementation notes for the [Announcements Purgatory Design](./announcements-purgatory-design.md).
4
5## Sync Integration
6
7### Current Sync Architecture
8
9The sync system uses a two-index approach:
10
11```rust
12// What we WANT to sync - source of truth from self-subscription
13// Key: repo addressable ref (30617:pubkey:identifier)
14pub type RepoSyncIndex = Arc<RwLock<HashMap<String, RepoSyncNeeds>>>;
15
16pub struct RepoSyncNeeds {
17 pub relays: HashSet<String>, // Relay URLs from announcement
18 pub root_events: HashSet<EventId>, // 1617/1618/1621 event IDs
19}
20
21// What we have CONFIRMED syncing + connection state
22// Key: relay URL
23pub type RelaySyncIndex = Arc<RwLock<HashMap<String, RelayState>>>;
24```
25
26**Three-Layer Sync Strategy:**
271. **Layer 1:** Announcements (kinds 30617, 10317)
282. **Layer 2:** Repo-tagging events (events with `a`/`A`/`q` tags + kind 30618 by identifier)
293. **Layer 3:** Root-event-tagging events (events with `e`/`E`/`q` tags)
30
31### Adding SyncLevel
32
33Add a `sync_level` field to distinguish purgatory from promoted repos:
34
35```rust
36#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
37pub enum SyncLevel {
38 #[default]
39 Full, // L2 + L3 (promoted repos)
40 StateOnly, // Only state events (purgatory announcements)
41}
42
43pub struct RepoSyncNeeds {
44 pub relays: HashSet<String>,
45 pub root_events: HashSet<EventId>,
46 pub sync_level: SyncLevel, // NEW
47}
48```
49
50### Filter Building Changes
51
52In `src/sync/filters.rs`, modify filter building to respect sync level:
53
54```rust
55// For StateOnly repos, only build state event filters
56pub fn build_layer2_and_layer3_filters(
57 repos: &HashMap<String, RepoSyncNeeds>,
58 // ...
59) -> Vec<Filter> {
60 let (full_repos, state_only_repos): (Vec<_>, Vec<_>) = repos
61 .iter()
62 .partition(|(_, needs)| needs.sync_level == SyncLevel::Full);
63
64 let mut filters = Vec::new();
65
66 // Full repos get all L2/L3 filters
67 if !full_repos.is_empty() {
68 filters.extend(tagged_one_of_our_repo_event_filters(&full_repos));
69 filters.extend(state_event_filters_for_our_repos(&full_repos));
70 filters.extend(tagged_one_of_our_root_event_filters(&full_repos));
71 }
72
73 // StateOnly repos get only state event filters
74 if !state_only_repos.is_empty() {
75 filters.extend(state_event_filters_for_our_repos(&state_only_repos));
76 }
77
78 filters
79}
80```
81
82The existing `state_event_filters_for_our_repos()` function already builds kind 30618 filters with `#d` tags, which is exactly what we need.
83
84### Self-Subscriber Changes
85
86In `src/sync/self_subscriber.rs`, add purgatory announcements to the sync index:
87
88```rust
89// When announcement enters purgatory
90fn on_announcement_to_purgatory(
91 &self,
92 event: &Event,
93 identifier: &str,
94 relays: HashSet<String>,
95) {
96 let key = format!("30617:{}:{}", event.pubkey, identifier);
97 let mut index = self.repo_sync_index.write().unwrap();
98 index.insert(key, RepoSyncNeeds {
99 relays,
100 root_events: HashSet::new(),
101 sync_level: SyncLevel::StateOnly,
102 });
103}
104
105// When announcement promotes to database
106fn on_announcement_promoted(
107 &self,
108 event: &Event,
109 identifier: &str,
110) {
111 let key = format!("30617:{}:{}", event.pubkey, identifier);
112 let mut index = self.repo_sync_index.write().unwrap();
113 if let Some(needs) = index.get_mut(&key) {
114 needs.sync_level = SyncLevel::Full;
115 }
116}
117```
118
119### Algorithm Changes
120
121In `src/sync/algorithms.rs`, preserve sync level when inverting repo->relay:
122
123```rust
124pub fn derive_relay_targets(
125 repo_index: &RepoSyncIndex,
126) -> HashMap<String, RelaySyncNeeds> {
127 // ... existing inversion logic ...
128 // Ensure sync_level is preserved/aggregated per relay
129 // A relay gets Full if ANY of its repos are Full
130}
131```
132
133## Authorization Integration
134
135### Current Authorization Flow
136
137Authorization lookups happen in `src/git/authorization.rs`:
138
139| Function | Purpose | Currently Queries |
140|----------|---------|-------------------|
141| `fetch_repository_data()` | Get announcements + states by identifier | DB only |
142| `collect_authorized_maintainers()` | Build maintainer set from announcements | DB only |
143| `pubkey_authorised_for_repo_owners()` | Check if pubkey authorized | DB only |
144
145### Required Changes
146
147Modify `fetch_repository_data()` to also query purgatory:
148
149```rust
150pub async fn fetch_repository_data(
151 db: &Database,
152 purgatory: &Purgatory, // NEW parameter
153 identifier: &str,
154) -> Result<RepositoryData> {
155 // Existing DB query
156 let db_events = db.query(/* kind 30617, 30618 by identifier */).await?;
157
158 // NEW: Also check purgatory for announcements
159 let purgatory_announcements = purgatory
160 .get_announcements_by_identifier(identifier);
161
162 // Merge results
163 let mut announcements = parse_announcements(db_events);
164 announcements.extend(purgatory_announcements);
165
166 // ... rest of function
167}
168```
169
170This affects:
171- `StatePolicy::process_state_event()` - state event validation
172- `get_state_authorization_for_specific_owner_repo()` - git push authorization
173- `AnnouncementPolicy::is_maintainer_in_any_announcement()` - maintainer exception
174
175## Purgatory Store Changes
176
177### New Fields
178
179```rust
180pub struct AnnouncementPurgatoryEntry {
181 pub event: Event,
182 pub identifier: String,
183 pub owner: PublicKey,
184 pub repo_path: PathBuf,
185 pub relays: HashSet<String>, // For sync registration
186 pub created_at: Instant,
187 pub expires_at: Instant,
188 pub soft_expired: bool, // Bare repo deleted, event retained
189}
190```
191
192### New Methods
193
194```rust
195impl Purgatory {
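196 /// Get announcements by identifier (for authorization). Returns owned clones, because references into a DashMap cannot be returned as plain borrows (assumes the entry type derives Clone).
197 pub fn get_announcements_by_identifier(
198 &self,
199 identifier: &str,
200 ) -> Vec<AnnouncementPurgatoryEntry> {
201 self.announcement_purgatory
202 .iter()
203 .filter(|entry| entry.identifier == identifier)
204 .map(|entry| entry.value().clone()).collect()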
205 }
206
207 /// Transition to soft-expired state (protocol's 30min expiry reached)
208 pub fn soft_expire_announcement(
209 &self,
210 key: &(PublicKey, String),
211 ) -> Option<PathBuf> {
212 if let Some(mut entry) = self.announcement_purgatory.get_mut(key) {
213 entry.soft_expired = true;
214 entry.expires_at = Instant::now() + SOFT_EXPIRY_DURATION; // e.g., 24h extended retention
215 Some(entry.repo_path.clone()) // Return path for bare repo deletion
216 } else {
217 None
218 }
219 }
220
221 /// Revive soft-expired announcement when state event arrives
222 /// (caller must recreate bare repo)
223 pub fn revive_announcement(
224 &self,
225 key: &(PublicKey, String),
226 ) -> Option<PathBuf> {
227 if let Some(mut entry) = self.announcement_purgatory.get_mut(key) {
228 if entry.soft_expired {
229 entry.soft_expired = false;
230 entry.expires_at = Instant::now() + ACTIVE_EXPIRY_DURATION; // Reset 30min protocol timer
231 return Some(entry.repo_path.clone()); // Caller recreates bare repo
232 }
233 }
234 None
235 }
236}
237```
238
239## Expiry Cleanup Task
240
241The existing cleanup task needs to handle the two-phase expiry:
242
243```rust
244async fn cleanup_expired_announcements(&self) {
245 let now = Instant::now();
246
247 // Collect expired keys first: removing or mutating entries while a
248 // DashMap iterator is alive can deadlock on the shard lock.
249 let expired: Vec<_> = self.announcement_purgatory.iter()
250 .filter(|entry| entry.expires_at <= now)
251 .map(|entry| (entry.key().clone(), entry.soft_expired))
252 .collect();
253
254 for (key, soft_expired) in expired {
255 if soft_expired {
256 // Fully expired - remove entirely
257 self.announcement_purgatory.remove(&key);
258 self.unregister_from_sync(&key);
259 } else if let Some(repo_path) = self.soft_expire_announcement(&key) {
260 // First expiry - transition to soft-expired (stays in sync index as StateOnly)
261 delete_bare_repo(&repo_path).await;
262 }
263 }
264}
265```
266
267## State Event Revival Flow
268
269When a state event arrives for a soft-expired announcement, the state policy must:
270
2711. Check purgatory for a matching announcement (in addition to DB)
2722. Validate authorization against the purgatory announcement
2733. If soft-expired, call `revive_announcement()` and recreate the bare repo
2744. Extend the announcement's expiry (reset the 30-minute protocol timer)
2755. Route the state event to state purgatory
276
277**Why revival is necessary:** Without soft expiry + revival, late-arriving state events would either be permanently rejected (if we added the announcement to `failed_events`) or cause constant re-syncing of the announcement event. Revival allows us to respect the protocol's 30-minute expiry while still handling delayed state events gracefully.
278
279The exact integration will depend on the current structure of `StatePolicy::process_state_event()` - see implementation phase for details.
280
281## File Change Summary
282
283| File | Estimated Lines | Changes |
284|------|-----------------|---------|
285| `src/sync/mod.rs` | ~10 | Add `SyncLevel` enum, field to `RepoSyncNeeds` |
286| `src/sync/filters.rs` | ~20 | Partition repos by sync level, build appropriate filters |
287| `src/sync/algorithms.rs` | ~15 | Preserve sync level in relay target derivation |
288| `src/sync/self_subscriber.rs` | ~40 | Register purgatory announcements, handle promotion |
289| `src/purgatory/mod.rs` | ~80 | Add announcement store, soft expiry methods |
290| `src/purgatory/types.rs` | ~20 | Add `AnnouncementPurgatoryEntry` |
291| `src/git/authorization.rs` | ~30 | Query purgatory in `fetch_repository_data()` |
292| `src/nostr/policy/state.rs` | ~40 | Handle soft-expired revival |
293| `src/nostr/policy/announcement.rs` | ~30 | Route to purgatory, check for replacements |
294| `src/git/receive.rs` | ~20 | Trigger promotion on git data |
295
296**Total: ~305 lines of changes**
diff --git a/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md b/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md
index 31c3e46..8fb5798 100644
--- a/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md
+++ b/docs/explanation/grasp-02-proactive-sync-purgatory-git-data.md
@@ -12,7 +12,13 @@
 
 ## Overview
 
-When Nostr events arrive before their git data, they enter **purgatory** waiting to be served. But they don't wait passively—ngit-grasp **actively hunts** for the missing git data across all git servers assoicated with the repo until it finds what it needs.
+When Nostr events arrive before their git data, they enter **purgatory** waiting to be served. But they don't wait passively—ngit-grasp **actively hunts** for the missing git data across all git servers associated with the repo until it finds what it needs.
+
+This applies to three types of purgatory entries:
+
+- **Announcement purgatory** — kind 30617 announcements waiting for a git push to prove the repo has content
+- **State event purgatory** — kind 30618 state events waiting for their referenced git objects
+- **PR event purgatory** — kind 1617/1618 PR events waiting for their referenced commits
 
 ### How It Works
 
@@ -42,6 +48,7 @@ We respect remote server capacity with:
 ✅ **Respectful throttling** - 5 concurrent + 30/min per domain, plays nice with other implementations
 ✅ **Smart timing** - 3min delay for user pushes, 500ms for synced events
 ✅ **30min expiry** - Auto-cleanup of events when data never arrives
+✅ **Soft expiry for announcements** - Bare repo deleted at 30min, event retained 24h to allow revival
 ✅ **Fully testable** - Mock-based architecture for reliable unit tests
 
 ---
@@ -73,6 +80,16 @@ Timeline D: Data never arrives
   t=60s: Retry → all servers checked, no data
   ...
   t=1800s: 30 minutes expired → event discarded, purgatory cleaned up 🗑️
+
+Timeline E: Announcement purgatory (no git data within 30 min)
+  t=0s: Announcement received → bare repo created, enters announcement purgatory
+  t=0.5s: Start hunting git servers for any content
+  ...
+  t=1800s: 30 minutes expired → bare repo deleted, event retained (soft_expired=true)
+  t=3600s: State event arrives (slow sync) → bare repo recreated, expiry reset ✅
+  t=5400s: Git push arrives → announcement promoted to DB, served to clients ✅
+  OR
+  t=86400s: 24 hours elapsed, no revival → event added to expired_events, removed 🗑️
 ```
 
 **Without proactive sync**: Events in Timeline C would wait indefinitely (or until manual git push).
@@ -330,11 +347,11 @@ Both methods check `has_capacity()` and trigger `try_process_next()` if true.
 
 ---
 
-## 30-Minute Purgatory Expiry
+## Purgatory Expiry
 
-Purgatory entries **automatically expire** after 30 minutes to prevent unbounded memory growth.
+### State and PR Events: 30-Minute Hard Expiry
 
-### Why 30 Minutes?
+State and PR purgatory entries **automatically expire** after 30 minutes.
 
 From the [GRASP-01 spec](https://github.com/DanConwayDev/grasp/blob/main/01.md#purgatory):
 
@@ -346,25 +363,40 @@ This balances:
 - 🧹 **Short enough** to prevent memory leaks from abandoned events
 - 🔄 **Recoverable** events are still on other relays and can be re-submitted
 
-### Implementation
+Each entry tracks `expires_at: Instant` (30 min from creation). The sync loop checks expiry before processing via `has_pending_events()`. If all events for an identifier have expired, the identifier is removed from the sync queue.
 
-Each purgatory entry tracks:
+To prevent infinite re-sync loops, expired event IDs are added to an `expired_events` set. If a sync delivers an event that previously expired, it is rejected with `"previously expired from purgatory without git data"`.
 
-- `created_at: Instant` - When added to purgatory
-- `expires_at: Instant` - When to discard (created_at + 30min)
+**Implementation**: [`src/purgatory/mod.rs:DEFAULT_EXPIRY`](../../src/purgatory/mod.rs)
 
-The main sync loop checks expiry before processing:
+### Announcement Purgatory: Two-Phase Soft Expiry
 
-```rust
-if !self.has_pending_events(&identifier) {
-    // No events remain (expired or released) → remove from sync queue
-    self.sync_queue.remove(&identifier);
-}
-```
+Announcements use a different expiry strategy because they have an additional concern: the bare git repo created on arrival must be cleaned up, but we also need to avoid re-syncing the announcement event on every sync cycle.
 
-**Note**: Expiry is checked implicitly via `has_pending_events()`. If all events for an identifier have expired, the identifier is removed from the sync queue.
+**Phase 1 — Initial 30-minute expiry:**
 
-**Implementation**: [`src/purgatory/mod.rs:DEFAULT_EXPIRY`](../../src/purgatory/mod.rs)
+- Delete the bare git repo (frees disk space, respects the protocol's 30-minute expiry)
+- Set `soft_expired = true` on the entry
+- Extend `expires_at` by **24 hours** (`SOFT_EXPIRY_EXTENDED`)
+- Continue syncing state events for this repo (same as active purgatory)
+
+**Phase 2 — 24-hour soft expiry:**
+
+- Add event ID to `expired_events` (prevents re-sync loops)
+- Remove entry completely from `announcement_purgatory`
+
+**Why not just hard-expire at 30 minutes?**
+
+The protocol's 30-minute expiry creates a dilemma for announcements:
+
+- **Option A: Add to `failed_events` at 30 min** → Permanently rejects future state events, losing potential revival when state events arrive late (e.g. from a slow sync)
+- **Option B: Remove entirely at 30 min** → The announcement gets re-fetched on every subsequent sync cycle, wasting bandwidth indefinitely
+
+Soft expiry is the solution: the bare repo is deleted at 30 minutes (respecting the protocol), but the event is retained for 24 hours. During this window, a late-arriving state event can **revive** the announcement—`extend_announcement_expiry()` recreates the bare repo, clears `soft_expired`, and resets the 30-minute timer. After 24 hours with no revival, the event is added to `expired_events` and fully removed.
+
+**Why 24 hours specifically?** This covers the worst-case sync delay. A relay that was offline for up to 24 hours will re-sync state events when it reconnects. The 24-hour window ensures announcements remain revivable throughout that period without permanently occupying disk space.
+
+**Implementation**: [`src/purgatory/mod.rs:SOFT_EXPIRY_EXTENDED`](../../src/purgatory/mod.rs)
 
 ---
 
@@ -670,6 +702,7 @@ The purgatory sync system is a sophisticated, production-ready implementation th
 ✅ **Throttles respectfully** - 5 concurrent + 30/min per domain, round-robin fairness
 ✅ **Times strategically** - 3min for user events, 500ms for synced events
 ✅ **Expires responsibly** - 30min auto-cleanup prevents memory leaks
+✅ **Soft-expires announcements** - Bare repo deleted at 30min, event retained 24h for revival
 ✅ **Tests thoroughly** - Mock-based architecture enables comprehensive unit tests
 
 This design ensures ngit-grasp can serve repositories reliably even when git data and Nostr events arrive out-of-order or from different sources, while respecting remote server capacity and providing excellent observability.
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index ed8fdbf..6696e27 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
@@ -47,20 +47,37 @@ This state starts afresh when the binary loads.
47### RepoSyncIndex (Source of Truth) 47### RepoSyncIndex (Source of Truth)
48 48
49```rust 49```rust
50/// What we WANT to sync - derived from events received via self-subscription. 50/// What we WANT to sync - derived from events received via self-subscription
51/// Updated immediately when self-subscriber batch fires. 51/// and from purgatory announcements.
52/// Updated immediately when self-subscriber batch fires or purgatory sync timer runs.
52/// Key: repo addressable ref - 30617:pubkey:identifier 53/// Key: repo addressable ref - 30617:pubkey:identifier
53pub type RepoSyncIndex = Arc<RwLock<HashMap<String, RepoSyncNeeds>>>; 54pub type RepoSyncIndex = Arc<RwLock<HashMap<String, RepoSyncNeeds>>>;
54 55
56/// Controls which sync filters are built for a repo
57#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
58pub enum SyncLevel {
59 #[default]
60 Full, // Full L2 + L3 sync (promoted repos with git data)
61 StateOnly, // Only state events (kind 30618) — for purgatory announcements
62}
63
55#[derive(Debug, Clone, Default)] 64#[derive(Debug, Clone, Default)]
56pub struct RepoSyncNeeds { 65pub struct RepoSyncNeeds {
57 /// Relay URLs listed in this repo's 30617 announcement 66 /// Relay URLs listed in this repo's 30617 announcement
58 pub relays: HashSet<String>, 67 pub relays: HashSet<String>,
59 /// Root event IDs - 1617/1618/1621 - that reference this repo 68 /// Root event IDs - 1617/1618/1621 - that reference this repo
60 pub root_events: HashSet<EventId>, 69 pub root_events: HashSet<EventId>,
70 /// Controls which filters are built: Full (L2+L3) or StateOnly (kind 30618 only)
71 pub sync_level: SyncLevel,
61} 72}
62``` 73```
63 74
75**Two sources populate `RepoSyncIndex`:**
76
771. **`SelfSubscriber`** — monitors the relay's own event stream for accepted announcements (kinds 30617, 1617, 1618, 1621). Adds entries with `SyncLevel::Full`. When an announcement is promoted from purgatory to the database, the SelfSubscriber sees it and upgrades the entry to `Full`.
78
792. **Purgatory announcement sync timer** (`run_purgatory_announcement_sync`, every 5 seconds) — iterates `purgatory.announcements_for_sync()` and ensures each purgatory announcement has a `SyncLevel::StateOnly` entry in `RepoSyncIndex`. This is the only registration path for purgatory announcements because they are not saved to the database and therefore never seen by the SelfSubscriber.
80
64### RelaySyncIndex (Confirmed State + Connection) 81### RelaySyncIndex (Confirmed State + Connection)
65 82
66```rust 83```rust
@@ -336,7 +353,23 @@ The sync system uses three background tasks that run continuously:
336 353
3371. Queue events to `PendingUpdates` 3541. Queue events to `PendingUpdates`
3382. Timer fires (interval, does not reset on events) 3552. Timer fires (interval, does not reset on events)
3393. Process batch: update RepoSyncIndex → derive targets → send AddFilters to SyncManager 3563. Process batch: update RepoSyncIndex with `SyncLevel::Full` → derive targets → send AddFilters to SyncManager
357
358**Note**: The SelfSubscriber only sees announcements that have been accepted to the database (promoted from purgatory). Purgatory announcements are registered separately by the purgatory sync timer (see below).
359
360### 4. Purgatory Announcement Sync Timer (`run_purgatory_announcement_sync`)
361
362**Purpose**: Register purgatory announcements in `RepoSyncIndex` so state events are synced for them
363
364**Interval**: Every 5 seconds (200ms in test mode)
365
366**Flow**:
367
3681. Iterate `purgatory.announcements_for_sync()`
3692. For each announcement not already in `RepoSyncIndex`: insert with `SyncLevel::StateOnly`
3703. When an announcement is promoted (git data arrives), the SelfSubscriber sees the newly accepted event and upgrades the entry to `SyncLevel::Full`
371
372**Why a separate timer?** Purgatory announcements are never saved to the database, so the SelfSubscriber never sees them. The timer bridges this gap, ensuring state events are synced for repos that may still receive git data.
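
One tick of this timer can be sketched as follows. This is an illustration only: `PurgatoryAnnouncement`, `register_purgatory_announcements` and `upgrade_to_full` are hypothetical names, `RepoSyncNeeds` is trimmed to two fields, and a plain `HashMap` stands in for the `Arc<RwLock<...>>` index.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SyncLevel {
    #[default]
    Full,
    StateOnly,
}

/// Trimmed version of the struct shown earlier (root_events omitted).
#[derive(Debug, Clone, Default)]
pub struct RepoSyncNeeds {
    pub relays: HashSet<String>,
    pub sync_level: SyncLevel,
}

/// Hypothetical, simplified view of one purgatory announcement.
pub struct PurgatoryAnnouncement {
    /// Addressable ref, e.g. "30617:<pubkey>:<identifier>".
    pub repo_ref: String,
    pub relays: HashSet<String>,
}

/// One timer tick: insert a StateOnly entry for every purgatory announcement
/// that is not yet in the index. Returns how many entries were added.
pub fn register_purgatory_announcements(
    index: &mut HashMap<String, RepoSyncNeeds>,
    announcements: &[PurgatoryAnnouncement],
) -> usize {
    let mut added = 0;
    for ann in announcements {
        // Existing entries are left untouched, so a Full entry is never downgraded.
        if let Entry::Vacant(slot) = index.entry(ann.repo_ref.clone()) {
            slot.insert(RepoSyncNeeds {
                relays: ann.relays.clone(),
                sync_level: SyncLevel::StateOnly,
            });
            added += 1;
        }
    }
    added
}

/// What the SelfSubscriber path does after promotion, reduced to the level change.
pub fn upgrade_to_full(index: &mut HashMap<String, RepoSyncNeeds>, repo_ref: &str) {
    index.entry(repo_ref.to_string()).or_default().sync_level = SyncLevel::Full;
}
```

Because the tick only fills vacant slots, running it every few seconds is idempotent: repeated ticks over the same purgatory contents add nothing.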
340 373
341--- 374---
342 375
@@ -602,9 +635,10 @@ flowchart TB
602 635
603- Self-subscriber monitors own relay for 30617, 1617, 1618, 1621 (NOT 1619 or 30618) 636- Self-subscriber monitors own relay for 30617, 1617, 1618, 1621 (NOT 1619 or 30618)
604- Batches events in `PendingUpdates` (5 second window via interval timer) 637- Batches events in `PendingUpdates` (5 second window via interval timer)
605- `process_batch()` updates RepoSyncIndex, then builds AddFilters **directly** (no compute_actions) 638- `process_batch()` updates RepoSyncIndex with `SyncLevel::Full`, then builds AddFilters **directly** (no compute_actions)
606- AddFilters sent via channel to SyncManager, which calls `handle_new_sync_filters()` 639- AddFilters sent via channel to SyncManager, which calls `handle_new_sync_filters()`
607- This path does NOT use compute_actions because it's building fresh filters from the updated index 640- This path does NOT use compute_actions because it's building fresh filters from the updated index
641- Purgatory announcements (not in DB) are registered separately by the purgatory sync timer with `SyncLevel::StateOnly`
608 642
609--- 643---
610 644
@@ -687,16 +721,23 @@ fn compute_actions(
687- **Tags**: lowercase `a`, uppercase `A`, and `q` tags for comprehensive coverage 721- **Tags**: lowercase `a`, uppercase `A`, and `q` tags for comprehensive coverage
688- **Batching**: Per 100 repo refs 722- **Batching**: Per 100 repo refs
689- **Function**: `build_repo_tag_filters(repos, since)` 723- **Function**: `build_repo_tag_filters(repos, since)`
724- **Only for `SyncLevel::Full` repos** — purgatory announcements (`StateOnly`) skip this layer
690 725
691### Layer 3: Events Tagging Our Root Events 726### Layer 3: Events Tagging Our Root Events
692 727
693- **Tags**: lowercase `e`, uppercase `E`, and `q` tags for comprehensive coverage 728- **Tags**: lowercase `e`, uppercase `E`, and `q` tags for comprehensive coverage
694- **Batching**: Per 100 event IDs 729- **Batching**: Per 100 event IDs
695- **Function**: `build_root_event_tag_filters(root_events, since)` 730- **Function**: `build_root_event_tag_filters(root_events, since)`
731- **Only for `SyncLevel::Full` repos** — purgatory announcements (`StateOnly`) skip this layer
732
733### Combined Layer 2+3 (SyncLevel-Aware)
734
735The `build_sync_level_aware_filters()` function combines both layers, partitioning repos by `SyncLevel`:
696 736
697### Combined Layer 2+3 737- **`Full` repos**: state event filters + repo-tag filters + root-event-tag filters
738- **`StateOnly` repos**: state event filters only (kind 30618 with `#d` tags)
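
The partitioning step might look like the sketch below. It does not build real Nostr filters: `FilterPlan`, `plan_filters` and `repo_tag_batches` are hypothetical names used only to show which repos feed which filter family, with the batch size of 100 taken from the Layer 2 description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncLevel {
    Full,
    StateOnly,
}

/// Hypothetical result of partitioning: which repo refs feed which filters.
pub struct FilterPlan {
    /// Every repo gets state event filters (kind 30618).
    pub state_event_repos: Vec<String>,
    /// Only Full repos get the Layer 2 and Layer 3 filters.
    pub full_repos: Vec<String>,
}

impl FilterPlan {
    /// Layer 2 batching as described above: at most 100 repo refs per filter.
    pub fn repo_tag_batches(&self) -> Vec<Vec<String>> {
        self.full_repos.chunks(100).map(|c| c.to_vec()).collect()
    }
}

/// Partition repos by sync level.
pub fn plan_filters(repos: &[(String, SyncLevel)]) -> FilterPlan {
    let mut plan = FilterPlan {
        state_event_repos: Vec::new(),
        full_repos: Vec::new(),
    };
    for (repo_ref, level) in repos {
        plan.state_event_repos.push(repo_ref.clone());
        if *level == SyncLevel::Full {
            plan.full_repos.push(repo_ref.clone());
        }
    }
    plan
}
```

In the real code this partitioning lives in `build_sync_level_aware_filters()`.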
698 739
699The `build_layer2_and_layer3_filters()` function combines both layers. Used by: 740Used by:
700 741
701- `recompute_new_sync_filters_for_relay` for new item subscriptions 742- `recompute_new_sync_filters_for_relay` for new item subscriptions
702- `reconstruct_filters` for rebuilding from confirmed state 743- `reconstruct_filters` for rebuilding from confirmed state
@@ -871,9 +912,9 @@ flowchart TB
871 912
872``` 913```
873src/sync/ 914src/sync/
874├── mod.rs # SyncManager, main loop, data structures 915├── mod.rs # SyncManager, main loop, data structures, SyncLevel, run_purgatory_announcement_sync
875├── algorithms.rs # derive_relay_targets(), compute_actions() 916├── algorithms.rs # derive_relay_targets(), compute_actions()
876├── filters.rs # build_announcement_filter(), build_layer2_and_layer3_filters() 917├── filters.rs # build_announcement_filter(), build_sync_level_aware_filters()
877├── health.rs # RelayHealthTracker with exponential backoff 918├── health.rs # RelayHealthTracker with exponential backoff
878├── relay_connection.rs # RelayConnection, RelayEvent handling 919├── relay_connection.rs # RelayConnection, RelayEvent handling
879├── self_subscriber.rs # SelfSubscriber with batching 920├── self_subscriber.rs # SelfSubscriber with batching
diff --git a/docs/explanation/purgatory-design.md b/docs/explanation/purgatory-design.md
index b984745..bd792d4 100644
--- a/docs/explanation/purgatory-design.md
+++ b/docs/explanation/purgatory-design.md
@@ -8,7 +8,11 @@
8 8
9## Overview 9## Overview
10 10
11Purgatory is an in-memory holding area that solves the **"which arrives first?"** problem in GRASP. Either nostr events or git pushes can arrive in any order: 11Purgatory is an in-memory holding area that solves two related problems in GRASP:
12
13### Problem 1: "Which arrives first?" (State and PR events)
14
15Either nostr events or git pushes can arrive in any order:
12 16
13- **Event first**: Event waits in purgatory until git data arrives 17- **Event first**: Event waits in purgatory until git data arrives
14- **Git first**: Placeholder waits in purgatory until event arrives 18- **Git first**: Placeholder waits in purgatory until event arrives
@@ -19,6 +23,18 @@ When both halves arrive, they are processed together and saved to the database.
19 23
20> Accepted repo state announcements, PRs and PR Updates SHOULD be accepted with message "purgatory: won't be served until git data arrives" and kept in purgatory (not served) until the related git data arrives and otherwise discarded after 30 minutes. 24> Accepted repo state announcements, PRs and PR Updates SHOULD be accepted with message "purgatory: won't be served until git data arrives" and kept in purgatory (not served) until the related git data arrives and otherwise discarded after 30 minutes.
21 25
26### Problem 2: Misleading empty repository announcements
27
28When a repository announcement arrives, we must create the bare git repo immediately so pushes can succeed. But if no git data ever arrives, we would serve an empty repo and its announcement indefinitely—clients see the announcement, try to clone, and get nothing.
29
30**Solution**: New announcements go to **announcement purgatory** instead of being immediately accepted:
31
321. **Announcement arrives** → Create bare repo immediately, add announcement to purgatory
332. **Git data arrives** → Promote announcement from purgatory to active (now served to clients)
343. **No git data before expiry** → Delete bare repo at 30 minutes, then discard the announcement once the 24-hour soft-expiry window ends (never served)
35
36This ensures we only serve announcements for repos that actually have content.
37
22--- 38---
23 39
24## Key Design Principles 40## Key Design Principles
@@ -31,16 +47,15 @@ Purgatory data is **not persisted** to disk. On restart, all purgatory entries a
31- Git data can be re-pushed 47- Git data can be re-pushed
32- 30-minute expiry means data is transient anyway 48- 30-minute expiry means data is transient anyway
33 49
34### 2. Separate Storage for State vs PR Events 50### 2. Separate Storage for Each Event Type
35
36State events (kind 30618) and PR events (kind 1617/1618) have fundamentally different matching patterns:
37 51
38| Event Type | Index | Matching Strategy | 52| Store | Index | Purpose |
39|------------|-------|-------------------| 53|-------|-------|---------|
40| **State Events** | `identifier` (d tag) | Compare refs at push time | 54| `announcement_purgatory` | `(PublicKey, String)` — `(owner, identifier)` | Announcements awaiting git data |
41| **PR Events** | `event_id` (hex string) | Direct match via `refs/nostr/<event-id>` | 55| `state_events` | `identifier` (d tag) | State events awaiting git data |
56| `pr_events` | `event_id` (hex string) | PR events awaiting git data |
42 57
43They use **separate DashMap stores** for efficient concurrent access. 58Announcement purgatory uses `(pubkey, identifier)` because identifier alone is not unique across different owners.
44 59
45### 3. Late Binding for State Events 60### 3. Late Binding for State Events
46 61
@@ -78,7 +93,23 @@ With purgatory checking during authorization:
782. Git push arrives → Checks **database + purgatory** → State found → **AUTHORIZED** ✅ 932. Git push arrives → Checks **database + purgatory** → State found → **AUTHORIZED** ✅
793. After push succeeds → Save event to database → Remove from purgatory 943. After push succeeds → Save event to database → Remove from purgatory
80 95
81See [`src/git/authorization.rs:51-162`](../../src/git/authorization.rs) for implementation. 96See [`src/git/authorization.rs`](../../src/git/authorization.rs) for implementation.
97
98### 6. Announcement Purgatory: Bare Repo Created Immediately
99
100**Decision:** Create the bare git repo when announcement enters purgatory.
101
102**Why:** Git pushes may arrive at any time. Without a repo, pushes fail.
103
104**Consequence:** We allocate disk space for repos that may expire unused, so the bare repo must be deleted on expiry.
105
106### 7. Replacement Announcements Skip Purgatory
107
108**Decision:** Announcements replacing an existing active (database) announcement are accepted immediately.
109
110**Why:** The repository is already proven active with content.
111
112**How:** Check if active announcement exists for `(pubkey, identifier)` before routing to purgatory.
82 113
83--- 114---
84 115
@@ -103,22 +134,54 @@ pub struct RefUpdate {
103} 134}
104``` 135```
105 136
137### Announcement Purgatory Entry
138
139```rust
140pub struct AnnouncementPurgatoryEntry {
141 /// The kind 30617 announcement event
142 pub event: Event,
143
144 /// Repository identifier from 'd' tag
145 pub identifier: String,
146
147 /// Event author pubkey
148 pub owner: PublicKey,
149
150 /// Path to the bare git repo on disk (created immediately on entry)
151 pub repo_path: PathBuf,
152
153 /// Relay URLs from 'relays'/'clone' tags — for sync registration
154 pub relays: HashSet<String>,
155
156 /// When added to purgatory
157 pub created_at: Instant,
158
159 /// Expiry deadline (30 min from creation, may be extended)
160 pub expires_at: Instant,
161
162 /// Whether the bare repo has been deleted (soft expiry phase)
163 pub soft_expired: bool,
164}
165```
166
167**Indexed by `(pubkey, identifier)`** because identifier is not unique across different owners.
168
106### State Purgatory Entry 169### State Purgatory Entry
107 170
108```rust 171```rust
109pub struct StatePurgatoryEntry { 172pub struct StatePurgatoryEntry {
110 /// The nostr state event (kind 30618) awaiting git data 173 /// The nostr state event (kind 30618) awaiting git data
111 pub event: Event, 174 pub event: Event,
112 175
113 /// Repository identifier from 'd' tag 176 /// Repository identifier from 'd' tag
114 pub identifier: String, 177 pub identifier: String,
115 178
116 /// Event author pubkey 179 /// Event author pubkey
117 pub author: PublicKey, 180 pub author: PublicKey,
118 181
119 /// When added to purgatory 182 /// When added to purgatory
120 pub created_at: Instant, 183 pub created_at: Instant,
121 184
122 /// Expiry deadline (30 min from creation, may be extended) 185 /// Expiry deadline (30 min from creation, may be extended)
123 pub expires_at: Instant, 186 pub expires_at: Instant,
124} 187}
@@ -132,14 +195,14 @@ pub struct StatePurgatoryEntry {
132pub struct PrPurgatoryEntry { 195pub struct PrPurgatoryEntry {
133 /// The nostr PR event, if received (None = git data arrived first) 196 /// The nostr PR event, if received (None = git data arrived first)
134 pub event: Option<Event>, 197 pub event: Option<Event>,
135 198
136 /// Expected commit SHA from 'c' tag (if event exists) 199 /// Expected commit SHA from 'c' tag (if event exists)
137 /// or actual commit pushed (if git arrived first) 200 /// or actual commit pushed (if git arrived first)
138 pub commit: String, 201 pub commit: String,
139 202
140 /// When added to purgatory 203 /// When added to purgatory
141 pub created_at: Instant, 204 pub created_at: Instant,
142 205
143 /// Expiry deadline (30 min from creation) 206 /// Expiry deadline (30 min from creation)
144 pub expires_at: Instant, 207 pub expires_at: Instant,
145} 208}
@@ -151,24 +214,155 @@ pub struct PrPurgatoryEntry {
151 214
152```rust 215```rust
153pub struct Purgatory { 216pub struct Purgatory {
217 /// Announcement events indexed by (owner, identifier)
218 announcement_purgatory: DashMap<(PublicKey, String), AnnouncementPurgatoryEntry>,
219
154 /// State events indexed by identifier (d tag) 220 /// State events indexed by identifier (d tag)
155 /// Multiple state events per identifier allowed (different authors) 221 /// Multiple state events per identifier allowed (different authors)
156 state_events: Arc<DashMap<String, Vec<StatePurgatoryEntry>>>, 222 state_events: DashMap<String, Vec<StatePurgatoryEntry>>,
157 223
158 /// PR events indexed by event_id (hex string) 224 /// PR events indexed by event_id (hex string)
159 /// Single entry per event ID 225 /// Single entry per event ID
160 pr_events: Arc<DashMap<String, PrPurgatoryEntry>>, 226 pr_events: DashMap<String, PrPurgatoryEntry>,
161 227
162 /// Sync queue for background git data fetching 228 /// Sync queue for background git data fetching
163 sync_queue: Arc<DashMap<String, SyncQueueEntry>>, 229 sync_queue: DashMap<String, SyncQueueEntry>,
164 230
165 _git_data_path: PathBuf, 231 /// Events that previously expired without git data (prevents re-sync loops)
232 expired_events: DashMap<EventId, Instant>,
166} 233}
167``` 234```
168 235
169--- 236---
170 237
171## Event Flows 238## Announcement Purgatory Flows
239
240### New Announcement Flow
241
242```
243Announcement arrives
244 |
245 v
246Is there an active announcement for (pubkey, identifier) in DB?
247 |
248 +-- YES --> Accept immediately (replacement, repo already proven)
249 |
250 +-- NO --> Is there a purgatory entry for (pubkey, identifier)?
251 |
252 +-- YES --> Replace purgatory entry, extend expiry 30 min
253 | Return OK to client (but don't serve)
254 |
255 +-- NO --> Create bare repo
256 Add to purgatory
257 Return OK to client (but don't serve)
258```
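
The decision tree above reduces to two lookups on the `(pubkey, identifier)` key. The sketch below assumes the active and purgatory sets are available as plain `HashSet`s; `Route`, `RepoKey` and `route_announcement` are hypothetical names, not the real policy API.

```rust
use std::collections::HashSet;

/// (pubkey, identifier): the identifier alone is not unique across owners.
pub type RepoKey = (String, String);

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Replacement of an active announcement: accept immediately.
    AcceptImmediately,
    /// Purgatory entry exists: replace it and extend its expiry.
    ReplacePurgatoryEntry,
    /// Nothing known yet: create the bare repo, then add to purgatory.
    CreateRepoAndAddToPurgatory,
}

pub fn route_announcement(
    active: &HashSet<RepoKey>,
    purgatory: &HashSet<RepoKey>,
    key: &RepoKey,
) -> Route {
    if active.contains(key) {
        Route::AcceptImmediately
    } else if purgatory.contains(key) {
        Route::ReplacePurgatoryEntry
    } else {
        Route::CreateRepoAndAddToPurgatory
    }
}
```

Checking the database before purgatory matters: an announcement for a repo that is already proven active should never be held back, even if a stale purgatory entry exists.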
259
260### Git Data Arrival → Promotion
261
262```
263Git push/fetch completes with data
264 |
265 v
266process_purgatory_announcements() called
267 |
268 v
269Is there a purgatory announcement for (owner, identifier)?
270 |
271 +-- YES --> promote_announcement() removes from purgatory
272 | Save event to database
273 | Notify WebSocket clients
274 | (Sync upgrades to Full automatically via SelfSubscriber)
275 |
276 +-- NO --> Normal processing
277```
278
279### State Event Arrival for Purgatory Announcement
280
281```
282State event arrives
283 |
284 v
285fetch_repository_data_with_purgatory() checks DB + purgatory
286 |
287 +-- Announcement found in purgatory -->
288 | Validate authorization against purgatory announcement
289 | Extend purgatory announcement expiry (reset 30-min timer)
290 | If soft-expired: recreate bare repo, clear soft_expired flag
291 | Route state event to state purgatory
292 |
293 +-- No announcement anywhere --> Reject
294```
295
296### Announcement Expiry (Two-Phase Soft Expiry)
297
298The protocol specifies 30-minute expiry for announcements. We implement a two-phase soft expiry:
299
300**Phase 1 — Initial 30-minute expiry (`soft_expired == false`):**
301- Delete the bare git repo (frees disk space, respects protocol expiry)
302- Set `soft_expired = true`
303- Extend `expires_at` by 24 hours (`SOFT_EXPIRY_EXTENDED`)
304- Continue syncing state events (same as active purgatory)
305
306**Phase 2 — 24-hour soft expiry (`soft_expired == true`):**
307- Add event ID to `expired_events` (prevents re-sync loops)
308- Remove entry completely from `announcement_purgatory`
309
310**Why soft expiry?** Without it, we'd face a dilemma:
311
312- Add expired announcements to `failed_events` → permanently reject future state events, losing potential revival when state events arrive late
313- Re-fetch the announcement event on every sync cycle → wasting bandwidth and creating unnecessary sync traffic
314
315Soft expiry retains the event for 24 hours so that late-arriving state events (e.g. from a slow sync) can revive the announcement without forcing a full re-announcement flow.
316
317**Revival:** If a state event arrives for a soft-expired announcement, `extend_announcement_expiry()` recreates the bare repo, clears `soft_expired`, and resets the 30-minute timer.
318
319### Expiry Extension Triggers
320
321The 30-minute purgatory timer is reset (extended) in three scenarios:
322
323| Trigger | Location | Why |
324|---------|----------|-----|
325| State event arrives | `StatePolicy::process_state_event()` | Repo is actively receiving metadata |
326| Git push authorized against purgatory state | `get_state_authorization_for_specific_owner_repo()` | Repo is actively receiving git data |
327| Replacement announcement arrives | `AnnouncementPolicy::validate()` | Announcement updated |
328
329All three call `purgatory.extend_announcement_expiry(owner, identifier, duration)` with a 30-minute duration (`Duration::from_secs(1800)`).
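
A rough sketch of what such an extension does to a single entry, including revival of a soft-expired one, is shown below. The `Entry` type and the free function `extend_expiry` are hypothetical, `repo_exists` stands in for recreating the bare repo on disk, and setting the deadline to `now + duration` is an assumption based on the "resets the 30-minute timer" wording.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified purgatory entry.
pub struct Entry {
    /// Stands in for the bare git repo on disk.
    pub repo_exists: bool,
    pub soft_expired: bool,
    pub expires_at: Instant,
}

/// Extend the entry's deadline; revive it first if it was soft-expired.
/// Returns true when a revival happened.
pub fn extend_expiry(entry: &mut Entry, now: Instant, duration: Duration) -> bool {
    let revived = entry.soft_expired;
    if revived {
        // The real code would recreate the bare repo at this point.
        entry.repo_exists = true;
        entry.soft_expired = false;
    }
    // Assumption: the timer restarts from `now` rather than adding to the old deadline.
    entry.expires_at = now + duration;
    revived
}
```

Note that for a soft-expired entry this shortens the deadline from the 24-hour window back to 30 minutes, which puts the revived announcement back on the normal active lifecycle.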
330
331### Purgatory Lifecycle
332
333```
334 ┌─────────────────────────────────────┐
335 │ │
336 v │
337Announcement ──> ACTIVE ──────────────────────────────────┤
338 arrives (bare repo exists) │
339 │ │
340 ├── Git data ──> PROMOTED (exit) │
341 │ │
342 ├── Deletion ──> REMOVED (exit) │
343 │ │
344 v │
345 SOFT_EXPIRED ──────────────────────────────┘
346 (bare repo deleted, ^
347 event retained) │
348 │ │
349 ├── State event arrives (revival)
350
351 └── Extended expiry ──> REMOVED (exit)
352```
353
354| Exit | Trigger | Action |
355|------|---------|--------|
356| **Promotion** | Git data arrives | Move to database, sync upgrades to Full |
357| **Soft expiry** | Initial 30-min timeout | Delete bare repo, retain event, continue sync |
358| **Full expiry** | 24-hour soft expiry | Add to expired_events, remove from purgatory |
359| **Deletion** | Kind 5 event | Delete bare repo, remove from purgatory |
360| **Replacement** | Newer announcement (same pubkey, identifier) | Replace entry, extend expiry |
361| **Service change** | Newer announcement removes our service | Remove from purgatory |
362
363---
364
365## State and PR Event Flows
172 366
173### State Event Arrival (Kind 30618) 367### State Event Arrival (Kind 30618)
174 368
@@ -377,11 +571,12 @@ Purgatory includes a background sync system that fetches git data from remote se
377 571
378┌─────────────────────────────────────────────────────┐ 572┌─────────────────────────────────────────────────────┐
379│ process_newly_available_git_data(repo, oids) │ 573│ process_newly_available_git_data(repo, oids) │
380│ 1. Find satisfiable state events in purgatory │ 574│ 1. Find satisfiable announcement in purgatory │
381│ 2. Find satisfiable PR events in purgatory │ 575│ 2. Find satisfiable state events in purgatory │
382│ 3. Save events to database │ 576│ 3. Find satisfiable PR events in purgatory │
383│ 4. Sync git data to other owner repos │ 577│ 4. Save events to database │
384│ 5. Remove from purgatory │ 578│ 5. Sync git data to other owner repos │
579│ 6. Remove from purgatory │
385└─────────────────────────────────────────────────────┘ 580└─────────────────────────────────────────────────────┘
386``` 581```
387 582
@@ -402,8 +597,8 @@ pub struct SyncQueueEntry {
402 597
403**Backoff strategy:** 598**Backoff strategy:**
404- First attempt: 20 seconds 599- First attempt: 20 seconds
405- Second attempt: 2 minutes 600- Second attempt: 40 seconds
406- Subsequent attempts: 2 minutes 601- Subsequent attempts: capped at 2 minutes
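
One schedule consistent with these numbers is a doubling delay with a cap. The doubling rule is an assumption (the text only gives the first two delays and the cap), and `backoff_delay` is a hypothetical helper, not a function from the codebase.

```rust
use std::time::Duration;

const BASE_DELAY: Duration = Duration::from_secs(20);
const MAX_DELAY: Duration = Duration::from_secs(120);

/// Delay before the given attempt (1-based), doubling each time up to the cap.
pub fn backoff_delay(attempt: u32) -> Duration {
    // Clamp the exponent so the shift below cannot overflow.
    let exponent = attempt.saturating_sub(1).min(16);
    let secs = BASE_DELAY.as_secs().saturating_mul(1u64 << exponent);
    Duration::from_secs(secs.min(MAX_DELAY.as_secs()))
}
```

Under this assumption the third attempt waits 80 seconds and every later attempt waits the full 2 minutes.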
407 602
408### Sync Delays 603### Sync Delays
409 604
@@ -428,7 +623,7 @@ pub struct ThrottleManager {
428``` 623```
429 624
430**Rate limiting:** 625**Rate limiting:**
431- Default: 5 requests per domain per 30 seconds 626- Default: 5 concurrent requests per domain, 30 requests per minute
432- Tracks request timestamps in a sliding window 627- Tracks request timestamps in a sliding window
433- Queues identifiers when domain is throttled 628- Queues identifiers when domain is throttled
434- Processes queue when capacity frees up 629- Processes queue when capacity frees up
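
The admission check for a single domain can be sketched as below, combining the concurrency limit with a sliding window of request timestamps. `DomainThrottle` is a hypothetical type, not the real `ThrottleManager`, and the queueing of throttled identifiers is left out.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical per-domain limiter: a concurrency cap plus a sliding-window rate cap.
pub struct DomainThrottle {
    max_concurrent: usize,
    max_per_window: usize,
    window: Duration,
    in_flight: usize,
    /// Start times of requests inside the current window, oldest first.
    started: VecDeque<Instant>,
}

impl DomainThrottle {
    pub fn new(max_concurrent: usize, max_per_window: usize, window: Duration) -> Self {
        Self {
            max_concurrent,
            max_per_window,
            window,
            in_flight: 0,
            started: VecDeque::new(),
        }
    }

    /// Try to start a request at `now`; returns false when the domain is throttled.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        // Drop timestamps that have slid out of the window.
        while let Some(&oldest) = self.started.front() {
            if now.duration_since(oldest) >= self.window {
                self.started.pop_front();
            } else {
                break;
            }
        }
        if self.in_flight >= self.max_concurrent || self.started.len() >= self.max_per_window {
            return false;
        }
        self.in_flight += 1;
        self.started.push_back(now);
        true
    }

    /// Mark one in-flight request as finished.
    pub fn release(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```

With the defaults above this would be constructed as `DomainThrottle::new(5, 30, Duration::from_secs(60))`; a caller that gets `false` would queue the identifier and retry once capacity frees up.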
@@ -439,7 +634,47 @@ See [`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs) for
439 634
440## Purgatory API 635## Purgatory API
441 636
442### Adding Entries 637### Announcement Purgatory
638
639```rust
640impl Purgatory {
641 /// Add an announcement to purgatory (bare repo already created by caller)
642 pub fn add_announcement(
643 &self,
644 event: Event,
645 identifier: String,
646 owner: PublicKey,
647 repo_path: PathBuf,
648 relays: HashSet<String>,
649 );
650
651 /// Promote announcement: remove from purgatory, return event for DB save
652 pub fn promote_announcement(
653 &self,
654 owner: &PublicKey,
655 identifier: &str,
656 ) -> Option<Event>;
657
658 /// Get announcements by identifier (for authorization checks)
659 pub fn get_announcements_by_identifier(
660 &self,
661 identifier: &str,
662 ) -> Vec<AnnouncementPurgatoryEntry>;
663
664 /// Extend expiry (and revive soft-expired entries, recreating bare repo)
665 pub fn extend_announcement_expiry(
666 &self,
667 owner: &PublicKey,
668 identifier: &str,
669 duration: Duration,
670 );
671
672 /// Get all announcements for sync registration
673 pub fn announcements_for_sync(&self) -> Vec<AnnouncementPurgatoryEntry>;
674}
675```
676
677### State and PR Purgatory
443 678
444```rust 679```rust
445impl Purgatory { 680impl Purgatory {
@@ -453,13 +688,7 @@ impl Purgatory {
453 688
454 /// Add a PR placeholder (git-data-first scenario) 689 /// Add a PR placeholder (git-data-first scenario)
455 pub fn add_pr_placeholder(&self, event_id: String, commit: String); 690 pub fn add_pr_placeholder(&self, event_id: String, commit: String);
456}
457```
458 691
459### Finding Entries
460
461```rust
462impl Purgatory {
463 /// Find state events waiting for an identifier 692 /// Find state events waiting for an identifier
464 pub fn find_state(&self, identifier: &str) -> Vec<StatePurgatoryEntry>; 693 pub fn find_state(&self, identifier: &str) -> Vec<StatePurgatoryEntry>;
465 694
@@ -476,13 +705,7 @@ impl Purgatory {
476 705
477 /// Find a PR placeholder specifically (git-data-first) 706 /// Find a PR placeholder specifically (git-data-first)
478 pub fn find_pr_placeholder(&self, event_id: &str) -> Option<String>; 707 pub fn find_pr_placeholder(&self, event_id: &str) -> Option<String>;
479}
480```
481 708
482### Removing Entries
483
484```rust
485impl Purgatory {
486 /// Remove all state events for an identifier 709 /// Remove all state events for an identifier
487 pub fn remove_state(&self, identifier: &str); 710 pub fn remove_state(&self, identifier: &str);
488 711
@@ -499,36 +722,14 @@ impl Purgatory {
499```rust 722```rust
500impl Purgatory { 723impl Purgatory {
501 /// Remove expired entries (called every 60 seconds) 724 /// Remove expired entries (called every 60 seconds)
502 /// Returns (state_removed, pr_removed) 725 /// Handles two-phase soft expiry for announcements
503 pub fn cleanup(&self) -> (usize, usize); 726 pub fn cleanup(&self);
504 727
505 /// Extend expiry for entries about to be processed 728 /// Extend expiry for state/PR entries about to be processed
506 /// Ensures at least `duration` remaining
507 pub fn extend_expiry(&self, identifier: &str, event_ids: &[EventId], duration: Duration); 729 pub fn extend_expiry(&self, identifier: &str, event_ids: &[EventId], duration: Duration);
508 730
509 /// Get current counts for metrics 731 /// Check if an event previously expired (prevents re-sync loops)
510 pub fn count(&self) -> (usize, usize); 732 pub fn is_expired(&self, event_id: &EventId) -> bool;
511}
512```
513
514### Sync Queue Management
515
516```rust
517impl Purgatory {
518 /// Enqueue identifier for sync with custom delay
519 pub fn enqueue_sync(&self, identifier: &str, delay: Duration);
520
521 /// Enqueue with default delay (3 minutes)
522 pub fn enqueue_sync_default(&self, identifier: &str);
523
524 /// Enqueue with immediate delay (500ms)
525 pub fn enqueue_sync_immediate(&self, identifier: &str);
526
527 /// Check if identifier has pending events
528 pub fn has_pending_events(&self, identifier: &str) -> bool;
529
530 /// Remove identifier from sync queue
531 pub fn remove_from_sync_queue(&self, identifier: &str);
532} 733}
533``` 734```
534 735
@@ -558,12 +759,6 @@ pub fn can_apply_state(
558 event: &Event, 759 event: &Event,
559 repo_path: &Path, 760 repo_path: &Path,
560) -> Result<bool>; 761) -> Result<bool>;
561
562/// Get refs from state that aren't being pushed
563pub fn get_unpushed_refs(
564 state_refs: &[RefPair],
565 pushed_refs: &[RefPair],
566) -> Vec<RefPair>;
567``` 762```
568 763
569See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementation. 764See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementation.
@@ -572,123 +767,37 @@ See [`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs) for implementat
572 767
573## Integration Points 768## Integration Points
574 769
575### 1. Event Policy (Nip34WritePolicy) 770### 1. Announcement Policy (`src/nostr/policy/announcement.rs`)
576 771
577State and PR events are added to purgatory when git data doesn't exist: 772Routes new announcements to purgatory or accepts replacements:
578 773
579```rust 774- If active DB announcement exists for `(pubkey, identifier)` → `Accept` immediately
580// From src/nostr/policy/state.rs 775- If purgatory entry exists → replace it, extend expiry, return `Accept`
581async fn handle_state(&self, event: &Event) -> WritePolicyResult { 776- Otherwise → return `AcceptPurgatory`, caller calls `add_to_purgatory()` which creates bare repo and adds to purgatory
582 let identifier = extract_identifier(event)?;
583
584 // Check if we have matching git data
585 if self.has_matching_git_data(&identifier, event).await? {
586 return WritePolicyResult::Accept;
587 }
588
589 // Add to purgatory
590 self.purgatory.add_state(
591 event.clone(),
592 identifier.clone(),
593 event.pubkey,
594 );
595
596 WritePolicyResult::Reject {
597 status: true, // Client sees OK
598 message: "purgatory: awaiting git data".into()
599 }
600}
601```
602 777
603### 2. Git Push Authorization 778### 2. State Event Policy (`src/nostr/policy/state.rs`)
604 779
605Authorization checks both database and purgatory: 780Checks purgatory announcements for authorization and extends their expiry:
606 781
607```rust 782```rust
608// From src/git/authorization.rs 783// Fetch announcements from both DB and purgatory
609pub async fn authorize_push( 784let repo_data = fetch_repository_data_with_purgatory(db, purgatory, identifier).await?;
610 database: &SharedDatabase, 785
611 identifier: &str, 786// For each authorized owner with a purgatory announcement, extend expiry
612 owner_pubkey: &str, 787purgatory.extend_announcement_expiry(&owner_pk, &identifier, Duration::from_secs(1800));
613 request_body: &Bytes,
614 purgatory: &Arc<Purgatory>, // Critical!
615 repo_path: &std::path::Path,
616) -> anyhow::Result<AuthorizationResult> {
617 // Parse pushed refs
618 let pushed_refs = parse_pushed_refs(request_body);
619
620 // Check database for state events
621 let db_result = get_authorization_from_db(database, identifier).await?;
622
623 if !db_result.authorized {
624 // No state in database - check purgatory
625 let purgatory_result = get_state_authorization_for_specific_owner_repo(
626 database,
627 identifier,
628 owner_pubkey,
629 purgatory,
630 &pushed_refs,
631 repo_path,
632 ).await?;
633
634 return purgatory_result;
635 }
636
637 db_result
638}
639``` 788```
640 789
641### 3. Post-Push Processing 790### 3. Git Push Authorization (`src/git/authorization.rs`)
642 791
643After successful push, events from purgatory are saved to database: 792`fetch_repository_data_with_purgatory()` merges DB announcements with purgatory announcements for authorization. On successful authorization via purgatory state events, it also extends the announcement's expiry.
644 793
645```rust 794### 4. Git Data Processing (`src/git/sync.rs`)
646// From src/git/handlers.rs
647if from_purgatory {
648 if let (Some(db), Some(purg)) = (&database, &purgatory) {
649 // Save state event to database
650 db.save_event(&state.event).await?;
651
652 // Remove from purgatory
653 purg.remove_state_event(identifier, &state.event.id);
654 }
655}
656```
657 795
658### 4. Background Sync Loop 796`process_purgatory_announcements()` is called after any git push or background sync fetch. It promotes announcements from purgatory to the database and notifies WebSocket clients.
659 797
660Started during application initialization: 798### 5. Sync Registration (`src/sync/`)
661 799
662```rust 800A background timer (`run_purgatory_announcement_sync`, every 5 seconds) ensures purgatory announcements are registered in `RepoSyncIndex` with `SyncLevel::StateOnly`. When an announcement is promoted, the `SelfSubscriber` upgrades it to `SyncLevel::Full`.
663// From src/main.rs
664let purgatory = Arc::new(Purgatory::new(git_data_path));
665let ctx = Arc::new(RealSyncContext::new(
666 database.clone(),
667 purgatory.clone(),
668 config.domain.clone(),
669 git_data_path.clone(),
670));
671let throttle_manager = Arc::new(ThrottleManager::new(5, 30));
672throttle_manager.set_context(ctx.clone());
673
674// Start sync loop
675let sync_handle = purgatory.clone().start_sync_loop(ctx, throttle_manager);
676
677// Start cleanup task
678let cleanup_handle = tokio::spawn(async move {
679 let mut interval = tokio::time::interval(Duration::from_secs(60));
680 loop {
681 interval.tick().await;
682 let (state_removed, pr_removed) = purgatory.cleanup();
683 if state_removed + pr_removed > 0 {
684 tracing::debug!(
685 "Purgatory cleanup removed {} state, {} PR entries",
686 state_removed, pr_removed
687 );
688 }
689 }
690});
691```
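
The sync registration behaviour (purgatory announcements registered as `StateOnly`, upgraded to `Full` on promotion) can be illustrated with a small sketch. Only the names `SyncLevel`, `StateOnly`, `Full`, and `RepoSyncIndex` come from this document. The fields and the methods `register_purgatory`, `promote`, and `level` are hypothetical, and the rule that the periodic timer never downgrades an already promoted repo is an assumption.

```rust
use std::collections::HashMap;

/// Sync levels named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncLevel {
    StateOnly,
    Full,
}

/// Hypothetical, simplified index: repo identifier -> sync level.
#[derive(Default)]
struct RepoSyncIndex {
    levels: HashMap<String, SyncLevel>,
}

impl RepoSyncIndex {
    /// Timer path: make sure a purgatory announcement is registered.
    /// Assumption: an existing entry (possibly already `Full`) is left untouched.
    fn register_purgatory(&mut self, repo: &str) {
        self.levels
            .entry(repo.to_string())
            .or_insert(SyncLevel::StateOnly);
    }

    /// Promotion path: the announcement left purgatory, so sync everything.
    fn promote(&mut self, repo: &str) {
        self.levels.insert(repo.to_string(), SyncLevel::Full);
    }

    /// Current level for a repo, if it is registered at all.
    fn level(&self, repo: &str) -> Option<SyncLevel> {
        self.levels.get(repo).copied()
    }
}
```

Using `entry(..).or_insert(..)` keeps the timer idempotent, which matters when it fires every few seconds for the same repo.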
692 801
693--- 802---
694 803
@@ -698,7 +807,7 @@ let cleanup_handle = tokio::spawn(async move {
698src/ 807src/
699├── purgatory/ 808├── purgatory/
700│ ├── mod.rs # Main Purgatory struct and API 809│ ├── mod.rs # Main Purgatory struct and API
701│ ├── types.rs # RefPair, StatePurgatoryEntry, PrPurgatoryEntry 810│ ├── types.rs # RefPair, AnnouncementPurgatoryEntry, StatePurgatoryEntry, PrPurgatoryEntry
702│ ├── helpers.rs # Ref extraction and matching functions 811│ ├── helpers.rs # Ref extraction and matching functions
703│ └── sync/ 812│ └── sync/
704│ ├── mod.rs # Sync module exports 813│ ├── mod.rs # Sync module exports
@@ -710,9 +819,10 @@ src/
710├── git/ 819├── git/
711│ ├── authorization.rs # authorize_push with purgatory checking 820│ ├── authorization.rs # authorize_push with purgatory checking
712│ ├── handlers.rs # handle_receive_pack with post-push processing 821│ ├── handlers.rs # handle_receive_pack with post-push processing
713│ └── sync.rs # process_newly_available_git_data 822│ └── sync.rs # process_newly_available_git_data, process_purgatory_announcements
714└── nostr/ 823└── nostr/
715 └── policy/ 824 └── policy/
825 ├── announcement.rs # Route announcements to purgatory
716 ├── state.rs # State event policy with purgatory 826 ├── state.rs # State event policy with purgatory
717 └── pr_event.rs # PR event policy with purgatory 827 └── pr_event.rs # PR event policy with purgatory
718``` 828```
@@ -725,7 +835,7 @@ src/
725 835
726Located in each module: 836Located in each module:
727 837
728- **[`src/purgatory/mod.rs`](../../src/purgatory/mod.rs)** - Core purgatory operations 838- **[`src/purgatory/mod.rs`](../../src/purgatory/mod.rs)** - Core purgatory operations including announcement purgatory
729- **[`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs)** - Ref matching logic 839- **[`src/purgatory/helpers.rs`](../../src/purgatory/helpers.rs)** - Ref matching logic
730- **[`src/purgatory/sync/functions.rs`](../../src/purgatory/sync/functions.rs)** - Sync functions with MockSyncContext 840- **[`src/purgatory/sync/functions.rs`](../../src/purgatory/sync/functions.rs)** - Sync functions with MockSyncContext
731- **[`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs)** - Throttle manager 841- **[`src/purgatory/sync/throttle.rs`](../../src/purgatory/sync/throttle.rs)** - Throttle manager
@@ -734,6 +844,9 @@ Located in each module:
734 844
735Located in [`tests/`](../../tests/): 845Located in [`tests/`](../../tests/):
736 846
847- **Announcement purgatory flow** - Announcement enters purgatory, git data promotes it
848- **Announcement soft expiry** - Bare repo deleted after 30 min, event retained 24h
849- **Announcement revival** - State event revives soft-expired announcement
737- **State event purgatory flow** - Event arrives, git push releases it 850- **State event purgatory flow** - Event arrives, git push releases it
738- **PR event purgatory flow** - Event arrives, git push releases it 851- **PR event purgatory flow** - Event arrives, git push releases it
739- **Git-data-first flow** - Git push creates placeholder, event completes it 852- **Git-data-first flow** - Git push creates placeholder, event completes it
@@ -744,7 +857,19 @@ Located in [`tests/`](../../tests/):
744 857
745## Key Learnings 858## Key Learnings
746 859
747### 1. Purgatory Authorization is Critical 860### 1. Announcement Purgatory Prevents Misleading Empty Repos
861
862Without announcement purgatory, we'd serve announcements for repos with no content. Clients would see the announcement, try to clone, and get nothing.
863
864**Solution:** Announcements wait in purgatory until git data proves content exists.
865
866### 2. Soft Expiry Avoids Sync Loops
867
868The protocol's 30-minute expiry creates a problem: without soft expiry, we'd either permanently block repositories or constantly re-sync expired announcement events.
869
870**Solution:** Soft expiry retains the event for 24 hours after deleting the bare repo, allowing revival without re-fetching.
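
A minimal sketch of the two-phase lifecycle, assuming the 30-minute and 24-hour windows quoted in this document and measuring the 24 hours from the moment of soft expiry. `AnnouncementEntry`, `Phase`, `cleanup`, and `revive` are hypothetical names used for illustration; the real `AnnouncementPurgatoryEntry` may be structured differently.

```rust
use std::time::{Duration, Instant};

/// Assumed windows: 30 minutes until soft expiry, then 24 hours of retention.
const SOFT_EXPIRY: Duration = Duration::from_secs(30 * 60);
const EXTENDED_RETENTION: Duration = Duration::from_secs(24 * 60 * 60);

#[derive(Debug, PartialEq)]
enum Phase {
    /// Bare repo exists and the entry is waiting for git data.
    Active,
    /// Bare repo deleted, event still retained so it can be revived.
    SoftExpired,
    /// Retention window elapsed; the entry can be dropped entirely.
    Gone,
}

struct AnnouncementEntry {
    expires_at: Instant,
    soft_expired_at: Option<Instant>,
}

impl AnnouncementEntry {
    fn new(now: Instant) -> Self {
        Self { expires_at: now + SOFT_EXPIRY, soft_expired_at: None }
    }

    /// One cleanup pass: advances the entry's phase and reports it.
    fn cleanup(&mut self, now: Instant) -> Phase {
        let soft = self.soft_expired_at;
        match soft {
            None if now >= self.expires_at => {
                // Phase 1 ends here: a real implementation would delete the bare repo.
                self.soft_expired_at = Some(now);
                Phase::SoftExpired
            }
            None => Phase::Active,
            Some(at) if now >= at + EXTENDED_RETENTION => Phase::Gone,
            Some(_) => Phase::SoftExpired,
        }
    }

    /// Revival (for example, a state event arrives): back to the active phase.
    fn revive(&mut self, now: Instant) {
        self.soft_expired_at = None;
        self.expires_at = now + SOFT_EXPIRY;
    }
}
```

Because the event survives the first phase, revival only has to recreate the bare repo and reset the timer; nothing needs to be fetched from other relays again.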
871
872### 3. Purgatory Authorization is Critical
748 873
749Without checking purgatory during authorization, we have a deadlock: 874Without checking purgatory during authorization, we have a deadlock:
750- State event goes to purgatory (no git data) 875- State event goes to purgatory (no git data)
@@ -753,7 +878,7 @@ Without checking purgatory during authorization, we have a deadlock:
753 878
754**Solution:** `authorize_push()` checks both database and purgatory. 879**Solution:** `authorize_push()` checks both database and purgatory.
755 880
756### 2. Late Binding for State Events 881### 4. Late Binding for State Events
757 882
758Extracting refs at event arrival time doesn't work when: 883Extracting refs at event arrival time doesn't work when:
759- Multiple state events arrive for same identifier 884- Multiple state events arrive for same identifier
@@ -761,7 +886,7 @@ Extracting refs at event arrival time doesn't work when:
761 886
762**Solution:** Extract and match refs at push time via `find_matching_states()`. 887**Solution:** Extract and match refs at push time via `find_matching_states()`.
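
To illustrate matching at push time rather than at event arrival, here is a hedged sketch. The function name `find_matching_states` comes from this document, but the signature, the `StateEvent` struct, and the matching rule (every pushed ref must appear in the state event with the same commit) are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified state event: an id plus its declared refs.
struct StateEvent {
    id: String,
    /// ref name -> commit id
    refs: HashMap<String, String>,
}

/// Returns the pending state events that the pushed refs satisfy.
/// Assumed rule: every pushed (ref, commit) pair must be declared by the state.
fn find_matching_states<'a>(
    pending: &'a [StateEvent],
    pushed: &[(String, String)],
) -> Vec<&'a StateEvent> {
    pending
        .iter()
        .filter(|state| {
            // An empty push matches nothing.
            !pushed.is_empty()
                && pushed
                    .iter()
                    .all(|(name, commit)| state.refs.get(name) == Some(commit))
        })
        .collect()
}
```

Since nothing is extracted until the push arrives, several state events for the same identifier can wait side by side, and only the ones consistent with the pushed data are selected.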
763 888
764### 3. Bidirectional Waiting for PR Events 889### 5. Bidirectional Waiting for PR Events
765 890
766PR events can arrive before or after git data: 891PR events can arrive before or after git data:
767- Event first → Wait for git push 892- Event first → Wait for git push
@@ -769,26 +894,13 @@ PR events can arrive before or after git data:
769 894
770**Solution:** `PrPurgatoryEntry.event: Option<Event>` with `None` = placeholder. 895**Solution:** `PrPurgatoryEntry.event: Option<Event>` with `None` = placeholder.
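
The placeholder idea can be sketched as follows. `PrPurgatoryEntry` and its `event: Option<Event>` field are from this document. The `Event` stub, the `PrPurgatory` map, its string key, and the `on_event` / `on_git_data` methods are hypothetical and only show how either side can arrive first.

```rust
use std::collections::HashMap;

/// Stub standing in for a Nostr event.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    id: String,
}

struct PrPurgatoryEntry {
    /// `None` means git data arrived first and this entry is a placeholder.
    event: Option<Event>,
}

/// Hypothetical store keyed by whatever links event and git data (e.g. a commit id).
#[derive(Default)]
struct PrPurgatory {
    entries: HashMap<String, PrPurgatoryEntry>,
}

impl PrPurgatory {
    /// PR event arrives. Returns the event if git data was already waiting.
    fn on_event(&mut self, key: &str, event: Event) -> Option<Event> {
        let placeholder = matches!(self.entries.get(key), Some(e) if e.event.is_none());
        if placeholder {
            // Git data came first: the pair is complete, release immediately.
            self.entries.remove(key);
            return Some(event);
        }
        // Event came first: park it until git data shows up.
        self.entries
            .insert(key.to_string(), PrPurgatoryEntry { event: Some(event) });
        None
    }

    /// Git data arrives. Returns the parked event, or leaves a placeholder.
    fn on_git_data(&mut self, key: &str) -> Option<Event> {
        match self.entries.remove(key) {
            Some(PrPurgatoryEntry { event: Some(ev) }) => Some(ev),
            _ => {
                self.entries
                    .insert(key.to_string(), PrPurgatoryEntry { event: None });
                None
            }
        }
    }
}
```

In both orders the second arrival returns `Some(event)`, which is the point at which a caller would save the event to the database.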
771 896
772### 4. Sync Queue Debouncing
773
774When events arrive in bursts (e.g., negentropy sync), we don't want to spawn a sync task for each event.
775
776**Solution:** `enqueue_sync()` resets `attempt_count` and updates `next_attempt` if already queued.
777
778### 5. Domain Throttling with Queues
779
780When a domain is throttled, we still want to eventually sync from it.
781
782**Solution:** `ThrottleManager` maintains per-domain queues and processes them when capacity frees.
783
784--- 897---
785 898
786## Related Documentation 899## Related Documentation
787 900
788- [Inline Authorization](inline-authorization.md) - Why purgatory checking during authorization is essential
789- [Architecture Overview](architecture.md) - Full system design 901- [Architecture Overview](architecture.md) - Full system design
790- [Background Sync](../how-to/purgatory-sync.md) - How to configure and monitor sync 902- [GRASP-02 Proactive Sync](grasp-02-proactive-sync.md) - Relay-to-relay event sync with SyncLevel
791- [Test Strategy](../reference/test-strategy.md) - How we test purgatory 903- [GRASP-02 Purgatory Git Data Fetching](grasp-02-proactive-sync-purgatory-git-data.md) - Background git data hunting
792 904
793--- 905---
794 906