upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

Diffstat (limited to 'docs/explanation/unify-git-data-sync.md')
-rw-r--r-- docs/explanation/unify-git-data-sync.md 481
1 files changed, 0 insertions, 481 deletions
diff --git a/docs/explanation/unify-git-data-sync.md b/docs/explanation/unify-git-data-sync.md
deleted file mode 100644
index fa1f983..0000000
--- a/docs/explanation/unify-git-data-sync.md
+++ /dev/null
@@ -1,481 +0,0 @@
# Unified Git Data Sync

## Status

**Proposed** - January 2026

## Context

Currently, two separate code paths handle "git data is now available" scenarios:

1. **`handle_receive_pack`** (src/git/handlers.rs) - After a successful `git push`
2. **`sync_state_git_data`** (src/purgatory/mod.rs) - After purgatory sync fetches OIDs from remote servers

Both paths perform essentially the same post-processing:

| Step | `handle_receive_pack` | `sync_state_git_data` |
|------|----------------------|----------------------|
| Set HEAD | ✅ `try_set_head_if_available()` | ✅ (via `align_repository_with_state`) |
| Save events to DB | ✅ `database.save_event()` | ✅ `database.save_event()` |
| Remove from purgatory | ✅ `remove_state_event()` / `remove_pr()` | ✅ `remove_state_event()` |
| Notify WebSocket | ✅ `relay.notify_event()` | ✅ `relay.notify_event()` |
| Sync state to owner repos | ✅ `sync_to_owner_repos()` | ✅ `sync_to_owner_repos()` |
| Sync PR refs to owner repos | ✅ `sync_pr_refs_to_tagged_owner_repos()` | ❌ Not implemented |

This duplication creates a maintenance burden and has already led to inconsistent behavior: PR ref sync, for example, is missing from the purgatory path.

## Decision

Create a single unified function that handles all post-git-data-available processing:

```rust
pub async fn process_newly_available_git_data(
    source_repo_path: &Path,
    new_oids: &HashSet<String>,
    database: &SharedDatabase,
    local_relay: Option<&nostr_relay_builder::LocalRelay>,
    purgatory: &Purgatory,
    git_data_path: &Path,
) -> ProcessResult
```

### Key Design Principles

**1. Always discover events from purgatory**

Rather than accepting pre-authorized events (which may have changed since authorization), the function always scans purgatory to find satisfiable events. This keeps both callers consistent and avoids acting on stale data when an event changes between authorization and processing.

**2. Minimal input, maximal output**

Apart from the shared handles (`database`, `local_relay`, `purgatory`, `git_data_path`), callers only need to provide:
- `source_repo_path` - Where the git data landed
- `new_oids` - Which OIDs are now available (for efficient filtering)

The function handles everything else: finding events, syncing across repos, aligning refs, setting HEAD, saving to database, notifying subscribers, and cleaning up purgatory.

**3. Process all event types uniformly**

Both state events (kind 30618) and PR events (kind 1617/1618) are processed in the same flow, ensuring consistent behavior.

## Architecture

### Flow Overview

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           Git Data Becomes Available                            │
│                                                                                 │
│          ┌─────────────────────┐              ┌─────────────────────┐           │
│          │ handle_receive_pack │              │   purgatory sync    │           │
│          │   (push received)   │              │  (fetch completed)  │           │
│          └──────────┬──────────┘              └──────────┬──────────┘           │
│                     │                                    │                      │
│                     │ source_repo_path                   │ source_repo_path     │
│                     │ new_oids                           │ new_oids             │
│                     │                                    │                      │
│                     └────────────────┬───────────────────┘                      │
│                                      │                                          │
│                                      ▼                                          │
│                 ┌────────────────────────────────────────┐                      │
│                 │   process_newly_available_git_data()   │                      │
│                 │                                        │                      │
│                 │  1. Extract identifier from path       │                      │
│                 │  2. Fetch repository data from DB      │                      │
│                 │  3. Find satisfiable state events      │                      │
│                 │  4. Find satisfiable PR events         │                      │
│                 │  5. For each event:                    │                      │
│                 │     - Sync OIDs to owner repos         │                      │
│                 │     - Align refs (+ set HEAD)          │                      │
│                 │     - Save to database                 │                      │
│                 │     - Notify WebSocket                 │                      │
│                 │     - Remove from purgatory            │                      │
│                 └────────────────────────────────────────┘                      │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Event Discovery

The function discovers satisfiable events by scanning purgatory:

**For State Events:**
1. Get all state entries for the identifier from purgatory
2. For each entry, check if ALL required OIDs exist in source repo
3. Quick optimization: before running the full check in step 2, skip the entry if none of `new_oids` are in the state's OID set

**For PR Events:**
1. Get all PR entries for the identifier from purgatory (via secondary index)
2. For each entry with an event, check if the commit OID exists in source repo
3. Quick optimization: before running the check in step 2, skip the entry if its commit is not in `new_oids`
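
The discovery rules above can be sketched with plain standard-library types. This is an illustrative sketch only: `PendingState`, `PendingPr`, and the `repo_has_oid` callback are hypothetical stand-ins for the real purgatory entries and the lookup against the source repository, not the project's actual types.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a state entry waiting in purgatory.
struct PendingState {
    event_id: String,
    /// Every OID referenced by the state event.
    required_oids: HashSet<String>,
}

/// Hypothetical, simplified view of a PR entry waiting in purgatory.
struct PendingPr {
    event_id: String,
    /// `false` models a placeholder whose nostr event has not arrived yet.
    has_event: bool,
    commit: String,
}

/// Returns the IDs of state entries that can be released.
/// `repo_has_oid` stands in for an object lookup in the source repo.
fn satisfiable_states<'a>(
    entries: &'a [PendingState],
    new_oids: &HashSet<String>,
    repo_has_oid: impl Fn(&str) -> bool,
) -> Vec<&'a str> {
    entries
        .iter()
        // Cheap pre-check: nothing delivered by this call concerns the entry.
        .filter(|e| !e.required_oids.is_disjoint(new_oids))
        // Full check: every required OID must exist in the source repo.
        .filter(|e| e.required_oids.iter().all(|oid| repo_has_oid(oid.as_str())))
        .map(|e| e.event_id.as_str())
        .collect()
}

/// Returns the IDs of PR entries that can be released.
fn satisfiable_prs<'a>(
    entries: &'a [PendingPr],
    new_oids: &HashSet<String>,
    repo_has_oid: impl Fn(&str) -> bool,
) -> Vec<&'a str> {
    entries
        .iter()
        // Placeholders without an event cannot be released yet.
        .filter(|e| e.has_event)
        // Cheap pre-check, then the lookup against the source repo.
        .filter(|e| new_oids.contains(&e.commit))
        .filter(|e| repo_has_oid(e.commit.as_str()))
        .map(|e| e.event_id.as_str())
        .collect()
}
```

Note that the pre-check is purely a filter on what this call delivered: an entry whose OIDs were all present before the call is left for whichever earlier call made them available.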

### Sync to Owner Repos

**For State Events:**

For each owner whose maintainer set authorizes the state author:
1. Skip if a newer state already exists for that owner
2. Copy missing OIDs from source repo to target repo
3. Align refs (create/update/delete branches and tags)
4. Set HEAD per state announcement
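
As a rough illustration of the "align refs" step, the create/update/delete decisions can be computed as a diff between the refs a repository currently has and the refs the state event announces. The `RefAction` type and `plan_ref_alignment` function are hypothetical names for this sketch. Restricting deletions to `refs/heads/` and `refs/tags/` is an assumption made here so that PR refs under `refs/nostr/` are never removed by a state event.

```rust
use std::collections::BTreeMap;

/// One change needed to bring a repository's refs in line with a state event.
#[derive(Debug, PartialEq)]
enum RefAction {
    Create { name: String, oid: String },
    Update { name: String, from: String, to: String },
    Delete { name: String },
}

/// Compare the refs currently in a repo with the refs a state event announces.
/// Both maps go from ref name (e.g. "refs/heads/main") to OID.
fn plan_ref_alignment(
    current: &BTreeMap<String, String>,
    desired: &BTreeMap<String, String>,
) -> Vec<RefAction> {
    let mut actions = Vec::new();

    // Creations and updates: walk everything the state announces.
    for (name, want) in desired {
        match current.get(name) {
            None => actions.push(RefAction::Create {
                name: name.clone(),
                oid: want.clone(),
            }),
            Some(have) if have != want => actions.push(RefAction::Update {
                name: name.clone(),
                from: have.clone(),
                to: want.clone(),
            }),
            Some(_) => {} // already aligned
        }
    }

    // Deletions: only branches and tags are managed by state events
    // (assumption), so refs such as refs/nostr/* are left alone.
    for name in current.keys() {
        let managed = name.starts_with("refs/heads/") || name.starts_with("refs/tags/");
        if managed && !desired.contains_key(name) {
            actions.push(RefAction::Delete { name: name.clone() });
        }
    }

    actions
}
```

The three variants map directly onto the `refs_created`, `refs_updated`, and `refs_deleted` counters reported in `ProcessResult`.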

**For PR Events:**

For each owner whose maintainer set includes any tagged owner (from `a` tags):
1. Copy commit from source repo to target repo (if missing)
2. Create `refs/nostr/<event-id>` pointing to the commit
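
The owner selection for PR events amounts to a set intersection. The sketch below assumes maintainer sets are available as a map from owner to maintainers (with each owner included in their own set); `owners_to_sync` and `pr_ref_name` are hypothetical helper names, not functions from the codebase.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Ref name under which a PR commit is exposed in each target repo.
fn pr_ref_name(event_id: &str) -> String {
    format!("refs/nostr/{event_id}")
}

/// Select every owner whose maintainer set shares at least one member
/// with the owners tagged in the PR event's `a` tags.
fn owners_to_sync<'a>(
    maintainers_by_owner: &'a BTreeMap<String, BTreeSet<String>>,
    tagged_owners: &BTreeSet<String>,
) -> Vec<&'a str> {
    maintainers_by_owner
        .iter()
        .filter(|(_, maintainers)| !maintainers.is_disjoint(tagged_owners))
        .map(|(owner, _)| owner.as_str())
        .collect()
}
```

With this rule, a PR that tags only `bob` still reaches `alice`'s repository when `alice` lists `bob` as a maintainer.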

## Data Structure Changes

### PrPurgatoryEntry

Add `identifier` field for secondary index lookup:

```rust
#[derive(Debug, Clone)]
pub struct PrPurgatoryEntry {
    /// The nostr PR event, if received (None = git data arrived first)
    pub event: Option<Event>,

    /// The expected commit SHA from 'c' tag or actual commit pushed
    pub commit: String,

    /// Repository identifier extracted from 'a' tag (30617:<owner>:<identifier>)
    /// Used for lookup when git data arrives
    pub identifier: Option<String>,

    /// When this entry was added to purgatory
    pub created_at: Instant,

    /// Expiry deadline
    pub expires_at: Instant,
}
```

### Purgatory Secondary Index

Add index for finding PR events by identifier:

```rust
pub struct Purgatory {
    /// State events indexed by repository identifier
    state_events: Arc<DashMap<String, Vec<StatePurgatoryEntry>>>,

    /// PR events indexed by event ID (hex string)
    pr_events: Arc<DashMap<String, PrPurgatoryEntry>>,

    /// Secondary index: identifier -> event_ids for PR events
    pr_events_by_identifier: Arc<DashMap<String, HashSet<String>>>,

    git_data_path: PathBuf,
}
```

### New Purgatory Methods

```rust
impl Purgatory {
    /// Find all PR events for an identifier
    pub fn find_prs_for_identifier(&self, identifier: &str) -> Vec<PrPurgatoryEntry>;

    /// Add PR with automatic identifier extraction and indexing
    pub fn add_pr(&self, event: Event, event_id: String, commit: String);

    /// Add placeholder with optional identifier
    pub fn add_pr_placeholder(&self, event_id: String, commit: String, identifier: Option<String>);

    /// Remove PR (also cleans up secondary index)
    pub fn remove_pr(&self, event_id: &str);
}
```
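
To illustrate how the primary map and the secondary index have to move together, here is a single-threaded sketch that uses `std::collections::HashMap` in place of `DashMap`. `PrIndex` and `PrEntry` are hypothetical, trimmed-down stand-ins for `Purgatory` and `PrPurgatoryEntry`; the real methods take `&self` and would need to keep the two concurrent maps consistent.

```rust
use std::collections::{HashMap, HashSet};

/// Trimmed-down stand-in for `PrPurgatoryEntry`.
#[derive(Debug, Clone, PartialEq)]
struct PrEntry {
    commit: String,
    identifier: Option<String>,
}

#[derive(Default)]
struct PrIndex {
    /// Primary map: event ID -> entry.
    pr_events: HashMap<String, PrEntry>,
    /// Secondary index: identifier -> event IDs.
    pr_events_by_identifier: HashMap<String, HashSet<String>>,
}

impl PrIndex {
    /// Insert an entry and index it by identifier when one is known.
    fn add(&mut self, event_id: String, entry: PrEntry) {
        // Drop any previous entry first so the index never goes stale.
        let _ = self.remove(&event_id);
        if let Some(identifier) = &entry.identifier {
            self.pr_events_by_identifier
                .entry(identifier.clone())
                .or_default()
                .insert(event_id.clone());
        }
        self.pr_events.insert(event_id, entry);
    }

    /// Remove an entry and clean up the secondary index.
    fn remove(&mut self, event_id: &str) -> Option<PrEntry> {
        let entry = self.pr_events.remove(event_id)?;
        if let Some(identifier) = &entry.identifier {
            let now_empty = match self.pr_events_by_identifier.get_mut(identifier) {
                Some(ids) => {
                    ids.remove(event_id);
                    ids.is_empty()
                }
                None => false,
            };
            // Do not leave empty sets behind for identifiers with no PRs.
            if now_empty {
                self.pr_events_by_identifier.remove(identifier);
            }
        }
        Some(entry)
    }

    /// Look up all entries for an identifier via the secondary index.
    fn find_for_identifier(&self, identifier: &str) -> Vec<&PrEntry> {
        self.pr_events_by_identifier
            .get(identifier)
            .into_iter()
            .flatten()
            .filter_map(|event_id| self.pr_events.get(event_id))
            .collect()
    }
}
```

Entries without an identifier (placeholders created before the event arrives) stay reachable by event ID only, which matches the `Option<String>` field on the entry.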

## Implementation

### Core Function

```rust
/// Unified processing of newly available git data.
///
/// Called whenever git data becomes available, whether from:
/// - A successful `git push` (handle_receive_pack)
/// - Purgatory sync fetching OIDs from remote servers
///
/// # What it does
///
/// 1. **Discover satisfiable events**: Scans purgatory for state and PR events
///    whose required OIDs are now available in `source_repo_path`
///
/// 2. **For each satisfiable STATE event**:
///    - Find all owner repos that authorize this state's author
///    - Copy OIDs from source repo to each authorized owner repo
///    - Align refs (create/update/delete) to match state
///    - Set HEAD per state announcement
///    - Save event to database
///    - Notify WebSocket subscribers
///    - Remove from purgatory
///
/// 3. **For each satisfiable PR event**:
///    - Find all owner repos that list tagged owners as maintainers
///    - Copy commit from source repo to each relevant owner repo
///    - Create `refs/nostr/<event-id>` in each repo
///    - Save event to database
///    - Notify WebSocket subscribers
///    - Remove from purgatory
pub async fn process_newly_available_git_data(
    source_repo_path: &Path,
    new_oids: &HashSet<String>,
    database: &SharedDatabase,
    local_relay: Option<&nostr_relay_builder::LocalRelay>,
    purgatory: &Purgatory,
    git_data_path: &Path,
) -> ProcessResult {
    let mut result = ProcessResult::default();

    // Extract identifier from repo path
    let identifier = match extract_identifier_from_repo_path(source_repo_path, git_data_path) {
        Some(id) => id,
        None => return result,
    };

    // Fetch repository data once for all operations
    let db_repo_data = match fetch_repository_data(database, &identifier).await {
        Ok(data) => data,
        Err(e) => {
            result.errors.push(format!("Failed to fetch repo data: {}", e));
            return result;
        }
    };

    // Process satisfiable state events
    let state_result = process_satisfiable_state_events(
        source_repo_path,
        &identifier,
        new_oids,
        &db_repo_data,
        database,
        local_relay,
        purgatory,
        git_data_path,
    ).await;

    result.merge_state_result(state_result);

    // Process satisfiable PR events
    let pr_result = process_satisfiable_pr_events(
        source_repo_path,
        &identifier,
        new_oids,
        &db_repo_data,
        database,
        local_relay,
        purgatory,
        git_data_path,
    ).await;

    result.merge_pr_result(pr_result);

    result
}
```

### Result Type

```rust
/// Result of processing newly available git data
#[derive(Debug, Default)]
pub struct ProcessResult {
    /// Number of state events released from purgatory
    pub states_released: usize,
    /// Number of PR events released from purgatory
    pub prs_released: usize,
    /// Number of owner repositories synced
    pub repos_synced: usize,
    /// Number of refs created across all repos
    pub refs_created: usize,
    /// Number of refs updated across all repos
    pub refs_updated: usize,
    /// Number of refs deleted across all repos
    pub refs_deleted: usize,
    /// Errors encountered (non-fatal)
    pub errors: Vec<String>,
}
```
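
The core function calls `merge_state_result` and `merge_pr_result`, which this document does not define. One possible shape is sketched below. The sub-result types `StateProcessResult` and `PrProcessResult` and their fields are assumptions made for illustration, and `ProcessResult` is repeated only so the sketch is self-contained.

```rust
/// Same fields as the `ProcessResult` above, repeated for self-containment.
#[derive(Debug, Default)]
pub struct ProcessResult {
    pub states_released: usize,
    pub prs_released: usize,
    pub repos_synced: usize,
    pub refs_created: usize,
    pub refs_updated: usize,
    pub refs_deleted: usize,
    pub errors: Vec<String>,
}

/// Hypothetical outcome of the state-event pass.
#[derive(Debug, Default)]
pub struct StateProcessResult {
    pub released: usize,
    pub repos_synced: usize,
    pub refs_created: usize,
    pub refs_updated: usize,
    pub refs_deleted: usize,
    pub errors: Vec<String>,
}

/// Hypothetical outcome of the PR-event pass (PR refs are only ever created).
#[derive(Debug, Default)]
pub struct PrProcessResult {
    pub released: usize,
    pub repos_synced: usize,
    pub refs_created: usize,
    pub errors: Vec<String>,
}

impl ProcessResult {
    /// Fold the state-event pass into the overall result.
    pub fn merge_state_result(&mut self, r: StateProcessResult) {
        self.states_released += r.released;
        self.repos_synced += r.repos_synced;
        self.refs_created += r.refs_created;
        self.refs_updated += r.refs_updated;
        self.refs_deleted += r.refs_deleted;
        self.errors.extend(r.errors);
    }

    /// Fold the PR-event pass into the overall result.
    pub fn merge_pr_result(&mut self, r: PrProcessResult) {
        self.prs_released += r.released;
        self.repos_synced += r.repos_synced;
        self.refs_created += r.refs_created;
        self.errors.extend(r.errors);
    }
}
```

Summing `repos_synced` this way counts a repository twice when both a state event and a PR event touch it. If the counter is meant to report distinct repositories, the sub-results would need to carry the repo paths instead of a count.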

### Helper: Extract Identifier from PR Event

```rust
/// Extract identifier from PR event's `a` tag.
/// Format: 30617:<owner_pubkey>:<identifier>
fn extract_identifier_from_pr_event(event: &Event) -> Option<String> {
    event.tags.iter().find_map(|tag| {
        let tag_vec = tag.clone().to_vec();
        if tag_vec.len() >= 2 && tag_vec[0] == "a" && tag_vec[1].starts_with("30617:") {
            // Split into at most three parts so an identifier that itself
            // contains ':' is returned whole rather than truncated.
            let mut parts = tag_vec[1].splitn(3, ':');
            let _kind = parts.next()?;
            let _owner_pubkey = parts.next()?;
            parts.next().map(str::to_string)
        } else {
            None
        }
    })
}
```

### Helper: Extract Identifier from Repo Path

```rust
/// Extract identifier from repository path.
/// Path format: {git_data_path}/{npub}/{identifier}.git
fn extract_identifier_from_repo_path(repo_path: &Path, git_data_path: &Path) -> Option<String> {
    let relative = repo_path.strip_prefix(git_data_path).ok()?;
    let components: Vec<_> = relative.components().collect();

    if components.len() >= 2 {
        let identifier_with_git = components[1].as_os_str().to_str()?;
        // `strip_suffix` removes a single trailing ".git"; `trim_end_matches`
        // would strip it repeatedly (e.g. "foo.git.git" -> "foo").
        let identifier = identifier_with_git
            .strip_suffix(".git")
            .unwrap_or(identifier_with_git);
        Some(identifier.to_string())
    } else {
        None
    }
}
```

## Integration

### handle_receive_pack (Simplified)

```rust
// After git receive-pack succeeds:

// Collect new OIDs from the push
let new_oids: HashSet<String> = pushed_refs
    .iter()
    .filter(|(_, new_oid, _)| new_oid != "0000000000000000000000000000000000000000")
    .map(|(_, new_oid, _)| new_oid.clone())
    .collect();

// Single unified call handles everything
let result = process_newly_available_git_data(
    &repo_path,
    &new_oids,
    &database,
    Some(&relay),
    &purgatory,
    Path::new(git_data_path),
).await;

info!(
    "Processed push: {} states, {} PRs released, {} repos synced",
    result.states_released,
    result.prs_released,
    result.repos_synced
);
```
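
The OID collection step can be isolated into a small helper, sketched here under two assumptions: `pushed_refs` holds `(old OID, new OID, ref name)` tuples (only the middle position is confirmed by the snippet above), and an all-zero new OID marks a ref deletion. `is_null_oid` and `collect_new_oids` are hypothetical names. Checking for "all zeros" rather than a fixed 40-character string also covers SHA-256 repositories.

```rust
use std::collections::HashSet;

/// True for the all-zero OID git uses to signal "no object",
/// regardless of hash length (40 hex chars for SHA-1, 64 for SHA-256).
fn is_null_oid(oid: &str) -> bool {
    !oid.is_empty() && oid.bytes().all(|b| b == b'0')
}

/// Collect the distinct new OIDs from a push, skipping ref deletions.
/// Tuple layout (old OID, new OID, ref name) is an assumption.
fn collect_new_oids(pushed_refs: &[(String, String, String)]) -> HashSet<String> {
    pushed_refs
        .iter()
        .filter(|(_, new_oid, _)| !is_null_oid(new_oid))
        .map(|(_, new_oid, _)| new_oid.clone())
        .collect()
}
```

Because the result is a set, two refs pushed to the same commit contribute a single OID.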

### Purgatory Sync (Simplified)

```rust
// After fetching OIDs from remote:

let new_oids: HashSet<String> = fetched_oids.into_iter().collect();

let result = process_newly_available_git_data(
    &source_repo_path,
    &new_oids,
    &database,
    local_relay.as_ref(),
    &purgatory,
    &git_data_path,
).await;
```

### Integration with Purgatory Sync Redesign

The purgatory sync redesign (see `purgatory-sync-redesign.md`) uses this unified function in its `sync_identifier_from_url` implementation:

```rust
pub async fn sync_identifier_from_url<C: SyncContext>(
    ctx: &C,
    identifier: &str,
    url: &str,
    throttle_manager: &Arc<ThrottleManager>,
) -> usize {
    // ... fetch OIDs from URL ...

    // The function returns a count rather than a `Result`, so `?` cannot be
    // used here. Treating a failed fetch as "nothing fetched" is an assumption.
    let fetched_oids = match ctx.fetch_oids(&target_repo, url, &needed_oids).await {
        Ok(oids) => oids,
        Err(_) => return 0,
    };

    // Record the count before `fetched_oids` is consumed below
    let fetched_count = fetched_oids.len();

    if fetched_count > 0 {
        // Use unified processing
        let new_oids: HashSet<String> = fetched_oids.into_iter().collect();

        let _result = process_newly_available_git_data(
            &target_repo,
            &new_oids,
            ctx.database(),
            ctx.local_relay(),
            ctx.purgatory(),
            ctx.git_data_path(),
        ).await;

        // Purgatory removal, DB saves, etc. were already handled by the call above
    }

    fetched_count
}
```

The `SyncContext` trait wraps this function in its `process_newly_available_git_data` method for testability.

## Benefits

1. **Single source of truth** - One function handles all post-git-data processing
2. **Always fresh discovery** - Events discovered from purgatory at processing time
3. **Consistent behavior** - Push and sync paths behave identically
4. **Simpler callers** - Callers pass `source_repo_path` and `new_oids` alongside the shared handles
5. **Complete processing** - Handles all event types, all repo syncing, HEAD, DB, WebSocket, purgatory
6. **PR sync parity** - PR events now synced in purgatory path (was missing)

## Code to Remove/Simplify

After implementing the unified function:

1. **Remove**: Most of `sync_state_git_data` in `src/purgatory/mod.rs`
2. **Simplify**: Event handling in `handle_receive_pack` (replace ~100 lines with a single call)
3. **Internalize**: `sync_to_owner_repos` and `sync_pr_refs_to_tagged_owner_repos` become internal helpers

## Testing Strategy

### Unit Tests

1. `extract_identifier_from_repo_path` - Various path formats
2. `extract_identifier_from_pr_event` - Various tag formats
3. Event discovery logic with mock purgatory
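
A sketch of what the first two unit-test targets might look like. `extract_identifier_from_repo_path` follows the helper shown earlier in this document. `identifier_from_a_tag` is a hypothetical string-level stand-in for `extract_identifier_from_pr_event`, used so the example does not depend on the nostr `Event` type.

```rust
use std::path::Path;

/// Path format: {git_data_path}/{npub}/{identifier}.git
fn extract_identifier_from_repo_path(repo_path: &Path, git_data_path: &Path) -> Option<String> {
    let relative = repo_path.strip_prefix(git_data_path).ok()?;
    let mut components = relative.components();
    let _npub = components.next()?;
    let name = components.next()?.as_os_str().to_str()?;
    Some(name.strip_suffix(".git").unwrap_or(name).to_string())
}

/// Hypothetical string-level helper: parse the value of an `a` tag,
/// formatted as 30617:<owner_pubkey>:<identifier>.
fn identifier_from_a_tag(value: &str) -> Option<&str> {
    // At most three parts, so ':' inside the identifier is preserved.
    let mut parts = value.splitn(3, ':');
    match (parts.next(), parts.next(), parts.next()) {
        (Some("30617"), Some(_owner), Some(identifier)) if !identifier.is_empty() => {
            Some(identifier)
        }
        _ => None,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn repo_path_yields_identifier() {
        let id = extract_identifier_from_repo_path(
            Path::new("/data/npub1abc/my-repo.git"),
            Path::new("/data"),
        );
        assert_eq!(id.as_deref(), Some("my-repo"));
    }

    #[test]
    fn repo_path_outside_git_data_is_rejected() {
        let id = extract_identifier_from_repo_path(
            Path::new("/elsewhere/npub1abc/my-repo.git"),
            Path::new("/data"),
        );
        assert_eq!(id, None);
    }

    #[test]
    fn a_tag_keeps_colons_in_identifier() {
        assert_eq!(identifier_from_a_tag("30617:abcd:a:b"), Some("a:b"));
        assert_eq!(identifier_from_a_tag("30617:abcd"), None);
    }
}
```

Paths with too few components, paths outside `git_data_path`, and `a` tags of another kind are the edge cases most likely to matter for "various formats".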

### Integration Tests

1. Push triggers processing and releases state event
2. Push triggers processing and releases PR event
3. Purgatory sync triggers processing
4. Multiple events for same identifier processed correctly
5. Cross-repo sync works for both state and PR events

## Future Considerations

### Batch Processing

The function currently processes events one at a time. Database saves and WebSocket notifications could be batched for efficiency when many events are released at once.

### Partial Failures

The function currently continues on errors and collects them in `ProcessResult::errors`. Retry logic or transaction semantics could be added if needed.

### Metrics

Add Prometheus metrics for:
- Events processed by type (state/PR)
- Repos synced per processing call
- Processing duration
- Errors by type

## Related Documents

- [Purgatory Sync Redesign](purgatory-sync-redesign.md) - Uses this unified function for purgatory sync operations