upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

summaryrefslogtreecommitdiff
path: root/docs/explanation/unify-git-data-sync.md
blob: fa1f983dbe6e4af597ea2d99b7728d36e81ddc9a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
# Unified Git Data Sync

## Status

**Proposed** - January 2026

## Context

Currently, two separate code paths handle "git data is now available" scenarios:

1. **`handle_receive_pack`** (src/git/handlers.rs) - After a successful `git push`
2. **`sync_state_git_data`** (src/purgatory/mod.rs) - After purgatory sync fetches OIDs from remote servers

Both paths perform essentially the same post-processing:

| Step | `handle_receive_pack` | `sync_state_git_data` |
|------|----------------------|----------------------|
| Set HEAD | ✅ `try_set_head_if_available()` | ✅ (via `align_repository_with_state`) |
| Save events to DB | ✅ `database.save_event()` | ✅ `database.save_event()` |
| Remove from purgatory | ✅ `remove_state_event()` / `remove_pr()` | ✅ `remove_state_event()` |
| Notify WebSocket | ✅ `relay.notify_event()` | ✅ `relay.notify_event()` |
| Sync state to owner repos | ✅ `sync_to_owner_repos()` | ✅ `sync_to_owner_repos()` |
| Sync PR refs to owner repos | ✅ `sync_pr_refs_to_tagged_owner_repos()` | ❌ Not implemented |

This duplication creates maintenance burden and inconsistent behavior (e.g., PR sync missing from purgatory path).

## Decision

Create a single unified function that handles all post-git-data-available processing:

```rust
pub async fn process_newly_available_git_data(
    source_repo_path: &Path,
    new_oids: &HashSet<String>,
    database: &SharedDatabase,
    local_relay: Option<&nostr_relay_builder::LocalRelay>,
    purgatory: &Purgatory,
    git_data_path: &Path,
) -> ProcessResult
```

### Key Design Principles

**1. Always discover events from purgatory**

Rather than accepting pre-authorized events (which may have changed since authorization), the function always scans purgatory to find satisfiable events. This ensures consistency and handles race conditions where events change between authorization and processing.

**2. Minimal input, maximal output**

Callers only need to provide:
- `source_repo_path` - Where the git data landed
- `new_oids` - Which OIDs are now available (for efficient filtering)

The function handles everything else: finding events, syncing across repos, aligning refs, setting HEAD, saving to database, notifying subscribers, and cleaning up purgatory.

**3. Process all event types uniformly**

Both state events (kind 30618) and PR events (kind 1617/1618) are processed in the same flow, ensuring consistent behavior.

## Architecture

### Flow Overview

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                         Git Data Becomes Available                               │
│                                                                                  │
│   ┌─────────────────────┐              ┌─────────────────────┐                   │
│   │  handle_receive_pack │              │  purgatory sync     │                   │
│   │  (push received)     │              │  (fetch completed)  │                   │
│   └──────────┬──────────┘              └──────────┬──────────┘                   │
│              │                                    │                              │
│              │  source_repo_path                  │  source_repo_path            │
│              │  new_oids                          │  new_oids                    │
│              │                                    │                              │
│              └────────────────┬───────────────────┘                              │
│                               │                                                  │
│                               ▼                                                  │
│              ┌────────────────────────────────────────┐                          │
│              │  process_newly_available_git_data()    │                          │
│              │                                        │                          │
│              │  1. Extract identifier from path       │                          │
│              │  2. Fetch repository data from DB      │                          │
│              │  3. Find satisfiable state events      │                          │
│              │  4. Find satisfiable PR events         │                          │
│              │  5. For each event:                    │                          │
│              │     - Sync OIDs to owner repos         │                          │
│              │     - Align refs (+ set HEAD)          │                          │
│              │     - Save to database                 │                          │
│              │     - Notify WebSocket                 │                          │
│              │     - Remove from purgatory            │                          │
│              └────────────────────────────────────────┘                          │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Event Discovery

The function discovers satisfiable events by scanning purgatory:

**For State Events:**
1. Get all state entries for the identifier from purgatory
2. For each entry, check if ALL required OIDs exist in source repo
3. Quick optimization: skip if none of `new_oids` are in the state's OID set

**For PR Events:**
1. Get all PR entries for the identifier from purgatory (via secondary index)
2. For each entry with an event, check if the commit OID exists in source repo
3. Quick optimization: skip if commit not in `new_oids`

### Sync to Owner Repos

**For State Events:**

For each owner whose maintainer set authorizes the state author:
1. Skip if a newer state already exists for that owner
2. Copy missing OIDs from source repo to target repo
3. Align refs (create/update/delete branches and tags)
4. Set HEAD per state announcement

**For PR Events:**

For each owner whose maintainer set includes any tagged owner (from `a` tags):
1. Copy commit from source repo to target repo (if missing)
2. Create `refs/nostr/<event-id>` pointing to the commit

## Data Structure Changes

### PrPurgatoryEntry

Add `identifier` field for secondary index lookup:

```rust
#[derive(Debug, Clone)]
pub struct PrPurgatoryEntry {
    /// The nostr PR event, if received (None = git data arrived first)
    pub event: Option<Event>,

    /// The expected commit SHA from 'c' tag or actual commit pushed
    pub commit: String,

    /// Repository identifier extracted from 'a' tag (30617:<owner>:<identifier>)
    /// Used for lookup when git data arrives
    pub identifier: Option<String>,

    /// When this entry was added to purgatory
    pub created_at: Instant,

    /// Expiry deadline
    pub expires_at: Instant,
}
```

### Purgatory Secondary Index

Add index for finding PR events by identifier:

```rust
pub struct Purgatory {
    /// State events indexed by repository identifier
    state_events: Arc<DashMap<String, Vec<StatePurgatoryEntry>>>,

    /// PR events indexed by event ID (hex string)
    pr_events: Arc<DashMap<String, PrPurgatoryEntry>>,

    /// Secondary index: identifier -> event_ids for PR events
    pr_events_by_identifier: Arc<DashMap<String, HashSet<String>>>,

    git_data_path: PathBuf,
}
```

### New Purgatory Methods

```rust
impl Purgatory {
    /// Find all PR events for an identifier
    pub fn find_prs_for_identifier(&self, identifier: &str) -> Vec<PrPurgatoryEntry>;
    
    /// Add PR with automatic identifier extraction and indexing
    pub fn add_pr(&self, event: Event, event_id: String, commit: String);
    
    /// Add placeholder with optional identifier
    pub fn add_pr_placeholder(&self, event_id: String, commit: String, identifier: Option<String>);
    
    /// Remove PR (also cleans up secondary index)
    pub fn remove_pr(&self, event_id: &str);
}
```

## Implementation

### Core Function

```rust
/// Unified processing of newly available git data.
///
/// Called whenever git data becomes available, whether from:
/// - A successful `git push` (handle_receive_pack)
/// - Purgatory sync fetching OIDs from remote servers
///
/// # What it does
///
/// 1. **Discover satisfiable events**: Scans purgatory for state and PR events
///    whose required OIDs are now available in `source_repo_path`
///
/// 2. **For each satisfiable STATE event**:
///    - Find all owner repos that authorize this state's author
///    - Copy OIDs from source repo to each authorized owner repo
///    - Align refs (create/update/delete) to match state
///    - Set HEAD per state announcement
///    - Save event to database
///    - Notify WebSocket subscribers
///    - Remove from purgatory
///
/// 3. **For each satisfiable PR event**:
///    - Find all owner repos that list tagged owners as maintainers
///    - Copy commit from source repo to each relevant owner repo
///    - Create refs/nostr/<event-id> in each repo
///    - Save event to database
///    - Notify WebSocket subscribers
///    - Remove from purgatory
pub async fn process_newly_available_git_data(
    source_repo_path: &Path,
    new_oids: &HashSet<String>,
    database: &SharedDatabase,
    local_relay: Option<&nostr_relay_builder::LocalRelay>,
    purgatory: &Purgatory,
    git_data_path: &Path,
) -> ProcessResult {
    let mut result = ProcessResult::default();

    // Extract identifier from repo path
    let identifier = match extract_identifier_from_repo_path(source_repo_path, git_data_path) {
        Some(id) => id,
        None => return result,
    };

    // Fetch repository data once for all operations
    let db_repo_data = match fetch_repository_data(database, &identifier).await {
        Ok(data) => data,
        Err(e) => {
            result.errors.push(format!("Failed to fetch repo data: {}", e));
            return result;
        }
    };

    // Process satisfiable state events
    let state_result = process_satisfiable_state_events(
        source_repo_path,
        &identifier,
        new_oids,
        &db_repo_data,
        database,
        local_relay,
        purgatory,
        git_data_path,
    ).await;
    
    result.merge_state_result(state_result);

    // Process satisfiable PR events
    let pr_result = process_satisfiable_pr_events(
        source_repo_path,
        &identifier,
        new_oids,
        &db_repo_data,
        database,
        local_relay,
        purgatory,
        git_data_path,
    ).await;
    
    result.merge_pr_result(pr_result);

    result
}
```

### Result Type

```rust
/// Result of processing newly available git data
#[derive(Debug, Default)]
pub struct ProcessResult {
    /// Number of state events released from purgatory
    pub states_released: usize,
    /// Number of PR events released from purgatory
    pub prs_released: usize,
    /// Number of owner repositories synced
    pub repos_synced: usize,
    /// Number of refs created across all repos
    pub refs_created: usize,
    /// Number of refs updated across all repos
    pub refs_updated: usize,
    /// Number of refs deleted across all repos
    pub refs_deleted: usize,
    /// Errors encountered (non-fatal)
    pub errors: Vec<String>,
}
```

### Helper: Extract Identifier from PR Event

```rust
/// Extract identifier from PR event's `a` tag.
/// Format: 30617:<owner_pubkey>:<identifier>
fn extract_identifier_from_pr_event(event: &Event) -> Option<String> {
    event.tags.iter().find_map(|tag| {
        let tag_vec = tag.clone().to_vec();
        if tag_vec.len() >= 2 && tag_vec[0] == "a" && tag_vec[1].starts_with("30617:") {
            let parts: Vec<&str> = tag_vec[1].split(':').collect();
            if parts.len() >= 3 {
                Some(parts[2].to_string())
            } else {
                None
            }
        } else {
            None
        }
    })
}
```

### Helper: Extract Identifier from Repo Path

```rust
/// Extract identifier from repository path.
/// Path format: {git_data_path}/{npub}/{identifier}.git
fn extract_identifier_from_repo_path(repo_path: &Path, git_data_path: &Path) -> Option<String> {
    let relative = repo_path.strip_prefix(git_data_path).ok()?;
    let components: Vec<_> = relative.components().collect();

    if components.len() >= 2 {
        let identifier_with_git = components[1].as_os_str().to_str()?;
        Some(identifier_with_git.trim_end_matches(".git").to_string())
    } else {
        None
    }
}
```

## Integration

### handle_receive_pack (Simplified)

```rust
// After git receive-pack succeeds:

// Collect new OIDs from the push
let new_oids: HashSet<String> = pushed_refs
    .iter()
    .filter(|(_, new_oid, _)| new_oid != "0000000000000000000000000000000000000000")
    .map(|(_, new_oid, _)| new_oid.clone())
    .collect();

// Single unified call handles everything
let result = process_newly_available_git_data(
    &repo_path,
    &new_oids,
    &database,
    Some(&relay),
    &purgatory,
    Path::new(git_data_path),
).await;

info!(
    "Processed push: {} states, {} PRs released, {} repos synced",
    result.states_released,
    result.prs_released,
    result.repos_synced
);
```

### Purgatory Sync (Simplified)

```rust
// After fetching OIDs from remote:

let new_oids: HashSet<String> = fetched_oids.into_iter().collect();

let result = process_newly_available_git_data(
    &source_repo_path,
    &new_oids,
    &database,
    local_relay.as_ref(),
    &purgatory,
    &git_data_path,
).await;
```

### Integration with Purgatory Sync Redesign

The purgatory sync redesign (see `purgatory-sync-redesign.md`) uses this unified function in its `sync_identifier_from_url` implementation:

```rust
pub async fn sync_identifier_from_url<C: SyncContext>(
    ctx: &C,
    identifier: &str,
    url: &str,
    throttle_manager: &Arc<ThrottleManager>,
) -> usize {
    // ... fetch OIDs from URL ...
    
    let fetched_oids = ctx.fetch_oids(&target_repo, url, &needed_oids).await?;
    
    if !fetched_oids.is_empty() {
        // Use unified processing
        let new_oids: HashSet<String> = fetched_oids.into_iter().collect();
        
        let result = process_newly_available_git_data(
            &target_repo,
            &new_oids,
            ctx.database(),
            ctx.local_relay(),
            ctx.purgatory(),
            ctx.git_data_path(),
        ).await;
        
        // Result already handled purgatory removal, DB saves, etc.
    }
    
    fetched_oids.len()
}
```

The `SyncContext` trait wraps this function in its `process_newly_available_git_data` method for testability.

## Benefits

1. **Single source of truth** - One function handles all post-git-data processing
2. **Always fresh discovery** - Events discovered from purgatory at processing time
3. **Consistent behavior** - Push and sync paths behave identically
4. **Simpler callers** - Just pass repo_path + new_oids
5. **Complete processing** - Handles all event types, all repo syncing, HEAD, DB, WebSocket, purgatory
6. **PR sync parity** - PR events now synced in purgatory path (was missing)

## Code to Remove/Simplify

After implementing the unified function:

1. **Remove**: Most of `sync_state_git_data` in `src/purgatory/mod.rs`
2. **Simplify**: Event handling in `handle_receive_pack` (replace ~100 lines with single call)
3. **Internalize**: `sync_to_owner_repos` and `sync_pr_refs_to_tagged_owner_repos` become internal helpers

## Testing Strategy

### Unit Tests

1. `extract_identifier_from_repo_path` - Various path formats
2. `extract_identifier_from_pr_event` - Various tag formats
3. Event discovery logic with mock purgatory

### Integration Tests

1. Push triggers processing and releases state event
2. Push triggers processing and releases PR event
3. Purgatory sync triggers processing
4. Multiple events for same identifier processed correctly
5. Cross-repo sync works for both state and PR events

## Future Considerations

### Batch Processing

Currently processes events one at a time. Could batch database saves and WebSocket notifications for efficiency with many events.

### Partial Failures

Currently continues on errors and collects them in result. Could add retry logic or transaction semantics if needed.

### Metrics

Add Prometheus metrics for:
- Events processed by type (state/PR)
- Repos synced per processing call
- Processing duration
- Errors by type

## Related Documents

- [Purgatory Sync Redesign](purgatory-sync-redesign.md) - Uses this unified function for purgatory sync operations