upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/explanation/purgatory-sync-redesign.md285
-rw-r--r--docs/explanation/unify-git-data-sync.md481
2 files changed, 573 insertions, 193 deletions
diff --git a/docs/explanation/purgatory-sync-redesign.md b/docs/explanation/purgatory-sync-redesign.md
index 54e279a..c1e6af6 100644
--- a/docs/explanation/purgatory-sync-redesign.md
+++ b/docs/explanation/purgatory-sync-redesign.md
@@ -29,7 +29,7 @@ Redesign purgatory sync to be **identifier-based** rather than **event-based**,
29 29
30### Key Design Decision: Where Does OID Copying Happen? 30### Key Design Decision: Where Does OID Copying Happen?
31 31
32**Answer: In `process_satisfiable_events`, NOT after the entire sync completes.** 32**Answer: In `process_newly_available_git_data`, NOT after the entire sync completes.**
33 33
34The current implementation (`sync_state_git_data`) fetches all OIDs first, then at the end: 34The current implementation (`sync_state_git_data`) fetches all OIDs first, then at the end:
351. Copies OIDs to all authorized owner repos 351. Copies OIDs to all authorized owner repos
@@ -38,7 +38,7 @@ The current implementation (`sync_state_git_data`) fetches all OIDs first, then
384. Notifies subscribers 384. Notifies subscribers
395. Removes from purgatory 395. Removes from purgatory
40 40
41The redesign moves all of this into `process_satisfiable_events`, which is called after **each successful URL fetch**. This enables: 41The redesign moves all of this into `process_newly_available_git_data`, which is called after **each successful URL fetch**. This enables:
42 42
43| Aspect | Current (end-of-sync) | Redesign (per-fetch) | 43| Aspect | Current (end-of-sync) | Redesign (per-fetch) |
44|--------|----------------------|---------------------| 44|--------|----------------------|---------------------|
@@ -55,12 +55,33 @@ Consider syncing an identifier with 3 state events from different maintainers:
55- State C needs OIDs from `server3.com` (down) 55- State C needs OIDs from `server3.com` (down)
56 56
57With the redesign: 57With the redesign:
581. Fetch from `server1.com` succeeds → `process_satisfiable_events` releases State A immediately 581. Fetch from `server1.com` succeeds → `process_newly_available_git_data` releases State A immediately
592. Fetch from `server2.com` succeeds → `process_satisfiable_events` releases State B 592. Fetch from `server2.com` succeeds → `process_newly_available_git_data` releases State B
603. Fetch from `server3.com` fails → State C stays in purgatory, retries with backoff 603. Fetch from `server3.com` fails → State C stays in purgatory, retries with backoff
61 61
62The current implementation would wait for all servers before releasing any events. 62The current implementation would wait for all servers before releasing any events.
63 63
64### Unified Processing with Git Push Handler
65
66**Key insight**: The post-git-data-available processing is identical whether data arrives via:
67- A successful `git push` (handle_receive_pack)
68- Purgatory sync fetching OIDs from remote servers
69
70Both paths need to:
711. Discover satisfiable events from purgatory
722. Sync OIDs to authorized owner repos
733. Align refs (+ set HEAD)
744. Save events to database
755. Notify WebSocket subscribers
766. Remove from purgatory
77
78Rather than duplicate this logic, we use a single unified function `process_newly_available_git_data` that handles all post-git-data-available processing. See [Unified Git Data Sync](unify-git-data-sync.md) for the complete design.
79
80This means:
81- **`handle_receive_pack`** calls `process_newly_available_git_data` after git push succeeds
82- **`sync_identifier_from_url`** calls `process_newly_available_git_data` after OID fetch succeeds
83- **Same behavior** regardless of how git data arrived
84
64## Architecture 85## Architecture
65 86
66### Overview 87### Overview
@@ -634,24 +655,17 @@ pub trait SyncContext: Send + Sync {
634 /// Fetch OIDs from a remote server 655 /// Fetch OIDs from a remote server
635 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; 656 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>;
636 657
637 /// Process events that can now be satisfied. 658 /// Process newly available git data.
659 ///
660 /// This is a thin wrapper around the unified `process_newly_available_git_data` function.
661 /// Called after each successful OID fetch to check if any purgatory events can now be satisfied.
638 /// 662 ///
639 /// For each purgatory event (state or PR) for this identifier: 663 /// See [Unified Git Data Sync](unify-git-data-sync.md) for the complete design.
640 /// 1. Check if all required OIDs are now available in the source repo 664 async fn process_newly_available_git_data(
641 /// 2. For satisfiable state events: 665 &self,
642 /// a. Check if this state is authorized and should be applied (vs existing states) 666 source_repo_path: &Path,
643 /// b. Copy OIDs to all owner repos that authorize this state author 667 new_oids: &HashSet<String>,
644 /// c. Align refs with state in each authorized repo 668 ) -> Result<ProcessResult>;
645 /// d. Save state event to database
646 /// e. Notify WebSocket subscribers
647 /// f. Remove from purgatory
648 /// 3. For satisfiable PR events:
649 /// a. Copy PR commit to owner repos that share maintainers with tagged owners
650 /// b. Create refs/nostr/<event-id> in each repo
651 /// c. Save PR event to database
652 /// d. Notify WebSocket subscribers
653 /// e. Remove from purgatory
654 async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult>;
655 669
656 /// Check if there are still pending events for this identifier 670 /// Check if there are still pending events for this identifier
657 fn has_pending_events(&self, identifier: &str) -> bool; 671 fn has_pending_events(&self, identifier: &str) -> bool;
@@ -694,21 +708,20 @@ impl RealSyncContext {
694impl SyncContext for RealSyncContext { 708impl SyncContext for RealSyncContext {
695 // ... other methods ... 709 // ... other methods ...
696 710
697 async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult> { 711 async fn process_newly_available_git_data(
698 // Get repository data and find source repo 712 &self,
699 let db_repo_data = fetch_repository_data(&self.database, identifier).await?; 713 source_repo_path: &Path,
700 let source_repo_path = self.find_target_repo(&db_repo_data) 714 new_oids: &HashSet<String>,
701 .ok_or_else(|| anyhow::anyhow!("No target repo found"))?; 715 ) -> Result<ProcessResult> {
702 716 // Call the unified function that handles all post-git-data-available processing
703 // Call the standalone function with all dependencies 717 // This is the same function called by handle_receive_pack after a git push
704 process_satisfiable_events_impl( 718 crate::git::process_newly_available_git_data(
705 identifier, 719 source_repo_path,
706 &source_repo_path, 720 new_oids,
707 &db_repo_data,
708 &self.git_data_path,
709 &self.database, 721 &self.database,
710 self.local_relay.as_ref(), 722 self.local_relay.as_ref(),
711 &self.purgatory, 723 &self.purgatory,
724 &self.git_data_path,
712 ).await 725 ).await
713 } 726 }
714 727
@@ -716,7 +729,7 @@ impl SyncContext for RealSyncContext {
716} 729}
717``` 730```
718 731
719**Note**: The `SyncContext` trait abstracts away the dependencies for testability. The real implementation (`RealSyncContext`) holds references to purgatory, database, etc., and the `process_satisfiable_events` method uses them internally. This keeps the sync logic functions (`sync_identifier_next_url`, `sync_identifier_from_url`) clean and testable with mocks. 732**Note**: The `SyncContext` trait abstracts away the dependencies for testability. The real implementation (`RealSyncContext`) holds references to purgatory, database, etc., and the `process_newly_available_git_data` method delegates to the unified function. This keeps the sync logic functions (`sync_identifier_next_url`, `sync_identifier_from_url`) clean and testable with mocks.
720 733
721## Core Sync Logic 734## Core Sync Logic
722 735
@@ -962,11 +975,12 @@ pub async fn sync_identifier_from_url<C: SyncContext>(
962 975
963 // Try to process any events that can now be satisfied 976 // Try to process any events that can now be satisfied
964 if oids_fetched > 0 { 977 if oids_fetched > 0 {
965 if let Err(e) = ctx.process_satisfiable_events(identifier).await { 978 let new_oids: HashSet<String> = needed_oids.iter().cloned().collect();
979 if let Err(e) = ctx.process_newly_available_git_data(&target_repo, &new_oids).await {
966 tracing::warn!( 980 tracing::warn!(
967 identifier = %identifier, 981 identifier = %identifier,
968 error = %e, 982 error = %e,
969 "Failed to process satisfiable events" 983 "Failed to process newly available git data"
970 ); 984 );
971 } 985 }
972 } 986 }
@@ -975,18 +989,27 @@ pub async fn sync_identifier_from_url<C: SyncContext>(
975} 989}
976``` 990```
977 991
978### process_satisfiable_events 992### process_newly_available_git_data (Unified Function)
979 993
980This is the core function that handles the "release from purgatory" logic. It's called after each successful fetch to check if any purgatory events can now be satisfied with the available git data. 994This is the core function that handles the "release from purgatory" logic. It's called after each successful fetch to check if any purgatory events can now be satisfied with the available git data.
981 995
982**Key Design Decision**: OID copying and ref alignment happen in `process_satisfiable_events`, NOT after the entire sync completes. This enables: 996**Key Design Decision**: This is a **unified function** shared with the git push handler. Both `handle_receive_pack` (after git push) and `sync_identifier_from_url` (after purgatory sync fetch) call the same function. See [Unified Git Data Sync](unify-git-data-sync.md) for the complete implementation.
997
998**Why unify?**
983 999
9841. **Incremental progress**: Events can be released as soon as their OIDs are available, even if other events for the same identifier still need data 1000The post-git-data-available processing is identical regardless of how data arrived:
9852. **Partial success**: If we fetch OIDs for one state event but not another, the first can be released immediately 1001
9863. **Cleaner separation**: `sync_identifier_from_url` only fetches; `process_satisfiable_events` handles all the "what to do with the data" logic 1002| Step | After git push | After purgatory fetch |
1003|------|---------------|----------------------|
1004| Discover satisfiable events | ✅ Same | ✅ Same |
1005| Sync OIDs to owner repos | ✅ Same | ✅ Same |
1006| Align refs (+ set HEAD) | ✅ Same | ✅ Same |
1007| Save events to database | ✅ Same | ✅ Same |
1008| Notify WebSocket | ✅ Same | ✅ Same |
1009| Remove from purgatory | ✅ Same | ✅ Same |
987 1010
988```rust 1011```rust
989/// Result of processing satisfiable events 1012/// Result of processing newly available git data
990#[derive(Debug, Default)] 1013#[derive(Debug, Default)]
991pub struct ProcessResult { 1014pub struct ProcessResult {
992 /// Number of state events released from purgatory 1015 /// Number of state events released from purgatory
@@ -995,159 +1018,32 @@ pub struct ProcessResult {
995 pub prs_released: usize, 1018 pub prs_released: usize,
996 /// Number of repositories synced (OIDs copied + refs aligned) 1019 /// Number of repositories synced (OIDs copied + refs aligned)
997 pub repos_synced: usize, 1020 pub repos_synced: usize,
998 /// Errors encountered 1021 /// Number of refs created/updated/deleted
1022 pub refs_created: usize,
1023 pub refs_updated: usize,
1024 pub refs_deleted: usize,
1025 /// Errors encountered (non-fatal)
999 pub errors: Vec<String>, 1026 pub errors: Vec<String>,
1000} 1027}
1001 1028
1002/// Process purgatory events that can now be satisfied with available git data. 1029/// Unified processing of newly available git data.
1003/// 1030///
1004/// This function is called after each successful OID fetch. It: 1031/// Called whenever git data becomes available, whether from:
1005/// 1. Iterates through all purgatory events for this identifier 1032/// - A successful `git push` (handle_receive_pack)
1006/// 2. For each event, checks if all required OIDs are now available 1033/// - Purgatory sync fetching OIDs from remote servers
1007/// 3. For satisfiable events, performs the full "release" workflow 1034///
1008/// 1035/// See unify-git-data-sync.md for complete implementation details.
1009/// The release workflow for STATE events: 1036pub async fn process_newly_available_git_data(
1010/// 1. Check authorization: is this state author authorized by any owner's maintainer set?
1011/// 2. Check priority: is this state newer than existing states for those owners?
1012/// 3. Copy OIDs to all authorized owner repos (using sync_to_owner_repos logic)
1013/// 4. Align refs with state in each authorized repo
1014/// 5. Save state event to database
1015/// 6. Notify WebSocket subscribers
1016/// 7. Remove from purgatory
1017///
1018/// The release workflow for PR events:
1019/// 1. Copy PR commit to owner repos that share maintainers with tagged owners
1020/// 2. Create refs/nostr/<event-id> in each repo
1021/// 3. Save PR event to database
1022/// 4. Notify WebSocket subscribers
1023/// 5. Remove from purgatory
1024///
1025/// Note: This is the implementation function called by RealSyncContext.
1026/// The SyncContext trait method has a simpler signature because the
1027/// implementation has access to all dependencies via self.
1028pub async fn process_satisfiable_events_impl(
1029 identifier: &str,
1030 source_repo_path: &Path, 1037 source_repo_path: &Path,
1031 db_repo_data: &RepositoryData, 1038 new_oids: &HashSet<String>,
1032 git_data_path: &Path,
1033 database: &SharedDatabase, 1039 database: &SharedDatabase,
1034 local_relay: Option<&nostr_relay_builder::LocalRelay>, 1040 local_relay: Option<&nostr_relay_builder::LocalRelay>,
1035 purgatory: &Purgatory, 1041 purgatory: &Purgatory,
1036) -> Result<ProcessResult> { 1042 git_data_path: &Path,
1037 let mut result = ProcessResult::default(); 1043) -> Result<ProcessResult>;
1038
1039 // Process state events
1040 let state_entries = purgatory.find_state(identifier);
1041 for entry in state_entries {
1042 // Parse the state event
1043 let state = match RepositoryState::from_event(&entry.event) {
1044 Ok(s) => s,
1045 Err(e) => {
1046 tracing::warn!(
1047 event_id = %entry.event.id,
1048 error = %e,
1049 "Failed to parse state event from purgatory"
1050 );
1051 continue;
1052 }
1053 };
1054
1055 // Check if all OIDs are available in the source repo
1056 let missing_oids = identify_missing_oids(&state, source_repo_path);
1057 if !missing_oids.is_empty() {
1058 tracing::debug!(
1059 event_id = %entry.event.id,
1060 missing = missing_oids.len(),
1061 "State event still missing OIDs, skipping"
1062 );
1063 continue;
1064 }
1065
1066 // All OIDs available - proceed with release
1067 tracing::info!(
1068 identifier = %identifier,
1069 event_id = %entry.event.id,
1070 "All OIDs available, releasing state event from purgatory"
1071 );
1072
1073 // Sync to owner repos (copy OIDs + align refs)
1074 // This handles authorization checks internally
1075 let sync_result = sync_to_owner_repos(
1076 source_repo_path,
1077 &state,
1078 db_repo_data,
1079 git_data_path,
1080 );
1081 result.repos_synced += sync_result.repos_synced;
1082
1083 if sync_result.repos_synced == 0 {
1084 tracing::warn!(
1085 identifier = %identifier,
1086 event_id = %entry.event.id,
1087 "No repos synced - state author may not be authorized"
1088 );
1089 // Don't remove from purgatory - maybe authorization will change
1090 continue;
1091 }
1092
1093 // Save to database
1094 match database.save_event(&entry.event).await {
1095 Ok(_) => {
1096 tracing::info!(
1097 identifier = %identifier,
1098 event_id = %entry.event.id,
1099 "Saved state event to database"
1100 );
1101
1102 // Notify WebSocket subscribers
1103 if let Some(relay) = local_relay {
1104 relay.notify_event(entry.event.clone());
1105 }
1106
1107 // Remove from purgatory
1108 purgatory.remove_state_event(identifier, &entry.event.id);
1109 result.states_released += 1;
1110 }
1111 Err(e) => {
1112 tracing::warn!(
1113 event_id = %entry.event.id,
1114 error = %e,
1115 "Failed to save state event to database"
1116 );
1117 result.errors.push(format!("Failed to save state {}: {}", entry.event.id, e));
1118 }
1119 }
1120 }
1121
1122 // TODO: Process PR events similarly
1123 // For now, PR events are handled separately
1124
1125 Ok(result)
1126}
1127
1128/// Identify OIDs in a state that are missing from a repository
1129fn identify_missing_oids(state: &RepositoryState, repo_path: &Path) -> Vec<String> {
1130 let mut missing = Vec::new();
1131
1132 for branch in &state.branches {
1133 if !branch.commit.starts_with("ref: ") && !oid_exists(repo_path, &branch.commit) {
1134 missing.push(branch.commit.clone());
1135 }
1136 }
1137
1138 for tag in &state.tags {
1139 if !tag.commit.starts_with("ref: ") && !oid_exists(repo_path, &tag.commit) {
1140 missing.push(tag.commit.clone());
1141 }
1142 }
1143
1144 missing
1145}
1146``` 1044```
1147 1045
1148**Why this design?** 1046**Key properties of the unified function:**
1149
1150The key insight is that `process_satisfiable_events` is called after *each* successful URL fetch, not just at the end of the sync. This means:
1151 1047
11521. **Early release**: If we fetch from `server1.com` and get all OIDs for state event A, we immediately release A even if state event B still needs OIDs from `server2.com` 10481. **Early release**: If we fetch from `server1.com` and get all OIDs for state event A, we immediately release A even if state event B still needs OIDs from `server2.com`
1153 1049
@@ -1157,6 +1053,8 @@ The key insight is that `process_satisfiable_events` is called after *each* succ
1157 1053
11584. **Authorization at release time**: We check authorization when releasing, not when adding to purgatory. This handles the case where maintainer sets change while an event is in purgatory. 10544. **Authorization at release time**: We check authorization when releasing, not when adding to purgatory. This handles the case where maintainer sets change while an event is in purgatory.
1159 1055
10565. **Handles all event types**: Both state events (kind 30618) and PR events (kind 1617/1618) are processed uniformly.
1057
1160### The Sync Identifier Loop (Main Sync) 1058### The Sync Identifier Loop (Main Sync)
1161 1059
1162```rust 1060```rust
@@ -1198,8 +1096,6 @@ pub async fn sync_identifier<C: SyncContext>(
1198 1096
1199 let needed_oids = ctx.collect_needed_oids(identifier); 1097 let needed_oids = ctx.collect_needed_oids(identifier);
1200 if needed_oids.is_empty() { 1098 if needed_oids.is_empty() {
1201 // Process any remaining satisfiable events
1202 let _ = ctx.process_satisfiable_events(identifier).await;
1203 tracing::info!(identifier = %identifier, "Sync complete - all OIDs available"); 1099 tracing::info!(identifier = %identifier, "Sync complete - all OIDs available");
1204 return true; 1100 return true;
1205 } 1101 }
@@ -1220,7 +1116,6 @@ pub async fn sync_identifier<C: SyncContext>(
1220 1116
1221 let needed_oids = ctx.collect_needed_oids(identifier); 1117 let needed_oids = ctx.collect_needed_oids(identifier);
1222 if needed_oids.is_empty() { 1118 if needed_oids.is_empty() {
1223 let _ = ctx.process_satisfiable_events(identifier).await;
1224 return true; 1119 return true;
1225 } 1120 }
1226 1121
@@ -1558,7 +1453,11 @@ pub trait SyncContext: Send + Sync {
1558 async fn fetch_repository_data(&self, identifier: &str) -> Result<RepositoryData>; 1453 async fn fetch_repository_data(&self, identifier: &str) -> Result<RepositoryData>;
1559 fn collect_needed_oids(&self, identifier: &str) -> HashSet<String>; 1454 fn collect_needed_oids(&self, identifier: &str) -> HashSet<String>;
1560 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; 1455 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>;
1561 async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult>; 1456 async fn process_newly_available_git_data(
1457 &self,
1458 source_repo_path: &Path,
1459 new_oids: &HashSet<String>,
1460 ) -> Result<ProcessResult>;
1562 fn has_pending_events(&self, identifier: &str) -> bool; 1461 fn has_pending_events(&self, identifier: &str) -> bool;
1563 fn find_target_repo(&self, data: &RepositoryData) -> Option<PathBuf>; 1462 fn find_target_repo(&self, data: &RepositoryData) -> Option<PathBuf>;
1564 fn our_domain(&self) -> Option<&str>; 1463 fn our_domain(&self) -> Option<&str>;
@@ -1620,7 +1519,7 @@ mod tests {
1620 1519
1621 #[tokio::test] 1520 #[tokio::test]
1622 async fn from_url_fetches_and_processes_on_success() { 1521 async fn from_url_fetches_and_processes_on_success() {
1623 // Successful fetch triggers process_satisfiable_events 1522 // Successful fetch triggers process_newly_available_git_data
1624 } 1523 }
1625} 1524}
1626``` 1525```
@@ -1628,7 +1527,7 @@ mod tests {
1628**Success Criteria**: 1527**Success Criteria**:
1629- [ ] `sync_identifier_next_url` returns non-throttled, untried URL 1528- [ ] `sync_identifier_next_url` returns non-throttled, untried URL
1630- [ ] `sync_identifier_next_url` returns `None` when all URLs tried or throttled 1529- [ ] `sync_identifier_next_url` returns `None` when all URLs tried or throttled
1631- [ ] `sync_identifier_from_url` calls `fetch_oids` and `process_satisfiable_events` 1530- [ ] `sync_identifier_from_url` calls `fetch_oids` and `process_newly_available_git_data`
1632- [ ] All 3 unit tests pass 1531- [ ] All 3 unit tests pass
1633 1532
1634--- 1533---
@@ -1764,7 +1663,7 @@ impl SyncContext for RealSyncContext { ... }
1764**Success Criteria**: 1663**Success Criteria**:
1765- [ ] All `SyncContext` methods implemented 1664- [ ] All `SyncContext` methods implemented
1766- [ ] Connects to real database, git, and relay 1665- [ ] Connects to real database, git, and relay
1767- [ ] `process_satisfiable_events` releases events from purgatory 1666- [ ] `process_newly_available_git_data` releases events from purgatory
1768 1667
1769--- 1668---
1770 1669
diff --git a/docs/explanation/unify-git-data-sync.md b/docs/explanation/unify-git-data-sync.md
new file mode 100644
index 0000000..fa1f983
--- /dev/null
+++ b/docs/explanation/unify-git-data-sync.md
@@ -0,0 +1,481 @@
1# Unified Git Data Sync
2
3## Status
4
5**Proposed** - January 2026
6
7## Context
8
9Currently, two separate code paths handle "git data is now available" scenarios:
10
111. **`handle_receive_pack`** (src/git/handlers.rs) - After a successful `git push`
122. **`sync_state_git_data`** (src/purgatory/mod.rs) - After purgatory sync fetches OIDs from remote servers
13
14Both paths perform essentially the same post-processing:
15
16| Step | `handle_receive_pack` | `sync_state_git_data` |
17|------|----------------------|----------------------|
18| Set HEAD | ✅ `try_set_head_if_available()` | ✅ (via `align_repository_with_state`) |
19| Save events to DB | ✅ `database.save_event()` | ✅ `database.save_event()` |
20| Remove from purgatory | ✅ `remove_state_event()` / `remove_pr()` | ✅ `remove_state_event()` |
21| Notify WebSocket | ✅ `relay.notify_event()` | ✅ `relay.notify_event()` |
22| Sync state to owner repos | ✅ `sync_to_owner_repos()` | ✅ `sync_to_owner_repos()` |
23| Sync PR refs to owner repos | ✅ `sync_pr_refs_to_tagged_owner_repos()` | ❌ Not implemented |
24
25This duplication creates maintenance burden and inconsistent behavior (e.g., PR sync missing from purgatory path).
26
27## Decision
28
29Create a single unified function that handles all post-git-data-available processing:
30
31```rust
32pub async fn process_newly_available_git_data(
33 source_repo_path: &Path,
34 new_oids: &HashSet<String>,
35 database: &SharedDatabase,
36 local_relay: Option<&nostr_relay_builder::LocalRelay>,
37 purgatory: &Purgatory,
38 git_data_path: &Path,
39) -> ProcessResult
40```
41
42### Key Design Principles
43
44**1. Always discover events from purgatory**
45
46Rather than accepting pre-authorized events (which may have changed since authorization), the function always scans purgatory to find satisfiable events. This ensures consistency and handles race conditions where events change between authorization and processing.
47
48**2. Minimal input, maximal output**
49
50Callers only need to provide:
51- `source_repo_path` - Where the git data landed
52- `new_oids` - Which OIDs are now available (for efficient filtering)
53
54The function handles everything else: finding events, syncing across repos, aligning refs, setting HEAD, saving to database, notifying subscribers, and cleaning up purgatory.
55
56**3. Process all event types uniformly**
57
58Both state events (kind 30618) and PR events (kind 1617/1618) are processed in the same flow, ensuring consistent behavior.
59
60## Architecture
61
62### Flow Overview
63
64```
65┌─────────────────────────────────────────────────────────────────────────────────┐
66│ Git Data Becomes Available │
67│ │
68│ ┌─────────────────────┐ ┌─────────────────────┐ │
69│ │ handle_receive_pack │ │ purgatory sync │ │
70│ │ (push received) │ │ (fetch completed) │ │
71│ └──────────┬──────────┘ └──────────┬──────────┘ │
72│ │ │ │
73│ │ source_repo_path │ source_repo_path │
74│ │ new_oids │ new_oids │
75│ │ │ │
76│ └────────────────┬───────────────────┘ │
77│ │ │
78│ ▼ │
79│ ┌────────────────────────────────────────┐ │
80│ │ process_newly_available_git_data() │ │
81│ │ │ │
82│ │ 1. Extract identifier from path │ │
83│ │ 2. Fetch repository data from DB │ │
84│ │ 3. Find satisfiable state events │ │
85│ │ 4. Find satisfiable PR events │ │
86│ │ 5. For each event: │ │
87│ │ - Sync OIDs to owner repos │ │
88│ │ - Align refs (+ set HEAD) │ │
89│ │ - Save to database │ │
90│ │ - Notify WebSocket │ │
91│ │ - Remove from purgatory │ │
92│ └────────────────────────────────────────┘ │
93└─────────────────────────────────────────────────────────────────────────────────┘
94```
95
96### Event Discovery
97
98The function discovers satisfiable events by scanning purgatory:
99
100**For State Events:**
1011. Get all state entries for the identifier from purgatory
1022. For each entry, check if ALL required OIDs exist in source repo
1033. Quick optimization: skip if none of `new_oids` are in the state's OID set
104
105**For PR Events:**
1061. Get all PR entries for the identifier from purgatory (via secondary index)
1072. For each entry with an event, check if the commit OID exists in source repo
1083. Quick optimization: skip if commit not in `new_oids`
109
110### Sync to Owner Repos
111
112**For State Events:**
113
114For each owner whose maintainer set authorizes the state author:
1151. Skip if a newer state already exists for that owner
1162. Copy missing OIDs from source repo to target repo
1173. Align refs (create/update/delete branches and tags)
1184. Set HEAD per state announcement
119
120**For PR Events:**
121
122For each owner whose maintainer set includes any tagged owner (from `a` tags):
1231. Copy commit from source repo to target repo (if missing)
1242. Create `refs/nostr/<event-id>` pointing to the commit
125
126## Data Structure Changes
127
128### PrPurgatoryEntry
129
130Add `identifier` field for secondary index lookup:
131
132```rust
133#[derive(Debug, Clone)]
134pub struct PrPurgatoryEntry {
135 /// The nostr PR event, if received (None = git data arrived first)
136 pub event: Option<Event>,
137
138 /// The expected commit SHA from 'c' tag or actual commit pushed
139 pub commit: String,
140
141 /// Repository identifier extracted from 'a' tag (30617:<owner>:<identifier>)
142 /// Used for lookup when git data arrives
143 pub identifier: Option<String>,
144
145 /// When this entry was added to purgatory
146 pub created_at: Instant,
147
148 /// Expiry deadline
149 pub expires_at: Instant,
150}
151```
152
153### Purgatory Secondary Index
154
155Add index for finding PR events by identifier:
156
157```rust
158pub struct Purgatory {
159 /// State events indexed by repository identifier
160 state_events: Arc<DashMap<String, Vec<StatePurgatoryEntry>>>,
161
162 /// PR events indexed by event ID (hex string)
163 pr_events: Arc<DashMap<String, PrPurgatoryEntry>>,
164
165 /// Secondary index: identifier -> event_ids for PR events
166 pr_events_by_identifier: Arc<DashMap<String, HashSet<String>>>,
167
168 git_data_path: PathBuf,
169}
170```
171
172### New Purgatory Methods
173
174```rust
175impl Purgatory {
176 /// Find all PR events for an identifier
177 pub fn find_prs_for_identifier(&self, identifier: &str) -> Vec<PrPurgatoryEntry>;
178
179 /// Add PR with automatic identifier extraction and indexing
180 pub fn add_pr(&self, event: Event, event_id: String, commit: String);
181
182 /// Add placeholder with optional identifier
183 pub fn add_pr_placeholder(&self, event_id: String, commit: String, identifier: Option<String>);
184
185 /// Remove PR (also cleans up secondary index)
186 pub fn remove_pr(&self, event_id: &str);
187}
188```
189
190## Implementation
191
192### Core Function
193
194```rust
195/// Unified processing of newly available git data.
196///
197/// Called whenever git data becomes available, whether from:
198/// - A successful `git push` (handle_receive_pack)
199/// - Purgatory sync fetching OIDs from remote servers
200///
201/// # What it does
202///
203/// 1. **Discover satisfiable events**: Scans purgatory for state and PR events
204/// whose required OIDs are now available in `source_repo_path`
205///
206/// 2. **For each satisfiable STATE event**:
207/// - Find all owner repos that authorize this state's author
208/// - Copy OIDs from source repo to each authorized owner repo
209/// - Align refs (create/update/delete) to match state
210/// - Set HEAD per state announcement
211/// - Save event to database
212/// - Notify WebSocket subscribers
213/// - Remove from purgatory
214///
215/// 3. **For each satisfiable PR event**:
216/// - Find all owner repos that list tagged owners as maintainers
217/// - Copy commit from source repo to each relevant owner repo
218/// - Create refs/nostr/<event-id> in each repo
219/// - Save event to database
220/// - Notify WebSocket subscribers
221/// - Remove from purgatory
222pub async fn process_newly_available_git_data(
223 source_repo_path: &Path,
224 new_oids: &HashSet<String>,
225 database: &SharedDatabase,
226 local_relay: Option<&nostr_relay_builder::LocalRelay>,
227 purgatory: &Purgatory,
228 git_data_path: &Path,
229) -> ProcessResult {
230 let mut result = ProcessResult::default();
231
232 // Extract identifier from repo path
233 let identifier = match extract_identifier_from_repo_path(source_repo_path, git_data_path) {
234 Some(id) => id,
235 None => return result,
236 };
237
238 // Fetch repository data once for all operations
239 let db_repo_data = match fetch_repository_data(database, &identifier).await {
240 Ok(data) => data,
241 Err(e) => {
242 result.errors.push(format!("Failed to fetch repo data: {}", e));
243 return result;
244 }
245 };
246
247 // Process satisfiable state events
248 let state_result = process_satisfiable_state_events(
249 source_repo_path,
250 &identifier,
251 new_oids,
252 &db_repo_data,
253 database,
254 local_relay,
255 purgatory,
256 git_data_path,
257 ).await;
258
259 result.merge_state_result(state_result);
260
261 // Process satisfiable PR events
262 let pr_result = process_satisfiable_pr_events(
263 source_repo_path,
264 &identifier,
265 new_oids,
266 &db_repo_data,
267 database,
268 local_relay,
269 purgatory,
270 git_data_path,
271 ).await;
272
273 result.merge_pr_result(pr_result);
274
275 result
276}
277```
278
279### Result Type
280
281```rust
282/// Result of processing newly available git data
283#[derive(Debug, Default)]
284pub struct ProcessResult {
285 /// Number of state events released from purgatory
286 pub states_released: usize,
287 /// Number of PR events released from purgatory
288 pub prs_released: usize,
289 /// Number of owner repositories synced
290 pub repos_synced: usize,
291 /// Number of refs created across all repos
292 pub refs_created: usize,
293 /// Number of refs updated across all repos
294 pub refs_updated: usize,
295 /// Number of refs deleted across all repos
296 pub refs_deleted: usize,
297 /// Errors encountered (non-fatal)
298 pub errors: Vec<String>,
299}
300```
301
302### Helper: Extract Identifier from PR Event
303
304```rust
305/// Extract identifier from PR event's `a` tag.
306/// Format: 30617:<owner_pubkey>:<identifier>
307fn extract_identifier_from_pr_event(event: &Event) -> Option<String> {
308 event.tags.iter().find_map(|tag| {
309 let tag_vec = tag.clone().to_vec();
310 if tag_vec.len() >= 2 && tag_vec[0] == "a" && tag_vec[1].starts_with("30617:") {
311 let parts: Vec<&str> = tag_vec[1].split(':').collect();
312 if parts.len() >= 3 {
313 Some(parts[2].to_string())
314 } else {
315 None
316 }
317 } else {
318 None
319 }
320 })
321}
322```
323
324### Helper: Extract Identifier from Repo Path
325
326```rust
327/// Extract identifier from repository path.
328/// Path format: {git_data_path}/{npub}/{identifier}.git
329fn extract_identifier_from_repo_path(repo_path: &Path, git_data_path: &Path) -> Option<String> {
330 let relative = repo_path.strip_prefix(git_data_path).ok()?;
331 let components: Vec<_> = relative.components().collect();
332
333 if components.len() >= 2 {
334 let identifier_with_git = components[1].as_os_str().to_str()?;
335 Some(identifier_with_git.trim_end_matches(".git").to_string())
336 } else {
337 None
338 }
339}
340```
341
342## Integration
343
344### handle_receive_pack (Simplified)
345
346```rust
347// After git receive-pack succeeds:
348
349// Collect new OIDs from the push
350let new_oids: HashSet<String> = pushed_refs
351 .iter()
352 .filter(|(_, new_oid, _)| new_oid != "0000000000000000000000000000000000000000")
353 .map(|(_, new_oid, _)| new_oid.clone())
354 .collect();
355
356// Single unified call handles everything
357let result = process_newly_available_git_data(
358 &repo_path,
359 &new_oids,
360 &database,
361 Some(&relay),
362 &purgatory,
363 Path::new(git_data_path),
364).await;
365
366info!(
367 "Processed push: {} states, {} PRs released, {} repos synced",
368 result.states_released,
369 result.prs_released,
370 result.repos_synced
371);
372```
373
374### Purgatory Sync (Simplified)
375
376```rust
377// After fetching OIDs from remote:
378
379let new_oids: HashSet<String> = fetched_oids.into_iter().collect();
380
381let result = process_newly_available_git_data(
382 &source_repo_path,
383 &new_oids,
384 &database,
385 local_relay.as_ref(),
386 &purgatory,
387 &git_data_path,
388).await;
389```
390
391### Integration with Purgatory Sync Redesign
392
393The purgatory sync redesign (see `purgatory-sync-redesign.md`) uses this unified function in its `sync_identifier_from_url` implementation:
394
395```rust
396pub async fn sync_identifier_from_url<C: SyncContext>(
397 ctx: &C,
398 identifier: &str,
399 url: &str,
400 throttle_manager: &Arc<ThrottleManager>,
401) -> usize {
402 // ... fetch OIDs from URL ...
403
404 let fetched_oids = ctx.fetch_oids(&target_repo, url, &needed_oids).await?;
405
406 if !fetched_oids.is_empty() {
407 // Use unified processing
408 let new_oids: HashSet<String> = fetched_oids.into_iter().collect();
409
410 let result = process_newly_available_git_data(
411 &target_repo,
412 &new_oids,
413 ctx.database(),
414 ctx.local_relay(),
415 ctx.purgatory(),
416 ctx.git_data_path(),
417 ).await;
418
419 // Result already handled purgatory removal, DB saves, etc.
420 }
421
422 fetched_oids.len()
423}
424```
425
426The `SyncContext` trait wraps this function in its `process_newly_available_git_data` method for testability.
427
428## Benefits
429
4301. **Single source of truth** - One function handles all post-git-data processing
4312. **Always fresh discovery** - Events discovered from purgatory at processing time
4323. **Consistent behavior** - Push and sync paths behave identically
4334. **Simpler callers** - Just pass repo_path + new_oids
4345. **Complete processing** - Handles all event types, all repo syncing, HEAD, DB, WebSocket, purgatory
4356. **PR sync parity** - PR events now synced in purgatory path (was missing)
436
437## Code to Remove/Simplify
438
439After implementing the unified function:
440
4411. **Remove**: Most of `sync_state_git_data` in `src/purgatory/mod.rs`
4422. **Simplify**: Event handling in `handle_receive_pack` (replace ~100 lines with single call)
4433. **Internalize**: `sync_to_owner_repos` and `sync_pr_refs_to_tagged_owner_repos` become internal helpers
444
445## Testing Strategy
446
447### Unit Tests
448
4491. `extract_identifier_from_repo_path` - Various path formats
4502. `extract_identifier_from_pr_event` - Various tag formats
4513. Event discovery logic with mock purgatory
452
453### Integration Tests
454
4551. Push triggers processing and releases state event
4562. Push triggers processing and releases PR event
4573. Purgatory sync triggers processing
4584. Multiple events for same identifier processed correctly
4595. Cross-repo sync works for both state and PR events
460
461## Future Considerations
462
463### Batch Processing
464
465Currently processes events one at a time. Could batch database saves and WebSocket notifications for efficiency with many events.
466
467### Partial Failures
468
469Currently continues on errors and collects them in result. Could add retry logic or transaction semantics if needed.
470
471### Metrics
472
473Add Prometheus metrics for:
474- Events processed by type (state/PR)
475- Repos synced per processing call
476- Processing duration
477- Errors by type
478
479## Related Documents
480
481- [Purgatory Sync Redesign](purgatory-sync-redesign.md) - Uses this unified function for purgatory sync operations