upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

summaryrefslogtreecommitdiff
path: root/docs/explanation/purgatory-sync-redesign.md
diff options
context:
space:
mode:
authorDanConwayDev <DanConwayDev@protonmail.com>2026-01-07 10:39:14 +0000
committerDanConwayDev <DanConwayDev@protonmail.com>2026-01-07 10:39:14 +0000
commit89a30ec554b724ed29023c38117cfd597371a739 (patch)
treeffbc4e281c3e74ea8ddb571880d2ee8a9d260c1e /docs/explanation/purgatory-sync-redesign.md
parentecdc68f7d38b63ec411b48fac05cd2d98303cab0 (diff)
docs: purgatory add unified process_newly_available_git_data to design
Diffstat (limited to 'docs/explanation/purgatory-sync-redesign.md')
-rw-r--r--docs/explanation/purgatory-sync-redesign.md285
1 files changed, 92 insertions, 193 deletions
diff --git a/docs/explanation/purgatory-sync-redesign.md b/docs/explanation/purgatory-sync-redesign.md
index 54e279a..c1e6af6 100644
--- a/docs/explanation/purgatory-sync-redesign.md
+++ b/docs/explanation/purgatory-sync-redesign.md
@@ -29,7 +29,7 @@ Redesign purgatory sync to be **identifier-based** rather than **event-based**,
29 29
30### Key Design Decision: Where Does OID Copying Happen? 30### Key Design Decision: Where Does OID Copying Happen?
31 31
32**Answer: In `process_satisfiable_events`, NOT after the entire sync completes.** 32**Answer: In `process_newly_available_git_data`, NOT after the entire sync completes.**
33 33
34The current implementation (`sync_state_git_data`) fetches all OIDs first, then at the end: 34The current implementation (`sync_state_git_data`) fetches all OIDs first, then at the end:
351. Copies OIDs to all authorized owner repos 351. Copies OIDs to all authorized owner repos
@@ -38,7 +38,7 @@ The current implementation (`sync_state_git_data`) fetches all OIDs first, then
384. Notifies subscribers 384. Notifies subscribers
395. Removes from purgatory 395. Removes from purgatory
40 40
41The redesign moves all of this into `process_satisfiable_events`, which is called after **each successful URL fetch**. This enables: 41The redesign moves all of this into `process_newly_available_git_data`, which is called after **each successful URL fetch**. This enables:
42 42
43| Aspect | Current (end-of-sync) | Redesign (per-fetch) | 43| Aspect | Current (end-of-sync) | Redesign (per-fetch) |
44|--------|----------------------|---------------------| 44|--------|----------------------|---------------------|
@@ -55,12 +55,33 @@ Consider syncing an identifier with 3 state events from different maintainers:
55- State C needs OIDs from `server3.com` (down) 55- State C needs OIDs from `server3.com` (down)
56 56
57With the redesign: 57With the redesign:
581. Fetch from `server1.com` succeeds → `process_satisfiable_events` releases State A immediately 581. Fetch from `server1.com` succeeds → `process_newly_available_git_data` releases State A immediately
592. Fetch from `server2.com` succeeds → `process_satisfiable_events` releases State B 592. Fetch from `server2.com` succeeds → `process_newly_available_git_data` releases State B
603. Fetch from `server3.com` fails → State C stays in purgatory, retries with backoff 603. Fetch from `server3.com` fails → State C stays in purgatory, retries with backoff
61 61
62The current implementation would wait for all servers before releasing any events. 62The current implementation would wait for all servers before releasing any events.
63 63
64### Unified Processing with Git Push Handler
65
66**Key insight**: The post-git-data-available processing is identical whether data arrives via:
67- A successful `git push` (handle_receive_pack)
68- Purgatory sync fetching OIDs from remote servers
69
70Both paths need to:
711. Discover satisfiable events from purgatory
722. Sync OIDs to authorized owner repos
733. Align refs (+ set HEAD)
744. Save events to database
755. Notify WebSocket subscribers
766. Remove from purgatory
77
78Rather than duplicate this logic, we use a single unified function `process_newly_available_git_data` that handles all post-git-data-available processing. See [Unified Git Data Sync](unify-git-data-sync.md) for the complete design.
79
80This means:
81- **`handle_receive_pack`** calls `process_newly_available_git_data` after git push succeeds
82- **`sync_identifier_from_url`** calls `process_newly_available_git_data` after OID fetch succeeds
83- **Same behavior** regardless of how git data arrived
84
64## Architecture 85## Architecture
65 86
66### Overview 87### Overview
@@ -634,24 +655,17 @@ pub trait SyncContext: Send + Sync {
634 /// Fetch OIDs from a remote server 655 /// Fetch OIDs from a remote server
635 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; 656 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>;
636 657
637 /// Process events that can now be satisfied. 658 /// Process newly available git data.
659 ///
660 /// This is a thin wrapper around the unified `process_newly_available_git_data` function.
661 /// Called after each successful OID fetch to check if any purgatory events can now be satisfied.
638 /// 662 ///
639 /// For each purgatory event (state or PR) for this identifier: 663 /// See [Unified Git Data Sync](unify-git-data-sync.md) for the complete design.
640 /// 1. Check if all required OIDs are now available in the source repo 664 async fn process_newly_available_git_data(
641 /// 2. For satisfiable state events: 665 &self,
642 /// a. Check if this state is authorized and should be applied (vs existing states) 666 source_repo_path: &Path,
643 /// b. Copy OIDs to all owner repos that authorize this state author 667 new_oids: &HashSet<String>,
644 /// c. Align refs with state in each authorized repo 668 ) -> Result<ProcessResult>;
645 /// d. Save state event to database
646 /// e. Notify WebSocket subscribers
647 /// f. Remove from purgatory
648 /// 3. For satisfiable PR events:
649 /// a. Copy PR commit to owner repos that share maintainers with tagged owners
650 /// b. Create refs/nostr/<event-id> in each repo
651 /// c. Save PR event to database
652 /// d. Notify WebSocket subscribers
653 /// e. Remove from purgatory
654 async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult>;
655 669
656 /// Check if there are still pending events for this identifier 670 /// Check if there are still pending events for this identifier
657 fn has_pending_events(&self, identifier: &str) -> bool; 671 fn has_pending_events(&self, identifier: &str) -> bool;
@@ -694,21 +708,20 @@ impl RealSyncContext {
694impl SyncContext for RealSyncContext { 708impl SyncContext for RealSyncContext {
695 // ... other methods ... 709 // ... other methods ...
696 710
697 async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult> { 711 async fn process_newly_available_git_data(
698 // Get repository data and find source repo 712 &self,
699 let db_repo_data = fetch_repository_data(&self.database, identifier).await?; 713 source_repo_path: &Path,
700 let source_repo_path = self.find_target_repo(&db_repo_data) 714 new_oids: &HashSet<String>,
701 .ok_or_else(|| anyhow::anyhow!("No target repo found"))?; 715 ) -> Result<ProcessResult> {
702 716 // Call the unified function that handles all post-git-data-available processing
703 // Call the standalone function with all dependencies 717 // This is the same function called by handle_receive_pack after a git push
704 process_satisfiable_events_impl( 718 crate::git::process_newly_available_git_data(
705 identifier, 719 source_repo_path,
706 &source_repo_path, 720 new_oids,
707 &db_repo_data,
708 &self.git_data_path,
709 &self.database, 721 &self.database,
710 self.local_relay.as_ref(), 722 self.local_relay.as_ref(),
711 &self.purgatory, 723 &self.purgatory,
724 &self.git_data_path,
712 ).await 725 ).await
713 } 726 }
714 727
@@ -716,7 +729,7 @@ impl SyncContext for RealSyncContext {
716} 729}
717``` 730```
718 731
719**Note**: The `SyncContext` trait abstracts away the dependencies for testability. The real implementation (`RealSyncContext`) holds references to purgatory, database, etc., and the `process_satisfiable_events` method uses them internally. This keeps the sync logic functions (`sync_identifier_next_url`, `sync_identifier_from_url`) clean and testable with mocks. 732**Note**: The `SyncContext` trait abstracts away the dependencies for testability. The real implementation (`RealSyncContext`) holds references to purgatory, database, etc., and the `process_newly_available_git_data` method delegates to the unified function. This keeps the sync logic functions (`sync_identifier_next_url`, `sync_identifier_from_url`) clean and testable with mocks.
720 733
721## Core Sync Logic 734## Core Sync Logic
722 735
@@ -962,11 +975,12 @@ pub async fn sync_identifier_from_url<C: SyncContext>(
962 975
963 // Try to process any events that can now be satisfied 976 // Try to process any events that can now be satisfied
964 if oids_fetched > 0 { 977 if oids_fetched > 0 {
965 if let Err(e) = ctx.process_satisfiable_events(identifier).await { 978 let new_oids: HashSet<String> = needed_oids.iter().cloned().collect();
979 if let Err(e) = ctx.process_newly_available_git_data(&target_repo, &new_oids).await {
966 tracing::warn!( 980 tracing::warn!(
967 identifier = %identifier, 981 identifier = %identifier,
968 error = %e, 982 error = %e,
969 "Failed to process satisfiable events" 983 "Failed to process newly available git data"
970 ); 984 );
971 } 985 }
972 } 986 }
@@ -975,18 +989,27 @@ pub async fn sync_identifier_from_url<C: SyncContext>(
975} 989}
976``` 990```
977 991
978### process_satisfiable_events 992### process_newly_available_git_data (Unified Function)
979 993
980This is the core function that handles the "release from purgatory" logic. It's called after each successful fetch to check if any purgatory events can now be satisfied with the available git data. 994This is the core function that handles the "release from purgatory" logic. It's called after each successful fetch to check if any purgatory events can now be satisfied with the available git data.
981 995
982**Key Design Decision**: OID copying and ref alignment happen in `process_satisfiable_events`, NOT after the entire sync completes. This enables: 996**Key Design Decision**: This is a **unified function** shared with the git push handler. Both `handle_receive_pack` (after git push) and `sync_identifier_from_url` (after purgatory sync fetch) call the same function. See [Unified Git Data Sync](unify-git-data-sync.md) for the complete implementation.
997
998**Why unify?**
983 999
9841. **Incremental progress**: Events can be released as soon as their OIDs are available, even if other events for the same identifier still need data 1000The post-git-data-available processing is identical regardless of how data arrived:
9852. **Partial success**: If we fetch OIDs for one state event but not another, the first can be released immediately 1001
9863. **Cleaner separation**: `sync_identifier_from_url` only fetches; `process_satisfiable_events` handles all the "what to do with the data" logic 1002| Step | After git push | After purgatory fetch |
1003|------|---------------|----------------------|
1004| Discover satisfiable events | ✅ Same | ✅ Same |
1005| Sync OIDs to owner repos | ✅ Same | ✅ Same |
1006| Align refs (+ set HEAD) | ✅ Same | ✅ Same |
1007| Save events to database | ✅ Same | ✅ Same |
1008| Notify WebSocket | ✅ Same | ✅ Same |
1009| Remove from purgatory | ✅ Same | ✅ Same |
987 1010
988```rust 1011```rust
989/// Result of processing satisfiable events 1012/// Result of processing newly available git data
990#[derive(Debug, Default)] 1013#[derive(Debug, Default)]
991pub struct ProcessResult { 1014pub struct ProcessResult {
992 /// Number of state events released from purgatory 1015 /// Number of state events released from purgatory
@@ -995,159 +1018,32 @@ pub struct ProcessResult {
995 pub prs_released: usize, 1018 pub prs_released: usize,
996 /// Number of repositories synced (OIDs copied + refs aligned) 1019 /// Number of repositories synced (OIDs copied + refs aligned)
997 pub repos_synced: usize, 1020 pub repos_synced: usize,
998 /// Errors encountered 1021 /// Number of refs created/updated/deleted
1022 pub refs_created: usize,
1023 pub refs_updated: usize,
1024 pub refs_deleted: usize,
1025 /// Errors encountered (non-fatal)
999 pub errors: Vec<String>, 1026 pub errors: Vec<String>,
1000} 1027}
1001 1028
1002/// Process purgatory events that can now be satisfied with available git data. 1029/// Unified processing of newly available git data.
1003/// 1030///
1004/// This function is called after each successful OID fetch. It: 1031/// Called whenever git data becomes available, whether from:
1005/// 1. Iterates through all purgatory events for this identifier 1032/// - A successful `git push` (handle_receive_pack)
1006/// 2. For each event, checks if all required OIDs are now available 1033/// - Purgatory sync fetching OIDs from remote servers
1007/// 3. For satisfiable events, performs the full "release" workflow 1034///
1008/// 1035/// See unify-git-data-sync.md for complete implementation details.
1009/// The release workflow for STATE events: 1036pub async fn process_newly_available_git_data(
1010/// 1. Check authorization: is this state author authorized by any owner's maintainer set?
1011/// 2. Check priority: is this state newer than existing states for those owners?
1012/// 3. Copy OIDs to all authorized owner repos (using sync_to_owner_repos logic)
1013/// 4. Align refs with state in each authorized repo
1014/// 5. Save state event to database
1015/// 6. Notify WebSocket subscribers
1016/// 7. Remove from purgatory
1017///
1018/// The release workflow for PR events:
1019/// 1. Copy PR commit to owner repos that share maintainers with tagged owners
1020/// 2. Create refs/nostr/<event-id> in each repo
1021/// 3. Save PR event to database
1022/// 4. Notify WebSocket subscribers
1023/// 5. Remove from purgatory
1024///
1025/// Note: This is the implementation function called by RealSyncContext.
1026/// The SyncContext trait method has a simpler signature because the
1027/// implementation has access to all dependencies via self.
1028pub async fn process_satisfiable_events_impl(
1029 identifier: &str,
1030 source_repo_path: &Path, 1037 source_repo_path: &Path,
1031 db_repo_data: &RepositoryData, 1038 new_oids: &HashSet<String>,
1032 git_data_path: &Path,
1033 database: &SharedDatabase, 1039 database: &SharedDatabase,
1034 local_relay: Option<&nostr_relay_builder::LocalRelay>, 1040 local_relay: Option<&nostr_relay_builder::LocalRelay>,
1035 purgatory: &Purgatory, 1041 purgatory: &Purgatory,
1036) -> Result<ProcessResult> { 1042 git_data_path: &Path,
1037 let mut result = ProcessResult::default(); 1043) -> Result<ProcessResult>;
1038
1039 // Process state events
1040 let state_entries = purgatory.find_state(identifier);
1041 for entry in state_entries {
1042 // Parse the state event
1043 let state = match RepositoryState::from_event(&entry.event) {
1044 Ok(s) => s,
1045 Err(e) => {
1046 tracing::warn!(
1047 event_id = %entry.event.id,
1048 error = %e,
1049 "Failed to parse state event from purgatory"
1050 );
1051 continue;
1052 }
1053 };
1054
1055 // Check if all OIDs are available in the source repo
1056 let missing_oids = identify_missing_oids(&state, source_repo_path);
1057 if !missing_oids.is_empty() {
1058 tracing::debug!(
1059 event_id = %entry.event.id,
1060 missing = missing_oids.len(),
1061 "State event still missing OIDs, skipping"
1062 );
1063 continue;
1064 }
1065
1066 // All OIDs available - proceed with release
1067 tracing::info!(
1068 identifier = %identifier,
1069 event_id = %entry.event.id,
1070 "All OIDs available, releasing state event from purgatory"
1071 );
1072
1073 // Sync to owner repos (copy OIDs + align refs)
1074 // This handles authorization checks internally
1075 let sync_result = sync_to_owner_repos(
1076 source_repo_path,
1077 &state,
1078 db_repo_data,
1079 git_data_path,
1080 );
1081 result.repos_synced += sync_result.repos_synced;
1082
1083 if sync_result.repos_synced == 0 {
1084 tracing::warn!(
1085 identifier = %identifier,
1086 event_id = %entry.event.id,
1087 "No repos synced - state author may not be authorized"
1088 );
1089 // Don't remove from purgatory - maybe authorization will change
1090 continue;
1091 }
1092
1093 // Save to database
1094 match database.save_event(&entry.event).await {
1095 Ok(_) => {
1096 tracing::info!(
1097 identifier = %identifier,
1098 event_id = %entry.event.id,
1099 "Saved state event to database"
1100 );
1101
1102 // Notify WebSocket subscribers
1103 if let Some(relay) = local_relay {
1104 relay.notify_event(entry.event.clone());
1105 }
1106
1107 // Remove from purgatory
1108 purgatory.remove_state_event(identifier, &entry.event.id);
1109 result.states_released += 1;
1110 }
1111 Err(e) => {
1112 tracing::warn!(
1113 event_id = %entry.event.id,
1114 error = %e,
1115 "Failed to save state event to database"
1116 );
1117 result.errors.push(format!("Failed to save state {}: {}", entry.event.id, e));
1118 }
1119 }
1120 }
1121
1122 // TODO: Process PR events similarly
1123 // For now, PR events are handled separately
1124
1125 Ok(result)
1126}
1127
1128/// Identify OIDs in a state that are missing from a repository
1129fn identify_missing_oids(state: &RepositoryState, repo_path: &Path) -> Vec<String> {
1130 let mut missing = Vec::new();
1131
1132 for branch in &state.branches {
1133 if !branch.commit.starts_with("ref: ") && !oid_exists(repo_path, &branch.commit) {
1134 missing.push(branch.commit.clone());
1135 }
1136 }
1137
1138 for tag in &state.tags {
1139 if !tag.commit.starts_with("ref: ") && !oid_exists(repo_path, &tag.commit) {
1140 missing.push(tag.commit.clone());
1141 }
1142 }
1143
1144 missing
1145}
1146``` 1044```
1147 1045
1148**Why this design?** 1046**Key properties of the unified function:**
1149
1150The key insight is that `process_satisfiable_events` is called after *each* successful URL fetch, not just at the end of the sync. This means:
1151 1047
11521. **Early release**: If we fetch from `server1.com` and get all OIDs for state event A, we immediately release A even if state event B still needs OIDs from `server2.com` 10481. **Early release**: If we fetch from `server1.com` and get all OIDs for state event A, we immediately release A even if state event B still needs OIDs from `server2.com`
1153 1049
@@ -1157,6 +1053,8 @@ The key insight is that `process_satisfiable_events` is called after *each* succ
1157 1053
11584. **Authorization at release time**: We check authorization when releasing, not when adding to purgatory. This handles the case where maintainer sets change while an event is in purgatory. 10544. **Authorization at release time**: We check authorization when releasing, not when adding to purgatory. This handles the case where maintainer sets change while an event is in purgatory.
1159 1055
10565. **Handles all event types**: Both state events (kind 30618) and PR events (kind 1617/1618) are processed uniformly.
1057
1160### The Sync Identifier Loop (Main Sync) 1058### The Sync Identifier Loop (Main Sync)
1161 1059
1162```rust 1060```rust
@@ -1198,8 +1096,6 @@ pub async fn sync_identifier<C: SyncContext>(
1198 1096
1199 let needed_oids = ctx.collect_needed_oids(identifier); 1097 let needed_oids = ctx.collect_needed_oids(identifier);
1200 if needed_oids.is_empty() { 1098 if needed_oids.is_empty() {
1201 // Process any remaining satisfiable events
1202 let _ = ctx.process_satisfiable_events(identifier).await;
1203 tracing::info!(identifier = %identifier, "Sync complete - all OIDs available"); 1099 tracing::info!(identifier = %identifier, "Sync complete - all OIDs available");
1204 return true; 1100 return true;
1205 } 1101 }
@@ -1220,7 +1116,6 @@ pub async fn sync_identifier<C: SyncContext>(
1220 1116
1221 let needed_oids = ctx.collect_needed_oids(identifier); 1117 let needed_oids = ctx.collect_needed_oids(identifier);
1222 if needed_oids.is_empty() { 1118 if needed_oids.is_empty() {
1223 let _ = ctx.process_satisfiable_events(identifier).await;
1224 return true; 1119 return true;
1225 } 1120 }
1226 1121
@@ -1558,7 +1453,11 @@ pub trait SyncContext: Send + Sync {
1558 async fn fetch_repository_data(&self, identifier: &str) -> Result<RepositoryData>; 1453 async fn fetch_repository_data(&self, identifier: &str) -> Result<RepositoryData>;
1559 fn collect_needed_oids(&self, identifier: &str) -> HashSet<String>; 1454 fn collect_needed_oids(&self, identifier: &str) -> HashSet<String>;
1560 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; 1455 async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>;
1561 async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult>; 1456 async fn process_newly_available_git_data(
1457 &self,
1458 source_repo_path: &Path,
1459 new_oids: &HashSet<String>,
1460 ) -> Result<ProcessResult>;
1562 fn has_pending_events(&self, identifier: &str) -> bool; 1461 fn has_pending_events(&self, identifier: &str) -> bool;
1563 fn find_target_repo(&self, data: &RepositoryData) -> Option<PathBuf>; 1462 fn find_target_repo(&self, data: &RepositoryData) -> Option<PathBuf>;
1564 fn our_domain(&self) -> Option<&str>; 1463 fn our_domain(&self) -> Option<&str>;
@@ -1620,7 +1519,7 @@ mod tests {
1620 1519
1621 #[tokio::test] 1520 #[tokio::test]
1622 async fn from_url_fetches_and_processes_on_success() { 1521 async fn from_url_fetches_and_processes_on_success() {
1623 // Successful fetch triggers process_satisfiable_events 1522 // Successful fetch triggers process_newly_available_git_data
1624 } 1523 }
1625} 1524}
1626``` 1525```
@@ -1628,7 +1527,7 @@ mod tests {
1628**Success Criteria**: 1527**Success Criteria**:
1629- [ ] `sync_identifier_next_url` returns non-throttled, untried URL 1528- [ ] `sync_identifier_next_url` returns non-throttled, untried URL
1630- [ ] `sync_identifier_next_url` returns `None` when all URLs tried or throttled 1529- [ ] `sync_identifier_next_url` returns `None` when all URLs tried or throttled
1631- [ ] `sync_identifier_from_url` calls `fetch_oids` and `process_satisfiable_events` 1530- [ ] `sync_identifier_from_url` calls `fetch_oids` and `process_newly_available_git_data`
1632- [ ] All 3 unit tests pass 1531- [ ] All 3 unit tests pass
1633 1532
1634--- 1533---
@@ -1764,7 +1663,7 @@ impl SyncContext for RealSyncContext { ... }
1764**Success Criteria**: 1663**Success Criteria**:
1765- [ ] All `SyncContext` methods implemented 1664- [ ] All `SyncContext` methods implemented
1766- [ ] Connects to real database, git, and relay 1665- [ ] Connects to real database, git, and relay
1767- [ ] `process_satisfiable_events` releases events from purgatory 1666- [ ] `process_newly_available_git_data` releases events from purgatory
1768 1667
1769--- 1668---
1770 1669