diff options
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/explanation/purgatory-sync-redesign.md | 285 | ||||
| -rw-r--r-- | docs/explanation/unify-git-data-sync.md | 481 |
2 files changed, 573 insertions, 193 deletions
diff --git a/docs/explanation/purgatory-sync-redesign.md b/docs/explanation/purgatory-sync-redesign.md index 54e279a..c1e6af6 100644 --- a/docs/explanation/purgatory-sync-redesign.md +++ b/docs/explanation/purgatory-sync-redesign.md | |||
| @@ -29,7 +29,7 @@ Redesign purgatory sync to be **identifier-based** rather than **event-based**, | |||
| 29 | 29 | ||
| 30 | ### Key Design Decision: Where Does OID Copying Happen? | 30 | ### Key Design Decision: Where Does OID Copying Happen? |
| 31 | 31 | ||
| 32 | **Answer: In `process_satisfiable_events`, NOT after the entire sync completes.** | 32 | **Answer: In `process_newly_available_git_data`, NOT after the entire sync completes.** |
| 33 | 33 | ||
| 34 | The current implementation (`sync_state_git_data`) fetches all OIDs first, then at the end: | 34 | The current implementation (`sync_state_git_data`) fetches all OIDs first, then at the end: |
| 35 | 1. Copies OIDs to all authorized owner repos | 35 | 1. Copies OIDs to all authorized owner repos |
| @@ -38,7 +38,7 @@ The current implementation (`sync_state_git_data`) fetches all OIDs first, then | |||
| 38 | 4. Notifies subscribers | 38 | 4. Notifies subscribers |
| 39 | 5. Removes from purgatory | 39 | 5. Removes from purgatory |
| 40 | 40 | ||
| 41 | The redesign moves all of this into `process_satisfiable_events`, which is called after **each successful URL fetch**. This enables: | 41 | The redesign moves all of this into `process_newly_available_git_data`, which is called after **each successful URL fetch**. This enables: |
| 42 | 42 | ||
| 43 | | Aspect | Current (end-of-sync) | Redesign (per-fetch) | | 43 | | Aspect | Current (end-of-sync) | Redesign (per-fetch) | |
| 44 | |--------|----------------------|---------------------| | 44 | |--------|----------------------|---------------------| |
| @@ -55,12 +55,33 @@ Consider syncing an identifier with 3 state events from different maintainers: | |||
| 55 | - State C needs OIDs from `server3.com` (down) | 55 | - State C needs OIDs from `server3.com` (down) |
| 56 | 56 | ||
| 57 | With the redesign: | 57 | With the redesign: |
| 58 | 1. Fetch from `server1.com` succeeds → `process_satisfiable_events` releases State A immediately | 58 | 1. Fetch from `server1.com` succeeds → `process_newly_available_git_data` releases State A immediately |
| 59 | 2. Fetch from `server2.com` succeeds → `process_satisfiable_events` releases State B | 59 | 2. Fetch from `server2.com` succeeds → `process_newly_available_git_data` releases State B |
| 60 | 3. Fetch from `server3.com` fails → State C stays in purgatory, retries with backoff | 60 | 3. Fetch from `server3.com` fails → State C stays in purgatory, retries with backoff |
| 61 | 61 | ||
| 62 | The current implementation would wait for all servers before releasing any events. | 62 | The current implementation would wait for all servers before releasing any events. |
| 63 | 63 | ||
| 64 | ### Unified Processing with Git Push Handler | ||
| 65 | |||
| 66 | **Key insight**: The post-git-data-available processing is identical whether data arrives via: | ||
| 67 | - A successful `git push` (handle_receive_pack) | ||
| 68 | - Purgatory sync fetching OIDs from remote servers | ||
| 69 | |||
| 70 | Both paths need to: | ||
| 71 | 1. Discover satisfiable events from purgatory | ||
| 72 | 2. Sync OIDs to authorized owner repos | ||
| 73 | 3. Align refs (+ set HEAD) | ||
| 74 | 4. Save events to database | ||
| 75 | 5. Notify WebSocket subscribers | ||
| 76 | 6. Remove from purgatory | ||
| 77 | |||
| 78 | Rather than duplicate this logic, we use a single unified function `process_newly_available_git_data` that handles all post-git-data-available processing. See [Unified Git Data Sync](unify-git-data-sync.md) for the complete design. | ||
| 79 | |||
| 80 | This means: | ||
| 81 | - **`handle_receive_pack`** calls `process_newly_available_git_data` after git push succeeds | ||
| 82 | - **`sync_identifier_from_url`** calls `process_newly_available_git_data` after OID fetch succeeds | ||
| 83 | - **Same behavior** regardless of how git data arrived | ||
| 84 | |||
| 64 | ## Architecture | 85 | ## Architecture |
| 65 | 86 | ||
| 66 | ### Overview | 87 | ### Overview |
| @@ -634,24 +655,17 @@ pub trait SyncContext: Send + Sync { | |||
| 634 | /// Fetch OIDs from a remote server | 655 | /// Fetch OIDs from a remote server |
| 635 | async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; | 656 | async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; |
| 636 | 657 | ||
| 637 | /// Process events that can now be satisfied. | 658 | /// Process newly available git data. |
| 659 | /// | ||
| 660 | /// This is a thin wrapper around the unified `process_newly_available_git_data` function. | ||
| 661 | /// Called after each successful OID fetch to check if any purgatory events can now be satisfied. | ||
| 638 | /// | 662 | /// |
| 639 | /// For each purgatory event (state or PR) for this identifier: | 663 | /// See [Unified Git Data Sync](unify-git-data-sync.md) for the complete design. |
| 640 | /// 1. Check if all required OIDs are now available in the source repo | 664 | async fn process_newly_available_git_data( |
| 641 | /// 2. For satisfiable state events: | 665 | &self, |
| 642 | /// a. Check if this state is authorized and should be applied (vs existing states) | 666 | source_repo_path: &Path, |
| 643 | /// b. Copy OIDs to all owner repos that authorize this state author | 667 | new_oids: &HashSet<String>, |
| 644 | /// c. Align refs with state in each authorized repo | 668 | ) -> Result<ProcessResult>; |
| 645 | /// d. Save state event to database | ||
| 646 | /// e. Notify WebSocket subscribers | ||
| 647 | /// f. Remove from purgatory | ||
| 648 | /// 3. For satisfiable PR events: | ||
| 649 | /// a. Copy PR commit to owner repos that share maintainers with tagged owners | ||
| 650 | /// b. Create refs/nostr/<event-id> in each repo | ||
| 651 | /// c. Save PR event to database | ||
| 652 | /// d. Notify WebSocket subscribers | ||
| 653 | /// e. Remove from purgatory | ||
| 654 | async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult>; | ||
| 655 | 669 | ||
| 656 | /// Check if there are still pending events for this identifier | 670 | /// Check if there are still pending events for this identifier |
| 657 | fn has_pending_events(&self, identifier: &str) -> bool; | 671 | fn has_pending_events(&self, identifier: &str) -> bool; |
| @@ -694,21 +708,20 @@ impl RealSyncContext { | |||
| 694 | impl SyncContext for RealSyncContext { | 708 | impl SyncContext for RealSyncContext { |
| 695 | // ... other methods ... | 709 | // ... other methods ... |
| 696 | 710 | ||
| 697 | async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult> { | 711 | async fn process_newly_available_git_data( |
| 698 | // Get repository data and find source repo | 712 | &self, |
| 699 | let db_repo_data = fetch_repository_data(&self.database, identifier).await?; | 713 | source_repo_path: &Path, |
| 700 | let source_repo_path = self.find_target_repo(&db_repo_data) | 714 | new_oids: &HashSet<String>, |
| 701 | .ok_or_else(|| anyhow::anyhow!("No target repo found"))?; | 715 | ) -> Result<ProcessResult> { |
| 702 | 716 | // Call the unified function that handles all post-git-data-available processing | |
| 703 | // Call the standalone function with all dependencies | 717 | // This is the same function called by handle_receive_pack after a git push |
| 704 | process_satisfiable_events_impl( | 718 | crate::git::process_newly_available_git_data( |
| 705 | identifier, | 719 | source_repo_path, |
| 706 | &source_repo_path, | 720 | new_oids, |
| 707 | &db_repo_data, | ||
| 708 | &self.git_data_path, | ||
| 709 | &self.database, | 721 | &self.database, |
| 710 | self.local_relay.as_ref(), | 722 | self.local_relay.as_ref(), |
| 711 | &self.purgatory, | 723 | &self.purgatory, |
| 724 | &self.git_data_path, | ||
| 712 | ).await | 725 | ).await |
| 713 | } | 726 | } |
| 714 | 727 | ||
| @@ -716,7 +729,7 @@ impl SyncContext for RealSyncContext { | |||
| 716 | } | 729 | } |
| 717 | ``` | 730 | ``` |
| 718 | 731 | ||
| 719 | **Note**: The `SyncContext` trait abstracts away the dependencies for testability. The real implementation (`RealSyncContext`) holds references to purgatory, database, etc., and the `process_satisfiable_events` method uses them internally. This keeps the sync logic functions (`sync_identifier_next_url`, `sync_identifier_from_url`) clean and testable with mocks. | 732 | **Note**: The `SyncContext` trait abstracts away the dependencies for testability. The real implementation (`RealSyncContext`) holds references to purgatory, database, etc., and the `process_newly_available_git_data` method delegates to the unified function. This keeps the sync logic functions (`sync_identifier_next_url`, `sync_identifier_from_url`) clean and testable with mocks. |
| 720 | 733 | ||
| 721 | ## Core Sync Logic | 734 | ## Core Sync Logic |
| 722 | 735 | ||
| @@ -962,11 +975,12 @@ pub async fn sync_identifier_from_url<C: SyncContext>( | |||
| 962 | 975 | ||
| 963 | // Try to process any events that can now be satisfied | 976 | // Try to process any events that can now be satisfied |
| 964 | if oids_fetched > 0 { | 977 | if oids_fetched > 0 { |
| 965 | if let Err(e) = ctx.process_satisfiable_events(identifier).await { | 978 | let new_oids: HashSet<String> = needed_oids.iter().cloned().collect(); |
| 979 | if let Err(e) = ctx.process_newly_available_git_data(&target_repo, &new_oids).await { | ||
| 966 | tracing::warn!( | 980 | tracing::warn!( |
| 967 | identifier = %identifier, | 981 | identifier = %identifier, |
| 968 | error = %e, | 982 | error = %e, |
| 969 | "Failed to process satisfiable events" | 983 | "Failed to process newly available git data" |
| 970 | ); | 984 | ); |
| 971 | } | 985 | } |
| 972 | } | 986 | } |
| @@ -975,18 +989,27 @@ pub async fn sync_identifier_from_url<C: SyncContext>( | |||
| 975 | } | 989 | } |
| 976 | ``` | 990 | ``` |
| 977 | 991 | ||
| 978 | ### process_satisfiable_events | 992 | ### process_newly_available_git_data (Unified Function) |
| 979 | 993 | ||
| 980 | This is the core function that handles the "release from purgatory" logic. It's called after each successful fetch to check if any purgatory events can now be satisfied with the available git data. | 994 | This is the core function that handles the "release from purgatory" logic. It's called after each successful fetch to check if any purgatory events can now be satisfied with the available git data. |
| 981 | 995 | ||
| 982 | **Key Design Decision**: OID copying and ref alignment happen in `process_satisfiable_events`, NOT after the entire sync completes. This enables: | 996 | **Key Design Decision**: This is a **unified function** shared with the git push handler. Both `handle_receive_pack` (after git push) and `sync_identifier_from_url` (after purgatory sync fetch) call the same function. See [Unified Git Data Sync](unify-git-data-sync.md) for the complete implementation. |
| 997 | |||
| 998 | **Why unify?** | ||
| 983 | 999 | ||
| 984 | 1. **Incremental progress**: Events can be released as soon as their OIDs are available, even if other events for the same identifier still need data | 1000 | The post-git-data-available processing is identical regardless of how data arrived: |
| 985 | 2. **Partial success**: If we fetch OIDs for one state event but not another, the first can be released immediately | 1001 | |
| 986 | 3. **Cleaner separation**: `sync_identifier_from_url` only fetches; `process_satisfiable_events` handles all the "what to do with the data" logic | 1002 | | Step | After git push | After purgatory fetch | |
| 1003 | |------|---------------|----------------------| | ||
| 1004 | | Discover satisfiable events | ✅ Same | ✅ Same | | ||
| 1005 | | Sync OIDs to owner repos | ✅ Same | ✅ Same | | ||
| 1006 | | Align refs (+ set HEAD) | ✅ Same | ✅ Same | | ||
| 1007 | | Save events to database | ✅ Same | ✅ Same | | ||
| 1008 | | Notify WebSocket | ✅ Same | ✅ Same | | ||
| 1009 | | Remove from purgatory | ✅ Same | ✅ Same | | ||
| 987 | 1010 | ||
| 988 | ```rust | 1011 | ```rust |
| 989 | /// Result of processing satisfiable events | 1012 | /// Result of processing newly available git data |
| 990 | #[derive(Debug, Default)] | 1013 | #[derive(Debug, Default)] |
| 991 | pub struct ProcessResult { | 1014 | pub struct ProcessResult { |
| 992 | /// Number of state events released from purgatory | 1015 | /// Number of state events released from purgatory |
| @@ -995,159 +1018,32 @@ pub struct ProcessResult { | |||
| 995 | pub prs_released: usize, | 1018 | pub prs_released: usize, |
| 996 | /// Number of repositories synced (OIDs copied + refs aligned) | 1019 | /// Number of repositories synced (OIDs copied + refs aligned) |
| 997 | pub repos_synced: usize, | 1020 | pub repos_synced: usize, |
| 998 | /// Errors encountered | 1021 | /// Number of refs created/updated/deleted |
| 1022 | pub refs_created: usize, | ||
| 1023 | pub refs_updated: usize, | ||
| 1024 | pub refs_deleted: usize, | ||
| 1025 | /// Errors encountered (non-fatal) | ||
| 999 | pub errors: Vec<String>, | 1026 | pub errors: Vec<String>, |
| 1000 | } | 1027 | } |
| 1001 | 1028 | ||
| 1002 | /// Process purgatory events that can now be satisfied with available git data. | 1029 | /// Unified processing of newly available git data. |
| 1003 | /// | 1030 | /// |
| 1004 | /// This function is called after each successful OID fetch. It: | 1031 | /// Called whenever git data becomes available, whether from: |
| 1005 | /// 1. Iterates through all purgatory events for this identifier | 1032 | /// - A successful `git push` (handle_receive_pack) |
| 1006 | /// 2. For each event, checks if all required OIDs are now available | 1033 | /// - Purgatory sync fetching OIDs from remote servers |
| 1007 | /// 3. For satisfiable events, performs the full "release" workflow | 1034 | /// |
| 1008 | /// | 1035 | /// See unify-git-data-sync.md for complete implementation details. |
| 1009 | /// The release workflow for STATE events: | 1036 | pub async fn process_newly_available_git_data( |
| 1010 | /// 1. Check authorization: is this state author authorized by any owner's maintainer set? | ||
| 1011 | /// 2. Check priority: is this state newer than existing states for those owners? | ||
| 1012 | /// 3. Copy OIDs to all authorized owner repos (using sync_to_owner_repos logic) | ||
| 1013 | /// 4. Align refs with state in each authorized repo | ||
| 1014 | /// 5. Save state event to database | ||
| 1015 | /// 6. Notify WebSocket subscribers | ||
| 1016 | /// 7. Remove from purgatory | ||
| 1017 | /// | ||
| 1018 | /// The release workflow for PR events: | ||
| 1019 | /// 1. Copy PR commit to owner repos that share maintainers with tagged owners | ||
| 1020 | /// 2. Create refs/nostr/<event-id> in each repo | ||
| 1021 | /// 3. Save PR event to database | ||
| 1022 | /// 4. Notify WebSocket subscribers | ||
| 1023 | /// 5. Remove from purgatory | ||
| 1024 | /// | ||
| 1025 | /// Note: This is the implementation function called by RealSyncContext. | ||
| 1026 | /// The SyncContext trait method has a simpler signature because the | ||
| 1027 | /// implementation has access to all dependencies via self. | ||
| 1028 | pub async fn process_satisfiable_events_impl( | ||
| 1029 | identifier: &str, | ||
| 1030 | source_repo_path: &Path, | 1037 | source_repo_path: &Path, |
| 1031 | db_repo_data: &RepositoryData, | 1038 | new_oids: &HashSet<String>, |
| 1032 | git_data_path: &Path, | ||
| 1033 | database: &SharedDatabase, | 1039 | database: &SharedDatabase, |
| 1034 | local_relay: Option<&nostr_relay_builder::LocalRelay>, | 1040 | local_relay: Option<&nostr_relay_builder::LocalRelay>, |
| 1035 | purgatory: &Purgatory, | 1041 | purgatory: &Purgatory, |
| 1036 | ) -> Result<ProcessResult> { | 1042 | git_data_path: &Path, |
| 1037 | let mut result = ProcessResult::default(); | 1043 | ) -> Result<ProcessResult>; |
| 1038 | |||
| 1039 | // Process state events | ||
| 1040 | let state_entries = purgatory.find_state(identifier); | ||
| 1041 | for entry in state_entries { | ||
| 1042 | // Parse the state event | ||
| 1043 | let state = match RepositoryState::from_event(&entry.event) { | ||
| 1044 | Ok(s) => s, | ||
| 1045 | Err(e) => { | ||
| 1046 | tracing::warn!( | ||
| 1047 | event_id = %entry.event.id, | ||
| 1048 | error = %e, | ||
| 1049 | "Failed to parse state event from purgatory" | ||
| 1050 | ); | ||
| 1051 | continue; | ||
| 1052 | } | ||
| 1053 | }; | ||
| 1054 | |||
| 1055 | // Check if all OIDs are available in the source repo | ||
| 1056 | let missing_oids = identify_missing_oids(&state, source_repo_path); | ||
| 1057 | if !missing_oids.is_empty() { | ||
| 1058 | tracing::debug!( | ||
| 1059 | event_id = %entry.event.id, | ||
| 1060 | missing = missing_oids.len(), | ||
| 1061 | "State event still missing OIDs, skipping" | ||
| 1062 | ); | ||
| 1063 | continue; | ||
| 1064 | } | ||
| 1065 | |||
| 1066 | // All OIDs available - proceed with release | ||
| 1067 | tracing::info!( | ||
| 1068 | identifier = %identifier, | ||
| 1069 | event_id = %entry.event.id, | ||
| 1070 | "All OIDs available, releasing state event from purgatory" | ||
| 1071 | ); | ||
| 1072 | |||
| 1073 | // Sync to owner repos (copy OIDs + align refs) | ||
| 1074 | // This handles authorization checks internally | ||
| 1075 | let sync_result = sync_to_owner_repos( | ||
| 1076 | source_repo_path, | ||
| 1077 | &state, | ||
| 1078 | db_repo_data, | ||
| 1079 | git_data_path, | ||
| 1080 | ); | ||
| 1081 | result.repos_synced += sync_result.repos_synced; | ||
| 1082 | |||
| 1083 | if sync_result.repos_synced == 0 { | ||
| 1084 | tracing::warn!( | ||
| 1085 | identifier = %identifier, | ||
| 1086 | event_id = %entry.event.id, | ||
| 1087 | "No repos synced - state author may not be authorized" | ||
| 1088 | ); | ||
| 1089 | // Don't remove from purgatory - maybe authorization will change | ||
| 1090 | continue; | ||
| 1091 | } | ||
| 1092 | |||
| 1093 | // Save to database | ||
| 1094 | match database.save_event(&entry.event).await { | ||
| 1095 | Ok(_) => { | ||
| 1096 | tracing::info!( | ||
| 1097 | identifier = %identifier, | ||
| 1098 | event_id = %entry.event.id, | ||
| 1099 | "Saved state event to database" | ||
| 1100 | ); | ||
| 1101 | |||
| 1102 | // Notify WebSocket subscribers | ||
| 1103 | if let Some(relay) = local_relay { | ||
| 1104 | relay.notify_event(entry.event.clone()); | ||
| 1105 | } | ||
| 1106 | |||
| 1107 | // Remove from purgatory | ||
| 1108 | purgatory.remove_state_event(identifier, &entry.event.id); | ||
| 1109 | result.states_released += 1; | ||
| 1110 | } | ||
| 1111 | Err(e) => { | ||
| 1112 | tracing::warn!( | ||
| 1113 | event_id = %entry.event.id, | ||
| 1114 | error = %e, | ||
| 1115 | "Failed to save state event to database" | ||
| 1116 | ); | ||
| 1117 | result.errors.push(format!("Failed to save state {}: {}", entry.event.id, e)); | ||
| 1118 | } | ||
| 1119 | } | ||
| 1120 | } | ||
| 1121 | |||
| 1122 | // TODO: Process PR events similarly | ||
| 1123 | // For now, PR events are handled separately | ||
| 1124 | |||
| 1125 | Ok(result) | ||
| 1126 | } | ||
| 1127 | |||
| 1128 | /// Identify OIDs in a state that are missing from a repository | ||
| 1129 | fn identify_missing_oids(state: &RepositoryState, repo_path: &Path) -> Vec<String> { | ||
| 1130 | let mut missing = Vec::new(); | ||
| 1131 | |||
| 1132 | for branch in &state.branches { | ||
| 1133 | if !branch.commit.starts_with("ref: ") && !oid_exists(repo_path, &branch.commit) { | ||
| 1134 | missing.push(branch.commit.clone()); | ||
| 1135 | } | ||
| 1136 | } | ||
| 1137 | |||
| 1138 | for tag in &state.tags { | ||
| 1139 | if !tag.commit.starts_with("ref: ") && !oid_exists(repo_path, &tag.commit) { | ||
| 1140 | missing.push(tag.commit.clone()); | ||
| 1141 | } | ||
| 1142 | } | ||
| 1143 | |||
| 1144 | missing | ||
| 1145 | } | ||
| 1146 | ``` | 1044 | ``` |
| 1147 | 1045 | ||
| 1148 | **Why this design?** | 1046 | **Key properties of the unified function:** |
| 1149 | |||
| 1150 | The key insight is that `process_satisfiable_events` is called after *each* successful URL fetch, not just at the end of the sync. This means: | ||
| 1151 | 1047 | ||
| 1152 | 1. **Early release**: If we fetch from `server1.com` and get all OIDs for state event A, we immediately release A even if state event B still needs OIDs from `server2.com` | 1048 | 1. **Early release**: If we fetch from `server1.com` and get all OIDs for state event A, we immediately release A even if state event B still needs OIDs from `server2.com` |
| 1153 | 1049 | ||
| @@ -1157,6 +1053,8 @@ The key insight is that `process_satisfiable_events` is called after *each* succ | |||
| 1157 | 1053 | ||
| 1158 | 4. **Authorization at release time**: We check authorization when releasing, not when adding to purgatory. This handles the case where maintainer sets change while an event is in purgatory. | 1054 | 4. **Authorization at release time**: We check authorization when releasing, not when adding to purgatory. This handles the case where maintainer sets change while an event is in purgatory. |
| 1159 | 1055 | ||
| 1056 | 5. **Handles all event types**: Both state events (kind 30618) and PR events (kind 1617/1618) are processed uniformly. | ||
| 1057 | |||
| 1160 | ### The Sync Identifier Loop (Main Sync) | 1058 | ### The Sync Identifier Loop (Main Sync) |
| 1161 | 1059 | ||
| 1162 | ```rust | 1060 | ```rust |
| @@ -1198,8 +1096,6 @@ pub async fn sync_identifier<C: SyncContext>( | |||
| 1198 | 1096 | ||
| 1199 | let needed_oids = ctx.collect_needed_oids(identifier); | 1097 | let needed_oids = ctx.collect_needed_oids(identifier); |
| 1200 | if needed_oids.is_empty() { | 1098 | if needed_oids.is_empty() { |
| 1201 | // Process any remaining satisfiable events | ||
| 1202 | let _ = ctx.process_satisfiable_events(identifier).await; | ||
| 1203 | tracing::info!(identifier = %identifier, "Sync complete - all OIDs available"); | 1099 | tracing::info!(identifier = %identifier, "Sync complete - all OIDs available"); |
| 1204 | return true; | 1100 | return true; |
| 1205 | } | 1101 | } |
| @@ -1220,7 +1116,6 @@ pub async fn sync_identifier<C: SyncContext>( | |||
| 1220 | 1116 | ||
| 1221 | let needed_oids = ctx.collect_needed_oids(identifier); | 1117 | let needed_oids = ctx.collect_needed_oids(identifier); |
| 1222 | if needed_oids.is_empty() { | 1118 | if needed_oids.is_empty() { |
| 1223 | let _ = ctx.process_satisfiable_events(identifier).await; | ||
| 1224 | return true; | 1119 | return true; |
| 1225 | } | 1120 | } |
| 1226 | 1121 | ||
| @@ -1558,7 +1453,11 @@ pub trait SyncContext: Send + Sync { | |||
| 1558 | async fn fetch_repository_data(&self, identifier: &str) -> Result<RepositoryData>; | 1453 | async fn fetch_repository_data(&self, identifier: &str) -> Result<RepositoryData>; |
| 1559 | fn collect_needed_oids(&self, identifier: &str) -> HashSet<String>; | 1454 | fn collect_needed_oids(&self, identifier: &str) -> HashSet<String>; |
| 1560 | async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; | 1455 | async fn fetch_oids(&self, repo_path: &Path, url: &str, oids: &[String]) -> Result<Vec<String>>; |
| 1561 | async fn process_satisfiable_events(&self, identifier: &str) -> Result<ProcessResult>; | 1456 | async fn process_newly_available_git_data( |
| 1457 | &self, | ||
| 1458 | source_repo_path: &Path, | ||
| 1459 | new_oids: &HashSet<String>, | ||
| 1460 | ) -> Result<ProcessResult>; | ||
| 1562 | fn has_pending_events(&self, identifier: &str) -> bool; | 1461 | fn has_pending_events(&self, identifier: &str) -> bool; |
| 1563 | fn find_target_repo(&self, data: &RepositoryData) -> Option<PathBuf>; | 1462 | fn find_target_repo(&self, data: &RepositoryData) -> Option<PathBuf>; |
| 1564 | fn our_domain(&self) -> Option<&str>; | 1463 | fn our_domain(&self) -> Option<&str>; |
| @@ -1620,7 +1519,7 @@ mod tests { | |||
| 1620 | 1519 | ||
| 1621 | #[tokio::test] | 1520 | #[tokio::test] |
| 1622 | async fn from_url_fetches_and_processes_on_success() { | 1521 | async fn from_url_fetches_and_processes_on_success() { |
| 1623 | // Successful fetch triggers process_satisfiable_events | 1522 | // Successful fetch triggers process_newly_available_git_data |
| 1624 | } | 1523 | } |
| 1625 | } | 1524 | } |
| 1626 | ``` | 1525 | ``` |
| @@ -1628,7 +1527,7 @@ mod tests { | |||
| 1628 | **Success Criteria**: | 1527 | **Success Criteria**: |
| 1629 | - [ ] `sync_identifier_next_url` returns non-throttled, untried URL | 1528 | - [ ] `sync_identifier_next_url` returns non-throttled, untried URL |
| 1630 | - [ ] `sync_identifier_next_url` returns `None` when all URLs tried or throttled | 1529 | - [ ] `sync_identifier_next_url` returns `None` when all URLs tried or throttled |
| 1631 | - [ ] `sync_identifier_from_url` calls `fetch_oids` and `process_satisfiable_events` | 1530 | - [ ] `sync_identifier_from_url` calls `fetch_oids` and `process_newly_available_git_data` |
| 1632 | - [ ] All 3 unit tests pass | 1531 | - [ ] All 3 unit tests pass |
| 1633 | 1532 | ||
| 1634 | --- | 1533 | --- |
| @@ -1764,7 +1663,7 @@ impl SyncContext for RealSyncContext { ... } | |||
| 1764 | **Success Criteria**: | 1663 | **Success Criteria**: |
| 1765 | - [ ] All `SyncContext` methods implemented | 1664 | - [ ] All `SyncContext` methods implemented |
| 1766 | - [ ] Connects to real database, git, and relay | 1665 | - [ ] Connects to real database, git, and relay |
| 1767 | - [ ] `process_satisfiable_events` releases events from purgatory | 1666 | - [ ] `process_newly_available_git_data` releases events from purgatory |
| 1768 | 1667 | ||
| 1769 | --- | 1668 | --- |
| 1770 | 1669 | ||
diff --git a/docs/explanation/unify-git-data-sync.md b/docs/explanation/unify-git-data-sync.md new file mode 100644 index 0000000..fa1f983 --- /dev/null +++ b/docs/explanation/unify-git-data-sync.md | |||
| @@ -0,0 +1,481 @@ | |||
| 1 | # Unified Git Data Sync | ||
| 2 | |||
| 3 | ## Status | ||
| 4 | |||
| 5 | **Proposed** - January 2026 | ||
| 6 | |||
| 7 | ## Context | ||
| 8 | |||
| 9 | Currently, two separate code paths handle "git data is now available" scenarios: | ||
| 10 | |||
| 11 | 1. **`handle_receive_pack`** (src/git/handlers.rs) - After a successful `git push` | ||
| 12 | 2. **`sync_state_git_data`** (src/purgatory/mod.rs) - After purgatory sync fetches OIDs from remote servers | ||
| 13 | |||
| 14 | Both paths perform essentially the same post-processing: | ||
| 15 | |||
| 16 | | Step | `handle_receive_pack` | `sync_state_git_data` | | ||
| 17 | |------|----------------------|----------------------| | ||
| 18 | | Set HEAD | ✅ `try_set_head_if_available()` | ✅ (via `align_repository_with_state`) | | ||
| 19 | | Save events to DB | ✅ `database.save_event()` | ✅ `database.save_event()` | | ||
| 20 | | Remove from purgatory | ✅ `remove_state_event()` / `remove_pr()` | ✅ `remove_state_event()` | | ||
| 21 | | Notify WebSocket | ✅ `relay.notify_event()` | ✅ `relay.notify_event()` | | ||
| 22 | | Sync state to owner repos | ✅ `sync_to_owner_repos()` | ✅ `sync_to_owner_repos()` | | ||
| 23 | | Sync PR refs to owner repos | ✅ `sync_pr_refs_to_tagged_owner_repos()` | ❌ Not implemented | | ||
| 24 | |||
| 25 | This duplication creates maintenance burden and inconsistent behavior (e.g., PR sync missing from purgatory path). | ||
| 26 | |||
| 27 | ## Decision | ||
| 28 | |||
| 29 | Create a single unified function that handles all post-git-data-available processing: | ||
| 30 | |||
| 31 | ```rust | ||
| 32 | pub async fn process_newly_available_git_data( | ||
| 33 | source_repo_path: &Path, | ||
| 34 | new_oids: &HashSet<String>, | ||
| 35 | database: &SharedDatabase, | ||
| 36 | local_relay: Option<&nostr_relay_builder::LocalRelay>, | ||
| 37 | purgatory: &Purgatory, | ||
| 38 | git_data_path: &Path, | ||
| 39 | ) -> ProcessResult | ||
| 40 | ``` | ||
| 41 | |||
| 42 | ### Key Design Principles | ||
| 43 | |||
| 44 | **1. Always discover events from purgatory** | ||
| 45 | |||
| 46 | Rather than accepting pre-authorized events (which may have changed since authorization), the function always scans purgatory to find satisfiable events. This ensures consistency and handles race conditions where events change between authorization and processing. | ||
| 47 | |||
| 48 | **2. Minimal input, maximal output** | ||
| 49 | |||
| 50 | Callers only need to provide: | ||
| 51 | - `source_repo_path` - Where the git data landed | ||
| 52 | - `new_oids` - Which OIDs are now available (for efficient filtering) | ||
| 53 | |||
| 54 | The function handles everything else: finding events, syncing across repos, aligning refs, setting HEAD, saving to database, notifying subscribers, and cleaning up purgatory. | ||
| 55 | |||
| 56 | **3. Process all event types uniformly** | ||
| 57 | |||
| 58 | Both state events (kind 30618) and PR events (kind 1617/1618) are processed in the same flow, ensuring consistent behavior. | ||
| 59 | |||
| 60 | ## Architecture | ||
| 61 | |||
| 62 | ### Flow Overview | ||
| 63 | |||
| 64 | ``` | ||
| 65 | ┌─────────────────────────────────────────────────────────────────────────────────┐ | ||
| 66 | │ Git Data Becomes Available │ | ||
| 67 | │ │ | ||
| 68 | │ ┌─────────────────────┐ ┌─────────────────────┐ │ | ||
| 69 | │ │ handle_receive_pack │ │ purgatory sync │ │ | ||
| 70 | │ │ (push received) │ │ (fetch completed) │ │ | ||
| 71 | │ └──────────┬──────────┘ └──────────┬──────────┘ │ | ||
| 72 | │ │ │ │ | ||
| 73 | │ │ source_repo_path │ source_repo_path │ | ||
| 74 | │ │ new_oids │ new_oids │ | ||
| 75 | │ │ │ │ | ||
| 76 | │ └────────────────┬───────────────────┘ │ | ||
| 77 | │ │ │ | ||
| 78 | │ ▼ │ | ||
| 79 | │ ┌────────────────────────────────────────┐ │ | ||
| 80 | │ │ process_newly_available_git_data() │ │ | ||
| 81 | │ │ │ │ | ||
| 82 | │ │ 1. Extract identifier from path │ │ | ||
| 83 | │ │ 2. Fetch repository data from DB │ │ | ||
| 84 | │ │ 3. Find satisfiable state events │ │ | ||
| 85 | │ │ 4. Find satisfiable PR events │ │ | ||
| 86 | │ │ 5. For each event: │ │ | ||
| 87 | │ │ - Sync OIDs to owner repos │ │ | ||
| 88 | │ │ - Align refs (+ set HEAD) │ │ | ||
| 89 | │ │ - Save to database │ │ | ||
| 90 | │ │ - Notify WebSocket │ │ | ||
| 91 | │ │ - Remove from purgatory │ │ | ||
| 92 | │ └────────────────────────────────────────┘ │ | ||
| 93 | └─────────────────────────────────────────────────────────────────────────────────┘ | ||
| 94 | ``` | ||
| 95 | |||
| 96 | ### Event Discovery | ||
| 97 | |||
| 98 | The function discovers satisfiable events by scanning purgatory: | ||
| 99 | |||
| 100 | **For State Events:** | ||
| 101 | 1. Get all state entries for the identifier from purgatory | ||
| 102 | 2. For each entry, check if ALL required OIDs exist in source repo | ||
| 103 | 3. Quick optimization: skip if none of `new_oids` are in the state's OID set | ||
| 104 | |||
| 105 | **For PR Events:** | ||
| 106 | 1. Get all PR entries for the identifier from purgatory (via secondary index) | ||
| 107 | 2. For each entry with an event, check if the commit OID exists in source repo | ||
| 108 | 3. Quick optimization: skip if commit not in `new_oids` | ||
| 109 | |||
| 110 | ### Sync to Owner Repos | ||
| 111 | |||
| 112 | **For State Events:** | ||
| 113 | |||
| 114 | For each owner whose maintainer set authorizes the state author: | ||
| 115 | 1. Skip if a newer state already exists for that owner | ||
| 116 | 2. Copy missing OIDs from source repo to target repo | ||
| 117 | 3. Align refs (create/update/delete branches and tags) | ||
| 118 | 4. Set HEAD per state announcement | ||
| 119 | |||
| 120 | **For PR Events:** | ||
| 121 | |||
| 122 | For each owner whose maintainer set includes any tagged owner (from `a` tags): | ||
| 123 | 1. Copy commit from source repo to target repo (if missing) | ||
| 124 | 2. Create `refs/nostr/<event-id>` pointing to the commit | ||
| 125 | |||
| 126 | ## Data Structure Changes | ||
| 127 | |||
| 128 | ### PrPurgatoryEntry | ||
| 129 | |||
| 130 | Add `identifier` field for secondary index lookup: | ||
| 131 | |||
| 132 | ```rust | ||
| 133 | #[derive(Debug, Clone)] | ||
| 134 | pub struct PrPurgatoryEntry { | ||
| 135 | /// The nostr PR event, if received (None = git data arrived first) | ||
| 136 | pub event: Option<Event>, | ||
| 137 | |||
| 138 | /// The expected commit SHA from 'c' tag or actual commit pushed | ||
| 139 | pub commit: String, | ||
| 140 | |||
| 141 | /// Repository identifier extracted from 'a' tag (30617:<owner>:<identifier>) | ||
| 142 | /// Used for lookup when git data arrives | ||
| 143 | pub identifier: Option<String>, | ||
| 144 | |||
| 145 | /// When this entry was added to purgatory | ||
| 146 | pub created_at: Instant, | ||
| 147 | |||
| 148 | /// Expiry deadline | ||
| 149 | pub expires_at: Instant, | ||
| 150 | } | ||
| 151 | ``` | ||
| 152 | |||
| 153 | ### Purgatory Secondary Index | ||
| 154 | |||
| 155 | Add index for finding PR events by identifier: | ||
| 156 | |||
| 157 | ```rust | ||
| 158 | pub struct Purgatory { | ||
| 159 | /// State events indexed by repository identifier | ||
| 160 | state_events: Arc<DashMap<String, Vec<StatePurgatoryEntry>>>, | ||
| 161 | |||
| 162 | /// PR events indexed by event ID (hex string) | ||
| 163 | pr_events: Arc<DashMap<String, PrPurgatoryEntry>>, | ||
| 164 | |||
| 165 | /// Secondary index: identifier -> event_ids for PR events | ||
| 166 | pr_events_by_identifier: Arc<DashMap<String, HashSet<String>>>, | ||
| 167 | |||
| 168 | git_data_path: PathBuf, | ||
| 169 | } | ||
| 170 | ``` | ||
| 171 | |||
| 172 | ### New Purgatory Methods | ||
| 173 | |||
| 174 | ```rust | ||
| 175 | impl Purgatory { | ||
| 176 | /// Find all PR events for an identifier | ||
| 177 | pub fn find_prs_for_identifier(&self, identifier: &str) -> Vec<PrPurgatoryEntry>; | ||
| 178 | |||
| 179 | /// Add PR with automatic identifier extraction and indexing | ||
| 180 | pub fn add_pr(&self, event: Event, event_id: String, commit: String); | ||
| 181 | |||
| 182 | /// Add placeholder with optional identifier | ||
| 183 | pub fn add_pr_placeholder(&self, event_id: String, commit: String, identifier: Option<String>); | ||
| 184 | |||
| 185 | /// Remove PR (also cleans up secondary index) | ||
| 186 | pub fn remove_pr(&self, event_id: &str); | ||
| 187 | } | ||
| 188 | ``` | ||
| 189 | |||
| 190 | ## Implementation | ||
| 191 | |||
| 192 | ### Core Function | ||
| 193 | |||
| 194 | ```rust | ||
| 195 | /// Unified processing of newly available git data. | ||
| 196 | /// | ||
| 197 | /// Called whenever git data becomes available, whether from: | ||
| 198 | /// - A successful `git push` (handle_receive_pack) | ||
| 199 | /// - Purgatory sync fetching OIDs from remote servers | ||
| 200 | /// | ||
| 201 | /// # What it does | ||
| 202 | /// | ||
| 203 | /// 1. **Discover satisfiable events**: Scans purgatory for state and PR events | ||
| 204 | /// whose required OIDs are now available in `source_repo_path` | ||
| 205 | /// | ||
| 206 | /// 2. **For each satisfiable STATE event**: | ||
| 207 | /// - Find all owner repos that authorize this state's author | ||
| 208 | /// - Copy OIDs from source repo to each authorized owner repo | ||
| 209 | /// - Align refs (create/update/delete) to match state | ||
| 210 | /// - Set HEAD per state announcement | ||
| 211 | /// - Save event to database | ||
| 212 | /// - Notify WebSocket subscribers | ||
| 213 | /// - Remove from purgatory | ||
| 214 | /// | ||
| 215 | /// 3. **For each satisfiable PR event**: | ||
| 216 | /// - Find all owner repos that list tagged owners as maintainers | ||
| 217 | /// - Copy commit from source repo to each relevant owner repo | ||
| 218 | /// - Create refs/nostr/<event-id> in each repo | ||
| 219 | /// - Save event to database | ||
| 220 | /// - Notify WebSocket subscribers | ||
| 221 | /// - Remove from purgatory | ||
| 222 | pub async fn process_newly_available_git_data( | ||
| 223 | source_repo_path: &Path, | ||
| 224 | new_oids: &HashSet<String>, | ||
| 225 | database: &SharedDatabase, | ||
| 226 | local_relay: Option<&nostr_relay_builder::LocalRelay>, | ||
| 227 | purgatory: &Purgatory, | ||
| 228 | git_data_path: &Path, | ||
| 229 | ) -> ProcessResult { | ||
| 230 | let mut result = ProcessResult::default(); | ||
| 231 | |||
| 232 | // Extract identifier from repo path | ||
| 233 | let identifier = match extract_identifier_from_repo_path(source_repo_path, git_data_path) { | ||
| 234 | Some(id) => id, | ||
| 235 | None => return result, | ||
| 236 | }; | ||
| 237 | |||
| 238 | // Fetch repository data once for all operations | ||
| 239 | let db_repo_data = match fetch_repository_data(database, &identifier).await { | ||
| 240 | Ok(data) => data, | ||
| 241 | Err(e) => { | ||
| 242 | result.errors.push(format!("Failed to fetch repo data: {}", e)); | ||
| 243 | return result; | ||
| 244 | } | ||
| 245 | }; | ||
| 246 | |||
| 247 | // Process satisfiable state events | ||
| 248 | let state_result = process_satisfiable_state_events( | ||
| 249 | source_repo_path, | ||
| 250 | &identifier, | ||
| 251 | new_oids, | ||
| 252 | &db_repo_data, | ||
| 253 | database, | ||
| 254 | local_relay, | ||
| 255 | purgatory, | ||
| 256 | git_data_path, | ||
| 257 | ).await; | ||
| 258 | |||
| 259 | result.merge_state_result(state_result); | ||
| 260 | |||
| 261 | // Process satisfiable PR events | ||
| 262 | let pr_result = process_satisfiable_pr_events( | ||
| 263 | source_repo_path, | ||
| 264 | &identifier, | ||
| 265 | new_oids, | ||
| 266 | &db_repo_data, | ||
| 267 | database, | ||
| 268 | local_relay, | ||
| 269 | purgatory, | ||
| 270 | git_data_path, | ||
| 271 | ).await; | ||
| 272 | |||
| 273 | result.merge_pr_result(pr_result); | ||
| 274 | |||
| 275 | result | ||
| 276 | } | ||
| 277 | ``` | ||
| 278 | |||
| 279 | ### Result Type | ||
| 280 | |||
| 281 | ```rust | ||
| 282 | /// Result of processing newly available git data | ||
| 283 | #[derive(Debug, Default)] | ||
| 284 | pub struct ProcessResult { | ||
| 285 | /// Number of state events released from purgatory | ||
| 286 | pub states_released: usize, | ||
| 287 | /// Number of PR events released from purgatory | ||
| 288 | pub prs_released: usize, | ||
| 289 | /// Number of owner repositories synced | ||
| 290 | pub repos_synced: usize, | ||
| 291 | /// Number of refs created across all repos | ||
| 292 | pub refs_created: usize, | ||
| 293 | /// Number of refs updated across all repos | ||
| 294 | pub refs_updated: usize, | ||
| 295 | /// Number of refs deleted across all repos | ||
| 296 | pub refs_deleted: usize, | ||
| 297 | /// Errors encountered (non-fatal) | ||
| 298 | pub errors: Vec<String>, | ||
| 299 | } | ||
| 300 | ``` | ||
| 301 | |||
| 302 | ### Helper: Extract Identifier from PR Event | ||
| 303 | |||
| 304 | ```rust | ||
| 305 | /// Extract identifier from PR event's `a` tag. | ||
| 306 | /// Format: 30617:<owner_pubkey>:<identifier> | ||
| 307 | fn extract_identifier_from_pr_event(event: &Event) -> Option<String> { | ||
| 308 | event.tags.iter().find_map(|tag| { | ||
| 309 | let tag_vec = tag.clone().to_vec(); | ||
| 310 | if tag_vec.len() >= 2 && tag_vec[0] == "a" && tag_vec[1].starts_with("30617:") { | ||
| 311 | let parts: Vec<&str> = tag_vec[1].split(':').collect(); | ||
| 312 | if parts.len() >= 3 { | ||
| 313 | Some(parts[2].to_string()) | ||
| 314 | } else { | ||
| 315 | None | ||
| 316 | } | ||
| 317 | } else { | ||
| 318 | None | ||
| 319 | } | ||
| 320 | }) | ||
| 321 | } | ||
| 322 | ``` | ||
| 323 | |||
| 324 | ### Helper: Extract Identifier from Repo Path | ||
| 325 | |||
| 326 | ```rust | ||
| 327 | /// Extract identifier from repository path. | ||
| 328 | /// Path format: {git_data_path}/{npub}/{identifier}.git | ||
| 329 | fn extract_identifier_from_repo_path(repo_path: &Path, git_data_path: &Path) -> Option<String> { | ||
| 330 | let relative = repo_path.strip_prefix(git_data_path).ok()?; | ||
| 331 | let components: Vec<_> = relative.components().collect(); | ||
| 332 | |||
| 333 | if components.len() >= 2 { | ||
| 334 | let identifier_with_git = components[1].as_os_str().to_str()?; | ||
| 335 | Some(identifier_with_git.trim_end_matches(".git").to_string()) | ||
| 336 | } else { | ||
| 337 | None | ||
| 338 | } | ||
| 339 | } | ||
| 340 | ``` | ||
| 341 | |||
| 342 | ## Integration | ||
| 343 | |||
| 344 | ### handle_receive_pack (Simplified) | ||
| 345 | |||
| 346 | ```rust | ||
| 347 | // After git receive-pack succeeds: | ||
| 348 | |||
| 349 | // Collect new OIDs from the push | ||
| 350 | let new_oids: HashSet<String> = pushed_refs | ||
| 351 | .iter() | ||
| 352 | .filter(|(_, new_oid, _)| new_oid != "0000000000000000000000000000000000000000") | ||
| 353 | .map(|(_, new_oid, _)| new_oid.clone()) | ||
| 354 | .collect(); | ||
| 355 | |||
| 356 | // Single unified call handles everything | ||
| 357 | let result = process_newly_available_git_data( | ||
| 358 | &repo_path, | ||
| 359 | &new_oids, | ||
| 360 | &database, | ||
| 361 | Some(&relay), | ||
| 362 | &purgatory, | ||
| 363 | Path::new(git_data_path), | ||
| 364 | ).await; | ||
| 365 | |||
| 366 | info!( | ||
| 367 | "Processed push: {} states, {} PRs released, {} repos synced", | ||
| 368 | result.states_released, | ||
| 369 | result.prs_released, | ||
| 370 | result.repos_synced | ||
| 371 | ); | ||
| 372 | ``` | ||
| 373 | |||
| 374 | ### Purgatory Sync (Simplified) | ||
| 375 | |||
| 376 | ```rust | ||
| 377 | // After fetching OIDs from remote: | ||
| 378 | |||
| 379 | let new_oids: HashSet<String> = fetched_oids.into_iter().collect(); | ||
| 380 | |||
| 381 | let result = process_newly_available_git_data( | ||
| 382 | &source_repo_path, | ||
| 383 | &new_oids, | ||
| 384 | &database, | ||
| 385 | local_relay.as_ref(), | ||
| 386 | &purgatory, | ||
| 387 | &git_data_path, | ||
| 388 | ).await; | ||
| 389 | ``` | ||
| 390 | |||
| 391 | ### Integration with Purgatory Sync Redesign | ||
| 392 | |||
| 393 | The purgatory sync redesign (see `purgatory-sync-redesign.md`) uses this unified function in its `sync_identifier_from_url` implementation: | ||
| 394 | |||
| 395 | ```rust | ||
| 396 | pub async fn sync_identifier_from_url<C: SyncContext>( | ||
| 397 | ctx: &C, | ||
| 398 | identifier: &str, | ||
| 399 | url: &str, | ||
| 400 | throttle_manager: &Arc<ThrottleManager>, | ||
| 401 | ) -> usize { | ||
| 402 | // ... fetch OIDs from URL ... | ||
| 403 | |||
| 404 | let fetched_oids = ctx.fetch_oids(&target_repo, url, &needed_oids).await?; | ||
| 405 | |||
| 406 | if !fetched_oids.is_empty() { | ||
| 407 | // Use unified processing | ||
| 408 | let new_oids: HashSet<String> = fetched_oids.into_iter().collect(); | ||
| 409 | |||
| 410 | let result = process_newly_available_git_data( | ||
| 411 | &target_repo, | ||
| 412 | &new_oids, | ||
| 413 | ctx.database(), | ||
| 414 | ctx.local_relay(), | ||
| 415 | ctx.purgatory(), | ||
| 416 | ctx.git_data_path(), | ||
| 417 | ).await; | ||
| 418 | |||
| 419 | // Result already handled purgatory removal, DB saves, etc. | ||
| 420 | } | ||
| 421 | |||
| 422 | fetched_oids.len() | ||
| 423 | } | ||
| 424 | ``` | ||
| 425 | |||
| 426 | The `SyncContext` trait wraps this function in its `process_newly_available_git_data` method for testability. | ||
| 427 | |||
| 428 | ## Benefits | ||
| 429 | |||
| 430 | 1. **Single source of truth** - One function handles all post-git-data processing | ||
| 431 | 2. **Always fresh discovery** - Events discovered from purgatory at processing time | ||
| 432 | 3. **Consistent behavior** - Push and sync paths behave identically | ||
| 433 | 4. **Simpler callers** - Just pass repo_path + new_oids | ||
| 434 | 5. **Complete processing** - Handles all event types, all repo syncing, HEAD, DB, WebSocket, purgatory | ||
| 435 | 6. **PR sync parity** - PR events now synced in purgatory path (was missing) | ||
| 436 | |||
| 437 | ## Code to Remove/Simplify | ||
| 438 | |||
| 439 | After implementing the unified function: | ||
| 440 | |||
| 441 | 1. **Remove**: Most of `sync_state_git_data` in `src/purgatory/mod.rs` | ||
| 442 | 2. **Simplify**: Event handling in `handle_receive_pack` (replace ~100 lines with single call) | ||
| 443 | 3. **Internalize**: `sync_to_owner_repos` and `sync_pr_refs_to_tagged_owner_repos` become internal helpers | ||
| 444 | |||
| 445 | ## Testing Strategy | ||
| 446 | |||
| 447 | ### Unit Tests | ||
| 448 | |||
| 449 | 1. `extract_identifier_from_repo_path` - Various path formats | ||
| 450 | 2. `extract_identifier_from_pr_event` - Various tag formats | ||
| 451 | 3. Event discovery logic with mock purgatory | ||
| 452 | |||
| 453 | ### Integration Tests | ||
| 454 | |||
| 455 | 1. Push triggers processing and releases state event | ||
| 456 | 2. Push triggers processing and releases PR event | ||
| 457 | 3. Purgatory sync triggers processing | ||
| 458 | 4. Multiple events for same identifier processed correctly | ||
| 459 | 5. Cross-repo sync works for both state and PR events | ||
| 460 | |||
| 461 | ## Future Considerations | ||
| 462 | |||
| 463 | ### Batch Processing | ||
| 464 | |||
| 465 | Currently processes events one at a time. Could batch database saves and WebSocket notifications for efficiency with many events. | ||
| 466 | |||
| 467 | ### Partial Failures | ||
| 468 | |||
| 469 | Currently continues on errors and collects them in result. Could add retry logic or transaction semantics if needed. | ||
| 470 | |||
| 471 | ### Metrics | ||
| 472 | |||
| 473 | Add Prometheus metrics for: | ||
| 474 | - Events processed by type (state/PR) | ||
| 475 | - Repos synced per processing call | ||
| 476 | - Processing duration | ||
| 477 | - Errors by type | ||
| 478 | |||
| 479 | ## Related Documents | ||
| 480 | |||
| 481 | - [Purgatory Sync Redesign](purgatory-sync-redesign.md) - Uses this unified function for purgatory sync operations | ||