| author | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-23 11:06:12 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-27 20:37:55 +0000 |
| commit | a5504395c946bdf28b5ad0e0148ff371ca33d4d3 (patch) | |
| tree | 3465b303ef9f0c8fba1269012740710018cd1797 | |
| parent | 7536160c0ab1b64090ba9b5ab8ea6aef4747bb48 (diff) | |
Add Phase 3 migration scripts for categorization and comparison
- 20-categorize.sh: Categorizes git sync status into 4 categories
- 21-compare-relays.sh: Compares prod vs archive to find gaps
- Updated how-to doc with detailed Phase 3 outputs and directory structure
- Tested with Jan 22 data: 231 complete in both, 276 complete in prod but missing from archive
| -rw-r--r-- | docs/how-to/migrate-ngit-relay-to-ngit-grasp.md | 61 | ||||
| -rwxr-xr-x | docs/how-to/migration-scripts/20-categorize.sh | 212 | ||||
| -rwxr-xr-x | docs/how-to/migration-scripts/21-compare-relays.sh | 294 |
3 files changed, 543 insertions, 24 deletions
diff --git a/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md b/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md index d01bbf2..4c3a4ba 100644 --- a/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md +++ b/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md | |||
| @@ -51,7 +51,7 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c | |||
| 51 | ``` | 51 | ``` |
| 52 | ┌─────────────────────────────────────────────────────────────────┐ | 52 | ┌─────────────────────────────────────────────────────────────────┐ |
| 53 | │ PHASE 1: Fetch Events (~30s, local) │ | 53 | │ PHASE 1: Fetch Events (~30s, local) │ |
| 54 | │ scripts/migration/01-fetch-events.sh <relay> <output-dir> │ | 54 | │ migration-scripts/01-fetch-events.sh <relay> <output-dir> │ |
| 55 | ├─────────────────────────────────────────────────────────────────┤ | 55 | ├─────────────────────────────────────────────────────────────────┤ |
| 56 | │ Fetches from relay: │ | 56 | │ Fetches from relay: │ |
| 57 | │ - kind 30618 (state events) │ | 57 | │ - kind 30618 (state events) │ |
| @@ -64,7 +64,7 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c | |||
| 64 | ↓ | 64 | ↓ |
| 65 | ┌─────────────────────────────────────────────────────────────────┐ | 65 | ┌─────────────────────────────────────────────────────────────────┐ |
| 66 | │ PHASE 2: Git Sync Check (~20 mins, VPS required) │ | 66 | │ PHASE 2: Git Sync Check (~20 mins, VPS required) │ |
| 67 | │ scripts/migration/10-check-git-sync.sh <events> <git-base> <out>│ | 67 | │ migration-scripts/10-check-git-sync.sh <events> <git-base> <out>│ |
| 68 | ├─────────────────────────────────────────────────────────────────┤ | 68 | ├─────────────────────────────────────────────────────────────────┤ |
| 69 | │ For each state event, compares refs to actual git data on disk. │ | 69 | │ For each state event, compares refs to actual git data on disk. │ |
| 70 | │ │ | 70 | │ │ |
| @@ -78,8 +78,8 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c | |||
| 78 | ↓ | 78 | ↓ |
| 79 | ┌─────────────────────────────────────────────────────────────────┐ | 79 | ┌─────────────────────────────────────────────────────────────────┐ |
| 80 | │ PHASE 3: Categorize & Compare (fast, local) │ | 80 | │ PHASE 3: Categorize & Compare (fast, local) │ |
| 81 | │ scripts/migration/20-categorize.sh <sync-status> <output-dir> │ | 81 | │ migration-scripts/20-categorize.sh <sync-status> <output-dir> │ |
| 82 | │ scripts/migration/21-compare-relays.sh <prod> <archive> <out> │ | 82 | │ migration-scripts/21-compare-relays.sh <prod> <archive> <out> │ |
| 83 | ├─────────────────────────────────────────────────────────────────┤ | 83 | ├─────────────────────────────────────────────────────────────────┤ |
| 84 | │ 20-categorize.sh applies 4-category logic: │ | 84 | │ 20-categorize.sh applies 4-category logic: │ |
| 85 | │ - cat1: complete match (all refs match) │ | 85 | │ - cat1: complete match (all refs match) │ |
| @@ -87,18 +87,20 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c | |||
| 87 | │ - cat3: partial match (some refs match) │ | 87 | │ - cat3: partial match (some refs match) │ |
| 88 | │ - cat4: no match (git exists but refs don't match) │ | 88 | │ - cat4: no match (git exists but refs don't match) │ |
| 89 | │ │ | 89 | │ │ |
| 90 | │ 21-compare-relays.sh finds gaps: │ | 90 | │ 21-compare-relays.sh compares prod vs archive: │ |
| 91 | │ - in prod but not archive │ | 91 | │ - complete-in-both.txt (no action needed) │ |
| 92 | │ - in archive but not prod │ | 92 | │ - complete-prod-missing-archive.txt (needs investigation) │ |
| 93 | │ - different status between relays │ | 93 | │ - complete-prod-incomplete-archive.txt (sync in progress?) │ |
| 94 | │ - incomplete-in-both.txt (git data incomplete) │ | ||
| 95 | │ - in-archive-not-prod.txt (deleted or new) │ | ||
| 94 | │ │ | 96 | │ │ |
| 95 | │ Output: category-{1,2,3,4}.txt, relay-gaps.txt │ | 97 | │ Output: category-{1,2,3,4}.txt, comparison/*.txt, summary.txt │ |
| 96 | └─────────────────────────────────────────────────────────────────┘ | 98 | └─────────────────────────────────────────────────────────────────┘ |
| 97 | ↓ | 99 | ↓ |
| 98 | ┌─────────────────────────────────────────────────────────────────┐ | 100 | ┌─────────────────────────────────────────────────────────────────┐ |
| 99 | │ PHASE 4: Log-Based Categories (VPS required) │ | 101 | │ PHASE 4: Log-Based Categories (VPS required) │ |
| 100 | │ scripts/migration/30-extract-parse-failures.sh <service> <out> │ | 102 | │ migration-scripts/30-extract-parse-failures.sh <service> <out> │ |
| 101 | │ scripts/migration/31-extract-purgatory-expiry.sh <service> <out>│ | 103 | │ migration-scripts/31-extract-purgatory-expiry.sh <service> <out>│ |
| 102 | ├─────────────────────────────────────────────────────────────────┤ | 104 | ├─────────────────────────────────────────────────────────────────┤ |
| 103 | │ Extracts structured log entries from journalctl: │ | 105 | │ Extracts structured log entries from journalctl: │ |
| 104 | │ - Parse failures: [PARSE_FAIL] kind=X event_id=Y reason=Z │ | 106 | │ - Parse failures: [PARSE_FAIL] kind=X event_id=Y reason=Z │ |
| @@ -112,7 +114,7 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c | |||
| 112 | ↓ | 114 | ↓ |
| 113 | ┌─────────────────────────────────────────────────────────────────┐ | 115 | ┌─────────────────────────────────────────────────────────────────┐ |
| 114 | │ PHASE 5: Final Classification (fast, local) │ | 116 | │ PHASE 5: Final Classification (fast, local) │ |
| 115 | │ scripts/migration/40-classify-actions.sh <all-inputs> <out> │ | 117 | │ migration-scripts/40-classify-actions.sh <all-inputs> <out> │ |
| 116 | ├─────────────────────────────────────────────────────────────────┤ | 118 | ├─────────────────────────────────────────────────────────────────┤ |
| 117 | │ Combines all data sources to produce final classification: │ | 119 | │ Combines all data sources to produce final classification: │ |
| 118 | │ │ | 120 | │ │ |
| @@ -136,27 +138,38 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c | |||
| 136 | work/migration-analysis-YYYYMMDD-HHMM/ | 138 | work/migration-analysis-YYYYMMDD-HHMM/ |
| 137 | ├── prod/ | 139 | ├── prod/ |
| 138 | │ ├── raw/ | 140 | │ ├── raw/ |
| 139 | │ │ ├── state-events.json | 141 | │ │ ├── state-events.json # Phase 1 output |
| 140 | │ │ ├── announcements.json | 142 | │ │ ├── announcements.json # Phase 1 output |
| 141 | │ │ └── deletions.json | 143 | │ │ └── deletions.json # Phase 1 output |
| 142 | │ ├── git-sync-status.tsv | 144 | │ ├── git-sync-status.tsv # Phase 2 output (optional) |
| 143 | │ └── category-{1,2,3,4}.txt | 145 | │ ├── category1-complete-match.txt # Phase 2/3 output |
| 146 | │ ├── category2-empty-blank.txt # Phase 2/3 output | ||
| 147 | │ ├── category3-partial-match.txt # Phase 2/3 output | ||
| 148 | │ └── category4-no-match.txt # Phase 2/3 output | ||
| 144 | ├── archive/ | 149 | ├── archive/ |
| 145 | │ ├── raw/ | 150 | │ ├── raw/ |
| 146 | │ │ ├── state-events.json | 151 | │ │ ├── state-events.json |
| 147 | │ │ ├── announcements.json | 152 | │ │ ├── announcements.json |
| 148 | │ │ └── deletions.json | 153 | │ │ └── deletions.json |
| 149 | │ ├── git-sync-status.tsv | 154 | │ ├── git-sync-status.tsv |
| 150 | │ └── category-{1,2,3,4}.txt | 155 | │ ├── category1-complete-match.txt |
| 156 | │ ├── category2-empty-blank.txt | ||
| 157 | │ ├── category3-partial-match.txt | ||
| 158 | │ └── category4-no-match.txt | ||
| 151 | ├── logs/ | 159 | ├── logs/ |
| 152 | │ ├── parse-failures.txt | 160 | │ ├── parse-failures.txt # Phase 4 output |
| 153 | │ └── purgatory-expired.txt | 161 | │ └── purgatory-expired.txt # Phase 4 output |
| 154 | ├── comparison/ | 162 | ├── comparison/ |
| 155 | │ └── relay-gaps.txt | 163 | │ ├── complete-in-both.txt # Phase 3 output (no action) |
| 164 | │ ├── complete-prod-missing-archive.txt # Phase 3 output (investigate) | ||
| 165 | │ ├── complete-prod-incomplete-archive.txt # Phase 3 output (sync in progress?) | ||
| 166 | │ ├── incomplete-in-both.txt # Phase 3 output (git incomplete) | ||
| 167 | │ ├── in-archive-not-prod.txt # Phase 3 output (deleted/new) | ||
| 168 | │ └── summary.txt # Phase 3 output (human-readable) | ||
| 156 | └── results/ | 169 | └── results/ |
| 157 | ├── no-action-required.txt | 170 | ├── no-action-required.txt # Phase 5 output |
| 158 | ├── action-required.txt | 171 | ├── action-required.txt # Phase 5 output |
| 159 | └── manual-investigation.txt | 172 | └── manual-investigation.txt # Phase 5 output |
| 160 | ``` | 173 | ``` |
| 161 | 174 | ||
| 162 | ## Prerequisites | 175 | ## Prerequisites |
diff --git a/docs/how-to/migration-scripts/20-categorize.sh b/docs/how-to/migration-scripts/20-categorize.sh new file mode 100755 index 0000000..f47eb55 --- /dev/null +++ b/docs/how-to/migration-scripts/20-categorize.sh | |||
| @@ -0,0 +1,212 @@ | |||
| 1 | #!/usr/bin/env bash | ||
| 2 | # | ||
| 3 | # 20-categorize.sh - Categorize git sync status into 4 categories | ||
| 4 | # | ||
| 5 | # PHASE 3a of the ngit-relay to ngit-grasp migration analysis pipeline. | ||
| 6 | # Takes git-sync-status.tsv from Phase 2 and categorizes into 4 files. | ||
| 7 | # | ||
| 8 | # USAGE: | ||
| 9 | # ./20-categorize.sh <git-sync-status.tsv> <output-dir> | ||
| 10 | # | ||
| 11 | # EXAMPLES: | ||
| 12 | # ./20-categorize.sh output/prod/git-sync-status.tsv output/prod | ||
| 13 | # ./20-categorize.sh output/archive/git-sync-status.tsv output/archive | ||
| 14 | # | ||
| 15 | # INPUT FORMAT (git-sync-status.tsv): | ||
| 16 | # Tab-separated values with columns: | ||
| 17 | # repo<TAB>npub<TAB>state_refs<TAB>git_refs<TAB>matches<TAB>reason | ||
| 18 | # | ||
| 19 | # Where reason is optional and can be: no_git_dir, empty_refs, no_state_refs | ||
| 20 | # | ||
| 21 | # OUTPUT: | ||
| 22 | # <output-dir>/category1-complete-match.txt - All refs match perfectly | ||
| 23 | # <output-dir>/category2-empty-blank.txt - No git data available | ||
| 24 | # <output-dir>/category3-partial-match.txt - Some refs match | ||
| 25 | # <output-dir>/category4-no-match.txt - Git exists but refs don't match | ||
| 26 | # | ||
| 27 | # OUTPUT FORMAT: | ||
| 28 | # repo | npub | state_refs=N | git_refs=N | matches=N [| reason=X] | ||
| 29 | # | ||
| 30 | # CATEGORIES: | ||
| 31 | # 1. Complete Match: state_refs == git_refs == matches (all > 0) | ||
| 32 | # 2. Empty/Blank: git_refs == 0 OR reason in (no_git_dir, empty_refs, no_state_refs) | ||
| 33 | # 3. Partial Match: matches > 0 but not a complete match (matches < state_refs, or git_refs > state_refs) | ||
| 34 | # 4. No Match: git_refs > 0 AND matches == 0 | ||
| 35 | # | ||
| 36 | # PREREQUISITES: | ||
| 37 | # - awk (standard Unix tool) | ||
| 38 | # | ||
| 39 | # RUNTIME: < 1 second (local processing only) | ||
| 40 | # | ||
| 41 | # SEE ALSO: | ||
| 42 | # docs/how-to/migrate-ngit-relay-to-ngit-grasp.md - Full migration guide | ||
| 43 | # 10-check-git-sync.sh - Phase 2 script that produces input for this script | ||
| 44 | # | ||
| 45 | |||
| 46 | set -euo pipefail | ||
| 47 | |||
| 48 | # Colors for output (disabled if not a terminal) | ||
| 49 | if [[ -t 1 ]]; then | ||
| 50 | RED='\033[0;31m' | ||
| 51 | GREEN='\033[0;32m' | ||
| 52 | YELLOW='\033[0;33m' | ||
| 53 | BLUE='\033[0;34m' | ||
| 54 | NC='\033[0m' | ||
| 55 | else | ||
| 56 | RED='' | ||
| 57 | GREEN='' | ||
| 58 | YELLOW='' | ||
| 59 | BLUE='' | ||
| 60 | NC='' | ||
| 61 | fi | ||
| 62 | |||
| 63 | log_info() { | ||
| 64 | echo -e "${BLUE}[INFO]${NC} $*" >&2 | ||
| 65 | } | ||
| 66 | |||
| 67 | log_success() { | ||
| 68 | echo -e "${GREEN}[OK]${NC} $*" >&2 | ||
| 69 | } | ||
| 70 | |||
| 71 | log_warn() { | ||
| 72 | echo -e "${YELLOW}[WARN]${NC} $*" >&2 | ||
| 73 | } | ||
| 74 | |||
| 75 | log_error() { | ||
| 76 | echo -e "${RED}[ERROR]${NC} $*" >&2 | ||
| 77 | } | ||
| 78 | |||
| 79 | usage() { | ||
| 80 | echo "Usage: $0 <git-sync-status.tsv> <output-dir>" | ||
| 81 | echo "" | ||
| 82 | echo "Arguments:" | ||
| 83 | echo " git-sync-status.tsv TSV file from Phase 2 (10-check-git-sync.sh)" | ||
| 84 | echo " output-dir Directory to store categorized output" | ||
| 85 | echo "" | ||
| 86 | echo "Examples:" | ||
| 87 | echo " $0 output/prod/git-sync-status.tsv output/prod" | ||
| 88 | echo " $0 output/archive/git-sync-status.tsv output/archive" | ||
| 89 | echo "" | ||
| 90 | echo "Input format (TSV):" | ||
| 91 | echo " repo<TAB>npub<TAB>state_refs<TAB>git_refs<TAB>matches<TAB>reason" | ||
| 92 | echo "" | ||
| 93 | echo "Output files:" | ||
| 94 | echo " category1-complete-match.txt - All refs match" | ||
| 95 | echo " category2-empty-blank.txt - No git data" | ||
| 96 | echo " category3-partial-match.txt - Some refs match" | ||
| 97 | echo " category4-no-match.txt - Git exists, refs don't match" | ||
| 98 | exit 1 | ||
| 99 | } | ||
| 100 | |||
| 101 | # Main | ||
| 102 | main() { | ||
| 103 | if [[ $# -ne 2 ]]; then | ||
| 104 | usage | ||
| 105 | fi | ||
| 106 | |||
| 107 | local input_file="$1" | ||
| 108 | local output_dir="$2" | ||
| 109 | |||
| 110 | # Validate input file | ||
| 111 | if [[ ! -f "$input_file" ]]; then | ||
| 112 | log_error "Input file not found: $input_file" | ||
| 113 | exit 1 | ||
| 114 | fi | ||
| 115 | |||
| 116 | log_info "Categorizing git sync status" | ||
| 117 | log_info "Input: $input_file" | ||
| 118 | log_info "Output: $output_dir" | ||
| 119 | |||
| 120 | # Create output directory | ||
| 121 | mkdir -p "$output_dir" | ||
| 122 | |||
| 123 | # Output files | ||
| 124 | local cat1="$output_dir/category1-complete-match.txt" | ||
| 125 | local cat2="$output_dir/category2-empty-blank.txt" | ||
| 126 | local cat3="$output_dir/category3-partial-match.txt" | ||
| 127 | local cat4="$output_dir/category4-no-match.txt" | ||
| 128 | |||
| 129 | # Clear previous results | ||
| 130 | > "$cat1" | ||
| 131 | > "$cat2" | ||
| 132 | > "$cat3" | ||
| 133 | > "$cat4" | ||
| 134 | |||
| 135 | # Process input file with awk | ||
| 136 | # Input: repo<TAB>npub<TAB>state_refs<TAB>git_refs<TAB>matches<TAB>reason | ||
| 137 | awk -F'\t' -v cat1="$cat1" -v cat2="$cat2" -v cat3="$cat3" -v cat4="$cat4" ' | ||
| 138 | BEGIN { | ||
| 139 | count1 = 0; count2 = 0; count3 = 0; count4 = 0 | ||
| 140 | } | ||
| 141 | NR == 1 && $1 == "repo" { next } # Skip header row if present (exact match, so a repo named e.g. "repository" is kept) | ||
| 142 | NF >= 5 { | ||
| 143 | repo = $1 | ||
| 144 | npub = $2 | ||
| 145 | state_refs = int($3) | ||
| 146 | git_refs = int($4) | ||
| 147 | matches = int($5) | ||
| 148 | reason = (NF >= 6) ? $6 : "" | ||
| 149 | |||
| 150 | # Format output line | ||
| 151 | if (reason != "") { | ||
| 152 | line = repo " | " npub " | state_refs=" state_refs " | git_refs=" git_refs " | matches=" matches " | reason=" reason | ||
| 153 | } else { | ||
| 154 | line = repo " | " npub " | state_refs=" state_refs " | git_refs=" git_refs " | matches=" matches | ||
| 155 | } | ||
| 156 | |||
| 157 | # Categorize | ||
| 158 | if (reason == "no_git_dir" || reason == "empty_refs" || reason == "no_state_refs" || git_refs == 0) { | ||
| 159 | # Category 2: Empty/Blank | ||
| 160 | print line >> cat2 | ||
| 161 | count2++ | ||
| 162 | } else if (state_refs > 0 && state_refs == git_refs && matches == state_refs) { | ||
| 163 | # Category 1: Complete Match | ||
| 164 | print line >> cat1 | ||
| 165 | count1++ | ||
| 166 | } else if (matches > 0 && matches < state_refs) { | ||
| 167 | # Category 3: Partial Match | ||
| 168 | print line >> cat3 | ||
| 169 | count3++ | ||
| 170 | } else if (git_refs > 0 && matches == 0) { | ||
| 171 | # Category 4: No Match | ||
| 172 | print line >> cat4 | ||
| 173 | count4++ | ||
| 174 | } else if (matches > 0) { | ||
| 175 | # Edge case: matches > 0 but does not fit other categories | ||
| 176 | # This can happen when git_refs > state_refs but all state refs match | ||
| 177 | # Treat as partial match | ||
| 178 | print line >> cat3 | ||
| 179 | count3++ | ||
| 180 | } else { | ||
| 181 | # Fallback: treat as category 2 (empty/blank) | ||
| 182 | print line >> cat2 | ||
| 183 | count2++ | ||
| 184 | } | ||
| 185 | } | ||
| 186 | END { | ||
| 187 | total = count1 + count2 + count3 + count4 | ||
| 188 | print "COUNTS:" count1 ":" count2 ":" count3 ":" count4 ":" total | ||
| 189 | } | ||
| 190 | ' "$input_file" | while IFS= read -r line; do | ||
| 191 | if [[ "$line" =~ ^COUNTS: ]]; then | ||
| 192 | # Parse counts from awk output | ||
| 193 | IFS=':' read -r _ c1 c2 c3 c4 total <<< "$line" | ||
| 194 | |||
| 195 | echo "" | ||
| 196 | log_info "=== Categorization Summary ===" | ||
| 197 | log_info "Total entries: $total" | ||
| 198 | log_success "Category 1 (Complete Match): $c1" | ||
| 199 | log_warn "Category 2 (Empty/Blank): $c2" | ||
| 200 | log_warn "Category 3 (Partial Match): $c3" | ||
| 201 | log_error "Category 4 (No Match): $c4" | ||
| 202 | echo "" | ||
| 203 | log_info "Output files:" | ||
| 204 | echo " $cat1" | ||
| 205 | echo " $cat2" | ||
| 206 | echo " $cat3" | ||
| 207 | echo " $cat4" | ||
| 208 | fi | ||
| 209 | done | ||
| 210 | } | ||
| 211 | |||
| 212 | main "$@" | ||
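The branch order in the awk block above decides which category a row lands in, so it may help to see it in isolation. The following is a minimal sketch, not part of the commit: `categorize_row` is a hypothetical helper name, and it assumes all three counts are non-negative integers. Under that assumption it should agree with the awk logic, including the edge case where `git_refs > state_refs` but every state ref matches.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): classify a single row.
# Args: state_refs git_refs matches [reason]
# Assumes non-negative integer counts.
categorize_row() {
    state_refs=$1
    git_refs=$2
    matches=$3
    reason=${4:-}

    # Category 2 is checked first: an explicit reason or no git refs at all.
    case "$reason" in
        no_git_dir|empty_refs|no_state_refs) echo cat2; return ;;
    esac

    if [ "$git_refs" -eq 0 ]; then
        echo cat2
    elif [ "$state_refs" -gt 0 ] && [ "$state_refs" -eq "$git_refs" ] \
         && [ "$matches" -eq "$state_refs" ]; then
        echo cat1
    elif [ "$matches" -gt 0 ]; then
        # Covers matches < state_refs and the git_refs > state_refs edge case.
        echo cat3
    else
        # git_refs > 0 and matches == 0
        echo cat4
    fi
}
```

For example, `categorize_row 3 5 3` prints `cat3`, because the ref counts differ even though every state ref matched.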
diff --git a/docs/how-to/migration-scripts/21-compare-relays.sh b/docs/how-to/migration-scripts/21-compare-relays.sh new file mode 100755 index 0000000..6b40dc8 --- /dev/null +++ b/docs/how-to/migration-scripts/21-compare-relays.sh | |||
| @@ -0,0 +1,294 @@ | |||
| 1 | #!/usr/bin/env bash | ||
| 2 | # | ||
| 3 | # 21-compare-relays.sh - Compare prod vs archive category files to find gaps | ||
| 4 | # | ||
| 5 | # PHASE 3b of the ngit-relay to ngit-grasp migration analysis pipeline. | ||
| 6 | # Compares categorized output from prod and archive to identify: | ||
| 7 | # - Repos complete in prod but missing/incomplete in archive | ||
| 8 | # - Repos in archive but not in prod | ||
| 9 | # - Status differences between relays | ||
| 10 | # | ||
| 11 | # USAGE: | ||
| 12 | # ./21-compare-relays.sh <prod-dir> <archive-dir> <output-dir> | ||
| 13 | # | ||
| 14 | # EXAMPLES: | ||
| 15 | # ./21-compare-relays.sh output/prod output/archive output/comparison | ||
| 16 | # | ||
| 17 | # INPUT: | ||
| 18 | # Both prod-dir and archive-dir must contain: | ||
| 19 | # - category1-complete-match.txt | ||
| 20 | # - category2-empty-blank.txt | ||
| 21 | # - category3-partial-match.txt | ||
| 22 | # - category4-no-match.txt | ||
| 23 | # | ||
| 24 | # OUTPUT: | ||
| 25 | # <output-dir>/complete-in-both.txt - Repos complete in both relays (no action) | ||
| 26 | # <output-dir>/complete-prod-missing-archive.txt - Complete in prod, not in archive cat1 | ||
| 27 | # <output-dir>/complete-prod-incomplete-archive.txt - Complete in prod, incomplete in archive | ||
| 28 | # <output-dir>/incomplete-in-both.txt - Incomplete in both relays | ||
| 29 | # <output-dir>/in-archive-not-prod.txt - In archive but not in prod | ||
| 30 | # <output-dir>/summary.txt - Human-readable summary | ||
| 31 | # | ||
| 32 | # OUTPUT FORMAT: | ||
| 33 | # Each file contains lines in the format: | ||
| 34 | # repo | npub | prod_status | archive_status | ||
| 35 | # | ||
| 36 | # PREREQUISITES: | ||
| 37 | # - awk, sort, comm (standard Unix tools) | ||
| 38 | # | ||
| 39 | # RUNTIME: < 1 second (local processing only) | ||
| 40 | # | ||
| 41 | # SEE ALSO: | ||
| 42 | # docs/how-to/migrate-ngit-relay-to-ngit-grasp.md - Full migration guide | ||
| 43 | # 20-categorize.sh - Phase 3a script that produces input for this script | ||
| 44 | # | ||
| 45 | |||
| 46 | set -euo pipefail | ||
| 47 | |||
| 48 | # Colors for output (disabled if not a terminal) | ||
| 49 | if [[ -t 1 ]]; then | ||
| 50 | RED='\033[0;31m' | ||
| 51 | GREEN='\033[0;32m' | ||
| 52 | YELLOW='\033[0;33m' | ||
| 53 | BLUE='\033[0;34m' | ||
| 54 | NC='\033[0m' | ||
| 55 | else | ||
| 56 | RED='' | ||
| 57 | GREEN='' | ||
| 58 | YELLOW='' | ||
| 59 | BLUE='' | ||
| 60 | NC='' | ||
| 61 | fi | ||
| 62 | |||
| 63 | log_info() { | ||
| 64 | echo -e "${BLUE}[INFO]${NC} $*" >&2 | ||
| 65 | } | ||
| 66 | |||
| 67 | log_success() { | ||
| 68 | echo -e "${GREEN}[OK]${NC} $*" >&2 | ||
| 69 | } | ||
| 70 | |||
| 71 | log_warn() { | ||
| 72 | echo -e "${YELLOW}[WARN]${NC} $*" >&2 | ||
| 73 | } | ||
| 74 | |||
| 75 | log_error() { | ||
| 76 | echo -e "${RED}[ERROR]${NC} $*" >&2 | ||
| 77 | } | ||
| 78 | |||
| 79 | usage() { | ||
| 80 | echo "Usage: $0 <prod-dir> <archive-dir> <output-dir>" | ||
| 81 | echo "" | ||
| 82 | echo "Arguments:" | ||
| 83 | echo " prod-dir Directory containing prod category files" | ||
| 84 | echo " archive-dir Directory containing archive category files" | ||
| 85 | echo " output-dir Directory to store comparison results" | ||
| 86 | echo "" | ||
| 87 | echo "Examples:" | ||
| 88 | echo " $0 output/prod output/archive output/comparison" | ||
| 89 | echo "" | ||
| 90 | echo "Required input files in each directory:" | ||
| 91 | echo " category1-complete-match.txt" | ||
| 92 | echo " category2-empty-blank.txt" | ||
| 93 | echo " category3-partial-match.txt" | ||
| 94 | echo " category4-no-match.txt" | ||
| 95 | exit 1 | ||
| 96 | } | ||
| 97 | |||
| 98 | # Extract repo|npub key from category line | ||
| 99 | # Input: "repo | npub | state_refs=N | ..." | ||
| 100 | # Output: "repo|npub" | ||
| 101 | extract_key() { | ||
| 102 | awk -F' \\| ' '{print $1 "|" $2}' | ||
| 103 | } | ||
| 104 | |||
| 105 | # Build lookup table from category files | ||
| 106 | # Args: $1=directory, $2=output_file | ||
| 107 | build_lookup() { | ||
| 108 | local dir="$1" | ||
| 109 | local output="$2" | ||
| 110 | |||
| 111 | # Process all 4 category files | ||
| 112 | for cat in 1 2 3 4; do | ||
| 113 | local file="$dir/category${cat}-*.txt" | ||
| 114 | # shellcheck disable=SC2086 | ||
| 115 | if ls $file 1>/dev/null 2>&1; then | ||
| 116 | # shellcheck disable=SC2086 | ||
| 117 | cat $file | while IFS= read -r line; do | ||
| 118 | key=$(echo "$line" | extract_key) | ||
| 119 | echo "${key}|cat${cat}|${line}" | ||
| 120 | done | ||
| 121 | fi | ||
| 122 | done | sort -t'|' -k1,2 > "$output" | ||
| 123 | } | ||
| 124 | |||
| 125 | # Main | ||
| 126 | main() { | ||
| 127 | if [[ $# -ne 3 ]]; then | ||
| 128 | usage | ||
| 129 | fi | ||
| 130 | |||
| 131 | local prod_dir="$1" | ||
| 132 | local archive_dir="$2" | ||
| 133 | local output_dir="$3" | ||
| 134 | |||
| 135 | # Validate input directories | ||
| 136 | for dir in "$prod_dir" "$archive_dir"; do | ||
| 137 | if [[ ! -d "$dir" ]]; then | ||
| 138 | log_error "Directory not found: $dir" | ||
| 139 | exit 1 | ||
| 140 | fi | ||
| 141 | if [[ ! -f "$dir/category1-complete-match.txt" ]]; then | ||
| 142 | log_error "Missing category1-complete-match.txt in $dir" | ||
| 143 | exit 1 | ||
| 144 | fi | ||
| 145 | done | ||
| 146 | |||
| 147 | log_info "Comparing relay categories" | ||
| 148 | log_info "Prod: $prod_dir" | ||
| 149 | log_info "Archive: $archive_dir" | ||
| 150 | log_info "Output: $output_dir" | ||
| 151 | |||
| 152 | # Create output directory | ||
| 153 | mkdir -p "$output_dir" | ||
| 154 | |||
| 155 | # Create temp files for processing | ||
| 156 | local tmp_dir | ||
| 157 | tmp_dir=$(mktemp -d) | ||
| 158 | # shellcheck disable=SC2064 | ||
| 159 | trap "rm -rf '$tmp_dir'" EXIT | ||
| 160 | |||
| 161 | log_info "Building lookup tables..." | ||
| 162 | |||
| 163 | # Build lookup tables: key|category|full_line | ||
| 164 | build_lookup "$prod_dir" "$tmp_dir/prod_lookup.txt" | ||
| 165 | build_lookup "$archive_dir" "$tmp_dir/archive_lookup.txt" | ||
| 166 | |||
| 167 | # Extract just keys for comparison | ||
| 168 | cut -d'|' -f1,2 "$tmp_dir/prod_lookup.txt" | sort -u > "$tmp_dir/prod_keys.txt" | ||
| 169 | cut -d'|' -f1,2 "$tmp_dir/archive_lookup.txt" | sort -u > "$tmp_dir/archive_keys.txt" | ||
| 170 | |||
| 171 | log_info "Comparing categories..." | ||
| 172 | |||
| 173 | # Initialize output files | ||
| 174 | > "$output_dir/complete-in-both.txt" | ||
| 175 | > "$output_dir/complete-prod-missing-archive.txt" | ||
| 176 | > "$output_dir/complete-prod-incomplete-archive.txt" | ||
| 177 | > "$output_dir/incomplete-in-both.txt" | ||
| 178 | > "$output_dir/in-archive-not-prod.txt" | ||
| 179 | |||
| 180 | # Process prod category 1 (complete) entries | ||
| 181 | while IFS='|' read -r repo npub cat full_line; do | ||
| 182 | key="${repo}|${npub}" | ||
| 183 | |||
| 184 | # Look up in archive | ||
| 185 | archive_entry=$(grep "^${key}|" "$tmp_dir/archive_lookup.txt" 2>/dev/null | head -1 || echo "") | ||
| 186 | |||
| 187 | if [[ -z "$archive_entry" ]]; then | ||
| 188 | # Not in archive at all | ||
| 189 | echo "$repo | $npub | prod=complete | archive=missing" >> "$output_dir/complete-prod-missing-archive.txt" | ||
| 190 | else | ||
| 191 | archive_cat=$(echo "$archive_entry" | cut -d'|' -f3) | ||
| 192 | if [[ "$archive_cat" == "cat1" ]]; then | ||
| 193 | # Complete in both | ||
| 194 | echo "$repo | $npub | prod=complete | archive=complete" >> "$output_dir/complete-in-both.txt" | ||
| 195 | else | ||
| 196 | # Complete in prod, incomplete in archive | ||
| 197 | echo "$repo | $npub | prod=complete | archive=$archive_cat" >> "$output_dir/complete-prod-incomplete-archive.txt" | ||
| 198 | fi | ||
| 199 | fi | ||
| 200 | done < <(grep '|cat1|' "$tmp_dir/prod_lookup.txt") | ||
| 201 | |||
| 202 | # Process prod categories 2-4 (incomplete) entries | ||
| 203 | for cat in cat2 cat3 cat4; do | ||
| 204 | while IFS='|' read -r repo npub _ full_line; do | ||
| 205 | key="${repo}|${npub}" | ||
| 206 | |||
| 207 | # Look up in archive | ||
| 208 | archive_entry=$(grep "^${key}|" "$tmp_dir/archive_lookup.txt" 2>/dev/null | head -1 || echo "") | ||
| 209 | |||
| 210 | if [[ -z "$archive_entry" ]]; then | ||
| 211 | # Incomplete in prod, missing in archive | ||
| 212 | echo "$repo | $npub | prod=$cat | archive=missing" >> "$output_dir/incomplete-in-both.txt" | ||
| 213 | else | ||
| 214 | archive_cat=$(echo "$archive_entry" | cut -d'|' -f3) | ||
| 215 | if [[ "$archive_cat" != "cat1" ]]; then | ||
| 216 | # Incomplete in both | ||
| 217 | echo "$repo | $npub | prod=$cat | archive=$archive_cat" >> "$output_dir/incomplete-in-both.txt" | ||
| 218 | fi | ||
| 219 | # If archive is complete but prod is not, that's unusual but not an error | ||
| 220 | fi | ||
| 221 | done < <(grep "|${cat}|" "$tmp_dir/prod_lookup.txt") | ||
| 222 | done | ||
| 223 | |||
| 224 | # Find entries in archive but not in prod | ||
| 225 | comm -23 "$tmp_dir/archive_keys.txt" "$tmp_dir/prod_keys.txt" | while IFS='|' read -r repo npub; do | ||
| 226 | key="${repo}|${npub}" | ||
| 227 | archive_entry=$(grep "^${key}|" "$tmp_dir/archive_lookup.txt" 2>/dev/null | head -1 || echo "") | ||
| 228 | archive_cat=$(echo "$archive_entry" | cut -d'|' -f3) | ||
| 229 | echo "$repo | $npub | prod=missing | archive=$archive_cat" >> "$output_dir/in-archive-not-prod.txt" | ||
| 230 | done | ||
| 231 | |||
| 232 | # Count results | ||
| 233 | local count_both count_missing count_incomplete count_both_incomplete count_archive_only | ||
| 234 | count_both=$(wc -l < "$output_dir/complete-in-both.txt" | tr -d ' ') | ||
| 235 | count_missing=$(wc -l < "$output_dir/complete-prod-missing-archive.txt" | tr -d ' ') | ||
| 236 | count_incomplete=$(wc -l < "$output_dir/complete-prod-incomplete-archive.txt" | tr -d ' ') | ||
| 237 | count_both_incomplete=$(wc -l < "$output_dir/incomplete-in-both.txt" | tr -d ' ') | ||
| 238 | count_archive_only=$(wc -l < "$output_dir/in-archive-not-prod.txt" | tr -d ' ') | ||
| 239 | |||
| 240 | # Generate summary | ||
| 241 | cat > "$output_dir/summary.txt" << EOF | ||
| 242 | # Relay Comparison Summary | ||
| 243 | Generated: $(date -Iseconds) | ||
| 244 | |||
| 245 | ## Input | ||
| 246 | - Prod: $prod_dir | ||
| 247 | - Archive: $archive_dir | ||
| 248 | |||
| 249 | ## Results | ||
| 250 | |||
| 251 | ### No Action Required | ||
| 252 | - Complete in both relays: $count_both | ||
| 253 | |||
| 254 | ### Action/Decision Required | ||
| 255 | - Complete in prod, MISSING from archive: $count_missing | ||
| 256 | - Complete in prod, INCOMPLETE in archive: $count_incomplete | ||
| 257 | - Incomplete in BOTH relays: $count_both_incomplete | ||
| 258 | |||
| 259 | ### For Reference | ||
| 260 | - In archive but not in prod: $count_archive_only | ||
| 261 | |||
| 262 | ## Files | ||
| 263 | - complete-in-both.txt: Repos successfully migrated (no action) | ||
| 264 | - complete-prod-missing-archive.txt: Need investigation - why not in archive? | ||
| 265 | - complete-prod-incomplete-archive.txt: Archive sync may still be in progress | ||
| 266 | - incomplete-in-both.txt: Git data incomplete on both relays | ||
| 267 | - in-archive-not-prod.txt: May be deleted from prod or new to archive | ||
| 268 | |||
| 269 | ## Next Steps | ||
| 270 | 1. Review complete-prod-missing-archive.txt - these repos need attention | ||
| 271 | 2. Check if archive sync is still running for incomplete entries | ||
| 272 | 3. Cross-reference with deletion events (kind 5) from Phase 1 | ||
| 273 | 4. Use Phase 4 logs to understand parse failures and purgatory expiry | ||
| 274 | EOF | ||
| 275 | |||
| 276 | # Display summary | ||
| 277 | echo "" | ||
| 278 | log_info "=== Comparison Summary ===" | ||
| 279 | log_success "Complete in both: $count_both (no action needed)" | ||
| 280 | log_error "Complete in prod, MISSING from archive: $count_missing" | ||
| 281 | log_warn "Complete in prod, incomplete in archive: $count_incomplete" | ||
| 282 | log_warn "Incomplete in both: $count_both_incomplete" | ||
| 283 | log_info "In archive only: $count_archive_only" | ||
| 284 | echo "" | ||
| 285 | log_info "Output files:" | ||
| 286 | echo " $output_dir/complete-in-both.txt" | ||
| 287 | echo " $output_dir/complete-prod-missing-archive.txt" | ||
| 288 | echo " $output_dir/complete-prod-incomplete-archive.txt" | ||
| 289 | echo " $output_dir/incomplete-in-both.txt" | ||
| 290 | echo " $output_dir/in-archive-not-prod.txt" | ||
| 291 | echo " $output_dir/summary.txt" | ||
| 292 | } | ||
| 293 | |||
| 294 | main "$@" | ||
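The comparison above runs one `grep` per prod entry against the archive lookup table. As an illustration of the same bucketing rules, here is a sketch that does the classification in a single awk pass. It is not the committed implementation: `compare_lookups` is a hypothetical name, the input files are assumed to hold `repo|npub|catN` lines with unique keys, and the output is a tab-separated `bucket<TAB>repo|npub` list rather than one file per bucket.

```shell
#!/bin/sh
# Hypothetical sketch (not in the commit): bucket prod vs archive entries.
# $1 = prod lookup file, $2 = archive lookup file; lines are "repo|npub|catN".
compare_lookups() {
    awk -F'|' '
        # First file on the command line is the archive: remember its category.
        FILENAME == ARGV[1] { archive[$1 "|" $2] = $3; next }
        {
            key = $1 "|" $2
            seen[key] = 1
            a = (key in archive) ? archive[key] : "missing"
            if ($3 == "cat1" && a == "cat1") {
                bucket = "complete-in-both"
            } else if ($3 == "cat1" && a == "missing") {
                bucket = "complete-prod-missing-archive"
            } else if ($3 == "cat1") {
                bucket = "complete-prod-incomplete-archive"
            } else if (a != "cat1") {
                bucket = "incomplete-in-both"
            } else {
                next   # archive complete, prod incomplete: no bucket, as in the script
            }
            print bucket "\t" key
        }
        END {
            # Anything left in the archive table never appeared in prod.
            for (key in archive) {
                if (!(key in seen)) print "in-archive-not-prod\t" key
            }
        }
    ' "$2" "$1" | sort
}
```

Comparing keys as exact strings also avoids treating characters such as `.` in a repo name as regex wildcards, which the `grep "^${key}|"` lookups would do.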