upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

author     DanConwayDev <DanConwayDev@protonmail.com>  2026-01-23 11:06:12 +0000
committer  DanConwayDev <DanConwayDev@protonmail.com>  2026-01-27 20:37:55 +0000
commit     a5504395c946bdf28b5ad0e0148ff371ca33d4d3
tree       3465b303ef9f0c8fba1269012740710018cd1797 /docs
parent     7536160c0ab1b64090ba9b5ab8ea6aef4747bb48
Add Phase 3 migration scripts for categorization and comparison

- 20-categorize.sh: Categorizes git sync status into 4 categories
- 21-compare-relays.sh: Compares prod vs archive to find gaps
- Updated how-to doc with detailed Phase 3 outputs and directory structure
- Tested with Jan 22 data: 231 complete in both, 276 complete in prod but missing from archive

Diffstat (limited to 'docs')
-rw-r--r--  docs/how-to/migrate-ngit-relay-to-ngit-grasp.md     61
-rwxr-xr-x  docs/how-to/migration-scripts/20-categorize.sh     212
-rwxr-xr-x  docs/how-to/migration-scripts/21-compare-relays.sh 294
3 files changed, 543 insertions, 24 deletions
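
Usage sketch for the two scripts added in this commit. It assumes they are run from the repository root, that Phase 2 has already written git-sync-status.tsv for both relays, and that the work directory follows the layout in the how-to doc. The run_phase3 wrapper and the SCRIPTS and RUN variables are hypothetical and are not part of the commit; RUN=echo gives a dry run that only prints the commands.

```shell
# Hypothetical wrapper (not in the commit): runs Phase 3a for each relay,
# then Phase 3b, using the argument order documented in the script headers.
run_phase3() {
    work=$1                                            # e.g. work/migration-analysis-YYYYMMDD-HHMM
    scripts=${SCRIPTS:-docs/how-to/migration-scripts}  # assumed script location
    run=${RUN:-}                                       # set RUN=echo for a dry run

    for relay in prod archive; do
        $run "$scripts/20-categorize.sh" \
            "$work/$relay/git-sync-status.tsv" "$work/$relay" || return
    done
    $run "$scripts/21-compare-relays.sh" \
        "$work/prod" "$work/archive" "$work/comparison"
}

# Dry run: print the three commands without executing them.
RUN=echo run_phase3 work/migration-analysis-demo
```

Running the categorizer once per relay before the comparison matters because 21-compare-relays.sh reads the category files, not the TSV.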
diff --git a/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md b/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md
index d01bbf2..4c3a4ba 100644
--- a/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md
+++ b/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md
@@ -51,7 +51,7 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 1: Fetch Events (~30s, local)                             │
-│ scripts/migration/01-fetch-events.sh <relay> <output-dir>       │
+│ migration-scripts/01-fetch-events.sh <relay> <output-dir>       │
 ├─────────────────────────────────────────────────────────────────┤
 │ Fetches from relay:                                             │
 │ - kind 30618 (state events)                                     │
@@ -64,7 +64,7 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c
 
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 2: Git Sync Check (~20 mins, VPS required)                │
-│ scripts/migration/10-check-git-sync.sh <events> <git-base> <out>│
+│ migration-scripts/10-check-git-sync.sh <events> <git-base> <out>│
 ├─────────────────────────────────────────────────────────────────┤
 │ For each state event, compares refs to actual git data on disk. │
 │                                                                 │
@@ -78,8 +78,8 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c
 
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 3: Categorize & Compare (fast, local)                     │
-│ scripts/migration/20-categorize.sh <sync-status> <output-dir>   │
-│ scripts/migration/21-compare-relays.sh <prod> <archive> <out>   │
+│ migration-scripts/20-categorize.sh <sync-status> <output-dir>   │
+│ migration-scripts/21-compare-relays.sh <prod> <archive> <out>   │
 ├─────────────────────────────────────────────────────────────────┤
 │ 20-categorize.sh applies 4-category logic:                      │
 │ - cat1: complete match (all refs match)                         │
@@ -87,18 +87,20 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c
 │ - cat3: partial match (some refs match)                         │
 │ - cat4: no match (git exists but refs don't match)              │
 │                                                                 │
-│ 21-compare-relays.sh finds gaps:                                │
-│ - in prod but not archive                                       │
-│ - in archive but not prod                                       │
-│ - different status between relays                               │
+│ 21-compare-relays.sh compares prod vs archive:                  │
+│ - complete-in-both.txt (no action needed)                       │
+│ - complete-prod-missing-archive.txt (needs investigation)       │
+│ - complete-prod-incomplete-archive.txt (sync in progress?)      │
+│ - incomplete-in-both.txt (git data incomplete)                  │
+│ - in-archive-not-prod.txt (deleted or new)                      │
 │                                                                 │
-│ Output: category-{1,2,3,4}.txt, relay-gaps.txt                  │
+│ Output: category-{1,2,3,4}.txt, comparison/*.txt, summary.txt   │
 └─────────────────────────────────────────────────────────────────┘
 
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 4: Log-Based Categories (VPS required)                    │
-│ scripts/migration/30-extract-parse-failures.sh <service> <out>  │
-│ scripts/migration/31-extract-purgatory-expiry.sh <service> <out>│
+│ migration-scripts/30-extract-parse-failures.sh <service> <out>  │
+│ migration-scripts/31-extract-purgatory-expiry.sh <service> <out>│
 ├─────────────────────────────────────────────────────────────────┤
 │ Extracts structured log entries from journalctl:                │
 │ - Parse failures: [PARSE_FAIL] kind=X event_id=Y reason=Z       │
@@ -112,7 +114,7 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c
 
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 5: Final Classification (fast, local)                     │
-│ scripts/migration/40-classify-actions.sh <all-inputs> <out>     │
+│ migration-scripts/40-classify-actions.sh <all-inputs> <out>     │
 ├─────────────────────────────────────────────────────────────────┤
 │ Combines all data sources to produce final classification:      │
 │                                                                 │
@@ -136,27 +138,38 @@ The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 c
 work/migration-analysis-YYYYMMDD-HHMM/
 ├── prod/
 │   ├── raw/
-│   │   ├── state-events.json
-│   │   ├── announcements.json
-│   │   └── deletions.json
-│   ├── git-sync-status.tsv
-│   └── category-{1,2,3,4}.txt
+│   │   ├── state-events.json                 # Phase 1 output
+│   │   ├── announcements.json                # Phase 1 output
+│   │   └── deletions.json                    # Phase 1 output
+│   ├── git-sync-status.tsv                   # Phase 2 output (optional)
+│   ├── category1-complete-match.txt          # Phase 2/3 output
+│   ├── category2-empty-blank.txt             # Phase 2/3 output
+│   ├── category3-partial-match.txt           # Phase 2/3 output
+│   └── category4-no-match.txt                # Phase 2/3 output
 ├── archive/
 │   ├── raw/
 │   │   ├── state-events.json
 │   │   ├── announcements.json
 │   │   └── deletions.json
 │   ├── git-sync-status.tsv
-│   └── category-{1,2,3,4}.txt
+│   ├── category1-complete-match.txt
+│   ├── category2-empty-blank.txt
+│   ├── category3-partial-match.txt
+│   └── category4-no-match.txt
 ├── logs/
-│   ├── parse-failures.txt
-│   └── purgatory-expired.txt
+│   ├── parse-failures.txt                    # Phase 4 output
+│   └── purgatory-expired.txt                 # Phase 4 output
 ├── comparison/
-│   └── relay-gaps.txt
+│   ├── complete-in-both.txt                  # Phase 3 output (no action)
+│   ├── complete-prod-missing-archive.txt     # Phase 3 output (investigate)
+│   ├── complete-prod-incomplete-archive.txt  # Phase 3 output (sync in progress?)
+│   ├── incomplete-in-both.txt                # Phase 3 output (git incomplete)
+│   ├── in-archive-not-prod.txt               # Phase 3 output (deleted/new)
+│   └── summary.txt                           # Phase 3 output (human-readable)
 └── results/
-    ├── no-action-required.txt
-    ├── action-required.txt
-    └── manual-investigation.txt
+    ├── no-action-required.txt                # Phase 5 output
+    ├── action-required.txt                   # Phase 5 output
+    └── manual-investigation.txt              # Phase 5 output
 ```
 
 ## Prerequisites
diff --git a/docs/how-to/migration-scripts/20-categorize.sh b/docs/how-to/migration-scripts/20-categorize.sh
new file mode 100755
index 0000000..f47eb55
--- /dev/null
+++ b/docs/how-to/migration-scripts/20-categorize.sh
@@ -0,0 +1,212 @@
+#!/usr/bin/env bash
+#
+# 20-categorize.sh - Categorize git sync status into 4 categories
+#
+# PHASE 3a of the ngit-relay to ngit-grasp migration analysis pipeline.
+# Takes git-sync-status.tsv from Phase 2 and categorizes into 4 files.
+#
+# USAGE:
+#   ./20-categorize.sh <git-sync-status.tsv> <output-dir>
+#
+# EXAMPLES:
+#   ./20-categorize.sh output/prod/git-sync-status.tsv output/prod
+#   ./20-categorize.sh output/archive/git-sync-status.tsv output/archive
+#
+# INPUT FORMAT (git-sync-status.tsv):
+#   Tab-separated values with columns:
+#   repo<TAB>npub<TAB>state_refs<TAB>git_refs<TAB>matches<TAB>reason
+#
+#   Where reason is optional and can be: no_git_dir, empty_refs, no_state_refs
+#
+# OUTPUT:
+#   <output-dir>/category1-complete-match.txt - All refs match perfectly
+#   <output-dir>/category2-empty-blank.txt    - No git data available
+#   <output-dir>/category3-partial-match.txt  - Some refs match
+#   <output-dir>/category4-no-match.txt       - Git exists but refs don't match
+#
+# OUTPUT FORMAT:
+#   repo | npub | state_refs=N | git_refs=N | matches=N [| reason=X]
+#
+# CATEGORIES:
+#   1. Complete Match: state_refs == git_refs == matches (all > 0)
+#   2. Empty/Blank: git_refs == 0 OR reason in (no_git_dir, empty_refs, no_state_refs)
+#   3. Partial Match: matches > 0 AND matches < state_refs
+#   4. No Match: git_refs > 0 AND matches == 0
+#
+# PREREQUISITES:
+#   - awk (standard Unix tool)
+#
+# RUNTIME: < 1 second (local processing only)
+#
+# SEE ALSO:
+#   docs/how-to/migrate-ngit-relay-to-ngit-grasp.md - Full migration guide
+#   10-check-git-sync.sh - Phase 2 script that produces input for this script
+#
+
+set -euo pipefail
+
+# Colors for output (disabled if not a terminal)
+if [[ -t 1 ]]; then
+    RED='\033[0;31m'
+    GREEN='\033[0;32m'
+    YELLOW='\033[0;33m'
+    BLUE='\033[0;34m'
+    NC='\033[0m'
+else
+    RED=''
+    GREEN=''
+    YELLOW=''
+    BLUE=''
+    NC=''
+fi
+
+log_info() {
+    echo -e "${BLUE}[INFO]${NC} $*" >&2
+}
+
+log_success() {
+    echo -e "${GREEN}[OK]${NC} $*" >&2
+}
+
+log_warn() {
+    echo -e "${YELLOW}[WARN]${NC} $*" >&2
+}
+
+log_error() {
+    echo -e "${RED}[ERROR]${NC} $*" >&2
+}
+
+usage() {
+    echo "Usage: $0 <git-sync-status.tsv> <output-dir>"
+    echo ""
+    echo "Arguments:"
+    echo "  git-sync-status.tsv   TSV file from Phase 2 (10-check-git-sync.sh)"
+    echo "  output-dir            Directory to store categorized output"
+    echo ""
+    echo "Examples:"
+    echo "  $0 output/prod/git-sync-status.tsv output/prod"
+    echo "  $0 output/archive/git-sync-status.tsv output/archive"
+    echo ""
+    echo "Input format (TSV):"
+    echo "  repo<TAB>npub<TAB>state_refs<TAB>git_refs<TAB>matches<TAB>reason"
+    echo ""
+    echo "Output files:"
+    echo "  category1-complete-match.txt - All refs match"
+    echo "  category2-empty-blank.txt    - No git data"
+    echo "  category3-partial-match.txt  - Some refs match"
+    echo "  category4-no-match.txt       - Git exists, refs don't match"
+    exit 1
+}
+
+# Main
+main() {
+    if [[ $# -ne 2 ]]; then
+        usage
+    fi
+
+    local input_file="$1"
+    local output_dir="$2"
+
+    # Validate input file
+    if [[ ! -f "$input_file" ]]; then
+        log_error "Input file not found: $input_file"
+        exit 1
+    fi
+
+    log_info "Categorizing git sync status"
+    log_info "Input: $input_file"
+    log_info "Output: $output_dir"
+
+    # Create output directory
+    mkdir -p "$output_dir"
+
+    # Output files
+    local cat1="$output_dir/category1-complete-match.txt"
+    local cat2="$output_dir/category2-empty-blank.txt"
+    local cat3="$output_dir/category3-partial-match.txt"
+    local cat4="$output_dir/category4-no-match.txt"
+
+    # Clear previous results
+    > "$cat1"
+    > "$cat2"
+    > "$cat3"
+    > "$cat4"
+
+    # Process input file with awk
+    # Input: repo<TAB>npub<TAB>state_refs<TAB>git_refs<TAB>matches<TAB>reason
+    awk -F'\t' -v cat1="$cat1" -v cat2="$cat2" -v cat3="$cat3" -v cat4="$cat4" '
+    BEGIN {
+        count1 = 0; count2 = 0; count3 = 0; count4 = 0
+    }
+    NR == 1 && /^repo/ { next } # Skip header if present
+    NF >= 5 {
+        repo = $1
+        npub = $2
+        state_refs = int($3)
+        git_refs = int($4)
+        matches = int($5)
+        reason = (NF >= 6) ? $6 : ""
+
+        # Format output line
+        if (reason != "") {
+            line = repo " | " npub " | state_refs=" state_refs " | git_refs=" git_refs " | matches=" matches " | reason=" reason
+        } else {
+            line = repo " | " npub " | state_refs=" state_refs " | git_refs=" git_refs " | matches=" matches
+        }
+
+        # Categorize
+        if (reason == "no_git_dir" || reason == "empty_refs" || reason == "no_state_refs" || git_refs == 0) {
+            # Category 2: Empty/Blank
+            print line >> cat2
+            count2++
+        } else if (state_refs > 0 && state_refs == git_refs && matches == state_refs) {
+            # Category 1: Complete Match
+            print line >> cat1
+            count1++
+        } else if (matches > 0 && matches < state_refs) {
+            # Category 3: Partial Match
+            print line >> cat3
+            count3++
+        } else if (git_refs > 0 && matches == 0) {
+            # Category 4: No Match
+            print line >> cat4
+            count4++
+        } else if (matches > 0) {
+            # Edge case: matches > 0 but does not fit other categories
+            # This can happen when git_refs > state_refs but all state refs match
+            # Treat as partial match
+            print line >> cat3
+            count3++
+        } else {
+            # Fallback: treat as category 2 (empty/blank)
+            print line >> cat2
+            count2++
+        }
+    }
+    END {
+        total = count1 + count2 + count3 + count4
+        print "COUNTS:" count1 ":" count2 ":" count3 ":" count4 ":" total
+    }
+    ' "$input_file" 2>&1 | while IFS= read -r line; do
+        if [[ "$line" =~ ^COUNTS: ]]; then
+            # Parse counts from awk output
+            IFS=':' read -r _ c1 c2 c3 c4 total <<< "$line"
+
+            echo ""
+            log_info "=== Categorization Summary ==="
+            log_info "Total entries: $total"
+            log_success "Category 1 (Complete Match): $c1"
+            log_warn "Category 2 (Empty/Blank): $c2"
+            log_warn "Category 3 (Partial Match): $c3"
+            log_error "Category 4 (No Match): $c4"
+            echo ""
+            log_info "Output files:"
+            echo "  $cat1"
+            echo "  $cat2"
+            echo "  $cat3"
+            echo "  $cat4"
+        fi
+    done
+}
+
+main "$@"
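
Note on rule order in 20-categorize.sh: the awk program tests the Category 2 conditions first, so a row with a reason or with git_refs == 0 never reaches the Category 1 check, and a row where every state ref matches but git has extra refs falls through to the partial-match edge case. The sketch below restates that precedence as a standalone function for a single row. categorize_row is a hypothetical helper written for illustration and is not part of the commit.

```shell
# Hypothetical helper (not in the commit): prints cat1..cat4 for one row,
# applying the same checks in the same order as the awk program above.
# Args: state_refs git_refs matches [reason]
categorize_row() {
    state_refs=$1 git_refs=$2 matches=$3 reason=${4:-}

    case "$reason" in
        no_git_dir|empty_refs|no_state_refs) echo cat2; return ;;
    esac

    if [ "$git_refs" -eq 0 ]; then
        echo cat2    # no git data
    elif [ "$state_refs" -gt 0 ] && [ "$state_refs" -eq "$git_refs" ] \
        && [ "$matches" -eq "$state_refs" ]; then
        echo cat1    # complete match
    elif [ "$matches" -gt 0 ] && [ "$matches" -lt "$state_refs" ]; then
        echo cat3    # partial match
    elif [ "$git_refs" -gt 0 ] && [ "$matches" -eq 0 ]; then
        echo cat4    # git exists, nothing matches
    elif [ "$matches" -gt 0 ]; then
        echo cat3    # edge case, e.g. git_refs > state_refs
    else
        echo cat2    # fallback
    fi
}

categorize_row 3 3 3              # all refs match
categorize_row 3 0 0 no_git_dir   # no git directory
categorize_row 3 5 3              # extra refs on disk
categorize_row 3 2 0              # refs exist, none match
```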
diff --git a/docs/how-to/migration-scripts/21-compare-relays.sh b/docs/how-to/migration-scripts/21-compare-relays.sh
new file mode 100755
index 0000000..6b40dc8
--- /dev/null
+++ b/docs/how-to/migration-scripts/21-compare-relays.sh
@@ -0,0 +1,294 @@
+#!/usr/bin/env bash
+#
+# 21-compare-relays.sh - Compare prod vs archive category files to find gaps
+#
+# PHASE 3b of the ngit-relay to ngit-grasp migration analysis pipeline.
+# Compares categorized output from prod and archive to identify:
+#   - Repos complete in prod but missing/incomplete in archive
+#   - Repos in archive but not in prod
+#   - Status differences between relays
+#
+# USAGE:
+#   ./21-compare-relays.sh <prod-dir> <archive-dir> <output-dir>
+#
+# EXAMPLES:
+#   ./21-compare-relays.sh output/prod output/archive output/comparison
+#
+# INPUT:
+#   Both prod-dir and archive-dir must contain:
+#   - category1-complete-match.txt
+#   - category2-empty-blank.txt
+#   - category3-partial-match.txt
+#   - category4-no-match.txt
+#
+# OUTPUT:
+#   <output-dir>/complete-in-both.txt                 - Repos complete in both relays (no action)
+#   <output-dir>/complete-prod-missing-archive.txt    - Complete in prod, absent from every archive category
+#   <output-dir>/complete-prod-incomplete-archive.txt - Complete in prod, incomplete in archive
+#   <output-dir>/incomplete-in-both.txt               - Incomplete in both relays
+#   <output-dir>/in-archive-not-prod.txt              - In archive but not in prod
+#   <output-dir>/summary.txt                          - Human-readable summary
+#
+# OUTPUT FORMAT:
+#   Each file contains lines in the format:
+#   repo | npub | prod_status | archive_status
+#
+# PREREQUISITES:
+#   - awk, sort, comm (standard Unix tools)
+#
+# RUNTIME: < 1 second (local processing only)
+#
+# SEE ALSO:
+#   docs/how-to/migrate-ngit-relay-to-ngit-grasp.md - Full migration guide
+#   20-categorize.sh - Phase 3a script that produces input for this script
+#
+
+set -euo pipefail
+
+# Colors for output (disabled if not a terminal)
+if [[ -t 1 ]]; then
+    RED='\033[0;31m'
+    GREEN='\033[0;32m'
+    YELLOW='\033[0;33m'
+    BLUE='\033[0;34m'
+    NC='\033[0m'
+else
+    RED=''
+    GREEN=''
+    YELLOW=''
+    BLUE=''
+    NC=''
+fi
+
+log_info() {
+    echo -e "${BLUE}[INFO]${NC} $*" >&2
+}
+
+log_success() {
+    echo -e "${GREEN}[OK]${NC} $*" >&2
+}
+
+log_warn() {
+    echo -e "${YELLOW}[WARN]${NC} $*" >&2
+}
+
+log_error() {
+    echo -e "${RED}[ERROR]${NC} $*" >&2
+}
+
+usage() {
+    echo "Usage: $0 <prod-dir> <archive-dir> <output-dir>"
+    echo ""
+    echo "Arguments:"
+    echo "  prod-dir      Directory containing prod category files"
+    echo "  archive-dir   Directory containing archive category files"
+    echo "  output-dir    Directory to store comparison results"
+    echo ""
+    echo "Examples:"
+    echo "  $0 output/prod output/archive output/comparison"
+    echo ""
+    echo "Required input files in each directory:"
+    echo "  category1-complete-match.txt"
+    echo "  category2-empty-blank.txt"
+    echo "  category3-partial-match.txt"
+    echo "  category4-no-match.txt"
+    exit 1
+}
+
+# Extract repo|npub key from category line
+# Input: "repo | npub | state_refs=N | ..."
+# Output: "repo|npub"
+extract_key() {
+    awk -F' \\| ' '{print $1 "|" $2}'
+}
+
+# Build lookup table from category files
+# Args: $1=directory, $2=output_file
+build_lookup() {
+    local dir="$1"
+    local output="$2"
+
+    # Process all 4 category files
+    for cat in 1 2 3 4; do
+        local file="$dir/category${cat}-*.txt"
+        # shellcheck disable=SC2086
+        if ls $file 1>/dev/null 2>&1; then
+            # shellcheck disable=SC2086
+            cat $file | while IFS= read -r line; do
+                key=$(echo "$line" | extract_key)
+                echo "${key}|cat${cat}|${line}"
+            done
+        fi
+    done | sort -t'|' -k1,2 > "$output"
+}
+
+# Main
+main() {
+    if [[ $# -ne 3 ]]; then
+        usage
+    fi
+
+    local prod_dir="$1"
+    local archive_dir="$2"
+    local output_dir="$3"
+
+    # Validate input directories
+    for dir in "$prod_dir" "$archive_dir"; do
+        if [[ ! -d "$dir" ]]; then
+            log_error "Directory not found: $dir"
+            exit 1
+        fi
+        if [[ ! -f "$dir/category1-complete-match.txt" ]]; then
+            log_error "Missing category1-complete-match.txt in $dir"
+            exit 1
+        fi
+    done
+
+    log_info "Comparing relay categories"
+    log_info "Prod: $prod_dir"
+    log_info "Archive: $archive_dir"
+    log_info "Output: $output_dir"
+
+    # Create output directory
+    mkdir -p "$output_dir"
+
+    # Create temp files for processing
+    local tmp_dir
+    tmp_dir=$(mktemp -d)
+    # shellcheck disable=SC2064
+    trap "rm -rf '$tmp_dir'" EXIT
+
+    log_info "Building lookup tables..."
+
+    # Build lookup tables: key|category|full_line
+    build_lookup "$prod_dir" "$tmp_dir/prod_lookup.txt"
+    build_lookup "$archive_dir" "$tmp_dir/archive_lookup.txt"
+
+    # Extract just keys for comparison
+    cut -d'|' -f1,2 "$tmp_dir/prod_lookup.txt" | sort -u > "$tmp_dir/prod_keys.txt"
+    cut -d'|' -f1,2 "$tmp_dir/archive_lookup.txt" | sort -u > "$tmp_dir/archive_keys.txt"
+
+    log_info "Comparing categories..."
+
+    # Initialize output files
+    > "$output_dir/complete-in-both.txt"
+    > "$output_dir/complete-prod-missing-archive.txt"
+    > "$output_dir/complete-prod-incomplete-archive.txt"
+    > "$output_dir/incomplete-in-both.txt"
+    > "$output_dir/in-archive-not-prod.txt"
+
+    # Process prod category 1 (complete) entries
+    while IFS='|' read -r repo npub cat full_line; do
+        key="${repo}|${npub}"
+
+        # Look up in archive
+        archive_entry=$(grep "^${key}|" "$tmp_dir/archive_lookup.txt" 2>/dev/null | head -1 || echo "")
+
+        if [[ -z "$archive_entry" ]]; then
+            # Not in archive at all
+            echo "$repo | $npub | prod=complete | archive=missing" >> "$output_dir/complete-prod-missing-archive.txt"
+        else
+            archive_cat=$(echo "$archive_entry" | cut -d'|' -f3)
+            if [[ "$archive_cat" == "cat1" ]]; then
+                # Complete in both
+                echo "$repo | $npub | prod=complete | archive=complete" >> "$output_dir/complete-in-both.txt"
+            else
+                # Complete in prod, incomplete in archive
+                echo "$repo | $npub | prod=complete | archive=$archive_cat" >> "$output_dir/complete-prod-incomplete-archive.txt"
+            fi
+        fi
+    done < <(grep '|cat1|' "$tmp_dir/prod_lookup.txt")
+
+    # Process prod categories 2-4 (incomplete) entries
+    for cat in cat2 cat3 cat4; do
+        while IFS='|' read -r repo npub _ full_line; do
+            key="${repo}|${npub}"
+
+            # Look up in archive
+            archive_entry=$(grep "^${key}|" "$tmp_dir/archive_lookup.txt" 2>/dev/null | head -1 || echo "")
+
+            if [[ -z "$archive_entry" ]]; then
+                # Incomplete in prod, missing in archive
+                echo "$repo | $npub | prod=$cat | archive=missing" >> "$output_dir/incomplete-in-both.txt"
+            else
+                archive_cat=$(echo "$archive_entry" | cut -d'|' -f3)
+                if [[ "$archive_cat" != "cat1" ]]; then
+                    # Incomplete in both
+                    echo "$repo | $npub | prod=$cat | archive=$archive_cat" >> "$output_dir/incomplete-in-both.txt"
+                fi
+                # If archive is complete but prod is not, that's unusual but not an error
+            fi
+        done < <(grep "|${cat}|" "$tmp_dir/prod_lookup.txt")
+    done
+
+    # Find entries in archive but not in prod
+    comm -23 "$tmp_dir/archive_keys.txt" "$tmp_dir/prod_keys.txt" | while IFS='|' read -r repo npub; do
+        key="${repo}|${npub}"
+        archive_entry=$(grep "^${key}|" "$tmp_dir/archive_lookup.txt" 2>/dev/null | head -1 || echo "")
+        archive_cat=$(echo "$archive_entry" | cut -d'|' -f3)
+        echo "$repo | $npub | prod=missing | archive=$archive_cat" >> "$output_dir/in-archive-not-prod.txt"
+    done
+
+    # Count results
+    local count_both count_missing count_incomplete count_both_incomplete count_archive_only
+    count_both=$(wc -l < "$output_dir/complete-in-both.txt" | tr -d ' ')
+    count_missing=$(wc -l < "$output_dir/complete-prod-missing-archive.txt" | tr -d ' ')
+    count_incomplete=$(wc -l < "$output_dir/complete-prod-incomplete-archive.txt" | tr -d ' ')
+    count_both_incomplete=$(wc -l < "$output_dir/incomplete-in-both.txt" | tr -d ' ')
+    count_archive_only=$(wc -l < "$output_dir/in-archive-not-prod.txt" | tr -d ' ')
+
+    # Generate summary
+    cat > "$output_dir/summary.txt" << EOF
+# Relay Comparison Summary
+Generated: $(date -Iseconds)
+
+## Input
+- Prod: $prod_dir
+- Archive: $archive_dir
+
+## Results
+
+### No Action Required
+- Complete in both relays: $count_both
+
+### Action/Decision Required
+- Complete in prod, MISSING from archive: $count_missing
+- Complete in prod, INCOMPLETE in archive: $count_incomplete
+- Incomplete in BOTH relays: $count_both_incomplete
+
+### For Reference
+- In archive but not in prod: $count_archive_only
+
+## Files
+- complete-in-both.txt: Repos successfully migrated (no action)
+- complete-prod-missing-archive.txt: Need investigation - why not in archive?
+- complete-prod-incomplete-archive.txt: Archive sync may still be in progress
+- incomplete-in-both.txt: Git data incomplete on both relays
+- in-archive-not-prod.txt: May be deleted from prod or new to archive
+
+## Next Steps
+1. Review complete-prod-missing-archive.txt - these repos need attention
+2. Check if archive sync is still running for incomplete entries
+3. Cross-reference with deletion events (kind 5) from Phase 1
+4. Use Phase 4 logs to understand parse failures and purgatory expiry
+EOF
+
+    # Display summary
+    echo ""
+    log_info "=== Comparison Summary ==="
+    log_success "Complete in both: $count_both (no action needed)"
+    log_error "Complete in prod, MISSING from archive: $count_missing"
+    log_warn "Complete in prod, incomplete in archive: $count_incomplete"
+    log_warn "Incomplete in both: $count_both_incomplete"
+    log_info "In archive only: $count_archive_only"
+    echo ""
+    log_info "Output files:"
+    echo "  $output_dir/complete-in-both.txt"
+    echo "  $output_dir/complete-prod-missing-archive.txt"
+    echo "  $output_dir/complete-prod-incomplete-archive.txt"
+    echo "  $output_dir/incomplete-in-both.txt"
+    echo "  $output_dir/in-archive-not-prod.txt"
+    echo "  $output_dir/summary.txt"
+}
+
+main "$@"
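
Note on the bucketing in 21-compare-relays.sh: which output file a repo lands in depends only on the pair (prod category, archive category). The sketch below restates that mapping as a small decision function, which also makes two behaviours easy to see: a repo that is incomplete in prod and absent from archive is written to incomplete-in-both.txt, and a repo that is incomplete in prod but complete in archive is written to no file at all. classify_pair and the literal "missing" marker are hypothetical names used for illustration only and are not part of the commit.

```shell
# Hypothetical helper (not in the commit): prints the output file that
# 21-compare-relays.sh would append to for one repo, or nothing if the
# script writes no line for that combination.
# Args: prod category, archive category (cat1..cat4, or "missing")
classify_pair() {
    prod=$1 archive=$2

    if [ "$prod" = "missing" ]; then
        echo in-archive-not-prod.txt
    elif [ "$prod" = "cat1" ]; then
        case "$archive" in
            missing) echo complete-prod-missing-archive.txt ;;
            cat1)    echo complete-in-both.txt ;;
            *)       echo complete-prod-incomplete-archive.txt ;;
        esac
    elif [ "$archive" != "cat1" ]; then
        # Covers archive=missing as well as archive=cat2..cat4
        echo incomplete-in-both.txt
    fi
    # prod incomplete and archive complete: intentionally no output,
    # matching the "unusual but not an error" comment in the script.
}

classify_pair cat1 missing
classify_pair cat3 cat2
classify_pair missing cat1
```

Because the last combination produces no line, the five counts in summary.txt do not necessarily add up to the number of distinct repos seen across both relays.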