upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

Diffstat (limited to 'docs/archive/2026-01-relay-ngit-dev-migration/migration-guide.md')
-rw-r--r-- docs/archive/2026-01-relay-ngit-dev-migration/migration-guide.md | 1030
1 files changed, 1030 insertions, 0 deletions
diff --git a/docs/archive/2026-01-relay-ngit-dev-migration/migration-guide.md b/docs/archive/2026-01-relay-ngit-dev-migration/migration-guide.md
new file mode 100644
index 0000000..abe2191
--- /dev/null
+++ b/docs/archive/2026-01-relay-ngit-dev-migration/migration-guide.md
@@ -0,0 +1,1030 @@
1# Migrate to ngit-grasp from another GRASP implementation
2
3This guide walks you through migrating a production GRASP relay to ngit-grasp. The process involves analyzing your existing data to identify repositories that need attention before switching over.
4
5## Compatibility
6
7This migration process works with any GRASP implementation that:
8
9- Stores git data in the `<npub>/<identifier>.git` directory structure
10- Uses standard GRASP events (kind 30617 announcements, kind 30618 state, kind 5 deletions)
11- Exposes a Nostr relay WebSocket endpoint
12
13**Known compatible implementations:**
14- ngit-relay (reference implementation)
15- ngit-grasp (when migrating between instances or from archive mode)
16- Other GRASP-compliant relays following the specification
17
18The migration scripts analyze Nostr events and git data directly, making them implementation-agnostic.
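
As a rough pre-flight check for the directory-structure requirement above, the sketch below tests whether a path (relative to the git base directory) follows the `<npub>/<identifier>.git` layout. The function name `is_grasp_repo_path` is hypothetical and not part of the migration scripts, and the pattern assumes an npub is `npub1` followed by 58 bech32 characters.

```bash
# Hypothetical helper (not part of the migration scripts).
# Succeeds when a relative path looks like <npub>/<identifier>.git.
# Assumption: npub = "npub1" + 58 characters from the bech32 alphabet.
is_grasp_repo_path() {
  printf '%s\n' "$1" |
    grep -Eq '^npub1[02-9ac-hj-np-z]{58}/[^/]+\.git$'
}
```

Running it over the output of `ls` in the git base directory gives a quick indication of whether the source relay's layout matches what the analysis expects.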
19
20## Quick Start
21
22Run the migration analysis with a single command:
23
24```bash
25# Basic analysis (fetches events, compares relays)
26./docs/how-to/migration-scripts/run-migration-analysis.sh \
27 --prod-relay wss://source-relay.example.com \
28 --archive-relay wss://target-relay.example.com
29
30# Full analysis (includes git sync check - run on VPS)
31./docs/how-to/migration-scripts/run-migration-analysis.sh \
32 --prod-relay wss://source-relay.example.com \
33 --archive-relay wss://target-relay.example.com \
34 --prod-git /var/lib/grasp-relay/git \
35 --archive-git /var/lib/ngit-grasp/git \
36 --service ngit-grasp.service
37```
38
39The script sorts each repository into one of three result files:
40- `results/no-action-required.txt` - Repos ready for migration
41- `results/action-required.txt` - Repos needing intervention
42- `results/manual-investigation.txt` - Repos needing human review
43
44See [Running the Analysis](#running-the-analysis) for detailed options.
45
46## Prerequisites
47
48### Required Tools
49
50- **nak** - Nostr Army Knife for fetching events ([install](https://github.com/fiatjaf/nak))
51- **jq** - JSON processing (install via package manager)
52
53### For Full Analysis (VPS)
54
55- SSH access to the VPS running your source relay
56- Read access to git data directories
57- Access to systemd journal (for log extraction)
58
59### Verify Installation
60
61```bash
62# Check required tools
63nak --version
64jq --version
65git --version
66
67# Check optional tools (for VPS phases)
68journalctl --version
69```
70
71## Gotchas and Common Issues
72
73Before running the analysis, be aware of these common issues discovered during real migrations:
74
75### Git Must Be Installed
76
77The analysis scripts require `git` to be installed and in PATH. This may not be present on minimal VPS installations.
78
79```bash
80# Check if git is available
81command -v git >/dev/null || echo "Git not found - install it first"
82
83# Install on Debian/Ubuntu
84sudo apt install git
85
86# Install on NixOS: add this line to configuration.nix, then rebuild
87#   environment.systemPackages = [ pkgs.git ];
88```
89
90### Archive Relay May Only Be Accessible Locally
91
92If your archive relay is configured to listen only on localhost (e.g., `ws://localhost:7443`), you must run the analysis **on the VPS itself**, not from a remote machine.
93
94```bash
95# Check if archive relay is accessible
96# This will fail if run remotely against a localhost-only relay
97nak req -k 30618 --limit 1 ws://localhost:7443
98
99# Solution: SSH into the VPS and run analysis there
100ssh user@your-vps
101cd /path/to/scripts
102./run-migration-analysis.sh --archive-relay ws://localhost:7443 ...
103```
104
105### Git Data Paths May Differ from Defaults
106
107Different deployments store git data in different locations. **Always verify paths before running the analysis.**
108
109```bash
110# Find actual git data paths from service configuration
111systemctl cat ngit-relay.service | grep -E 'ExecStart|WorkingDirectory|Environment'
112systemctl cat 'ngit-grasp*.service' | grep -E 'ExecStart|WorkingDirectory|Environment'
113
114# Common locations:
115# - /var/lib/ngit-relay/git (default)
116# - /var/lib/ngit-grasp/git (default)
117# - /persistent/*/data/repos (custom deployments)
118
119# Verify the path exists and contains expected structure
120ls /path/to/git/npub1*/ # Should show *.git directories
121```
122
123### Phase 4 Needs the Correct Service Name
124
125> **CRITICAL:** Phase 4 extracts structured logs (`[PARSE_FAIL]`, `[PURGATORY_EXPIRED]`, `Invalid announcement` rejections) from journald. These logs **ONLY exist in ngit-grasp services**, NOT in ngit-relay services.
126
127If you specify an ngit-relay service (like `ngit-relay.service`), Phase 4 will find **zero logs** and produce empty results. This is a common mistake that wastes time and produces misleading analysis.
128
129**Correct service names (ngit-grasp):**
130- `ngit-grasp.service`
131- `ngit-grasp-relay-ngit-dev.service` (NixOS multi-instance)
132- `ngit-grasp-archive.service`
133
134**Incorrect service names (ngit-relay - NO structured logging):**
135- `ngit-relay.service`
136- `relay-ngit-dev.service`
137
138```bash
139# Find all ngit-related services
140systemctl list-units 'ngit-*' --all
141
142# Check which service has structured logging (should be ngit-grasp)
143journalctl -u 'ngit-grasp*.service' | grep -E '\[PARSE_FAIL\]|\[PURGATORY_EXPIRED\]|Invalid announcement' | head -5
144
145# Verify ngit-relay does NOT have structured logging
146journalctl -u ngit-relay.service | grep -E '\[PARSE_FAIL\]|\[PURGATORY_EXPIRED\]|Invalid announcement' | head -5
147# ^ This should return nothing
148
149# Use the archive service name for Phase 4
150./run-migration-analysis.sh ... --service ngit-grasp-relay-ngit-dev.service
151```
152
153The migration scripts now validate the service name and will **error** if you specify an ngit-relay service, preventing this common mistake.
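
A minimal sketch of what such a guard could look like is shown here. It is an illustration only: the function name `validate_service_name` and the rule "the unit name must start with `ngit-grasp`" are assumptions based on the service names listed above, not the scripts' actual implementation.

```bash
# Hypothetical sketch of a service-name guard; the real scripts may differ.
# Accepts only unit names that start with "ngit-grasp".
validate_service_name() {
  case "$1" in
    ngit-grasp*)
      return 0
      ;;
    *)
      echo "error: '$1' is not an ngit-grasp service;" \
           "Phase 4 would find no structured logs" >&2
      return 1
      ;;
  esac
}
```

With this rule, `ngit-grasp.service` and `ngit-grasp-relay-ngit-dev.service` pass, while `ngit-relay.service` and `relay-ngit-dev.service` are rejected before any logs are queried.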
154
155### Permission Issues with Service-Owned Directories
156
157Git data directories are typically owned by the service user and may require elevated permissions to read.
158
159```bash
160# Check directory permissions
161ls -la /var/lib/ngit-grasp/git
162
163# Options:
164# 1. Run as root/sudo
165sudo ./run-migration-analysis.sh ...
166
167# 2. Run as the service user
168sudo -u ngit-grasp ./run-migration-analysis.sh ...
169
170# 3. Add your user to the service group
171sudo usermod -aG ngit-grasp "$USER"
172# (log out and back in for the new group to take effect)
173```
174
175### Service Names Vary by Deployment
176
177NixOS multi-instance deployments use service names like `ngit-grasp-<instance>.service`. Always check actual service names.
178
179```bash
180# List all ngit services
181systemctl list-units 'ngit-*' --all --no-pager
182
183# Example output:
184# ngit-relay.service loaded active running ngit-relay
185# ngit-grasp-relay-ngit-dev.service loaded active running ngit-grasp (relay-ngit-dev)
186```
187
188## Migration Overview
189
190The migration process has three stages:
191
192### Stage 1: Deploy Archive Instance
193
194Deploy ngit-grasp alongside your production relay:
195
1961. Configure ngit-grasp with:
197 - `domain` set to `<prod-domain>.internal` (temporary)
198 - `archiveService` set to your production domain
199 - Running on a different port
200
2012. Let it sync for ~1 hour to gather all events and git data
202
203### Stage 2: Analyze Data
204
205Run the migration analysis to identify:
206- Repositories successfully migrated (no action needed)
207- Repositories with incomplete data (need investigation)
208- Repositories with parse failures (may need re-announcement)
209
210### Stage 3: Switch Over
211
212Once all issues are resolved:
2131. Set `domain` to your production URL
2142. Disable archive mode
2153. Update your reverse proxy to point to ngit-grasp
216
217## Running the Analysis
218
219### Before You Start
220
221**Verify paths and service names** before running the analysis. Incorrect paths are the most common source of errors.
222
223```bash
224# 1. Find actual git data paths
225systemctl cat ngit-relay.service | grep -E 'ExecStart|data|git'
226systemctl cat 'ngit-grasp*.service' | grep -E 'ExecStart|data|git'
227
228# 2. Find service names
229systemctl list-units 'ngit-*' --all --no-pager
230
231# 3. Verify git data exists at the paths
232ls /path/to/prod/git/npub1*/ | head -5
233ls /path/to/archive/git/npub1*/ | head -5
234
235# 4. Check if archive relay is accessible
236nak req -k 30618 --limit 1 ws://localhost:7443 # or your archive URL
237```
238
239### Basic Usage
240
241```bash
242# Preview what will happen (dry run)
243./run-migration-analysis.sh \
244 --prod-relay wss://source-relay.example.com \
245 --archive-relay wss://target-relay.example.com \
246 --dry-run
247
248# Run the analysis
249./run-migration-analysis.sh \
250 --prod-relay wss://source-relay.example.com \
251 --archive-relay wss://target-relay.example.com
252```
253
254### Full Analysis on VPS
255
256**Important:** If your archive relay is localhost-only, you must run this on the VPS.
257
258```bash
259# First, discover your actual paths (see "Before You Start" above)
260# Then run with the correct values:
261
262./run-migration-analysis.sh \
263 --prod-relay wss://source-relay.example.com \
264 --archive-relay ws://localhost:7443 \
265 --prod-git /path/to/prod/git \
266 --archive-git /path/to/archive/git \
267 --service ngit-grasp-your-instance.service
268```
269
270### Phase Control
271
272Skip or run specific phases:
273
274```bash
275# Skip Phase 2 (use cached git sync data)
276./run-migration-analysis.sh ... --skip-phase-2
277
278# Run only Phase 1 (fetch events)
279./run-migration-analysis.sh ... --only-phase-1
280
281# Resume from Phase 3 (using existing data)
282./run-migration-analysis.sh ... --from-phase-3 --output work/migration-analysis-20260122-1430
283```
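
When resuming, the `--output` value has to point at an existing analysis directory. The sketch below picks the most recent one, assuming the auto-generated directories follow the `migration-analysis-YYYYMMDD-HHMM` naming used in this guide (so a lexicographic sort is also chronological). The helper name `latest_analysis_dir` is hypothetical.

```bash
# Hypothetical helper: print the newest analysis directory under a base dir.
# Assumes names of the form migration-analysis-YYYYMMDD-HHMM; shell globs
# expand in sorted order, so the last match is the most recent run.
latest_analysis_dir() (
  latest=""
  for dir in "${1:-work}"/migration-analysis-*/; do
    # An unmatched glob stays literal, so confirm the directory exists.
    if [ -d "$dir" ]; then latest="${dir%/}"; fi
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
)
```

It can then be combined with the resume flag, for example `--from-phase-3 --output "$(latest_analysis_dir work)"`. The function returns a non-zero status when no analysis directory exists.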
284
285### All Options
286
287| Option | Description |
288|--------|-------------|
289| `--prod-relay <url>` | Source relay WebSocket URL (required) |
290| `--archive-relay <url>` | Target relay WebSocket URL (required) |
291| `--prod-git <path>` | Git base directory for prod (enables Phase 2) |
292| `--archive-git <path>` | Git base directory for archive (enables Phase 2) |
293| `--service <name>` | Systemd service name for Phase 4 log extraction. **MUST be an ngit-grasp service** (not ngit-relay). Structured logging only exists in ngit-grasp. |
294| `--output <dir>` | Output directory (default: auto-generated) |
295| `--skip-phase-N` | Skip phase N (1-5) |
296| `--only-phase-N` | Run only phase N |
297| `--from-phase-N` | Start from phase N |
298| `--dry-run` | Show what would be executed |
299| `--continue-on-error` | Continue even if a phase fails |
300
301## Understanding Results
302
303### Summary File
304
305The `results/summary.txt` file provides an overview:
306
307```
308## Overview
309
310| Category | Count | Percentage |
311|----------|-------|------------|
312| No Action Required | 450 | 85.7% |
313| Action Required | 52 | 9.9% |
314| Manual Investigation | 23 | 4.4% |
315```
316
317### No Action Required
318
319Repositories in `no-action-required.txt` are ready for migration:
320
321```
322myrepo | npub1abc... | complete in both prod and archive
323oldrepo | npub1def... | deleted by user
324testrepo | npub1ghi... | empty/blank in both (user never pushed)
325```
326
327**Common reasons:**
328- `complete in both prod and archive` - Successfully migrated
329- `deleted by user` - User requested deletion (kind 5 event)
330- `empty/blank in both` - No git data was ever pushed
331- `purgatory expired` - Git data never arrived and the purgatory timeout has already expired
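
To see how the repositories are distributed across these reasons, the reason column can be tallied. This is a sketch that assumes every line uses the `name | npub | reason` layout shown in the example; `reasons_by_count` is a hypothetical helper, not one of the migration scripts.

```bash
# Hypothetical helper: tally the reason column of a results file.
# Assumes each line is "name | npub | reason[ | suggested action]".
reasons_by_count() {
  awk -F' [|] ' 'NF >= 3 { count[$3]++ }
    END { for (r in count) printf "%d\t%s\n", count[r], r }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Calling `reasons_by_count results/no-action-required.txt` prints one tab-separated `count<TAB>reason` line per distinct reason, most frequent first. The same helper works for `action-required.txt`, where the fourth column (the suggested action) is ignored.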
332
333### Action Required
334
335Repositories in `action-required.txt` need intervention:
336
337```
338myrepo | npub1abc... | complete in prod, missing from archive | trigger re-sync or investigate
339otherrepo | npub1def... | incomplete in both (prod=cat3, archive=cat2) | investigate git data source
340```
341
342**Common actions:**
343- **Re-sync needed**: Trigger the archive to re-fetch from the source
344- **Wait for sync**: Archive sync may still be in progress
345- **Investigate git source**: Original git data may be incomplete
346- **Fix parse failure**: Event format issue, may need re-announcement
347
348### Manual Investigation
349
350Repositories in `manual-investigation.txt` have unusual states:
351
352```
353weirdrepo | npub1abc... | in archive (cat1) but not in prod | may be new announcement or deleted from prod
354conflictrepo | npub1def... | complete in prod, missing from archive, parse failure logged | investigate parse failure
355```
356
357These require human judgment to determine the correct action.
358
359## Troubleshooting
360
361### "nak not found"
362
363Install nak from https://github.com/fiatjaf/nak:
364
365```bash
366# Using Go
367go install github.com/fiatjaf/nak@latest
368
369# Or download binary from releases
370```
371
372### "git not found"
373
374Git must be installed and in PATH:
375
376```bash
377# Check if git is available
378which git
379
380# Install on Debian/Ubuntu
381sudo apt install git
382
383# Install on NixOS: add this line to configuration.nix, then rebuild
384#   environment.systemPackages = [ pkgs.git ];
385```
386
387### "Permission denied" on git directories
388
389Run with sudo or ensure your user has read access:
390
391```bash
392# Check permissions
393ls -la /var/lib/grasp-relay/git
394
395# Option 1: Run with sudo
396sudo ./run-migration-analysis.sh ...
397
398# Option 2: Run as service user
399sudo -u ngit-grasp ./run-migration-analysis.sh ...
400```
401
402### Archive relay connection failed
403
404If you get connection errors to the archive relay:
405
406```bash
407# Check if relay is running
408systemctl status 'ngit-grasp*.service'
409
410# Check if it's localhost-only
411# If archive is ws://localhost:7443, you MUST run on the VPS
412ssh user@your-vps
413./run-migration-analysis.sh --archive-relay ws://localhost:7443 ...
414```
415
416### Wrong git paths / "No such file or directory"
417
418Git data paths vary by deployment. Discover the actual paths:
419
420```bash
421# Find paths from service configuration
422systemctl cat ngit-relay.service | grep -E 'ExecStart|WorkingDirectory|Environment'
423systemctl cat 'ngit-grasp*.service' | grep -E 'ExecStart|WorkingDirectory|Environment'
424
425# Verify the path contains git repos
426ls /discovered/path/npub1*/
427```
428
429### Phase 2 takes too long
430
431The git sync check processes each repository individually (~20 minutes per relay). To speed up iteration:
432
4331. Run Phase 2 once and save the output
4342. Use `--skip-phase-2` for subsequent runs
4353. Use `--from-phase-3` to re-run classification with existing data
436
437### No parse failures found
438
439This is expected if:
440- ngit-grasp logging improvements aren't deployed yet
441- No events actually failed to parse
442
443The analysis will continue without log data.
444
445### Phase 4 finds no structured logs
446
447**Symptom:** Phase 4 completes but `parse-failures.txt` and `purgatory-expired.txt` are empty or contain only header comments.
448
449**Most common cause:** You're querying the wrong service (ngit-relay instead of ngit-grasp).
450
451Structured logging (`[PARSE_FAIL]`, `[PURGATORY_EXPIRED]`, `Invalid announcement` rejections) **only exists in ngit-grasp services**. If you specify an ngit-relay service, Phase 4 will find zero logs.
452
453**How to diagnose:**
454
455```bash
456# 1. Check what service you configured
457grep SERVICE_NAME /path/to/output/config.txt
458
459# 2. If it contains "ngit-relay", that's the problem!
460# ngit-relay does NOT have structured logging
461
462# 3. Find the correct ngit-grasp service
463systemctl list-units 'ngit-grasp*' --all
464
465# 4. Verify the ngit-grasp service has structured logs
466journalctl -u ngit-grasp-relay-ngit-dev.service --since "7 days ago" | \
467 grep -E '\[PARSE_FAIL\]|\[PURGATORY_EXPIRED\]|Invalid announcement' | head -5
468```
469
470**How to fix:**
471
472```bash
473# Update SERVICE_NAME to the ngit-grasp archive service and re-run
474./run-migration-analysis.sh \
475 --prod-relay wss://relay.ngit.dev \
476 --archive-relay ws://localhost:7443 \
477 --service ngit-grasp-relay-ngit-dev.service \
478 --from-phase-4 # Skip phases 1-3, just re-run phase 4
479```
480
481**Other possible causes:**
482
4831. **Structured logging not deployed:** If the ngit-grasp instance doesn't have the logging improvements deployed, no structured logs will exist. Check the ngit-grasp version.
484
4852. **No events in time window:** If there genuinely were no parse failures, purgatory expiry events, or invalid announcement rejections, the files will be empty. This is valid - it means everything parsed successfully.
486
4873. **Wrong time range:** The default is 30 days. If your archive has been running longer, you may need `--since` to extend the range.
488
489**Prevention:** The migration scripts now validate the service name and will error if you specify an ngit-relay service.
490
491**Note on "Invalid announcement" rejections:** These are announcements (kind 30617) that were rejected by the write policy due to format violations. The most common reason is "multiple clone tags found" - the NIP-34 spec requires a single clone tag with multiple values, not multiple clone tags. These rejections are logged as `Event rejected by write policy ... reason=Invalid announcement: ...`.
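
To summarize why announcements were rejected, the reason text can be pulled out of the journal lines. The sketch below relies only on the `reason=Invalid announcement: ...` suffix quoted above and makes no assumption about the fields elided before it; `invalid_announcement_reasons` is a hypothetical helper name.

```bash
# Hypothetical helper: print the rejection reason from each matching log line
# read on stdin. Only the "reason=Invalid announcement: ..." suffix is used;
# whatever precedes "reason=" is ignored, and non-matching lines are dropped.
invalid_announcement_reasons() {
  sed -n 's/.*reason=\(Invalid announcement: .*\)$/\1/p'
}
```

Piping `journalctl -u <ngit-grasp service>` through this function and then through `sort | uniq -c` gives a count per distinct rejection reason.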
492
493### Event counts are multiples of 250
494
495This suggests pagination may have failed. The scripts use `--paginate` by default, but if a fetch returns exactly 250, 500, or 750 events, verify that the relay is responding correctly and that no events were cut off.
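
The check itself is simple arithmetic. A minimal sketch, assuming a page size of 250 as implied above (the helper name `looks_truncated` is hypothetical):

```bash
# Hypothetical helper: succeed when an event count looks like a truncated
# fetch, i.e. a non-zero exact multiple of the assumed page size of 250.
looks_truncated() {
  [ "$1" -gt 0 ] && [ $(( $1 % 250 )) -eq 0 ]
}
```

A count of 500 is flagged, while 501 or 0 is not. A flagged count is only a hint: a relay can legitimately hold an exact multiple of 250 events.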
496
497## Architecture
498
499### Analysis Phases
500
501The analysis is split into 5 modular phases:
502
503| Phase | Name | Time | Location | Description |
504|-------|------|------|----------|-------------|
505| 1 | Fetch Events | ~30s each | Local | Fetch events from both relays |
506| 2 | Git Sync Check | ~20 min each | VPS | Compare state events to git data |
507| 3 | Categorize & Compare | <1s | Local | Categorize and compare results |
508| 4 | Extract Logs | <30s | VPS | Extract parse failures and purgatory expiry |
509| 5 | Final Classification | <5s | Local | Combine all data into actionable results |
510
511### Phase Flow Diagram
512
513```
514┌─────────────────────────────────────────────────────────────────┐
515│ PHASE 1: Fetch Events (~30s, local) │
516│ Fetches kind 30618 (state), 30617 (announcements), 5 (deletion) │
517│ Run twice: once for prod, once for archive │
518└─────────────────────────────────────────────────────────────────┘
519
520┌─────────────────────────────────────────────────────────────────┐
521│ PHASE 2: Git Sync Check (~20 mins, VPS required) │
522│ Compares state event refs to actual git data on disk │
523│ Categorizes into: complete, empty, partial, no-match │
524└─────────────────────────────────────────────────────────────────┘
525
526┌─────────────────────────────────────────────────────────────────┐
527│ PHASE 3: Categorize & Compare (fast, local) │
528│ Compares prod vs archive categories │
529│ Identifies gaps and sync issues │
530└─────────────────────────────────────────────────────────────────┘
531
532┌─────────────────────────────────────────────────────────────────┐
533│ PHASE 4: Log-Based Categories (VPS required) │
534│ Extracts structured logs from the archive service: │
535│ - [PARSE_FAIL] - Events that failed to parse │
536│ - [PURGATORY_EXPIRED] - Repos where git data never arrived │
537│ - "Invalid announcement" - Announcements rejected for format │
538│ violations (e.g., multiple clone tags) │
539│ Provides context for why repos failed to sync │
540└─────────────────────────────────────────────────────────────────┘
541
542┌─────────────────────────────────────────────────────────────────┐
543│ PHASE 5: Final Classification (fast, local) │
544│ Combines all data sources │
545│ Outputs: no-action, action-required, manual-investigation │
546└─────────────────────────────────────────────────────────────────┘
547```
548
549### Git Sync Categories
550
551Phase 2 categorizes repositories into 4 categories:
552
553| Category | Name | Meaning |
554|----------|-------------|---------|
555| 1 | Complete Match | All refs in state event match git data |
556| 2 | Empty/Blank | No git data available |
557| 3 | Partial Match | Some refs match, some don't |
558| 4 | No Match | Git data exists but refs don't match |
559
560### Output Directory Structure
561
562```
563work/migration-analysis-YYYYMMDD-HHMM/
564├── prod/
565│ ├── raw/
566│ │ ├── state-events.json # Phase 1
567│ │ ├── announcements.json # Phase 1
568│ │ └── deletions.json # Phase 1
569│ ├── git-sync-status.tsv # Phase 2
570│ └── category*.txt # Phase 2/3
571├── archive/
572│ └── (same structure as prod)
573├── comparison/
574│ ├── complete-in-both.txt # Phase 3
575│ ├── complete-prod-missing-archive.txt
576│ ├── complete-prod-incomplete-archive.txt
577│ ├── incomplete-in-both.txt
578│ ├── in-archive-not-prod.txt
579│ └── summary.txt
580├── logs/
581│ ├── parse-failures.txt # Phase 4
582│ └── purgatory-expired.txt # Phase 4
583└── results/
584 ├── no-action-required.txt # Phase 5
585 ├── action-required.txt # Phase 5
586 ├── manual-investigation.txt # Phase 5
587 └── summary.txt # Phase 5
588```
589
590## Why Migration May Require Attention
591
592Different GRASP implementations may handle edge cases differently. ngit-grasp has stricter validation and better observability, which can surface issues that were previously hidden:
593
594| Aspect | Typical Source Relay | ngit-grasp |
595|--------|---------------------|------------|
596| Git data validation | May accept partial data | Requires all git data to reproduce state |
597| PR refs cleanup | May not clear `refs/nostr/<event-id>` | Properly manages PR refs |
598| Parse failures | May silently ignore | Logs structured `[PARSE_FAIL]` entries |
599| Sync timeout | May have no timeout | Purgatory expires after configurable period |
600
601These differences explain why some repositories may need attention during migration - ngit-grasp's stricter validation catches issues that other implementations may have silently accepted.
602
603## Next Steps
604
605After running the analysis:
606
6071. **Review the summary** - Check `results/summary.txt` for the overview
6082. **Address action items** - Work through `results/action-required.txt`
6093. **Investigate edge cases** - Review `results/manual-investigation.txt`
6104. **Re-run analysis** - After fixing issues, re-run to verify
6115. **Plan cutover** - Schedule the switch when all issues are resolved
612
613### When to Re-run
614
615Re-run the analysis when:
616- Archive sync has had time to complete
617- You've fixed parse failures or re-announced events
618- You want to verify fixes before cutover
619
620```bash
621# Re-run with existing Phase 2 data (faster)
622./run-migration-analysis.sh ... --skip-phase-2 --output work/migration-analysis-20260122-1430
623```
624
625## Individual Scripts
626
627For advanced usage, you can run individual phase scripts:
628
629```bash
630# Phase 1: Fetch events
631./migration-scripts/01-fetch-events.sh wss://source-relay.example.com output/prod
632
633# Phase 2: Git sync check
634./migration-scripts/10-check-git-sync.sh output/prod/raw/state-events.json /var/lib/grasp-relay/git output/prod --categorize
635
636# Phase 3a: Categorize
637./migration-scripts/20-categorize.sh output/prod/git-sync-status.tsv output/prod
638
639# Phase 3b: Compare relays
640./migration-scripts/21-compare-relays.sh output/prod output/archive output/comparison
641
642# Phase 4a: Extract parse failures
643./migration-scripts/30-extract-parse-failures.sh ngit-grasp.service output/logs
644
645# Phase 4b: Extract purgatory expiry
646./migration-scripts/31-extract-purgatory-expiry.sh ngit-grasp.service output/logs
647
648# Phase 5: Final classification
649./migration-scripts/40-classify-actions.sh work/migration-analysis-20260122-1430
650```
651
652Each script has detailed help available with `--help` or by reading the script header.
653
654## relay.ngit.dev Migration Notes
655
656This section documents the specific configuration and lessons learned from migrating relay.ngit.dev from ngit-relay to ngit-grasp. Use this as a reference for similar deployments.
657
658### Deployment Configuration
659
660| Component | Value |
661|-----------|-------|
662| **Production relay** | `wss://relay.ngit.dev` |
663| **Production service** | `ngit-relay.service` |
664| **Production git path** | `/persistent/relay-ngit-dev-ngit-relay/data/repos` |
665| **Archive relay** | `ws://localhost:7443` (localhost only) |
666| **Archive service** | `ngit-grasp-relay-ngit-dev.service` |
667| **Archive git path** | `/persistent/grasp/relay-ngit-dev/git` |
668
669### Key Differences from Defaults
670
6711. **Git paths are non-standard**: The production relay uses `/persistent/relay-ngit-dev-ngit-relay/data/repos` instead of `/var/lib/ngit-relay/git`
672
6732. **Archive is localhost-only**: The archive relay listens on `ws://localhost:7443`, not a public URL. All analysis must run on the VPS.
674
6753. **Service names include instance**: NixOS multi-instance deployment uses `ngit-grasp-relay-ngit-dev.service`, not `ngit-grasp.service`
676
677### Analysis Command
678
679```bash
680# Run on VPS (archive is localhost-only)
681./docs/how-to/migration-scripts/run-migration-analysis.sh \
682 --prod-relay wss://relay.ngit.dev \
683 --archive-relay ws://localhost:7443 \
684 --prod-git /persistent/relay-ngit-dev-ngit-relay/data/repos \
685 --archive-git /persistent/grasp/relay-ngit-dev/git \
686 --service ngit-grasp-relay-ngit-dev.service
687```
688
689### Analysis Results (January 2026)
690
691| Category | Count | Notes |
692|----------|-------|-------|
693| Complete in both | ~400 | Ready for migration |
694| Complete in prod, missing from archive | 315 | Need re-sync |
695| Empty in both | 100 | Users never pushed git data |
696| Manual investigation | 5 | Unusual states |
697| Purgatory expired | 382 | Structured logging working |
698
699### Lessons Learned
700
7011. **Always verify paths first**: The default paths in examples didn't match the actual deployment. Use `systemctl cat <service>` to find real paths.
702
7032. **Check archive accessibility**: We initially tried to run the analysis remotely, but the archive relay was localhost-only, so we had to SSH into the VPS and run it there.
704
7053. **Use archive service for Phase 4 (CRITICAL)**: Structured logging (`[PARSE_FAIL]`, `[PURGATORY_EXPIRED]`) is **ONLY** in the ngit-grasp archive service, NOT the ngit-relay production service. Running Phase 4 against `ngit-relay.service` produces zero results because ngit-relay doesn't emit structured logs. The scripts now validate this and error if you specify an ngit-relay service.
706
7074. **Install git on VPS**: Git wasn't installed on the minimal VPS. The scripts now check for this in prerequisites.
708
7095. **Permissions matter**: Some directories required `sudo` to access. Running as root or the service user resolved this.
710
711### Next Steps for relay.ngit.dev
712
7131. **Re-sync 315 repos**: Trigger archive to re-fetch from production
7142. **Investigate 5 edge cases**: Manual review of unusual states
7153. **Monitor purgatory**: 382 expired entries indicate sync issues to investigate
7164. **Plan cutover**: Once re-sync complete, switch DNS/proxy to ngit-grasp
717
718## ngit-relay Troubleshooting
719
720This section covers common issues encountered when running ngit-relay in production, including git permission errors and repository corruption. These issues were discovered during the relay.ngit.dev migration and may affect other deployments.
721
722### Git Permission Denied Errors
723
724#### Symptoms
725
726When cloning repositories, you see:
727
728```bash
729$ git clone https://relay.ngit.dev/npub.../repo.git
730Cloning into 'repo'...
731remote: warning: unable to access '/root/.config/git/attributes': Permission denied
732```
733
734Or in container logs:
735
736```
737warning: unable to access '/root/.config/git/attributes': Permission denied
738```
739
740#### Explanation
741
742This occurs when:
7431. Git operations run as a non-root user (typically `nginx` user, UID 101)
7442. Git tries to access `/root/.config/git/attributes` for global git configuration
7453. The `/root` directory has permissions `0700` (drwx------), preventing non-root users from traversing into it
7464. Even though the `attributes` file itself may be world-readable, the nginx user cannot reach it due to parent directory permissions
747
748**Root cause:** The container runs git commands via fcgiwrap as the nginx user, but `/root` is only accessible by root.
749
750#### Quick Fix (Temporary - Does Not Survive Container Restart)
751
752This fix resolves the issue immediately but will be lost when containers restart:
753
754```bash
755# For each ngit-relay container, exec in and create the git config directory
756sudo podman exec <container-name> sh -c "mkdir -p /root/.config/git && touch /root/.config/git/attributes && chmod 644 /root/.config/git/attributes"
757
758# Example for specific containers:
759sudo podman exec gitnostr-com-ngit-relay sh -c "mkdir -p /root/.config/git && touch /root/.config/git/attributes && chmod 644 /root/.config/git/attributes"
760
761sudo podman exec relay-ngit-dev-ngit-relay sh -c "mkdir -p /root/.config/git && touch /root/.config/git/attributes && chmod 644 /root/.config/git/attributes"
762```
763
764**Important:** This fix is temporary and will be lost when the container restarts. For a permanent solution, see the NixOS configuration below.
765
766#### Permanent Fix (NixOS Configuration)
767
768For NixOS deployments, add systemd services that automatically fix `/root` permissions after each container start:
769
770```nix
771# In your ngit-relay service configuration (e.g., services/relay-ngit-dev-ngit-relay.nix)
772
773systemd.services.relay-ngit-dev-fix-root-perms = {
774 description = "Fix /root permissions in relay.ngit.dev container for git access";
775 after = [ "podman-relay-ngit-dev-ngit-relay.service" ];
776 requires = [ "podman-relay-ngit-dev-ngit-relay.service" ];
777 wantedBy = [ "multi-user.target" ];
778 serviceConfig = {
779 Type = "oneshot";
780 RemainAfterExit = true;
781 ExecStart = "${pkgs.bash}/bin/bash -c 'sleep 5 && ${pkgs.podman}/bin/podman exec relay-ngit-dev-ngit-relay chmod 711 /root'";
782 Restart = "on-failure";
783 RestartSec = "10s";
784 };
785};
786```
787
788This changes `/root` permissions from `0700` to `0711`, allowing the nginx user to traverse through `/root` to reach `/root/.config/git/`.
789
790**Why 711?**
791- `7` (owner/root): Full read/write/execute
792- `1` (group): Execute only (traverse)
793- `1` (other): Execute only (traverse)
794
795This allows non-root users to traverse through `/root` to access subdirectories, while still protecting `/root` contents from being listed or read.
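
The effect of mode `711` can be checked safely on a scratch directory rather than on `/root`. The helper name `mode_string` below is hypothetical; the snippet only shows how the octal mode maps to the permission string mentioned in this section.

```bash
# Print only the permission column of a directory, e.g. "drwx--x--x".
mode_string() {
  ls -ld "$1" | cut -c1-10
}

# Demonstration on a scratch directory (not on /root itself).
demo="$(mktemp -d)"
chmod 711 "$demo"
mode_string "$demo"   # prints drwx--x--x
```

The `x` bits for group and other allow traversal into known subpaths, while the missing `r` bits prevent listing the directory's contents.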
796
#### Verification

After applying the fix:

```bash
# Test that cloning works without permission warnings
git clone https://relay.ngit.dev/npub.../repo.git

# Should clone successfully with no "Permission denied" warnings

# Verify /root permissions inside container
sudo podman exec relay-ngit-dev-ngit-relay ls -ld /root
# Should show: drwx--x--x (711)

# Verify nginx user can access git config
sudo podman exec relay-ngit-dev-ngit-relay su -s /bin/sh nginx -c "cat /root/.config/git/attributes"
# Should succeed without "Permission denied"
```

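The expected mode string can be reproduced on a scratch directory, which is a harmless way to confirm what `ls -ld` should print before touching the container. This is a minimal sketch that assumes GNU coreutils `stat`; `mode_of` is a hypothetical helper name and not part of ngit-relay.

```bash
# Hypothetical helper: print the octal and symbolic mode of a path (GNU stat).
mode_of() {
  stat -c '%a %A' "$1"
}

scratch=$(mktemp -d)      # throwaway directory, created with mode 700
chmod 711 "$scratch"      # same mode the fix applies to /root
mode_of "$scratch"        # prints: 711 drwx--x--x
```
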
### Git Repository Corruption

#### Symptoms

When cloning repositories, you see:

```bash
$ git clone https://relay.ngit.dev/npub.../repo.git
Cloning into 'repo'...
remote: fatal: bad tree object 8b765235809eb27159657eb4c97fb37d21c29bf0
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
```

Or when running `git fsck` on the server:

```
broken link from tree 7d60270e1904c30ae6cef7b465ef842a9f9f63c3
              to tree 8b765235809eb27159657eb4c97fb37d21c29bf0
missing tree 8b765235809eb27159657eb4c97fb37d21c29bf0
```

#### Explanation

Repository corruption typically occurs due to:

1. **Incomplete push operations**: A git push was interrupted mid-transfer, creating a commit that references objects that were never written to disk
2. **Permission issues during push**: The git-receive-pack process couldn't write objects due to permission problems (e.g., files owned by the wrong user)
3. **Disk/filesystem issues**: Rare cases of disk errors or filesystem corruption

**Common pattern:** A commit exists with references to tree objects, but those tree objects are missing from the repository. Sometimes individual blobs (files) exist as "dangling" objects but were never properly linked into the tree structure.

**Warning signs:**
- HEAD file or objects owned by root when they should be owned by the service user (UID 101)
- Dangling blobs in `git fsck` output
- Recent "Permission denied" errors in logs

#### How to Fix

**Step 1: Locate the corrupted repository**

```bash
# SSH to the server
ssh dc@ngit.dev

# Find the repository path
# For relay.ngit.dev: /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git
# For gitnostr.com: /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git

cd /persistent/relay-ngit-dev-ngit-relay/data/repos/npub1c03rad0r6q833vh57kyd3ndu2jry30nkr0wepqfpsm05vq7he25slryrnw/axepool.git
```

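Steps 3 and 4 modify the object store in place, so it may be worth copying the repository once it has been located. The sketch below is one way to do that; `backup_repo` is a hypothetical helper, and the `.bak-<timestamp>` suffix is an assumption rather than a convention used by ngit-relay. Run it with enough privileges to read the repository (ownership is only preserved when run as root).

```bash
# Hypothetical helper: copy a repository next to itself with a timestamp suffix
# and print the path of the copy.
backup_repo() {
  local repo=${1%/}                                   # strip a trailing slash
  local dest="${repo}.bak-$(date +%Y%m%d-%H%M%S)"
  cp -a "$repo" "$dest" && printf '%s\n' "$dest"      # -a keeps modes and timestamps
}

# Usage, from inside the repository located above:
#   backup_repo "$PWD"
```
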
**Step 2: Diagnose the corruption**

```bash
# Run git fsck to identify missing/corrupted objects
git fsck --full

# Example output:
# broken link from tree 7d60270e1904c30ae6cef7b465ef842a9f9f63c3
#               to tree 8b765235809eb27159657eb4c97fb37d21c29bf0
# missing tree 8b765235809eb27159657eb4c97fb37d21c29bf0
# dangling blob 94490b902c9bceb6f901cd0c7c25b685e3685d87

# List recent commits on all refs to find candidates that reference the missing object
git log --all --oneline | head -10

# Inspect the broken commit
git cat-file -p <commit-hash>
# This shows the commit's root tree; the "broken link from tree" line in the
# fsck output names the tree that points at the missing one
```

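When `git fsck` prints a lot of output, it can help to reduce it to just the missing objects. The filter below is a small sketch based only on the `missing <type> <hash>` line format shown above; `missing_objects` is a hypothetical helper name.

```bash
# Hypothetical helper: read `git fsck` output on stdin and print one
# "<type> <hash>" line per missing object, without duplicates.
missing_objects() {
  awk '$1 == "missing" { print $2, $3 }' | sort -u
}

# Usage, from inside the repository:
#   git fsck --full 2>&1 | missing_objects
```
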
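**Step 3: Attempt automatic repair**

Try these in order:

```bash
# Option A: Repack and garbage collect
# Avoid --prune=now at this stage: it deletes the dangling objects that Step 4 may need
git gc --aggressive

# Then check if corruption is fixed
git fsck --full

# Option B: If that doesn't work, unpack the existing pack files into loose objects
# These are bare repositories, so packs live in objects/pack (there is no .git/ prefix)
# unpack-objects skips objects the repository already has, so move the packs out first
mkdir -p /tmp/repo-packs
mv objects/pack/pack-* /tmp/repo-packs/
for pack in /tmp/repo-packs/*.pack; do
  git unpack-objects -r < "$pack"   # -r: recover as many objects as possible
done
git fsck --full
```
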
**Step 4: Manual reconstruction (if automatic repair fails)**

If the missing tree object can be reconstructed from dangling blobs:

```bash
# 1. Identify what should be in the missing tree
# Look at the commit message and nearby commits to understand the structure

# 2. Find dangling blobs that might belong to the tree
git fsck --full | grep "dangling blob"

# 3. Examine each dangling blob to identify files
git cat-file -p 94490b902c9bceb6f901cd0c7c25b685e3685d87

# 4. Reconstruct the tree manually
# This requires creating a new tree object with the correct structure
# mktree expects a TAB between the hash and the filename, so printf is safer than a heredoc
# Example (advanced):
printf '100644 blob %s\t%s\n' \
  "<blob-hash>" filename1.rs \
  "<blob-hash>" filename2.rs | git mktree
# This outputs a new tree hash
# If the missing tree is a subdirectory, rebuild each parent tree the same way
# (using "040000 tree <hash>" entries) until you have a new root tree

# 5. Create a new commit with the fixed (root) tree
git commit-tree <new-tree-hash> -p <parent-commit> -m "Reconstructed commit message"
# This outputs a new commit hash

# 6. Update the branch reference
git update-ref refs/heads/<branch-name> <new-commit-hash>

# 7. Clean up
git gc --prune=now
```

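The plumbing commands in Step 4 are easier to follow when tried on a throwaway repository first. The sketch below rebuilds a `src` subtree from two blobs, wraps it in a root tree, and commits it. Everything in it (the scratch repository, the file contents, the `reconstructed` branch, and the `demo` identity) is invented for illustration and is not taken from the real repair.

```bash
set -eu

# Throwaway bare repository so nothing here touches production data.
demo=$(mktemp -d)
git init --quiet --bare "$demo/demo.git"
cd "$demo/demo.git"

# Write two blobs (stand-ins for recovered dangling blobs).
lib_blob=$(printf 'pub fn hello() {}\n' | git hash-object -w --stdin)
main_blob=$(printf 'fn main() {}\n' | git hash-object -w --stdin)

# Rebuild the subdirectory tree. Each mktree line is "<mode> <type> <hash>TAB<name>".
src_tree=$(printf '100644 blob %s\t%s\n' "$lib_blob" lib.rs "$main_blob" main.rs | git mktree)

# Wrap it in a root tree that lists the subdirectory as "src".
root_tree=$(printf '040000 tree %s\t%s\n' "$src_tree" src | git mktree)

# Commit the root tree and point a branch at the new commit.
commit=$(git -c user.name=demo -c user.email=demo@example.invalid \
  commit-tree "$root_tree" -m "Reconstructed tree")
git update-ref refs/heads/reconstructed "$commit"

git ls-tree -r --name-only reconstructed   # prints src/lib.rs and src/main.rs
```
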
**Step 5: Verify the fix**

```bash
# Run fsck again - should show no errors
git fsck --full

# Test clone locally
git clone /path/to/repo.git /tmp/test-clone

# Test clone via HTTP
git clone https://relay.ngit.dev/npub.../repo.git /tmp/test-clone-http
```

**Step 6: Fix ownership and permissions**

Ensure all repository files are owned by the correct user:

```bash
# For ngit-relay containers, files should be owned by UID 101 (nginx user)
sudo chown -R 101:101 /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git

# Verify
ls -la /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git
```

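`ls -la` only shows the top level of the repository, while mis-owned files tend to hide deeper under `objects/`. A recursive check is sketched below; `list_misowned` is a hypothetical helper, and the default of 101 comes from the UID named above. Empty output means every file has the expected owner.

```bash
# Hypothetical helper: list everything under a repository that is NOT owned
# by the expected UID (defaults to 101, the nginx user in the container).
list_misowned() {
  local repo=$1 uid=${2:-101}
  find "$repo" ! -uid "$uid" -print
}

# Usage, from inside the repository:
#   list_misowned "$PWD"
```
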
**Step 7: Replicate fix to other instances (if applicable)**

If you have multiple relay instances (e.g., gitnostr.com and relay.ngit.dev), replicate the fix:

```bash
# Copy the repaired pack files
sudo cp /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git/objects/pack/* \
  /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git/objects/pack/

# Update the branch reference
cd /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git
git update-ref refs/heads/<branch-name> <new-commit-hash>

# Clean up
git gc --prune=now

# Fix ownership last, so files written by the commands above are also owned by UID 101
sudo chown -R 101:101 /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git
```

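After replicating, a quick comparison can confirm that both instances resolve the branch to the same commit. This is a sketch only; `refs_match` is a hypothetical helper, and it assumes you can read both repository directories from the same host.

```bash
# Hypothetical helper: succeed only if both repositories resolve the given ref
# to the same object. Returns 2 if either repository lacks the ref.
refs_match() {
  local repo_a=$1 repo_b=$2 ref=$3
  local a b
  a=$(git -C "$repo_a" rev-parse --verify --quiet "$ref") || return 2
  b=$(git -C "$repo_b" rev-parse --verify --quiet "$ref") || return 2
  [ "$a" = "$b" ]
}
```

For example, `refs_match <relay-repo-path> <gitnostr-repo-path> refs/heads/<branch-name> && echo "in sync"` prints `in sync` only when the two refs agree.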
#### Prevention

To prevent future corruption:

1. **Fix permission issues first**: Ensure the permission denied errors are resolved (see previous section)
2. **Monitor for root-owned files**: Files in git repositories should be owned by UID 101, not root
3. **Check disk health**: Run `df -h` to confirm there is free space and `smartctl` to check that the disk is healthy
4. **Enable git fsck in monitoring**: Periodically run `git fsck` on repositories to catch corruption early

```bash
# Add to monitoring/cron (example)
find /persistent/*/data/repos -name "*.git" -type d | while IFS= read -r repo; do
  echo "Checking $repo"
  # Hide progress lines and dangling-object notices so only real errors remain
  git -C "$repo" fsck --full 2>&1 | grep -v "^Checking\|^dangling"
done
```

#### Real-World Example: axepool.git Corruption

During the relay.ngit.dev migration, the `axepool.git` repository was corrupted:

**Problem:**
- Commit `e84518b` referenced tree `8b765235...` (the `src` directory)
- Tree `8b765235...` was missing from the repository
- Blob `94490b90...` (mint_client.rs) existed as a dangling object but wasn't linked

**Root cause:**
- An incomplete push operation
- Permission issues (HEAD file was owned by root)
- The commit was created but the tree object was never written

**Solution:**
1. Identified the missing tree should contain: `lib.rs`, `main.rs`, `mint_client.rs`
2. Found the dangling blob `94490b90...` was `mint_client.rs`
3. Reconstructed the `src` tree with all three files
4. Created new commit `e12bc3cf...` with the fixed tree
5. Updated `refs/heads/add-missing-hooks` to point to the new commit
6. Ran `git gc --prune=now` to clean up
7. Replicated fix to gitnostr.com instance

**Result:** Both relays now clone successfully with all files intact.

### Additional Resources

- **ngit-relay repository**: https://github.com/danconwaydev/ngit-relay
- **Git internals documentation**: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
- **Podman documentation**: https://docs.podman.io/