From f84c7d04ff5d3f9c6c56d78bc00c01814e7348e4 Mon Sep 17 00:00:00 2001
From: DanConwayDev
Date: Fri, 23 Jan 2026 11:45:33 +0000
Subject: Restructure migration guide for practical usage

Transforms the guide from a technical reference into a practical step-by-step guide with:
- Quick Start section at the top with copy-paste commands
- Prerequisites section with verification steps
- Migration Overview explaining the 3-stage process
- Running the Analysis section with all options documented
- Understanding Results section explaining output files
- Troubleshooting section for common issues
- Architecture section (moved from top) for those wanting details
- Next Steps section for post-analysis workflow

The guide now follows a practical flow: get started fast, understand results, then dive into architecture details if needed.
---
 docs/how-to/migrate-ngit-relay-to-ngit-grasp.md | 484 ++++++++++++++++--------
 1 file changed, 336 insertions(+), 148 deletions(-)

diff --git a/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md b/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md
index 4c3a4ba..975eb4c 100644
--- a/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md
+++ b/docs/how-to/migrate-ngit-relay-to-ngit-grasp.md
@@ -1,207 +1,395 @@
-# Migrate ngit-relay to ngit-grasp on NixOS VPS
+# Migrate ngit-relay to ngit-grasp
-**Goal:** Replace an ngit-relay instance on a VPS running NixOS with ngit-grasp.
+This guide walks you through migrating a production ngit-relay instance to ngit-grasp. The process involves analyzing your existing data to identify repositories that need attention before switching over.
-**Specifics:** VPS running NixOS.
+## Quick Start
-## Approach
+Run the migration analysis with a single command:
-1. Deploy ngit-grasp with 'domain' of `.internal` and an `archiveService` of `` running on a different port. This will gather all the events and git data from the production service and relays/git servers/grasp servers that for repositories that list the service in their announcement event. To sync all git data may take an hour.
+```bash
+# Basic analysis (fetches events, compares relays)
+./docs/how-to/migration-scripts/run-migration-analysis.sh \
+  --prod-relay wss://relay.ngit.dev \
+  --archive-relay wss://archive.relay.ngit.dev
-2. Analyze the data to see which repositories have not been moved with complete data. Understand why and for each decide if action is needed / not needed to move it.
+# Full analysis (includes git sync check - run on VPS)
+./docs/how-to/migration-scripts/run-migration-analysis.sh \
+  --prod-relay wss://relay.ngit.dev \
+  --archive-relay wss://archive.relay.ngit.dev \
+  --prod-git /var/lib/ngit-relay/git \
+  --archive-git /var/lib/ngit-relay-archive/git \
+  --service ngit-grasp.service
+```
+
+The script produces three output files:
+- `results/no-action-required.txt` - Repos ready for migration
+- `results/action-required.txt` - Repos needing intervention
+- `results/manual-investigation.txt` - Repos needing human review
+
+See [Running the Analysis](#running-the-analysis) for detailed options.
+
+## Prerequisites
+
+### Required Tools
+
+- **nak** - Nostr Army Knife for fetching events ([install](https://github.com/fiatjaf/nak))
+- **jq** - JSON processing (install via package manager)
+
+### For Full Analysis (VPS)
+
+- SSH access to the VPS running ngit-relay
+- Read access to git data directories
+- Access to systemd journal (for log extraction)
+
+### Verify Installation
+
+```bash
+# Check required tools
+nak --version
+jq --version
+
+# Check optional tools (for VPS phases)
+journalctl --version
+```
+
+## Migration Overview
+
+The migration process has three stages:
+
+### Stage 1: Deploy Archive Instance
+
+Deploy ngit-grasp alongside your production ngit-relay:
+
+1. Configure ngit-grasp with:
+   - `domain` set to `.internal` (temporary)
+   - `archiveService` set to your production domain
+   - Running on a different port
+
+2. Let it sync for ~1 hour to gather all events and git data
+
+### Stage 2: Analyze Data
+
+Run the migration analysis to identify:
+- Repositories successfully migrated (no action needed)
+- Repositories with incomplete data (need investigation)
+- Repositories with parse failures (may need re-announcement)
+
+### Stage 3: Switch Over
+
+Once all issues are resolved:
+1. Set `domain` to your production URL
+2. Disable archive mode
+3. Update your reverse proxy to point to ngit-grasp
+
+## Running the Analysis
+
+### Basic Usage
+
+```bash
+# Preview what will happen (dry run)
+./run-migration-analysis.sh \
+  --prod-relay wss://relay.ngit.dev \
+  --archive-relay wss://archive.relay.ngit.dev \
+  --dry-run
+
+# Run the analysis
+./run-migration-analysis.sh \
+  --prod-relay wss://relay.ngit.dev \
+  --archive-relay wss://archive.relay.ngit.dev
+```
+
+### Full Analysis on VPS
+
+```bash
+./run-migration-analysis.sh \
+  --prod-relay wss://relay.ngit.dev \
+  --archive-relay wss://archive.relay.ngit.dev \
+  --prod-git /var/lib/ngit-relay/git \
+  --archive-git /var/lib/ngit-relay-archive/git \
+  --service ngit-grasp.service
+```
+
+### Phase Control
+
+Skip or run specific phases:
+
+```bash
+# Skip Phase 2 (use cached git sync data)
+./run-migration-analysis.sh ... --skip-phase-2
+
+# Run only Phase 1 (fetch events)
+./run-migration-analysis.sh ... --only-phase-1
+
+# Resume from Phase 3 (using existing data)
+./run-migration-analysis.sh ... --from-phase-3 --output work/migration-analysis-20260122-1430
+```
+
+### All Options
+
+| Option | Description |
+|--------|-------------|
+| `--prod-relay ` | Production relay WebSocket URL (required) |
+| `--archive-relay ` | Archive relay WebSocket URL (required) |
+| `--prod-git ` | Git base directory for prod (enables Phase 2) |
+| `--archive-git ` | Git base directory for archive (enables Phase 2) |
+| `--service ` | Systemd service name (enables Phase 4) |
+| `--output ` | Output directory (default: auto-generated) |
+| `--skip-phase-N` | Skip phase N (1-5) |
+| `--only-phase-N` | Run only phase N |
+| `--from-phase-N` | Start from phase N |
+| `--dry-run` | Show what would be executed |
+| `--continue-on-error` | Continue even if a phase fails |
+
+## Understanding Results
+
+### Summary File
+
+The `results/summary.txt` file provides an overview:
+
+```
+## Overview
+
+| Category | Count | Percentage |
+|----------|-------|------------|
+| No Action Required | 450 | 85.7% |
+| Action Required | 52 | 9.9% |
+| Manual Investigation | 23 | 4.4% |
+```
+
+### No Action Required
-3. Set the 'domain' to production URL, turn off archive mode, and point your reverse proxy at the new port.
+Repositories in `no-action-required.txt` are ready for migration:
-## Challenges
+```
+myrepo | npub1abc... | complete in both prod and archive
+oldrepo | npub1def... | deleted by user
+testrepo | npub1ghi... | empty/blank in both (user never pushed)
+```
+
+**Common reasons:**
+- `complete in both prod and archive` - Successfully migrated
+- `deleted by user` - User requested deletion (kind 5 event)
+- `empty/blank in both` - No git data was ever pushed
+- `purgatory expired` - System already handled the timeout
+
+### Action Required
+
+Repositories in `action-required.txt` need intervention:
+
+```
+myrepo | npub1abc... | complete in prod, missing from archive | trigger re-sync or investigate
+otherrepo | npub1def... | incomplete in both (prod=cat3, archive=cat2) | investigate git data source
+```
+
+**Common actions:**
+- **Re-sync needed**: Trigger the archive to re-fetch from the source
+- **Wait for sync**: Archive sync may still be in progress
+- **Investigate git source**: Original git data may be incomplete
+- **Fix parse failure**: Event format issue, may need re-announcement
-- **ngit-relay accepts any commits/annotated tags** that were at that point of time referenced in the latest state event. **ngit-grasp requires all the git data** to reproduce the latest state. So if the git data is incomplete, it won't accept the repository.
+### Manual Investigation
-- **ngit-relay doesn't clear out refs/nostr/** where it doesn't have a PR event. Fortunately the 'PR' (as opposed to patches) functionality is not widely used so we just need to check a few repositories (shakespeare, ngit and gitworkshop).
+Repositories in `manual-investigation.txt` have unusual states:
+
+```
+weirdrepo | npub1abc... | in archive (cat1) but not in prod | may be new announcement or deleted from prod
+conflictrepo | npub1def... | complete in prod, missing from archive, parse failure logged | investigate parse failure
+```
-## Analysis Categories
+These require human judgment to determine the correct action.
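The pipe-delimited result lines shown in the guide lend themselves to a quick tally per reason. What follows is a minimal sketch, assuming every result line uses ` | ` as the field separator and carries the reason in the third field, as in the sample output; `count_by_reason` is a hypothetical helper and not part of the migration scripts.

```bash
# Hypothetical helper (not part of the migration scripts).
# Assumption: fields are separated by " | " and the reason is field 3,
# matching the sample lines in the guide.
count_by_reason() {
  tab=$(printf '\t')
  awk -F' [|] ' '
    NF >= 3 { counts[$3]++ }
    END { for (reason in counts) printf "%d\t%s\n", counts[reason], reason }
  ' "$1" | sort -t "$tab" -k1,1nr -k2,2
}

# Usage (the path is an assumption based on the documented output layout):
# count_by_reason work/migration-analysis-20260122-1430/results/no-action-required.txt
```

Because `action-required.txt` appends the suggested action as a fourth field, the same helper should work there too, still grouping by the third field.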
-### No action required:
+## Troubleshooting
-| Category | How to Detect | Source |
-|----------|---------------|--------|
-| **Git Data Complete - Moved** | prod cat1 AND archive cat1 (same repo) | Git sync check |
-| **Invalid Announcement** (Won't Parse) | Log: `[PARSE_FAIL] kind=30617` | Archive logs |
-| **Deletion Request** | kind 5 event tagging announcement | Event fetch |
-| **Announcement Not on Prod But In Archive** | In archive announcements, not in prod | Event comparison |
+### "nak not found"
-### Action/decision required:
+Install nak from https://github.com/fiatjaf/nak:
-| Category | How to Detect | Source |
-|----------|---------------|--------|
-| **Invalid State Event** (Won't Parse) | Log: `[PARSE_FAIL] kind=30618` | Archive logs |
-| **Purgatory Expired** (sync should have worked) | Log: `[PURGATORY_EXPIRED]` | Archive logs |
-| **Incomplete Git Data** (both relays) | prod cat2/3/4 AND archive cat2/3/4 | Git sync check |
-| **No Announcement In Archive** | In prod, not in archive, no deletion | Event comparison |
-| **State but incomplete git in Archive** | archive cat3 or cat4 | Git sync check |
+```bash
+# Using Go
+go install github.com/fiatjaf/nak@latest
-### Manual investigation required:
+# Or download binary from releases
+```
-- Repos that don't fit above categories
-- Repos with unexpected state (e.g., complete in prod, missing in archive, no log entries)
+### "Permission denied" on git directories
-## Analysis Script Architecture
+Run with sudo or ensure your user has read access:
-The analysis is split into modular phases for fast iteration. Phases 1-3 and 5 can run locally; Phase 2 and 4 require VPS access.
+```bash
+# Check permissions
+ls -la /var/lib/ngit-relay/git
+
+# Run with sudo if needed
+sudo ./run-migration-analysis.sh ...
+```
+
+### Phase 2 takes too long
+
+The git sync check processes each repository individually (~20 minutes total). To speed up iteration:
+
+1. Run Phase 2 once and save the output
+2. Use `--skip-phase-2` for subsequent runs
+3. Use `--from-phase-3` to re-run classification with existing data
+
+### No parse failures found
+
+This is expected if:
+- ngit-grasp logging improvements aren't deployed yet
+- No events actually failed to parse
+
+The analysis will continue without log data.
+
+### Event counts are multiples of 250
+
+This suggests pagination may have failed. The scripts use `--paginate` by default, but if you see exactly 250, 500, 750 events, verify the relay is responding correctly.
+
+## Architecture
+
+### Analysis Phases
+
+The analysis is split into 5 modular phases:
+
+| Phase | Name | Time | Location | Description |
+|-------|------|------|----------|-------------|
+| 1 | Fetch Events | ~30s each | Local | Fetch events from both relays |
+| 2 | Git Sync Check | ~20 min each | VPS | Compare state events to git data |
+| 3 | Categorize & Compare | <1s | Local | Categorize and compare results |
+| 4 | Extract Logs | <30s | VPS | Extract parse failures and purgatory expiry |
+| 5 | Final Classification | <5s | Local | Combine all data into actionable results |
+
+### Phase Flow Diagram
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 1: Fetch Events (~30s, local) │
-│ migration-scripts/01-fetch-events.sh │
-├─────────────────────────────────────────────────────────────────┤
-│ Fetches from relay: │
-│ - kind 30618 (state events) │
-│ - kind 30617 (announcements) │
-│ - kind 5 (deletion requests) │
-│ │
-│ Run twice: once for prod (relay.ngit.dev), once for archive │
-│ Output: /{state,announcements,deletions}.json │
+│ Fetches kind 30618 (state), 30617 (announcements), 5 (deletion) │
+│ Run twice: once for prod, once for archive │
 └─────────────────────────────────────────────────────────────────┘
                                 ↓
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 2: Git Sync Check (~20 mins, VPS required) │
-│ migration-scripts/10-check-git-sync.sh │
-├─────────────────────────────────────────────────────────────────┤
-│ For each state event, compares refs to actual git data on disk. │
-│ │
-│ Run twice: │
-│ - prod: GIT_BASE=/persistent/relay-ngit-dev-ngit-relay/... │
-│ - archive: GIT_BASE=/persistent/grasp/sync-archive/git │
-│ │
-│ Output: git-sync-status.tsv │
-│ repo|npub|state_refs|git_refs|matches|status │
+│ Compares state event refs to actual git data on disk │
+│ Categorizes into: complete, empty, partial, no-match │
 └─────────────────────────────────────────────────────────────────┘
                                 ↓
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 3: Categorize & Compare (fast, local) │
-│ migration-scripts/20-categorize.sh │
-│ migration-scripts/21-compare-relays.sh │
-├─────────────────────────────────────────────────────────────────┤
-│ 20-categorize.sh applies 4-category logic: │
-│ - cat1: complete match (all refs match) │
-│ - cat2: empty/blank (no git data) │
-│ - cat3: partial match (some refs match) │
-│ - cat4: no match (git exists but refs don't match) │
-│ │
-│ 21-compare-relays.sh compares prod vs archive: │
-│ - complete-in-both.txt (no action needed) │
-│ - complete-prod-missing-archive.txt (needs investigation) │
-│ - complete-prod-incomplete-archive.txt (sync in progress?) │
-│ - incomplete-in-both.txt (git data incomplete) │
-│ - in-archive-not-prod.txt (deleted or new) │
-│ │
-│ Output: category-{1,2,3,4}.txt, comparison/*.txt, summary.txt │
+│ Compares prod vs archive categories │
+│ Identifies gaps and sync issues │
 └─────────────────────────────────────────────────────────────────┘
                                 ↓
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 4: Log-Based Categories (VPS required) │
-│ migration-scripts/30-extract-parse-failures.sh │
-│ migration-scripts/31-extract-purgatory-expiry.sh │
-├─────────────────────────────────────────────────────────────────┤
-│ Extracts structured log entries from journalctl: │
-│ - Parse failures: [PARSE_FAIL] kind=X event_id=Y reason=Z │
-│ - Purgatory expiry: [PURGATORY_EXPIRED] repo=X npub=Y │
-│ │
-│ NOTE: Requires logging improvements in ngit-grasp to emit │
-│ these structured log entries. See issue: TBD │
-│ │
-│ Output: parse-failures.txt, purgatory-expired.txt │
+│ Extracts [PARSE_FAIL] and [PURGATORY_EXPIRED] from logs │
+│ Provides context for why repos failed to sync │
 └─────────────────────────────────────────────────────────────────┘
                                 ↓
 ┌─────────────────────────────────────────────────────────────────┐
 │ PHASE 5: Final Classification (fast, local) │
-│ migration-scripts/40-classify-actions.sh │
-├─────────────────────────────────────────────────────────────────┤
-│ Combines all data sources to produce final classification: │
-│ │
-│ Inputs: │
-│ - category files (prod and archive) │
-│ - relay-gaps.txt │
-│ - parse-failures.txt │
-│ - purgatory-expired.txt │
-│ - deletions.json │
-│ │
-│ Output: │
-│ - no-action-required.txt (repo|reason) │
-│ - action-required.txt (repo|reason|suggested_action) │
-│ - manual-investigation.txt (repo|notes) │
+│ Combines all data sources │
+│ Outputs: no-action, action-required, manual-investigation │
 └─────────────────────────────────────────────────────────────────┘
 ```
-## Directory Structure
+### Git Sync Categories
+
+Phase 2 categorizes repositories into 4 categories:
+
+| Category | Description | Meaning |
+|----------|-------------|---------|
+| 1 | Complete Match | All refs in state event match git data |
+| 2 | Empty/Blank | No git data available |
+| 3 | Partial Match | Some refs match, some don't |
+| 4 | No Match | Git data exists but refs don't match |
+
+### Output Directory Structure
 ```
 work/migration-analysis-YYYYMMDD-HHMM/
 ├── prod/
 │   ├── raw/
-│   │   ├── state-events.json # Phase 1 output
-│   │   ├── announcements.json # Phase 1 output
-│   │   └── deletions.json # Phase 1 output
-│   ├── git-sync-status.tsv # Phase 2 output (optional)
-│   ├── category1-complete-match.txt # Phase 2/3 output
-│   ├── category2-empty-blank.txt # Phase 2/3 output
-│   ├── category3-partial-match.txt # Phase 2/3 output
-│   └── category4-no-match.txt # Phase 2/3 output
+│   │   ├── state-events.json # Phase 1
+│   │   ├── announcements.json # Phase 1
+│   │   └── deletions.json # Phase 1
+│   ├── git-sync-status.tsv # Phase 2
+│   └── category*.txt # Phase 2/3
 ├── archive/
-│   ├── raw/
-│   │   ├── state-events.json
-│   │   ├── announcements.json
-│   │   └── deletions.json
-│   ├── git-sync-status.tsv
-│   ├── category1-complete-match.txt
-│   ├── category2-empty-blank.txt
-│   ├── category3-partial-match.txt
-│   └── category4-no-match.txt
-├── logs/
-│   ├── parse-failures.txt # Phase 4 output
-│   └── purgatory-expired.txt # Phase 4 output
+│   └── (same structure as prod)
 ├── comparison/
-│   ├── complete-in-both.txt # Phase 3 output (no action)
-│   ├── complete-prod-missing-archive.txt # Phase 3 output (investigate)
-│   ├── complete-prod-incomplete-archive.txt # Phase 3 output (sync in progress?)
-│   ├── incomplete-in-both.txt # Phase 3 output (git incomplete)
-│   ├── in-archive-not-prod.txt # Phase 3 output (deleted/new)
-│   └── summary.txt # Phase 3 output (human-readable)
+│   ├── complete-in-both.txt # Phase 3
+│   ├── complete-prod-missing-archive.txt
+│   ├── complete-prod-incomplete-archive.txt
+│   ├── incomplete-in-both.txt
+│   ├── in-archive-not-prod.txt
+│   └── summary.txt
+├── logs/
+│   ├── parse-failures.txt # Phase 4
+│   └── purgatory-expired.txt # Phase 4
 └── results/
-    ├── no-action-required.txt # Phase 5 output
-    ├── action-required.txt # Phase 5 output
-    └── manual-investigation.txt # Phase 5 output
+    ├── no-action-required.txt # Phase 5
+    ├── action-required.txt # Phase 5
+    ├── manual-investigation.txt # Phase 5
+    └── summary.txt # Phase 5
 ```
-## Prerequisites
+## Key Differences: ngit-relay vs ngit-grasp
+
+Understanding these differences helps explain why some repositories need attention:
-- `nak` - Nostr Army Knife for fetching events
-- `jq` - JSON processing
-- SSH access to VPS for Phase 2 and 4
-- Logging improvements in ngit-grasp for Phase 4 (see Dependencies)
+| Aspect | ngit-relay | ngit-grasp |
+|--------|------------|------------|
+| Git data validation | Accepts commits/tags referenced in state event | Requires all git data to reproduce state |
+| PR refs cleanup | Doesn't clear `refs/nostr/` | Properly manages PR refs |
+| Parse failures | Silently ignores | Logs structured `[PARSE_FAIL]` entries |
+| Sync timeout | No timeout | Purgatory expires after configurable period |
-## Dependencies
+## Next Steps
-Phase 4 requires structured logging in ngit-grasp. Create a separate issue to add:
+After running the analysis:
-```rust
-// On parse failure:
-tracing::warn!(
-    target: "migration",
-    "[PARSE_FAIL] kind={} event_id={} reason=\"{}\"",
-    event.kind, event.id, reason
-);
+1. **Review the summary** - Check `results/summary.txt` for the overview
+2. **Address action items** - Work through `results/action-required.txt`
+3. **Investigate edge cases** - Review `results/manual-investigation.txt`
+4. **Re-run analysis** - After fixing issues, re-run to verify
+5. **Plan cutover** - Schedule the switch when all issues are resolved
-// On purgatory expiry:
-tracing::warn!(
-    target: "migration",
-    "[PURGATORY_EXPIRED] repo={} npub={}",
-    identifier, npub
-);
+### When to Re-run
+
+Re-run the analysis when:
+- Archive sync has had time to complete
+- You've fixed parse failures or re-announced events
+- You want to verify fixes before cutover
+
+```bash
+# Re-run with existing Phase 2 data (faster)
+./run-migration-analysis.sh ... --skip-phase-2 --output work/migration-analysis-20260122-1430
 ```
-## Gotchas
+## Individual Scripts
+
+For advanced usage, you can run individual phase scripts:
+
+```bash
+# Phase 1: Fetch events
+./migration-scripts/01-fetch-events.sh wss://relay.ngit.dev output/prod
+
+# Phase 2: Git sync check
+./migration-scripts/10-check-git-sync.sh output/prod/raw/state-events.json /var/lib/ngit-relay/git output/prod --categorize
+
+# Phase 3a: Categorize
+./migration-scripts/20-categorize.sh output/prod/git-sync-status.tsv output/prod
+
+# Phase 3b: Compare relays
+./migration-scripts/21-compare-relays.sh output/prod output/archive output/comparison
+
+# Phase 4a: Extract parse failures
+./migration-scripts/30-extract-parse-failures.sh ngit-grasp.service output/logs
+
+# Phase 4b: Extract purgatory expiry
+./migration-scripts/31-extract-purgatory-expiry.sh ngit-grasp.service output/logs
+
+# Phase 5: Final classification
+./migration-scripts/40-classify-actions.sh work/migration-analysis-20260122-1430
+```
-- Always use `nak req` with `--paginate` flag so we don't miss any events. If we receive increments of 250 (e.g., exactly 500) then it's a red flag that we are not paginating and there are probably more events.
-- Phase 1 and 2 should run back-to-back for an accurate snapshot.
-- The git sync check (Phase 2) takes ~20 minutes per relay - this is the slow part.
-- Existing analysis data from Jan 22 can be used for developing Phase 3/5 logic before re-running Phase 2.
+Each script has detailed help available with `--help` or by reading the script header.
--
cgit v1.2.3
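
The guide treats event counts that are exact multiples of 250 as a sign that pagination may have failed. A small sketch of that check follows, assuming the page size of 250 mentioned in the guide; `looks_truncated` is a hypothetical helper and not one of the shipped scripts, and how the count is obtained from the Phase 1 output is left to the caller.

```bash
# Hypothetical helper (not part of the migration scripts).
# Succeeds (exit 0) when COUNT is a non-zero exact multiple of PAGE_SIZE,
# the pattern the guide flags as a pagination red flag.
# Assumption: PAGE_SIZE defaults to 250, the increment named in the guide.
looks_truncated() {
  count=$1
  page_size=${2:-250}
  [ "$count" -gt 0 ] && [ $((count % page_size)) -eq 0 ]
}

# Example: flag suspicious counts before trusting a fetch.
for n in 0 250 437 500; do
  if looks_truncated "$n"; then
    echo "$n events: suspicious, verify pagination"
  else
    echo "$n events: ok"
  fi
done
```

A count that happens to be a genuine multiple of 250 will also be flagged, so the check is only a prompt to re-verify the relay, not proof of truncation.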