# How-To: Test Sync Against Production Data

**Problem:** Debug and improve sync behavior using real-world data from production relays

**Difficulty:** Intermediate

**Time:** 30 minutes per iteration

## Overview

This guide helps you run ngit-grasp's sync system against production relays to discover unexpected errors, inefficiencies, and edge cases that don't appear in controlled tests.

**Why production testing matters:**

- Real data has inconsistencies, malformed events, and edge cases
- Production relays may behave differently (rate limiting, timeouts, partial NIP-77 support)
- Volume and patterns reveal performance bottlenecks
- Sync discovery leads to cascading subscriptions we can't predict in tests

## Prerequisites

- ngit-grasp compiles successfully (`cargo build`)
- Familiarity with [GRASP-02 Proactive Sync](../explanation/grasp-02-proactive-sync.md)
- Understanding of log levels and tracing

## Test Setup

### 1. Choose a Test Identity

Pick a domain with manageable sync volume. Smaller domains mean fewer repos to sync, making logs tractable.

**Recommended starting point:**

```bash
--domain ngit.danconwaydev.com
```

This domain has few repo announcements listing it, so sync stays manageable.

### 2. Choose a Bootstrap Relay

The bootstrap relay provides the initial set of announcements to discover repos:

```bash
--sync-bootstrap-relay-url wss://git.shakespeare.diy
```

### 3. Run with Time Limit

Start with short runs (30 seconds) to capture manageable log volumes:

```bash
# Clear any existing data for clean state
rm -rf /tmp/ngit-test-*

# Run for 30 seconds with sanitized output
timeout 30s cargo run -- \
  --sync-bootstrap-relay-url wss://git.shakespeare.diy \
  --domain ngit.danconwaydev.com \
  --git-data-path /tmp/ngit-test-git \
  --relay-data-path /tmp/ngit-test-relay \
  2>&1 | ./scripts/sanitize-logs.sh | tee sync-test.log
```

**Note:** The `timeout` command returns exit code 124, which is expected.
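
If you script these runs, it can help to treat exit code 124 as a normal end of run while still surfacing real failures. The sketch below is one possible approach, not part of ngit-grasp: `run_bounded` is a hypothetical helper name, and it writes output to a raw log file instead of piping, because a shell pipeline reports the status of its last command by default rather than the status of `timeout`.

```shell
# Hypothetical helper (not part of ngit-grasp): run a command under a time
# limit, capture stdout and stderr in a raw log file, and treat timeout's
# exit code 124 as success.
#
# Usage: run_bounded LIMIT LOGFILE COMMAND [ARGS...]
run_bounded() {
  limit="$1"
  log="$2"
  shift 2

  # Redirect to a file rather than piping, so $? is the status of `timeout`.
  timeout "$limit" "$@" >"$log" 2>&1
  status=$?

  # 124 means the time limit was reached, which is how a bounded run
  # is expected to end.
  if [ "$status" -eq 124 ]; then
    return 0
  fi

  # Any other status comes from the command itself (or from timeout failing
  # to start it) and is passed through unchanged.
  return "$status"
}
```

For example, `run_bounded 30s raw.log cargo run -- [args]` followed by `./scripts/sanitize-logs.sh < raw.log > sanitized.log` produces the same kind of sanitized log as the pipeline form, at the cost of not seeing output live.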
## Log Sanitization

Raw logs include full events and hundreds of event IDs per line, making them unwieldy for analysis. The sanitizer truncates long lines:

```bash
./scripts/sanitize-logs.sh < raw.log > sanitized.log

# Or pipe directly
cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh
```

**Options:**

- `--head-chars N` - First N characters to show (default: 200)
- `--tail-chars N` - Last N characters to show (default: 100)

Example output:

```
2024-01-09T10:00:00Z DEBUG sync: Processing events ids=[abc123, def456, ghi789, jkl012...<1847 chars>...xyz999, end123]
```

## What to Look For

### Phase 1: Connection & Bootstrap (0-5 seconds)

**Expected behavior:**

- Connection to bootstrap relay succeeds
- Layer 1 (announcement) subscription starts
- First batch of 30617/30618 events received

**Red flags:**

- Connection timeout or failure
- NIP-77 negentropy errors (should fall back gracefully)
- Immediate rate limiting

### Phase 2: Discovery Cascade (5-15 seconds)

**Expected behavior:**

- Self-subscriber batches fire as announcements are processed
- New relays discovered from announcement `relays` tags
- Layer 2 (repo tags) subscriptions created

**Red flags:**

- Excessive relay discovery (>10 relays rapidly)
- Filter consolidation warnings (>70 filters)
- Missing self-subscriber batch logs

### Phase 3: Steady State (15+ seconds)

**Expected behavior:**

- Historic sync batches completing (EOSE received)
- Periodic health checks running
- Events being saved to database

**Red flags:**

- Pending batches never confirming
- Repeated connection/disconnect cycles
- Memory growth (check with `top` in another terminal)

## Debugging Checklist

When analyzing logs, look for these patterns:

### Errors to Investigate

| Pattern | Possible Cause | Action |
|---------|----------------|--------|
| `error` (any) | Unexpected failure | Investigate immediately |
| `connection failed` | Network/relay issue | Check relay URL, try different relay |
| `rate limit` | Too many requests | Check consolidation, increase backoff |
| `negentropy` + `error` | NIP-77 incompatibility | Should fall back - verify it does |
| `timeout` | Slow relay or large sync | Increase timeouts or reduce scope |

### Warnings to Monitor

| Pattern | Meaning | Action |
|---------|---------|--------|
| `consolidating filters` | Filter count high | Expected, but frequent = problem |
| `backing off` | Health tracker retry | Normal, but watch for excessive |
| `batch failed` | Historic sync incomplete | Check which batches, why |

### Debug Patterns to Verify

| Pattern | What it shows |
|---------|---------------|
| `fresh_start` | Full sync initiated |
| `quick_reconnect` | Incremental sync (<15min gap) |
| `historic sync complete` | Sync finished successfully |
| `sync_live` | Live subscriptions active |
| `PendingBatch` | Items awaiting EOSE confirmation |

## Iterative Improvement Process

### Step 1: Run and Capture

```bash
timeout 30s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-1.log
```

### Step 2: Identify Issues

Scan logs for errors and unexpected patterns:

```bash
grep -i error iteration-1.log
grep -i warn iteration-1.log
grep -i panic iteration-1.log
```

### Step 3: Document Findings

Add findings to this file's [Known Issues](#known-issues) section or create GitHub issues.

### Step 4: Fix and Re-test

After code changes, run again to verify the fix.

### Step 5: Extend Duration

Once 30-second runs are clean, extend to 2 minutes, then 5 minutes:

```bash
timeout 120s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-2.log
```

## Logging Improvements

If the logs aren't helpful enough, improve them.
Common needs:

### Add Context to Existing Logs

```rust
// Before
tracing::debug!("Processing events");

// After
tracing::debug!(
    relay = %relay_url,
    event_count = events.len(),
    "Processing events"
);
```

### Add New Log Points

Key places that may need more logging:

- `src/sync/mod.rs` - SyncManager state transitions
- `src/sync/relay_connection.rs` - Connection lifecycle
- `src/sync/self_subscriber.rs` - Batch processing

### Reduce Noise

If a log line appears too frequently:

```rust
// Change from debug! to trace!
tracing::trace!("Per-event detail that's too noisy");
```

## Known Issues

*Document issues discovered during testing here. Delete this section when empty.*

### Template for New Issues

```markdown
### Issue: [Short description]

**Discovered:** [Date]
**Status:** [Open/Fixed in PR#xxx]

**Symptoms:**
- Log pattern observed

**Root cause:**
- [If known]

**Fix:**
- [If known]
```

---

## Quick Reference

### Minimal Test Command

```bash
timeout 30s cargo run -- \
  --sync-bootstrap-relay-url wss://git.shakespeare.diy \
  --domain ngit.danconwaydev.com \
  --git-data-path /tmp/ngit-test-git \
  --relay-data-path /tmp/ngit-test-relay \
  2>&1 | ./scripts/sanitize-logs.sh
```

### With Metrics Endpoint

```bash
timeout 30s cargo run -- \
  --sync-bootstrap-relay-url wss://git.shakespeare.diy \
  --domain ngit.danconwaydev.com \
  --git-data-path /tmp/ngit-test-git \
  --relay-data-path /tmp/ngit-test-relay \
  --metrics-address 127.0.0.1:9090 \
  2>&1 | ./scripts/sanitize-logs.sh
```

Then in another terminal: `curl http://127.0.0.1:9090/metrics`

### Different Log Level

The default is DEBUG. For more detail:

```bash
RUST_LOG=trace cargo run -- [args]
```

For less noise:

```bash
RUST_LOG=info cargo run -- [args]
```

---

*Part of the [ngit-grasp documentation](../README.md) using the [Diátaxis](https://diataxis.fr/) framework.*