upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorDanConwayDev <DanConwayDev@protonmail.com>2026-01-09 21:41:45 +0000
committerDanConwayDev <DanConwayDev@protonmail.com>2026-01-09 21:41:45 +0000
commit6a8346441ea605248e09858d0e6b9096a1c87259 (patch)
treea24768a427f1d4b7e9adaff2b37f1385440f1c29 /docs
parent5fed2e2f32cfb15fff042a39f3ac82abe8948ca0 (diff)
docs: add production sync testing guide and log sanitizer script
Add infrastructure for iterative debugging of sync against production data: - scripts/sanitize-logs.sh: Truncates verbose log lines for LLM analysis - docs/how-to/production-sync-testing.md: Step-by-step guide for testing sync against real relays, identifying issues, and improving logging
Diffstat (limited to 'docs')
-rw-r--r--docs/how-to/README.md12
-rw-r--r--docs/how-to/production-sync-testing.md283
2 files changed, 295 insertions, 0 deletions
diff --git a/docs/how-to/README.md b/docs/how-to/README.md
index ed5f014..ba58c08 100644
--- a/docs/how-to/README.md
+++ b/docs/how-to/README.md
@@ -36,6 +36,18 @@ How-to guides are **recipes** that show you how to solve specific problems or ac
36 36
37--- 37---
38 38
39### [Test Sync Against Production Data](production-sync-testing.md)
40**Problem:** Debug and improve sync using real-world data
41**Difficulty:** Intermediate
42
43**You'll learn:**
44- Run sync against production relays
45- Sanitize logs for LLM analysis
46- Identify common issues and patterns
47- Iteratively improve sync behavior
48
49---
50
39## Planned How-To Guides 51## Planned How-To Guides
40 52
41### Deploy ngit-grasp 53### Deploy ngit-grasp
diff --git a/docs/how-to/production-sync-testing.md b/docs/how-to/production-sync-testing.md
new file mode 100644
index 0000000..dfff20d
--- /dev/null
+++ b/docs/how-to/production-sync-testing.md
@@ -0,0 +1,283 @@
1# How-To: Test Sync Against Production Data
2
3**Problem:** Debug and improve sync behavior using real-world data from production relays
4**Difficulty:** Intermediate
5**Time:** 30 minutes per iteration
6
7## Overview
8
9This guide helps you run ngit-grasp's sync system against production relays to discover unexpected errors, inefficiencies, and edge cases that don't appear in controlled tests.
10
11**Why production testing matters:**
12- Real data has inconsistencies, malformed events, and edge cases
13- Production relays may behave differently (rate limiting, timeouts, partial NIP-77 support)
14- Volume and patterns reveal performance bottlenecks
15- Sync discovery leads to cascading subscriptions we can't predict in tests
16
17## Prerequisites
18
19- ngit-grasp compiles successfully (`cargo build`)
20- Familiarity with [GRASP-02 Proactive Sync](../explanation/grasp-02-proactive-sync.md)
21- Understanding of log levels and tracing
22
23## Test Setup
24
25### 1. Choose a Test Identity
26
27Pick a domain with manageable sync volume. Smaller domains mean fewer repos to sync, making logs tractable.
28
29**Recommended starting point:**
30```bash
31--domain ngit.danconwaydev.com
32```
33
34This domain has few repo announcements listing it, so sync stays manageable.
35
36### 2. Choose a Bootstrap Relay
37
38The bootstrap relay provides the initial set of announcements to discover repos:
39
40```bash
41--sync-bootstrap-relay wss://git.shakespeare.diy
42```
43
44### 3. Run with Time Limit
45
46Start with short runs (30 seconds) to capture manageable log volumes:
47
48```bash
49# Clear any existing data for clean state
50rm -rf /tmp/ngit-test-*
51
52# Run for 30 seconds with sanitized output
53timeout 30s cargo run -- \
54 --sync-bootstrap-relay wss://git.shakespeare.diy \
55 --domain ngit.danconwaydev.com \
56 --git-path /tmp/ngit-test-git \
57 --relay-data-path /tmp/ngit-test-relay \
58 2>&1 | ./scripts/sanitize-logs.sh | tee sync-test.log
59```
60
61**Note:** The `timeout` command returns exit code 124, which is expected.
62
63## Log Sanitization
64
65Raw logs include full events and hundreds of event IDs per line, making them unwieldy for analysis. The sanitizer truncates long lines:
66
67```bash
68./scripts/sanitize-logs.sh < raw.log > sanitized.log
69
70# Or pipe directly
71cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh
72```
73
74**Options:**
75- `--head-chars N` - First N characters to show (default: 100)
76- `--tail-chars N` - Last N characters to show (default: 20)
77
78Example output:
79```
802024-01-09T10:00:00Z DEBUG sync: Processing events ids=[abc123, def456, ghi789, jkl012...<1847 chars>...xyz999, end123]
81```
82
83## What to Look For
84
85### Phase 1: Connection & Bootstrap (0-5 seconds)
86
87**Expected behavior:**
88- Connection to bootstrap relay succeeds
89- Layer 1 (announcement) subscription starts
90- First batch of 30617/30618 events received
91
92**Red flags:**
93- Connection timeout or failure
94- NIP-77 negentropy errors (should fall back gracefully)
95- Immediate rate limiting
96
97### Phase 2: Discovery Cascade (5-15 seconds)
98
99**Expected behavior:**
100- Self-subscriber batches fire as announcements are processed
101- New relays discovered from announcement `relays` tags
102- Layer 2 (repo tags) subscriptions created
103
104**Red flags:**
105- Excessive relay discovery (>10 relays rapidly)
106- Filter consolidation warnings (>70 filters)
107- Missing self-subscriber batch logs
108
109### Phase 3: Steady State (15+ seconds)
110
111**Expected behavior:**
112- Historic sync batches completing (EOSE received)
113- Periodic health checks running
114- Events being saved to database
115
116**Red flags:**
117- Pending batches never confirming
118- Repeated connection/disconnect cycles
119- Memory growth (check with `top` in another terminal)
120
121## Debugging Checklist
122
123When analyzing logs, look for these patterns:
124
125### Errors to Investigate
126
127| Pattern | Possible Cause | Action |
128|---------|----------------|--------|
129| `error` (any) | Unexpected failure | Investigate immediately |
130| `connection failed` | Network/relay issue | Check relay URL, try different relay |
131| `rate limit` | Too many requests | Check consolidation, increase backoff |
132| `negentropy` + `error` | NIP-77 incompatibility | Should fall back - verify it does |
133| `timeout` | Slow relay or large sync | Increase timeouts or reduce scope |
134
135### Warnings to Monitor
136
137| Pattern | Meaning | Action |
138|---------|---------|--------|
139| `consolidating filters` | Filter count high | Expected, but frequent = problem |
140| `backing off` | Health tracker retry | Normal, but watch for excessive |
141| `batch failed` | Historic sync incomplete | Check which batches, why |
142
143### Debug Patterns to Verify
144
145| Pattern | What it shows |
146|---------|---------------|
147| `fresh_start` | Full sync initiated |
148| `quick_reconnect` | Incremental sync (<15min gap) |
149| `historic sync complete` | Sync finished successfully |
150| `sync_live` | Live subscriptions active |
151| `PendingBatch` | Items awaiting EOSE confirmation |
152
153## Iterative Improvement Process
154
155### Step 1: Run and Capture
156
157```bash
158timeout 30s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-1.log
159```
160
161### Step 2: Identify Issues
162
163Scan logs for errors and unexpected patterns:
164```bash
165grep -i error iteration-1.log
166grep -i warn iteration-1.log
167grep -i panic iteration-1.log
168```
169
170### Step 3: Document Findings
171
172Add findings to this file's [Known Issues](#known-issues) section or create GitHub issues.
173
174### Step 4: Fix and Re-test
175
176After code changes, run again to verify the fix.
177
178### Step 5: Extend Duration
179
180Once 30-second runs are clean, extend to 2 minutes, then 5 minutes:
181```bash
182timeout 120s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-2.log
183```
184
185## Logging Improvements
186
187If the logs aren't helpful enough, improve them. Common needs:
188
189### Add Context to Existing Logs
190
191```rust
192// Before
193tracing::debug!("Processing events");
194
195// After
196tracing::debug!(
197 relay = %relay_url,
198 event_count = events.len(),
199 "Processing events"
200);
201```
202
203### Add New Log Points
204
205Key places that may need more logging:
206- `src/sync/mod.rs` - SyncManager state transitions
207- `src/sync/relay_connection.rs` - Connection lifecycle
208- `src/sync/self_subscriber.rs` - Batch processing
209
210### Reduce Noise
211
212If a log line appears too frequently:
213```rust
214// Change from debug! to trace!
215tracing::trace!("Per-event detail that's too noisy");
216```
217
218## Known Issues
219
220*Document issues discovered during testing here. Delete this section when empty.*
221
222### Template for New Issues
223
224```markdown
225### Issue: [Short description]
226
227**Discovered:** [Date]
228**Status:** [Open/Fixed in PR#xxx]
229
230**Symptoms:**
231- Log pattern observed
232
233**Root cause:**
234- [If known]
235
236**Fix:**
237- [If known]
238```
239
240---
241
242## Quick Reference
243
244### Minimal Test Command
245
246```bash
247timeout 30s cargo run -- \
248 --sync-bootstrap-relay wss://git.shakespeare.diy \
249 --domain ngit.danconwaydev.com \
250 --git-path /tmp/ngit-test-git \
251 --relay-data-path /tmp/ngit-test-relay \
252 2>&1 | ./scripts/sanitize-logs.sh
253```
254
255### With Metrics Endpoint
256
257```bash
258timeout 30s cargo run -- \
259 --sync-bootstrap-relay wss://git.shakespeare.diy \
260 --domain ngit.danconwaydev.com \
261 --git-path /tmp/ngit-test-git \
262 --relay-data-path /tmp/ngit-test-relay \
263 --metrics-address 127.0.0.1:9090 \
264 2>&1 | ./scripts/sanitize-logs.sh
265```
266
267Then in another terminal: `curl http://127.0.0.1:9090/metrics`
268
269### Different Log Level
270
271The default is DEBUG. For more detail:
272```bash
273RUST_LOG=trace cargo run -- [args]
274```
275
276For less noise:
277```bash
278RUST_LOG=info cargo run -- [args]
279```
280
281---
282
283*Part of the [ngit-grasp documentation](../README.md) using the [Diátaxis](https://diataxis.fr/) framework.*