# How-To: Test Sync Against Production Data
> **Quick Start Prompt:** Run a 30-second production sync test following docs/how-to/production-sync-testing.md. Use the minimal test command with sanitized output. Analyze the log for errors, warnings, and unexpected patterns. Document findings as individual markdown files in work/active-issues/ and suggest code fixes or logging improvements.
**Problem:** Debug and improve sync behavior using real-world data from production relays
**Difficulty:** Intermediate
**Time:** 30 minutes per iteration
## Overview
This guide helps you run ngit-grasp's sync system against production relays to discover unexpected errors, inefficiencies, and edge cases that don't appear in controlled tests.
**Why production testing matters:**
- Real data has inconsistencies, malformed events, and edge cases
- Production relays may behave differently (rate limiting, timeouts, partial NIP-77 support)
- Volume and patterns reveal performance bottlenecks
- Sync discovery leads to cascading subscriptions we can't predict in tests
## Prerequisites
- ngit-grasp compiles successfully (`cargo build`)
- Familiarity with [GRASP-02 Proactive Sync](../explanation/grasp-02-proactive-sync.md)
- Understanding of log levels and tracing
## Test Setup
### 1. Choose a Test Identity
Pick a domain with manageable sync volume. Smaller domains mean fewer repos to sync, making logs tractable.
**Recommended starting point:**
```bash
--domain ngit.danconwaydev.com
```
This domain has few repo announcements listing it, so sync stays manageable.
### 2. Choose a Bootstrap Relay
The bootstrap relay provides the initial set of announcements to discover repos:
```bash
--sync-bootstrap-relay-url wss://git.shakespeare.diy
```
### 3. Run with Time Limit
Start with short runs (30 seconds) to capture manageable log volumes. Run `cargo build` first so that compilation does not consume part of the time limit:
```bash
# Clear any existing data for clean state
rm -rf /tmp/ngit-test-*
# Run for 30 seconds with sanitized output
timeout 30s cargo run -- \
  --sync-bootstrap-relay-url wss://git.shakespeare.diy \
  --domain ngit.danconwaydev.com \
  --git-data-path /tmp/ngit-test-git \
  --relay-data-path /tmp/ngit-test-relay \
  2>&1 | ./scripts/sanitize-logs.sh | tee sync-test.log
```
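**Note:** When the time limit is reached, `timeout` exits with status 124, which is expected. Because the output is piped, the shell reports the exit status of the last command in the pipeline (`tee`) unless `set -o pipefail` is enabled.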
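To tell a run that hit the time limit apart from one that failed earlier, one option is to run the command without a pipe so that `$?` belongs to `timeout`, then sanitize the raw log afterwards. The sketch below assumes GNU `timeout`; the helper names `classify_exit` and `run_limited` are hypothetical and not part of ngit-grasp.

```shell
# Hypothetical helper: map an exit status from `timeout` to a short label.
classify_exit() {
  case "$1" in
    0)   echo "exited-before-limit" ;;
    124) echo "hit-time-limit" ;;   # GNU timeout: the command timed out
    *)   echo "failed:$1" ;;
  esac
}

# Hypothetical helper: run a command under a time limit and write its combined
# stdout/stderr to a file, so $? is the status of `timeout` itself.
# Usage: run_limited <duration> <raw-log-file> <command> [args...]
run_limited() {
  limit=$1
  out=$2
  shift 2
  timeout "$limit" "$@" >"$out" 2>&1
  classify_exit "$?"
}
```

For example, `run_limited 30s raw.log cargo run -- [args]` followed by `./scripts/sanitize-logs.sh < raw.log > sync-test.log` keeps both the label and the sanitized log.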
## Log Sanitization
Raw logs include full events and hundreds of event IDs per line, making them unwieldy for analysis. The sanitizer truncates long lines:
```bash
./scripts/sanitize-logs.sh < raw.log > sanitized.log
# Or pipe directly
cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh
```
**Options:**
- `--head-chars N` - First N characters to show (default: 200)
- `--tail-chars N` - Last N characters to show (default: 100)
Example output:
```
2024-01-09T10:00:00Z DEBUG sync: Processing events ids=[abc123, def456, ghi789, jkl012...<1847 chars>...xyz999, end123]
```
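The truncation shown in the example output can be approximated as follows. This is a minimal sketch of head/tail truncation, not the actual contents of `scripts/sanitize-logs.sh`; the function name `truncate_lines` is hypothetical, and the defaults simply mirror the documented `--head-chars` and `--tail-chars` values.

```shell
# Sketch only: keep the first and last N characters of long lines and replace
# the middle with a marker stating how many characters were omitted.
# Usage: truncate_lines [head_chars] [tail_chars] < raw.log
truncate_lines() {
  awk -v head_n="${1:-200}" -v tail_n="${2:-100}" '
    {
      n = length($0)
      # Lines that fit within head + tail pass through unchanged.
      if (n <= head_n + tail_n) { print; next }
      omitted = n - head_n - tail_n
      printf "%s...<%d chars>...%s\n", substr($0, 1, head_n), omitted, substr($0, n - tail_n + 1)
    }'
}
```

Note that `length` counts characters as the local awk implementation defines them, which may be bytes rather than Unicode characters.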
## What to Look For
### Phase 1: Connection & Bootstrap (0-5 seconds)
**Expected behavior:**
- Connection to bootstrap relay succeeds
- Layer 1 (announcement) subscription starts
- First batch of 30617/30618 events received
**Red flags:**
- Connection timeout or failure
- NIP-77 negentropy errors (should fall back gracefully)
- Immediate rate limiting
### Phase 2: Discovery Cascade (5-15 seconds)
**Expected behavior:**
- Self-subscriber batches fire as announcements are processed
- New relays discovered from announcement `relays` tags
- Layer 2 (repo tags) subscriptions created
**Red flags:**
- Excessive relay discovery (>10 relays rapidly)
- Filter consolidation warnings (>70 filters)
- Missing self-subscriber batch logs
### Phase 3: Steady State (15+ seconds)
**Expected behavior:**
- Historic sync batches completing (EOSE received)
- Periodic health checks running
- Events being saved to database
**Red flags:**
- Pending batches never confirming
- Repeated connection/disconnect cycles
- Memory growth (check with `top` in another terminal)
## Debugging Checklist
When analyzing logs, look for these patterns:
### Errors to Investigate
| Pattern | Possible Cause | Action |
|---------|----------------|--------|
| `error` (any) | Unexpected failure | Investigate immediately |
| `connection failed` | Network/relay issue | Check relay URL, try different relay |
| `rate limit` | Too many requests | Check consolidation, increase backoff |
| `negentropy` + `error` | NIP-77 incompatibility | Should fall back - verify it does |
| `timeout` | Slow relay or large sync | Increase timeouts or reduce scope |
### Warnings to Monitor
| Pattern | Meaning | Action |
|---------|---------|--------|
| `consolidating filters` | Filter count high | Expected occasionally; frequent occurrences indicate a problem |
| `backing off` | Health tracker retry | Normal; watch for excessive retries |
| `batch failed` | Historic sync incomplete | Check which batches failed and why |
### Debug Patterns to Verify
| Pattern | What it shows |
|---------|---------------|
| `fresh_start` | Full sync initiated |
| `quick_reconnect` | Incremental sync (gap under 15 minutes) |
| `historic sync complete` | Sync finished successfully |
| `sync_live` | Live subscriptions active |
| `PendingBatch` | Items awaiting EOSE confirmation |
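A quick way to see which checklist patterns occur in a run is to count matching lines per pattern. The sketch below uses the pattern strings from the tables above; the function name `pattern_summary` is hypothetical, and it assumes the log messages contain these strings literally.

```shell
# Count case-insensitive, fixed-string matches for each checklist pattern.
# Usage: pattern_summary <log-file>
pattern_summary() {
  log=$1
  for pat in 'connection failed' 'rate limit' 'timeout' 'consolidating filters' \
             'backing off' 'batch failed' 'fresh_start' 'quick_reconnect' \
             'historic sync complete' 'sync_live' 'PendingBatch'; do
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so the exit status is ignored here.
    count=$(grep -ciF -- "$pat" "$log" || true)
    printf '%6d  %s\n' "$count" "$pat"
  done
}
```

A zero next to `historic sync complete` or `sync_live` after a full run is as informative as a high count next to `rate limit`.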
## Iterative Improvement Process
### Step 1: Run and Capture
```bash
timeout 30s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-1.log
```
### Step 2: Identify Issues
Scan logs for errors and unexpected patterns:
```bash
grep -i error iteration-1.log
grep -i warn iteration-1.log
grep -i panic iteration-1.log
```
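When the same error repeats many times, grouping identical messages makes the distinct problems easier to see. The following sketch assumes each line starts with an RFC 3339 timestamp, as in the sanitized example output; the function name `top_messages` is hypothetical.

```shell
# List distinct matching lines with their counts, most frequent first.
# The sed step drops the leading timestamp so identical messages group together.
# Usage: top_messages <log-file> [extended-regex]
top_messages() {
  grep -iE -- "${2:-error|warn|panic}" "$1" |
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}T[^ ]+ +//' |
    sort | uniq -c | sort -rn
}
```

Messages that embed varying values (event IDs, relay URLs) will still appear as separate entries; a narrower regex passed as the second argument can help there.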
### Step 3: Document Findings
Create individual markdown files in `work/active-issues/` for each issue discovered:
```bash
# Example: Document a connection timeout issue
cat > work/active-issues/connection-timeout-bootstrap.md <<'EOF'
# Issue: Connection Timeout on Bootstrap Relay
**Discovered:** 2026-01-09
**Status:** Open
## Symptoms
- Connection to wss://git.shakespeare.diy fails after 10s timeout
- Log shows: `error: connection failed: timeout`
- Occurs 100% of time with this relay
## Root Cause
[To be determined]
## Proposed Fix
- Increase connection timeout from 10s to 30s for initial bootstrap
- Add retry logic with exponential backoff
- Consider fallback bootstrap relays
## Code Location
- `src/sync/relay_connection.rs:45` - connection timeout constant
EOF
```
**Why individual files?**
- Keeps the how-to guide clean and focused
- Prevents accidental commits of transient issues to tracked files
- Easy to delete resolved issues or archive important ones
- Each file can be worked on independently
### Step 4: Fix and Re-test
After code changes, run again to verify the fix.
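One way to check a fix is to compare match counts between the log captured before the change and the one captured after it, using runs of the same duration. This is a rough sketch; the function name `compare_runs` is hypothetical, and the counts are line matches for the same three words used in Step 2.

```shell
# Print before/after line counts for error, warn and panic, with the change.
# Usage: compare_runs <log-before-fix> <log-after-fix>
compare_runs() {
  for pat in error warn panic; do
    before=$(grep -ci -- "$pat" "$1" || true)
    after=$(grep -ci -- "$pat" "$2" || true)
    printf '%-6s %d -> %d (%+d)\n' "$pat" "$before" "$after" "$((after - before))"
  done
}
```

Because production relays return different data over time, small differences between runs are expected; look for a pattern disappearing rather than for exact numbers.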
### Step 5: Extend Duration
Once 30-second runs are clean, extend to 2 minutes, then 5 minutes:
```bash
timeout 120s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-2.log
```
## Logging Improvements
If the logs aren't helpful enough, improve them. Common needs:
### Add Context to Existing Logs
```rust
// Before
tracing::debug!("Processing events");
// After
tracing::debug!(
    relay = %relay_url,
    event_count = events.len(),
    "Processing events"
);
```
### Add New Log Points
Key places that may need more logging:
- `src/sync/mod.rs` - SyncManager state transitions
- `src/sync/relay_connection.rs` - Connection lifecycle
- `src/sync/self_subscriber.rs` - Batch processing
### Reduce Noise
If a log line appears too frequently:
```rust
// Change from debug! to trace!
tracing::trace!("Per-event detail that's too noisy");
```
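To decide which log points are candidates for demotion, it can help to rank sources by volume. The sketch below assumes the line layout from the sanitized example output (timestamp, level, then target followed by a colon); the function name `noisiest_sources` is hypothetical.

```shell
# Rank "LEVEL target:" pairs by number of log lines, most frequent first.
# Usage: noisiest_sources <log-file> [top-n]
noisiest_sources() {
  awk '{ print $2, $3 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

A `DEBUG` source that dominates the ranking is a likely candidate for `trace!`.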
## Active Issues
Issues discovered during production sync testing are tracked in `work/active-issues/` as individual markdown files.
**View current issues:**
```bash
ls work/active-issues/
```
**Create a new issue:**
```bash
# Use kebab-case filename describing the issue
cat > work/active-issues/[issue-name].md <<'EOF'
# Issue: [Short Description]
**Discovered:** [Date]
**Status:** Open
## Symptoms
- Log patterns observed
- Reproduction steps if known
## Root Cause
[To be determined / Known cause]
## Proposed Fix
- Suggested code changes
- Alternative approaches
## Code Location
- File paths and line numbers where changes are needed
EOF
```
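If issues are filed often, the template above can be wrapped in a small shell function. This is a sketch, not an existing project script: the function name `new_issue` and the `ISSUE_DIR` override are hypothetical, and the function refuses to overwrite an existing file.

```shell
# Hypothetical helper: create <dir>/<name>.md from the issue template.
# ISSUE_DIR is an assumed override for the target directory.
# Usage: new_issue <kebab-case-name> <short description>
new_issue() {
  dir=${ISSUE_DIR:-work/active-issues}
  file="$dir/$1.md"
  mkdir -p "$dir"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  cat > "$file" <<EOF
# Issue: $2
**Discovered:** $(date +%Y-%m-%d)
**Status:** Open
## Symptoms
- Log patterns observed
- Reproduction steps if known
## Root Cause
[To be determined]
## Proposed Fix
- Suggested code changes
## Code Location
- File paths and line numbers where changes are needed
EOF
  echo "$file"
}
```

For example, `new_issue connection-timeout-bootstrap "Connection Timeout on Bootstrap Relay"` prints the path of the file it created.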
**Resolve an issue:**
```bash
# After fixing, either delete or move to archive
rm work/active-issues/resolved-issue.md
# OR
mv work/active-issues/important-issue.md docs/archive/2026-01-09-important-issue.md
```
---
## Quick Reference
### Minimal Test Command
```bash
timeout 30s cargo run -- \
  --sync-bootstrap-relay-url wss://git.shakespeare.diy \
  --domain ngit.danconwaydev.com \
  --git-data-path /tmp/ngit-test-git \
  --relay-data-path /tmp/ngit-test-relay \
  2>&1 | ./scripts/sanitize-logs.sh
```
### With Metrics Endpoint
```bash
timeout 30s cargo run -- \
  --sync-bootstrap-relay-url wss://git.shakespeare.diy \
  --domain ngit.danconwaydev.com \
  --git-data-path /tmp/ngit-test-git \
  --relay-data-path /tmp/ngit-test-relay \
  --metrics-address 127.0.0.1:9090 \
  2>&1 | ./scripts/sanitize-logs.sh
```
Then in another terminal: `curl http://127.0.0.1:9090/metrics`
### Different Log Level
The default is DEBUG. For more detail:
```bash
RUST_LOG=trace cargo run -- [args]
```
For less noise:
```bash
RUST_LOG=info cargo run -- [args]
```
---
*Part of the [ngit-grasp documentation](../README.md) using the [Diátaxis](https://diataxis.fr/) framework.*