upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

author DanConwayDev <DanConwayDev@protonmail.com> 2026-01-09 22:23:18 +0000
committer DanConwayDev <DanConwayDev@protonmail.com> 2026-01-09 22:30:41 +0000
commit f1579d1c099869de67b1741b7775cbf651b34ef0
tree 36866d1d2ee998a6313b292bc27e14ca45f491c5
parent ab43f7264a05f694f4a7b1f4a09f8add972381ab
docs: update production sync testing workflow to two-mode process
- Mode 1: Fix one existing issue, test, commit, report
- Mode 2: Discover new issues with minimal documentation
- Emphasize stopping after each cycle
- Remove detailed investigation requirements
- Simplify issue documentation format
-rw-r--r-- .gitignore 3
-rw-r--r-- docs/how-to/production-sync-testing.md 290
2 files changed, 213 insertions, 80 deletions
diff --git a/.gitignore b/.gitignore
index 33879b8..294e4b4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -9,6 +9,9 @@ grasp-audit/target
 work/*
 !work/README.md
 
+# Test runs (production sync testing data and logs)
+tmp/
+
 # Environment and data
 .env
 data/
diff --git a/docs/how-to/production-sync-testing.md b/docs/how-to/production-sync-testing.md
index b0f93b0..d5c11ea 100644
--- a/docs/how-to/production-sync-testing.md
+++ b/docs/how-to/production-sync-testing.md
@@ -1,11 +1,38 @@
 # How-To: Test Sync Against Production Data
 
-> **Quick Start Prompt:** Run a 30-second production sync test following docs/how-to/production-sync-testing.md. Use the minimal test command with sanitized output. Analyze the log for errors, warnings, and unexpected patterns. Document findings as individual markdown files in work/active-issues/ and suggest code fixes or logging improvements.
+> **Quick Start Prompt:** Check work/active-issues/ for existing issues. If issues exist, pick the most important, fix it, test with cargo test, run clippy and fmt, commit, and report back with a brief 1-2 sentence summary of each issue you identified. If no issues exist, run a 30-second production sync test, analyze logs, create individual issue files in work/active-issues/ (one per issue with minimal description), then report summary listing each issue in 1-2 sentences.
 
 **Problem:** Debug and improve sync behavior using real-world data from production relays
 **Difficulty:** Intermediate
 **Time:** 30 minutes per iteration
 
+## Two-Mode Workflow
+
+This guide operates in two modes:
+
+### Mode 1: Fix Existing Issues
+**When:** There are files in `work/active-issues/` (excluding README.md)
+
+1. Check for active issues: `ls work/active-issues/`
+2. Pick the most important issue to fix
+3. Implement the fix
+4. Run `cargo test` to verify tests pass
+5. Run `cargo clippy` to check for warnings
+6. Run `cargo fmt` to format code
+7. Commit changes with descriptive message
+8. Report back - **DO NOT** do another issue or run more tests
+
+### Mode 2: Discover New Issues
+**When:** No active issues in `work/active-issues/`
+
+1. Run 30-second production sync test (logs saved to `tmp/run-{timestamp}/`)
+2. Analyze logs for errors, warnings, unexpected patterns
+3. Document each issue as a separate markdown file in `work/active-issues/`
+4. Keep issue files minimal - just enough to identify the issue
+5. Report brief summary listing each issue in 1-2 sentences
+6. **DO NOT** create separate detailed analysis files
+7. **DO NOT** do thorough investigation or root cause analysis
+
 ## Overview
 
 This guide helps you run ngit-grasp's sync system against production relays to discover unexpected errors, inefficiencies, and edge cases that don't appear in controlled tests.
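
The mode choice introduced in this hunk depends only on whether `work/active-issues/` holds any markdown file other than `README.md`. A minimal sketch of that check follows, assuming bash and a `find` that supports `-maxdepth` (GNU or BSD); `select_mode` is a hypothetical helper name and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print "mode1" when at least one
# issue file exists, otherwise "mode2".
select_mode() {
  local issues_dir="${1:-work/active-issues}"
  local count
  # Count *.md files directly inside the directory, ignoring README.md.
  # A missing directory is treated the same as an empty one.
  count=$(find "$issues_dir" -maxdepth 1 -type f -name '*.md' ! -name 'README.md' 2>/dev/null | wc -l | tr -d ' ')
  if [ "$count" -gt 0 ]; then
    echo "mode1"
  else
    echo "mode2"
  fi
}
```

Calling `select_mode` with no argument checks the default `work/active-issues` path relative to the current directory.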
@@ -45,23 +72,33 @@ The bootstrap relay provides the initial set of announcements to discover repos:
 
 ### 3. Run with Time Limit
 
-Start with short runs (30 seconds) to capture manageable log volumes:
+Start with short runs (30 seconds) to capture manageable log volumes. Each run creates its own subdirectory in `tmp/` to keep data and logs isolated:
 
 ```bash
-# Clear any existing data for clean state
-rm -rf /tmp/ngit-test-*
+# Create run directory with timestamp
+RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)"
+mkdir -p "$RUN_DIR"
 
 # Run for 30 seconds with sanitized output
 timeout 30s cargo run -- \
   --sync-bootstrap-relay-url wss://git.shakespeare.diy \
   --domain ngit.danconwaydev.com \
-  --git-data-path /tmp/ngit-test-git \
-  --relay-data-path /tmp/ngit-test-relay \
-  2>&1 | ./scripts/sanitize-logs.sh | tee sync-test.log
+  --git-data-path "$RUN_DIR/git-data" \
+  --relay-data-path "$RUN_DIR/relay-data" \
+  2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log"
 ```
 
 **Note:** The `timeout` command returns exit code 124, which is expected.
 
+**Directory structure after run:**
+```
+tmp/
+└── run-20260109-143022/
+    ├── git-data/      # Git repository data
+    ├── relay-data/    # Relay database
+    └── sync.log       # Sanitized log output
+```
+
 ## Log Sanitization
 
 Raw logs include full events and hundreds of event IDs per line, making them unwieldy for analysis. The sanitizer truncates long lines:
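
One detail about the exit-code note in this hunk: the run command is a pipeline, so `$?` reports the status of the last stage (`tee`), not of `timeout`. In bash, the 124 from `timeout` is available through `PIPESTATUS[0]`. The sketch below illustrates this under stated assumptions: `run_with_limit` is a hypothetical wrapper, and `cat` stands in for `./scripts/sanitize-logs.sh`, whose contents are not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not in the repository): run a command under `timeout`,
# pipe its output through a filter into a log file, and return the exit status
# of `timeout` itself rather than that of the last pipeline stage.
# Usage: run_with_limit <duration> <log-file> <command> [args...]
run_with_limit() {
  local limit="$1" log="$2"
  shift 2
  # `cat` is a stand-in for the guide's sanitizer script.
  timeout "$limit" "$@" 2>&1 | cat | tee "$log"
  # PIPESTATUS[0] holds the status of the first stage; 124 means the limit was hit.
  return "${PIPESTATUS[0]}"
}
```

With the guide's own command, the equivalent check is to read `${PIPESTATUS[0]}` immediately after the pipeline finishes.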
@@ -152,73 +189,132 @@ When analyzing logs, look for these patterns:
 | `sync_live` | Live subscriptions active |
 | `PendingBatch` | Items awaiting EOSE confirmation |
 
-## Iterative Improvement Process
+## Mode 1: Fix Existing Issues (Detailed)
+
+When `work/active-issues/` contains issue files:
+
+### Step 1: Check for Active Issues
+
+```bash
+ls work/active-issues/
+```
+
+If any `.md` files exist (excluding README.md), you're in Mode 1.
+
+### Step 2: Pick Most Important Issue
+
+Review issue files and select based on:
+- Severity (errors > warnings > log quality)
+- Impact (functionality > performance > UX)
+- Complexity (quick fixes first to clear backlog)
+
+### Step 3: Implement the Fix
+
+Make the necessary code changes based on the issue description.
 
-### Step 1: Run and Capture
+### Step 4: Test, Lint, Format
 
 ```bash
-timeout 30s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-1.log
+# Run tests
+cargo test
+
+# Check for warnings
+cargo clippy
+
+# Format code
+cargo fmt
 ```
 
-### Step 2: Identify Issues
+### Step 5: Commit
 
-Scan logs for errors and unexpected patterns:
 ```bash
-grep -i error iteration-1.log
-grep -i warn iteration-1.log
-grep -i panic iteration-1.log
+git add .
+git commit -m "fix: [brief description of what was fixed]"
 ```
 
-### Step 3: Document Findings
+### Step 6: Report Back
+
+**STOP HERE.** Report what was fixed. Do NOT:
+- Fix another issue
+- Run production sync test
+- Do additional investigation
+
+The workflow will cycle back through Mode 1 if more issues remain.
+
+## Mode 2: Discover New Issues (Detailed)
+
+When `work/active-issues/` is empty (or only contains README.md):
 
-Create individual markdown files in `work/active-issues/` for each issue discovered:
+### Step 1: Run Production Sync Test
 
 ```bash
-# Example: Document a connection timeout issue
-cat > work/active-issues/connection-timeout-bootstrap.md <<'EOF'
-# Issue: Connection Timeout on Bootstrap Relay
+# Create run directory with timestamp
+RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)"
+mkdir -p "$RUN_DIR"
 
-**Discovered:** 2026-01-09
-**Status:** Open
+# Run 30-second test
+timeout 30s cargo run -- \
+  --sync-bootstrap-relay-url wss://git.shakespeare.diy \
+  --domain ngit.danconwaydev.com \
+  --git-data-path "$RUN_DIR/git-data" \
+  --relay-data-path "$RUN_DIR/relay-data" \
+  2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log"
+```
 
-## Symptoms
+Each run is isolated in its own timestamped directory under `tmp/`, keeping data and logs organized.
 
-- Connection to wss://git.shakespeare.diy fails after 10s timeout
-- Log shows: `error: connection failed: timeout`
-- Occurs 100% of time with this relay
+### Step 2: Analyze Logs
 
-## Root Cause
+Scan for errors and unexpected patterns:
+```bash
+# Find the most recent run
+LATEST_RUN=$(ls -1t tmp/run-*/sync.log | head -n1)
 
-[To be determined]
+# Analyze for issues
+grep -i error "$LATEST_RUN"
+grep -i warn "$LATEST_RUN"
+grep -i panic "$LATEST_RUN"
+```
 
-## Proposed Fix
+### Step 3: Document Issues
 
-- Increase connection timeout from 10s to 30s for initial bootstrap
-- Add retry logic with exponential backoff
-- Consider fallback bootstrap relays
+Create **one markdown file per issue** in `work/active-issues/`:
 
-## Code Location
+```bash
+# Example: Minimal issue documentation
+cat > work/active-issues/bootstrap-disconnect.md <<'EOF'
+# Bootstrap relay disconnects when empty
 
-- `src/sync/relay_connection.rs:45` - connection timeout constant
+Bootstrap relay wss://git.shakespeare.diy disconnects after sync finds 0 events. Should persist since user-specified.
+
+Log: "Disconnecting empty relay relay=wss://git.shakespeare.diy"
+File: src/sync/mod.rs (check_disconnects function)
 EOF
 ```
 
-**Why individual files?**
-- Keeps the how-to guide clean and focused
-- Prevents accidental commits of transient issues to tracked files
-- Easy to delete resolved issues or archive important ones
-- Each file can be worked on independently
+**Keep each file brief:**
+- Descriptive title (one line)
+- What happens (1-2 sentences max)
+- Relevant log excerpt (one line)
+- File/function location if obvious (one line)
+- **NO** separate detailed analysis files
+- **NO** root cause analysis
+- **NO** proposed solutions (unless immediately obvious)
 
-### Step 4: Fix and Re-test
+### Step 4: Report Summary
 
-After code changes, run again to verify the fix.
+Provide a brief closing message with 1-2 sentence summary of **each issue** identified:
+- State what the issue is
+- Where it occurs (file/component)
+- Keep it concise
 
-### Step 5: Extend Duration
+**STOP HERE.** Do NOT:
+- Fix the issues immediately
+- Create separate detailed analysis markdown files
+- Do thorough investigations
+- Write lengthy explanations
 
-Once 30-second runs are clean, extend to 2 minutes, then 5 minutes:
-```bash
-timeout 120s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-2.log
-```
+The workflow will cycle back through Mode 1 to fix issues one at a time.
 
 ## Logging Improvements
 
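
The log-analysis step added in this hunk picks the newest `sync.log` and greps it three times. A compact variant that prints one count per pattern might look like the sketch below. It assumes the `tmp/run-<timestamp>/sync.log` layout from the guide and relies on the timestamped directory names sorting chronologically; `summarize_latest_run` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): report how many lines of the
# newest sync.log match each pattern of interest.
summarize_latest_run() {
  local base="${1:-tmp}" latest="" f pattern
  # Glob results are sorted, and run-YYYYmmdd-HHMMSS names sort chronologically,
  # so the last existing match is the newest run.
  for f in "$base"/run-*/sync.log; do
    if [ -e "$f" ]; then
      latest="$f"
    fi
  done
  if [ -z "$latest" ]; then
    echo "no runs found under $base" >&2
    return 1
  fi
  echo "log: $latest"
  for pattern in error warn panic; do
    # grep -c prints the number of matching lines, including 0.
    echo "$pattern: $(grep -ci "$pattern" "$latest")"
  done
}
```

The counts are matching lines, not occurrences, which is usually enough to decide whether a log deserves a closer look.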
@@ -253,52 +349,66 @@ If a log line appears too frequently:
 tracing::trace!("Per-event detail that's too noisy");
 ```
 
-## Active Issues
+## Managing Active Issues
 
-Issues discovered during production sync testing are tracked in `work/active-issues/` as individual markdown files.
+Issues are tracked in `work/active-issues/` as individual markdown files.
 
-**View current issues:**
+**Check for active issues:**
 ```bash
 ls work/active-issues/
 ```
 
-**Create a new issue:**
+**After fixing an issue:**
 ```bash
-# Use kebab-case filename describing the issue
-cat > work/active-issues/[issue-name].md <<'EOF'
-# Issue: [Short Description]
+# Delete the resolved issue file
+rm work/active-issues/issue-name.md
 
-**Discovered:** [Date]
-**Status:** Open
-
-## Symptoms
+# Or archive if important for future reference
+mv work/active-issues/issue-name.md docs/archive/2026-01-09-issue-name.md
+```
 
-- Log patterns observed
-- Reproduction steps if known
+**Issue file format (minimal):**
+```markdown
+# Brief title
 
-## Root Cause
+What happens (1-2 sentences).
 
-[To be determined / Known cause]
+Log evidence: "relevant log line"
+File: src/path/to/file.rs (function_name if known)
+```
 
-## Proposed Fix
+Keep documentation minimal - just enough to identify and locate the issue.
 
-- Suggested code changes
-- Alternative approaches
+---
 
-## Code Location
+## Workflow Summary
 
-- File paths and line numbers where changes are needed
-EOF
 ```
-
-**Resolve an issue:**
-```bash
-# After fixing, either delete or move to archive
-rm work/active-issues/resolved-issue.md
-# OR
-mv work/active-issues/important-issue.md docs/archive/2026-01-09-important-issue.md
+Check work/active-issues/
+
+    ├─ Has issues? ──► Mode 1: Pick one issue
+    │                      │
+    │                      ├─ Fix code
+    │                      ├─ cargo test
+    │                      ├─ cargo clippy
+    │                      ├─ cargo fmt
+    │                      ├─ git commit
+    │                      └─ Report & STOP
+
+    └─ No issues? ──► Mode 2: Run production sync
+
+                           ├─ timeout 30s cargo run ...
+                           ├─ Analyze logs
+                           ├─ Document issues (minimal)
+                           └─ Report summary & STOP
 ```
 
+**Key Rules:**
+- Only do ONE thing per cycle (fix one issue OR discover issues)
+- Always stop after reporting
+- Keep issue documentation minimal
+- No root cause analysis during discovery
+
 ---
 
 ## Quick Reference
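
The minimal issue format introduced in this hunk (title, one or two sentences, one log line, one location) is regular enough to generate from a small function. The sketch below is one possible shape, not something the repository provides: `new_issue` and its argument order are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): write one minimal issue file.
# Usage: new_issue <kebab-case-name> <title> <description> <log-line> <location> [issues-dir]
new_issue() {
  local name="$1" title="$2" description="$3" log_line="$4" location="$5"
  local dir="${6:-work/active-issues}"
  local path="$dir/$name.md"
  mkdir -p "$dir"
  # Refuse to overwrite an issue that is already being tracked.
  if [ -e "$path" ]; then
    echo "issue already exists: $path" >&2
    return 1
  fi
  # Same four parts as the minimal format: title, description, log evidence, location.
  printf '# %s\n\n%s\n\nLog evidence: "%s"\nFile: %s\n' \
    "$title" "$description" "$log_line" "$location" > "$path"
  echo "$path"
}
```

Because the function prints the path it wrote, the result can be listed directly in the closing summary.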
@@ -306,28 +416,48 @@ mv work/active-issues/important-issue.md docs/archive/2026-01-09-important-issue
 ### Minimal Test Command
 
 ```bash
+# Create run directory
+RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)"
+mkdir -p "$RUN_DIR"
+
+# Run test
 timeout 30s cargo run -- \
   --sync-bootstrap-relay-url wss://git.shakespeare.diy \
   --domain ngit.danconwaydev.com \
-  --git-data-path /tmp/ngit-test-git \
-  --relay-data-path /tmp/ngit-test-relay \
-  2>&1 | ./scripts/sanitize-logs.sh
+  --git-data-path "$RUN_DIR/git-data" \
+  --relay-data-path "$RUN_DIR/relay-data" \
+  2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log"
 ```
 
 ### With Metrics Endpoint
 
 ```bash
+# Create run directory
+RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)"
+mkdir -p "$RUN_DIR"
+
+# Run with metrics
 timeout 30s cargo run -- \
   --sync-bootstrap-relay-url wss://git.shakespeare.diy \
   --domain ngit.danconwaydev.com \
-  --git-data-path /tmp/ngit-test-git \
-  --relay-data-path /tmp/ngit-test-relay \
+  --git-data-path "$RUN_DIR/git-data" \
+  --relay-data-path "$RUN_DIR/relay-data" \
   --metrics-address 127.0.0.1:9090 \
-  2>&1 | ./scripts/sanitize-logs.sh
+  2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log"
 ```
 
 Then in another terminal: `curl http://127.0.0.1:9090/metrics`
 
+### Cleanup Old Runs
+
+```bash
+# Remove runs older than 7 days
+find tmp/run-* -type d -mtime +7 -exec rm -rf {} +
+
+# Remove all test runs
+rm -rf tmp/run-*
+```
+
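
A note on the cleanup command added above: `find tmp/run-* -type d` also descends into each run's `git-data/` and `relay-data/` directories and tests every nested directory against `-mtime`. A variant that only considers the top-level run directories might look like the sketch below. It assumes a `find` with `-maxdepth` (GNU or BSD); `prune_runs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): delete run directories whose
# modification time is more than <days> days old.
# Usage: prune_runs [base-dir] [days]
prune_runs() {
  local base="${1:-tmp}" days="${2:-7}"
  # -maxdepth 1 restricts matching to the run-* directories themselves, so
  # nested git-data/ and relay-data/ directories are never tested on their own.
  find "$base" -maxdepth 1 -type d -name 'run-*' -mtime "+$days" -exec rm -rf {} +
}
```

Passing the pattern through `-name` instead of a shell glob also avoids handing `find` a literal `tmp/run-*` argument when no runs exist yet.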
 ### Different Log Level
 
 The default is DEBUG. For more detail: