| author | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-09 22:23:18 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-09 22:30:41 +0000 |
| commit | f1579d1c099869de67b1741b7775cbf651b34ef0 (patch) | |
| tree | 36866d1d2ee998a6313b292bc27e14ca45f491c5 | |
| parent | ab43f7264a05f694f4a7b1f4a09f8add972381ab (diff) | |
docs: update production sync testing workflow to two-mode process
- Mode 1: Fix one existing issue, test, commit, report
- Mode 2: Discover new issues with minimal documentation
- Emphasize stopping after each cycle
- Remove detailed investigation requirements
- Simplify issue documentation format
| -rw-r--r-- | .gitignore | 3 |
| -rw-r--r-- | docs/how-to/production-sync-testing.md | 290 |
2 files changed, 213 insertions, 80 deletions
| @@ -9,6 +9,9 @@ grasp-audit/target |
| 9 | work/* | 9 | work/* |
| 10 | !work/README.md | 10 | !work/README.md |
| 11 | 11 | ||
| 12 | # Test runs (production sync testing data and logs) | ||
| 13 | tmp/ | ||
| 14 | |||
| 12 | # Environment and data | 15 | # Environment and data |
| 13 | .env | 16 | .env |
| 14 | data/ | 17 | data/ |
diff --git a/docs/how-to/production-sync-testing.md b/docs/how-to/production-sync-testing.md
index b0f93b0..d5c11ea 100644
--- a/docs/how-to/production-sync-testing.md
+++ b/docs/how-to/production-sync-testing.md
| @@ -1,11 +1,38 @@ |
| 1 | # How-To: Test Sync Against Production Data | 1 | # How-To: Test Sync Against Production Data |
| 2 | 2 | ||
| 3 | > **Quick Start Prompt:** Run a 30-second production sync test following docs/how-to/production-sync-testing.md. Use the minimal test command with sanitized output. Analyze the log for errors, warnings, and unexpected patterns. Document findings as individual markdown files in work/active-issues/ and suggest code fixes or logging improvements. | 3 | > **Quick Start Prompt:** Check work/active-issues/ for existing issues. If issues exist, pick the most important, fix it, test with cargo test, run clippy and fmt, commit, and report back with a brief 1-2 sentence summary of each issue you identified. If no issues exist, run a 30-second production sync test, analyze logs, create individual issue files in work/active-issues/ (one per issue with minimal description), then report summary listing each issue in 1-2 sentences. |
| 4 | 4 | ||
| 5 | **Problem:** Debug and improve sync behavior using real-world data from production relays | 5 | **Problem:** Debug and improve sync behavior using real-world data from production relays |
| 6 | **Difficulty:** Intermediate | 6 | **Difficulty:** Intermediate |
| 7 | **Time:** 30 minutes per iteration | 7 | **Time:** 30 minutes per iteration |
| 8 | 8 | ||
| 9 | ## Two-Mode Workflow | ||
| 10 | |||
| 11 | This guide operates in two modes: | ||
| 12 | |||
| 13 | ### Mode 1: Fix Existing Issues | ||
| 14 | **When:** There are files in `work/active-issues/` (excluding README.md) | ||
| 15 | |||
| 16 | 1. Check for active issues: `ls work/active-issues/` | ||
| 17 | 2. Pick the most important issue to fix | ||
| 18 | 3. Implement the fix | ||
| 19 | 4. Run `cargo test` to verify tests pass | ||
| 20 | 5. Run `cargo clippy` to check for warnings | ||
| 21 | 6. Run `cargo fmt` to format code | ||
| 22 | 7. Commit changes with descriptive message | ||
| 23 | 8. Report back - **DO NOT** do another issue or run more tests | ||
| 24 | |||
| 25 | ### Mode 2: Discover New Issues | ||
| 26 | **When:** No active issues in `work/active-issues/` | ||
| 27 | |||
| 28 | 1. Run 30-second production sync test (logs saved to `tmp/run-{timestamp}/`) | ||
| 29 | 2. Analyze logs for errors, warnings, unexpected patterns | ||
| 30 | 3. Document each issue as a separate markdown file in `work/active-issues/` | ||
| 31 | 4. Keep issue files minimal - just enough to identify the issue | ||
| 32 | 5. Report brief summary listing each issue in 1-2 sentences | ||
| 33 | 6. **DO NOT** create separate detailed analysis files | ||
| 34 | 7. **DO NOT** do thorough investigation or root cause analysis | ||
| 35 | |||
| 9 | ## Overview | 36 | ## Overview |
| 10 | 37 | ||
| 11 | This guide helps you run ngit-grasp's sync system against production relays to discover unexpected errors, inefficiencies, and edge cases that don't appear in controlled tests. | 38 | This guide helps you run ngit-grasp's sync system against production relays to discover unexpected errors, inefficiencies, and edge cases that don't appear in controlled tests. |
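The new two-mode workflow is selected by a single check: does `work/active-issues/` contain any `.md` file other than `README.md`? The following is a minimal shell sketch of that check. `detect_mode` is a hypothetical helper that this commit does not add. The sketch assumes issue files always use the `.md` extension, as the Mode 1 section describes.

```bash
# Hypothetical helper, not part of this commit: print "mode1" if the issues
# directory holds any .md file other than README.md, otherwise "mode2".
detect_mode() {
  local dir="${1:-work/active-issues}"
  local f
  for f in "$dir"/*.md; do
    # An unmatched glob stays literal, so skip paths that do not exist.
    [ -e "$f" ] || continue
    [ "$(basename "$f")" = "README.md" ] && continue
    echo "mode1"
    return 0
  done
  echo "mode2"
}
```

Called with no argument, it inspects `work/active-issues/`. A driver script could therefore branch on `$(detect_mode)` before deciding whether to fix an issue or run the 30-second sync test.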
| @@ -45,23 +72,33 @@ The bootstrap relay provides the initial set of announcements to discover repos: |
| 45 | 72 | ||
| 46 | ### 3. Run with Time Limit | 73 | ### 3. Run with Time Limit |
| 47 | 74 | ||
| 48 | Start with short runs (30 seconds) to capture manageable log volumes: | 75 | Start with short runs (30 seconds) to capture manageable log volumes. Each run creates its own subdirectory in `tmp/` to keep data and logs isolated: |
| 49 | 76 | ||
| 50 | ```bash | 77 | ```bash |
| 51 | # Clear any existing data for clean state | 78 | # Create run directory with timestamp |
| 52 | rm -rf /tmp/ngit-test-* | 79 | RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)" |
| 80 | mkdir -p "$RUN_DIR" | ||
| 53 | 81 | ||
| 54 | # Run for 30 seconds with sanitized output | 82 | # Run for 30 seconds with sanitized output |
| 55 | timeout 30s cargo run -- \ | 83 | timeout 30s cargo run -- \ |
| 56 | --sync-bootstrap-relay-url wss://git.shakespeare.diy \ | 84 | --sync-bootstrap-relay-url wss://git.shakespeare.diy \ |
| 57 | --domain ngit.danconwaydev.com \ | 85 | --domain ngit.danconwaydev.com \ |
| 58 | --git-data-path /tmp/ngit-test-git \ | 86 | --git-data-path "$RUN_DIR/git-data" \ |
| 59 | --relay-data-path /tmp/ngit-test-relay \ | 87 | --relay-data-path "$RUN_DIR/relay-data" \ |
| 60 | 2>&1 | ./scripts/sanitize-logs.sh | tee sync-test.log | 88 | 2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log" |
| 61 | ``` | 89 | ``` |
| 62 | 90 | ||
| 63 | **Note:** The `timeout` command returns exit code 124, which is expected. | 91 | **Note:** The `timeout` command returns exit code 124, which is expected. |
| 64 | 92 | ||
| 93 | **Directory structure after run:** | ||
| 94 | ``` | ||
| 95 | tmp/ | ||
| 96 | └── run-20260109-143022/ | ||
| 97 | ├── git-data/ # Git repository data | ||
| 98 | ├── relay-data/ # Relay database | ||
| 99 | └── sync.log # Sanitized log output | ||
| 100 | ``` | ||
| 101 | |||
| 65 | ## Log Sanitization | 102 | ## Log Sanitization |
| 66 | 103 | ||
| 67 | Raw logs include full events and hundreds of event IDs per line, making them unwieldy for analysis. The sanitizer truncates long lines: | 104 | Raw logs include full events and hundreds of event IDs per line, making them unwieldy for analysis. The sanitizer truncates long lines: |
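The per-run directory recipe uses `mkdir -p "$RUN_DIR"`. That call silently reuses an existing directory, so two runs started within the same second would share data and logs. The following is a small sketch that fails in that case instead. `new_run_dir` is a hypothetical helper; it keeps the `tmp/run-YYYYmmdd-HHMMSS` naming from the diff.

```bash
# Hypothetical helper: create tmp/run-YYYYmmdd-HHMMSS (or the same name under
# another base directory) and print its path. Unlike `mkdir -p "$RUN_DIR"`,
# it refuses to reuse a run directory that already exists.
new_run_dir() {
  local base="${1:-tmp}"
  local dir
  dir="$base/run-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$base" || return 1
  # mkdir without -p fails if another run already created this directory.
  mkdir "$dir" || return 1
  printf '%s\n' "$dir"
}
```

To use it, run `RUN_DIR=$(new_run_dir) || exit 1`. Then pass `"$RUN_DIR/git-data"` and `"$RUN_DIR/relay-data"` to the same `cargo run` flags as before.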
| @@ -152,73 +189,132 @@ When analyzing logs, look for these patterns: |
| 152 | | `sync_live` | Live subscriptions active | | 189 | | `sync_live` | Live subscriptions active | |
| 153 | | `PendingBatch` | Items awaiting EOSE confirmation | | 190 | | `PendingBatch` | Items awaiting EOSE confirmation | |
| 154 | 191 | ||
| 155 | ## Iterative Improvement Process | 192 | ## Mode 1: Fix Existing Issues (Detailed) |
| 193 | |||
| 194 | When `work/active-issues/` contains issue files: | ||
| 195 | |||
| 196 | ### Step 1: Check for Active Issues | ||
| 197 | |||
| 198 | ```bash | ||
| 199 | ls work/active-issues/ | ||
| 200 | ``` | ||
| 201 | |||
| 202 | If any `.md` files exist (excluding README.md), you're in Mode 1. | ||
| 203 | |||
| 204 | ### Step 2: Pick Most Important Issue | ||
| 205 | |||
| 206 | Review issue files and select based on: | ||
| 207 | - Severity (errors > warnings > log quality) | ||
| 208 | - Impact (functionality > performance > UX) | ||
| 209 | - Complexity (quick fixes first to clear backlog) | ||
| 210 | |||
| 211 | ### Step 3: Implement the Fix | ||
| 212 | |||
| 213 | Make the necessary code changes based on the issue description. | ||
| 156 | 214 | ||
| 157 | ### Step 1: Run and Capture | 215 | ### Step 4: Test, Lint, Format |
| 158 | 216 | ||
| 159 | ```bash | 217 | ```bash |
| 160 | timeout 30s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-1.log | 218 | # Run tests |
| 219 | cargo test | ||
| 220 | |||
| 221 | # Check for warnings | ||
| 222 | cargo clippy | ||
| 223 | |||
| 224 | # Format code | ||
| 225 | cargo fmt | ||
| 161 | ``` | 226 | ``` |
| 162 | 227 | ||
| 163 | ### Step 2: Identify Issues | 228 | ### Step 5: Commit |
| 164 | 229 | ||
| 165 | Scan logs for errors and unexpected patterns: | ||
| 166 | ```bash | 230 | ```bash |
| 167 | grep -i error iteration-1.log | 231 | git add . |
| 168 | grep -i warn iteration-1.log | 232 | git commit -m "fix: [brief description of what was fixed]" |
| 169 | grep -i panic iteration-1.log | ||
| 170 | ``` | 233 | ``` |
| 171 | 234 | ||
| 172 | ### Step 3: Document Findings | 235 | ### Step 6: Report Back |
| 236 | |||
| 237 | **STOP HERE.** Report what was fixed. Do NOT: | ||
| 238 | - Fix another issue | ||
| 239 | - Run production sync test | ||
| 240 | - Do additional investigation | ||
| 241 | |||
| 242 | The workflow will cycle back through Mode 1 if more issues remain. | ||
| 243 | |||
| 244 | ## Mode 2: Discover New Issues (Detailed) | ||
| 245 | |||
| 246 | When `work/active-issues/` is empty (or only contains README.md): | ||
| 173 | 247 | ||
| 174 | Create individual markdown files in `work/active-issues/` for each issue discovered: | 248 | ### Step 1: Run Production Sync Test |
| 175 | 249 | ||
| 176 | ```bash | 250 | ```bash |
| 177 | # Example: Document a connection timeout issue | 251 | # Create run directory with timestamp |
| 178 | cat > work/active-issues/connection-timeout-bootstrap.md <<'EOF' | 252 | RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)" |
| 179 | # Issue: Connection Timeout on Bootstrap Relay | 253 | mkdir -p "$RUN_DIR" |
| 180 | 254 | ||
| 181 | **Discovered:** 2026-01-09 | 255 | # Run 30-second test |
| 182 | **Status:** Open | 256 | timeout 30s cargo run -- \ |
| 257 | --sync-bootstrap-relay-url wss://git.shakespeare.diy \ | ||
| 258 | --domain ngit.danconwaydev.com \ | ||
| 259 | --git-data-path "$RUN_DIR/git-data" \ | ||
| 260 | --relay-data-path "$RUN_DIR/relay-data" \ | ||
| 261 | 2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log" | ||
| 262 | ``` | ||
| 183 | 263 | ||
| 184 | ## Symptoms | 264 | Each run is isolated in its own timestamped directory under `tmp/`, keeping data and logs organized. |
| 185 | 265 | ||
| 186 | - Connection to wss://git.shakespeare.diy fails after 10s timeout | 266 | ### Step 2: Analyze Logs |
| 187 | - Log shows: `error: connection failed: timeout` | ||
| 188 | - Occurs 100% of time with this relay | ||
| 189 | 267 | ||
| 190 | ## Root Cause | 268 | Scan for errors and unexpected patterns: |
| 269 | ```bash | ||
| 270 | # Find the most recent run | ||
| 271 | LATEST_RUN=$(ls -1t tmp/run-*/sync.log | head -n1) | ||
| 191 | 272 | ||
| 192 | [To be determined] | 273 | # Analyze for issues |
| 274 | grep -i error "$LATEST_RUN" | ||
| 275 | grep -i warn "$LATEST_RUN" | ||
| 276 | grep -i panic "$LATEST_RUN" | ||
| 277 | ``` | ||
| 193 | 278 | ||
| 194 | ## Proposed Fix | 279 | ### Step 3: Document Issues |
| 195 | 280 | ||
| 196 | - Increase connection timeout from 10s to 30s for initial bootstrap | 281 | Create **one markdown file per issue** in `work/active-issues/`: |
| 197 | - Add retry logic with exponential backoff | ||
| 198 | - Consider fallback bootstrap relays | ||
| 199 | 282 | ||
| 200 | ## Code Location | 283 | ```bash |
| 284 | # Example: Minimal issue documentation | ||
| 285 | cat > work/active-issues/bootstrap-disconnect.md <<'EOF' | ||
| 286 | # Bootstrap relay disconnects when empty | ||
| 201 | 287 | ||
| 202 | - `src/sync/relay_connection.rs:45` - connection timeout constant | 288 | Bootstrap relay wss://git.shakespeare.diy disconnects after sync finds 0 events. Should persist since user-specified. |
| 289 | |||
| 290 | Log: "Disconnecting empty relay relay=wss://git.shakespeare.diy" | ||
| 291 | File: src/sync/mod.rs (check_disconnects function) | ||
| 203 | EOF | 292 | EOF |
| 204 | ``` | 293 | ``` |
| 205 | 294 | ||
| 206 | **Why individual files?** | 295 | **Keep each file brief:** |
| 207 | - Keeps the how-to guide clean and focused | 296 | - Descriptive title (one line) |
| 208 | - Prevents accidental commits of transient issues to tracked files | 297 | - What happens (1-2 sentences max) |
| 209 | - Easy to delete resolved issues or archive important ones | 298 | - Relevant log excerpt (one line) |
| 210 | - Each file can be worked on independently | 299 | - File/function location if obvious (one line) |
| 300 | - **NO** separate detailed analysis files | ||
| 301 | - **NO** root cause analysis | ||
| 302 | - **NO** proposed solutions (unless immediately obvious) | ||
| 211 | 303 | ||
| 212 | ### Step 4: Fix and Re-test | 304 | ### Step 4: Report Summary |
| 213 | 305 | ||
| 214 | After code changes, run again to verify the fix. | 306 | Provide a brief closing message with 1-2 sentence summary of **each issue** identified: |
| 307 | - State what the issue is | ||
| 308 | - Where it occurs (file/component) | ||
| 309 | - Keep it concise | ||
| 215 | 310 | ||
| 216 | ### Step 5: Extend Duration | 311 | **STOP HERE.** Do NOT: |
| 312 | - Fix the issues immediately | ||
| 313 | - Create separate detailed analysis markdown files | ||
| 314 | - Do thorough investigations | ||
| 315 | - Write lengthy explanations | ||
| 217 | 316 | ||
| 218 | Once 30-second runs are clean, extend to 2 minutes, then 5 minutes: | 317 | The workflow will cycle back through Mode 1 to fix issues one at a time. |
| 219 | ```bash | ||
| 220 | timeout 120s cargo run -- [args] 2>&1 | ./scripts/sanitize-logs.sh > iteration-2.log | ||
| 221 | ``` | ||
| 222 | 318 | ||
| 223 | ## Logging Improvements | 319 | ## Logging Improvements |
| 224 | 320 | ||
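Mode 2's "Analyze Logs" step finds the newest log with `ls -1t` and then runs three separate `grep -i` passes. Run directories are named with a sortable timestamp, so an alternative is to pick the newest log by name and reduce the three passes to one count per keyword. The sketch below does this. `latest_run_log` and `summarize_log` are hypothetical helpers. The counts are case-insensitive matching lines, so `warn` also counts `WARNING` lines.

```bash
# Hypothetical helper: print the sync.log of the newest run. Run directories
# are named run-YYYYmmdd-HHMMSS and bash expands globs in sorted order, so
# the last existing match is the most recent run.
latest_run_log() {
  local base="${1:-tmp}"
  local latest="" f
  for f in "$base"/run-*/sync.log; do
    if [ -e "$f" ]; then
      latest="$f"
    fi
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Hypothetical helper: count matching lines per keyword, case-insensitively.
summarize_log() {
  local log="$1" level
  for level in error warn panic; do
    # grep -c prints 0 (and exits 1) when nothing matches; the count is kept.
    printf '%s: %s\n' "$level" "$(grep -ci -- "$level" "$log")"
  done
}
```

Running `summarize_log "$(latest_run_log)"` gives a three-line overview. After that, `grep -i error` on the same file shows the individual lines worth turning into issue files.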
| @@ -253,52 +349,66 @@ If a log line appears too frequently: |
| 253 | tracing::trace!("Per-event detail that's too noisy"); | 349 | tracing::trace!("Per-event detail that's too noisy"); |
| 254 | ``` | 350 | ``` |
| 255 | 351 | ||
| 256 | ## Active Issues | 352 | ## Managing Active Issues |
| 257 | 353 | ||
| 258 | Issues discovered during production sync testing are tracked in `work/active-issues/` as individual markdown files. | 354 | Issues are tracked in `work/active-issues/` as individual markdown files. |
| 259 | 355 | ||
| 260 | **View current issues:** | 356 | **Check for active issues:** |
| 261 | ```bash | 357 | ```bash |
| 262 | ls work/active-issues/ | 358 | ls work/active-issues/ |
| 263 | ``` | 359 | ``` |
| 264 | 360 | ||
| 265 | **Create a new issue:** | 361 | **After fixing an issue:** |
| 266 | ```bash | 362 | ```bash |
| 267 | # Use kebab-case filename describing the issue | 363 | # Delete the resolved issue file |
| 268 | cat > work/active-issues/[issue-name].md <<'EOF' | 364 | rm work/active-issues/issue-name.md |
| 269 | # Issue: [Short Description] | ||
| 270 | 365 | ||
| 271 | **Discovered:** [Date] | 366 | # Or archive if important for future reference |
| 272 | **Status:** Open | 367 | mv work/active-issues/issue-name.md docs/archive/2026-01-09-issue-name.md |
| 273 | 368 | ``` | |
| 274 | ## Symptoms | ||
| 275 | 369 | ||
| 276 | - Log patterns observed | 370 | **Issue file format (minimal):** |
| 277 | - Reproduction steps if known | 371 | ```markdown |
| 372 | # Brief title | ||
| 278 | 373 | ||
| 279 | ## Root Cause | 374 | What happens (1-2 sentences). |
| 280 | 375 | ||
| 281 | [To be determined / Known cause] | 376 | Log evidence: "relevant log line" |
| 377 | File: src/path/to/file.rs (function_name if known) | ||
| 378 | ``` | ||
| 282 | 379 | ||
| 283 | ## Proposed Fix | 380 | Keep documentation minimal - just enough to identify and locate the issue. |
| 284 | 381 | ||
| 285 | - Suggested code changes | 382 | --- |
| 286 | - Alternative approaches | ||
| 287 | 383 | ||
| 288 | ## Code Location | 384 | ## Workflow Summary |
| 289 | 385 | ||
| 290 | - File paths and line numbers where changes are needed | ||
| 291 | EOF | ||
| 292 | ``` | 386 | ``` |
| 293 | 387 | Check work/active-issues/ | |
| 294 | **Resolve an issue:** | 388 | │ |
| 295 | ```bash | 389 | ├─ Has issues? ──► Mode 1: Pick one issue |
| 296 | # After fixing, either delete or move to archive | 390 | │ │ |
| 297 | rm work/active-issues/resolved-issue.md | 391 | │ ├─ Fix code |
| 298 | # OR | 392 | │ ├─ cargo test |
| 299 | mv work/active-issues/important-issue.md docs/archive/2026-01-09-important-issue.md | 393 | │ ├─ cargo clippy |
| 394 | │ ├─ cargo fmt | ||
| 395 | │ ├─ git commit | ||
| 396 | │ └─ Report & STOP | ||
| 397 | │ | ||
| 398 | └─ No issues? ──► Mode 2: Run production sync | ||
| 399 | │ | ||
| 400 | ├─ timeout 30s cargo run ... | ||
| 401 | ├─ Analyze logs | ||
| 402 | ├─ Document issues (minimal) | ||
| 403 | └─ Report summary & STOP | ||
| 300 | ``` | 404 | ``` |
| 301 | 405 | ||
| 406 | **Key Rules:** | ||
| 407 | - Only do ONE thing per cycle (fix one issue OR discover issues) | ||
| 408 | - Always stop after reporting | ||
| 409 | - Keep issue documentation minimal | ||
| 410 | - No root cause analysis during discovery | ||
| 411 | |||
| 302 | --- | 412 | --- |
| 303 | 413 | ||
| 304 | ## Quick Reference | 414 | ## Quick Reference |
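The "Issue file format (minimal)" section added in the hunk above fixes four parts: a title line, a one- or two-sentence description, one log line, and an optional file location. The sketch below writes exactly that shape, so Mode 2 files stay uniform. `new_issue` and the `ISSUES_DIR` override are hypothetical names, not part of the commit.

```bash
# Hypothetical helper: write a minimal issue file and print its path.
# Usage: new_issue SLUG TITLE SUMMARY LOG_LINE [FILE_HINT]
# ISSUES_DIR overrides the target directory (default: work/active-issues).
new_issue() {
  local dir="${ISSUES_DIR:-work/active-issues}"
  local slug="$1" title="$2" summary="$3" log_line="$4" file_hint="${5:-}"
  local path="$dir/$slug.md"
  mkdir -p "$dir" || return 1
  # Never overwrite an existing issue file.
  if [ -e "$path" ]; then
    echo "issue already exists: $path" >&2
    return 1
  fi
  {
    printf '# %s\n\n%s\n\n' "$title" "$summary"
    printf 'Log evidence: "%s"\n' "$log_line"
    if [ -n "$file_hint" ]; then
      printf 'File: %s\n' "$file_hint"
    fi
  } > "$path"
  printf '%s\n' "$path"
}
```

The slug becomes the kebab-case filename. When the issue is fixed in Mode 1, the file is deleted or moved to `docs/archive/` as described in "Managing Active Issues".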
| @@ -306,28 +416,48 @@ mv work/active-issues/important-issue.md docs/archive/2026-01-09-important-issue |
| 306 | ### Minimal Test Command | 416 | ### Minimal Test Command |
| 307 | 417 | ||
| 308 | ```bash | 418 | ```bash |
| 419 | # Create run directory | ||
| 420 | RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)" | ||
| 421 | mkdir -p "$RUN_DIR" | ||
| 422 | |||
| 423 | # Run test | ||
| 309 | timeout 30s cargo run -- \ | 424 | timeout 30s cargo run -- \ |
| 310 | --sync-bootstrap-relay-url wss://git.shakespeare.diy \ | 425 | --sync-bootstrap-relay-url wss://git.shakespeare.diy \ |
| 311 | --domain ngit.danconwaydev.com \ | 426 | --domain ngit.danconwaydev.com \ |
| 312 | --git-data-path /tmp/ngit-test-git \ | 427 | --git-data-path "$RUN_DIR/git-data" \ |
| 313 | --relay-data-path /tmp/ngit-test-relay \ | 428 | --relay-data-path "$RUN_DIR/relay-data" \ |
| 314 | 2>&1 | ./scripts/sanitize-logs.sh | 429 | 2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log" |
| 315 | ``` | 430 | ``` |
| 316 | 431 | ||
| 317 | ### With Metrics Endpoint | 432 | ### With Metrics Endpoint |
| 318 | 433 | ||
| 319 | ```bash | 434 | ```bash |
| 435 | # Create run directory | ||
| 436 | RUN_DIR="tmp/run-$(date +%Y%m%d-%H%M%S)" | ||
| 437 | mkdir -p "$RUN_DIR" | ||
| 438 | |||
| 439 | # Run with metrics | ||
| 320 | timeout 30s cargo run -- \ | 440 | timeout 30s cargo run -- \ |
| 321 | --sync-bootstrap-relay-url wss://git.shakespeare.diy \ | 441 | --sync-bootstrap-relay-url wss://git.shakespeare.diy \ |
| 322 | --domain ngit.danconwaydev.com \ | 442 | --domain ngit.danconwaydev.com \ |
| 323 | --git-data-path /tmp/ngit-test-git \ | 443 | --git-data-path "$RUN_DIR/git-data" \ |
| 324 | --relay-data-path /tmp/ngit-test-relay \ | 444 | --relay-data-path "$RUN_DIR/relay-data" \ |
| 325 | --metrics-address 127.0.0.1:9090 \ | 445 | --metrics-address 127.0.0.1:9090 \ |
| 326 | 2>&1 | ./scripts/sanitize-logs.sh | 446 | 2>&1 | ./scripts/sanitize-logs.sh | tee "$RUN_DIR/sync.log" |
| 327 | ``` | 447 | ``` |
| 328 | 448 | ||
| 329 | Then in another terminal: `curl http://127.0.0.1:9090/metrics` | 449 | Then in another terminal: `curl http://127.0.0.1:9090/metrics` |
| 330 | 450 | ||
| 451 | ### Cleanup Old Runs | ||
| 452 | |||
| 453 | ```bash | ||
| 454 | # Remove runs older than 7 days | ||
| 455 | find tmp/run-* -type d -mtime +7 -exec rm -rf {} + | ||
| 456 | |||
| 457 | # Remove all test runs | ||
| 458 | rm -rf tmp/run-* | ||
| 459 | ``` | ||
| 460 | |||
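The "Cleanup Old Runs" command passes every `tmp/run-*` path to `find` with no depth limit. As a result, the `-mtime +7` test also walks into `git-data/`, `relay-data/` and everything below them. When no run directories exist, the glob stays literal and `find` reports an error. The variant below only considers the top-level run directories. `prune_runs` is a hypothetical helper, and like the original it uses directory modification time as a proxy for run age.

```bash
# Hypothetical helper: delete run-* directories directly under BASE whose
# modification time is more than DAYS days old (defaults: tmp, 7).
prune_runs() {
  local base="${1:-tmp}" days="${2:-7}"
  # -mindepth/-maxdepth 1 restrict matching to the run directories
  # themselves, so find never descends into git-data/ or relay-data/.
  find "$base" -mindepth 1 -maxdepth 1 -type d -name 'run-*' \
    -mtime +"$days" -exec rm -rf {} +
}
```

`prune_runs` with no arguments matches the seven-day cleanup in the diff. `prune_runs tmp 0` removes every run older than a day.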
| 331 | ### Different Log Level | 461 | ### Different Log Level |
| 332 | 462 | ||
| 333 | The default is DEBUG. For more detail: | 463 | The default is DEBUG. For more detail: |