upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

summaryrefslogtreecommitdiff
path: root/docs/archive/2026-01-relay-ngit-dev-migration/scripts/30-extract-parse-failures.sh
diff options
context:
space:
mode:
authorDanConwayDev <DanConwayDev@protonmail.com>2026-02-03 14:50:22 +0000
committerDanConwayDev <DanConwayDev@protonmail.com>2026-02-03 15:18:23 +0000
commit874a8abe1d076cfafd9baf919ec23d7d58200698 (patch)
treedce0d0d36bddc496ff32f8555a8790d8dc7be7e4 /docs/archive/2026-01-relay-ngit-dev-migration/scripts/30-extract-parse-failures.sh
parent9fd4350c57bbe986ebf65bf3ea4c996572e81884 (diff)
parent92a9a3bfe0bc522e8ae411991a366a3a6310d525 (diff)
Merge relay.ngit.dev migration: bug fixes and migration tooling
This merge includes critical bug fixes and comprehensive migration tooling developed during the relay.ngit.dev migration effort. Bug Fixes: - Fix git protocol error handling to return HTTP 200 with ERR pkt-line - Fix naughty list false positives and DNS failure identification - Fix database query filters in load_existing_events (remove .since()) - Fix OID fetch tracking to distinguish 0 OIDs from successful fetches - Fix purgatory event source tracking for filtered expiry logging - Implement OID retry logic for 'not our ref' errors Migration Tools & Documentation: - Complete 5-phase migration analysis pipeline with orchestration script - Phase 1: Event fetching from source relay - Phase 2: Git sync verification - Phase 3: Categorization and relay comparison - Phase 4: Log extraction (parse failures, purgatory expiry) - Phase 5: Action classification for migration decisions - Comprehensive migration guide with lessons learned - Troubleshooting guide for permission and corruption issues Configuration: - Add NGIT_LOG_LEVEL configuration option - Update git throttle limits to 60/minute - Improve logging throughout for better observability
Diffstat (limited to 'docs/archive/2026-01-relay-ngit-dev-migration/scripts/30-extract-parse-failures.sh')
-rwxr-xr-xdocs/archive/2026-01-relay-ngit-dev-migration/scripts/30-extract-parse-failures.sh774
1 files changed, 774 insertions, 0 deletions
diff --git a/docs/archive/2026-01-relay-ngit-dev-migration/scripts/30-extract-parse-failures.sh b/docs/archive/2026-01-relay-ngit-dev-migration/scripts/30-extract-parse-failures.sh
new file mode 100755
index 0000000..d762aae
--- /dev/null
+++ b/docs/archive/2026-01-relay-ngit-dev-migration/scripts/30-extract-parse-failures.sh
@@ -0,0 +1,774 @@
1#!/usr/bin/env bash
2#
3# 30-extract-parse-failures.sh - Extract parse failure events from systemd logs
4#
5# PHASE 4a of the GRASP relay to ngit-grasp migration analysis pipeline.
6# Extracts structured [PARSE_FAIL] log entries AND "Invalid announcement"
7# rejections from journalctl.
8#
9# USAGE:
10# ./30-extract-parse-failures.sh <service-name> <output-dir> [options]
11#
12# EXAMPLES:
13# # Extract from ngit-grasp service (last 30 days, default)
14# ./30-extract-parse-failures.sh ngit-grasp.service output/logs
15#
16# # Extract with custom time range
17# ./30-extract-parse-failures.sh ngit-grasp.service output/logs --since "2026-01-01"
18#
19# # Extract from specific time window
20# ./30-extract-parse-failures.sh ngit-grasp.service output/logs --since "2026-01-15" --until "2026-01-22"
21#
22# OPTIONS:
23# --since <date> Start date for log extraction (default: 30 days ago)
24# --until <date> End date for log extraction (default: now)
25# --dry-run Show what would be extracted without writing files
26#
27# ENRICHMENT:
28# The script automatically enriches parse failures with repo/npub information
29# by extracting from "Added rejected announcement" log entries which include
30# pubkey and identifier fields. Hex pubkeys are converted to npub format using
31# `nak encode npub <hex-pubkey>` if the nak tool is available.
32#
33# OUTPUT:
34# <output-dir>/parse-failures.txt
35#
36# OUTPUT FORMAT (TSV):
37# event_id<TAB>kind<TAB>reason<TAB>repo<TAB>npub
38#
39# EXPECTED LOG FORMATS:
40# The script looks for three types of log entries:
41#
42# 1. Structured [PARSE_FAIL] entries:
43# 2026-01-22T10:30:45Z ngit-grasp[1234]: [PARSE_FAIL] kind=30618 event_id=abc123... reason="invalid refs format" repo=myrepo npub=npub1...
44#
45# 2. "Invalid announcement" rejections (write policy):
46# Event rejected by write policy event_id=abc123... relay=wss://... kind=30617 reason=Invalid announcement: multiple clone tags found...
47#
48# 3. "Added rejected announcement" entries (for enrichment):
49# Added rejected announcement to two-tier index event_id=abc123... kind=30617 identifier=myrepo pubkey=hex...
50# These entries provide pubkey and identifier for enriching write policy rejections.
51#
52# NOTE: Builder logs ("Rejected repository announcement note1xxx:") are NOT extracted
53# because they use bech32 (note1) IDs while write policy logs use hex IDs. Extracting
54# both would cause double-counting since deduplication only works within each format.
55# Write policy logs contain the same events, so we don't lose any data.
56#
57# Required fields: kind, event_id, reason
58# Enrichment fields: repo (identifier), npub (converted from hex pubkey)
59#
60# DEPENDENCY:
61# This script requires logging improvements in ngit-grasp to emit structured
62# [PARSE_FAIL] log entries. Until those are implemented, this script will
63# find no matching entries (which is handled gracefully).
64#
65# "Invalid announcement" rejections are logged by the write policy and
66# should be present in any ngit-grasp deployment.
67#
68# See: docs/how-to/migrate-to-ngit-grasp.md (Dependencies section)
69#
70# Expected Rust logging code for [PARSE_FAIL]:
71# tracing::warn!(
72# target: "migration",
73# "[PARSE_FAIL] kind={} event_id={} reason=\"{}\" repo={} npub={}",
74# event.kind, event.id, reason, identifier, npub
75# );
76#
77# PREREQUISITES:
78# - journalctl (systemd)
79# - grep, awk, sed (standard Unix tools)
80# - Access to systemd journal (may require sudo or journal group membership)
81#
82# RUNTIME: Depends on log volume, typically < 30 seconds
83#
84# SEE ALSO:
85# docs/how-to/migrate-to-ngit-grasp.md - Full migration guide
86# 31-extract-purgatory-expiry.sh - Companion script for purgatory expiry logs
87#
88
89set -euo pipefail
90
91# Get script directory for sourcing helpers
92SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
93
94# Source the service validation helper
95if [[ -f "$SCRIPT_DIR/validate-service.sh" ]]; then
96 source "$SCRIPT_DIR/validate-service.sh"
97fi
98
99# Colors for output (disabled if not a terminal)
100if [[ -t 1 ]]; then
101 RED='\033[0;31m'
102 GREEN='\033[0;32m'
103 YELLOW='\033[0;33m'
104 BLUE='\033[0;34m'
105 NC='\033[0m'
106else
107 RED=''
108 GREEN=''
109 YELLOW=''
110 BLUE=''
111 NC=''
112fi
113
114log_info() {
115 echo -e "${BLUE}[INFO]${NC} $*" >&2
116}
117
118log_success() {
119 echo -e "${GREEN}[OK]${NC} $*" >&2
120}
121
122log_warn() {
123 echo -e "${YELLOW}[WARN]${NC} $*" >&2
124}
125
126log_error() {
127 echo -e "${RED}[ERROR]${NC} $*" >&2
128}
129
130usage() {
131 echo "Usage: $0 <service-name> <output-dir> [options]"
132 echo ""
133 echo "Arguments:"
134 echo " service-name Systemd service name (e.g., ngit-grasp.service)"
135 echo " output-dir Directory to store extracted log data"
136 echo ""
137 echo "Options:"
138 echo " --since <date> Start date (default: 30 days ago)"
139 echo " --until <date> End date (default: now)"
140 echo " --dry-run Show what would be extracted without writing"
141 echo ""
142 echo "Examples:"
143 echo " $0 ngit-grasp.service output/logs"
144 echo " $0 ngit-grasp.service output/logs --since '2026-01-01'"
145 echo " $0 ngit-grasp.service output/logs --since '2026-01-15' --until '2026-01-22'"
146 echo ""
147 echo "Expected log formats:"
148 echo " [PARSE_FAIL] kind=30618 event_id=abc123 reason=\"...\" repo=myrepo npub=npub1..."
149 echo " Event rejected by write policy event_id=abc123 ... kind=30617 reason=Invalid announcement: ..."
150 echo ""
151 echo "Enrichment:"
152 echo " Parse failures are automatically enriched with repo/npub from"
153 echo " 'Added rejected announcement' log entries. Hex pubkeys are converted"
154 echo " to npub format using 'nak encode npub' if available."
155 exit 1
156}
157
158# =============================================================================
159# AWK-BASED BATCH PARSING FUNCTIONS
160# =============================================================================
161# These functions use awk for efficient batch processing instead of per-line
162# grep calls. This provides ~400x speedup for large log files.
163#
164# NOTE: parse_builder_rejection_line() was removed to fix double-counting bug.
165# Builder logs use bech32 (note1) IDs while write policy logs use hex IDs.
166# Since deduplication only works within each format, extracting both caused
167# the same event to be counted twice. Write policy logs contain the same
168# events, so we don't lose any data by only extracting from that source.
169
170# Parse [PARSE_FAIL] log lines in batch using awk
171# Input: file containing log lines with [PARSE_FAIL]
172# Output: TSV lines: event_id<TAB>kind<TAB>reason<TAB>repo<TAB>npub
173parse_parse_fail_batch() {
174 local input_file="$1"
175 awk '
176 {
177 # Extract kind=VALUE
178 kind = ""
179 if (match($0, /kind=([0-9]+)/, m)) kind = m[1]
180
181 # Extract event_id=VALUE (hex string)
182 event_id = ""
183 if (match($0, /event_id=([a-f0-9]+)/, m)) event_id = m[1]
184
185 # Extract reason="VALUE" (quoted string)
186 reason = ""
187 if (match($0, /reason="([^"]*)"/, m)) reason = m[1]
188
189 # Extract repo=VALUE (optional)
190 repo = ""
191 if (match($0, /repo=([^ ]+)/, m)) repo = m[1]
192
193 # Extract npub=VALUE (optional)
194 npub = ""
195 if (match($0, /npub=([^ ]+)/, m)) npub = m[1]
196
197 # Output if we have required fields
198 if (kind != "" && event_id != "" && reason != "") {
199 print event_id "\t" kind "\t" reason "\t" repo "\t" npub
200 }
201 }
202 ' "$input_file"
203}
204
205# Parse "Invalid announcement" rejection log lines in batch using awk
206# Input: file containing "Event rejected by write policy" log lines
207# Output: TSV lines: event_id<TAB>kind<TAB>reason<TAB><empty><TAB><empty>
208parse_write_policy_rejection_batch() {
209 local input_file="$1"
210 awk '
211 {
212 # Extract event_id=VALUE (hex string)
213 event_id = ""
214 if (match($0, /event_id=([a-f0-9]+)/, m)) event_id = m[1]
215
216 # Extract kind=VALUE
217 kind = ""
218 if (match($0, /kind=([0-9]+)/, m)) kind = m[1]
219
220 # Extract reason=VALUE (everything after "reason=")
221 reason = ""
222 if (match($0, /reason=(.*)$/, m)) reason = m[1]
223
224 # Output if we have required fields (repo and npub are empty)
225 if (kind != "" && event_id != "" && reason != "") {
226 print event_id "\t" kind "\t" reason "\t\t"
227 }
228 }
229 ' "$input_file"
230}
231
232# Parse "Added rejected announcement" log lines in batch using awk
233# Input: file containing "Added rejected announcement to two-tier index" log lines
234# Output: TSV lines: event_id<TAB>identifier<TAB>pubkey_hex
235parse_rejected_announcement_batch() {
236 local input_file="$1"
237 awk '
238 {
239 # Extract event_id=VALUE (hex string)
240 event_id = ""
241 if (match($0, /event_id=([a-f0-9]+)/, m)) event_id = m[1]
242
243 # Extract identifier=VALUE (repo name)
244 identifier = ""
245 if (match($0, /identifier=([^ ]+)/, m)) identifier = m[1]
246
247 # Extract pubkey=VALUE (hex string)
248 pubkey = ""
249 if (match($0, /pubkey=([a-f0-9]+)/, m)) pubkey = m[1]
250
251 # Output if we have all required fields
252 if (event_id != "" && identifier != "" && pubkey != "") {
253 print event_id "\t" identifier "\t" pubkey
254 }
255 }
256 ' "$input_file"
257}
258
259# Enrich parse failures with repo/npub by looking up event_id in "Added rejected announcement" log entries
260# This is critical because "Invalid announcement" rejections only log event_id and kind,
261# not the repo name or npub. Without enrichment, Phase 5 shows event_id|kind instead
262# of repo|npub in action-required.txt, making the output unusable.
263#
264# Arguments:
265# $1 - parse failures file to enrich (modified in place)
266# $2 - lookup file containing event_id -> identifier|pubkey mappings from logs
267#
268# The function:
269# 1. Uses the lookup table built from "Added rejected announcement" log entries
270# 2. For each parse failure with empty repo/npub, looks up the event_id
271# 3. Populates repo and npub columns from the lookup
272# 4. Converts hex pubkeys to npub format using `nak encode npub` if available
273#
274# OPTIMIZATION: This function uses batch processing for efficiency:
275# - Uses awk for O(n) join instead of per-line grep (O(n*m))
276# - Batches all pubkey->npub conversions in a single nak call
277# - This reduces runtime from minutes to seconds for large datasets
278enrich_with_repo_npub() {
279 local parse_failures_file="$1"
280 local lookup_file="$2"
281
282 # Validate lookup file exists and has content
283 if [[ ! -f "$lookup_file" ]] || [[ ! -s "$lookup_file" ]]; then
284 log_warn "No enrichment data available - repo/npub columns will remain empty"
285 return 0
286 fi
287
288 log_info "Enriching parse failures with repo/npub from log entries..."
289
290 # Check if we have nak for pubkey->npub conversion
291 local can_convert_npub=false
292 if command -v nak &> /dev/null; then
293 can_convert_npub=true
294 log_info " Using 'nak' for pubkey->npub conversion"
295 else
296 log_warn " 'nak' not found - will use hex pubkeys instead of npub"
297 fi
298
299 local lookup_count
300 lookup_count=$(wc -l < "$lookup_file")
301 lookup_count="${lookup_count//[^0-9]/}"
302 log_info " Lookup table has $lookup_count entries"
303
304 # STEP 1: Extract unique pubkeys that need conversion
305 # Get pubkeys from lookup file (column 3), deduplicate
306 local unique_pubkeys_file npub_map_file
307 unique_pubkeys_file=$(mktemp)
308 npub_map_file=$(mktemp)
309
310 cut -f3 "$lookup_file" | sort -u > "$unique_pubkeys_file"
311 local unique_pubkey_count
312 unique_pubkey_count=$(wc -l < "$unique_pubkeys_file")
313 unique_pubkey_count="${unique_pubkey_count//[^0-9]/}"
314 log_info " Converting $unique_pubkey_count unique pubkeys to npub format..."
315
316 # STEP 2: Batch convert all pubkeys to npub in a single nak call
317 # nak reads hex pubkeys from stdin (one per line) and outputs npubs
318 if [[ "$can_convert_npub" == true && "$unique_pubkey_count" -gt 0 ]]; then
319 # Create mapping file: pubkey_hex<TAB>npub
320 # nak encode npub reads from stdin and outputs one npub per line
321 paste "$unique_pubkeys_file" <(nak encode npub < "$unique_pubkeys_file" 2>/dev/null) > "$npub_map_file" || {
322 # Fallback: if batch conversion fails, use hex pubkeys
323 log_warn " Batch npub conversion failed, using hex pubkeys"
324 awk '{print $1 "\t" $1}' "$unique_pubkeys_file" > "$npub_map_file"
325 }
326 else
327 # No nak available, use hex pubkeys as-is
328 awk '{print $1 "\t" $1}' "$unique_pubkeys_file" > "$npub_map_file"
329 fi
330
331 rm -f "$unique_pubkeys_file"
332
333 # STEP 3: Use awk for efficient join (O(n) instead of O(n*m) grep per line)
334 # This joins parse_failures with lookup_file on event_id, then with npub_map on pubkey
335 local enriched_file
336 enriched_file=$(mktemp)
337
338 # Copy header lines
339 grep '^#' "$parse_failures_file" > "$enriched_file" 2>/dev/null || true
340
341 # Use awk to perform the join efficiently
342 # Input files (order matters for ARGIND):
343 # 1. npub_map_file: pubkey_hex<TAB>npub
344 # 2. lookup_file: event_id<TAB>identifier<TAB>pubkey_hex
345 # 3. parse_failures_file: event_id<TAB>kind<TAB>reason<TAB>repo<TAB>npub
346 awk -F'\t' -v OFS='\t' '
347 # Track which file we are processing
348 FNR==1 { file_num++ }
349
350 # First file: npub_map (pubkey_hex -> npub)
351 file_num==1 {
352 npub_map[$1] = $2
353 next
354 }
355 # Second file: lookup (event_id -> identifier, pubkey_hex)
356 file_num==2 {
357 lookup_repo[$1] = $2
358 lookup_pubkey[$1] = $3
359 next
360 }
361 # Third file: parse_failures
362 /^#/ { next } # Skip headers (already copied)
363 {
364 event_id = $1
365 kind = $2
366 reason = $3
367 repo = $4
368 npub = $5
369
370 # If repo/npub empty, try to enrich from lookup
371 if (repo == "" && event_id in lookup_repo) {
372 repo = lookup_repo[event_id]
373 }
374 if (npub == "" && event_id in lookup_pubkey) {
375 pubkey = lookup_pubkey[event_id]
376 if (pubkey in npub_map) {
377 npub = npub_map[pubkey]
378 } else {
379 npub = pubkey # Fallback to hex
380 }
381 }
382
383 print event_id, kind, reason, repo, npub
384 }
385 ' "$npub_map_file" "$lookup_file" "$parse_failures_file" >> "$enriched_file"
386
387 rm -f "$npub_map_file"
388
389 # Count enriched entries
390 local enriched_count total_count
391 total_count=$(grep -v '^#' "$parse_failures_file" | wc -l)
392 total_count="${total_count//[^0-9]/}"
393 # Count entries that have non-empty repo AND npub after enrichment
394 enriched_count=$(grep -v '^#' "$enriched_file" | awk -F'\t' '$4 != "" && $5 != ""' | wc -l)
395 enriched_count="${enriched_count//[^0-9]/}"
396
397 # Replace original with enriched version
398 mv "$enriched_file" "$parse_failures_file"
399
400 log_info " Enriched $enriched_count of $total_count parse failures with repo/npub"
401 log_success "Enrichment complete"
402}
403
404# Parse "Added rejected announcement" log entries to build enrichment lookup table
405# Input: log line containing "Added rejected announcement to two-tier index"
406# Output: TSV line: event_id<TAB>identifier<TAB>pubkey_hex
407parse_rejected_announcement_line() {
408 local line="$1"
409
410 local event_id identifier pubkey_hex
411
412 # Extract event_id=VALUE (hex string)
413 event_id=$(echo "$line" | grep -oP 'event_id=\K[a-f0-9]+' || echo "")
414
415 # Extract identifier=VALUE (repo name)
416 identifier=$(echo "$line" | grep -oP 'identifier=\K[^ ]+' || echo "")
417
418 # Extract pubkey=VALUE (hex string)
419 pubkey_hex=$(echo "$line" | grep -oP 'pubkey=\K[a-f0-9]+' || echo "")
420
421 # Only output if we have all required fields
422 if [[ -n "$event_id" && -n "$identifier" && -n "$pubkey_hex" ]]; then
423 printf '%s\t%s\t%s\n' "$event_id" "$identifier" "$pubkey_hex"
424 fi
425}
426
427# Main
428main() {
429 if [[ $# -lt 2 ]]; then
430 usage
431 fi
432
433 local service="$1"
434 local output_dir="$2"
435 shift 2
436
437 # Default time range: last 30 days
438 local since_date
439 since_date=$(date -d "30 days ago" "+%Y-%m-%d" 2>/dev/null || date -v-30d "+%Y-%m-%d" 2>/dev/null || echo "")
440 local until_date=""
441 local dry_run=false
442
443 # Parse options
444 while [[ $# -gt 0 ]]; do
445 case "$1" in
446 --since)
447 since_date="$2"
448 shift 2
449 ;;
450 --until)
451 until_date="$2"
452 shift 2
453 ;;
454 --dry-run)
455 dry_run=true
456 shift
457 ;;
458 *)
459 log_error "Unknown option: $1"
460 usage
461 ;;
462 esac
463 done
464
465 # Validate service name format
466 if [[ ! "$service" =~ \.service$ ]]; then
467 service="${service}.service"
468 fi
469
470 # Validate service is appropriate for structured logging
471 # This prevents the common mistake of using ngit-relay instead of ngit-grasp
472 if type validate_service_for_structured_logging &>/dev/null; then
473 # Use non-interactive mode if not a terminal, skip log check (we'll do our own)
474 local interactive="true"
475 [[ ! -t 0 ]] && interactive="false"
476
477 if ! validate_service_for_structured_logging "$service" "false" "$interactive"; then
478 log_error "Service validation failed. Use an ngit-grasp service for structured logging."
479 exit 1
480 fi
481 else
482 # Fallback validation if helper not available
483 if [[ "$service" == *"ngit-relay"* ]]; then
484 log_error "Service name appears to be ngit-relay: $service"
485 log_error "Structured logging ([PARSE_FAIL]) only exists in ngit-grasp services."
486 log_error "Please use the ngit-grasp archive service instead."
487 log_error ""
488 log_error "To find the correct service:"
489 log_error " systemctl list-units 'ngit-grasp*' --all"
490 exit 1
491 fi
492 fi
493
494 log_info "Extracting parse failures from systemd logs"
495 log_info "Service: $service"
496 log_info "Output: $output_dir"
497 log_info "Time range: ${since_date:-beginning} to ${until_date:-now}"
498
499 # Check if journalctl is available
500 if ! command -v journalctl &> /dev/null; then
501 log_error "journalctl not found. This script requires systemd."
502 exit 1
503 fi
504
505 # Validate service exists (check if journalctl can find any logs for it)
506 # Note: We don't require the service to be running, just that it has logs
507 if ! journalctl --no-pager -u "$service" -n 1 &>/dev/null; then
508 log_warn "Could not query logs for service: $service"
509 log_warn "This may indicate the service doesn't exist or you lack permissions."
510 log_warn ""
511 log_warn "To list available ngit-grasp services:"
512 log_warn " systemctl list-units 'ngit-grasp*' --all"
513 log_warn " journalctl --list-boots # Check if you have journal access"
514 log_warn ""
515 # Continue anyway - the service might exist but have no logs yet
516 fi
517
518 # Build journalctl command
519 local journal_cmd="journalctl -u $service --no-pager -o short-iso"
520
521 if [[ -n "$since_date" ]]; then
522 journal_cmd="$journal_cmd --since '$since_date'"
523 fi
524
525 if [[ -n "$until_date" ]]; then
526 journal_cmd="$journal_cmd --until '$until_date'"
527 fi
528
529 log_info "Running: $journal_cmd | grep '[PARSE_FAIL]' or 'Invalid announcement'"
530
531 if [[ "$dry_run" == true ]]; then
532 log_info "[DRY RUN] Would extract to: $output_dir/parse-failures.txt"
533
534 # Show sample of what would be extracted
535 log_info "Checking for matching log entries..."
536 local parse_fail_count invalid_announcement_count
537 parse_fail_count=$(eval "$journal_cmd" 2>/dev/null | grep -c '\[PARSE_FAIL\]' || echo "0")
538 parse_fail_count="${parse_fail_count//[^0-9]/}" # Strip non-numeric characters
539 parse_fail_count="${parse_fail_count:-0}"
540
541 invalid_announcement_count=$(eval "$journal_cmd" 2>/dev/null | grep 'Event rejected by write policy' | grep -c 'Invalid announcement' || echo "0")
542 invalid_announcement_count="${invalid_announcement_count//[^0-9]/}"
543 invalid_announcement_count="${invalid_announcement_count:-0}"
544
545 log_info "Found $parse_fail_count [PARSE_FAIL] entries"
546 log_info "Found $invalid_announcement_count 'Invalid announcement' rejections"
547
548 if [[ "$parse_fail_count" -eq 0 && "$invalid_announcement_count" -eq 0 ]]; then
549 log_warn "No matching entries found in logs."
550 log_warn "This is expected if ngit-grasp logging improvements are not yet deployed."
551 log_warn "See: docs/how-to/migrate-to-ngit-grasp.md (Dependencies section)"
552 fi
553
554 exit 0
555 fi
556
557 # Create output directory
558 mkdir -p "$output_dir"
559
560 local output_file="$output_dir/parse-failures.txt"
561 local temp_file
562 temp_file=$(mktemp)
563
564 # Extract and parse log entries using streaming (avoids loading all logs into memory)
565 log_info "Extracting log entries..."
566
567 # Create temp files for intermediate results
568 local temp_stderr temp_parse_fail temp_write_policy_rejection temp_rejected_announcement
569 temp_stderr=$(mktemp)
570 temp_parse_fail=$(mktemp)
571 temp_write_policy_rejection=$(mktemp)
572 temp_rejected_announcement=$(mktemp)
573
574 # Extract [PARSE_FAIL] entries directly to temp file (streaming)
575 log_info " Searching for [PARSE_FAIL] entries..."
576 eval "$journal_cmd" 2>"$temp_stderr" | grep '\[PARSE_FAIL\]' > "$temp_parse_fail" || true
577
578 local journal_stderr
579 journal_stderr=$(cat "$temp_stderr" 2>/dev/null || true)
580 if [[ -n "$journal_stderr" ]]; then
581 log_warn "journalctl reported: $journal_stderr"
582 fi
583
584 # Extract "Event rejected by write policy" with "Invalid announcement" (streaming)
585 # NOTE: We only extract from write policy logs (hex IDs), not builder logs (note1 IDs)
586 # to avoid double-counting. Both log sources contain the same events.
587 log_info " Searching for write policy rejections..."
588 eval "$journal_cmd" 2>/dev/null | grep 'Event rejected by write policy' | grep 'Invalid announcement' > "$temp_write_policy_rejection" || true
589
590 # Extract "Added rejected announcement" entries for enrichment (streaming)
591 # These contain pubkey and identifier which we use to enrich write policy rejections
592 log_info " Searching for rejected announcement entries (for enrichment)..."
593 eval "$journal_cmd" 2>/dev/null | grep 'Added rejected announcement to two-tier index' > "$temp_rejected_announcement" || true
594
595 rm -f "$temp_stderr"
596
597 # Check if we found anything
598 local parse_fail_line_count write_policy_line_count rejected_announcement_line_count
599 parse_fail_line_count=$(wc -l < "$temp_parse_fail")
600 parse_fail_line_count="${parse_fail_line_count//[^0-9]/}"
601 write_policy_line_count=$(wc -l < "$temp_write_policy_rejection")
602 write_policy_line_count="${write_policy_line_count//[^0-9]/}"
603 rejected_announcement_line_count=$(wc -l < "$temp_rejected_announcement")
604 rejected_announcement_line_count="${rejected_announcement_line_count//[^0-9]/}"
605
606 log_info " Found $parse_fail_line_count [PARSE_FAIL] log lines"
607 log_info " Found $write_policy_line_count write policy rejection log lines"
608 log_info " Found $rejected_announcement_line_count rejected announcement log lines (for enrichment)"
609
610 local total_invalid_announcement_lines=$write_policy_line_count
611
612 if [[ "$parse_fail_line_count" -eq 0 && "$total_invalid_announcement_lines" -eq 0 ]]; then
613 log_warn "No matching entries found in logs."
614 log_warn ""
615 log_warn "This is expected if ngit-grasp logging improvements are not yet deployed."
616 log_warn "The script looks for:"
617 log_warn ""
618 log_warn " 1. [PARSE_FAIL] kind=30618 event_id=abc123 reason=\"...\" repo=myrepo npub=npub1..."
619 log_warn " 2. Event rejected by write policy event_id=... kind=30617 reason=Invalid announcement: ..."
620 log_warn ""
621 log_warn "See: docs/how-to/migrate-to-ngit-grasp.md (Dependencies section)"
622 log_warn ""
623
624 # Create empty output file with header comment
625 {
626 echo "# Parse failures and invalid announcements extracted from $service"
627 echo "# Time range: ${since_date:-beginning} to ${until_date:-now}"
628 echo "# Extracted: $(date -Iseconds)"
629 echo "#"
630 echo "# Includes:"
631 echo "# - [PARSE_FAIL] structured log entries"
632 echo "# - \"Invalid announcement\" rejections"
633 echo "#"
634 echo "# Format: event_id<TAB>kind<TAB>reason<TAB>repo<TAB>npub"
635 echo "# Note: repo and npub may be empty for some entries"
636 echo "#"
637 echo "# NOTE: No matching entries found."
638 echo "# This is expected if ngit-grasp logging improvements are not yet deployed."
639 } > "$output_file"
640
641 rm -f "$temp_parse_fail" "$temp_write_policy_rejection" "$temp_rejected_announcement"
642 log_info "Created empty output file: $output_file"
643 exit 0
644 fi
645
646 # Write header
647 {
648 echo "# Parse failures and invalid announcements extracted from $service"
649 echo "# Time range: ${since_date:-beginning} to ${until_date:-now}"
650 echo "# Extracted: $(date -Iseconds)"
651 echo "#"
652 echo "# Includes:"
653 echo "# - [PARSE_FAIL] structured log entries"
654 echo "# - \"Invalid announcement\" rejections"
655 echo "#"
656 echo "# Format: event_id<TAB>kind<TAB>reason<TAB>repo<TAB>npub"
657 echo "# Note: repo and npub may be empty for some entries"
658 } > "$output_file"
659
660 # Parse [PARSE_FAIL] entries using batch awk processing
661 log_info " Parsing [PARSE_FAIL] entries..."
662 local parse_fail_count=0
663 if [[ "$parse_fail_line_count" -gt 0 ]]; then
664 parse_parse_fail_batch "$temp_parse_fail" >> "$output_file"
665 parse_fail_count=$(grep -v '^#' "$output_file" | wc -l)
666 parse_fail_count="${parse_fail_count//[^0-9]/}"
667 fi
668
669 # Parse write policy rejection entries using batch awk processing
670 log_info " Parsing write policy rejection entries..."
671 local write_policy_count=0
672 if [[ "$write_policy_line_count" -gt 0 ]]; then
673 local before_count
674 before_count=$(grep -v '^#' "$output_file" 2>/dev/null | wc -l || echo "0")
675 before_count="${before_count//[^0-9]/}"
676 before_count="${before_count:-0}"
677 parse_write_policy_rejection_batch "$temp_write_policy_rejection" >> "$output_file"
678 local after_count
679 after_count=$(grep -v '^#' "$output_file" 2>/dev/null | wc -l || echo "0")
680 after_count="${after_count//[^0-9]/}"
681 after_count="${after_count:-0}"
682 write_policy_count=$((after_count - before_count))
683 fi
684
685 local invalid_announcement_count=$write_policy_count
686
687 # Build enrichment lookup table from "Added rejected announcement" entries
688 local enrichment_lookup_file
689 enrichment_lookup_file=$(mktemp)
690
691 log_info " Building enrichment lookup table..."
692 if [[ "$rejected_announcement_line_count" -gt 0 ]]; then
693 parse_rejected_announcement_batch "$temp_rejected_announcement" > "$enrichment_lookup_file"
694 fi
695
696 rm -f "$temp_parse_fail" "$temp_write_policy_rejection" "$temp_rejected_announcement"
697
698 # Deduplicate by event_id (first column) - keep first occurrence
699 log_info " Deduplicating entries..."
700 local deduped_file
701 deduped_file=$(mktemp)
702 # Preserve header lines (starting with #) and deduplicate data lines
703 grep '^#' "$output_file" > "$deduped_file"
704 grep -v '^#' "$output_file" | sort -t$'\t' -k1,1 -u >> "$deduped_file"
705 mv "$deduped_file" "$output_file"
706
707 # Deduplicate enrichment lookup table by event_id
708 if [[ -s "$enrichment_lookup_file" ]]; then
709 sort -t$'\t' -k1,1 -u "$enrichment_lookup_file" > "$enrichment_lookup_file.deduped"
710 mv "$enrichment_lookup_file.deduped" "$enrichment_lookup_file"
711 fi
712
713 # Enrich with repo/npub from "Added rejected announcement" log entries
714 # This is critical for usability - without it, action-required.txt shows
715 # event_id|kind instead of repo|npub, making parse failures unidentifiable
716 enrich_with_repo_npub "$output_file" "$enrichment_lookup_file"
717
718 rm -f "$enrichment_lookup_file"
719
720 # Count final entries (excluding header lines)
721 local count
722 count=$(grep -v '^#' "$output_file" | wc -l)
723 count="${count//[^0-9]/}" # Strip whitespace
724 count="${count:-0}"
725
726 rm -f "$temp_file"
727
728 # Summary
729 echo ""
730 log_info "=== Extraction Summary ==="
731 log_info "Service: $service"
732 log_info "Time range: ${since_date:-beginning} to ${until_date:-now}"
733 log_success "Extracted $count total entries"
734 log_info " - [PARSE_FAIL] entries: $parse_fail_count"
735 log_info " - Invalid announcement rejections: $invalid_announcement_count"
736 echo ""
737 log_info "Output file: $output_file"
738
739 if [[ $count -gt 0 ]]; then
740 echo ""
741 log_info "Sample entries (first 5):"
742 # Use a subshell to avoid SIGPIPE issues with set -e
743 # New format: event_id<TAB>kind<TAB>reason<TAB>repo<TAB>npub
744 (grep -v '^#' "$output_file" | head -5 | while IFS=$'\t' read -r event_id kind reason repo npub; do
745 echo " kind=$kind event_id=${event_id:0:16}... reason=\"${reason:0:60}...\""
746 done) || true
747 fi
748
749 # Breakdown by kind
750 if [[ $count -gt 0 ]]; then
751 echo ""
752 log_info "Breakdown by event kind:"
753 # Use a subshell to avoid SIGPIPE issues with set -e
754 # kind is now column 2
755 (grep -v '^#' "$output_file" | awk -F'\t' '{print $2}' | sort | uniq -c | sort -rn | while read -r cnt kind; do
756 echo " kind $kind: $cnt failures"
757 done) || true
758 fi
759
760 # Breakdown by reason pattern (for invalid announcements)
761 if [[ $invalid_announcement_count -gt 0 ]]; then
762 echo ""
763 log_info "Breakdown by reason pattern:"
764 # Extract the main reason type (before the colon details)
765 (grep -v '^#' "$output_file" | awk -F'\t' '{print $3}' | sed 's/:.*//' | sort | uniq -c | sort -rn | head -10 | while read -r cnt reason; do
766 echo " $reason: $cnt"
767 done) || true
768 fi
769
770 # Explicit success exit
771 exit 0
772}
773
774main "$@"