Change protocol error detection to match only WebSocket-specific errors
(websocket, invalid frame) instead of the generic 'protocol' keyword, which
was incorrectly catching transient git protocol errors.
Git protocol errors like 'fatal: protocol error: bad line length' are
transient network issues that should be retried with backoff, not blocked
via the naughty list. Only WebSocket/Nostr protocol violations indicate
persistent infrastructure problems.
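A minimal Rust sketch of the narrowed check, assuming error messages arrive as plain strings. The `ErrorClass` enum, `classify_error`, and the DNS/TLS phrase lists are hypothetical illustrations, not the code in src/sync/naughty_list.rs; URL stripping is omitted here.

```rust
/// Hypothetical error classes; the real tracker's categories may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorClass {
    Dns,
    Tls,
    Protocol,
    Transient,
}

/// Classifies an error message; only persistent failures map to a non-Transient class.
fn classify_error(message: &str) -> ErrorClass {
    let msg = message.to_ascii_lowercase();
    if msg.contains("could not resolve host") || msg.contains("failed to lookup address") {
        ErrorClass::Dns
    } else if msg.contains("ssl certificate problem") || msg.contains("certificate verify failed") {
        ErrorClass::Tls
    } else if msg.contains("websocket") || msg.contains("invalid frame") {
        // WebSocket/Nostr protocol violations are treated as persistent.
        ErrorClass::Protocol
    } else {
        // Everything else, including git's "fatal: protocol error: bad line length",
        // falls through to Transient and is left to backoff/retry.
        ErrorClass::Transient
    }
}
```

A bare `contains("protocol")` check would have sent the git error to the naughty list; in this sketch it falls through to `Transient`.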
Fixes production false positive:
- relay.ngit.dev: git protocol error + remote warning misclassified
Add production test cases for git protocol errors and warning combinations.

Strip URLs (http://, https://, git://, ws://, wss://) from error messages
before classification to prevent false positives from repository names,
paths, or identifiers containing keywords like 'ssl', 'certificate', etc.
- Add strip_urls() function to remove URLs before pattern matching
- Handle WebSocket URL schemes (ws://, wss://) so relay errors are stripped too
- Filter remote warnings that don't indicate infrastructure problems
- Use more specific SSL/TLS patterns to avoid npub substring matches
- Reduce test suite from 40 to 13 tests, keeping only edge cases
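A sketch of URL stripping followed by a more specific TLS match, assuming whitespace-separated error text. This `strip_urls` is a hypothetical reimplementation (the commit does not show its body), and `is_tls_failure` with its phrase list is illustrative only.

```rust
/// URL schemes removed before classification (the schemes listed in the commit).
const URL_SCHEMES: [&str; 5] = ["http://", "https://", "git://", "ws://", "wss://"];

/// Drops every whitespace-separated token that starts with a known URL scheme,
/// ignoring leading quotes or brackets. Whitespace is normalized to single spaces.
fn strip_urls(message: &str) -> String {
    message
        .split_whitespace()
        .filter(|token| {
            let trimmed = token.trim_start_matches(|c: char| !c.is_ascii_alphanumeric());
            let lower = trimmed.to_ascii_lowercase();
            !URL_SCHEMES.iter().any(|scheme| lower.starts_with(scheme))
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Matches multi-word TLS phrases instead of the bare substring "ssl",
/// which can appear inside bech32-encoded npubs.
fn is_tls_failure(message: &str) -> bool {
    let cleaned = strip_urls(message).to_ascii_lowercase();
    ["ssl certificate problem", "certificate verify failed", "tls handshake"]
        .iter()
        .any(|pattern| cleaned.contains(pattern))
}
```

Because npubs are bech32 strings, whose alphabet includes s, l, d and n, substrings such as "ssl" or "dns" can occur inside them; dropping URL tokens before matching keeps repository paths out of the classifier.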
Fixes false positives seen in production:
- git.shakespeare.diy: 'repository not found' with npub containing 'ssl'
- relay.ngit.dev: HTTP 500 error with npub containing 'ssl'
- gitnostr.com: remote permission warning misclassified as protocol error

Caught a production bug where an npub in a URL string contained "dns",
triggering a false positive.

Implement domain-level naughty list tracking for git remotes, reusing the
existing NaughtyListTracker from relay sync. This prevents repeated attempts
to fetch from git domains with persistent infrastructure issues (SSL/TLS
certificate errors, DNS failures).
Changes:
- Updated NaughtyListTracker to track both relay URLs and git domains
- Added git_naughty_list field to RealSyncContext for error classification
- Modified fetch_oids() to classify git fetch errors and record naughty domains
- Updated sync_identifier_next_url() to filter out naughty domains during URL selection
- Added git_naughty_list parameter to ThrottleManager for domain queue processing
- Threaded naughty list through start_sync_loop and all sync functions
- Updated all tests to pass naughty list parameter
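One way to derive the domain key and filter candidate remotes, assuming remotes are `scheme://host/...` URLs. `git_domain` and `filter_naughty` are hypothetical names; the real sync_identifier_next_url() and tracker API are not shown in this log, and scp-style `user@host:path` remotes and IPv6 literals are not handled.

```rust
use std::collections::HashSet;

/// Extracts a lowercase host from URLs such as "https://host/path" or
/// "git://user@host:9418/path". Hypothetical helper: scp-style remotes
/// ("user@host:path") and IPv6 literals are not handled.
fn git_domain(url: &str) -> Option<String> {
    let (_, rest) = url.split_once("://")?;
    let authority = rest.split('/').next()?;
    // Drop optional userinfo ("user@") and port (":443").
    let host = authority.rsplit('@').next()?;
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Keeps candidate URLs whose domain is not on the naughty list.
fn filter_naughty<'a>(candidates: &[&'a str], naughty: &HashSet<String>) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|url| match git_domain(url) {
            Some(domain) => !naughty.contains(&domain),
            // URLs without a parsable host are kept; this sketch does not judge them.
            None => true,
        })
        .collect()
}
```

Keying on the domain rather than the full URL means one bad certificate blocks every repository on that host, which matches the "domain-level" intent described above.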
The naughty list uses a 12-hour expiration (configurable) so that domains can
recover from infrastructure issues. The first occurrence logs at WARN; repeats log at DEBUG.

Add naughty list tracking for relays with persistent infrastructure issues
(DNS failures, TLS certificate errors, protocol violations) to reduce log
noise and provide better visibility via metrics.
Key features:
- Classify errors into naughty (persistent) vs transient (temporary)
- Track naughty relays with category, reason, and occurrence count
- Log WARN on first naughty occurrence, DEBUG on repeats
- Automatic expiration after 12 hours (configurable)
- Prometheus metrics for monitoring naughty relays by category
- Periodic cleanup task integrated with health checker
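A simplified tracker showing occurrence counting, first-occurrence detection for the WARN/DEBUG split, time-based expiration, cleanup, and per-category counts of the kind a metrics gauge could export. `SimpleNaughtyList` and its methods are hypothetical, not the real NaughtyListTracker API. Expiring entries a fixed time after they are first recorded is an assumption, and timestamps are passed in explicitly to keep the sketch testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[allow(dead_code)] // `reason` would feed logs/metrics but is unused in this sketch
#[derive(Debug, Clone)]
struct NaughtyEntry {
    category: String,
    reason: String,
    occurrences: u32,
    first_seen: Instant,
}

/// Hypothetical, simplified naughty list keyed by relay URL or git domain.
struct SimpleNaughtyList {
    expiration: Duration,
    entries: HashMap<String, NaughtyEntry>,
}

impl SimpleNaughtyList {
    /// `expiration_hours` mirrors the naughty_list_expiration_hours option (default 12).
    fn new(expiration_hours: u64) -> Self {
        Self {
            expiration: Duration::from_secs(expiration_hours * 3600),
            entries: HashMap::new(),
        }
    }

    /// Records an occurrence. Returns true for a first occurrence (log at WARN)
    /// and false for a repeat (log at DEBUG).
    fn record(&mut self, key: &str, category: &str, reason: &str, now: Instant) -> bool {
        self.cleanup(now); // an expired entry counts as a fresh first occurrence
        if let Some(entry) = self.entries.get_mut(key) {
            entry.occurrences += 1;
            entry.reason = reason.to_string();
            return false;
        }
        self.entries.insert(
            key.to_string(),
            NaughtyEntry {
                category: category.to_string(),
                reason: reason.to_string(),
                occurrences: 1,
                first_seen: now,
            },
        );
        true
    }

    fn is_naughty(&self, key: &str, now: Instant) -> bool {
        self.entries
            .get(key)
            .map_or(false, |e| now.saturating_duration_since(e.first_seen) < self.expiration)
    }

    /// Drops expired entries; a periodic task would call this.
    fn cleanup(&mut self, now: Instant) {
        let expiration = self.expiration;
        self.entries
            .retain(|_, e| now.saturating_duration_since(e.first_seen) < expiration);
    }

    /// Number of tracked keys per category, e.g. for a gauge labelled by category.
    fn count_by_category(&self) -> HashMap<String, usize> {
        let mut counts = HashMap::new();
        for entry in self.entries.values() {
            *counts.entry(entry.category.clone()).or_insert(0) += 1;
        }
        counts
    }
}
```

A caller would log at WARN when `record` returns true and at DEBUG otherwise; calling `cleanup` from the health checker's periodic task keeps the map bounded even for keys that never recur.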
Components added:
- src/sync/naughty_list.rs: Core naughty list tracker with error classification
- NaughtyListTracker integration in RelayHealthTracker
- Connection error handling updates in sync manager
- Naughty list metrics (total by category, detailed info per relay)
- Config option for naughty_list_expiration_hours (default: 12)
Closes the tracking issues for DNS lookup failures and TLS certificate errors.