upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

path: root/src/sync/relay_connection.rs
Age | Commit message | Author
2026-01-21 | refactor: use mark_negentropy_unsupported() consistently | DanConwayDev
Refactor internal code to use the mark_negentropy_unsupported() method instead of direct field access for improved readability.
2026-01-21 | fix: fall back to REQ+EOSE when negentropy retry fails | DanConwayDev
When negentropy retry makes no progress (relay returns zero events), this indicates the relay's negentropy implementation is broken. Instead of marking the batch as failed, we now:

1. Mark the relay as not supporting NIP-77 so future batches skip negentropy and use REQ+EOSE directly
2. Fall back to REQ+EOSE using semantic filters (kind/author/tags) for the current batch, which may succeed where ID-based queries fail

This addresses the issue where some relays (e.g., azzamo.net, snort.social) return event IDs during negentropy diff but fail to serve those events when requested by ID.
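The two-step reaction described in this commit could be sketched roughly as follows. Only `mark_negentropy_unsupported()` is a name taken from the log; `RelayState`, `NextStep`, and `after_negentropy_retry` are hypothetical stand-ins, and the real code presumably carries far more state.

```rust
/// What the caller should do after one negentropy retry (hypothetical type).
#[derive(Debug, PartialEq)]
enum NextStep {
    /// The retry delivered events, so negentropy keeps being used.
    ContinueNegentropy,
    /// The retry delivered nothing: use REQ+EOSE with semantic filters.
    FallBackToReq,
}

/// Minimal per-relay state (hypothetical; only the flag matters here).
struct RelayState {
    nip77_supported: bool,
}

impl RelayState {
    /// Method name from the commit log; the body is an assumption.
    fn mark_negentropy_unsupported(&mut self) {
        self.nip77_supported = false;
    }
}

/// Decide how to proceed once a negentropy retry has finished.
fn after_negentropy_retry(relay: &mut RelayState, events_received: usize) -> NextStep {
    if events_received == 0 {
        // Step 1: future batches on this relay skip negentropy.
        relay.mark_negentropy_unsupported();
        // Step 2: the current batch is retried via REQ+EOSE instead of failing.
        NextStep::FallBackToReq
    } else {
        NextStep::ContinueNegentropy
    }
}
```

Returning a value instead of performing the fallback inline keeps the decision easy to test without a live relay.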
2026-01-10 | fix: detect NIP-77 NOTICE immediately during negentropy sync | DanConwayDev
Previously, when a relay didn't support NIP-77, the negentropy_sync_diff function would wait for the full client.sync() timeout even after receiving a NOTICE message that marked the relay as not supporting NIP-77.

This change uses tokio::select! to race the sync operation against a polling task that checks the nip77_supported flag every 10ms. When a NOTICE is received (detected in the message handler), the poll task detects the status change and immediately returns an error, allowing quick fallback to REQ+EOSE without waiting for timeouts.

Benefits:
- Fast failure (within 10ms) when relay sends NIP-77 NOTICE
- No artificial timeout reduction that could hurt legitimate operations
- Maintains full timeout for relays that actually support NIP-77
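The commit itself uses tokio::select!; the sketch below is a std-only analogue of the same idea, not the author's code. It waits for a result on a channel but re-checks a shared flag roughly every 10ms, so a flag flipped by another thread ends the wait early while the full timeout is preserved otherwise. `wait_for_sync` and `SyncError` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Reasons the wait can end without a result (hypothetical type).
#[derive(Debug, PartialEq)]
enum SyncError {
    /// The flag was cleared, e.g. by a NOTICE handler on another thread.
    Nip77Unsupported,
    /// The full timeout elapsed with the flag still set.
    Timeout,
    /// The sending side went away without producing a result.
    WorkerGone,
}

/// Wait for a sync result, giving up early once `nip77_supported` is false.
fn wait_for_sync<T>(
    result_rx: &Receiver<T>,
    nip77_supported: &AtomicBool,
    full_timeout: Duration,
) -> Result<T, SyncError> {
    let deadline = Instant::now() + full_timeout;
    loop {
        // Block for at most 10ms so the flag is re-checked at that cadence.
        match result_rx.recv_timeout(Duration::from_millis(10)) {
            Ok(value) => return Ok(value),
            Err(RecvTimeoutError::Disconnected) => return Err(SyncError::WorkerGone),
            Err(RecvTimeoutError::Timeout) => {}
        }
        if !nip77_supported.load(Ordering::SeqCst) {
            return Err(SyncError::Nip77Unsupported);
        }
        if Instant::now() >= deadline {
            return Err(SyncError::Timeout);
        }
    }
}
```

In an async runtime the same shape would be two futures raced against each other, which is what the commit message describes.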
2026-01-10 | fix: return error when negentropy has failures to enable REQ fallback | DanConwayDev
When negentropy sync times out or has other failures, it now properly returns Err() instead of Ok() with empty reconciliation. This ensures historic_sync increments failed_count and triggers fallback to REQ+EOSE instead of treating it as a successful sync with 0 events. Resolves issue where bootstrap relay timeouts were marked as complete instead of falling back to traditional sync.
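A minimal sketch of the distinction this commit draws, assuming simplified types: `NegentropySyncResult` and its three field names appear elsewhere in this log, but the field types, `SyncFailure`, `finish_negentropy`, and `needs_req_fallback` are hypothetical.

```rust
/// Field names follow the log; the `Vec<String>` types are an assumption.
#[derive(Debug, Default, PartialEq)]
struct NegentropySyncResult {
    remote_only: Vec<String>,
    local_only: Vec<String>,
    received: Vec<String>,
}

/// Failures collected during reconciliation (hypothetical type).
#[derive(Debug, PartialEq)]
enum SyncFailure {
    Timeout,
    Other(String),
}

/// Any recorded failure turns the outcome into `Err`, so an empty
/// reconciliation is never mistaken for "synced, zero events".
fn finish_negentropy(
    result: NegentropySyncResult,
    failures: Vec<SyncFailure>,
) -> Result<NegentropySyncResult, Vec<SyncFailure>> {
    if failures.is_empty() {
        Ok(result)
    } else {
        Err(failures)
    }
}

/// Caller side: count the failure and report whether REQ+EOSE is needed.
fn needs_req_fallback(
    outcome: &Result<NegentropySyncResult, Vec<SyncFailure>>,
    failed_count: &mut u32,
) -> bool {
    match outcome {
        Ok(_) => false,
        Err(_) => {
            *failed_count += 1;
            true
        }
    }
}
```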
2026-01-09 | improve: detect and skip negentropy for unsupported relays | DanConwayDev
- Upgrade NOTICE log level to INFO when relay rejects negentropy (envelope/NEG- errors)
- Track NIP-77 support status per relay connection to avoid repeated failed attempts
- Mark relay as unsupported when NOTICE rejection or timeout occurs
- Skip negentropy on subsequent syncs during same connection session
- Reset support status on reconnect to allow retry after relay upgrades

This reduces log noise and eliminates 10-second timeout delays on each historic sync attempt for relays that don't support NIP-77 negentropy. Fixes negentropy-timeout-10-seconds issue by learning from relay behavior.
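The per-connection bookkeeping listed above might look something like this. All names are hypothetical, and the substring checks for NOTICE text are an assumption based on the "envelope/NEG-" wording in the commit message, not the actual matching logic.

```rust
/// NIP-77 support status for one relay connection (hypothetical type).
#[derive(Debug)]
struct Nip77Status {
    supported: bool,
}

impl Nip77Status {
    /// Start optimistic: support is assumed until the relay shows otherwise.
    fn new() -> Self {
        Self { supported: true }
    }

    /// Inspect a NOTICE; the matched substrings are an assumption.
    fn on_notice(&mut self, message: &str) {
        let lower = message.to_ascii_lowercase();
        if lower.contains("unknown envelope") || lower.contains("neg-") {
            self.supported = false;
        }
    }

    /// A negentropy timeout is also treated as "unsupported".
    fn on_timeout(&mut self) {
        self.supported = false;
    }

    /// A new session gets a fresh attempt, in case the relay was upgraded.
    fn on_reconnect(&mut self) {
        self.supported = true;
    }

    /// Checked before each historic sync to skip doomed attempts.
    fn should_try_negentropy(&self) -> bool {
        self.supported
    }
}
```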
2026-01-09 | fix: downgrade negentropy timeout warning to debug level | DanConwayDev
Negentropy diff timeouts are expected when relays don't support NIP-77. The relay responds with NOTICE 'unknown envelope label' and the timeout is hit before we recognize this is unsupported rather than a failure.

Changes:
- Downgrade from warn! to debug! in negentropy_sync_filter() (src/sync/relay_connection.rs:493)
- Add comment explaining timeouts are common for non-NIP-77 relays
- Update message to clarify timeout typically means no NIP-77 support

The existing fallback mechanism (lines 505-509) properly handles this case and logs a one-time warning about falling back to REQ+EOSE.

Discovered via production sync testing against wss://git.shakespeare.diy
2026-01-09 | chore: cargo fmt | DanConwayDev
2026-01-09 | feat: replace owner-npub with relay-owner-nsec for persistent operator identity | DanConwayDev
Replace the owner-npub configuration option with relay-owner-nsec to provide a persistent cryptographic identity for the relay operator. This addresses NIP-42 authentication requirements discovered during sync debugging.

Motivation:
- Some relays (e.g., relay.damus.io) require NIP-42 authentication for advanced features like NIP-77 negentropy sync
- Previously used random ephemeral keys per connection, providing no persistent identity
- Other relays can now recognize us by pubkey for reputation-based rate limiting
- Ensures consistency between NIP-11 pubkey and authentication key

Changes:
- Config: relay_owner_nsec with auto-load/generate from .relay-owner.nsec
- NIP-11: Pubkey derived from nsec instead of separate npub field
- Sync: RelayConnection now uses operator keys for NIP-42 auth
- Docs: Updated README, .env.example, and added .relay-owner.nsec to gitignore

Key Features:
- Auto-generates key on first run and saves to .relay-owner.nsec
- Loads existing key from file on subsequent runs
- Can override via CLI flag or environment variable
- Enables reputation building across relay network
- Future-ready for event signing and WoT calculations

Testing:
- 225/232 tests passing (7 pre-existing purgatory failures unrelated)
- Verified key generation, loading, and NIP-11 derivation
- Release build successful

Related: work/sync-debug-analysis.md, work/relay-owner-nsec-implementation.md
2026-01-08 | fix: sync-bootstrap-relay-url scheme optional | DanConwayDev
2025-12-22 | chore: cargo fmt and clippy | DanConwayDev
2025-12-22 | fix: sync consolidate subscription count | DanConwayDev
2025-12-22 | sync: add req rate-limit detection and cooldown | DanConwayDev
2025-12-19 | feat(sync): implement pagination for historic_sync REQ+EOSE flow | DanConwayDev
Add automatic pagination support for non-Negentropy historic sync to handle large result sets efficiently. When a subscription receives >= 75 events, the system automatically fetches the next page using the 'until' parameter.

Changes:
- Add PaginationState struct to track event counts and min timestamps
- Add pagination_state HashMap to PendingBatch for per-subscription tracking
- Add PAGINATION_THRESHOLD constant (75 events)
- Pass pending_sync_index to event processor for state updates
- Track events and timestamps as they arrive
- Check threshold on EOSE and launch follow-up subscriptions
- Initialize pagination state when creating historic sync subscriptions
- Update test fixtures in algorithms.rs

The pagination continues recursively until a page returns fewer than 75 events, ensuring complete historic data retrieval without overwhelming relay limits.
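The threshold check on EOSE can be illustrated with a small sketch. `PaginationState` and `PAGINATION_THRESHOLD` are names from the commit message, but the field names, method names, and the choice to reuse the minimum timestamp directly as the next `until` are assumptions; reusing it unchanged may re-deliver events from that same second, which this sketch assumes are deduplicated elsewhere.

```rust
/// Page size at which a follow-up page is requested (value from the log).
const PAGINATION_THRESHOLD: usize = 75;

/// Per-subscription tracking; the field names are assumptions.
#[derive(Debug, Default)]
struct PaginationState {
    event_count: usize,
    min_created_at: Option<u64>,
}

impl PaginationState {
    /// Called for each event as it arrives on the subscription.
    fn record_event(&mut self, created_at: u64) {
        self.event_count += 1;
        self.min_created_at = Some(match self.min_created_at {
            Some(current) => current.min(created_at),
            None => created_at,
        });
    }

    /// Called on EOSE: `Some(until)` when the page was full and another
    /// page should be requested, `None` when the page was short.
    fn next_until(&self) -> Option<u64> {
        if self.event_count >= PAGINATION_THRESHOLD {
            self.min_created_at
        } else {
            None
        }
    }
}
```

A caller would create a fresh `PaginationState` for each follow-up subscription, so the recursion stops at the first page with fewer than 75 events.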
2025-12-19 | sync: fix autoclose on EOSE for historic filters | DanConwayDev
2025-12-19 | refactor: rename connect_and_subscribe to connect | DanConwayDev
Separated connection from subscription logic. The RelayConnection.connect() method now only handles WebSocket connection establishment. Subscriptions are managed separately via handle_connect_or_reconnect.

Changes:
- Renamed RelayConnection::connect_and_subscribe() to connect()
- Removed subscription logic from connect method
- Updated call site in try_connect_relay()
- Removed unused build_announcement_filter import
2025-12-19 | fix: prevent CLOSED messages from terminating relay connections | DanConwayDev
The system was incorrectly treating subscription-specific CLOSED messages as connection-wide disconnects, causing live subscriptions to be terminated immediately after historic_sync completed.

Two bugs fixed:
1. relay_connection.rs: Removed break on RelayMessage::Closed - it's subscription-specific, not connection-wide
2. mod.rs: Removed disconnect handling for RelayEvent::Closed - only log at DEBUG level and continue

All 41 sync tests now pass including previously failing live sync tests.
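To make the bug concrete, here is a simplified loop in which a CLOSED message ends one subscription but not the loop. `Msg` is a hypothetical stand-in, not nostr-sdk's `RelayMessage`, and `run_loop` is an illustrative name.

```rust
use std::collections::HashSet;

/// Simplified stand-in for relay messages (hypothetical).
#[derive(Debug)]
enum Msg {
    Event { subscription_id: String },
    Closed { subscription_id: String },
    Shutdown,
}

/// Process messages and return how many events reached live subscriptions.
fn run_loop(messages: Vec<Msg>, active: &mut HashSet<String>) -> usize {
    let mut events = 0;
    for msg in messages {
        match msg {
            Msg::Event { subscription_id } => {
                if active.contains(&subscription_id) {
                    events += 1;
                }
            }
            // CLOSED ends one subscription only; note there is no `break`.
            Msg::Closed { subscription_id } => {
                active.remove(&subscription_id);
            }
            // Only a connection-wide signal leaves the loop.
            Msg::Shutdown => break,
        }
    }
    events
}
```

With a `break` in the `Closed` arm, the live subscription in this model would never see events that arrive after the historic subscription closes, which matches the symptom the commit describes.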
2025-12-18 | sync removing dead code | DanConwayDev
2025-12-11 | fix: resolve all fmt and clippy warnings | DanConwayDev
Main lib (src/):
- Add #[allow(dead_code)] for build_info field (stored to prevent Prometheus unregistration)
- Add #[allow(dead_code)] for first_seen field (reserved for future rate limiting)
- Replace .or_insert_with(RelaySyncNeeds::default) with .or_default()
- Replace manual div_ceil implementations with .div_ceil(100)

Test code (tests/):
- Replace .expect(&format!(...)) with .unwrap_or_else(|_| panic!(...))
- Remove needless borrows in fetch_metrics() calls
- Add #[allow(dead_code)] and #[allow(unused_imports)] to test helpers module

grasp-audit:
- Apply cargo fmt to fix formatting
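For readers unfamiliar with these lints, the three replacements named above look roughly like this in isolation. `RelaySyncNeeds` is a name from the log, but its `pending` field and the helper functions are hypothetical.

```rust
use std::collections::HashMap;

/// Placeholder with one made-up field; the real struct is not shown in the log.
#[derive(Debug, Default, PartialEq)]
struct RelaySyncNeeds {
    pending: u32,
}

/// `.or_default()` replaces `.or_insert_with(RelaySyncNeeds::default)`.
fn bump(needs: &mut HashMap<String, RelaySyncNeeds>, relay: &str) {
    needs.entry(relay.to_string()).or_default().pending += 1;
}

/// `.div_ceil(100)` replaces a manual `(total + 99) / 100`.
fn batch_count(total: usize) -> usize {
    total.div_ceil(100)
}

/// `unwrap_or_else(|_| panic!(...))` replaces `.expect(&format!(...))`,
/// so the message is only formatted when the failure actually happens.
fn parse_port(text: &str) -> u16 {
    text.parse()
        .unwrap_or_else(|_| panic!("invalid port: {text}"))
}
```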
2025-12-11 | feat: implement NIP-77 negentropy sync for historical data | DanConwayDev
Replace EOSE-based sync completion with negentropy reconciliation for:
- Initial connect (fresh sync)
- Daily sync (Layer 1 announcements)
- Stale reconnect (>15 min)

Key changes:
- Add NegentropySyncResult struct with remote_only, local_only, received fields
- Add supports_negentropy() using try-and-fallback approach
- Add negentropy_sync_filter() using nostr-sdk client.sync() API
- Modify handle_connect_or_reconnect() to use negentropy for fresh/stale sync
- Modify daily_sync() to use negentropy for Layer 1
- Single-warning logging per relay when negentropy fails

Quick reconnects (<15 min) unchanged - still use REQ with since filter. If negentropy unsupported, gracefully falls back to REQ+EOSE flow.
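The choice between negentropy and REQ described above can be sketched as a small decision function. All names are hypothetical, and treating a gap of exactly 15 minutes as a quick reconnect is an assumption, since the commit message only covers the ">15" and "<15" cases.

```rust
use std::time::Duration;

/// Gaps longer than this count as a stale reconnect (15 min, per the log).
const STALE_AFTER: Duration = Duration::from_secs(15 * 60);

/// How to catch up after (re)connecting (hypothetical type).
#[derive(Debug, PartialEq)]
enum SyncPlan {
    /// Full reconciliation via NIP-77.
    Negentropy,
    /// Full REQ+EOSE, used when negentropy is unsupported.
    ReqEose,
    /// Incremental REQ with a `since` filter.
    ReqSince { since: u64 },
}

/// `disconnected_for` is `None` on the very first connect.
fn plan_sync(
    disconnected_for: Option<Duration>,
    last_seen: u64,
    nip77_supported: bool,
) -> SyncPlan {
    match disconnected_for {
        // Quick reconnect: unchanged behaviour, only fetch what was missed.
        Some(gap) if gap <= STALE_AFTER => SyncPlan::ReqSince { since: last_seen },
        // Fresh connect or stale reconnect: reconcile everything.
        _ if nip77_supported => SyncPlan::Negentropy,
        _ => SyncPlan::ReqEose,
    }
}
```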
2025-12-11 | docs(sync): document why RelayConnection uses Client instead of Relay directly | DanConwayDev
nostr-sdk 0.44's Relay::new() is pub(crate), making it impossible to construct a Relay directly from outside the crate. Relays can only be created through Client::add_relay() or RelayPool::add_relay().

This commit:
- Adds 'Why Client instead of Relay directly?' section to struct docs
- Updates run_event_loop() docs to explain the API constraint
- Removes outdated 'Future Refactoring' suggestion (not feasible)
2025-12-11 | refactor: use Relay::notifications() for event-driven disconnect detection | DanConwayDev
Replace the 1-second polling loop with nostr-sdk's relay-level notification system that provides immediate disconnect detection via RelayNotification::RelayStatus.

Key changes:
- Use relay.notifications() instead of client.notifications()
- Handle RelayNotification::RelayStatus { Disconnected | Terminated } to detect connection loss immediately without polling
- Remove tokio::select! with interval timer - now uses simple match loop
- Handle additional notification types (Authenticated, AuthenticationFailed)

Why this is better:
- Event-driven vs polling: no wasted CPU cycles checking every second
- Immediate detection: disconnect triggers notification instantly
- Uses nostr-sdk's built-in mechanism that was previously inaccessible at pool level (RelayStatus notifications are filtered out in RelayPoolNotification)

Technical note: RelayNotification::RelayStatus is only available via Relay::notifications(), not Client::notifications(), because the pool-level broadcast filters out status change events.

Future refactoring opportunity: Consider restructuring RelayConnection to hold a Relay directly instead of wrapping a Client, since we only manage one relay per connection anyway.
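The shape of the "simple match loop" can be shown with a std channel standing in for the notification stream. This is only an analogue: `Notification` and `Status` below are hypothetical enums that mirror the variants named in the commit message, not nostr-sdk's types, and `run_until_disconnect` is an illustrative name.

```rust
use std::sync::mpsc::Receiver;

/// Connection states relevant to the loop (hypothetical stand-in).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Connected,
    Disconnected,
    Terminated,
}

/// Notifications delivered by the relay handle (hypothetical stand-in).
#[derive(Debug, PartialEq)]
enum Notification {
    Status(Status),
    Authenticated,
    AuthenticationFailed,
}

/// Block on the stream and return as soon as the connection is lost.
/// There is no interval timer: the thread sleeps until something arrives.
fn run_until_disconnect(notifications: &Receiver<Notification>) -> Status {
    while let Ok(notification) = notifications.recv() {
        match notification {
            Notification::Status(lost @ (Status::Disconnected | Status::Terminated)) => {
                return lost;
            }
            Notification::Status(Status::Connected) => {}
            Notification::Authenticated | Notification::AuthenticationFailed => {}
        }
    }
    // A closed stream is treated like termination in this sketch.
    Status::Terminated
}
```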
2025-12-11 | fix: wire up relay disconnection detection for metrics | DanConwayDev
- Add periodic health check in RelayConnection::run_event_loop that polls nostr-sdk's relay.is_connected() every second to detect dead connections
- When event channel closes without explicit Closed/Shutdown, send DisconnectNotification to SyncManager (fixes case where TCP drops silently)
- Enable test_relay_connected_status test which validates the ngit_sync_relay_connected metric correctly reflects connection state

The issue was that when a remote relay stops abruptly, nostr-sdk's notification receiver blocks indefinitely waiting for data. TCP disconnect detection without keepalive can take minutes. The health check polls nostr-sdk's internal relay status which detects disconnection promptly.
2025-12-11 | sync: add sync_base_backoff_secs config for better testing | DanConwayDev
2025-12-11 | sync: improve connection timeout handling | DanConwayDev
2025-12-11 | fix(sync): improve metrics recording and connection failure detection | DanConwayDev
Changes:
- Fix connection attempt metrics: record success/failure based on actual connection result instead of pre-emptively recording failure
- Add health tracker integration on connection failure: call record_failure() and record_health_state() in error path
- Add connection verification in relay_connection.rs: wait 500ms after connect() then verify is_connected() to detect silent failures
- Add configurable disconnect check interval via NGIT_SYNC_DISCONNECT_CHECK_INTERVAL_SECS env var
- Update TestRelay with fast test settings: startup_delay=0, jitter=0, disconnect_check_interval=1s
- Add debug output to metrics tests for investigation

Note: Tests may still fail due to 5-second base backoff in health tracker. A follow-up task will add NGIT_SYNC_BASE_BACKOFF_SECS config parameter to allow faster test cycles.

Related: metrics-wiring-plan.md Tasks 1 & 2
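Reading the interval from the environment could be done along these lines. Only the variable name comes from the commit message; the function names, the 5-second default, and the rule that zero or unparsable values fall back to the default are assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical default; the log does not state the real one.
const DEFAULT_CHECK_INTERVAL_SECS: u64 = 5;

/// Parse the raw setting, falling back to the default when it is
/// missing, not a number, or zero.
fn disconnect_check_interval(raw: Option<&str>) -> Duration {
    let secs = raw
        .and_then(|text| text.trim().parse::<u64>().ok())
        .filter(|secs| *secs > 0)
        .unwrap_or(DEFAULT_CHECK_INTERVAL_SECS);
    Duration::from_secs(secs)
}

/// Convenience wrapper that reads the variable named in the commit.
fn disconnect_check_interval_from_env() -> Duration {
    let raw = std::env::var("NGIT_SYNC_DISCONNECT_CHECK_INTERVAL_SECS").ok();
    disconnect_check_interval(raw.as_deref())
}
```

Keeping the parsing separate from the environment lookup lets tests pass values such as `Some("1")` directly, similar in spirit to the fast TestRelay settings mentioned above.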
2025-12-10 | sync: fix connection registration issue | DanConwayDev
2025-12-10 | sync: implement filter consolidation system | DanConwayDev
2025-12-10 | sync v4 mvp | DanConwayDev
2025-12-10 | stub of sync v4 | DanConwayDev
2025-12-10 | improve sync design | DanConwayDev