| Age | Commit message | Author |
|
- Fix relay_connected() helper to check v >= 2 (Syncing/Connected states)
- Fix unit test to use status value 3 (Connected) instead of 1 (Connecting)
- Fix clippy warning: use .to_vec() instead of .iter().cloned().collect()
All 61 sync integration tests now passing.
All 238 unit tests passing.
Clippy clean.
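
A minimal sketch of the corrected helper, assuming the status metric is a small integer gauge. The values 1 (Connecting), 2 (Syncing) and 3 (Connected) follow the commit message; the `Disconnected = 0` variant and the `u8` signature are assumptions made for illustration only.

```rust
/// Hypothetical mirror of the connection-status gauge values.
/// 1, 2 and 3 follow the commit message; 0 is an assumed placeholder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum ConnectionStatus {
    Disconnected = 0, // assumption, not named in the commit message
    Connecting = 1,
    Syncing = 2,
    Connected = 3,
}

/// Returns true once the gauge value has reached Syncing or any later state.
/// Comparing against 1 would also accept relays that are still connecting.
pub fn relay_connected(v: u8) -> bool {
    v >= ConnectionStatus::Syncing as u8
}
```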
|
|
- Add ConnectionStatus::ConnectedDegraded (status=4 in metrics)
- Track batch failures via PendingBatch.failed field
- Track relay-level failures via RelayState.historic_sync_had_failures
- Transition to ConnectedDegraded when any batch fails during historic sync
- Add is_live_sync_active() helper for cleaner match patterns
- Update state machine diagram with ConnectedDegraded transitions
- Update metrics docs with status=4 and example queries
Fixes an issue where relays with failed negentropy retries would
incorrectly transition to Connected status despite missing data.
Operators can now distinguish 'fully synced' from 'degraded (partial data)'.
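
The transition described above could be sketched roughly as follows. The field names `failed` and `historic_sync_had_failures` come from the commit message; the struct layouts, the `record_batch` and `status_after_historic_sync` methods, and the exact meaning of `is_live_sync_active()` (here: either connected state) are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionStatus {
    Syncing,
    Connected,
    ConnectedDegraded,
}

/// Reduced to the one field relevant here.
#[derive(Debug, Default)]
pub struct PendingBatch {
    pub failed: bool,
}

#[derive(Debug, Default)]
pub struct RelayState {
    pub historic_sync_had_failures: bool,
}

impl RelayState {
    /// Hypothetical helper: fold a finished batch into the relay-level flag.
    /// The flag is sticky, so one failed batch marks the whole historic sync.
    pub fn record_batch(&mut self, batch: &PendingBatch) {
        self.historic_sync_had_failures |= batch.failed;
    }

    /// Hypothetical helper: status to report once historic sync has finished.
    pub fn status_after_historic_sync(&self) -> ConnectionStatus {
        if self.historic_sync_had_failures {
            ConnectionStatus::ConnectedDegraded
        } else {
            ConnectionStatus::Connected
        }
    }
}

/// Assumed semantics: live sync runs in both connected states.
pub fn is_live_sync_active(status: ConnectionStatus) -> bool {
    matches!(
        status,
        ConnectionStatus::Connected | ConnectionStatus::ConnectedDegraded
    )
}
```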
|
|
Add retry protection to negentropy event validation:
- Track retry_count in PendingBatch (incremented on each retry attempt)
- Detect when retry makes zero progress (relay returns no requested events)
- Abort retry and complete batch with partial results when stuck
- Log error with full details when retry protection triggers
This prevents infinite loops when:
- Relay has bugs and returns wrong events for ID queries
- Relay is malicious and returns unrelated events
- Relay has eventual consistency issues
- Network corruption causes incorrect responses
The protection triggers when received_count == 0 on a retry (relay
returned nothing we asked for), indicating the relay will never
provide the missing events.
Future work: Track failed batches in Prometheus metrics
(sync_failed_batches_total) for monitoring and alerting.
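
A sketch of the zero-progress check, under the assumption that it runs once per completed round of a batch. `retry_count` and `received_count` are named in the commit message; `RetryDecision`, `on_round_finished`, and the `eprintln!` stand-in for the real logger are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RetryDecision {
    /// Nothing is missing, so the batch can complete normally.
    Complete,
    /// Some events are missing and another retry is worthwhile.
    Retry,
    /// A retry returned nothing we asked for: complete with partial results.
    AbortPartial,
}

/// Reduced to the one field relevant here.
#[derive(Debug, Default)]
pub struct PendingBatch {
    pub retry_count: u32,
}

/// Hypothetical decision function, called when a round of ID requests ends.
pub fn on_round_finished(
    batch: &mut PendingBatch,
    received_count: usize,
    missing_count: usize,
) -> RetryDecision {
    if missing_count == 0 {
        return RetryDecision::Complete;
    }
    // Only a *retry* with zero progress counts as stuck; the first round
    // is always allowed one retry.
    if batch.retry_count > 0 && received_count == 0 {
        eprintln!(
            "retry protection triggered: retry {} made no progress, {} events still missing",
            batch.retry_count, missing_count
        );
        return RetryDecision::AbortPartial;
    }
    batch.retry_count += 1;
    RetryDecision::Retry
}
```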
|
|
Add validation that all events requested by ID during negentropy sync
are actually received from the relay. When events are missing:
- Log detailed information (requested/received/missing counts and IDs)
- Create retry subscriptions for missing events (chunked by 300)
- Update batch to track only missing events in next round
- Only complete batch after all events received or retry fails
This handles relays that have limits on ID-based queries (e.g., max 150
events per query) by automatically retrying in smaller chunks.
Also excludes purgatory and rejected announcement events from negentropy
requests to avoid re-requesting events we know we can't/won't store.
Note: The current implementation has no retry limit; infinite-loop
protection is still needed (tracked as future work).
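
The validation and chunking steps might look like the sketch below. Only the chunk size of 300 comes from the commit message; the function names, the `RETRY_CHUNK_SIZE` constant, and the use of `String` for event IDs are assumptions.

```rust
use std::collections::HashSet;

/// Chunk size for retry subscriptions, as stated in the commit message.
pub const RETRY_CHUNK_SIZE: usize = 300;

/// Hypothetical helper: IDs that were requested but never received,
/// in the order they were requested.
pub fn missing_ids(requested: &[String], received: &HashSet<String>) -> Vec<String> {
    requested
        .iter()
        .filter(|id| !received.contains(*id))
        .cloned()
        .collect()
}

/// Hypothetical helper: split the missing IDs into one ID list per retry
/// subscription. The last chunk may be shorter than RETRY_CHUNK_SIZE.
pub fn retry_chunks(missing: &[String]) -> Vec<Vec<String>> {
    missing
        .chunks(RETRY_CHUNK_SIZE)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

An empty result from `missing_ids` means the batch can complete; otherwise the batch would track only the returned IDs in the next round.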
|
|
Add automatic pagination support for non-Negentropy historic sync to handle
large result sets efficiently. When a subscription receives >= 75 events,
the system automatically fetches the next page using the 'until' parameter.
Changes:
- Add PaginationState struct to track event counts and min timestamps
- Add pagination_state HashMap to PendingBatch for per-subscription tracking
- Add PAGINATION_THRESHOLD constant (75 events)
- Pass pending_sync_index to event processor for state updates
- Track events and timestamps as they arrive
- Check threshold on EOSE and launch follow-up subscriptions
- Initialize pagination state when creating historic sync subscriptions
- Update test fixtures in algorithms.rs
Pagination continues recursively until a page returns fewer than 75 events,
ensuring complete historic data retrieval without exceeding relay limits.
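
A rough sketch of the per-subscription tracking, assuming subscriptions are keyed by a string ID and timestamps are Unix seconds. `PaginationState`, `pagination_state`, `PendingBatch` and `PAGINATION_THRESHOLD` are named in the commit message; the field names and the `on_event`/`on_eose` methods are hypothetical. Whether the real code passes the minimum timestamp itself or one second less as `until` is not stated; this sketch passes it unchanged, which may re-fetch events sharing that timestamp but does not skip any.

```rust
use std::collections::HashMap;

pub const PAGINATION_THRESHOLD: usize = 75;

#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct PaginationState {
    pub event_count: usize,
    pub min_created_at: Option<u64>,
}

impl PaginationState {
    /// Count an arriving event and remember the oldest timestamp seen.
    pub fn record_event(&mut self, created_at: u64) {
        self.event_count += 1;
        self.min_created_at = Some(match self.min_created_at {
            Some(current) => current.min(created_at),
            None => created_at,
        });
    }

    /// `until` value for a follow-up page, or None if the page was short.
    pub fn next_until(&self) -> Option<u64> {
        if self.event_count >= PAGINATION_THRESHOLD {
            self.min_created_at
        } else {
            None
        }
    }
}

#[derive(Debug, Default)]
pub struct PendingBatch {
    pub pagination_state: HashMap<String, PaginationState>,
}

impl PendingBatch {
    /// Hypothetical hook for the event processor.
    pub fn on_event(&mut self, sub_id: &str, created_at: u64) {
        self.pagination_state
            .entry(sub_id.to_string())
            .or_default()
            .record_event(created_at);
    }

    /// Hypothetical EOSE hook: drop the state and report whether a
    /// follow-up subscription is needed.
    pub fn on_eose(&mut self, sub_id: &str) -> Option<u64> {
        self.pagination_state
            .remove(sub_id)
            .and_then(|state| state.next_until())
    }
}
```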
|
|
When negentropy (NIP-77) sync was enabled, the RelaySyncIndex was never
updated to reflect historical sync completion. This caused the three-way
diff algorithm in compute_actions() to malfunction, leading to:
- Repeated sync attempts for the same items
- Incorrect filter counting for consolidation
- Potential premature relay disconnection
This fix unifies both sync paths (REQ+EOSE and Negentropy) through a
consistent PendingBatch flow:
1. Added SyncMethod enum to distinguish between sync types
2. Updated PendingBatch struct to include sync_method field
3. Extracted confirm_batch() method for unified batch confirmation
4. Modified negentropy_sync_and_process() to:
   - Create a PendingBatch before sync
   - Add batch to pending_sync_index
   - On success: Remove batch and call confirm_batch()
   - On failure: Remove batch without confirming
The confirm_batch() method moves repos and root_events from the batch
to the RelayState.repos and RelayState.root_events, ensuring the
three-way diff works correctly regardless of sync method.
Closes: negentropy-sync-state-tracking.md
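
The unified flow can be sketched as below. `SyncMethod`, `PendingBatch`, `RelayState` and `confirm_batch()` are named in the commit message; the `ReqEose` variant name, the `HashSet<String>` and `HashMap<u64, PendingBatch>` representations, and the `finish_sync` helper are assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncMethod {
    ReqEose,
    Negentropy,
}

#[derive(Debug)]
pub struct PendingBatch {
    pub sync_method: SyncMethod,
    pub repos: HashSet<String>,
    pub root_events: HashSet<String>,
}

#[derive(Debug, Default)]
pub struct RelayState {
    pub repos: HashSet<String>,
    pub root_events: HashSet<String>,
}

impl RelayState {
    /// Move the batch contents into the confirmed state, so a later diff
    /// no longer treats these items as unsynced.
    pub fn confirm_batch(&mut self, batch: PendingBatch) {
        self.repos.extend(batch.repos);
        self.root_events.extend(batch.root_events);
    }
}

/// Hypothetical helper covering both outcomes: the batch always leaves the
/// pending index, but is confirmed only on success.
pub fn finish_sync(
    state: &mut RelayState,
    pending_sync_index: &mut HashMap<u64, PendingBatch>,
    batch_id: u64,
    success: bool,
) {
    if let Some(batch) = pending_sync_index.remove(&batch_id) {
        if success {
            state.confirm_batch(batch);
        }
    }
}
```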
|
|
Main lib (src/):
- Add #[allow(dead_code)] for build_info field (stored to prevent Prometheus unregistration)
- Add #[allow(dead_code)] for first_seen field (reserved for future rate limiting)
- Replace .or_insert_with(RelaySyncNeeds::default) with .or_default()
- Replace manual div_ceil implementations with .div_ceil(100)
Test code (tests/):
- Replace .expect(&format!(...)) with .unwrap_or_else(|_| panic!(...))
- Remove needless borrows in fetch_metrics() calls
- Add #[allow(dead_code)] and #[allow(unused_imports)] to test helpers module
grasp-audit:
- Apply cargo fmt to fix formatting
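
For reference, the three clippy-preferred idioms mentioned above look roughly like this. `RelaySyncNeeds` is named in the commit message, but its `repos` field and the surrounding functions are hypothetical; `usize::div_ceil` requires Rust 1.73 or newer.

```rust
use std::collections::HashMap;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct RelaySyncNeeds {
    pub repos: Vec<String>, // assumed field, for illustration only
}

/// `.or_default()` instead of `.or_insert_with(RelaySyncNeeds::default)`.
pub fn add_repo(needs: &mut HashMap<String, RelaySyncNeeds>, relay: &str, repo: &str) {
    needs
        .entry(relay.to_string())
        .or_default()
        .repos
        .push(repo.to_string());
}

/// `.div_ceil(100)` instead of a manual `(items + 99) / 100`.
pub fn batches_needed(items: usize) -> usize {
    items.div_ceil(100)
}

/// `.unwrap_or_else(|_| panic!(...))` instead of `.expect(&format!(...))`,
/// so the message is only formatted when the failure actually happens.
pub fn parse_port(raw: &str) -> u16 {
    raw.parse()
        .unwrap_or_else(|_| panic!("invalid port: {raw}"))
}
```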
|
|
|