| Age | Commit message (Collapse) | Author |
|
Previously, purgatory sync was using '--depth=1' when fetching OIDs from
remote servers. This created shallow clones with only 1-2 commits instead
of the complete git history.
The fix removes the '--depth=1' flag, allowing git to fetch the complete
commit history chain when fetching specific commit OIDs. This is the
correct behavior for GRASP - users cloning from our relay should get the
full repository history.
Changes:
- Remove '--depth=1' from git fetch command in RealSyncContext::fetch_oids
- Update comment to clarify that full history is fetched
Impact:
- Production repositories will now contain full git history
- Users cloning from the relay will get complete commit chains
- No more 'shallow' files in git repositories
- May be slightly slower due to fetching more data, but correctness is prioritized
Testing:
- All 564 tests pass (276 unit + 288 integration)
- No regressions in existing functionality
Fixes issue documented in work/active-issues/shallow-git-fetch.md
|
|
Implement domain-level naughty list tracking for git remotes, reusing the
existing NaughtyListTracker from relay sync. This prevents repeated attempts
to fetch from git domains with persistent infrastructure issues (SSL/TLS
certificate errors, DNS failures).
Changes:
- Updated NaughtyListTracker to track both relay URLs and git domains
- Added git_naughty_list field to RealSyncContext for error classification
- Modified fetch_oids() to classify git fetch errors and record naughty domains
- Updated sync_identifier_next_url() to filter out naughty domains during URL selection
- Added git_naughty_list parameter to ThrottleManager for domain queue processing
- Threaded naughty list through start_sync_loop and all sync functions
- Updated all tests to pass naughty list parameter
The naughty list uses 12-hour expiration (configurable) to allow domains to
recover from infrastructure issues. First occurrence logs WARN, repeats log DEBUG.
|
|
|
|
- Fix relay_connected() helper to check v >= 2 (Syncing/Connected states)
- Fix unit test to use status value 3 (Connected) instead of 1 (Connecting)
- Fix clippy warning: use .to_vec() instead of .iter().cloned().collect()
All 61 sync integration tests now passing.
All 238 unit tests passing.
Clippy clean.
|
|
The mock was creating multiple clone tags (one per URL), which violated
NIP-34 format and triggered validation errors added in commit 92bfbd3.
NIP-34 specifies: single clone tag with multiple values
["clone", "https://url1.com", "https://url2.com", ...]
NOT multiple clone tags:
["clone", "https://url1.com"]
["clone", "https://url2.com"]
This regression caused 7 purgatory::sync::functions tests to fail because
RepositoryAnnouncement::from_event() now correctly rejects announcements
with multiple clone tags.
Fixes:
- next_url_skips_throttled_domains
- next_url_skips_tried_urls
- next_url_filters_our_domain
- next_url_with_specific_domain
- get_throttled_domains_returns_only_throttled_with_untried
- sync_identifier_enqueues_throttled_domains_when_incomplete
- sync_identifier_tries_multiple_urls_until_complete
All 232 unit tests now pass.
|
|
- Update nostr-relay-builder, nostr-sdk, nostr-lmdb to latest revision
- Update grasp-audit nostr-sdk dependency
- Fix clippy warnings:
- Replace .clone() with std::slice::from_ref() in src/git/sync.rs
- Change &PathBuf to &Path in tests/common/git_server.rs
- Replace vec![] with array literal in src/purgatory/sync/functions.rs
- Update PR_TEST_COMMIT_HASH in grasp-audit due to event generation changes
All 249 tests passing, no breaking changes required.
|
|
|
|
- Prefix unused variable auth_result with underscore
- Prefix unused field git_data_path with underscore in Purgatory struct
- Add #[allow(clippy::too_many_arguments)] to handle_receive_pack
- Replace len() >= 1 with !is_empty()
- Replace .last() with .next_back() on DoubleEndedIterator
- Fix doc list item overindentation
- Replace map_or(true, ...) with is_none_or(...)
- Replace map_or(false, ...) with is_some_and(...)
|
|
Add support for extracting clone URLs from PR/PR-Update events (kind 1618/1619)
during purgatory sync, per NIP-34 specification. This enables fetching PR commits
from URLs specified in the PR event itself, not just from repository announcement
clone URLs.
Changes:
- Add collect_pr_clone_urls() to SyncContext trait
- Implement in RealSyncContext: extract clone tags from PR events in purgatory
- Implement in MockSyncContext: configurable PR clone URLs for testing
- Update sync_identifier_next_url to merge PR clone URLs with announcement URLs
- Update get_throttled_domains_with_untried_urls with same merge logic
- Add unit tests for PR clone URL extraction and filtering
|
|
Implement the production SyncContext that connects to real systems:
- RealSyncContext struct holding purgatory, database, git_data_path,
our_domain, and local_relay references
- fetch_repository_data: delegates to git::authorization module
- collect_needed_oids: collects commit hashes from state events
(branches/tags) and PR events (c-tag) in purgatory
- oid_exists: delegates to git::oid_exists function
- fetch_oids: uses git fetch --depth=1 to retrieve specific OIDs
from remote servers, running in spawn_blocking for async safety
- process_newly_available_git_data: delegates to the unified function
in git::sync module for consistent post-git-data processing
- has_pending_events: delegates to purgatory method
- find_target_repo: finds first existing owner repository on disk
- our_domain: returns configured domain for clone URL filtering
This enables the purgatory sync loop to use real database queries,
git operations, and event processing instead of mocks.
|
|
Replace ~100 lines of duplicated post-push processing in handle_receive_pack
with a single call to the unified process_newly_available_git_data function.
The unified function handles all post-git-data-available processing:
- Discovering satisfiable events from purgatory (state and PR events)
- Syncing OIDs to authorized owner repos
- Aligning refs (+ setting HEAD) in all owner repos
- Saving events to database
- Notifying WebSocket subscribers
- Removing from purgatory
This ensures consistent behavior regardless of how git data arrives
(git push vs purgatory sync fetching from remote servers).
Also mark test-only internal methods with #[cfg(test)] to silence
dead code warnings.
|
|
Implement the unified function that handles all post-git-data-available
processing, regardless of how data arrived (git push or purgatory sync).
This function:
- Discovers satisfiable events from purgatory (state and PR events)
- Syncs OIDs to authorized owner repos
- Aligns refs and sets HEAD
- Saves events to database
- Notifies WebSocket subscribers
- Removes from purgatory
New additions:
- ProcessResult struct for tracking processing outcomes
- process_newly_available_git_data async function in src/git/sync.rs
- Helper functions: extract_identifier_from_repo_path, extract_identifier_from_pr_event
- Purgatory::find_prs_for_identifier method for PR event discovery
- Unit tests for all helper functions
Also fixes:
- Simplified extract_domain to avoid url crate dependency
- Removed unused imports in sync/loop.rs
|
|
Implement the main sync loop that runs in the background and processes
identifiers that are ready for git data synchronization:
- Runs every 1 second (hardcoded interval, not configurable)
- Finds all ready identifiers where !in_progress && next_attempt <= now
- Spawns parallel tasks for each ready identifier
- Each task calls sync_identifier to try fetching git data from remotes
- Applies backoff when sync completes but events remain in purgatory
- Removes identifiers from queue when sync completes or no events remain
The loop integrates with the existing sync infrastructure:
- Uses SyncContext trait for testability
- Uses ThrottleManager for domain-based rate limiting
- Uses sync_identifier for the actual fetch orchestration
This enables automatic background fetching of git data for events in
purgatory, complementing the existing push-triggered sync path.
|
|
Implement the main sync orchestration function and trigger-based queue
processing for throttled domains:
sync_identifier function:
- Orchestrates syncing git data for a single identifier
- Tries all non-throttled URLs in sequence
- Checks completion after each fetch (no pending events or all OIDs fetched)
- Enqueues with throttled domains when non-throttled URLs are exhausted
- Returns true if complete, false if events remain (for backoff)
ThrottleManager enhancements:
- Add set_context() to provide SyncContext for queue processing
- Add try_process_next() to spawn tasks when capacity frees
- Add process_queued_identifier() to handle queued work
- Update complete_request() to trigger processing on completion
- Update enqueue_identifier() to trigger processing when capacity available
- Add internal methods for non-Arc testing compatibility
Generic function updates:
- Add ?Sized bound to sync_identifier_next_url, sync_identifier_from_url,
sync_identifier, and get_throttled_domains_with_untried_urls for
dynamic dispatch support (Arc<dyn SyncContext>)
Tests:
- sync_identifier_tries_multiple_urls_until_complete: verifies sequential
URL fetching until all OIDs are available
- sync_identifier_enqueues_throttled_domains_when_incomplete: verifies
throttled domains get the identifier enqueued for later processing
- has_queued_work_reflects_queue_state: verifies queue state tracking
|
|
Implement sync_identifier_next_url and sync_identifier_from_url functions
that provide the core URL selection and fetch logic for purgatory sync.
sync_identifier_next_url:
- Pure URL selection logic with no side effects
- Filters out our own domain and already-tried URLs
- Respects domain throttling when domain parameter is None
- Can target a specific domain when domain parameter is Some
sync_identifier_from_url:
- Fetches OIDs from a specific URL via the SyncContext
- Tracks request start/completion with ThrottleManager for rate limiting
- Calls process_newly_available_git_data on successful fetch
Also adds get_throttled_domains_with_untried_urls helper for the main
sync loop to know which DomainThrottle queues to enqueue identifiers to.
These functions are designed to be called by both:
- Main sync loop (tries non-throttled URLs immediately)
- DomainThrottle queue processing (when capacity frees up)
Includes 10 unit tests covering:
- Throttled domain skipping
- Tried URL skipping
- Our domain filtering
- Specific domain targeting
- Fetch success/failure handling
- Throttle request tracking
|
|
Implement the abstraction layer for purgatory sync operations:
- SyncContext trait: defines interface for repository data fetching,
OID existence checks, git fetch operations, and event processing
- ProcessResult: captures outcomes when releasing events from purgatory
- MockSyncContext: test mock with builder pattern for configuring:
- Clone URLs and which OIDs each URL provides
- Needed OIDs (simulates purgatory state)
- URL failure simulation
- Fetch logging for assertions
The trait uses async_trait for async method support and requires
Send + Sync for use in concurrent sync operations.
This abstraction enables unit testing of sync logic without I/O,
while the real implementation (to be added later) will connect
to actual database, git, and relay systems.
|
|
Implements ThrottleManager which manages all per-domain DomainThrottle
instances and provides:
- Throttle status checking via is_throttled() for sync URL selection
- Request tracking via start_request()/complete_request()
- Identifier queue management via enqueue_identifier()
- Automatic domain throttle creation on first access
- Thread-safe access via DashMap with Mutex-wrapped throttles
The manager uses the configured max_concurrent and max_per_minute limits
for all domains. Trigger-based queue processing (set_context,
process_queued_identifier) will be added after SyncContext is available.
Tests verify:
- is_throttled reflects domain capacity correctly
- enqueue_identifier creates domain throttle if needed
- start_request creates domain throttle if needed
|
|
Implement per-domain throttling for purgatory sync operations:
- Concurrent request limit (max in-flight requests per domain)
- Rate limit (max requests per minute via sliding window)
- Fair round-robin queue processing across identifiers
- In-progress tracking to prevent duplicate fetches
- Tried URL tracking per identifier
Add indexmap dependency for ordered iteration in round-robin queue.
Includes 6 unit tests covering:
- Concurrent limit enforcement
- Rate limit enforcement (sliding window)
- Round-robin fair processing
- In-progress identifier skipping
- Round-robin index adjustment on removal
- Tried URL merging on re-enqueue
|
|
Implement the sync queue entry struct that tracks sync state per identifier:
- next_attempt: when the next sync should be attempted
- attempt_count: for backoff calculation (resets on new events)
- in_progress: prevents concurrent syncs for same identifier
Backoff schedule: 20s → 40s → 80s → 120s (capped at 2 minutes)
This is the foundation for the identifier-based purgatory sync system
that will replace the current per-event syncing approach.
|