| Age | Commit message | Author |
|
Implement the main sync orchestration function and trigger-based queue
processing for throttled domains:
sync_identifier function:
- Orchestrates syncing git data for a single identifier
- Tries all non-throttled URLs in sequence
- Checks completion after each fetch (no pending events or all OIDs fetched)
- Enqueues with throttled domains when non-throttled URLs are exhausted
- Returns true if complete, false if events remain (for backoff)
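
The control flow described above might look roughly like the following
synchronous sketch. The real functions are async and use the SyncContext
trait; the trait methods, the MockContext struct and its fields are
hypothetical stand-ins invented here for illustration only.

```rust
/// Hypothetical, synchronous stand-in for the SyncContext trait.
pub trait SyncContext {
    /// Candidate (domain, url) pairs for the identifier.
    fn clone_urls(&self, identifier: &str) -> Vec<(String, String)>;
    fn is_throttled(&self, domain: &str) -> bool;
    /// Fetches from the URL; returns false when the fetch fails.
    fn fetch(&mut self, identifier: &str, url: &str) -> bool;
    /// True when no events are pending or all needed OIDs are present.
    fn is_complete(&self, identifier: &str) -> bool;
    fn enqueue(&mut self, domain: &str, identifier: &str);
}

/// Returns true if the identifier is complete, false if events remain.
/// The ?Sized bound lets callers pass a trait object (dyn SyncContext).
pub fn sync_identifier<C: SyncContext + ?Sized>(ctx: &mut C, identifier: &str) -> bool {
    let mut throttled_domains: Vec<String> = Vec::new();
    for (domain, url) in ctx.clone_urls(identifier) {
        if ctx.is_throttled(&domain) {
            // Remember the domain so the identifier can be queued later.
            if !throttled_domains.contains(&domain) {
                throttled_domains.push(domain);
            }
            continue;
        }
        // A failed fetch simply moves on to the next URL.
        let _fetched_ok = ctx.fetch(identifier, &url);
        if ctx.is_complete(identifier) {
            return true;
        }
    }
    // Non-throttled URLs are exhausted: hand over to the throttled domains.
    for domain in &throttled_domains {
        ctx.enqueue(domain, identifier);
    }
    false
}

/// Hypothetical mock: complete once `completes_after` fetches have run.
pub struct MockContext {
    pub urls: Vec<(String, String)>,
    pub throttled: Vec<String>,
    pub completes_after: usize,
    pub fetched: Vec<String>,
    pub enqueued: Vec<String>,
}

impl SyncContext for MockContext {
    fn clone_urls(&self, _identifier: &str) -> Vec<(String, String)> {
        self.urls.clone()
    }
    fn is_throttled(&self, domain: &str) -> bool {
        self.throttled.iter().any(|d| d.as_str() == domain)
    }
    fn fetch(&mut self, _identifier: &str, url: &str) -> bool {
        self.fetched.push(url.to_string());
        true
    }
    fn is_complete(&self, _identifier: &str) -> bool {
        self.fetched.len() >= self.completes_after
    }
    fn enqueue(&mut self, domain: &str, _identifier: &str) {
        self.enqueued.push(domain.to_string());
    }
}
```

A false return value is what the caller would use to schedule a backoff
retry, while the enqueued domains pick the identifier up once capacity frees.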
ThrottleManager enhancements:
- Add set_context() to provide SyncContext for queue processing
- Add try_process_next() to spawn tasks when capacity frees
- Add process_queued_identifier() to handle queued work
- Update complete_request() to trigger processing on completion
- Update enqueue_identifier() to trigger processing when capacity available
- Add internal methods for non-Arc testing compatibility
Generic function updates:
- Add ?Sized bound to sync_identifier_next_url, sync_identifier_from_url,
sync_identifier, and get_throttled_domains_with_untried_urls for
dynamic dispatch support (Arc<dyn SyncContext>)
Tests:
- sync_identifier_tries_multiple_urls_until_complete: verifies sequential
URL fetching until all OIDs are available
- sync_identifier_enqueues_throttled_domains_when_incomplete: verifies
throttled domains get the identifier enqueued for later processing
- has_queued_work_reflects_queue_state: verifies queue state tracking
|
|
Implement sync_identifier_next_url and sync_identifier_from_url functions
that provide the core URL selection and fetch logic for purgatory sync.
sync_identifier_next_url:
- Pure URL selection logic with no side effects
- Filters out our own domain and already-tried URLs
- Respects domain throttling when domain parameter is None
- Can target a specific domain when domain parameter is Some
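
A minimal sketch of this selection logic, assuming a simplified signature:
the name next_url, the domain_of helper, the closure-based throttle check
and the choice to skip the throttle check when a domain is targeted are all
assumptions made for illustration, not the actual implementation.

```rust
use std::collections::HashSet;

/// Hypothetical helper: extracts the host from a URL such as
/// "https://example.com/repo.git". URLs without "://" are ignored.
fn domain_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let host = rest.split(|c: char| c == '/' || c == ':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Picks the next URL to try without performing any side effects.
pub fn next_url<'a>(
    clone_urls: &'a [String],
    our_domain: &str,
    tried: &HashSet<String>,
    is_throttled: impl Fn(&str) -> bool,
    domain: Option<&str>,
) -> Option<&'a str> {
    clone_urls.iter().map(String::as_str).find(|&url| {
        let Some(host) = domain_of(url) else {
            return false;
        };
        // Never fetch from ourselves, and never retry a URL.
        if host == our_domain || tried.contains(url) {
            return false;
        }
        match domain {
            // Queue processing: only URLs on the requested domain.
            Some(target) => host == target,
            // Main sync loop: skip domains that are currently throttled.
            None => !is_throttled(host),
        }
    })
}
```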
sync_identifier_from_url:
- Fetches OIDs from a specific URL via the SyncContext
- Tracks request start/completion with ThrottleManager for rate limiting
- Calls process_newly_available_git_data on successful fetch
Also adds get_throttled_domains_with_untried_urls helper for the main
sync loop to know which DomainThrottle queues to enqueue identifiers to.
These functions are designed to be called by both:
- Main sync loop (tries non-throttled URLs immediately)
- DomainThrottle queue processing (when capacity frees up)
Includes 10 unit tests covering:
- Throttled domain skipping
- Tried URL skipping
- Our domain filtering
- Specific domain targeting
- Fetch success/failure handling
- Throttle request tracking
|
|
Implement the abstraction layer for purgatory sync operations:
- SyncContext trait: defines interface for repository data fetching,
OID existence checks, git fetch operations, and event processing
- ProcessResult: captures outcomes when releasing events from purgatory
- MockSyncContext: test mock with builder pattern for configuring:
  - Clone URLs and which OIDs each URL provides
  - Needed OIDs (simulates purgatory state)
  - URL failure simulation
  - Fetch logging for assertions
The trait uses async_trait for async method support and requires
Send + Sync for use in concurrent sync operations.
This abstraction enables unit testing of sync logic without I/O,
while the real implementation (to be added later) will connect
to actual database, git, and relay systems.
|
|
Implement ThrottleManager, which manages all per-domain DomainThrottle
instances and provides:
- Throttle status checking via is_throttled() for sync URL selection
- Request tracking via start_request()/complete_request()
- Identifier queue management via enqueue_identifier()
- Automatic domain throttle creation on first access
- Thread-safe access via DashMap with Mutex-wrapped throttles
The manager uses the configured max_concurrent and max_per_minute limits
for all domains. Trigger-based queue processing (set_context,
process_queued_identifier) will be added after SyncContext is available.
Tests verify:
- is_throttled reflects domain capacity correctly
- enqueue_identifier creates domain throttle if needed
- start_request creates domain throttle if needed
|
|
Implement per-domain throttling for purgatory sync operations:
- Concurrent request limit (max in-flight requests per domain)
- Rate limit (max requests per minute via sliding window)
- Fair round-robin queue processing across identifiers
- In-progress tracking to prevent duplicate fetches
- Tried URL tracking per identifier
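
The two limits above could be combined roughly as follows. This is a sketch
under assumptions: the struct layout, the has_capacity method and the
explicit `now` parameter (used to keep the example deterministic) are
hypothetical; only the max_concurrent and max_per_minute limits come from
the commit messages.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Length of the sliding window used for the per-minute limit.
const WINDOW: Duration = Duration::from_secs(60);

/// Hypothetical per-domain throttle state.
pub struct DomainThrottle {
    max_concurrent: usize,
    max_per_minute: usize,
    in_flight: usize,
    /// Start times of requests made within the last minute, oldest first.
    recent_starts: VecDeque<Instant>,
}

impl DomainThrottle {
    pub fn new(max_concurrent: usize, max_per_minute: usize) -> Self {
        Self {
            max_concurrent,
            max_per_minute,
            in_flight: 0,
            recent_starts: VecDeque::new(),
        }
    }

    /// Drops start times that have fallen out of the sliding window.
    fn prune(&mut self, now: Instant) {
        while let Some(&oldest) = self.recent_starts.front() {
            if now.duration_since(oldest) >= WINDOW {
                self.recent_starts.pop_front();
            } else {
                break;
            }
        }
    }

    /// True when both the concurrent and the per-minute limit allow a request.
    pub fn has_capacity(&mut self, now: Instant) -> bool {
        self.prune(now);
        self.in_flight < self.max_concurrent
            && self.recent_starts.len() < self.max_per_minute
    }

    /// Records a request start; returns false if the domain is throttled.
    pub fn start_request(&mut self, now: Instant) -> bool {
        if !self.has_capacity(now) {
            return false;
        }
        self.in_flight += 1;
        self.recent_starts.push_back(now);
        true
    }

    /// Frees one concurrent slot; the rate window is unaffected.
    pub fn complete_request(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```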
Add indexmap dependency for ordered iteration in round-robin queue.
Includes 6 unit tests covering:
- Concurrent limit enforcement
- Rate limit enforcement (sliding window)
- Round-robin fair processing
- In-progress identifier skipping
- Round-robin index adjustment on removal
- Tried URL merging on re-enqueue
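
To illustrate the round-robin behaviour the tests describe, here is a
std-only sketch. It uses a plain Vec rather than indexmap to stay
dependency-free, and the type and method names are hypothetical.

```rust
/// Hypothetical round-robin queue of identifiers waiting on one domain.
pub struct RoundRobinQueue {
    identifiers: Vec<String>,
    /// Position of the identifier that gets the next turn.
    next_index: usize,
}

impl RoundRobinQueue {
    pub fn new() -> Self {
        Self { identifiers: Vec::new(), next_index: 0 }
    }

    /// Adds an identifier unless it is already queued.
    pub fn enqueue(&mut self, identifier: &str) {
        if !self.identifiers.iter().any(|i| i.as_str() == identifier) {
            self.identifiers.push(identifier.to_string());
        }
    }

    /// Returns the next identifier in rotation, skipping any for which
    /// `in_progress` returns true.
    pub fn next_identifier(&mut self, in_progress: impl Fn(&str) -> bool) -> Option<String> {
        let len = self.identifiers.len();
        for offset in 0..len {
            let idx = (self.next_index + offset) % len;
            if !in_progress(&self.identifiers[idx]) {
                self.next_index = (idx + 1) % len;
                return Some(self.identifiers[idx].clone());
            }
        }
        None
    }

    /// Removes an identifier and keeps the rotation position stable.
    pub fn remove(&mut self, identifier: &str) {
        if let Some(pos) = self.identifiers.iter().position(|i| i.as_str() == identifier) {
            self.identifiers.remove(pos);
            // Entries after `pos` shifted left by one, so follow them.
            if pos < self.next_index {
                self.next_index -= 1;
            }
            if self.next_index >= self.identifiers.len() {
                self.next_index = 0;
            }
        }
    }
}
```

Without the index adjustment in remove, deleting an entry that sits before
the rotation position would cause the following identifier to lose its turn.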
|
|
Implement the sync queue entry struct that tracks sync state per identifier:
- next_attempt: when the next sync should be attempted
- attempt_count: for backoff calculation (resets on new events)
- in_progress: prevents concurrent syncs for same identifier
Backoff schedule: 20s → 40s → 80s → 120s (capped at 2 minutes)
This is the foundation for the identifier-based purgatory sync system
that will replace the current per-event syncing approach.
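
The backoff schedule can be reproduced with a doubling delay that is capped
at two minutes. In this sketch the function name and the mapping of
attempt_count 0 to the first 20s delay are assumptions.

```rust
use std::time::Duration;

/// Base delay and cap taken from the schedule in the commit message.
const BASE_DELAY_SECS: u64 = 20;
const MAX_DELAY_SECS: u64 = 120;

/// Hypothetical helper: delay before the next sync attempt, given how many
/// attempts have already been made since the last new event.
pub fn backoff_delay(attempt_count: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow for large counts.
    let exponent = attempt_count.min(16);
    let secs = BASE_DELAY_SECS.saturating_mul(1u64 << exponent);
    Duration::from_secs(secs.min(MAX_DELAY_SECS))
}
```

Resetting attempt_count to zero when new events arrive restarts the
schedule at 20s.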
|