| Age | Commit message (Collapse) | Author |
|
Implement the production SyncContext that connects to real systems:
- RealSyncContext struct holding purgatory, database, git_data_path,
our_domain, and local_relay references
- fetch_repository_data: delegates to git::authorization module
- collect_needed_oids: collects commit hashes from state events
(branches/tags) and PR events (c-tag) in purgatory
- oid_exists: delegates to git::oid_exists function
- fetch_oids: uses git fetch --depth=1 to retrieve specific OIDs
from remote servers, running in spawn_blocking for async safety
- process_newly_available_git_data: delegates to the unified function
in git::sync module for consistent post-git-data processing
- has_pending_events: delegates to purgatory method
- find_target_repo: finds first existing owner repository on disk
- our_domain: returns configured domain for clone URL filtering
This enables the purgatory sync loop to use real database queries,
git operations, and event processing instead of mocks.
|
|
Implement the main sync loop that runs in the background and processes
identifiers that are ready for git data synchronization:
- Runs every 1 second (hardcoded interval, not configurable)
- Finds all ready identifiers where !in_progress && next_attempt <= now
- Spawns parallel tasks for each ready identifier
- Each task calls sync_identifier to try fetching git data from remotes
- Applies backoff when sync completes but events remain in purgatory
- Removes identifiers from queue when sync completes or no events remain
The loop integrates with the existing sync infrastructure:
- Uses SyncContext trait for testability
- Uses ThrottleManager for domain-based rate limiting
- Uses sync_identifier for the actual fetch orchestration
This enables automatic background fetching of git data for events in
purgatory, complementing the existing push-triggered sync path.
|
|
Implement the main sync orchestration function and trigger-based queue
processing for throttled domains:
sync_identifier function:
- Orchestrates syncing git data for a single identifier
- Tries all non-throttled URLs in sequence
- Checks completion after each fetch (no pending events or all OIDs fetched)
- Enqueues with throttled domains when non-throttled URLs are exhausted
- Returns true if complete, false if events remain (for backoff)
ThrottleManager enhancements:
- Add set_context() to provide SyncContext for queue processing
- Add try_process_next() to spawn tasks when capacity frees
- Add process_queued_identifier() to handle queued work
- Update complete_request() to trigger processing on completion
- Update enqueue_identifier() to trigger processing when capacity available
- Add internal methods for non-Arc testing compatibility
Generic function updates:
- Add ?Sized bound to sync_identifier_next_url, sync_identifier_from_url,
sync_identifier, and get_throttled_domains_with_untried_urls for
dynamic dispatch support (Arc<dyn SyncContext>)
Tests:
- sync_identifier_tries_multiple_urls_until_complete: verifies sequential
URL fetching until all OIDs are available
- sync_identifier_enqueues_throttled_domains_when_incomplete: verifies
throttled domains get the identifier enqueued for later processing
- has_queued_work_reflects_queue_state: verifies queue state tracking
|
|
Implement sync_identifier_next_url and sync_identifier_from_url functions
that provide the core URL selection and fetch logic for purgatory sync.
sync_identifier_next_url:
- Pure URL selection logic with no side effects
- Filters out our own domain and already-tried URLs
- Respects domain throttling when domain parameter is None
- Can target a specific domain when domain parameter is Some
sync_identifier_from_url:
- Fetches OIDs from a specific URL via the SyncContext
- Tracks request start/completion with ThrottleManager for rate limiting
- Calls process_newly_available_git_data on successful fetch
Also adds get_throttled_domains_with_untried_urls helper for the main
sync loop to know which DomainThrottle queues to enqueue identifiers to.
These functions are designed to be called by both:
- Main sync loop (tries non-throttled URLs immediately)
- DomainThrottle queue processing (when capacity frees up)
Includes 10 unit tests covering:
- Throttled domain skipping
- Tried URL skipping
- Our domain filtering
- Specific domain targeting
- Fetch success/failure handling
- Throttle request tracking
|
|
Implement the abstraction layer for purgatory sync operations:
- SyncContext trait: defines interface for repository data fetching,
OID existence checks, git fetch operations, and event processing
- ProcessResult: captures outcomes when releasing events from purgatory
- MockSyncContext: test mock with builder pattern for configuring:
- Clone URLs and which OIDs each URL provides
- Needed OIDs (simulates purgatory state)
- URL failure simulation
- Fetch logging for assertions
The trait uses async_trait for async method support and requires
Send + Sync for use in concurrent sync operations.
This abstraction enables unit testing of sync logic without I/O,
while the real implementation (to be added later) will connect
to actual database, git, and relay systems.
|
|
Implements ThrottleManager which manages all per-domain DomainThrottle
instances and provides:
- Throttle status checking via is_throttled() for sync URL selection
- Request tracking via start_request()/complete_request()
- Identifier queue management via enqueue_identifier()
- Automatic domain throttle creation on first access
- Thread-safe access via DashMap with Mutex-wrapped throttles
The manager uses the configured max_concurrent and max_per_minute limits
for all domains. Trigger-based queue processing (set_context,
process_queued_identifier) will be added after SyncContext is available.
Tests verify:
- is_throttled reflects domain capacity correctly
- enqueue_identifier creates domain throttle if needed
- start_request creates domain throttle if needed
|
|
Implement per-domain throttling for purgatory sync operations:
- Concurrent request limit (max in-flight requests per domain)
- Rate limit (max requests per minute via sliding window)
- Fair round-robin queue processing across identifiers
- In-progress tracking to prevent duplicate fetches
- Tried URL tracking per identifier
Add indexmap dependency for ordered iteration in round-robin queue.
Includes 6 unit tests covering:
- Concurrent limit enforcement
- Rate limit enforcement (sliding window)
- Round-robin fair processing
- In-progress identifier skipping
- Round-robin index adjustment on removal
- Tried URL merging on re-enqueue
|
|
Implement the sync queue entry struct that tracks sync state per identifier:
- next_attempt: when the next sync should be attempted
- attempt_count: for backoff calculation (resets on new events)
- in_progress: prevents concurrent syncs for same identifier
Backoff schedule: 20s → 40s → 80s → 120s (capped at 2 minutes)
This is the foundation for the identifier-based purgatory sync system
that will replace the current per-event syncing approach.
|