| Age | Commit message (Collapse) | Author |
|
- Add rejected events index to architecture.md with two-tier system explanation
- Document NGIT_REJECTED_HOT_CACHE_DURATION_SECS and NGIT_REJECTED_COLD_INDEX_EXPIRY_SECS in configuration.md
- Add comprehensive rejected events metrics section to monitoring.md with Grafana queries and alerts
- Explain negentropy integration with rejected index in grasp-02-proactive-sync.md
- Document state event authorization defense-in-depth and rejection tracking in inline-authorization.md
This integrates information from work/rejected-events-index-summary.md into the main documentation,
ensuring architecture docs accurately reflect the implemented rejected events index system.
|
|
Resolves naming conflict with RelayHealthState::Degraded by using a more
explicit name that clearly indicates the connection status relates to
historic sync failures, not connection health degradation.
Changes:
- ConnectionStatus::ConnectedDegraded → ConnectedHistoricSyncFailures
- Updated all documentation and comments
- Updated Prometheus metric descriptions
- Metric value remains 4 for backward compatibility
This makes it clear that:
- ConnectedHistoricSyncFailures = connection lifecycle (missing historic data)
- RelayHealthState::Degraded = connection health (reliability issues)
These are orthogonal concerns - a relay can be ConnectedHistoricSyncFailures
but Healthy, or Connected but Degraded.
|
|
- Add ConnectionStatus::ConnectedDegraded (status=4 in metrics)
- Track batch failures via PendingBatch.failed field
- Track relay-level failures via RelayState.historic_sync_had_failures
- Transition to ConnectedDegraded when any batch fails during historic sync
- Add is_live_sync_active() helper for cleaner match patterns
- Update state machine diagram with ConnectedDegraded transitions
- Update metrics docs with status=4 and example queries
Fixes issue where relays with failed negentropy retries would
incorrectly transition to Connected status despite missing data.
Now operators can distinguish 'fully synced' vs 'degraded (partial data)'.
|
|
- Add ConnectionStatus::Syncing state between Connecting and Connected
- Track historic_sync_completed and historic_sync_completed_at in RelayState
- Auto-detect sync completion via check_and_complete_historic_sync()
- Update metrics: ngit_sync_relay_connected now shows 0-3 (disconnected/connecting/syncing/connected)
- Update Prometheus metric documentation with new status values
- Add state machine diagram showing Syncing transition
- Operators can now distinguish 'connected but catching up' vs 'fully synced'
|
|
|
|
|