author DanConwayDev <DanConwayDev@protonmail.com> 2025-12-04 18:43:49 +0000
committer DanConwayDev <DanConwayDev@protonmail.com> 2025-12-04 18:43:49 +0000
commit dd403b17e7c74db9443d0891a9de1f0f0f9f89eb
tree 177dd9f664dde3565492c1d11016dabfeda28bbc /docs/explanation/grasp-02-proactive-sync.md
parent 950c2e4e68448d2abcad90a31bfffaca6d7bc47e
feat(sync): Phase 6 - observability and production readiness
- Add SyncMetrics with full Prometheus integration
- Track sync gaps via catchup events
- Update Grafana dashboard with sync panels
- Document all sync configuration options
- Update design doc with implementation notes
Diffstat (limited to 'docs/explanation/grasp-02-proactive-sync.md')
-rw-r--r-- docs/explanation/grasp-02-proactive-sync.md 128
1 file changed, 128 insertions, 0 deletions
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index a8af3f4..98531ec 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
@@ -745,3 +745,131 @@ pub struct SyncConfig {
8. **Dynamic subscription addition** with periodic consolidation
9. **Custom acceptance policy** excluding rate limiting defaults
10. **Catchup as failure signal** - events found during catchup/daily indicate live sync gaps, tracked in Prometheus

---

## Implementation Notes (Phase 6)

This section documents the final implementation as of Phase 6 (Observability & Production Readiness).

### What Was Actually Built

The implementation closely follows the design document with the following completed components:

#### Phase 1: Basic Sync (commit b167f1b)
- [`SyncManager`](../../src/sync/manager.rs) - Main coordinator for proactive sync
- Single relay sync via `NGIT_SYNC_RELAY_URL` configuration
- Event validation through existing [`Nip34WritePolicy`](../../src/nostr/builder.rs)

#### Phase 2: Three-Layer Filters (commit bf558b0)
- [`FilterService`](../../src/sync/filter.rs) - Builds three-layer filter strategy
- Layer 1: All kind 30617+30618 (announcements)
- Layer 2: A/a tag filters for repository events
- Layer 3: E/e tag filters for related events (PRs, Issues)
- Multi-relay discovery from stored announcements

#### Phase 3: Health Tracking (commit f639ecf)
- [`RelayHealthTracker`](../../src/sync/health.rs) - DashMap-based health tracking
- Three states: Healthy → Degraded → Dead
- Exponential backoff: 5s → 10s → 20s → ... → max (default 1h)
- Dead relay detection after 24h continuous failures
- Startup jitter (0-10s) to prevent thundering herd
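
The backoff schedule and the three health states listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in `src/sync/health.rs`: the names `backoff_delay`, `classify`, `BASE_BACKOFF_SECS`, and `DEAD_AFTER` are hypothetical, and the rule that any failing relay short of 24h counts as Degraded is an assumption, since the notes do not say when Healthy becomes Degraded.

```rust
use std::time::Duration;

// Values taken from the Phase 3 notes; the constant names are hypothetical.
const BASE_BACKOFF_SECS: u64 = 5;
const DEAD_AFTER: Duration = Duration::from_secs(24 * 60 * 60);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthState {
    Healthy,
    Degraded,
    Dead,
}

/// Delay before the next attempt: 5s after the first failure, doubling on
/// each further failure, capped at `max_backoff`.
pub fn backoff_delay(consecutive_failures: u32, max_backoff: Duration) -> Duration {
    // Clamp the exponent so the shift cannot overflow a u64.
    let doublings = consecutive_failures.saturating_sub(1).min(32);
    let secs = BASE_BACKOFF_SECS.saturating_mul(1u64 << doublings);
    Duration::from_secs(secs).min(max_backoff)
}

/// Assumed classification: no failures is Healthy, 24h or more of
/// continuous failure is Dead, anything in between is Degraded.
pub fn classify(consecutive_failures: u32, failing_for: Duration) -> HealthState {
    if consecutive_failures == 0 {
        HealthState::Healthy
    } else if failing_for >= DEAD_AFTER {
        HealthState::Dead
    } else {
        HealthState::Degraded
    }
}
```

With the default cap of one hour, the delay reaches the cap at the eleventh consecutive failure (5s × 2^10 = 5120s, clamped to 3600s). Startup jitter is left out because it needs a random source.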

#### Phase 4: Dynamic Subscriptions (commit a19ff57)
- [`SubscriptionManager`](../../src/sync/subscription.rs) - Per-connection subscription tracking
- Dynamic Layer 2 subscriptions when new announcements arrive
- Dynamic Layer 3 subscriptions when new PRs/Issues arrive
- Filter consolidation at threshold (150 filters)
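
One plausible reading of "filter consolidation" is merging many single-value tag filters into one filter per tag once the count reaches the threshold. The sketch below shows that reading only; the notes do not spell out the actual strategy, and `TagFilter`, `consolidate`, and `CONSOLIDATION_THRESHOLD` are hypothetical names.

```rust
use std::collections::BTreeMap;

// Threshold from the Phase 4 notes; the constant name is hypothetical.
pub const CONSOLIDATION_THRESHOLD: usize = 150;

/// Hypothetical stand-in for a subscription filter on a single tag letter.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TagFilter {
    pub tag: char,
    pub values: Vec<String>,
}

/// Assumed behaviour: below the threshold the filters are returned as-is;
/// at or above it, filters sharing a tag are merged into one filter whose
/// value list is sorted and de-duplicated.
pub fn consolidate(filters: Vec<TagFilter>, threshold: usize) -> Vec<TagFilter> {
    if filters.len() < threshold {
        return filters;
    }
    let mut merged: BTreeMap<char, Vec<String>> = BTreeMap::new();
    for f in filters {
        merged.entry(f.tag).or_default().extend(f.values);
    }
    merged
        .into_iter()
        .map(|(tag, mut values)| {
            values.sort();
            values.dedup();
            TagFilter { tag, values }
        })
        .collect()
}
```

Merging is safe under this reading because a relay matches an event against a tag filter if any listed value matches, so one filter with many values selects the same events as many filters with one value each.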

#### Phase 5: Catchup & Gap Detection (commit 950c2e4)
- [`NegentropyService`](../../src/sync/negentropy.rs) - Gap-filling catchup operations
- Startup catchup (configurable delay)
- Reconnection catchup (limited lookback)
- Daily catchup (not yet implemented - placeholder)

#### Phase 6: Observability (this phase)
- [`SyncMetrics`](../../src/sync/metrics.rs) - Full Prometheus integration
- Grafana dashboard panels for sync monitoring
- Documentation updates

### Differences from Original Design

1. **Negentropy (NIP-77)**: Simplified gap-filling was used instead of full NIP-77 negentropy reconciliation, as nostr-sdk 0.44 lacks built-in negentropy support. The current implementation uses timestamp-based catchup queries.

2. **Filter Consolidation Threshold**: Set at 150 filters (as designed) based on typical relay filter limits.

3. **Health Tracking**: Implemented exactly as designed - in-memory only (not persisted to database), which is acceptable for production as health state rebuilds quickly on restart.

4. **Metric Label Strategy**: Used simpler numeric encoding for health status (1=healthy, 2=degraded, 3=dead) instead of multiple label values per relay, reducing cardinality.

5. **Event Source Tracking**: Implemented four source types (`live`, `startup`, `reconnect`, `daily`) instead of the original (`direct`, `live_sync`, `catchup`, `daily_catchup`).

### Three-Layer Filter Strategy (As Implemented)

```
Layer 1: Discovery Layer
├── Query: kinds [30617, 30618] (announcements)
├── Applied: At startup and during sync
└── Purpose: Discover all repositories across network

Layer 2: Repository Events
├── Query: Events with A/a tags pointing to tracked repos
├── Format: A tag = "30617:<pubkey>:<identifier>"
├── Triggered: When new announcement is accepted
└── Purpose: Get PRs, issues, patches for repositories

Layer 3: Related Events
├── Query: Events with E/e tags pointing to tracked PRs/Issues
├── Triggered: When new PR/Issue is accepted
└── Purpose: Get comments, reviews, status updates
```
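
The three layers above map onto NIP-01 subscription filters. The sketch below builds those filters as JSON strings by hand so that it stays dependency-free; it is not how `FilterService` does it (that code presumably uses nostr-sdk's own filter types), and `repo_coordinate`, `layer1_filter`, and `tag_filters` are hypothetical helper names. Because all conditions inside a single NIP-01 filter must match together, the lowercase and uppercase tag variants are emitted as two separate filters.

```rust
/// Layer 2 coordinate in the documented form "30617:<pubkey>:<identifier>".
pub fn repo_coordinate(pubkey_hex: &str, identifier: &str) -> String {
    format!("30617:{}:{}", pubkey_hex, identifier)
}

/// Minimal JSON string escaping (backslash and double quote only).
fn json_string(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

fn json_array(values: &[String]) -> String {
    let items: Vec<String> = values.iter().map(|v| json_string(v)).collect();
    format!("[{}]", items.join(","))
}

/// Layer 1: every announcement event of kinds 30617 and 30618.
pub fn layer1_filter() -> String {
    r#"{"kinds":[30617,30618]}"#.to_string()
}

/// Layers 2 and 3: one filter for the lowercase tag and one for the
/// uppercase tag, e.g. `#a`/`#A` for repositories, `#e`/`#E` for PRs/issues.
pub fn tag_filters(tag: char, values: &[String]) -> Vec<String> {
    [tag.to_ascii_lowercase(), tag.to_ascii_uppercase()]
        .iter()
        .map(|t| format!("{{\"#{}\":{}}}", t, json_array(values)))
        .collect()
}
```

For a tracked repository, `tag_filters('a', &[repo_coordinate(pubkey, id)])` gives the Layer 2 pair; passing `'e'` with the event ids of tracked PRs and issues gives the Layer 3 pair.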

### Prometheus Metrics (As Implemented)

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `ngit_sync_relay_connected` | Gauge | relay | Connection status (1/0) |
| `ngit_sync_connection_attempts_total` | Counter | relay, result | Attempts by outcome |
| `ngit_sync_relay_status` | Gauge | relay | Health state (1/2/3) |
| `ngit_sync_relay_failures` | Gauge | relay | Consecutive failures |
| `ngit_sync_events_total` | Counter | source | Events by source type |
| `ngit_sync_gap_events_total` | Counter | relay | Gap events filled |
| `ngit_sync_relays_tracked_total` | Gauge | - | Total relays discovered |
| `ngit_sync_relays_connected_total` | Gauge | - | Currently connected |
| `ngit_sync_relays_dead_total` | Gauge | - | Dead relay count |
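
The numeric health encoding (1=healthy, 2=degraded, 3=dead) and the four `source` label values can be illustrated as follows. This is a sketch of the encoding only, not the contents of `src/sync/metrics.rs`: the type and method names are hypothetical, and the sample line is formatted by hand in the Prometheus text exposition format rather than through a metrics library.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthState {
    Healthy,
    Degraded,
    Dead,
}

impl HealthState {
    /// Value stored in the `ngit_sync_relay_status` gauge.
    pub fn gauge_value(self) -> i64 {
        match self {
            HealthState::Healthy => 1,
            HealthState::Degraded => 2,
            HealthState::Dead => 3,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventSource {
    Live,
    Startup,
    Reconnect,
    Daily,
}

impl EventSource {
    /// Value of the `source` label on `ngit_sync_events_total`.
    pub fn label(self) -> &'static str {
        match self {
            EventSource::Live => "live",
            EventSource::Startup => "startup",
            EventSource::Reconnect => "reconnect",
            EventSource::Daily => "daily",
        }
    }
}

/// One sample line for `ngit_sync_relay_status`, with the label value
/// escaped as the text exposition format requires.
pub fn relay_status_line(relay: &str, state: HealthState) -> String {
    let escaped = relay
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n");
    format!(
        "ngit_sync_relay_status{{relay=\"{}\"}} {}",
        escaped,
        state.gauge_value()
    )
}
```

A single gauge per relay keeps the series count at one per relay, whereas a `state` label would create up to three series per relay.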

### Configuration Options (As Implemented)

All options can be set via environment variables or CLI flags:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `NGIT_SYNC_RELAY_URL` | String | None | Primary sync relay URL |
| `NGIT_SYNC_MAX_BACKOFF_SECS` | u64 | 3600 | Max backoff delay (seconds) |
| `NGIT_SYNC_STARTUP_DELAY_SECS` | u64 | 30 | Catchup delay after startup (seconds) |
| `NGIT_SYNC_RECONNECT_DELAY_SECS` | u64 | 10 | Catchup delay after reconnect (seconds) |
| `NGIT_SYNC_RECONNECT_LOOKBACK_DAYS` | u64 | 3 | Days to look back on reconnect |
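
As a rough illustration of how the defaults in the table apply, the sketch below reads the five variables through a lookup function and derives the `since` timestamp for a timestamp-based reconnect catchup. `SyncSettings` and its methods are hypothetical and are not the project's `SyncConfig`. The sketch covers environment variables only, not CLI flags, and it assumes that an unparsable value silently falls back to the default, which the real implementation may not do.

```rust
use std::env;

/// Hypothetical settings holder mirroring the documented options.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SyncSettings {
    pub relay_url: Option<String>,
    pub max_backoff_secs: u64,
    pub startup_delay_secs: u64,
    pub reconnect_delay_secs: u64,
    pub reconnect_lookback_days: u64,
}

fn parse_u64(lookup: &impl Fn(&str) -> Option<String>, key: &str, default: u64) -> u64 {
    // Assumption: missing or unparsable values fall back to the default.
    lookup(key)
        .and_then(|v| v.trim().parse().ok())
        .unwrap_or(default)
}

impl SyncSettings {
    /// Build settings from any key/value source (handy for testing).
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        SyncSettings {
            relay_url: lookup("NGIT_SYNC_RELAY_URL").filter(|s| !s.is_empty()),
            max_backoff_secs: parse_u64(&lookup, "NGIT_SYNC_MAX_BACKOFF_SECS", 3600),
            startup_delay_secs: parse_u64(&lookup, "NGIT_SYNC_STARTUP_DELAY_SECS", 30),
            reconnect_delay_secs: parse_u64(&lookup, "NGIT_SYNC_RECONNECT_DELAY_SECS", 10),
            reconnect_lookback_days: parse_u64(&lookup, "NGIT_SYNC_RECONNECT_LOOKBACK_DAYS", 3),
        }
    }

    /// Build settings from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }

    /// Lower bound (Unix seconds) for a reconnect catchup query.
    pub fn reconnect_since(&self, now_unix: u64) -> u64 {
        now_unix.saturating_sub(self.reconnect_lookback_days.saturating_mul(86_400))
    }
}
```

With the default lookback of 3 days, `reconnect_since` subtracts 259,200 seconds from the current time.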

### Module Structure (As Implemented)

```
src/sync/
├── mod.rs           # Module exports, constants
├── manager.rs       # SyncManager - orchestrates sync
├── connection.rs    # SyncConnection - per-relay WebSocket
├── filter.rs        # FilterService - three-layer filters
├── health.rs        # RelayHealthTracker - health states
├── metrics.rs       # SyncMetrics - Prometheus integration
├── negentropy.rs    # NegentropyService - gap-filling
└── subscription.rs  # SubscriptionManager - dynamic subs
```

### Production Readiness Checklist

- [x] All metrics exposed at `/metrics` endpoint
- [x] Health state tracking with configurable backoff
- [x] Dead relay detection and minimal retry
- [x] Startup jitter to prevent thundering herd
- [x] Grafana dashboard with sync panels
- [x] Configuration documented
- [x] Integration tests passing