upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

Diffstat (limited to 'docs')
-rw-r--r-- docs/explanation/grasp-02-proactive-sync.md | 128
-rw-r--r-- docs/grafana/ngit-grasp-dashboard.json      | 334
-rw-r--r-- docs/reference/configuration.md             | 137
3 files changed, 599 insertions, 0 deletions
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index a8af3f4..98531ec 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
@@ -745,3 +745,131 @@ pub struct SyncConfig {
8. **Dynamic subscription addition** with periodic consolidation
9. **Custom acceptance policy** excluding rate limiting defaults
10. **Catchup as failure signal** - events found during catchup/daily indicate live sync gaps, tracked in Prometheus

---

## Implementation Notes (Phase 6)

This section documents the final implementation as of Phase 6 (Observability & Production Readiness).

### What Was Actually Built

The implementation closely follows the design in this document. The completed components are:

#### Phase 1: Basic Sync (commit b167f1b)
- [`SyncManager`](../../src/sync/manager.rs) - Main coordinator for proactive sync
- Single-relay sync configured via `NGIT_SYNC_RELAY_URL`
- Event validation through the existing [`Nip34WritePolicy`](../../src/nostr/builder.rs)

#### Phase 2: Three-Layer Filters (commit bf558b0)
- [`FilterService`](../../src/sync/filter.rs) - Builds the three-layer filter strategy
- Layer 1: All kind 30617 and 30618 events (announcements)
- Layer 2: A/a tag filters for events referencing tracked repositories
- Layer 3: E/e tag filters for events referencing tracked PRs and issues
- Multi-relay discovery from stored announcements

#### Phase 3: Health Tracking (commit f639ecf)
- [`RelayHealthTracker`](../../src/sync/health.rs) - DashMap-based health tracking
- Three states: Healthy → Degraded → Dead
- Exponential backoff: 5s → 10s → 20s → ... → max (default 1h)
- Dead relay detection after 24h of continuous failures
- Startup jitter (0-10s) to prevent a thundering herd
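The three-state model above can be pictured as a small per-relay record. The sketch below is std-only and hypothetical: `RelayHealth`, `HealthState`, and `failing_since` are illustrative names, not the `RelayHealthTracker` API. It assumes three rules. A relay is Degraded from its first consecutive failure. It becomes Dead once the failure streak has lasted 24 hours. Any success resets it to Healthy. Time is passed in as a `Duration` since an arbitrary epoch so that the logic stays deterministic.

```rust
use std::time::Duration;

/// Hypothetical health states matching Healthy → Degraded → Dead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HealthState {
    Healthy,
    Degraded,
    Dead,
}

/// Assumed rule: 24h of continuous failures marks a relay dead.
const DEAD_AFTER: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical per-relay record (the real tracker keeps state in a DashMap).
#[derive(Debug, Default)]
struct RelayHealth {
    consecutive_failures: u32,
    /// Start of the current failure streak, if any.
    failing_since: Option<Duration>,
}

impl RelayHealth {
    /// Any successful connection clears the failure streak.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.failing_since = None;
    }

    /// Counts a failure; only the first failure of a streak sets its start time.
    fn record_failure(&mut self, now: Duration) {
        self.consecutive_failures += 1;
        if self.failing_since.is_none() {
            self.failing_since = Some(now);
        }
    }

    /// Derives the state from how long the current failure streak has lasted.
    fn state(&self, now: Duration) -> HealthState {
        match self.failing_since {
            None => HealthState::Healthy,
            Some(start) if now.saturating_sub(start) >= DEAD_AFTER => HealthState::Dead,
            Some(_) => HealthState::Degraded,
        }
    }
}
```

Deriving the state from the streak start time, rather than storing it, means the Dead transition needs no timer. The state is correct whenever it is read.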

#### Phase 4: Dynamic Subscriptions (commit a19ff57)
- [`SubscriptionManager`](../../src/sync/subscription.rs) - Per-connection subscription tracking
- Dynamic Layer 2 subscriptions when new announcements arrive
- Dynamic Layer 3 subscriptions when new PRs/Issues arrive
- Filter consolidation at a threshold of 150 filters
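The notes do not spell out how consolidation merges filters. The sketch below assumes one plausible rule. Once 150 single-value tag filters have accumulated, filters sharing a tag key are merged into one filter carrying all of their values. `TagFilter` and `consolidate` are hypothetical and far simpler than nostr-sdk's filter type.

```rust
use std::collections::BTreeMap;

/// Threshold taken from the text above.
const CONSOLIDATION_THRESHOLD: usize = 150;

/// Hypothetical simplified filter: one tag key (e.g. "#a" or "#e") and its values.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TagFilter {
    tag: String,
    values: Vec<String>,
}

/// Below the threshold the filters are returned unchanged; at or above it,
/// filters sharing a tag key are merged into a single filter per key.
fn consolidate(filters: Vec<TagFilter>) -> Vec<TagFilter> {
    if filters.len() < CONSOLIDATION_THRESHOLD {
        return filters;
    }
    // BTreeMap keeps the output order deterministic (sorted by tag key).
    let mut merged: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for f in filters {
        merged.entry(f.tag).or_default().extend(f.values);
    }
    merged
        .into_iter()
        .map(|(tag, mut values)| {
            // Drop duplicate values so the merged filter stays compact.
            values.sort();
            values.dedup();
            TagFilter { tag, values }
        })
        .collect()
}
```

Under NIP-01, a tag filter matches an event whose tag has any of the listed values. Merging filters that carry only a tag condition therefore matches the same events with far fewer filters.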
783
784#### Phase 5: Catchup & Gap Detection (commit 950c2e4)
785- [`NegentropyService`](../../src/sync/negentropy.rs) - Gap-filling catchup operations
786- Startup catchup (configurable delay)
787- Reconnection catchup (limited lookback)
788- Daily catchup (not yet implemented - placeholder)
789
790#### Phase 6: Observability (this phase)
791- [`SyncMetrics`](../../src/sync/metrics.rs) - Full Prometheus integration
792- Grafana dashboard panels for sync monitoring
793- Documentation updates
794
795### Differences from Original Design
796
7971. **Negentropy (NIP-77)**: Simplified gap-filling was used instead of full NIP-77 negentropy reconciliation, as nostr-sdk 0.44 lacks built-in negentropy support. The current implementation uses timestamp-based catchup queries.
798
7992. **Filter Consolidation Threshold**: Set at 150 filters (as designed) based on typical relay filter limits.
800
8013. **Health Tracking**: Implemented exactly as designed - in-memory only (not persisted to database), which is acceptable for production as health state rebuilds quickly on restart.
802
8034. **Metric Label Strategy**: Used simpler numeric encoding for health status (1=healthy, 2=degraded, 3=dead) instead of multiple label values per relay, reducing cardinality.
804
8055. **Event Source Tracking**: Implemented four source types (`live`, `startup`, `reconnect`, `daily`) instead of the original (`direct`, `live_sync`, `catchup`, `daily_catchup`).
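The sketch below makes the trade-off in item 4 concrete. It shows the numeric encoding and the per-state count that the dashboard derives from it with `count(ngit_sync_relay_status == N)`. `RelayStatus`, `status_value`, and `distribution` are hypothetical names, not the `SyncMetrics` API.

```rust
/// Hypothetical mirror of the three relay health states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelayStatus {
    Healthy,
    Degraded,
    Dead,
}

/// Value written to the per-relay `ngit_sync_relay_status` gauge (1/2/3).
fn status_value(s: RelayStatus) -> i64 {
    match s {
        RelayStatus::Healthy => 1,
        RelayStatus::Degraded => 2,
        RelayStatus::Dead => 3,
    }
}

/// Relays per state, indexed [healthy, degraded, dead]; the same aggregation
/// as the PromQL `count(ngit_sync_relay_status == N)`.
fn distribution(statuses: &[RelayStatus]) -> [usize; 3] {
    let mut counts = [0usize; 3];
    for &s in statuses {
        counts[(status_value(s) - 1) as usize] += 1;
    }
    counts
}
```

With this encoding, each relay contributes one gauge series whatever its state, rather than one series per (relay, state) pair.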

### Three-Layer Filter Strategy (As Implemented)

```
Layer 1: Discovery Layer
├── Query: kinds [30617, 30618] (announcements)
├── Applied: At startup and during sync
└── Purpose: Discover all repositories across network

Layer 2: Repository Events
├── Query: Events with A/a tags pointing to tracked repos
├── Format: A tag = "30617:<pubkey>:<identifier>"
├── Triggered: When new announcement is accepted
└── Purpose: Get PRs, issues, patches for repositories

Layer 3: Related Events
├── Query: Events with E/e tags pointing to tracked PRs/Issues
├── Triggered: When new PR/Issue is accepted
└── Purpose: Get comments, reviews, status updates
```
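The layers above translate into ordinary NIP-01 `REQ` filters. The sketch below renders them as JSON strings to show their shape. The function names are hypothetical. The JSON is assembled by hand to stay dependency-free; nostr-sdk's `Filter` type would be the natural choice in practice. The sketch also assumes the uppercase and lowercase tag variants are requested as separate filters. NIP-01 treats `#A` and `#a` as distinct keys, and conditions within one filter must all match.

```rust
/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical helper: one single-letter tag filter, e.g. {"#a":["..."]}.
fn tag_filter(tag: char, values: &[String]) -> String {
    let items: Vec<String> = values.iter().map(|v| json_string(v)).collect();
    format!("{{\"#{}\":[{}]}}", tag, items.join(","))
}

/// Layer 1: all announcement events (kinds 30617 and 30618).
fn layer1() -> String {
    "{\"kinds\":[30617,30618]}".to_string()
}

/// Repository coordinate carried in A/a tags: "30617:<pubkey>:<identifier>".
fn repo_coordinate(pubkey: &str, identifier: &str) -> String {
    format!("30617:{}:{}", pubkey, identifier)
}

/// Layer 2: events tagging a tracked repository with A or a.
fn layer2(coords: &[String]) -> Vec<String> {
    vec![tag_filter('A', coords), tag_filter('a', coords)]
}

/// Layer 3: events tagging a tracked PR or issue id with E or e.
fn layer3(event_ids: &[String]) -> Vec<String> {
    vec![tag_filter('E', event_ids), tag_filter('e', event_ids)]
}
```

For example, `layer2(&[repo_coordinate("abc", "my-repo")])` yields `{"#A":["30617:abc:my-repo"]}` and `{"#a":["30617:abc:my-repo"]}`.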

### Prometheus Metrics (As Implemented)

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `ngit_sync_relay_connected` | Gauge | relay | Connection status (1=connected, 0=disconnected) |
| `ngit_sync_connection_attempts_total` | Counter | relay, result | Connection attempts by outcome (success/failure) |
| `ngit_sync_relay_status` | Gauge | relay | Health state (1=healthy, 2=degraded, 3=dead) |
| `ngit_sync_relay_failures` | Gauge | relay | Consecutive failures |
| `ngit_sync_events_total` | Counter | source | Synced events by source type |
| `ngit_sync_gap_events_total` | Counter | relay | Gap events filled by catchup |
| `ngit_sync_relays_tracked_total` | Gauge | - | Total relays discovered |
| `ngit_sync_relays_connected_total` | Gauge | - | Relays currently connected |
| `ngit_sync_relays_dead_total` | Gauge | - | Relays currently marked dead |

### Configuration Options (As Implemented)

All options can be set via environment variables or CLI flags:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `NGIT_SYNC_RELAY_URL` | String | None | Primary sync relay URL (sync disabled if unset) |
| `NGIT_SYNC_MAX_BACKOFF_SECS` | u64 | 3600 | Maximum backoff delay (seconds) |
| `NGIT_SYNC_STARTUP_DELAY_SECS` | u64 | 30 | Catchup delay after startup (seconds) |
| `NGIT_SYNC_RECONNECT_DELAY_SECS` | u64 | 10 | Catchup delay after reconnect (seconds) |
| `NGIT_SYNC_RECONNECT_LOOKBACK_DAYS` | u64 | 3 | Days to look back on reconnect |
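The table maps onto a small settings struct. This sketch covers environment variables only, because the CLI flag names are not documented here. `SyncSettings`, `from_lookup`, and `u64_or` are hypothetical names. The sketch also assumes that an unparsable value falls back to its default; the real implementation may reject it instead.

```rust
/// Hypothetical settings struct holding the five options from the table above.
#[derive(Debug, PartialEq, Eq)]
struct SyncSettings {
    relay_url: Option<String>,
    max_backoff_secs: u64,
    startup_delay_secs: u64,
    reconnect_delay_secs: u64,
    reconnect_lookback_days: u64,
}

/// Reads a u64 option, falling back to `default` when unset or unparsable.
fn u64_or(lookup: &impl Fn(&str) -> Option<String>, key: &str, default: u64) -> u64 {
    lookup(key).and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

impl SyncSettings {
    /// `lookup` abstracts the environment so the parsing can be tested.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        SyncSettings {
            // An empty URL is treated like an unset one: sync stays disabled.
            relay_url: lookup("NGIT_SYNC_RELAY_URL").filter(|s| !s.is_empty()),
            max_backoff_secs: u64_or(&lookup, "NGIT_SYNC_MAX_BACKOFF_SECS", 3600),
            startup_delay_secs: u64_or(&lookup, "NGIT_SYNC_STARTUP_DELAY_SECS", 30),
            reconnect_delay_secs: u64_or(&lookup, "NGIT_SYNC_RECONNECT_DELAY_SECS", 10),
            reconnect_lookback_days: u64_or(&lookup, "NGIT_SYNC_RECONNECT_LOOKBACK_DAYS", 3),
        }
    }

    /// Reads the real process environment.
    fn from_env() -> Self {
        Self::from_lookup(|k| std::env::var(k).ok())
    }
}
```

Passing a lookup closure instead of calling `std::env::var` directly lets the parsing be tested without mutating the process environment.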

### Module Structure (As Implemented)

```
src/sync/
├── mod.rs           # Module exports, constants
├── manager.rs       # SyncManager - orchestrates sync
├── connection.rs    # SyncConnection - per-relay WebSocket
├── filter.rs        # FilterService - three-layer filters
├── health.rs        # RelayHealthTracker - health states
├── metrics.rs       # SyncMetrics - Prometheus integration
├── negentropy.rs    # NegentropyService - gap-filling
└── subscription.rs  # SubscriptionManager - dynamic subs
```

### Production Readiness Checklist

- [x] All metrics exposed at the `/metrics` endpoint
- [x] Health state tracking with configurable backoff
- [x] Dead relay detection and minimal retry
- [x] Startup jitter to prevent a thundering herd
- [x] Grafana dashboard with sync panels
- [x] Configuration documented
- [x] Integration tests passing
diff --git a/docs/grafana/ngit-grasp-dashboard.json b/docs/grafana/ngit-grasp-dashboard.json
index bd1b6fe..3b9b216 100644
--- a/docs/grafana/ngit-grasp-dashboard.json
+++ b/docs/grafana/ngit-grasp-dashboard.json
@@ -641,6 +641,340 @@
      ],
      "title": "Events Stored vs Rejected (5m)",
      "type": "timeseries"
    },
    {
      "collapsed": false,
      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 48 },
      "id": 40,
      "title": "Proactive Sync",
      "type": "row"
    },
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": { "type": "linear" },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": { "group": "A", "mode": "none" },
            "thresholdsStyle": { "mode": "off" }
          },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [] },
          "unit": "short"
        }
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 49 },
      "id": 41,
      "options": {
        "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        {
          "expr": "ngit_sync_relays_connected_total",
          "legendFormat": "Connected",
          "refId": "A"
        },
        {
          "expr": "ngit_sync_relays_tracked_total",
          "legendFormat": "Tracked",
          "refId": "B"
        },
        {
          "expr": "ngit_sync_relays_dead_total",
          "legendFormat": "Dead",
          "refId": "C"
        }
      ],
      "title": "Sync Relays Over Time",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "hideFrom": { "legend": false, "tooltip": false, "viz": false } },
          "mappings": [],
          "unit": "short"
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "healthy" },
            "properties": [{ "id": "color", "value": { "fixedColor": "green", "mode": "fixed" } }]
          },
          {
            "matcher": { "id": "byName", "options": "degraded" },
            "properties": [{ "id": "color", "value": { "fixedColor": "yellow", "mode": "fixed" } }]
          },
          {
            "matcher": { "id": "byName", "options": "dead" },
            "properties": [{ "id": "color", "value": { "fixedColor": "red", "mode": "fixed" } }]
          }
        ]
      },
      "gridPos": { "h": 8, "w": 6, "x": 12, "y": 49 },
      "id": 42,
      "options": {
        "legend": { "displayMode": "list", "placement": "right", "showLegend": true },
        "pieType": "pie",
        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
        "tooltip": { "mode": "single", "sort": "none" }
      },
      "targets": [
        {
          "expr": "count(ngit_sync_relay_status == 1)",
          "legendFormat": "healthy",
          "refId": "A"
        },
        {
          "expr": "count(ngit_sync_relay_status == 2)",
          "legendFormat": "degraded",
          "refId": "B"
        },
        {
          "expr": "count(ngit_sync_relay_status == 3)",
          "legendFormat": "dead",
          "refId": "C"
        }
      ],
      "title": "Relay Health Distribution",
      "type": "piechart"
    },
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "thresholds" },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "yellow", "value": 1 },
              { "color": "red", "value": 5 }
            ]
          },
          "unit": "short"
        }
      },
      "gridPos": { "h": 4, "w": 3, "x": 18, "y": 49 },
      "id": 43,
      "options": {
        "colorMode": "value",
        "graphMode": "none",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
        "textMode": "auto"
      },
      "targets": [
        {
          "expr": "ngit_sync_relays_dead_total",
          "legendFormat": "Dead",
          "refId": "A"
        }
      ],
      "title": "Dead Relays",
      "type": "stat"
    },
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "thresholds" },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [{ "color": "blue", "value": null }]
          },
          "unit": "short"
        }
      },
      "gridPos": { "h": 4, "w": 3, "x": 21, "y": 49 },
      "id": 44,
      "options": {
        "colorMode": "value",
        "graphMode": "none",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
        "textMode": "auto"
      },
      "targets": [
        {
          "expr": "ngit_sync_relays_connected_total",
          "legendFormat": "Connected",
          "refId": "A"
        }
      ],
      "title": "Connected Relays",
      "type": "stat"
    },
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "bars",
            "fillOpacity": 50,
            "gradientMode": "none",
            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": { "type": "linear" },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": { "group": "A", "mode": "normal" },
            "thresholdsStyle": { "mode": "off" }
          },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [] },
          "unit": "short"
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "success" },
            "properties": [{ "id": "color", "value": { "fixedColor": "green", "mode": "fixed" } }]
          },
          {
            "matcher": { "id": "byName", "options": "failure" },
            "properties": [{ "id": "color", "value": { "fixedColor": "red", "mode": "fixed" } }]
          }
        ]
      },
      "gridPos": { "h": 4, "w": 6, "x": 18, "y": 53 },
      "id": 45,
      "options": {
        "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        {
          "expr": "increase(ngit_sync_connection_attempts_total{result=\"success\"}[5m])",
          "legendFormat": "success",
          "refId": "A"
        },
        {
          "expr": "increase(ngit_sync_connection_attempts_total{result=\"failure\"}[5m])",
          "legendFormat": "failure",
          "refId": "B"
        }
      ],
      "title": "Connection Attempts (5m)",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": { "type": "linear" },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": { "group": "A", "mode": "none" },
            "thresholdsStyle": { "mode": "off" }
          },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [] },
          "unit": "short"
        }
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 57 },
      "id": 46,
      "options": {
        "legend": { "calcs": ["sum"], "displayMode": "table", "placement": "right", "showLegend": true },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        {
          "expr": "rate(ngit_sync_events_total[5m])",
          "legendFormat": "{{source}}",
          "refId": "A"
        }
      ],
      "title": "Synced Events by Source (5m)",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "bars",
            "fillOpacity": 50,
            "gradientMode": "none",
            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": { "type": "linear" },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": { "group": "A", "mode": "normal" },
            "thresholdsStyle": { "mode": "off" }
          },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [] },
          "unit": "short"
        }
      },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 57 },
      "id": 47,
      "options": {
        "legend": { "calcs": ["sum"], "displayMode": "table", "placement": "right", "showLegend": true },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        {
          "expr": "increase(ngit_sync_gap_events_total[1h])",
          "legendFormat": "{{relay}}",
          "refId": "A"
        }
      ],
      "title": "Gap Events Filled by Relay (1h)",
      "type": "timeseries"
    }
  ],
  "refresh": "30s",
diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md
index e2ec9aa..80ae45c 100644
--- a/docs/reference/configuration.md
+++ b/docs/reference/configuration.md
@@ -265,6 +265,143 @@ NGIT_DATABASE_BACKEND=lmdb

---

### Proactive Sync Configuration (GRASP-02)

These options configure proactive sync, which pulls events from other relays into this relay.

#### `NGIT_SYNC_RELAY_URL`

**Description:** URL of the primary relay to sync events from
**Type:** String (WebSocket URL)
**Default:** None (sync disabled)
**Required:** No

**Examples:**
```bash
# Sync from a public relay
NGIT_SYNC_RELAY_URL=wss://relay.example.com

# Sync from another GRASP relay
NGIT_SYNC_RELAY_URL=wss://git.nostr.dev

# Local testing
NGIT_SYNC_RELAY_URL=ws://127.0.0.1:8081
```

**Notes:**
- When set, enables proactive sync
- The relay discovers additional relays from repository announcements
- Synced events go through the same validation as directly submitted events
- Use a WebSocket URL (`ws://` or `wss://`)
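A startup check for the last note might look like the following. `is_websocket_url` is a hypothetical helper. It only checks, case-insensitively, that the URL has a `ws://` or `wss://` prefix followed by a host part. A real implementation would more likely use a full URL parser.

```rust
/// Hypothetical check: true if `url` uses a WebSocket scheme and names a host.
fn is_websocket_url(url: &str) -> bool {
    let lower = url.trim().to_ascii_lowercase();
    // Try the secure scheme first, then plain ws://.
    let rest = lower
        .strip_prefix("wss://")
        .or_else(|| lower.strip_prefix("ws://"));
    match rest {
        // Something must follow the scheme, and it must not start with a path.
        Some(r) => !r.is_empty() && !r.starts_with('/'),
        None => false,
    }
}
```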

---

#### `NGIT_SYNC_MAX_BACKOFF_SECS`

**Description:** Maximum delay, in seconds, between sync relay reconnection attempts
**Type:** Integer (seconds)
**Default:** `3600` (1 hour)
**Required:** No

**Examples:**
```bash
# Default: 1 hour max backoff
NGIT_SYNC_MAX_BACKOFF_SECS=3600

# Aggressive: 5 minute max backoff
NGIT_SYNC_MAX_BACKOFF_SECS=300

# Conservative: 2 hour max backoff
NGIT_SYNC_MAX_BACKOFF_SECS=7200
```

**Notes:**
- Backoff starts at 5 seconds and doubles on each consecutive failure
- The delay is capped at this value
- After 24 hours of continuous failures, the relay is marked "dead" and retried daily
- Lower values mean more frequent reconnection attempts
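The schedule in these notes works out as delay = min(5 s × 2^(failures − 1), max). With the default cap of 3600 s, the cap is first reached at the 11th consecutive failure, since 5 × 2^10 = 5120. The sketch below is hypothetical (`next_delay` is not a documented function). It assumes dead relays switch to a fixed 24-hour retry interval, following the "retried daily" note.

```rust
use std::time::Duration;

/// Initial delay, from the notes above.
const INITIAL_BACKOFF: Duration = Duration::from_secs(5);
/// Assumed retry interval for relays marked dead ("retried daily").
const DEAD_RETRY: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical: delay before the next attempt after `failures` consecutive
/// failures (1 = first failure), doubling from 5s and capped at `max_backoff_secs`.
fn next_delay(failures: u32, max_backoff_secs: u64, dead: bool) -> Duration {
    if dead {
        return DEAD_RETRY;
    }
    let max = Duration::from_secs(max_backoff_secs);
    // 5s * 2^(failures - 1); checked_shl and saturating_mul avoid overflow
    // for very long failure streaks.
    let exp = failures.saturating_sub(1);
    let factor = 1u64.checked_shl(exp).unwrap_or(u64::MAX);
    let secs = INITIAL_BACKOFF.as_secs().saturating_mul(factor);
    Duration::from_secs(secs).min(max)
}
```

For example, with the default cap the first three failures wait 5 s, 10 s, and 20 s. Every attempt from the 11th failure on waits 3600 s.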

---

#### `NGIT_SYNC_STARTUP_DELAY_SECS`

**Description:** Delay in seconds before running startup catchup
**Type:** Integer (seconds)
**Default:** `30`
**Required:** No

**Examples:**
```bash
# Default: 30 second delay
NGIT_SYNC_STARTUP_DELAY_SECS=30

# Quick startup (testing)
NGIT_SYNC_STARTUP_DELAY_SECS=5

# Production: longer warm-up
NGIT_SYNC_STARTUP_DELAY_SECS=60
```

**Notes:**
- Allows connections to stabilize before catchup
- Reduces load on remote relays at startup
- Set to 0 for immediate catchup (not recommended)

---

#### `NGIT_SYNC_RECONNECT_DELAY_SECS`

**Description:** Delay in seconds before running catchup after a reconnection
**Type:** Integer (seconds)
**Default:** `10`
**Required:** No

**Examples:**
```bash
# Default: 10 second delay
NGIT_SYNC_RECONNECT_DELAY_SECS=10

# Quick reconnect catchup
NGIT_SYNC_RECONNECT_DELAY_SECS=5

# Conservative
NGIT_SYNC_RECONNECT_DELAY_SECS=30
```

**Notes:**
- Helps avoid rate limiting by remote relays
- Applied after each successful reconnection
- Only catches up on recent events (see `NGIT_SYNC_RECONNECT_LOOKBACK_DAYS`)

---

#### `NGIT_SYNC_RECONNECT_LOOKBACK_DAYS`

**Description:** Number of days to look back during reconnect catchup
**Type:** Integer (days)
**Default:** `3`
**Required:** No

**Examples:**
```bash
# Default: 3 days lookback
NGIT_SYNC_RECONNECT_LOOKBACK_DAYS=3

# Short lookback (frequent reconnects expected)
NGIT_SYNC_RECONNECT_LOOKBACK_DAYS=1

# Extended lookback
NGIT_SYNC_RECONNECT_LOOKBACK_DAYS=7
```

**Notes:**
- Limits catchup queries to recent events only
- Reduces load compared to a full historical sync
- Trades completeness against performance
- A longer lookback is useful for less reliable connections
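Assume the lookback window is measured back from the moment catchup runs. The `since` value of a reconnect catchup filter is then simple arithmetic. With the default of 3 days, it is 3 × 86,400 = 259,200 seconds before now. `reconnect_since` and `now_unix` are hypothetical helpers.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

const SECS_PER_DAY: u64 = 24 * 60 * 60;

/// Hypothetical: Unix timestamp for a catchup filter's `since` field,
/// assuming the window is measured back from `now_unix`.
fn reconnect_since(now_unix: u64, lookback_days: u64) -> u64 {
    // Saturating ops keep extreme settings from overflowing or underflowing.
    now_unix.saturating_sub(lookback_days.saturating_mul(SECS_PER_DAY))
}

/// Current Unix time in seconds (0 if the system clock is before 1970).
fn now_unix() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or(Duration::ZERO)
        .as_secs()
}
```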

---

### Logging Configuration

#### `RUST_LOG`