| author | DanConwayDev <DanConwayDev@protonmail.com> | 2025-12-04 18:43:49 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2025-12-04 18:43:49 +0000 |
| commit | dd403b17e7c74db9443d0891a9de1f0f0f9f89eb (patch) | |
| tree | 177dd9f664dde3565492c1d11016dabfeda28bbc /docs | |
| parent | 950c2e4e68448d2abcad90a31bfffaca6d7bc47e (diff) | |
feat(sync): Phase 6 - observability and production readiness
- Add SyncMetrics with full Prometheus integration
- Track sync gaps via catchup events
- Update Grafana dashboard with sync panels
- Document all sync configuration options
- Update design doc with implementation notes
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/explanation/grasp-02-proactive-sync.md | 128 | ||||
| -rw-r--r-- | docs/grafana/ngit-grasp-dashboard.json | 334 | ||||
| -rw-r--r-- | docs/reference/configuration.md | 137 |
3 files changed, 599 insertions, 0 deletions
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index a8af3f4..98531ec 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
| @@ -745,3 +745,131 @@ pub struct SyncConfig { | |||
| 745 | 8. **Dynamic subscription addition** with periodic consolidation | 745 | 8. **Dynamic subscription addition** with periodic consolidation |
| 746 | 9. **Custom acceptance policy** excluding rate limiting defaults | 746 | 9. **Custom acceptance policy** excluding rate limiting defaults |
| 747 | 10. **Catchup as failure signal** - events found during catchup/daily indicate live sync gaps, tracked in Prometheus | 747 | 10. **Catchup as failure signal** - events found during catchup/daily indicate live sync gaps, tracked in Prometheus |
| 748 | |||
| 749 | --- | ||
| 750 | |||
| 751 | ## Implementation Notes (Phase 6) | ||
| 752 | |||
| 753 | This section documents the final implementation as of Phase 6 (Observability & Production Readiness). | ||
| 754 | |||
| 755 | ### What Was Actually Built | ||
| 756 | |||
| 757 | The implementation closely follows the design document with the following completed components: | ||
| 758 | |||
| 759 | #### Phase 1: Basic Sync (commit b167f1b) | ||
| 760 | - [`SyncManager`](../../src/sync/manager.rs) - Main coordinator for proactive sync | ||
| 761 | - Single relay sync via `NGIT_SYNC_RELAY_URL` configuration | ||
| 762 | - Event validation through existing [`Nip34WritePolicy`](../../src/nostr/builder.rs) | ||
| 763 | |||
| 764 | #### Phase 2: Three-Layer Filters (commit bf558b0) | ||
| 765 | - [`FilterService`](../../src/sync/filter.rs) - Builds three-layer filter strategy | ||
| 766 | - Layer 1: All kind 30617+30618 (announcements) | ||
| 767 | - Layer 2: A/a tag filters for repository events | ||
| 768 | - Layer 3: E/e tag filters for related events (PRs, Issues) | ||
| 769 | - Multi-relay discovery from stored announcements | ||
| 770 | |||
| 771 | #### Phase 3: Health Tracking (commit f639ecf) | ||
| 772 | - [`RelayHealthTracker`](../../src/sync/health.rs) - DashMap-based health tracking | ||
| 773 | - Three states: Healthy → Degraded → Dead | ||
| 774 | - Exponential backoff: 5s → 10s → 20s → ... → max (default 1h) | ||
| 775 | - Dead relay detection after 24h continuous failures | ||
| 776 | - Startup jitter (0-10s) to prevent thundering herd | ||
| 777 | |||
| 778 | #### Phase 4: Dynamic Subscriptions (commit a19ff57) | ||
| 779 | - [`SubscriptionManager`](../../src/sync/subscription.rs) - Per-connection subscription tracking | ||
| 780 | - Dynamic Layer 2 subscriptions when new announcements arrive | ||
| 781 | - Dynamic Layer 3 subscriptions when new PRs/Issues arrive | ||
| 782 | - Filter consolidation at threshold (150 filters) | ||
| 783 | |||
| 784 | #### Phase 5: Catchup & Gap Detection (commit 950c2e4) | ||
| 785 | - [`NegentropyService`](../../src/sync/negentropy.rs) - Gap-filling catchup operations | ||
| 786 | - Startup catchup (configurable delay) | ||
| 787 | - Reconnection catchup (limited lookback) | ||
| 788 | - Daily catchup (not yet implemented - placeholder) | ||
| 789 | |||
| 790 | #### Phase 6: Observability (this phase) | ||
| 791 | - [`SyncMetrics`](../../src/sync/metrics.rs) - Full Prometheus integration | ||
| 792 | - Grafana dashboard panels for sync monitoring | ||
| 793 | - Documentation updates | ||
| 794 | |||
| 795 | ### Differences from Original Design | ||
| 796 | |||
| 797 | 1. **Negentropy (NIP-77)**: Simplified gap-filling was used instead of full NIP-77 negentropy reconciliation, as nostr-sdk 0.44 lacks built-in negentropy support. The current implementation uses timestamp-based catchup queries. | ||
| 798 | |||
| 799 | 2. **Filter Consolidation Threshold**: Set at 150 filters (as designed) based on typical relay filter limits. | ||
| 800 | |||
| 801 | 3. **Health Tracking**: Implemented exactly as designed - in-memory only (not persisted to database), which is acceptable for production as health state rebuilds quickly on restart. | ||
| 802 | |||
| 803 | 4. **Metric Label Strategy**: Used simpler numeric encoding for health status (1=healthy, 2=degraded, 3=dead) instead of multiple label values per relay, reducing cardinality. | ||
| 804 | |||
| 805 | 5. **Event Source Tracking**: Implemented four source types (`live`, `startup`, `reconnect`, `daily`) instead of the original (`direct`, `live_sync`, `catchup`, `daily_catchup`). | ||
| 806 | |||
| 807 | ### Three-Layer Filter Strategy (As Implemented) | ||
| 808 | |||
| 809 | ``` | ||
| 810 | Layer 1: Discovery Layer | ||
| 811 | ├── Query: kinds [30617, 30618] (announcements) | ||
| 812 | ├── Applied: At startup and during sync | ||
| 813 | └── Purpose: Discover all repositories across network | ||
| 814 | |||
| 815 | Layer 2: Repository Events | ||
| 816 | ├── Query: Events with A/a tags pointing to tracked repos | ||
| 817 | ├── Format: A tag = "30617:<pubkey>:<identifier>" | ||
| 818 | ├── Triggered: When new announcement is accepted | ||
| 819 | └── Purpose: Get PRs, issues, patches for repositories | ||
| 820 | |||
| 821 | Layer 3: Related Events | ||
| 822 | ├── Query: Events with E/e tags pointing to tracked PRs/Issues | ||
| 823 | ├── Triggered: When new PR/Issue is accepted | ||
| 824 | └── Purpose: Get comments, reviews, status updates | ||
| 825 | ``` | ||
| 826 | |||
| 827 | ### Prometheus Metrics (As Implemented) | ||
| 828 | |||
| 829 | | Metric | Type | Labels | Description | | ||
| 830 | |--------|------|--------|-------------| | ||
| 831 | | `ngit_sync_relay_connected` | Gauge | relay | Connection status (1/0) | | ||
| 832 | | `ngit_sync_connection_attempts_total` | Counter | relay, result | Attempts by outcome | | ||
| 833 | | `ngit_sync_relay_status` | Gauge | relay | Health state (1/2/3) | | ||
| 834 | | `ngit_sync_relay_failures` | Gauge | relay | Consecutive failures | | ||
| 835 | | `ngit_sync_events_total` | Counter | source | Events by source type | | ||
| 836 | | `ngit_sync_gap_events_total` | Counter | relay | Gap events filled | | ||
| 837 | | `ngit_sync_relays_tracked_total` | Gauge | - | Total relays discovered | | ||
| 838 | | `ngit_sync_relays_connected_total` | Gauge | - | Currently connected | | ||
| 839 | | `ngit_sync_relays_dead_total` | Gauge | - | Dead relay count | | ||
| 840 | |||
| 841 | ### Configuration Options (As Implemented) | ||
| 842 | |||
| 843 | All configuration via environment variables or CLI flags: | ||
| 844 | |||
| 845 | | Option | Type | Default | Description | | ||
| 846 | |--------|------|---------|-------------| | ||
| 847 | | `NGIT_SYNC_RELAY_URL` | String | None | Primary sync relay URL | | ||
| 848 | | `NGIT_SYNC_MAX_BACKOFF_SECS` | u64 | 3600 | Max backoff delay (seconds) | | ||
| 849 | | `NGIT_SYNC_STARTUP_DELAY_SECS` | u64 | 30 | Catchup delay after startup | | ||
| 850 | | `NGIT_SYNC_RECONNECT_DELAY_SECS` | u64 | 10 | Catchup delay after reconnect | | ||
| 851 | | `NGIT_SYNC_RECONNECT_LOOKBACK_DAYS` | u64 | 3 | Days to look back on reconnect | | ||
| 852 | |||
| 853 | ### Module Structure (As Implemented) | ||
| 854 | |||
| 855 | ``` | ||
| 856 | src/sync/ | ||
| 857 | ├── mod.rs # Module exports, constants | ||
| 858 | ├── manager.rs # SyncManager - orchestrates sync | ||
| 859 | ├── connection.rs # SyncConnection - per-relay WebSocket | ||
| 860 | ├── filter.rs # FilterService - three-layer filters | ||
| 861 | ├── health.rs # RelayHealthTracker - health states | ||
| 862 | ├── metrics.rs # SyncMetrics - Prometheus integration | ||
| 863 | ├── negentropy.rs # NegentropyService - gap-filling | ||
| 864 | └── subscription.rs # SubscriptionManager - dynamic subs | ||
| 865 | ``` | ||
| 866 | |||
| 867 | ### Production Readiness Checklist | ||
| 868 | |||
| 869 | - [x] All metrics exposed at `/metrics` endpoint | ||
| 870 | - [x] Health state tracking with configurable backoff | ||
| 871 | - [x] Dead relay detection and minimal retry | ||
| 872 | - [x] Startup jitter to prevent thundering herd | ||
| 873 | - [x] Grafana dashboard with sync panels | ||
| 874 | - [x] Configuration documented | ||
| 875 | - [x] Integration tests passing | ||
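Review note on the backoff schedule described under Phase 3 (5s → 10s → 20s → ... → max): the sketch below shows one way that schedule could be computed. It is not taken from `src/sync/health.rs`; the function name `backoff_delay` and the constant `INITIAL_BACKOFF_SECS` are hypothetical, and only the 5-second start, the doubling, and the cap (`NGIT_SYNC_MAX_BACKOFF_SECS`, default 3600) come from the text above.

```rust
use std::time::Duration;

/// Starting delay stated in the design notes (5 seconds).
const INITIAL_BACKOFF_SECS: u64 = 5;

/// Hypothetical helper: delay before the next reconnect attempt after
/// `consecutive_failures` failed attempts (1 = first failure).
/// `max_backoff_secs` plays the role of NGIT_SYNC_MAX_BACKOFF_SECS.
fn backoff_delay(consecutive_failures: u32, max_backoff_secs: u64) -> Duration {
    if consecutive_failures == 0 {
        // No failures recorded yet, so no delay is needed.
        return Duration::ZERO;
    }
    // Double the delay on every further failure: 5s, 10s, 20s, ...
    // The exponent is clamped and the multiplication saturates so that a
    // very large failure count cannot overflow before the cap is applied.
    let exponent = (consecutive_failures - 1).min(62);
    let secs = INITIAL_BACKOFF_SECS
        .saturating_mul(1u64 << exponent)
        .min(max_backoff_secs);
    Duration::from_secs(secs)
}
```

With the default cap of 3600 seconds, this sketch reaches the cap on the eleventh consecutive failure (5 × 2^10 = 5120 seconds, clamped to 3600). Startup jitter and the 24-hour dead-relay rule are separate concerns and are not modelled here.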
diff --git a/docs/grafana/ngit-grasp-dashboard.json b/docs/grafana/ngit-grasp-dashboard.json
index bd1b6fe..3b9b216 100644
--- a/docs/grafana/ngit-grasp-dashboard.json
+++ b/docs/grafana/ngit-grasp-dashboard.json
| @@ -641,6 +641,340 @@ | |||
| 641 | ], | 641 | ], |
| 642 | "title": "Events Stored vs Rejected (5m)", | 642 | "title": "Events Stored vs Rejected (5m)", |
| 643 | "type": "timeseries" | 643 | "type": "timeseries" |
| 644 | }, | ||
| 645 | { | ||
| 646 | "collapsed": false, | ||
| 647 | "gridPos": { "h": 1, "w": 24, "x": 0, "y": 48 }, | ||
| 648 | "id": 40, | ||
| 649 | "title": "Proactive Sync", | ||
| 650 | "type": "row" | ||
| 651 | }, | ||
| 652 | { | ||
| 653 | "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||
| 654 | "fieldConfig": { | ||
| 655 | "defaults": { | ||
| 656 | "color": { "mode": "palette-classic" }, | ||
| 657 | "custom": { | ||
| 658 | "axisCenteredZero": false, | ||
| 659 | "axisColorMode": "text", | ||
| 660 | "axisLabel": "", | ||
| 661 | "axisPlacement": "auto", | ||
| 662 | "barAlignment": 0, | ||
| 663 | "drawStyle": "line", | ||
| 664 | "fillOpacity": 10, | ||
| 665 | "gradientMode": "none", | ||
| 666 | "hideFrom": { "legend": false, "tooltip": false, "viz": false }, | ||
| 667 | "lineInterpolation": "linear", | ||
| 668 | "lineWidth": 1, | ||
| 669 | "pointSize": 5, | ||
| 670 | "scaleDistribution": { "type": "linear" }, | ||
| 671 | "showPoints": "never", | ||
| 672 | "spanNulls": false, | ||
| 673 | "stacking": { "group": "A", "mode": "none" }, | ||
| 674 | "thresholdsStyle": { "mode": "off" } | ||
| 675 | }, | ||
| 676 | "mappings": [], | ||
| 677 | "thresholds": { "mode": "absolute", "steps": [] }, | ||
| 678 | "unit": "short" | ||
| 679 | } | ||
| 680 | }, | ||
| 681 | "gridPos": { "h": 8, "w": 12, "x": 0, "y": 49 }, | ||
| 682 | "id": 41, | ||
| 683 | "options": { | ||
| 684 | "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true }, | ||
| 685 | "tooltip": { "mode": "multi", "sort": "none" } | ||
| 686 | }, | ||
| 687 | "targets": [ | ||
| 688 | { | ||
| 689 | "expr": "ngit_sync_relays_connected_total", | ||
| 690 | "legendFormat": "Connected", | ||
| 691 | "refId": "A" | ||
| 692 | }, | ||
| 693 | { | ||
| 694 | "expr": "ngit_sync_relays_tracked_total", | ||
| 695 | "legendFormat": "Tracked", | ||
| 696 | "refId": "B" | ||
| 697 | }, | ||
| 698 | { | ||
| 699 | "expr": "ngit_sync_relays_dead_total", | ||
| 700 | "legendFormat": "Dead", | ||
| 701 | "refId": "C" | ||
| 702 | } | ||
| 703 | ], | ||
| 704 | "title": "Sync Relays Over Time", | ||
| 705 | "type": "timeseries" | ||
| 706 | }, | ||
| 707 | { | ||
| 708 | "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||
| 709 | "fieldConfig": { | ||
| 710 | "defaults": { | ||
| 711 | "color": { "mode": "palette-classic" }, | ||
| 712 | "custom": { "hideFrom": { "legend": false, "tooltip": false, "viz": false } }, | ||
| 713 | "mappings": [], | ||
| 714 | "unit": "short" | ||
| 715 | }, | ||
| 716 | "overrides": [ | ||
| 717 | { | ||
| 718 | "matcher": { "id": "byName", "options": "healthy" }, | ||
| 719 | "properties": [{ "id": "color", "value": { "fixedColor": "green", "mode": "fixed" } }] | ||
| 720 | }, | ||
| 721 | { | ||
| 722 | "matcher": { "id": "byName", "options": "degraded" }, | ||
| 723 | "properties": [{ "id": "color", "value": { "fixedColor": "yellow", "mode": "fixed" } }] | ||
| 724 | }, | ||
| 725 | { | ||
| 726 | "matcher": { "id": "byName", "options": "dead" }, | ||
| 727 | "properties": [{ "id": "color", "value": { "fixedColor": "red", "mode": "fixed" } }] | ||
| 728 | } | ||
| 729 | ] | ||
| 730 | }, | ||
| 731 | "gridPos": { "h": 8, "w": 6, "x": 12, "y": 49 }, | ||
| 732 | "id": 42, | ||
| 733 | "options": { | ||
| 734 | "legend": { "displayMode": "list", "placement": "right", "showLegend": true }, | ||
| 735 | "pieType": "pie", | ||
| 736 | "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }, | ||
| 737 | "tooltip": { "mode": "single", "sort": "none" } | ||
| 738 | }, | ||
| 739 | "targets": [ | ||
| 740 | { | ||
| 741 | "expr": "count(ngit_sync_relay_status == 1)", | ||
| 742 | "legendFormat": "healthy", | ||
| 743 | "refId": "A" | ||
| 744 | }, | ||
| 745 | { | ||
| 746 | "expr": "count(ngit_sync_relay_status == 2)", | ||
| 747 | "legendFormat": "degraded", | ||
| 748 | "refId": "B" | ||
| 749 | }, | ||
| 750 | { | ||
| 751 | "expr": "count(ngit_sync_relay_status == 3)", | ||
| 752 | "legendFormat": "dead", | ||
| 753 | "refId": "C" | ||
| 754 | } | ||
| 755 | ], | ||
| 756 | "title": "Relay Health Distribution", | ||
| 757 | "type": "piechart" | ||
| 758 | }, | ||
| 759 | { | ||
| 760 | "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||
| 761 | "fieldConfig": { | ||
| 762 | "defaults": { | ||
| 763 | "color": { "mode": "thresholds" }, | ||
| 764 | "mappings": [], | ||
| 765 | "thresholds": { | ||
| 766 | "mode": "absolute", | ||
| 767 | "steps": [ | ||
| 768 | { "color": "green", "value": null }, | ||
| 769 | { "color": "yellow", "value": 1 }, | ||
| 770 | { "color": "red", "value": 5 } | ||
| 771 | ] | ||
| 772 | }, | ||
| 773 | "unit": "short" | ||
| 774 | } | ||
| 775 | }, | ||
| 776 | "gridPos": { "h": 4, "w": 3, "x": 18, "y": 49 }, | ||
| 777 | "id": 43, | ||
| 778 | "options": { | ||
| 779 | "colorMode": "value", | ||
| 780 | "graphMode": "none", | ||
| 781 | "justifyMode": "auto", | ||
| 782 | "orientation": "auto", | ||
| 783 | "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }, | ||
| 784 | "textMode": "auto" | ||
| 785 | }, | ||
| 786 | "targets": [ | ||
| 787 | { | ||
| 788 | "expr": "ngit_sync_relays_dead_total", | ||
| 789 | "legendFormat": "Dead", | ||
| 790 | "refId": "A" | ||
| 791 | } | ||
| 792 | ], | ||
| 793 | "title": "Dead Relays", | ||
| 794 | "type": "stat" | ||
| 795 | }, | ||
| 796 | { | ||
| 797 | "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||
| 798 | "fieldConfig": { | ||
| 799 | "defaults": { | ||
| 800 | "color": { "mode": "thresholds" }, | ||
| 801 | "mappings": [], | ||
| 802 | "thresholds": { | ||
| 803 | "mode": "absolute", | ||
| 804 | "steps": [{ "color": "blue", "value": null }] | ||
| 805 | }, | ||
| 806 | "unit": "short" | ||
| 807 | } | ||
| 808 | }, | ||
| 809 | "gridPos": { "h": 4, "w": 3, "x": 21, "y": 49 }, | ||
| 810 | "id": 44, | ||
| 811 | "options": { | ||
| 812 | "colorMode": "value", | ||
| 813 | "graphMode": "none", | ||
| 814 | "justifyMode": "auto", | ||
| 815 | "orientation": "auto", | ||
| 816 | "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }, | ||
| 817 | "textMode": "auto" | ||
| 818 | }, | ||
| 819 | "targets": [ | ||
| 820 | { | ||
| 821 | "expr": "ngit_sync_relays_connected_total", | ||
| 822 | "legendFormat": "Connected", | ||
| 823 | "refId": "A" | ||
| 824 | } | ||
| 825 | ], | ||
| 826 | "title": "Connected Relays", | ||
| 827 | "type": "stat" | ||
| 828 | }, | ||
| 829 | { | ||
| 830 | "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||
| 831 | "fieldConfig": { | ||
| 832 | "defaults": { | ||
| 833 | "color": { "mode": "palette-classic" }, | ||
| 834 | "custom": { | ||
| 835 | "axisCenteredZero": false, | ||
| 836 | "axisColorMode": "text", | ||
| 837 | "axisLabel": "", | ||
| 838 | "axisPlacement": "auto", | ||
| 839 | "barAlignment": 0, | ||
| 840 | "drawStyle": "bars", | ||
| 841 | "fillOpacity": 50, | ||
| 842 | "gradientMode": "none", | ||
| 843 | "hideFrom": { "legend": false, "tooltip": false, "viz": false }, | ||
| 844 | "lineInterpolation": "linear", | ||
| 845 | "lineWidth": 1, | ||
| 846 | "pointSize": 5, | ||
| 847 | "scaleDistribution": { "type": "linear" }, | ||
| 848 | "showPoints": "never", | ||
| 849 | "spanNulls": false, | ||
| 850 | "stacking": { "group": "A", "mode": "normal" }, | ||
| 851 | "thresholdsStyle": { "mode": "off" } | ||
| 852 | }, | ||
| 853 | "mappings": [], | ||
| 854 | "thresholds": { "mode": "absolute", "steps": [] }, | ||
| 855 | "unit": "short" | ||
| 856 | }, | ||
| 857 | "overrides": [ | ||
| 858 | { | ||
| 859 | "matcher": { "id": "byName", "options": "success" }, | ||
| 860 | "properties": [{ "id": "color", "value": { "fixedColor": "green", "mode": "fixed" } }] | ||
| 861 | }, | ||
| 862 | { | ||
| 863 | "matcher": { "id": "byName", "options": "failure" }, | ||
| 864 | "properties": [{ "id": "color", "value": { "fixedColor": "red", "mode": "fixed" } }] | ||
| 865 | } | ||
| 866 | ] | ||
| 867 | }, | ||
| 868 | "gridPos": { "h": 4, "w": 6, "x": 18, "y": 53 }, | ||
| 869 | "id": 45, | ||
| 870 | "options": { | ||
| 871 | "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true }, | ||
| 872 | "tooltip": { "mode": "multi", "sort": "none" } | ||
| 873 | }, | ||
| 874 | "targets": [ | ||
| 875 | { | ||
| 876 | "expr": "increase(ngit_sync_connection_attempts_total{result=\"success\"}[5m])", | ||
| 877 | "legendFormat": "success", | ||
| 878 | "refId": "A" | ||
| 879 | }, | ||
| 880 | { | ||
| 881 | "expr": "increase(ngit_sync_connection_attempts_total{result=\"failure\"}[5m])", | ||
| 882 | "legendFormat": "failure", | ||
| 883 | "refId": "B" | ||
| 884 | } | ||
| 885 | ], | ||
| 886 | "title": "Connection Attempts (5m)", | ||
| 887 | "type": "timeseries" | ||
| 888 | }, | ||
| 889 | { | ||
| 890 | "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||
| 891 | "fieldConfig": { | ||
| 892 | "defaults": { | ||
| 893 | "color": { "mode": "palette-classic" }, | ||
| 894 | "custom": { | ||
| 895 | "axisCenteredZero": false, | ||
| 896 | "axisColorMode": "text", | ||
| 897 | "axisLabel": "", | ||
| 898 | "axisPlacement": "auto", | ||
| 899 | "barAlignment": 0, | ||
| 900 | "drawStyle": "line", | ||
| 901 | "fillOpacity": 10, | ||
| 902 | "gradientMode": "none", | ||
| 903 | "hideFrom": { "legend": false, "tooltip": false, "viz": false }, | ||
| 904 | "lineInterpolation": "linear", | ||
| 905 | "lineWidth": 1, | ||
| 906 | "pointSize": 5, | ||
| 907 | "scaleDistribution": { "type": "linear" }, | ||
| 908 | "showPoints": "never", | ||
| 909 | "spanNulls": false, | ||
| 910 | "stacking": { "group": "A", "mode": "none" }, | ||
| 911 | "thresholdsStyle": { "mode": "off" } | ||
| 912 | }, | ||
| 913 | "mappings": [], | ||
| 914 | "thresholds": { "mode": "absolute", "steps": [] }, | ||
| 915 | "unit": "short" | ||
| 916 | } | ||
| 917 | }, | ||
| 918 | "gridPos": { "h": 8, "w": 12, "x": 0, "y": 57 }, | ||
| 919 | "id": 46, | ||
| 920 | "options": { | ||
| 921 | "legend": { "calcs": ["sum"], "displayMode": "table", "placement": "right", "showLegend": true }, | ||
| 922 | "tooltip": { "mode": "multi", "sort": "none" } | ||
| 923 | }, | ||
| 924 | "targets": [ | ||
| 925 | { | ||
| 926 | "expr": "rate(ngit_sync_events_total[5m])", | ||
| 927 | "legendFormat": "{{source}}", | ||
| 928 | "refId": "A" | ||
| 929 | } | ||
| 930 | ], | ||
| 931 | "title": "Synced Events by Source (5m)", | ||
| 932 | "type": "timeseries" | ||
| 933 | }, | ||
| 934 | { | ||
| 935 | "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||
| 936 | "fieldConfig": { | ||
| 937 | "defaults": { | ||
| 938 | "color": { "mode": "palette-classic" }, | ||
| 939 | "custom": { | ||
| 940 | "axisCenteredZero": false, | ||
| 941 | "axisColorMode": "text", | ||
| 942 | "axisLabel": "", | ||
| 943 | "axisPlacement": "auto", | ||
| 944 | "barAlignment": 0, | ||
| 945 | "drawStyle": "bars", | ||
| 946 | "fillOpacity": 50, | ||
| 947 | "gradientMode": "none", | ||
| 948 | "hideFrom": { "legend": false, "tooltip": false, "viz": false }, | ||
| 949 | "lineInterpolation": "linear", | ||
| 950 | "lineWidth": 1, | ||
| 951 | "pointSize": 5, | ||
| 952 | "scaleDistribution": { "type": "linear" }, | ||
| 953 | "showPoints": "never", | ||
| 954 | "spanNulls": false, | ||
| 955 | "stacking": { "group": "A", "mode": "normal" }, | ||
| 956 | "thresholdsStyle": { "mode": "off" } | ||
| 957 | }, | ||
| 958 | "mappings": [], | ||
| 959 | "thresholds": { "mode": "absolute", "steps": [] }, | ||
| 960 | "unit": "short" | ||
| 961 | } | ||
| 962 | }, | ||
| 963 | "gridPos": { "h": 8, "w": 12, "x": 12, "y": 57 }, | ||
| 964 | "id": 47, | ||
| 965 | "options": { | ||
| 966 | "legend": { "calcs": ["sum"], "displayMode": "table", "placement": "right", "showLegend": true }, | ||
| 967 | "tooltip": { "mode": "multi", "sort": "none" } | ||
| 968 | }, | ||
| 969 | "targets": [ | ||
| 970 | { | ||
| 971 | "expr": "increase(ngit_sync_gap_events_total[1h])", | ||
| 972 | "legendFormat": "{{relay}}", | ||
| 973 | "refId": "A" | ||
| 974 | } | ||
| 975 | ], | ||
| 976 | "title": "Gap Events Filled by Relay (1h)", | ||
| 977 | "type": "timeseries" | ||
| 644 | } | 978 | } |
| 645 | ], | 979 | ], |
| 646 | "refresh": "30s", | 980 | "refresh": "30s", |
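Review note on the "Relay Health Distribution" panel: its three queries (`count(ngit_sync_relay_status == 1)`, `== 2`, `== 3`) depend on the numeric encoding documented for `ngit_sync_relay_status` (1=healthy, 2=degraded, 3=dead). The sketch below illustrates that encoding and the per-state count the panel derives from it. The enum `RelayHealth` and the functions `gauge_value` and `distribution` are hypothetical names for illustration, not the contents of `src/sync/metrics.rs`.

```rust
/// The three health states named in the design notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelayHealth {
    Healthy,
    Degraded,
    Dead,
}

impl RelayHealth {
    /// Numeric value documented for the `ngit_sync_relay_status` gauge.
    fn gauge_value(self) -> i64 {
        match self {
            RelayHealth::Healthy => 1,
            RelayHealth::Degraded => 2,
            RelayHealth::Dead => 3,
        }
    }
}

/// Count relays per state, returned as [healthy, degraded, dead].
/// This mirrors what the three dashboard queries compute on the
/// Prometheus side, one query per gauge value.
fn distribution(states: &[RelayHealth]) -> [usize; 3] {
    let mut counts = [0usize; 3];
    for state in states {
        // Gauge values start at 1, array indices at 0.
        counts[(state.gauge_value() - 1) as usize] += 1;
    }
    counts
}
```

One difference worth keeping in mind when reading the panel: in PromQL, `count()` over an empty result returns no series rather than 0, so a state with no relays will likely be missing from the pie chart instead of showing as a zero-sized slice.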
diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md
index e2ec9aa..80ae45c 100644
--- a/docs/reference/configuration.md
+++ b/docs/reference/configuration.md
| @@ -265,6 +265,143 @@ NGIT_DATABASE_BACKEND=lmdb | |||
| 265 | 265 | ||
| 266 | --- | 266 | --- |
| 267 | 267 | ||
| 268 | ### Proactive Sync Configuration (GRASP-02) | ||
| 269 | |||
| 270 | These options configure the proactive sync feature that synchronizes events from other relays. | ||
| 271 | |||
| 272 | #### `NGIT_SYNC_RELAY_URL` | ||
| 273 | |||
| 274 | **Description:** URL of the primary relay to sync events from | ||
| 275 | **Type:** String (WebSocket URL) | ||
| 276 | **Default:** None (sync disabled) | ||
| 277 | **Required:** No | ||
| 278 | |||
| 279 | **Examples:** | ||
| 280 | ```bash | ||
| 281 | # Sync from a public relay | ||
| 282 | NGIT_SYNC_RELAY_URL=wss://relay.example.com | ||
| 283 | |||
| 284 | # Sync from another GRASP relay | ||
| 285 | NGIT_SYNC_RELAY_URL=wss://git.nostr.dev | ||
| 286 | |||
| 287 | # Local testing | ||
| 288 | NGIT_SYNC_RELAY_URL=ws://127.0.0.1:8081 | ||
| 289 | ``` | ||
| 290 | |||
| 291 | **Notes:** | ||
| 292 | - When set, enables proactive sync feature | ||
| 293 | - The relay will discover additional relays from repository announcements | ||
| 294 | - Synced events go through the same validation as directly-submitted events | ||
| 295 | - Use WebSocket protocol (`ws://` or `wss://`) | ||
| 296 | |||
| 297 | --- | ||
| 298 | |||
| 299 | #### `NGIT_SYNC_MAX_BACKOFF_SECS` | ||
| 300 | |||
| 301 | **Description:** Maximum backoff time in seconds for sync relay reconnection | ||
| 302 | **Type:** Integer (seconds) | ||
| 303 | **Default:** `3600` (1 hour) | ||
| 304 | **Required:** No | ||
| 305 | |||
| 306 | **Examples:** | ||
| 307 | ```bash | ||
| 308 | # Default: 1 hour max backoff | ||
| 309 | NGIT_SYNC_MAX_BACKOFF_SECS=3600 | ||
| 310 | |||
| 311 | # Aggressive: 5 minute max backoff | ||
| 312 | NGIT_SYNC_MAX_BACKOFF_SECS=300 | ||
| 313 | |||
| 314 | # Conservative: 2 hour max backoff | ||
| 315 | NGIT_SYNC_MAX_BACKOFF_SECS=7200 | ||
| 316 | ``` | ||
| 317 | |||
| 318 | **Notes:** | ||
| 319 | - Backoff starts at 5 seconds and doubles on each failure | ||
| 320 | - Capped at this maximum value | ||
| 321 | - After 24 hours of failures, relay is marked "dead" and retried daily | ||
| 322 | - Lower values mean more reconnection attempts | ||
| 323 | |||
| 324 | --- | ||
| 325 | |||
| 326 | #### `NGIT_SYNC_STARTUP_DELAY_SECS` | ||
| 327 | |||
| 328 | **Description:** Delay in seconds before running startup catchup | ||
| 329 | **Type:** Integer (seconds) | ||
| 330 | **Default:** `30` | ||
| 331 | **Required:** No | ||
| 332 | |||
| 333 | **Examples:** | ||
| 334 | ```bash | ||
| 335 | # Default: 30 second delay | ||
| 336 | NGIT_SYNC_STARTUP_DELAY_SECS=30 | ||
| 337 | |||
| 338 | # Quick startup (testing) | ||
| 339 | NGIT_SYNC_STARTUP_DELAY_SECS=5 | ||
| 340 | |||
| 341 | # Production: longer warm-up | ||
| 342 | NGIT_SYNC_STARTUP_DELAY_SECS=60 | ||
| 343 | ``` | ||
| 344 | |||
| 345 | **Notes:** | ||
| 346 | - Allows connections to stabilize before catchup | ||
| 347 | - Reduces load on remote relays at startup | ||
| 348 | - Set to 0 for immediate catchup (not recommended) | ||
| 349 | |||
| 350 | --- | ||
| 351 | |||
| 352 | #### `NGIT_SYNC_RECONNECT_DELAY_SECS` | ||
| 353 | |||
| 354 | **Description:** Delay in seconds before running catchup after reconnection | ||
| 355 | **Type:** Integer (seconds) | ||
| 356 | **Default:** `10` | ||
| 357 | **Required:** No | ||
| 358 | |||
| 359 | **Examples:** | ||
| 360 | ```bash | ||
| 361 | # Default: 10 second delay | ||
| 362 | NGIT_SYNC_RECONNECT_DELAY_SECS=10 | ||
| 363 | |||
| 364 | # Quick reconnect catchup | ||
| 365 | NGIT_SYNC_RECONNECT_DELAY_SECS=5 | ||
| 366 | |||
| 367 | # Conservative | ||
| 368 | NGIT_SYNC_RECONNECT_DELAY_SECS=30 | ||
| 369 | ``` | ||
| 370 | |||
| 371 | **Notes:** | ||
| 372 | - Prevents rate limiting from remote relays | ||
| 373 | - Applied after each successful reconnection | ||
| 374 | - Only catches up on recent events (see lookback days) | ||
| 375 | |||
| 376 | --- | ||
| 377 | |||
| 378 | #### `NGIT_SYNC_RECONNECT_LOOKBACK_DAYS` | ||
| 379 | |||
| 380 | **Description:** Number of days to look back for reconnect catchup | ||
| 381 | **Type:** Integer (days) | ||
| 382 | **Default:** `3` | ||
| 383 | **Required:** No | ||
| 384 | |||
| 385 | **Examples:** | ||
| 386 | ```bash | ||
| 387 | # Default: 3 days lookback | ||
| 388 | NGIT_SYNC_RECONNECT_LOOKBACK_DAYS=3 | ||
| 389 | |||
| 390 | # Short lookback (frequent reconnects expected) | ||
| 391 | NGIT_SYNC_RECONNECT_LOOKBACK_DAYS=1 | ||
| 392 | |||
| 393 | # Extended lookback | ||
| 394 | NGIT_SYNC_RECONNECT_LOOKBACK_DAYS=7 | ||
| 395 | ``` | ||
| 396 | |||
| 397 | **Notes:** | ||
| 398 | - Limits catchup queries to recent events only | ||
| 399 | - Reduces load compared to full historical sync | ||
| 400 | - Balance between completeness and performance | ||
| 401 | - Longer lookback useful for less reliable connections | ||
| 402 | |||
| 403 | --- | ||
| 404 | |||
| 268 | ### Logging Configuration | 405 | ### Logging Configuration |
| 269 | 406 | ||
| 270 | #### `RUST_LOG` | 407 | #### `RUST_LOG` |
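Review note on the five `NGIT_SYNC_*` options added to `docs/reference/configuration.md` in this commit: the sketch below shows how the documented defaults (3600, 30, 10, 3) and the "None means sync disabled" rule could be applied when reading the variables. It is an illustration only. The struct `SyncSettings` and the functions `parse_u64` and `load_sync_settings` are hypothetical and are not the project's `SyncConfig`; the variables are passed in as a map so the logic can be exercised without touching the process environment.

```rust
use std::collections::HashMap;

/// Hypothetical container for the documented sync options.
#[derive(Debug, PartialEq)]
struct SyncSettings {
    /// None means proactive sync is disabled.
    relay_url: Option<String>,
    max_backoff_secs: u64,
    startup_delay_secs: u64,
    reconnect_delay_secs: u64,
    reconnect_lookback_days: u64,
}

/// Read an unsigned integer option, falling back to the documented default
/// when the variable is unset.
fn parse_u64(vars: &HashMap<String, String>, key: &str, default: u64) -> Result<u64, String> {
    match vars.get(key) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("{key}: {e}")),
    }
}

/// Build the settings from a map of environment variables.
fn load_sync_settings(vars: &HashMap<String, String>) -> Result<SyncSettings, String> {
    // The docs ask for a WebSocket URL, so anything else is rejected here.
    let relay_url = match vars.get("NGIT_SYNC_RELAY_URL") {
        None => None,
        Some(url) if url.starts_with("ws://") || url.starts_with("wss://") => Some(url.clone()),
        Some(url) => {
            return Err(format!(
                "NGIT_SYNC_RELAY_URL: expected ws:// or wss://, got {url}"
            ))
        }
    };
    Ok(SyncSettings {
        relay_url,
        max_backoff_secs: parse_u64(vars, "NGIT_SYNC_MAX_BACKOFF_SECS", 3600)?,
        startup_delay_secs: parse_u64(vars, "NGIT_SYNC_STARTUP_DELAY_SECS", 30)?,
        reconnect_delay_secs: parse_u64(vars, "NGIT_SYNC_RECONNECT_DELAY_SECS", 10)?,
        reconnect_lookback_days: parse_u64(vars, "NGIT_SYNC_RECONNECT_LOOKBACK_DAYS", 3)?,
    })
}
```

To try it against the real environment, the map can be built with `std::env::vars().collect::<HashMap<String, String>>()`. Whether the actual implementation rejects non-WebSocket URLs or malformed integers at startup is not stated in the diff, so the error handling here is an assumption.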