upleb.uk

Public git repos — served from a NIP-34 GRASP relay at git.upleb.uk

author: DanConwayDev <DanConwayDev@protonmail.com> 2025-12-04 15:17:04 +0000
committer: DanConwayDev <DanConwayDev@protonmail.com> 2025-12-04 15:24:19 +0000
commit: fd0c87c787d0626b3546fa571541c9c809711821
tree: 934f20d973127f380b807d2bd44b25c197cf349c /docs/explanation/monitoring.md
parent: 762cd8e815e797f173f541795de774fbbf978fc3

add prometheus metrics

Diffstat (limited to 'docs/explanation/monitoring.md')
-rw-r--r-- docs/explanation/monitoring.md | 99
1 files changed, 99 insertions, 0 deletions
diff --git a/docs/explanation/monitoring.md b/docs/explanation/monitoring.md
new file mode 100644
index 0000000..3b1b1ac
--- /dev/null
+++ b/docs/explanation/monitoring.md
@@ -0,0 +1,99 @@
# Monitoring

ngit-grasp exposes Prometheus metrics at `/metrics` for monitoring WebSocket connections, Git operations, Nostr events, and system health.

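
As a rough illustration of what a consumer of `/metrics` sees, the sketch below parses a small payload in the Prometheus text exposition format. The metric names in `SAMPLE` are hypothetical placeholders, not the names ngit-grasp actually exports, and the parser assumes samples carry no trailing timestamp.

```python
def parse_metrics(text: str) -> dict:
    """Map each sample's series string (name plus labels) to its float value.

    Assumes no optional timestamp follows the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Hypothetical payload; real metric names may differ.
SAMPLE = """\
# HELP example_connections_total Hypothetical metric, for illustration only.
# TYPE example_connections_total gauge
example_connections_total 42
example_git_operations_total{operation="fetch"} 7
"""

metrics = parse_metrics(SAMPLE)
print(metrics["example_connections_total"])  # 42.0
```

In practice a Prometheus server does this parsing itself; a hand-rolled parser like this is mostly useful for quick smoke tests of the endpoint.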
## Architecture

```mermaid
flowchart TB
    subgraph ngit-grasp
        HTTP[HTTP Service]
        WS[WebSocket Handler]
        GIT[Git Handlers]
        RELAY[Nostr Relay]

        subgraph Metrics Module
            REG[Prometheus Registry]
            CT[ConnectionTracker]
            MC[Metric Counters]
        end

        ME["/metrics endpoint"]
    end

    subgraph External
        PROM[Prometheus Server]
        GRAF[Grafana]
        ADMIN[Admin Browser]
    end

    HTTP --> ME
    WS --> CT
    WS --> MC
    GIT --> MC
    RELAY --> MC

    CT --> REG
    MC --> REG
    REG --> ME

    PROM -->|scrape /metrics| ME
    GRAF -->|query| PROM
    ADMIN -->|view dashboards| GRAF
```

## Configuration

| Option | CLI Flag | Environment Variable | Default | Description |
|--------|----------|---------------------|---------|-------------|
| Metrics enabled | `--metrics-enabled` | `NGIT_METRICS_ENABLED` | `true` | Enable /metrics endpoint |
| Abuse threshold | `--abuse-threshold` | `NGIT_ABUSE_THRESHOLD` | `10` | Max connections per IP before flagging |
| Top N repos | `--top-n-repos` | `NGIT_TOP_N_REPOS` | `10` | Number of top bandwidth repos to track |

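
The sketch below shows one way the three options in the table could be resolved from a flag, an environment variable, and a default. It is an illustration only: the precedence (flag, then environment variable, then default), the accepted boolean spellings, and the assumption that `--metrics-enabled` takes an explicit value are not stated in this document, and `load_settings` and `parse_bool` are hypothetical helpers.

```python
import argparse


def parse_bool(raw: str) -> bool:
    """Accept a few common boolean spellings (an assumption of this sketch)."""
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def load_settings(argv, env):
    """Resolve each option as: CLI flag, else environment variable, else default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--metrics-enabled", type=parse_bool, default=None)
    parser.add_argument("--abuse-threshold", type=int, default=None)
    parser.add_argument("--top-n-repos", type=int, default=None)
    args = parser.parse_args(argv)

    def resolve(cli_value, env_name, convert, default):
        if cli_value is not None:      # flag given on the command line
            return cli_value
        if env_name in env:            # fall back to the environment
            return convert(env[env_name])
        return default                 # documented default from the table

    return {
        "metrics_enabled": resolve(args.metrics_enabled, "NGIT_METRICS_ENABLED", parse_bool, True),
        "abuse_threshold": resolve(args.abuse_threshold, "NGIT_ABUSE_THRESHOLD", int, 10),
        "top_n_repos": resolve(args.top_n_repos, "NGIT_TOP_N_REPOS", int, 10),
    }


settings = load_settings(
    ["--abuse-threshold", "25"],
    {"NGIT_ABUSE_THRESHOLD": "50", "NGIT_METRICS_ENABLED": "false"},
)
print(settings)
```

Here the flag overrides the environment for the abuse threshold, the environment disables metrics, and `top_n_repos` falls back to its default of `10`.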
## Privacy Model

IP addresses are **never exposed in Prometheus metrics**. The connection tracker maintains per-IP counts internally only for abuse detection:

| Data | Exposed in Metrics? |
|------|---------------------|
| Total connections | ✅ Yes |
| Unique IP count | ✅ Yes |
| Flagged abuser count | ✅ Yes |
| Actual IP addresses | ❌ No (internal only) |
| IP + abuse flag | ⚠️ Logs only (when flagged) |

When an IP exceeds the abuse threshold, a warning is logged but the IP is never exposed via Prometheus.

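
The privacy model above can be sketched as a tracker that keeps per-IP counts private and hands only aggregates to the metrics layer. This is a minimal illustration, not the project's `ConnectionTracker`: the method names, the log message, and the choice to log once each time a count crosses the threshold are assumptions.

```python
import logging
from collections import Counter

logger = logging.getLogger("connection_tracker")


class ConnectionTracker:
    """Hypothetical tracker: per-IP counts stay internal, only aggregates leave."""

    def __init__(self, abuse_threshold: int = 10) -> None:
        self._abuse_threshold = abuse_threshold
        self._per_ip = Counter()  # internal only, never exported

    def connect(self, ip: str) -> None:
        self._per_ip[ip] += 1
        # Warn only at the moment the count first exceeds the threshold.
        if self._per_ip[ip] == self._abuse_threshold + 1:
            logger.warning("abuse threshold exceeded: ip=%s count=%d", ip, self._per_ip[ip])

    def disconnect(self, ip: str) -> None:
        remaining = self._per_ip[ip] - 1
        if remaining > 0:
            self._per_ip[ip] = remaining
        else:
            self._per_ip.pop(ip, None)  # forget IPs with no open connections

    def snapshot(self) -> dict:
        """Aggregate values suitable for metrics; contains no IP addresses."""
        counts = self._per_ip.values()
        return {
            "total_connections": sum(counts),
            "unique_ips": len(self._per_ip),
            "flagged_abusers": sum(1 for n in counts if n > self._abuse_threshold),
        }


tracker = ConnectionTracker(abuse_threshold=2)
for ip in ["203.0.113.5"] * 3 + ["198.51.100.7"]:
    tracker.connect(ip)
print(tracker.snapshot())
```

With a threshold of 2, the third connection from `203.0.113.5` triggers the log warning, while the snapshot reports only the three aggregate numbers.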
## Deployment

See [Prometheus Setup Guide](../how-to/prometheus-setup.md) for NixOS configuration and Grafana dashboard provisioning.

## Future: Load-Based Sync Scheduling (GRASP-02)

The metrics infrastructure enables future load-based scheduling for GRASP-02 sync jobs:

```mermaid
flowchart TD
    SYNC[Sync Manager] --> CHECK{Check Load}
    CHECK --> MET[Query Metrics]
    MET --> CONN{Connections > N?}
    CONN -->|Yes| DELAY[Delay 5 min]
    CONN -->|No| RUN[Run Sync Job]
    DELAY --> CHECK
```

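
The flowchart above could translate into a loop along these lines. Since this feature is described as future work, everything here is a sketch: `run_when_idle`, its parameters, and the injected `sleep` callable are hypothetical, and the threshold `N` (here `max_connections`) is left unspecified by the document.

```python
import time

RETRY_DELAY_SECONDS = 5 * 60  # "Delay 5 min" in the flowchart


def run_when_idle(get_connections, run_sync, max_connections,
                  delay_seconds=RETRY_DELAY_SECONDS, sleep=time.sleep):
    """Re-check load until connections <= max_connections, then run the job.

    Returns the number of delays taken before the job ran.
    """
    delays = 0
    while get_connections() > max_connections:  # "Connections > N?"
        sleep(delay_seconds)                    # back off, then check again
        delays += 1
    run_sync()
    return delays


# Simulated load readings and a fake sleep, so the example runs instantly.
readings = iter([120, 80, 30])
slept = []
ran = []
delays = run_when_idle(
    get_connections=lambda: next(readings),
    run_sync=lambda: ran.append("sync"),
    max_connections=50,
    sleep=slept.append,
)
print(delays, slept)  # 2 [300, 300]
```

Passing `sleep` as a parameter keeps the loop testable without waiting; a real scheduler would more likely use an async timer and would probably cap the total wait.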
## Future: Loki for Detailed Logging

For detailed per-repository investigation at scale, consider adding **Loki** (log aggregation):

- Structured logging with the `tracing` crate is already in place
- Loki queries enable ad-hoc deep dives (e.g., find all transfers > 10 MB)
- Pairs with Prometheus for long-term trends

## Future: Sync Metrics (GRASP-02)

When GRASP-02 proactive sync is implemented, additional metrics will track:

- Events received from sync (live vs catchup)
- Active outbound relay connections
- Catchup gap (events found during catchup indicating sync failures)
\ No newline at end of file