| author | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-13 20:43:22 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-14 13:39:52 +0000 |
| commit | e3792b9abefd43b4594af2640ad4665c006fa3b0 | |
| tree | 2e669d1df3812cdc1bf47f7f4c83261cedc6e313 /docs/explanation | |
| parent | 4c8f1813fada9ce2bfd371095b0721bff68173e3 | |
docs: add defensive analysis of other relays (strfry, nostr-rs-relay, khatru)
Comprehensive research on rate limiting and defensive features across major Nostr relay implementations. Documents:
- Current state of ngit-grasp defensive features
- Detailed analysis of strfry, nostr-rs-relay, and khatru
- Concrete defaults and configuration options from each
- Rust rate limiting ecosystem (governor crate)
- Recommendations for ngit-grasp implementation
- Proposed default values and implementation phases
Diffstat (limited to 'docs/explanation')
| -rw-r--r-- | docs/explanation/defensive-analysis-of-other-relays.md | 669 |
1 files changed, 669 insertions, 0 deletions
diff --git a/docs/explanation/defensive-analysis-of-other-relays.md b/docs/explanation/defensive-analysis-of-other-relays.md
new file mode 100644
index 0000000..eb5f020
--- /dev/null
+++ b/docs/explanation/defensive-analysis-of-other-relays.md
| @@ -0,0 +1,669 @@ | |||
| 1 | # Defensive Analysis of Other Nostr Relays | ||
| 2 | |||
| 3 | **Issue:** d6ee - Defensive Relay Features | ||
| 4 | **Date:** 2026-01-13 | ||
| 5 | **Purpose:** Research findings on rate limiting and defensive features in major Nostr relay implementations to inform ngit-grasp's defensive strategy. | ||
| 6 | |||
| 7 | ## Executive Summary | ||
| 8 | |||
| 9 | This analysis examines how three major Nostr relay implementations (strfry, nostr-rs-relay, and khatru) handle rate limiting, connection management, and DoS protection. The goal is to identify industry best practices and concrete defaults to implement in ngit-grasp. | ||
| 10 | |||
| 11 | **Key Finding:** Most relays ship with very permissive defaults or no limits at all, relying on operators to configure them appropriately or to deploy an external reverse proxy. Only khatru offers opinionated, secure defaults, and only when the operator opts in via `ApplySaneDefaults`. | ||
| 12 | |||
| 13 | ## Current State of ngit-grasp | ||
| 14 | |||
| 15 | ### Existing Defensive Features ✅ | ||
| 16 | |||
| 17 | #### 1. Connection Tracking & Abuse Detection | ||
| 18 | **Location:** `src/metrics/connection.rs` | ||
| 19 | |||
| 20 | - Per-IP connection counting via `ConnectionTracker` | ||
| 21 | - Abuse threshold detection (default: 10 connections per IP) | ||
| 22 | - Privacy-preserving metrics (IPs never exposed to Prometheus) | ||
| 23 | - Tracks: total connections, unique IPs, flagged abusers | ||
| 24 | |||
| 25 | **Configuration:** | ||
| 26 | ```rust | ||
| 27 | // src/config.rs:366-372 | ||
| 28 | pub metrics_connection_per_ip_abuse_threshold: u32 = 10 | ||
| 29 | ``` | ||
| 30 | |||
| 31 | **Limitations:** | ||
| 32 | - ⚠️ **Display-only** - Detection happens but no enforcement | ||
| 33 | - ⚠️ No connection limit enforcement | ||
| 34 | - ⚠️ No per-IP subscription limits | ||
| 35 | - ⚠️ No time-based rate limits | ||
| 36 | |||
| 37 | #### 2. Git Remote Throttling (Purgatory Sync) | ||
| 38 | **Location:** `src/purgatory/sync/throttle.rs` | ||
| 39 | |||
| 40 | - Sophisticated domain-based rate limiting for outbound git fetch requests | ||
| 41 | - Per-domain concurrent request limits (default: 5) | ||
| 42 | - Per-domain rate limits (default: 30 requests/minute) | ||
| 43 | - Round-robin queue management for fairness | ||
| 44 | - Sliding window implementation | ||
| 45 | |||
| 46 | **How it works:** | ||
| 47 | ```rust | ||
| 48 | // Line 159: Default throttle manager creation | ||
| 49 | let throttle_manager = Arc::new(ThrottleManager::new(5, 30)); | ||
| 50 | |||
| 51 | // Lines 96-106: DomainThrottle tracks concurrent and rate limits | ||
| 52 | pub fn new(domain: String, max_concurrent: u32, max_per_minute: u32) | ||
| 53 | |||
| 54 | // Lines 113-129: Checks both limits before allowing requests | ||
| 55 | ``` | ||
| 56 | |||
| 57 | **Note:** Only applies to **outbound** git fetches, not incoming client connections. | ||
| 58 | |||
| 59 | #### 3. Event Blacklisting | ||
| 60 | **Location:** `src/config.rs` (lines 247-281, 658-668), `src/nostr/builder.rs` (lines 75-86, 495-505) | ||
| 61 | |||
| 62 | - Event author blacklist - Block all events from specific npubs | ||
| 63 | - Repository blacklist - Block announcements for specific repos/identifiers/npubs | ||
| 64 | - Blacklist checked FIRST in write policy (overrides everything) | ||
| 65 | |||
| 66 | **Configuration:** | ||
| 67 | ```bash | ||
| 68 | NGIT_EVENT_BLACKLIST="npub1...,npub2..." | ||
| 69 | NGIT_REPOSITORY_BLACKLIST="npub1.../identifier,identifier" | ||
| 70 | ``` | ||
| 71 | |||
| 72 | #### 4. Naughty List for Problematic Git Remotes | ||
| 73 | **Location:** `src/sync/naughty_list.rs` | ||
| 74 | |||
| 75 | - Tracks git remote domains with persistent infrastructure errors | ||
| 76 | - Classifies errors (DNS, TLS, protocol, WebSocket) | ||
| 77 | - Temporary blacklisting with expiration (default: 12 hours) | ||
| 78 | - Used to skip unreliable relays during sync | ||
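The expiry behaviour can be illustrated with a small std-only sketch. `NaughtyList`, `add`, and `is_listed` here are hypothetical names and do not reflect the actual API in `src/sync/naughty_list.rs`; error classification is omitted and the caller supplies the clock.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical expiring blacklist keyed by domain (illustration only).
pub struct NaughtyList {
    ttl: Duration,
    // Maps a domain to the instant at which its entry expires.
    entries: HashMap<String, Instant>,
}

impl NaughtyList {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Lists `domain` until `now + ttl`, replacing any earlier entry.
    pub fn add(&mut self, domain: &str, now: Instant) {
        self.entries.insert(domain.to_string(), now + self.ttl);
    }

    /// Returns true while the entry is live; expired entries are removed lazily.
    pub fn is_listed(&mut self, domain: &str, now: Instant) -> bool {
        let expired = match self.entries.get(domain) {
            Some(&expires_at) => now >= expires_at,
            None => return false,
        };
        if expired {
            self.entries.remove(domain);
        }
        !expired
    }
}
```

With a 12 hour TTL, a domain added at time `t` is skipped for any check before `t + 12h` and becomes eligible again from that instant onwards.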
| 79 | |||
| 80 | #### 5. Metrics & Monitoring | ||
| 81 | **Location:** `src/metrics/mod.rs` | ||
| 82 | |||
| 83 | - WebSocket connection metrics (total, duration, messages by type) | ||
| 84 | - Git operation tracking (clone, fetch, push by status) | ||
| 85 | - Nostr event metrics (received, stored, rejected by kind and reason) | ||
| 86 | - Sync metrics (connections, attempts, failures) | ||
| 87 | - Repository count tracking | ||
| 88 | |||
| 89 | **No enforcement capabilities** - purely observability. | ||
| 90 | |||
| 91 | ### What's Missing ❌ | ||
| 92 | |||
| 93 | 1. **WebSocket Connection Limits** - No global or per-IP enforcement | ||
| 94 | 2. **Subscription (REQ) Limits** - Clients can open unlimited REQs | ||
| 95 | 3. **Event Rate Limiting** - No per-client/per-IP limits | ||
| 96 | 4. **HTTP Endpoint Protection** - All endpoints unprotected | ||
| 97 | 5. **Message Size Limits** - No WebSocket/event size caps | ||
| 98 | 6. **Rate Limiting Crates** - No rate limiting crate among the current dependencies | ||
| 99 | |||
| 100 | ### Integration Points for Implementation | ||
| 101 | |||
| 102 | #### WebSocket Connection Accept Point | ||
| 103 | **Location:** `src/http/mod.rs:402-424` | ||
| 104 | |||
| 105 | ```rust | ||
| 106 | tokio::spawn(async move { | ||
| 107 | match hyper::upgrade::on(req).await { | ||
| 108 | Ok(upgraded) => { | ||
| 109 | // Track connection | ||
| 110 | m.connection_tracker().on_connect(addr.ip()); | ||
| 111 | // ⬅️ COULD ADD: Check connection limits here | ||
| 112 | |||
| 113 | relay.take_connection(TokioIo::new(upgraded), addr).await | ||
| 114 | |||
| 115 | m.connection_tracker().on_disconnect(addr.ip()); | ||
| 116 | } | ||
| 117 | } | ||
| 118 | }); | ||
| 119 | ``` | ||
| 120 | |||
| 121 | This is the natural place to enforce connection limits: either at the marked point, once the upgrade has completed, or earlier, before the WebSocket upgrade is accepted. | ||
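A minimal sketch of such a check, assuming a hypothetical `ConnectionLimiter` that is not part of ngit-grasp and is independent of the existing `ConnectionTracker`. In a real integration it would probably sit behind a `Mutex` or similar, and the accept path would call `try_acquire` before `take_connection` and `release` after it returns.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical limiter enforcing a global and a per-IP connection cap.
pub struct ConnectionLimiter {
    max_global: usize,
    max_per_ip: usize,
    total: usize,
    per_ip: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_global: usize, max_per_ip: usize) -> Self {
        Self { max_global, max_per_ip, total: 0, per_ip: HashMap::new() }
    }

    /// Records the connection and returns true only if both caps allow it.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let current = self.per_ip.get(&ip).copied().unwrap_or(0);
        if self.total >= self.max_global || current >= self.max_per_ip {
            return false;
        }
        self.per_ip.insert(ip, current + 1);
        self.total += 1;
        true
    }

    /// Releases one connection for `ip`; IPs with no open connection are ignored.
    pub fn release(&mut self, ip: IpAddr) {
        let remaining = match self.per_ip.get_mut(&ip) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return,
        };
        self.total -= 1;
        if remaining == 0 {
            // Drop empty entries so the map does not grow without bound.
            self.per_ip.remove(&ip);
        }
    }
}
```

Removing entries that reach zero keeps memory proportional to the number of IPs that currently hold a connection.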
| 122 | |||
| 123 | ## Analysis of Other Relays | ||
| 124 | |||
| 125 | ### 1. strfry (C++, by hoytech) | ||
| 126 | |||
| 127 | **Repository:** https://github.com/hoytech/strfry | ||
| 128 | **Stars:** 623 | **Focus:** High performance, custom LMDB schema | ||
| 129 | |||
| 130 | #### Configuration File: `strfry.conf` | ||
| 131 | |||
| 132 | ##### Event Limits | ||
| 133 | ```conf | ||
| 134 | maxEventSize = 65536 # 64 KB - Maximum normalized JSON size | ||
| 135 | maxNumTags = 2000 # Maximum number of tags allowed | ||
| 136 | maxTagValSize = 1024 # 1 KB - Maximum tag value size | ||
| 137 | rejectEventsNewerThanSeconds = 900 # 15 minutes - Reject future events | ||
| 138 | rejectEventsOlderThanSeconds = 94608000 # ~3 years - Reject old events | ||
| 139 | rejectEphemeralEventsOlderThanSeconds = 60 # 60s - Ephemeral cutoff | ||
| 140 | ephemeralEventsLifetimeSeconds = 300 # 5 minutes - Ephemeral retention | ||
| 141 | ``` | ||
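The two timestamp settings above amount to an acceptance window around the relay's clock. A minimal Rust sketch of that check follows; it is not strfry's code, `timestamp_in_window` and its parameter names are hypothetical, and treating both boundaries as inclusive is an assumption.

```rust
/// Returns true if `created_at` lies within
/// [now - older_than_secs, now + newer_than_secs] (both ends inclusive).
/// All values are Unix timestamps or durations in seconds.
pub fn timestamp_in_window(
    created_at: u64,
    now: u64,
    newer_than_secs: u64,
    older_than_secs: u64,
) -> bool {
    // Saturating arithmetic avoids overflow and underflow at the extremes.
    let latest = now.saturating_add(newer_than_secs);
    let earliest = now.saturating_sub(older_than_secs);
    created_at >= earliest && created_at <= latest
}
```

Called with `newer_than_secs = 900` and `older_than_secs = 94608000`, this accepts events dated up to 15 minutes ahead of the clock and up to roughly 3 years behind it.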
| 142 | |||
| 143 | ##### Connection & WebSocket Limits | ||
| 144 | ```conf | ||
| 145 | maxWebsocketPayloadSize = 131072 # 128 KB - Max WebSocket frame size | ||
| 146 | nofiles = 1000000 # OS-limit on max open files/sockets | ||
| 147 | autoPingSeconds = 55 # WebSocket PING frequency | ||
| 148 | ``` | ||
| 149 | |||
| 150 | ##### Query & Subscription Limits | ||
| 151 | ```conf | ||
| 152 | maxReqFilterSize = 200 # Max filters allowed in a REQ | ||
| 153 | maxSubsPerConnection = 20 # Max concurrent subscriptions per connection | ||
| 154 | maxFilterLimit = 500 # Max records returned per filter | ||
| 155 | queryTimesliceBudgetMicroseconds = 10000 # 10ms - Max CPU per query timeslice | ||
| 156 | ``` | ||
| 157 | |||
| 158 | ##### Thread Pool Configuration | ||
| 159 | ```conf | ||
| 160 | ingester = 3 # Route incoming requests, validate events/sigs | ||
| 161 | reqWorker = 3 # Handle initial DB scan for events | ||
| 162 | reqMonitor = 3 # Handle filtering of new events | ||
| 163 | negentropy = 2 # Handle negentropy protocol messages | ||
| 164 | ``` | ||
| 165 | |||
| 166 | ##### Compression | ||
| 167 | ```conf | ||
| 168 | compression: | ||
| 169 | enabled = true # permessage-deflate compression | ||
| 170 | slidingWindow = true # Maintains sliding window (better compression, more memory) | ||
| 171 | ``` | ||
| 172 | |||
| 173 | #### Implementation Approach | ||
| 174 | |||
| 175 | **Architecture Highlights:** | ||
| 176 | - **No explicit per-IP rate limiting** in config - relies on external reverse proxy or plugin system | ||
| 177 | - **Query pause/resume**: Long-running queries can be paused (stored as few hundred to few thousand bytes) and resumed when socket buffer drains | ||
| 178 | - **Query prioritization**: New queries processed before resuming queries that already ran >10ms | ||
| 179 | - **LMDB-based**: Zero-copy access from page cache, read path requires no locking | ||
| 180 | - **Batching**: Events written in batches with single fsync for efficiency | ||
| 181 | - **Plugin system**: External programs (any language) can implement write policies via line-based JSON interface | ||
| 182 | |||
| 183 | **Rate Limiting Strategy:** | ||
| 184 | - Delegates to external plugins for event acceptance policies | ||
| 185 | - Relies on reverse proxy (nginx, etc.) for connection-level rate limiting | ||
| 186 | - Focus on efficient query handling rather than built-in rate limits | ||
| 187 | |||
| 188 | **Strengths:** | ||
| 189 | - Extremely high performance | ||
| 190 | - Sophisticated query engine with pause/resume | ||
| 191 | - Flexible plugin system | ||
| 192 | |||
| 193 | **Weaknesses:** | ||
| 194 | - No built-in connection or event rate limiting | ||
| 195 | - Requires external infrastructure for DoS protection | ||
| 196 | - More complex to deploy securely | ||
| 197 | |||
| 198 | --- | ||
| 199 | |||
| 200 | ### 2. nostr-rs-relay (Rust, by scsibug/gheartsfield) | ||
| 201 | |||
| 202 | **Repository:** https://git.sr.ht/~gheartsfield/nostr-rs-relay | ||
| 203 | **Focus:** Rust implementation with SQLite or PostgreSQL backend | ||
| 204 | |||
| 205 | #### Configuration File: `config.toml` | ||
| 206 | |||
| 207 | ##### Rate Limiting | ||
| 208 | ```toml | ||
| 209 | # DEFAULT: 0 (unlimited) - Events created per second (server-wide, averaged over 1 minute) | ||
| 210 | # RECOMMENDED: Set to low value like 5 for public relays | ||
| 211 | messages_per_sec = 0 | ||
| 212 | |||
| 213 | # DEFAULT: 0 (unlimited) - Client subscriptions created (averaged over 1 minute) | ||
| 214 | # RECOMMENDED: Set to low value like 10 | ||
| 215 | subscriptions_per_min = 0 | ||
| 216 | |||
| 217 | # DEFAULT: 0 (unlimited) - Concurrent DB connections per client | ||
| 218 | db_conns_per_client = 0 | ||
| 219 | ``` | ||
| 220 | |||
| 221 | ##### Event & Message Size Limits | ||
| 222 | ```toml | ||
| 223 | max_event_bytes = 131072 # 128 KB - Maximum EVENT message size | ||
| 224 | max_ws_message_bytes = 131072 # 128 KB - Maximum WebSocket message | ||
| 225 | max_ws_frame_bytes = 131072 # 128 KB - Maximum WebSocket frame | ||
| 226 | ``` | ||
| 227 | |||
| 228 | ##### Buffering & Backpressure | ||
| 229 | ```toml | ||
| 230 | broadcast_buffer = 16384 # Buffer for subscribers (prevents slow readers consuming memory) | ||
| 231 | event_persist_buffer = 4096 # Buffer for DB commits (provides backpressure if DB writes slow) | ||
| 232 | max_blocking_threads = 16 # Limit blocking threads for DB connections | ||
| 233 | ``` | ||
| 234 | |||
| 235 | ##### Time-based Restrictions | ||
| 236 | ```toml | ||
| 237 | # Reject events with timestamps this far in future | ||
| 238 | # RECOMMENDED: 30 minutes, but defaults to allowing any date if not set | ||
| 239 | reject_future_seconds = 1800 # 30 minutes | ||
| 240 | ``` | ||
| 241 | |||
| 242 | ##### Connection Pool | ||
| 243 | ```toml | ||
| 244 | min_conn = 4 # Minimum reader connections | ||
| 245 | max_conn = 8 # Maximum reader connections (recommended: approx number of cores) | ||
| 246 | ``` | ||
| 247 | |||
| 248 | ##### WebSocket | ||
| 249 | ```toml | ||
| 250 | ping_interval = 300 # 5 minutes - WebSocket ping interval | ||
| 251 | ``` | ||
| 252 | |||
| 253 | ##### Event Kind Filtering | ||
| 254 | ```toml | ||
| 255 | # Optional - Specific event kinds to discard | ||
| 256 | event_kind_blacklist = [] | ||
| 257 | |||
| 258 | # Optional - Only accept these event kinds | ||
| 259 | event_kind_allowlist = [] | ||
| 260 | |||
| 261 | # Rejects imprecise requests (kind-only, author-only) to improve outbox model adoption | ||
| 262 | limit_scrapers = false | ||
| 263 | ``` | ||
| 264 | |||
| 265 | #### Implementation Approach | ||
| 266 | |||
| 267 | **Architecture Highlights:** | ||
| 268 | - **Tokio async runtime**: Non-blocking I/O | ||
| 269 | - **SQLite or PostgreSQL**: Configurable database backend | ||
| 270 | - **gRPC plugin support**: External authorization service via `event_admission_server` | ||
| 271 | - **Rate limiting**: Averaged over time windows (1 minute), applied server-wide | ||
| 272 | - **No per-IP limits by default**: Relies on configuration or external proxy | ||
| 273 | |||
| 274 | **Rate Limiting Strategy:** | ||
| 275 | - Provides configuration options but defaults to UNLIMITED | ||
| 276 | - Operators MUST configure limits for production use | ||
| 277 | - Time-window averaging (1 minute) for rate calculations | ||
| 278 | - Server-wide limits, not per-IP | ||
| 279 | |||
| 280 | **Strengths:** | ||
| 281 | - Well-documented configuration options | ||
| 282 | - Flexible database backends | ||
| 283 | - Buffer-based backpressure mechanism | ||
| 284 | |||
| 285 | **Weaknesses:** | ||
| 286 | - **Dangerously permissive defaults** - unlimited by default | ||
| 287 | - No per-IP rate limiting built-in | ||
| 288 | - Requires active operator configuration for security | ||
| 289 | |||
| 290 | --- | ||
| 291 | |||
| 292 | ### 3. khatru (Go framework, by fiatjaf) | ||
| 293 | |||
| 294 | **Repository:** https://github.com/fiatjaf/khatru | ||
| 295 | **Stars:** 133 | **Focus:** Framework for custom relays, not a standalone relay | ||
| 296 | |||
| 297 | #### Default Configuration (from `relay.go` and `policies/`) | ||
| 298 | |||
| 299 | ##### Built-in Defaults (NewRelay) | ||
| 300 | ```go | ||
| 301 | ReadBufferSize: 1024 // bytes | ||
| 302 | WriteBufferSize: 1024 // bytes | ||
| 303 | WriteWait: 10 * time.Second // Time allowed to write message to peer | ||
| 304 | PongWait: 60 * time.Second // Time allowed to read next pong from peer | ||
| 305 | PingPeriod: 30 * time.Second // Send pings with this period (must be < PongWait) | ||
| 306 | MaxMessageSize: 512000 // ~500 KB - Maximum message size from peer | ||
| 307 | ``` | ||
| 308 | |||
| 309 | ##### Sane Defaults Policy (`ApplySaneDefaults`) | ||
| 310 | |||
| 311 | **Event Rate Limiting:** | ||
| 312 | ```go | ||
| 313 | EventIPRateLimiter( | ||
| 314 | tokensPerInterval: 2, // events | ||
| 315 | interval: 180, // 3 minutes (180 seconds) | ||
| 316 | maxTokens: 10 // burst capacity | ||
| 317 | ) | ||
| 318 | // Effective rate: ~0.67 events/minute per IP, burst up to 10 | ||
| 319 | ``` | ||
| 320 | |||
| 321 | **Filter (REQ) Rate Limiting:** | ||
| 322 | ```go | ||
| 323 | FilterIPRateLimiter( | ||
| 324 | tokensPerInterval: 20, // requests | ||
| 325 | interval: 60, // 1 minute | ||
| 326 | maxTokens: 100 // burst capacity | ||
| 327 | ) | ||
| 328 | // Effective rate: 20 REQs/minute per IP, burst up to 100 | ||
| 329 | ``` | ||
| 330 | |||
| 331 | **Connection Rate Limiting:** | ||
| 332 | ```go | ||
| 333 | ConnectionRateLimiter( | ||
| 334 | tokensPerInterval: 1, // connection | ||
| 335 | interval: 300, // 5 minutes | ||
| 336 | maxTokens: 100 // burst capacity | ||
| 337 | ) | ||
| 338 | // Effective rate: 1 connection per 5 minutes per IP, burst up to 100 | ||
| 339 | ``` | ||
| 340 | |||
| 341 | **Event Policies:** | ||
| 342 | - `RejectEventsWithBase64Media` - Rejects events containing `data:image/` or `data:video/` | ||
| 343 | - `NoComplexFilters` - Rejects filters with >4 total items AND >2 tag filters | ||
| 344 | |||
| 345 | #### Available Rate Limiter Functions | ||
| 346 | |||
| 347 | 1. **`EventIPRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit events by IP | ||
| 348 | 2. **`EventPubKeyRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit by pubkey | ||
| 349 | 3. **`EventAuthedPubKeyRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit authenticated users | ||
| 350 | 4. **`ConnectionRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit new connections | ||
| 351 | 5. **`FilterIPRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit REQ messages | ||
| 352 | |||
| 353 | #### Other Available Policies | ||
| 354 | |||
| 355 | **Event Rejection:** | ||
| 356 | - `PreventTooManyIndexableTags(max, ignoreKinds, onlyKinds)` - Limit indexable tags | ||
| 357 | - `PreventLargeTags(maxTagValueLen)` - Reject large tag values (default: 100 bytes) | ||
| 358 | - `RestrictToSpecifiedKinds(allowEphemeral, kinds...)` - Whitelist specific kinds | ||
| 359 | - `PreventTimestampsInThePast(threshold)` - Reject old events | ||
| 360 | - `PreventTimestampsInTheFuture(threshold)` - Reject future-dated events | ||
| 361 | |||
| 362 | **Filter Policies:** | ||
| 363 | - `NoComplexFilters` - Rejects filters with more than 4 total items and more than 2 tag filters | ||
| 364 | - `NoEmptyFilters` - Require at least one filter criterion | ||
| 365 | - `AntiSyncBots` - Require author for kind:1 queries | ||
| 366 | - `NoSearchQueries` - Disable search functionality | ||
| 367 | - `MustAuth` - Require NIP-42 authentication | ||
| 368 | |||
| 369 | #### Implementation Approach | ||
| 370 | |||
| 371 | **Architecture Highlights:** | ||
| 372 | - **Token bucket algorithm**: Implemented in `startRateLimitSystem[K]` using atomic counters | ||
| 373 | - **Per-key tracking**: Uses `xsync.MapOf` for concurrent map access | ||
| 374 | - **Automatic cleanup**: Goroutine periodically decrements buckets and removes zero/negative entries | ||
| 375 | - **Framework design**: Relay operators compose policies by adding functions to hook slices | ||
| 376 | - **No global defaults enforced**: Operators must explicitly apply policies | ||
| 377 | - **Lightweight**: Pure Go, no external dependencies for rate limiting | ||
| 378 | |||
| 379 | **Rate Limiting Strategy:** | ||
| 380 | - **Most opinionated defaults** of all three relays | ||
| 381 | - Token bucket with automatic refill | ||
| 382 | - Per-IP tracking for all limits | ||
| 383 | - Composable policy system | ||
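The token bucket behaviour described above can be sketched as follows. This is a std-only Rust illustration, not khatru's Go code: `TokenBucket` and `allow` are hypothetical names, time is passed in by the caller as seconds, new keys start with a full bucket, and refill is continuous rather than driven by a periodic goroutine. The last two points are simplifying assumptions.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical per-IP token bucket using khatru-style parameters.
pub struct TokenBucket {
    tokens_per_interval: f64,
    interval_secs: f64,
    max_tokens: f64,
    // Per key: (tokens currently available, time of last refill in seconds).
    buckets: HashMap<IpAddr, (f64, f64)>,
}

impl TokenBucket {
    pub fn new(tokens_per_interval: u32, interval_secs: u32, max_tokens: u32) -> Self {
        Self {
            tokens_per_interval: f64::from(tokens_per_interval),
            interval_secs: f64::from(interval_secs),
            max_tokens: f64::from(max_tokens),
            buckets: HashMap::new(),
        }
    }

    /// Consumes one token for `ip` if available. `now_secs` must be monotonic.
    pub fn allow(&mut self, ip: IpAddr, now_secs: f64) -> bool {
        let rate = self.tokens_per_interval / self.interval_secs;
        let max = self.max_tokens;
        let entry = self.buckets.entry(ip).or_insert((max, now_secs));
        // Refill in proportion to elapsed time, capped at the burst size.
        let elapsed = (now_secs - entry.1).max(0.0);
        entry.0 = (entry.0 + elapsed * rate).min(max);
        entry.1 = now_secs;
        if entry.0 >= 1.0 {
            entry.0 -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With the event defaults listed earlier (2 tokens per 180 seconds, burst of 10), a fresh IP could send 10 events at once under this model and would then earn roughly one more every 90 seconds.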
| 384 | |||
| 385 | **Strengths:** | ||
| 386 | - **Secure by default** when using `ApplySaneDefaults` | ||
| 387 | - Very clear, composable policy API | ||
| 388 | - Lightweight token bucket implementation | ||
| 389 | - Well-suited for custom relay development | ||
| 390 | |||
| 391 | **Weaknesses:** | ||
| 392 | - Framework, not standalone relay (requires custom code) | ||
| 393 | - Aggressive defaults might be too restrictive for some use cases | ||
| 394 | - Go-based, so its code cannot be reused directly in ngit-grasp; only the design carries over | ||
| 395 | |||
| 396 | --- | ||
| 397 | |||
| 398 | ## Comparative Summary | ||
| 399 | |||
| 400 | | Feature | strfry | nostr-rs-relay | khatru (sane defaults) | | ||
| 401 | |---------|--------|----------------|------------------------| | ||
| 402 | | **Max Event Size** | 64 KB | 128 KB | 500 KB | | ||
| 403 | | **Max WS Message** | 128 KB | 128 KB | 500 KB | | ||
| 404 | | **Max Subs/Connection** | 20 | ∞ (unlimited) | ∞ (unlimited) | | ||
| 405 | | **Max Filters/REQ** | 200 | ∞ (unlimited) | Complexity-based (4 items, 2 tags) | | ||
| 406 | | **Event Rate Limit** | Plugin-based | 0 (unlimited default) | **2 per 3min per IP** | | ||
| 407 | | **REQ Rate Limit** | None built-in | 0 (unlimited default) | **20/min per IP** | | ||
| 408 | | **Connection Rate** | None built-in | None | **1 per 5min per IP** | | ||
| 409 | | **Future Event Rejection** | 15 minutes | 30 minutes | Policy-based | | ||
| 410 | | **Rate Limit Technique** | External plugins | Averaged over 1 minute | Token bucket (atomic) | | ||
| 411 | | **Backpressure** | Query pause/resume | Buffering + blocking | Framework hooks | | ||
| 412 | | **Default Philosophy** | Permissive + plugins | **Dangerously permissive** | **Conservative** | | ||
| 413 | | **Per-IP Tracking** | Metrics only | No | Yes (all limits) | | ||
| 414 | | **Production Ready** | Yes (with config) | Yes (with config) | Framework (DIY) | | ||
| 415 | |||
| 416 | ## Rust Rate Limiting Ecosystem | ||
| 417 | |||
| 418 | ### Governor Crate | ||
| 419 | |||
| 420 | **Repository:** https://github.com/boinkor-net/governor | ||
| 421 | **Documentation:** https://docs.rs/governor/ | ||
| 422 | **Version:** 0.10.4 (stable) | ||
| 423 | |||
| 424 | #### Overview | ||
| 425 | |||
| 426 | Governor is a widely used rate limiting library in the Rust ecosystem. It implements the **Generic Cell Rate Algorithm (GCRA)**, which enforces the same limits as a token bucket but is more space-efficient because it keeps a single timestamp per limiter instead of a token count plus a refill time. | ||
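To make the GCRA idea concrete, here is a minimal single-key sketch in std-only Rust. It is not governor's implementation: `Gcra`, its fields, and the caller-supplied nanosecond clock are hypothetical and chosen for illustration. The limiter tracks one value, the theoretical arrival time (TAT) of the next conforming request.

```rust
/// Hypothetical single-key GCRA limiter (illustration only).
pub struct Gcra {
    emission_interval_ns: u64, // T: time "cost" of one request
    burst_tolerance_ns: u64,   // tau = T * (burst - 1)
    tat_ns: u64,               // theoretical arrival time of the next request
}

impl Gcra {
    /// Allows `requests` per `period_ns` with up to `burst` back-to-back requests.
    /// Panics if `requests` or `burst` is zero.
    pub fn new(requests: u64, period_ns: u64, burst: u64) -> Self {
        assert!(requests > 0 && burst > 0);
        let t = period_ns / requests;
        Self {
            emission_interval_ns: t,
            burst_tolerance_ns: t * (burst - 1),
            tat_ns: 0,
        }
    }

    /// Returns true if a request arriving at `now_ns` conforms to the limit.
    pub fn check(&mut self, now_ns: u64) -> bool {
        // Non-conforming: the request arrives more than tau before the TAT.
        if now_ns.saturating_add(self.burst_tolerance_ns) < self.tat_ns {
            return false;
        }
        // Conforming: push the TAT forward by one emission interval.
        self.tat_ns = self.tat_ns.max(now_ns) + self.emission_interval_ns;
        true
    }
}
```

For 10 requests per minute with a burst of 3, `T` is 6 seconds and `tau` is 12 seconds: three requests pass immediately, the fourth is rejected, and one more is admitted for each 6 seconds that elapse.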
| 427 | |||
| 428 | #### Features | ||
| 429 | |||
| 430 | - **Thread-safe**: Uses atomic operations for lock-free operation | ||
| 431 | - **Per-key rate limiting**: Built-in support via `DefaultKeyedRateLimiter` | ||
| 432 | - **Direct rate limiting**: Single-state limiter via `DefaultDirectRateLimiter` | ||
| 433 | - **Async/await support**: Works with Tokio and other async runtimes | ||
| 434 | - **Jitter support**: Built-in jitter for avoiding thundering herd | ||
| 435 | - **Dashmap integration**: Uses `dashmap` for concurrent key-value storage | ||
| 436 | - **Quota system**: Flexible quota definitions (per second, minute, hour, etc.) | ||
| 437 | |||
| 438 | #### Example Usage | ||
| 439 | |||
| 440 | ```rust | ||
| 441 | use std::num::NonZeroU32; | ||
| 442 | use nonzero_ext::*; | ||
| 443 | use governor::{Quota, RateLimiter}; | ||
| 444 | |||
| 445 | // Simple direct rate limiter | ||
| 446 | let mut lim = RateLimiter::direct(Quota::per_second(nonzero!(50u32))); | ||
| 447 | assert_eq!(Ok(()), lim.check()); | ||
| 448 | |||
| 449 | // Keyed rate limiter (e.g., per IP) | ||
| 450 | use governor::state::{InMemoryState, keyed::DefaultKeyedRateLimiter}; | ||
| 451 | use std::net::IpAddr; | ||
| 452 | |||
| 453 | let limiter = RateLimiter::keyed(Quota::per_minute(nonzero!(10u32))); | ||
| 454 | let ip: IpAddr = "192.168.1.1".parse().unwrap(); | ||
| 455 | if limiter.check_key(&ip).is_err() { | ||
| 456 | // Rate limit exceeded for this IP | ||
| 457 | } | ||
| 458 | ``` | ||
| 459 | |||
| 460 | #### Dependencies | ||
| 461 | |||
| 462 | - `cfg-if` - Conditional compilation helper macro | ||
| 463 | - `dashmap` (optional) - Concurrent hashmap for keyed limiters | ||
| 464 | - `parking_lot` (optional) - More efficient mutexes | ||
| 465 | - `quanta` (optional) - High-resolution timing | ||
| 466 | - `portable-atomic` - Atomic operations | ||
| 467 | - `nonzero_ext` - NonZero integer utilities | ||
| 468 | |||
| 469 | #### Pros | ||
| 470 | |||
| 471 | - Industry standard, widely used | ||
| 472 | - Well-maintained and documented | ||
| 473 | - Efficient implementation (atomic operations) | ||
| 474 | - Flexible quota system | ||
| 475 | - Works with async | ||
| 476 | |||
| 477 | #### Cons | ||
| 478 | |||
| 479 | - Additional dependency (though well-vetted) | ||
| 480 | - Slightly more complex API than hand-rolled solution | ||
| 481 | - Uses more memory for keyed limiters with many keys | ||
| 482 | |||
| 483 | ### Alternative: Extend Existing ThrottleManager | ||
| 484 | |||
| 485 | ngit-grasp already has a working rate limiter in `src/purgatory/sync/throttle.rs`: | ||
| 486 | |||
| 487 | ```rust | ||
| 488 | pub struct ThrottleManager { | ||
| 489 | throttles: DashMap<String, Mutex<DomainThrottle>>, | ||
| 490 | max_concurrent_per_domain: u32, | ||
| 491 | max_per_minute_per_domain: u32, | ||
| 492 | } | ||
| 493 | ``` | ||
| 494 | |||
| 495 | **Sliding window implementation:** | ||
| 496 | ```rust | ||
| 497 | let recent_count = self.request_times | ||
| 498 | .iter() | ||
| 499 | .filter(|t| now.duration_since(**t) < window) | ||
| 500 | .count(); | ||
| 501 | recent_count < self.max_per_minute as usize | ||
| 502 | ``` | ||
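For comparison, a self-contained version of the same sliding window idea is sketched below. It is an illustration rather than the code in `throttle.rs`: `SlidingWindow` and `try_record` are hypothetical names, and it evicts expired timestamps from a `VecDeque` instead of filtering the whole collection on every check.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical sliding window limiter: at most `max_requests` per `window`.
pub struct SlidingWindow {
    window: Duration,
    max_requests: usize,
    // Timestamps of accepted requests, oldest first.
    request_times: VecDeque<Instant>,
}

impl SlidingWindow {
    pub fn new(max_requests: usize, window: Duration) -> Self {
        Self { window, max_requests, request_times: VecDeque::new() }
    }

    /// Records a request at `now` and returns true if the window has room.
    pub fn try_record(&mut self, now: Instant) -> bool {
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.request_times.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                self.request_times.pop_front();
            } else {
                break;
            }
        }
        if self.request_times.len() < self.max_requests {
            self.request_times.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Memory use grows with the limit, since one timestamp is stored per accepted request inside the window. That cost is the main efficiency gap compared with GCRA.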
| 503 | |||
| 504 | #### Pros of Reusing | ||
| 505 | |||
| 506 | - No new dependencies | ||
| 507 | - Already proven to work in production | ||
| 508 | - Team familiarity with the code | ||
| 509 | - Consistent patterns across codebase | ||
| 510 | |||
| 511 | #### Cons of Reusing | ||
| 512 | |||
| 513 | - More maintenance burden | ||
| 514 | - May not handle all edge cases | ||
| 515 | - Less efficient than GCRA algorithm | ||
| 516 | - Would need to be generalized for different use cases | ||
| 517 | |||
| 518 | ## Recommendations for ngit-grasp | ||
| 519 | |||
| 520 | ### 1. Rate Limiting Library Choice | ||
| 521 | |||
| 522 | **Recommendation: Use `governor` crate** | ||
| 523 | |||
| 524 | **Reasoning:** | ||
| 525 | - Industry standard with proven track record | ||
| 526 | - More efficient than our sliding window approach | ||
| 527 | - Handles edge cases we might miss | ||
| 528 | - Good async support for our Tokio-based architecture | ||
| 529 | - Active maintenance and community support | ||
| 530 | - Minimal overhead (atomic operations, lock-free) | ||
| 531 | |||
| 532 | ### 2. Default Philosophy | ||
| 533 | |||
| 534 | **Recommendation: Conservative defaults with clear relaxation path** | ||
| 535 | |||
| 536 | **Reasoning:** | ||
| 537 | - Following khatru's approach: secure by default | ||
| 538 | - Better to start restrictive and allow operators to relax | ||
| 539 | - Prevents "configuration debt" where operators forget to harden | ||
| 540 | - ngit-grasp is infrastructure software - security should be default | ||
| 541 | - Clear documentation on how to adjust for different use cases | ||
| 542 | |||
| 543 | ### 3. Proposed Default Values | ||
| 544 | |||
| 545 | Based on research and ngit-grasp's specific use case (git-over-nostr relay): | ||
| 546 | |||
| 547 | ```toml | ||
| 548 | # Connection Limits | ||
| 549 | NGIT_MAX_CONNECTIONS_GLOBAL = 1000 | ||
| 550 | NGIT_MAX_CONNECTIONS_PER_IP = 10 | ||
| 551 | NGIT_CONNECTION_RATE_PER_IP = "5/minute" # 5 connections per minute per IP | ||
| 552 | |||
| 553 | # Subscription (REQ) Limits | ||
| 554 | NGIT_MAX_SUBSCRIPTIONS_PER_CONNECTION = 20 | ||
| 555 | NGIT_MAX_FILTERS_PER_REQ = 100 | ||
| 556 | NGIT_SUBSCRIPTION_RATE_PER_IP = "30/minute" # 30 REQs per minute per IP | ||
| 557 | |||
| 558 | # Event Ingestion Limits | ||
| 559 | NGIT_EVENT_RATE_PER_IP = "10/minute" # 10 events per minute per IP | ||
| 560 | NGIT_EVENT_RATE_BURST = 30 # Allow burst up to 30 | ||
| 561 | NGIT_MAX_EVENT_SIZE_BYTES = 131072 # 128 KB (matches nostr-rs-relay) | ||
| 562 | NGIT_MAX_WEBSOCKET_MESSAGE_BYTES = 131072 # 128 KB | ||
| 563 | |||
| 564 | # HTTP Endpoint Protection | ||
| 565 | NGIT_HTTP_RATE_PER_IP = "60/minute" # 60 HTTP requests per minute per IP | ||
| 566 | |||
| 567 | # Time-based Event Restrictions | ||
| 568 | NGIT_REJECT_EVENTS_NEWER_THAN_SECONDS = 900 # 15 minutes (matches strfry) | ||
| 569 | NGIT_REJECT_EVENTS_OLDER_THAN_SECONDS = 94608000 # ~3 years (matches strfry) | ||
| 570 | |||
| 571 | # Whitelist | ||
| 572 | NGIT_RATE_LIMIT_WHITELIST_IPS = "" # Comma-separated IPs exempt from rate limits | ||
| 573 | ``` | ||
| 574 | |||
| 575 | **Rationale for values:** | ||
| 576 | - **Connections:** 10/IP is conservative but allows legitimate multi-client use | ||
| 577 | - **Subscriptions:** 20/connection matches strfry, reasonable for typical clients | ||
| 578 | - **Events:** 10/min is more permissive than khatru (2 per 3min) but still protective | ||
| 579 | - **Message size:** 128 KB matches industry standard (nostr-rs-relay, strfry's WS message size) | ||
| 580 | - **HTTP:** 60/min allows normal browsing without allowing scraping abuse | ||
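The rate strings above, such as `"5/minute"`, need a parser before they can feed a limiter. One possible shape is sketched here, assuming hypothetical `Rate` and `parse_rate` names and assuming that only `second`, `minute`, and `hour` are accepted units. The proposal does not specify the grammar, so both the unit set and the rejection of a zero count are assumptions.

```rust
use std::time::Duration;

/// Hypothetical parsed form of a "<count>/<unit>" rate string.
#[derive(Debug, PartialEq, Eq)]
pub struct Rate {
    pub count: u32,
    pub per: Duration,
}

/// Parses strings like "5/minute"; returns None for anything malformed.
pub fn parse_rate(s: &str) -> Option<Rate> {
    let (count, unit) = s.trim().split_once('/')?;
    let count: u32 = count.trim().parse().ok()?;
    if count == 0 {
        // A zero rate would block everything; treated as invalid here.
        return None;
    }
    let per = match unit.trim() {
        "second" => Duration::from_secs(1),
        "minute" => Duration::from_secs(60),
        "hour" => Duration::from_secs(3600),
        _ => return None,
    };
    Some(Rate { count, per })
}
```

Returning `None` on bad input lets configuration loading fail early with a clear error instead of silently running without a limit.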
| 581 | |||
| 582 | ### 4. Implementation Phases | ||
| 583 | |||
| 584 | **Phase 1: Core DoS Prevention (High Priority)** | ||
| 585 | - Connection limits (global and per-IP) | ||
| 586 | - Basic event rate limiting (per-IP) | ||
| 587 | - Message size limits | ||
| 588 | - WebSocket message limits | ||
| 589 | |||
| 590 | **Phase 2: Advanced Subscription Protection (Medium Priority)** | ||
| 591 | - Subscription limits per connection | ||
| 592 | - Filter complexity limits | ||
| 593 | - Subscription rate limiting per IP | ||
| 594 | |||
| 595 | **Phase 3: HTTP & Advanced Features (Lower Priority)** | ||
| 596 | - HTTP endpoint rate limiting | ||
| 597 | - IP whitelisting | ||
| 598 | - Fine-grained metrics for rate limit hits | ||
| 599 | - Configurable rejection messages | ||
| 600 | |||
| 601 | ### 5. Configuration Management | ||
| 602 | |||
| 603 | Following AGENTS.md requirements, ALL configuration changes must update: | ||
| 604 | |||
| 605 | 1. **`src/config.rs`** - Add fields with proper env var names and defaults | ||
| 606 | 2. **`docs/reference/configuration.md`** - Document each option with examples | ||
| 607 | 3. **`nix/module.nix`** - Add NixOS options in `instanceOptions` | ||
| 608 | 4. **`.env.example`** - Add options with comments | ||
| 609 | |||
| 610 | ### 6. Metrics & Observability | ||
| 611 | |||
| 612 | Add Prometheus metrics for: | ||
| 613 | - `ngit_rate_limit_hits_total{limit_type, reason}` - Counter of rate limit hits | ||
| 614 | - `ngit_connections_active` - Current active connections | ||
| 615 | - `ngit_connections_per_ip` - Histogram of connections per IP | ||
| 616 | - `ngit_subscriptions_active` - Current active subscriptions | ||
| 617 | - `ngit_rate_limit_whitelisted_requests_total` - Requests from whitelisted IPs | ||
| 618 | |||
| 619 | ### 7. Testing Strategy | ||
| 620 | |||
| 621 | - **Unit tests**: Test rate limiter logic in isolation | ||
| 622 | - **Integration tests**: Use `TestRelay` to verify limits enforced | ||
| 623 | - **Fuzz testing**: Random patterns to ensure no panics | ||
| 624 | - **Load testing**: Verify performance under rate-limited load | ||
| 625 | - **Metrics verification**: Ensure metrics accurately reflect limit hits | ||
| 626 | |||
| 627 | ## Common Attack Patterns | ||
| 628 | |||
| 629 | Based on production relay operator experiences: | ||
| 630 | |||
| 631 | 1. **Connection flooding** - Open thousands of connections to exhaust file descriptors | ||
| 632 | 2. **Subscription spam** - Open many REQs per connection to consume memory | ||
| 633 | 3. **Event spam** - Submit events rapidly to overwhelm storage/processing | ||
| 634 | 4. **Large message attacks** - Send huge WebSocket frames to consume bandwidth | ||
| 635 | 5. **Complex filter DoS** - Submit filters with thousands of authors/kinds to slow queries | ||
| 636 | 6. **Slow read attack** - Connect but never read, filling write buffers | ||
| 637 | 7. **Time-based attacks** - Events with extreme timestamps to bypass caching | ||
| 638 | 8. **Metrics scraping** - Hammer `/metrics` endpoint to consume CPU | ||
| 639 | |||
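| 640 | Most of these are addressed by the proposed limits. Slow read attacks (6) are the exception: none of the proposed settings bounds per-connection write buffers or send time, so they need separate handling. | ||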
| 641 | |||
| 642 | ## Open Questions | ||
| 643 | |||
| 644 | 1. **Should we implement per-pubkey rate limiting** (like khatru) in addition to per-IP? | ||
| 645 | - Useful for authenticated scenarios | ||
| 646 | - Requires NIP-42 AUTH support | ||
| 647 | - Could be Phase 4 | ||
| 648 | |||
| 649 | 2. **Should ephemeral events have different limits?** | ||
| 650 | - strfry has special handling for ephemeral events | ||
| 651 | - Consider separate retention and rate limits | ||
| 652 | |||
| 653 | 3. **Should we support dynamic limit adjustment?** | ||
| 654 | - Allow hot-reloading of limits without restart | ||
| 655 | - Useful for responding to active attacks | ||
| 656 | |||
| 657 | 4. **How should we handle IPv6?** | ||
| 658 | - Rate limit by /64 or /128? | ||
| 659 | - Per-address might be too granular for IPv6 | ||
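On the IPv6 question, one option is to normalise each address into a rate limit key before lookup. The sketch below assumes a /64 grouping and a hypothetical `rate_limit_key` helper; whether /64 is the right granularity is exactly the open question above.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Maps an address to the key used for rate limiting (hypothetical helper).
/// IPv4 addresses are kept whole; IPv6 addresses are truncated to their /64 prefix.
pub fn rate_limit_key(ip: IpAddr) -> IpAddr {
    match ip {
        IpAddr::V4(_) => ip,
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) would otherwise all share
            // one /64, so they are unwrapped to plain IPv4 first.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return IpAddr::V4(v4);
            }
            let s = v6.segments();
            IpAddr::V6(Ipv6Addr::new(s[0], s[1], s[2], s[3], 0, 0, 0, 0))
        }
    }
}
```

Keying on the prefix means a client cannot dodge a per-IP limit by rotating through addresses within its own /64, at the cost of grouping together hosts that share a prefix.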
| 660 | |||
| 661 | ## References | ||
| 662 | |||
| 663 | - strfry repository: https://github.com/hoytech/strfry | ||
| 664 | - strfry config: https://github.com/hoytech/strfry/blob/master/strfry.conf | ||
| 665 | - nostr-rs-relay repository: https://git.sr.ht/~gheartsfield/nostr-rs-relay | ||
| 666 | - khatru repository: https://github.com/fiatjaf/khatru | ||
| 667 | - khatru policies: https://github.com/fiatjaf/khatru/tree/master/policies | ||
| 668 | - governor crate: https://docs.rs/governor/ | ||
| 669 | - GCRA algorithm: https://en.wikipedia.org/wiki/Generic_cell_rate_algorithm | ||