# Defensive Analysis of Other Nostr Relays

**Issue:** d6ee - Defensive Relay Features
**Date:** 2026-01-13
**Purpose:** Research findings on rate limiting and defensive features in major Nostr relay implementations to inform ngit-grasp's defensive strategy.

## Executive Summary

This analysis examines how three major Nostr relay implementations (strfry, nostr-rs-relay, and khatru) handle rate limiting, connection management, and DoS protection. The goal is to identify industry best practices and concrete defaults to implement in ngit-grasp.

**Key Finding:** Most relays have VERY permissive defaults or no limits at all. They rely on operators to configure limits or to put an external reverse proxy in front. Only khatru ships an opinionated, conservative policy set (`ApplySaneDefaults`), and even that set must be explicitly applied by the relay operator.

## Current State of ngit-grasp

### Existing Defensive Features ✅

#### 1. Connection Tracking & Abuse Detection
**Location:** `src/metrics/connection.rs`

- Per-IP connection counting via `ConnectionTracker`
- Abuse threshold detection (default: 10 connections per IP)
- Privacy-preserving metrics (IPs never exposed to Prometheus)
- Tracks: total connections, unique IPs, flagged abusers

**Configuration:**
```rust
// src/config.rs:366-372
// Field on the config struct; defaults to 10 when not set.
pub metrics_connection_per_ip_abuse_threshold: u32,
```

**Limitations:**
- ⚠️ **Display-only** - Detection happens but no enforcement
- ⚠️ No connection limit enforcement
- ⚠️ No per-IP subscription limits
- ⚠️ No time-based rate limits

#### 2. Git Remote Throttling (Purgatory Sync)
**Location:** `src/purgatory/sync/throttle.rs`

- Sophisticated domain-based rate limiting for outbound git fetch requests
- Per-domain concurrent request limits (default: 5)
- Per-domain rate limits (default: 30 requests/minute)
- Round-robin queue management for fairness
- Sliding window implementation

**How it works:**
```rust
// Line 159: default throttle manager (5 concurrent, 30 per minute per domain)
let throttle_manager = Arc::new(ThrottleManager::new(5, 30));

// Lines 96-106: DomainThrottle tracks concurrent and rate limits (signature only)
pub fn new(domain: String, max_concurrent: u32, max_per_minute: u32)

// Lines 113-129: checks both limits before allowing a request
```

**Note:** Only applies to **outbound** git fetches, not incoming client connections.

#### 3. Event Blacklisting
**Location:** `src/config.rs` (lines 247-281, 658-668), `src/nostr/builder.rs` (lines 75-86, 495-505)

- Event author blacklist - Block all events from specific npubs
- Repository blacklist - Block announcements for specific repos/identifiers/npubs
- Blacklist checked FIRST in write policy (overrides everything)

**Configuration:**
```bash
NGIT_EVENT_BLACKLIST="npub1...,npub2..."
NGIT_REPOSITORY_BLACKLIST="npub1.../identifier,identifier"
```

#### 4. Naughty List for Problematic Git Remotes
**Location:** `src/sync/naughty_list.rs`

- Tracks git remote domains with persistent infrastructure errors
- Classifies errors (DNS, TLS, protocol, WebSocket)
- Temporary blacklisting with expiration (default: 12 hours)
- Used to skip unreliable relays during sync

#### 5. Metrics & Monitoring
**Location:** `src/metrics/mod.rs`

- WebSocket connection metrics (total, duration, messages by type)
- Git operation tracking (clone, fetch, push by status)
- Nostr event metrics (received, stored, rejected by kind and reason)
- Sync metrics (connections, attempts, failures)
- Repository count tracking

**No enforcement capabilities** - purely observability.

### What's Missing ❌

1. **WebSocket Connection Limits** - No global or per-IP enforcement
2. **Subscription (REQ) Limits** - Clients can open unlimited REQs
3. **Event Rate Limiting** - No per-client/per-IP limits
4. **HTTP Endpoint Protection** - All endpoints unprotected
5. **Message Size Limits** - No WebSocket/event size caps
6. **Rate Limiting Crates** - No rate-limiting crate among the current dependencies

### Integration Points for Implementation

#### WebSocket Connection Accept Point
**Location:** `src/http/mod.rs:402-424`
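
```rust
tokio::spawn(async move {
    match hyper::upgrade::on(req).await {
        Ok(upgraded) => {
            // Track connection
            m.connection_tracker().on_connect(addr.ip());
            // ⬅️ COULD ADD: Check connection limits here

            relay.take_connection(TokioIo::new(upgraded), addr).await;

            m.connection_tracker().on_disconnect(addr.ip());
        }
        Err(_e) => {
            // Upgrade failure handling elided in this excerpt
        }
    }
});
```

This is the natural place to add connection limit enforcement. However, `hyper::upgrade::on(req)` only resolves after the `101 Switching Protocols` response has been sent. A check inside this task therefore closes a socket that has already been upgraded. To reject a client *before* the upgrade (for example with HTTP 429 or 503 instead of the 101), the check must run before the upgrade response is returned.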
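
The following sketch shows what enforcement at this point could look like. The hypothetical `ConnectionLimiter` below uses only std and is not existing ngit-grasp code. It caps connections globally and per IP, and hands out a guard whose `Drop` releases the slot, so the disconnect bookkeeping cannot be skipped on an early return. The `Mutex<HashMap>` layout is an assumption. A `std::sync::Mutex` is acceptable here because the lock is never held across an `.await`.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

/// Hypothetical limiter: caps connections globally and per IP.
pub struct ConnectionLimiter {
    max_global: usize,
    max_per_ip: usize,
    state: Mutex<State>,
}

#[derive(Default)]
struct State {
    total: usize,
    per_ip: HashMap<IpAddr, usize>,
}

/// Holds one connection slot; releases it when dropped (socket closed).
pub struct ConnectionGuard {
    limiter: Arc<ConnectionLimiter>,
    ip: IpAddr,
}

impl ConnectionLimiter {
    pub fn new(max_global: usize, max_per_ip: usize) -> Arc<Self> {
        Arc::new(Self { max_global, max_per_ip, state: Mutex::new(State::default()) })
    }

    /// Returns a guard if the connection may proceed, `None` if a limit is hit.
    pub fn try_acquire(self: &Arc<Self>, ip: IpAddr) -> Option<ConnectionGuard> {
        let mut st = self.state.lock().unwrap();
        let count = st.per_ip.get(&ip).copied().unwrap_or(0);
        if st.total >= self.max_global || count >= self.max_per_ip {
            return None;
        }
        st.total += 1;
        st.per_ip.insert(ip, count + 1);
        Some(ConnectionGuard { limiter: Arc::clone(self), ip })
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        let mut st = self.limiter.state.lock().unwrap();
        st.total -= 1;
        let now_empty = match st.per_ip.get_mut(&self.ip) {
            Some(n) => {
                *n -= 1;
                *n == 0
            }
            None => false,
        };
        if now_empty {
            // Drop empty entries so the map only holds IPs with live connections.
            st.per_ip.remove(&self.ip);
        }
    }
}
```

With the limits proposed later in this document, the accept path would call `limiter.try_acquire(addr.ip())` before sending the upgrade response, answering with an HTTP error on `None`. It would then move the guard into the spawned task, so the guard is dropped when `take_connection` returns.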

## Analysis of Other Relays

### 1. strfry (C++, by hoytech)

**Repository:** https://github.com/hoytech/strfry
**Stars:** 623 | **Focus:** High performance, custom LMDB schema

#### Configuration File: `strfry.conf`

##### Event Limits
```conf
maxEventSize = 65536 # 64 KB - Maximum normalized JSON size
maxNumTags = 2000 # Maximum number of tags allowed
maxTagValSize = 1024 # 1 KB - Maximum tag value size
rejectEventsNewerThanSeconds = 900 # 15 minutes - Reject future events
rejectEventsOlderThanSeconds = 94608000 # ~3 years - Reject old events
rejectEphemeralEventsOlderThanSeconds = 60 # 60s - Ephemeral cutoff
ephemeralEventsLifetimeSeconds = 300 # 5 minutes - Ephemeral retention
```
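
The two `rejectEvents*Seconds` settings amount to a window check on an event's `created_at` against the relay clock. Note that 94,608,000 s is exactly 3 × 365 days. Below is a minimal sketch with a hypothetical helper. It assumes Unix-second timestamps and an inclusive window; whether strfry treats the boundaries as inclusive is not verified here.

```rust
/// strfry-style defaults (seconds): 15 minutes ahead, 3 x 365 days behind.
pub const MAX_FUTURE_SECS: u64 = 900;
pub const MAX_PAST_SECS: u64 = 94_608_000;

/// Hypothetical helper: true if `created_at` lies within
/// [now - max_past, now + max_future]; all values are Unix seconds.
pub fn created_at_acceptable(created_at: u64, now: u64, max_future: u64, max_past: u64) -> bool {
    // Saturating arithmetic avoids underflow/overflow near the u64 limits.
    let earliest = now.saturating_sub(max_past);
    let latest = now.saturating_add(max_future);
    created_at >= earliest && created_at <= latest
}
```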

##### Connection & WebSocket Limits
```conf
maxWebsocketPayloadSize = 131072 # 128 KB - Max WebSocket frame size
nofiles = 1000000 # OS-limit on max open files/sockets
autoPingSeconds = 55 # WebSocket PING frequency
```

##### Query & Subscription Limits
```conf
maxReqFilterSize = 200 # Max filters allowed in a REQ
maxSubsPerConnection = 20 # Max concurrent subscriptions per connection
maxFilterLimit = 500 # Max records returned per filter
queryTimesliceBudgetMicroseconds = 10000 # 10ms - Max CPU per query timeslice
```
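
A limit like `maxSubsPerConnection` needs per-connection bookkeeping keyed by subscription id. The sketch below uses a hypothetical `Subscriptions` type with filters stubbed as strings. As a design choice of the sketch, a REQ that reuses an active id replaces that subscription instead of counting against the cap.

```rust
use std::collections::HashMap;

/// Hypothetical per-connection subscription table with a hard cap.
pub struct Subscriptions {
    max: usize,
    // subscription id -> its filters (stubbed as raw JSON strings here)
    active: HashMap<String, Vec<String>>,
}

#[derive(Debug, PartialEq)]
pub enum SubError {
    TooManySubscriptions,
}

impl Subscriptions {
    pub fn new(max: usize) -> Self {
        Self { max, active: HashMap::new() }
    }

    /// Registers a REQ. Reusing an active id replaces it and never hits the cap.
    pub fn insert(&mut self, id: &str, filters: Vec<String>) -> Result<(), SubError> {
        if !self.active.contains_key(id) && self.active.len() >= self.max {
            return Err(SubError::TooManySubscriptions);
        }
        self.active.insert(id.to_owned(), filters);
        Ok(())
    }

    /// Handles CLOSE; returns whether the id was active.
    pub fn close(&mut self, id: &str) -> bool {
        self.active.remove(id).is_some()
    }

    pub fn len(&self) -> usize {
        self.active.len()
    }
}
```

On `Err`, a relay would typically refuse the REQ with a NIP-01 `CLOSED` message for that subscription id.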

##### Thread Pool Configuration
```conf
ingester = 3 # Route incoming requests, validate events/sigs
reqWorker = 3 # Handle initial DB scan for events
reqMonitor = 3 # Handle filtering of new events
negentropy = 2 # Handle negentropy protocol messages
```

##### Compression
```conf
compression {
    enabled = true # permessage-deflate compression
    slidingWindow = true # Maintains sliding window (better compression, more memory)
}
```

#### Implementation Approach

**Architecture Highlights:**
- **No explicit per-IP rate limiting** in config - relies on an external reverse proxy or the plugin system
- **Query pause/resume**: Long-running queries can be paused (the saved state takes only a few hundred to a few thousand bytes) and resumed when the socket buffer drains
- **Query prioritization**: New queries are processed before resuming queries that have already run for more than 10ms
- **LMDB-based**: Zero-copy access from page cache, read path requires no locking
- **Batching**: Events written in batches with a single fsync for efficiency
- **Plugin system**: External programs (any language) can implement write policies via a line-based JSON interface

**Rate Limiting Strategy:**
- Delegates to external plugins for event acceptance policies
- Relies on reverse proxy (nginx, etc.) for connection-level rate limiting
- Focus on efficient query handling rather than built-in rate limits

**Strengths:**
- Extremely high performance
- Sophisticated query engine with pause/resume
- Flexible plugin system

**Weaknesses:**
- No built-in connection or event rate limiting
- Requires external infrastructure for DoS protection
- More complex to deploy securely
---

### 2. nostr-rs-relay (Rust, by scsibug/gheartsfield)

**Repository:** https://git.sr.ht/~gheartsfield/nostr-rs-relay
**Focus:** Rust implementation with SQLite or PostgreSQL backend

#### Configuration File: `config.toml`

##### Rate Limiting
```toml
# DEFAULT: 0 (unlimited) - Events created per second (server-wide, averaged over 1 minute)
# RECOMMENDED: Set to low value like 5 for public relays
messages_per_sec = 0

# DEFAULT: 0 (unlimited) - Client subscriptions created (averaged over 1 minute)
# RECOMMENDED: Set to low value like 10
subscriptions_per_min = 0

# DEFAULT: 0 (unlimited) - Concurrent DB connections per client
db_conns_per_client = 0
```

##### Event & Message Size Limits
```toml
max_event_bytes = 131072 # 128 KB - Maximum EVENT message size
max_ws_message_bytes = 131072 # 128 KB - Maximum WebSocket message
max_ws_frame_bytes = 131072 # 128 KB - Maximum WebSocket frame
```

##### Buffering & Backpressure
```toml
broadcast_buffer = 16384 # Buffer for subscribers (prevents slow readers from consuming memory)
event_persist_buffer = 4096 # Buffer for DB commits (provides backpressure if DB writes are slow)
max_blocking_threads = 16 # Limit blocking threads for DB connections
```

##### Time-based Restrictions
```toml
# Reject events with timestamps this far in the future
# RECOMMENDED: 30 minutes, but defaults to allowing any date if not set
reject_future_seconds = 1800 # 30 minutes
```

##### Connection Pool
```toml
min_conn = 4 # Minimum reader connections
max_conn = 8 # Maximum reader connections (recommended: approx number of cores)
```

##### WebSocket
```toml
ping_interval = 300 # 5 minutes - WebSocket ping interval
```

##### Event Kind Filtering
```toml
# Optional - Specific event kinds to discard
event_kind_blacklist = []

# Optional - Only accept these event kinds
event_kind_allowlist = []

# Rejects imprecise requests (kind-only, author-only) to improve outbox model adoption
limit_scrapers = false
```

#### Implementation Approach

**Architecture Highlights:**
- **Tokio async runtime**: Non-blocking I/O
- **SQLite or PostgreSQL**: Configurable database backend
- **gRPC plugin support**: External authorization service via `event_admission_server`
- **Rate limiting**: Averaged over 1-minute windows; the event rate is server-wide, while the subscription rate applies per client connection
- **No per-IP limits by default**: Relies on configuration or external proxy

**Rate Limiting Strategy:**
- Provides configuration options but defaults to UNLIMITED
- Operators MUST configure limits for production use
- Time-window averaging (1 minute) for rate calculations
- Server-wide or per-connection limits, never keyed by IP

**Strengths:**
- Well-documented configuration options
- Flexible database backends
- Buffer-based backpressure mechanism

**Weaknesses:**
- **Dangerously permissive defaults** - unlimited by default
- No per-IP rate limiting built-in
- Requires active operator configuration for security

---

### 3. khatru (Go framework, by fiatjaf)

**Repository:** https://github.com/fiatjaf/khatru
**Stars:** 133 | **Focus:** Framework for custom relays, not a standalone relay

#### Default Configuration (from `relay.go` and `policies/`)

##### Built-in Defaults (NewRelay)
```go
ReadBufferSize:  1024,             // bytes
WriteBufferSize: 1024,             // bytes
WriteWait:       10 * time.Second, // Time allowed to write message to peer
PongWait:        60 * time.Second, // Time allowed to read next pong from peer
PingPeriod:      30 * time.Second, // Send pings with this period (must be < PongWait)
MaxMessageSize:  512000,           // ~500 KB - Maximum message size from peer
```

##### Sane Defaults Policy (`ApplySaneDefaults`)

**Event Rate Limiting:**
```go
EventIPRateLimiter(
	2,             // tokensPerInterval: events
	3*time.Minute, // interval: 3 minutes (180 seconds)
	10,            // maxTokens: burst capacity
)
// Effective rate: ~0.67 events/minute per IP, burst up to 10
```

**Filter (REQ) Rate Limiting:**
```go
FilterIPRateLimiter(
	20,          // tokensPerInterval: requests
	time.Minute, // interval: 1 minute
	100,         // maxTokens: burst capacity
)
// Effective rate: 20 REQs/minute per IP, burst up to 100
```

**Connection Rate Limiting:**
```go
ConnectionRateLimiter(
	1,             // tokensPerInterval: connection
	5*time.Minute, // interval: 5 minutes
	100,           // maxTokens: burst capacity
)
// Effective rate: 1 connection per 5 minutes per IP, burst up to 100
```

**Event and Filter Policies:**
- `RejectEventsWithBase64Media` - Rejects events containing `data:image/` or `data:video/`
- `NoComplexFilters` - Rejects filters with >4 total items AND >2 tag filters
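
`NoComplexFilters` is small enough to port directly. The Rust sketch below assumes that "items" means the number of tag conditions plus the number of kinds in a single filter. The function name, the count-based signature, and the message text are hypothetical; a real port would take a parsed filter.

```rust
/// Hypothetical port of khatru's `NoComplexFilters` check.
/// `tag_filters`: number of `#<tag>` conditions in one filter.
/// `kinds`: number of entries in that filter's `kinds` array.
/// Returns `Some(reason)` when the filter should be rejected.
pub fn reject_complex_filter(tag_filters: usize, kinds: usize) -> Option<&'static str> {
    let items = tag_filters + kinds;
    // Both conditions must hold: more than 4 items AND more than 2 tag conditions.
    if items > 4 && tag_filters > 2 {
        Some("filter too complex") // illustrative message text
    } else {
        None
    }
}
```

Because both conditions must hold, a filter with many kinds but at most two tag conditions always passes.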

#### Available Rate Limiter Functions

1. **`EventIPRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit events by IP
2. **`EventPubKeyRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit by pubkey
3. **`EventAuthedPubKeyRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit authenticated users
4. **`ConnectionRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit new connections
5. **`FilterIPRateLimiter(tokensPerInterval, interval, maxTokens)`** - Rate limit REQ messages

#### Other Available Policies

**Event Rejection:**
- `PreventTooManyIndexableTags(max, ignoreKinds, onlyKinds)` - Limit indexable tags
- `PreventLargeTags(maxTagValueLen)` - Reject large tag values (default: 100 bytes)
- `RestrictToSpecifiedKinds(allowEphemeral, kinds...)` - Whitelist specific kinds
- `PreventTimestampsInThePast(threshold)` - Reject old events
- `PreventTimestampsInTheFuture(threshold)` - Reject future-dated events

**Filter Policies:**
- `NoComplexFilters` - Rejects filters with more than 4 items in total *and* more than 2 tag filters
- `NoEmptyFilters` - Require at least one filter criterion
- `AntiSyncBots` - Require author for kind:1 queries
- `NoSearchQueries` - Disable search functionality
- `MustAuth` - Require NIP-42 authentication

#### Implementation Approach

**Architecture Highlights:**
- **Token bucket algorithm**: Implemented in `startRateLimitSystem[K]` using atomic counters
- **Per-key tracking**: Uses `xsync.MapOf` for concurrent map access
- **Automatic cleanup**: Goroutine periodically decrements buckets and removes zero/negative entries
- **Framework design**: Relay operators compose policies by adding functions to hook slices
- **No global defaults enforced**: Operators must explicitly apply policies
- **Lightweight**: Pure Go; apart from `xsync` for the concurrent map, no rate-limiting library is needed
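
The bucket scheme described above can be mirrored in Rust using only std. This sketch follows the same idea but is not a translation of khatru's code. Each key records how many tokens it has *spent*, and a request is allowed while that count is below `max_tokens`. A periodic `refill`, which a relay would call from a timer task once per `interval`, refunds `tokens_per_interval` and forgets keys that reach zero. The type name is hypothetical, and a `Mutex<HashMap>` stands in for `xsync.MapOf` plus atomics.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Mutex;

/// Hypothetical khatru-style limiter: counts tokens *spent* per key.
pub struct SpentTokenLimiter<K> {
    tokens_per_interval: u32,
    max_tokens: u32,
    spent: Mutex<HashMap<K, u32>>,
}

impl<K: Hash + Eq> SpentTokenLimiter<K> {
    pub fn new(tokens_per_interval: u32, max_tokens: u32) -> Self {
        Self { tokens_per_interval, max_tokens, spent: Mutex::new(HashMap::new()) }
    }

    /// Returns true if this request for `key` is rate limited.
    pub fn is_limited(&self, key: K) -> bool {
        let mut spent = self.spent.lock().unwrap();
        let used = spent.entry(key).or_insert(0);
        if *used < self.max_tokens {
            *used += 1; // spend one token
            false
        } else {
            true
        }
    }

    /// Call once per interval: refund tokens and forget keys back at zero,
    /// so idle clients cost no memory.
    pub fn refill(&self) {
        let refund = self.tokens_per_interval;
        let mut spent = self.spent.lock().unwrap();
        spent.retain(|_, used| {
            *used = used.saturating_sub(refund);
            *used > 0
        });
    }
}
```

Consider `SpentTokenLimiter::new(2, 10)` with `refill` every three minutes. This reproduces the shape of the event-rate defaults above: a burst of 10, then about 2 events per 3 minutes per key. Refunds arrive in steps once per interval rather than continuously, which is the main behavioural difference from GCRA.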

**Rate Limiting Strategy:**
- **Most opinionated defaults** of all three relays
- Token bucket with automatic refill
- Per-IP keys for all three limiters applied by `ApplySaneDefaults` (pubkey-keyed variants are also available)
- Composable policy system

**Strengths:**
- **Secure by default** when using `ApplySaneDefaults`
- Very clear, composable policy API
- Lightweight token bucket implementation
- Well-suited for custom relay development

**Weaknesses:**
- Framework, not standalone relay (requires custom code)
- Aggressive defaults might be too restrictive for some use cases
- Go-based, so its code cannot be reused directly in ngit-grasp (Rust), although its policy design can be

---

## Comparative Summary

| Feature | strfry | nostr-rs-relay | khatru (sane defaults) |
|---------|--------|----------------|------------------------|
| **Max Event Size** | 64 KB | 128 KB | 500 KB (via max message size) |
| **Max WS Message** | 128 KB | 128 KB | 500 KB |
| **Max Subs/Connection** | 20 | ∞ (unlimited) | ∞ (unlimited) |
| **Max Filters/REQ** | 200 | ∞ (unlimited) | ∞ (unlimited); per-filter complexity check (>4 items and >2 tags) |
| **Event Rate Limit** | Plugin-based | 0 (unlimited default) | **2 per 3min per IP** |
| **REQ Rate Limit** | None built-in | 0 (unlimited default) | **20/min per IP** |
| **Connection Rate** | None built-in | None | **1 per 5min per IP** |
| **Future Event Rejection** | 15 minutes | Off by default (30 minutes recommended) | Policy-based |
| **Rate Limit Technique** | External plugins | Averaged over 1 minute | Token bucket (atomic) |
| **Backpressure** | Query pause/resume | Buffering + blocking | Framework hooks |
| **Default Philosophy** | Permissive + plugins | **Dangerously permissive** | **Conservative** |
| **Per-IP Tracking** | Metrics only | No | Yes (all limits) |
| **Production Ready** | Yes (with config) | Yes (with config) | Framework (DIY) |

## Rust Rate Limiting Ecosystem

### Governor Crate

**Repository:** https://github.com/boinkor-net/governor
**Documentation:** https://docs.rs/governor/
**Version:** 0.10.4 (stable)

#### Overview

Governor is the most popular rate limiting library in the Rust ecosystem. It implements the **Generic Cell Rate Algorithm (GCRA)**, which behaves like a token bucket but is more space-efficient. It stores a single timestamp per limiter (or per key) instead of a token count plus a last-refill time.
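
For comparison, here is a minimal single-key GCRA in std-only Rust. It is the virtual-scheduling form of the algorithm, not governor's API or code, and the names are hypothetical. Let the emission interval be T = period / rate and the burst tolerance be tau = T x (burst - 1). A request at time t conforms if t >= TAT - tau, after which the theoretical arrival time becomes TAT = max(TAT, t) + T. Time is passed in explicitly as nanoseconds to keep the example deterministic.

```rust
/// Hypothetical single-key GCRA limiter (virtual-scheduling form).
pub struct Gcra {
    emission_interval: u64, // T, in nanoseconds
    tolerance: u64,         // tau = T * (burst - 1)
    tat: Option<u64>,       // theoretical arrival time; None until first request
}

impl Gcra {
    /// `rate` requests per `period_ns`, allowing bursts of up to `burst` requests.
    pub fn new(rate: u64, period_ns: u64, burst: u64) -> Self {
        assert!(rate > 0 && burst > 0);
        let t = period_ns / rate;
        Self { emission_interval: t, tolerance: t * (burst - 1), tat: None }
    }

    /// Returns true if a request arriving at `now_ns` conforms (is allowed).
    pub fn check(&mut self, now_ns: u64) -> bool {
        let tat = self.tat.unwrap_or(now_ns);
        // Allowed iff now >= TAT - tau, rearranged to avoid u64 underflow.
        if now_ns + self.tolerance < tat {
            return false; // too early: deny without changing state
        }
        self.tat = Some(tat.max(now_ns) + self.emission_interval);
        true
    }
}
```

The whole state is one integer per key regardless of traffic. The per-request timestamp log in the existing `ThrottleManager` lacks this property.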

#### Features

- **Thread-safe**: Uses atomic operations for lock-free operation
- **Per-key rate limiting**: Built-in support via `DefaultKeyedRateLimiter`
- **Direct rate limiting**: Single-state limiter via `DefaultDirectRateLimiter`
- **Async/await support**: Works with Tokio and other async runtimes
- **Jitter support**: Built-in jitter for avoiding thundering herd
- **Dashmap integration**: Uses `dashmap` for concurrent key-value storage
- **Quota system**: Flexible quota definitions (per second, minute, hour, etc.)

#### Example Usage

```rust
use std::net::IpAddr;
use std::num::NonZeroU32;

use governor::{Quota, RateLimiter};

fn main() {
    // Simple direct (un-keyed) rate limiter: 50 cells per second
    let lim = RateLimiter::direct(Quota::per_second(NonZeroU32::new(50).unwrap()));
    assert_eq!(Ok(()), lim.check());

    // Keyed rate limiter (e.g., per IP): 10 requests per minute per key
    let limiter = RateLimiter::keyed(Quota::per_minute(NonZeroU32::new(10).unwrap()));
    let ip: IpAddr = "192.168.1.1".parse().unwrap();
    if limiter.check_key(&ip).is_err() {
        // Rate limit exceeded for this IP
    }
}
```

#### Dependencies

- `cfg-if` - Configuration
- `dashmap` (optional) - Concurrent hashmap for keyed limiters
- `parking_lot` (optional) - More efficient mutexes
- `quanta` (optional) - High-resolution timing
- `portable-atomic` - Atomic operations
- `nonzero_ext` - NonZero integer utilities

#### Pros

- Industry standard, widely used
- Well-maintained and documented
- Efficient implementation (atomic operations)
- Flexible quota system
- Works with async

#### Cons

- Additional dependency (though well-vetted)
- Slightly more complex API than a hand-rolled solution
- Uses more memory for keyed limiters with many keys

### Alternative: Extend Existing ThrottleManager

ngit-grasp already has a working rate limiter in `src/purgatory/sync/throttle.rs`:

```rust
pub struct ThrottleManager {
    throttles: DashMap<String, Mutex<DomainThrottle>>,
    max_concurrent_per_domain: u32,
    max_per_minute_per_domain: u32,
}
```

**Sliding window implementation:**
```rust
let recent_count = self.request_times
    .iter()
    .filter(|t| now.duration_since(**t) < window)
    .count();
recent_count < self.max_per_minute as usize
```

#### Pros of Reusing

- No new dependencies
- Already proven to work in production
- Team familiarity with the code
- Consistent patterns across codebase

#### Cons of Reusing

- More maintenance burden
- May not handle all edge cases
- Less efficient than GCRA: the sliding-window log stores one timestamp per request and scans them on every check, whereas GCRA keeps a single timestamp per key
- Would need to be generalized for different use cases

## Recommendations for ngit-grasp

### 1. Rate Limiting Library Choice

**Recommendation: Use the `governor` crate**

**Reasoning:**
- Industry standard with proven track record
- More efficient than our sliding window approach
- Handles edge cases we might miss
- Good async support for our Tokio-based architecture
- Active maintenance and community support
- Minimal overhead (atomic operations, lock-free)

### 2. Default Philosophy

**Recommendation: Conservative defaults with clear relaxation path**

**Reasoning:**
- Following khatru's approach: secure by default
- Better to start restrictive and allow operators to relax
- Prevents "configuration debt" where operators forget to harden
- ngit-grasp is infrastructure software - security should be default
- Clear documentation on how to adjust for different use cases

### 3. Proposed Default Values

Based on research and ngit-grasp's specific use case (git-over-nostr relay):

```toml
# Connection Limits
NGIT_MAX_CONNECTIONS_GLOBAL = 1000
NGIT_MAX_CONNECTIONS_PER_IP = 10
NGIT_CONNECTION_RATE_PER_IP = "5/minute" # 5 connections per minute per IP

# Subscription (REQ) Limits
NGIT_MAX_SUBSCRIPTIONS_PER_CONNECTION = 20
NGIT_MAX_FILTERS_PER_REQ = 100
NGIT_SUBSCRIPTION_RATE_PER_IP = "30/minute" # 30 REQs per minute per IP

# Event Ingestion Limits
NGIT_EVENT_RATE_PER_IP = "10/minute" # 10 events per minute per IP
NGIT_EVENT_RATE_BURST = 30 # Allow burst up to 30
NGIT_MAX_EVENT_SIZE_BYTES = 131072 # 128 KB (matches nostr-rs-relay)
NGIT_MAX_WEBSOCKET_MESSAGE_BYTES = 131072 # 128 KB

# HTTP Endpoint Protection
NGIT_HTTP_RATE_PER_IP = "60/minute" # 60 HTTP requests per minute per IP

# Time-based Event Restrictions
NGIT_REJECT_EVENTS_NEWER_THAN_SECONDS = 900 # 15 minutes (matches strfry)
NGIT_REJECT_EVENTS_OLDER_THAN_SECONDS = 94608000 # ~3 years (matches strfry)

# Whitelist
NGIT_RATE_LIMIT_WHITELIST_IPS = "" # Comma-separated IPs exempt from rate limits
```
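
The proposed rate settings are strings such as `"5/minute"`. Below is one possible parser for that format. The sketch assumes that only the units `second`, `minute`, and `hour` are accepted; these format details and the `parse_rate` name are hypothetical, not an existing ngit-grasp API.

```rust
use std::time::Duration;

/// Hypothetical parser for rate strings like "5/minute".
/// Returns (count, period) or a human-readable error.
pub fn parse_rate(s: &str) -> Result<(u32, Duration), String> {
    let (count, unit) = s
        .trim()
        .split_once('/')
        .ok_or_else(|| format!("expected <count>/<unit>, got {s:?}"))?;
    let count = count
        .trim()
        .parse::<u32>()
        .map_err(|e| format!("invalid count in {s:?}: {e}"))?;
    if count == 0 {
        return Err(format!("count must be non-zero in {s:?}"));
    }
    let period = match unit.trim() {
        "second" => Duration::from_secs(1),
        "minute" => Duration::from_secs(60),
        "hour" => Duration::from_secs(3600),
        other => return Err(format!("unknown unit {other:?} in {s:?}")),
    };
    Ok((count, period))
}
```

Rejecting a zero count at parse time lines up with governor's `Quota` constructors taking `NonZeroU32`. The parsed count can then be converted later without another failure path.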

**Rationale for values:**
- **Connections:** 10/IP is conservative but allows legitimate multi-client use
- **Subscriptions:** 20/connection matches strfry, reasonable for typical clients
- **Events:** 10/min is 15 times khatru's ~0.67/min (2 per 3min), so it is more permissive but still protective
- **Message size:** 128 KB matches common practice (nostr-rs-relay's event and message limits, strfry's WebSocket payload limit)
- **HTTP:** 60/min allows normal browsing without allowing scraping abuse

### 4. Implementation Phases

**Phase 1: Core DoS Prevention (High Priority)**
- Connection limits (global and per-IP)
- Basic event rate limiting (per-IP)
- Event size limits
- WebSocket message and frame size limits

**Phase 2: Advanced Subscription Protection (Medium Priority)**
- Subscription limits per connection
- Filter complexity limits
- Subscription rate limiting per IP

**Phase 3: HTTP & Advanced Features (Lower Priority)**
- HTTP endpoint rate limiting
- IP whitelisting
- Fine-grained metrics for rate limit hits
- Configurable rejection messages

### 5. Configuration Management

Following AGENTS.md requirements, ALL configuration changes must update:

1. **`src/config.rs`** - Add fields with proper env var names and defaults
2. **`docs/reference/configuration.md`** - Document each option with examples
3. **`nix/module.nix`** - Add NixOS options in `instanceOptions`
4. **`.env.example`** - Add options with comments
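
As an illustration of the `src/config.rs` step, here is a minimal std-only sketch of reading one proposed variable with its default. It assumes plain `std::env` access; ngit-grasp's actual config loader may work differently, and both helper names are hypothetical. Malformed values are reported rather than silently replaced by the default, so a typo in a limit cannot quietly disable hardening.

```rust
use std::str::FromStr;

/// Hypothetical helper: parse an optional raw value, falling back to `default`
/// only when the value is absent. Unparseable values are an error.
pub fn value_or<T: FromStr>(name: &str, raw: Option<String>, default: T) -> Result<T, String> {
    match raw {
        None => Ok(default),
        Some(v) => v
            .trim()
            .parse::<T>()
            .map_err(|_| format!("{name}: cannot parse {v:?}")),
    }
}

/// Hypothetical helper: read `name` from the process environment.
/// Note: an unset (or non-UTF-8) variable falls back to `default`.
pub fn env_or<T: FromStr>(name: &str, default: T) -> Result<T, String> {
    value_or(name, std::env::var(name).ok(), default)
}
```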

### 6. Metrics & Observability

Add Prometheus metrics for:
- `ngit_rate_limit_hits_total{limit_type, reason}` - Counter of rate limit hits
- `ngit_connections_active` - Current active connections
- `ngit_connections_per_ip` - Histogram of connections per IP
- `ngit_subscriptions_active` - Current active subscriptions
- `ngit_rate_limit_whitelisted_requests_total` - Requests from whitelisted IPs

### 7. Testing Strategy

- **Unit tests**: Test rate limiter logic in isolation
- **Integration tests**: Use `TestRelay` to verify that limits are enforced
- **Fuzz testing**: Random patterns to ensure no panics
- **Load testing**: Verify performance under rate-limited load
- **Metrics verification**: Ensure metrics accurately reflect limit hits

## Common Attack Patterns

Based on production relay operator experiences:

1. **Connection flooding** - Open thousands of connections to exhaust file descriptors
2. **Subscription spam** - Open many REQs per connection to consume memory
3. **Event spam** - Submit events rapidly to overwhelm storage/processing
4. **Large message attacks** - Send huge WebSocket frames to consume bandwidth
5. **Complex filter DoS** - Submit filters with thousands of authors/kinds to slow queries
6. **Slow read attack** - Connect but never read, filling write buffers
7. **Time-based attacks** - Events with extreme timestamps to bypass caching
8. **Metrics scraping** - Hammer `/metrics` endpoint to consume CPU

All but one of these map onto a proposed limit. The mapping is: connection limits (1), subscription limits (2), event rate limits (3), message size limits (4), filter complexity limits (5), time-based event restrictions (7), and HTTP rate limiting (8). None of the proposed settings covers the slow-read attack (6). It additionally needs bounded per-connection send buffers or write timeouts (compare nostr-rs-relay's `broadcast_buffer`).
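
For the slow-read case, a common mitigation is a bounded outbound queue per connection. Producers never block on a client that is not reading, and a full queue becomes the signal to disconnect it. The sketch below uses only std via `std::sync::mpsc::sync_channel`; a Tokio relay would use `tokio::sync::mpsc::channel` and its `try_send` the same way. The `Outbox` type, the capacity, and the disconnect policy are assumptions.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Outcome of queueing one message for a client.
#[derive(Debug, PartialEq)]
pub enum Enqueue {
    Queued,
    /// Queue full: the client is not draining its socket; disconnect it.
    SlowReader,
    /// The writer side is gone (connection already closed).
    Closed,
}

/// Hypothetical per-connection outbound queue with a fixed capacity (> 0).
pub struct Outbox {
    tx: SyncSender<String>,
}

impl Outbox {
    /// Returns the producer handle and the receiver drained by the socket writer.
    pub fn new(capacity: usize) -> (Self, Receiver<String>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx }, rx)
    }

    /// Never blocks, so a slow client cannot stall the broadcaster.
    pub fn enqueue(&self, msg: String) -> Enqueue {
        match self.tx.try_send(msg) {
            Ok(()) => Enqueue::Queued,
            Err(TrySendError::Full(_)) => Enqueue::SlowReader,
            Err(TrySendError::Disconnected(_)) => Enqueue::Closed,
        }
    }
}
```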

## Open Questions

1. **Should we implement per-pubkey rate limiting** (like khatru) in addition to per-IP?
   - Useful for authenticated scenarios
   - Requires NIP-42 AUTH support
   - Could be Phase 4

2. **Should ephemeral events have different limits?**
   - strfry has special handling for ephemeral events
   - Consider separate retention and rate limits

3. **Should we support dynamic limit adjustment?**
   - Allow hot-reloading of limits without restart
   - Useful for responding to active attacks

4. **How should we handle IPv6?**
   - Rate limit by /64 or /128?
   - Per-address might be too granular for IPv6
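
Suppose the answer is to key IPv6 clients by /64, a common choice since a subscriber or host is often given a whole /64. Deriving the key is then a simple mask. Below is a sketch with a hypothetical helper. IPv4 addresses stay whole. IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`, as seen on dual-stack sockets) are folded back to IPv4 so one client does not get two keys.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Hypothetical helper: the key used for per-IP rate limiting.
/// IPv4 stays a full address; IPv6 is truncated to its /64 prefix.
pub fn rate_limit_key(ip: IpAddr) -> IpAddr {
    match ip {
        IpAddr::V4(v4) => IpAddr::V4(v4),
        IpAddr::V6(v6) => {
            // Treat ::ffff:a.b.c.d (IPv4-mapped) as the IPv4 client it is.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return IpAddr::V4(v4);
            }
            let s = v6.segments();
            // Keep the first four 16-bit segments (64 bits), zero the rest.
            IpAddr::V6(Ipv6Addr::new(s[0], s[1], s[2], s[3], 0, 0, 0, 0))
        }
    }
}
```

A /64 key can penalise unrelated users behind the same prefix. A /128 key is exact, but it lets anyone with a routed /64 rotate through addresses freely. That trade-off is the open question above.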

## References

- strfry repository: https://github.com/hoytech/strfry
- strfry config: https://github.com/hoytech/strfry/blob/master/strfry.conf
- nostr-rs-relay repository: https://git.sr.ht/~gheartsfield/nostr-rs-relay
- khatru repository: https://github.com/fiatjaf/khatru
- khatru policies: https://github.com/fiatjaf/khatru/tree/master/policies
- governor crate: https://docs.rs/governor/
- GCRA algorithm: https://en.wikipedia.org/wiki/Generic_cell_rate_algorithm