From 5fed2e2f32cfb15fff042a39f3ac82abe8948ca0 Mon Sep 17 00:00:00 2001
From: DanConwayDev
Date: Fri, 9 Jan 2026 21:12:51 +0000
Subject: docs: integrate rejected events index into architecture documentation

- Add rejected events index to architecture.md with two-tier system explanation
- Document NGIT_REJECTED_HOT_CACHE_DURATION_SECS and NGIT_REJECTED_COLD_INDEX_EXPIRY_SECS in configuration.md
- Add comprehensive rejected events metrics section to monitoring.md with Grafana queries and alerts
- Explain negentropy integration with rejected index in grasp-02-proactive-sync.md
- Document state event authorization defense-in-depth and rejection tracking in inline-authorization.md

This integrates information from work/rejected-events-index-summary.md into the main documentation, ensuring architecture docs accurately reflect the implemented rejected events index system.
---
 docs/explanation/architecture.md            | 64 +++++++++++++++++++
 docs/explanation/grasp-02-proactive-sync.md | 35 +++++++++++
 docs/explanation/inline-authorization.md    | 93 ++++++++++++++++++++++++++++
 docs/explanation/monitoring.md              | 96 ++++++++++++++++++++++++++++-
 docs/reference/configuration.md             | 64 +++++++++++++++++++
 5 files changed, 351 insertions(+), 1 deletion(-)

(limited to 'docs')

diff --git a/docs/explanation/architecture.md b/docs/explanation/architecture.md
index 6da2295..d2a9bf7 100644
--- a/docs/explanation/architecture.md
+++ b/docs/explanation/architecture.md
@@ -239,6 +239,22 @@ pub struct RepositoryAnnouncement { ... }
 pub struct RepositoryState { ... }
 ```
 
+#### [`policy/state.rs`](src/nostr/policy/state.rs) - State Event Authorization
+
+State events undergo authorization checks at multiple points:
+
+```rust
+/// State event authorization checks:
+/// 1. Announcement must exist for the repository identifier
+/// 2. Author must be in maintainer set of accepted announcement
+/// 3. Validated on arrival, announcement acceptance, and git data arrival
+```
+
+**Defense-in-depth authorization:**
+- **On arrival** (StatePolicy): Initial authorization check
+- **On announcement acceptance**: Purgatory re-evaluation of waiting state events
+- **On git data arrival**: Final authorization before database save
+
 ### 5. Purgatory System ([`src/purgatory/`](../../src/purgatory/))
 
 The purgatory system solves the "which arrives first?" problem where either nostr events or git pushes can arrive in any order. It provides an in-memory holding area for events and git data awaiting their counterparts.
@@ -457,6 +473,7 @@ The ngit-grasp relay implements **Proactive Sync of Nostr Eevents**, which synch
 - **Health tracking** with exponential backoff for failing relays
 - **Daily sync** with random 23-25h timer to detect state drift
 - **Filter consolidation** when count exceeds 70 to prevent subscription explosion
+- **Rejected events index** - prevents wasteful re-fetching during negentropy sync
 
 **Architecture:**
 
@@ -479,6 +496,14 @@ The ngit-grasp relay implements **Proactive Sync of Nostr Eevents**, which synch
 │  │  Health Tracker  │  Exponential backoff, dead detection  │
 │  │    (DashMap)     │                                       │
 │  └──────────────────┘                                       │
+│  ┌───────────────────────────────────────────────────────┐  │
+│  │  Rejected Events Index (Two-Tier)                     │  │
+│  │  ┌────────────────┐    ┌──────────────────────┐       │  │
+│  │  │   Hot Cache    │───▶│     Cold Index       │       │  │
+│  │  │    (2 min)     │    │      (7 days)        │       │  │
+│  │  │  Full events   │    │    Metadata only     │       │  │
+│  │  └────────────────┘    └──────────────────────┘       │  │
+│  └───────────────────────────────────────────────────────┘  │
 └─────────────────────────────────────────────────────────────┘
 ```
 
@@ -486,6 +511,45 @@ The ngit-grasp relay implements **Proactive Sync of Nostr Eevents**, which synch
 
 For full design details, see [grasp-02-proactive-sync-v4.md](grasp-02-proactive-sync-v4.md).
 
+### Rejected Events Index
+
+The rejected events index solves two critical problems during sync:
+
+1. **Negentropy sync efficiency**: Prevents repeatedly downloading events that will be rejected again
+2. **Race condition resolution**: Enables immediate re-processing when event dependencies are satisfied
+
+**Two-Tier Architecture:**
+
+| Tier | Duration | Storage | Purpose |
+|------|----------|---------|---------|
+| Hot Cache | 2 minutes | Full events | Immediate re-processing when dependencies arrive |
+| Cold Index | 7 days | Metadata only | Prevent re-fetch during negentropy sync |
+
+**Event Flow:**
+
+```
+Event Rejected (e.g., maintainer before owner announcement)
+    │
+    ├──▶ Store full event in Hot Cache (2 min expiry)
+    └──▶ Store metadata in Cold Index (7 day expiry)
+
+Dependency Arrives (e.g., owner announcement accepted)
+    │
+    ├──▶ Invalidate from Cold Index
+    ├──▶ Retrieve from Hot Cache (if still available)
+    └──▶ Re-process immediately (<1 second vs 24 hours)
+
+Negentropy Sync
+    │
+    └──▶ Exclude Cold Index IDs from "missing events" calculation
+```
+
+**Tracked Events:**
+- Repository announcements (kind 30617) rejected for not listing this service or maintainer validation failure
+- State events (kind 30618) rejected for missing announcements or unauthorized authors
+
+**Source Code:** [`src/sync/rejected_index.rs`](../../src/sync/rejected_index.rs)
+
 ## Future Extensions
 
 ### GRASP-02: Proactive Sync
diff --git a/docs/explanation/grasp-02-proactive-sync.md b/docs/explanation/grasp-02-proactive-sync.md
index e983316..ed8fdbf 100644
--- a/docs/explanation/grasp-02-proactive-sync.md
+++ b/docs/explanation/grasp-02-proactive-sync.md
@@ -729,6 +729,41 @@ If negentropy fails (relay doesn't support NIP-77, network error, etc.):
 2. The sync falls back to traditional REQ+EOSE
 3. No error is raised - fallback is automatic
 
+### Integration with Rejected Events Index
+
+The rejected events index prevents wasteful re-fetching during negentropy sync by excluding rejected event IDs from the reconciliation process:
+
+**During Negentropy Reconciliation:**
+
+1. **Build "already have" set**: Combine event IDs from:
+   - Events in database
+   - Events in purgatory
+   - **Events in rejected index (hot cache + cold index)**
+
+2. **Send to negentropy**: This combined set represents "events we already have or don't want"
+
+3. **Receive differences**: Relay only sends events we don't have and haven't rejected
+
+4. **Process received events**: New events go through normal validation:
+   - If accepted → saved to database
+   - If rejected → added to rejected index
+   - If waiting for dependencies → added to purgatory
+
+**Why This Matters:**
+
+Without the rejected events index, negentropy would repeatedly download events that don't list this service or are from unauthorized maintainers, wasting bandwidth on every sync cycle.
+
+**Re-Processing on Dependency Arrival:**
+
+When a dependency is satisfied (e.g., owner announcement accepted):
+1. Related entries are **invalidated** (removed) from cold index
+2. If event still in hot cache → immediate re-processing
+3. If event expired from hot cache → will be re-fetched on next sync (now that dependency exists)
+
+This prevents permanently excluding events that could become valid after dependencies arrive.
+
+See [work/rejected-events-index-summary.md](../../work/rejected-events-index-summary.md) for complete implementation details.
+
 ---
 
 ## REQ+EOSE Pagination
diff --git a/docs/explanation/inline-authorization.md b/docs/explanation/inline-authorization.md
index a71a217..7081f63 100644
--- a/docs/explanation/inline-authorization.md
+++ b/docs/explanation/inline-authorization.md
@@ -352,6 +352,99 @@ pub async fn authorize_push(
 - If no event found, create placeholder (git-data-first scenario)
 - Collect PR events from purgatory for post-push processing
 
+## State Event Authorization
+
+State events (kind 30618) undergo authorization checks at three points (defense-in-depth):
+
+### 1. On Arrival (StatePolicy)
+
+When a state event arrives via WebSocket or sync:
+
+```rust
+// src/nostr/policy/state.rs
+impl StatePolicy {
+    async fn admit_event(&self, event: &Event) -> Result {
+        // Check 1: Does announcement exist for this repository?
+        let announcements = query_announcements(pubkey, identifier);
+        if announcements.is_empty() {
+            return Reject("No announcement exists for repository");
+        }
+
+        // Check 2: Is author in maintainer set?
+        let maintainers = build_maintainer_set(announcements);
+        if !maintainers.contains(&event.author) {
+            return Reject("Author not in maintainer set");
+        }
+
+        // If git data doesn't exist yet, goes to purgatory
+        // Otherwise, accepted to database
+    }
+}
+```
+
+### 2. On Announcement Acceptance (Purgatory Re-evaluation)
+
+When a repository announcement is accepted, waiting state events are re-evaluated:
+
+```rust
+// After announcement is saved to database
+for state_event in purgatory.get_state_events(identifier) {
+    // Re-check authorization now that announcement exists
+    if author_in_maintainer_set(state_event.author, identifier) {
+        // If git data now exists, save to database
+        // Otherwise, keep in purgatory
+    } else {
+        // Remove from purgatory - not authorized
+    }
+}
+```
+
+### 3. On Git Data Arrival (Purgatory Sync)
+
+When git data is pushed, purgatory state events are validated before saving:
+
+```rust
+// src/git/handlers.rs - after successful git push
+for state_event in purgatory.get_matching_state_events(identifier) {
+    // Final authorization check before database save
+    if author_in_maintainer_set(state_event.author, identifier) {
+        database.save(state_event);
+        purgatory.remove(state_event);
+    } else {
+        purgatory.remove(state_event); // Not authorized
+    }
+}
+```
+
+### Why Three Checkpoints?
+
+**Defense-in-depth** ensures authorization is always validated:
+
+1. **On arrival**: Prevents unauthorized events from entering the system
+2. **On announcement acceptance**: Handles race condition where state arrives before announcement
+3. **On git data arrival**: Final check before committing to database
+
+This prevents scenarios where:
+- Unauthorized state events are saved after maintainer changes
+- Race conditions bypass authorization
+- Purgatory holds events that will never be authorized
+
+### Rejection Tracking
+
+State events rejected during authorization are tracked in the rejected events index:
+
+- **Reason: MaintainerNotYetValid** - Author not in maintainer set (may become valid later)
+- **Reason: Other** - Other validation failures
+
+When a repository announcement is accepted, rejected state events for that repository are:
+1. **Invalidated** from cold index (removed from negentropy exclusion)
+2. **Retrieved** from hot cache (if still available within 2 minutes)
+3. **Re-processed** immediately with new maintainer set
+
+This enables rapid recovery from race conditions where state events arrive before maintainer announcements.
+
+See [work/rejected-events-index-summary.md](../../work/rejected-events-index-summary.md) for complete details on rejection tracking and re-processing.
+
 ---
 
 ## Comparison with Reference Implementation
diff --git a/docs/explanation/monitoring.md b/docs/explanation/monitoring.md
index 7520813..bd652be 100644
--- a/docs/explanation/monitoring.md
+++ b/docs/explanation/monitoring.md
@@ -204,4 +204,98 @@ sum(ngit_sync_relay_status == 5) # RateLimited
 4. **Restart behavior**: Conservative initial backoff (5s + jitter) avoids thundering herd on restart
 5. **Historical data**: Prometheus retains health history; in-memory state only needs current status
 
-See [GRASP-02 Proactive Sync](grasp-02-proactive-sync.md) for full architecture details.
\ No newline at end of file
+See [GRASP-02 Proactive Sync](grasp-02-proactive-sync.md) for full architecture details.
+
+## Rejected Events Index Metrics
+
+The rejected events index tracks rejected repository announcements and state events to prevent wasteful re-fetching during negentropy sync and enable race condition resolution.
+
+### Rejected Events Metrics
+
+All metrics are parameterized by `event_type` label with values "announcement" or "state":
+
+| Metric | Type | Labels | Description |
+|--------|------|--------|-------------|
+| `ngit_rejected_hot_cache_current` | Gauge | event_type | Current number of entries in hot cache |
+| `ngit_rejected_cold_index_current` | Gauge | event_type | Current number of entries in cold index |
+| `ngit_rejected_hot_cache_hits` | Counter | event_type | Events successfully retrieved from hot cache for re-processing |
+| `ngit_rejected_hot_cache_misses` | Counter | event_type | Events expired from hot cache before dependency arrived |
+| `ngit_rejected_hot_cache_expired` | Counter | event_type | Entries cleaned up from hot cache (2 min expiry) |
+| `ngit_rejected_cold_index_expired` | Counter | event_type | Entries cleaned up from cold index (7 day expiry) |
+| `ngit_rejected_invalidated` | Counter | event_type | Entries invalidated when dependency was satisfied |
+
+### Example Grafana Queries
+
+```promql
+# Hot cache efficiency - how often we successfully re-process from cache
+rate(ngit_rejected_hot_cache_hits_total[5m])
+/ (rate(ngit_rejected_hot_cache_hits_total[5m]) + rate(ngit_rejected_hot_cache_misses_total[5m]))
+
+# Current rejected events by type
+ngit_rejected_hot_cache_current{event_type="announcement"}
+ngit_rejected_hot_cache_current{event_type="state"}
+ngit_rejected_cold_index_current{event_type="announcement"}
+ngit_rejected_cold_index_current{event_type="state"}
+
+# Race condition resolution rate - invalidations indicate successful dependency arrival
+rate(ngit_rejected_invalidated_total[5m])
+
+# Cache hit ratio over time (higher is better, means dependencies arriving quickly)
+sum(rate(ngit_rejected_hot_cache_hits_total[5m]))
+/ sum(rate(ngit_rejected_hot_cache_hits_total[5m]) + rate(ngit_rejected_hot_cache_misses_total[5m]))
+```
+
+### Example Alerts
+
+```yaml
+# Alert if hot cache hit rate is too low (suggests timing issues)
+- alert: RejectedEventsCacheMissRate
+  expr: |
+    sum(rate(ngit_rejected_hot_cache_misses_total[5m]))
+    / sum(rate(ngit_rejected_hot_cache_hits_total[5m]) + rate(ngit_rejected_hot_cache_misses_total[5m]))
+    > 0.8
+  for: 15m
+  labels:
+    severity: warning
+  annotations:
+    summary: "High rejected events cache miss rate ({{ $value | humanizePercentage }})"
+    description: "Most rejected events are expiring before dependencies arrive"
+
+# Alert if cold index growing too large
+- alert: RejectedEventsColdIndexSize
+  expr: ngit_rejected_cold_index_current > 10000
+  for: 1h
+  labels:
+    severity: info
+  annotations:
+    summary: "Rejected events cold index has {{ $value }} entries"
+    description: "Consider investigating why many events are being rejected"
+```
+
+### Two-Tier Architecture
+
+**Hot Cache (2 minutes):**
+- Stores full event objects
+- Enables immediate re-processing when dependencies arrive
+- Cleaned up every 60 seconds
+- Memory: ~200 KB typical, ~20 MB worst case
+
+**Cold Index (7 days):**
+- Stores metadata only (event ID, pubkey, identifier, reason)
+- Prevents re-downloading during negentropy sync
+- Cleaned up daily
+- Memory: ~1 MB typical
+
+### Use Cases
+
+**Race Condition Resolution:**
+When a maintainer announcement arrives before the owner announcement:
+1. Maintainer event rejected → hot cache + cold index
+2. Owner announcement accepted → invalidate from cold index
+3. If still in hot cache → immediate re-processing (<1 second)
+4. If expired from hot cache → will be re-fetched on next sync
+
+**Negentropy Sync Efficiency:**
+During sync, cold index IDs are excluded from "missing events" calculation, preventing wasteful re-download of events that will be rejected again.
+
+See [work/rejected-events-index-summary.md](../../work/rejected-events-index-summary.md) for complete implementation details.
\ No newline at end of file
diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md
index ece14af..bdd832f 100644
--- a/docs/reference/configuration.md
+++ b/docs/reference/configuration.md
@@ -434,6 +434,70 @@ NGIT_SYNC_RECONNECT_LOOKBACK_DAYS=7
 
 ---
 
+### Rejected Events Index Configuration
+
+These options configure the two-tier rejected events index that prevents wasteful re-fetching during sync and enables race condition resolution.
+
+#### `NGIT_REJECTED_HOT_CACHE_DURATION_SECS`
+
+**Description:** Duration in seconds to retain full events in hot cache for immediate re-processing
+**Type:** Integer (seconds)
+**Default:** `120` (2 minutes)
+**Required:** No
+
+**Examples:**
+
+```bash
+# Default: 2 minute hot cache
+NGIT_REJECTED_HOT_CACHE_DURATION_SECS=120
+
+# Shorter window (1 minute)
+NGIT_REJECTED_HOT_CACHE_DURATION_SECS=60
+
+# Longer window (5 minutes)
+NGIT_REJECTED_HOT_CACHE_DURATION_SECS=300
+```
+
+**Notes:**
+
+- Hot cache stores full event objects for immediate re-processing when dependencies arrive
+- Events expire from hot cache after this duration and move to cold index
+- Shorter durations reduce memory usage but may miss dependency arrivals
+- Longer durations increase memory but improve race condition resolution
+- Memory impact: ~200 KB typical, ~20 MB worst case
+
+---
+
+#### `NGIT_REJECTED_COLD_INDEX_EXPIRY_SECS`
+
+**Description:** Duration in seconds to retain event metadata in cold index for negentropy sync exclusion
+**Type:** Integer (seconds)
+**Default:** `604800` (7 days)
+**Required:** No
+
+**Examples:**
+
+```bash
+# Default: 7 day cold index
+NGIT_REJECTED_COLD_INDEX_EXPIRY_SECS=604800
+
+# Shorter retention (3 days)
+NGIT_REJECTED_COLD_INDEX_EXPIRY_SECS=259200
+
+# Longer retention (14 days)
+NGIT_REJECTED_COLD_INDEX_EXPIRY_SECS=1209600
+```
+
+**Notes:**
+
+- Cold index stores only metadata (event ID, pubkey, identifier, rejection reason)
+- Prevents re-downloading rejected events during negentropy sync
+- Entries automatically cleaned up daily
+- Longer durations prevent more wasteful re-fetching but use slightly more memory
+- Memory impact: ~1 MB typical
+
+---
+
 ### Logging Configuration
 
 #### `RUST_LOG`
-- 
cgit v1.2.3
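
The two-tier behaviour documented in this patch (record on rejection, exclude cold-index IDs during negentropy, invalidate and replay when a dependency arrives) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `src/sync/rejected_index.rs`: `RejectedIndex`, `RejectedEvent`, and every method name here are hypothetical, and only the two `RejectReason` variants are taken from the docs. Timestamps are passed in explicitly so expiry can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a full Nostr event (illustration only).
#[derive(Clone, Debug, PartialEq)]
pub struct RejectedEvent {
    pub id: String,
    pub identifier: String, // repository identifier the event refers to
}

/// Rejection reasons named in the rejection-tracking docs.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RejectReason {
    MaintainerNotYetValid,
    Other,
}

/// Cold-tier entry: metadata only, no event body.
struct ColdEntry {
    identifier: String,
    reason: RejectReason,
    rejected_at: Instant,
}

/// Hypothetical two-tier index; `now` is a parameter so expiry is testable.
pub struct RejectedIndex {
    hot_ttl: Duration,
    cold_ttl: Duration,
    hot: HashMap<String, (RejectedEvent, Instant)>,
    cold: HashMap<String, ColdEntry>,
}

impl RejectedIndex {
    pub fn new(hot_ttl: Duration, cold_ttl: Duration) -> Self {
        Self { hot_ttl, cold_ttl, hot: HashMap::new(), cold: HashMap::new() }
    }

    /// On rejection: full event into the hot tier, metadata into the cold tier.
    pub fn record(&mut self, event: RejectedEvent, reason: RejectReason, now: Instant) {
        let entry = ColdEntry { identifier: event.identifier.clone(), reason, rejected_at: now };
        self.cold.insert(event.id.clone(), entry);
        self.hot.insert(event.id.clone(), (event, now));
    }

    /// Rejection reason for an ID still tracked in the cold tier.
    pub fn reason(&self, id: &str) -> Option<RejectReason> {
        self.cold.get(id).map(|entry| entry.reason)
    }

    /// IDs to treat as "already have" during negentropy reconciliation.
    pub fn excluded_ids(&self, now: Instant) -> Vec<String> {
        let mut ids: Vec<String> = self
            .cold
            .iter()
            .filter(|(_, entry)| now.duration_since(entry.rejected_at) < self.cold_ttl)
            .map(|(id, _)| id.clone())
            .collect();
        ids.sort();
        ids
    }

    /// On dependency arrival: drop cold entries for the repository and
    /// return any events still fresh in the hot tier for re-processing.
    pub fn invalidate(&mut self, identifier: &str, now: Instant) -> Vec<RejectedEvent> {
        self.cold.retain(|_, entry| entry.identifier != identifier);
        let ids: Vec<String> = self
            .hot
            .iter()
            .filter(|(_, slot)| slot.0.identifier == identifier)
            .map(|(id, _)| id.clone())
            .collect();
        let mut replay = Vec::new();
        for id in ids {
            if let Some((event, stored_at)) = self.hot.remove(&id) {
                if now.duration_since(stored_at) < self.hot_ttl {
                    replay.push(event);
                }
            }
        }
        replay
    }
}
```

With the documented defaults (`120` and `604800` seconds), an event recorded at `t0` and invalidated at `t0 + 60s` comes back for immediate re-processing, while one invalidated at `t0 + 300s` does not and would have to be re-fetched on the next sync. The sketch checks expiry lazily at read time; the docs describe periodic cleanup (every 60 seconds for the hot cache, daily for the cold index), which a real implementation would presumably run as background tasks.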