| Age | Commit message | Author |
|
When NGIT_MAX_CONNECTIONS is unset the relay imposes no connection cap,
deferring to OS fd limits and infrastructure controls. The option remains
available for operators who want an explicit ceiling.
|
|
purgatory
|
|
01-fetch-events.sh: nak buffers output when stdout is not a TTY, causing
it to hang silently in non-interactive SSH sessions. Wrap with 'script'
to provide a pseudo-TTY, then strip the injected carriage returns and
connection banner line from the output.
40-classify-actions.sh: bash 5.3 treats ${#assoc[@]} and array iteration
as unbound variable errors under set -u when arrays are empty. Replace
${#assoc[@]} with explicit counters and guard array iterations with
set +u/set -u.
|
|
Extends purgatory persistence to include announcement purgatory entries.
On graceful shutdown, non-soft-expired announcements are serialised to
purgatory-state.json alongside state/PR/expired events; on startup they
are restored, skipping any entry whose bare repo path no longer exists.
Updates purgatory-design.md to reflect that purgatory persists through
graceful shutdown and documents the new PurgatoryState disk format.
Adds create_announcement_event helper to purgatory_helpers and three new
integration tests in purgatory_persistence covering the full save/restore
cycle, missing-repo skip, and the combined roundtrip with all entry types.
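
A minimal sketch of the save/restore cycle described above, assuming a simplified entry shape. The field names (`event_id`, `repo_path`, `soft_expired`) and the `announcements` key are hypothetical and do not reflect the actual PurgatoryState disk format.

```python
import json
import os
import tempfile


def save_announcements(entries, state_file):
    """Write only entries that have not soft-expired (hypothetical shape)."""
    live = [e for e in entries if not e["soft_expired"]]
    with open(state_file, "w", encoding="utf-8") as fh:
        json.dump({"announcements": live}, fh)


def restore_announcements(state_file):
    """Load entries, skipping any whose bare repo path no longer exists."""
    with open(state_file, encoding="utf-8") as fh:
        state = json.load(fh)
    return [
        e for e in state.get("announcements", [])
        if os.path.isdir(e["repo_path"])
    ]
```

Filtering at restore time rather than at save time means a repo deleted while the relay was stopped is still skipped on the next startup.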
|
|
Remove the pre-implementation planning docs (announcements-purgatory-design.md
and announcements-purgatory-implementation.md) now that the feature is built.
Update the three living docs to reflect what was actually implemented:
- purgatory-design.md: expanded to cover all three purgatory stores
  (announcement, state, PR), including AnnouncementPurgatoryEntry structure,
  two-phase soft expiry lifecycle, expiry extension triggers, promotion flow,
  and updated integration points and file structure
- grasp-02-proactive-sync.md: added SyncLevel enum (Full/StateOnly) to
  RepoSyncNeeds, documented the purgatory announcement sync timer as the
  registration path for purgatory announcements, updated filter building
  to describe build_sync_level_aware_filters() and StateOnly behaviour
- grasp-02-proactive-sync-purgatory-git-data.md: expanded to cover
  announcement purgatory as a third entry type, added Timeline E showing
  soft-expiry and revival, replaced the single expiry section with separate
  hard-expiry (state/PR) and two-phase soft-expiry (announcements) sections
  with full justification for the 24-hour extended retention window
|
|
now we have added announcement purgatory to the protocol spec
|
|
- Integrate sync-only-state-events decision (SyncLevel concept)
- Add the 'authorization must check purgatory' decision
- Add soft expiry design (delete repo, retain event for 24h)
- Add purgatory lifecycle diagram
- Create separate implementation details document
- Remove inline questions (now resolved)
|
|
Fixes a race condition where a user's push becomes a no-op after a state
event is applied between fetch and push. Such pushes are now accepted as
successful no-ops, matching Git's 'Everything up-to-date' behavior.
- Add early detection in get_state_authorization_for_specific_owner_repo
- Return success for all-noop pushes without requiring purgatory event
- Document behavior in inline-authorization.md
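
A minimal sketch of the all-noop check, under stated assumptions: the function name, the dict-based inputs, and the choice to treat an empty push as not a no-op are all hypothetical and are not taken from get_state_authorization_for_specific_owner_repo.

```python
def is_all_noop_push(ref_updates, current_refs):
    """Return True when every requested update already matches the ref on disk.

    ref_updates:  {ref_name: new_oid} requested by the push (hypothetical shape)
    current_refs: {ref_name: oid} currently stored in the repository
    """
    return bool(ref_updates) and all(
        current_refs.get(ref) == new_oid
        for ref, new_oid in ref_updates.items()
    )
```

A push that changes even one ref fails this check and would go through normal authorization.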
|
|
This merge includes critical bug fixes and comprehensive migration tooling
developed during the relay.ngit.dev migration effort.
Bug Fixes:
- Fix git protocol error handling to return HTTP 200 with ERR pkt-line
- Fix naughty list false positives and DNS failure identification
- Fix database query filters in load_existing_events (remove .since())
- Fix OID fetch tracking to distinguish 0 OIDs from successful fetches
- Fix purgatory event source tracking for filtered expiry logging
- Implement OID retry logic for 'not our ref' errors
Migration Tools & Documentation:
- Complete 5-phase migration analysis pipeline with orchestration script
  - Phase 1: Event fetching from source relay
  - Phase 2: Git sync verification
  - Phase 3: Categorization and relay comparison
  - Phase 4: Log extraction (parse failures, purgatory expiry)
  - Phase 5: Action classification for migration decisions
- Comprehensive migration guide with lessons learned
- Troubleshooting guide for permission and corruption issues
Configuration:
- Add NGIT_LOG_LEVEL configuration option
- Update git throttle limits to 60/minute
- Improve logging throughout for better observability
|
|
Move migration guide and scripts to docs/archive/2026-01-relay-ngit-dev-migration/
with clear warnings that these are reference-only materials from a specific
migration context, not general-purpose tools.
These materials document the relay.ngit.dev migration from ngit-relay to
ngit-grasp in January 2026. The scripts were developed iteratively during
the migration and are specific to that context. They are preserved for:
- Historical reference
- Context for production fixes in this branch
- Inspiration for future migrations (not direct reuse)
The migration uncovered critical bugs now fixed in this branch:
- Git protocol error handling
- Naughty list false positives
- Purgatory event tracking
- Sync startup issues
- Configuration management
|
|
Add git ancestry comparison (22-compare-git-data.sh) to determine
commit relationships between prod and archive repos. Repos where
archive is ahead are now correctly classified as ready-for-migration
since ngit-grasp only accepts git data authorized by state events.
Previously, repos with different git data were flagged as needs-resync
even when archive had newer/better data than prod.
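
The ancestry classification can be sketched as follows. This is an illustration only: it walks an in-memory parent map rather than real repositories, and every label except the "archive ahead means ready for migration" rule is hypothetical.

```python
def is_ancestor(parents, ancestor, descendant):
    """Walk the commit graph from `descendant` looking for `ancestor`."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def compare_tips(parents, prod_tip, archive_tip):
    """Classify the relationship between the prod and archive branch tips."""
    if prod_tip == archive_tip:
        return "identical"
    if is_ancestor(parents, prod_tip, archive_tip):
        return "archive-ahead"  # treated as ready-for-migration per this commit
    if is_ancestor(parents, archive_tip, prod_tip):
        return "prod-ahead"
    return "diverged"
```

Against real repositories the same question is answered by `git merge-base --is-ancestor <a> <b>`; whether 22-compare-git-data.sh uses that command is not stated here.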
|
|
Remove --analysis-root flag and external data file dependencies. The script
now extracts repo/npub information directly from 'Added rejected announcement'
log entries (which include pubkey and identifier fields) and uses
`nak encode npub <hex-pubkey>` to convert hex pubkeys to npub format.
This simplification was enabled by the recent logging improvement that added
pubkey to the 'Added rejected announcement' log entries.
|
|
Add proper log level configuration following standard approach:
- CLI flag: --log-level <level>
- Environment variable: NGIT_LOG_LEVEL
- Default: info
- Supports simple levels (error, warn, info, debug, trace)
- Supports filter expressions (e.g., ngit_grasp=debug,actix_web=info)
Configuration is now consistent across all four sources:
1. src/config.rs - Config struct with log_level field
2. docs/reference/configuration.md - Full documentation
3. nix/module.nix - NixOS module with logLevel option
4. .env.example - Example configuration file
This replaces the previous RUST_LOG approach with proper integration
into the ngit-grasp configuration system, enabling trace logging from
CLI, environment variables, or NixOS configuration.
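
A rough sketch of how the two value forms could be resolved and parsed. The precedence order (CLI flag, then NGIT_LOG_LEVEL, then the `info` default) is an assumption, and both function names are hypothetical; the real parsing lives in the Rust configuration code and is not shown in this log.

```python
SIMPLE_LEVELS = {"error", "warn", "info", "debug", "trace"}


def resolve_log_level(cli_value=None, env=None, default="info"):
    """Pick the first configured source (assumed order: CLI, env, default)."""
    env = env or {}
    return cli_value or env.get("NGIT_LOG_LEVEL") or default


def parse_log_filter(value):
    """Split a value into (global_level, {target: level}).

    Accepts a simple level ("debug") or a filter expression
    ("ngit_grasp=debug,actix_web=info").
    """
    global_level, targets = None, {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            target, level = part.split("=", 1)
            targets[target.strip()] = level.strip()
        else:
            global_level = part
    return global_level, targets
```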
|
|
The new script implements the redesigned classification system with:
- Tier 1: No Action Required (complete in both, deleted, empty, archive-only)
- Tier 2: Action Required (complete in prod but missing/incomplete in archive)
- Tier 3: Manual Investigation (partial/no-match in prod, archive-only anomalies)
Produces cleaner output format with actionable categories and reasons.
|
|
Phase 4 (30-extract-parse-failures.sh) now enriches parse failures with
repo name and npub by looking up event_id in announcements.json. This is
critical because 'Invalid announcement' rejections only log event_id and
kind, not the repo name or npub.
Phase 5 (40-classify-actions.sh) was also fixed to extract columns 4 and 5
(repo|npub) instead of columns 1 and 2 (event_id|kind) from parse-failures.txt.
Without this fix, action-required.txt showed unusable output like:
    000014b2... | 30617 | parse failure logged | fix event format...
Now it correctly shows:
    scripts | npub1hs5244... | parse failure logged | fix event format...
The enrichment uses jq to build a lookup table from announcements.json and
optionally uses 'nak' to convert hex pubkeys to npub format.
|
|
Filter parse failures to only those for announcements that are in
production but missing from the archive. This eliminates noise from
rejections of events from other relays that don't affect migration.
Before: 223 parse failures (all rejections from all relays)
After: 18 parse failures (only for missing announcements)
The filter works by:
1. Reading missing announcements from comparison data
2. Extracting event IDs from production announcements JSON
3. Filtering parse failures to only matching event IDs
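
The three steps above can be sketched as a single filter. The record shapes (`id`, `pubkey`, `identifier`, `event_id`) and the function name are hypothetical; the actual script works on files and its exact column layout is not reproduced here.

```python
def filter_parse_failures(failures, prod_announcements, missing):
    """Keep only parse failures for announcements missing from the archive.

    failures:           list of {"event_id": ...} records (hypothetical shape)
    prod_announcements: list of {"id", "pubkey", "identifier"} records
    missing:            set of (pubkey, identifier) pairs from comparison data
    """
    # Step 2: event IDs of production announcements that are missing.
    wanted_ids = {
        a["id"] for a in prod_announcements
        if (a["pubkey"], a["identifier"]) in missing
    }
    # Step 3: keep only the failures whose event ID is in that set.
    return [f for f in failures if f["event_id"] in wanted_ids]
```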
|
|
The script was counting the same invalid announcement twice because:
- Write policy logs use hex event IDs
- Builder logs use note1 (bech32) event IDs
- Deduplication only worked within each format
Fix: Only extract from write policy logs (hex IDs) to avoid the
format mismatch. Builder logs contain the same events, so we don't
lose any data.
Result: 446 entries → 223 unique invalid announcements (correct count)
|
|
Update parse failures script to also extract 'Invalid announcement'
rejections from logs. These are announcement events that failed
validation (e.g., multiple clone tags instead of single tag with
multiple values).
Changes:
- Search for 'Event rejected by write policy' pattern with 'Invalid announcement'
- Search for 'Rejected repository announcement' pattern from builder
- Extract event_id, kind, and reason from rejection logs
- Combine with [PARSE_FAIL] entries in output
- Deduplicate entries by event_id
- Update header to clarify both patterns are captured
- Update migration guide to document this
- Fix SIGPIPE handling in purgatory script (minor)
This captures the ~446 unique announcements rejected for NIP-34 format
violations (multiple clone tags), which were previously unexplained
in the migration analysis.
|
|
Make scripts fully automatic with no manual intervention needed.
Changes:
- Add --no-pager to journalctl commands in validate-service.sh
- Add service existence validation with helpful error messages
- Capture and report journalctl stderr for better error visibility
- Improve error handling without failing on empty logs
The main issue was missing --no-pager in validate-service.sh which
could cause scripts to hang when run non-interactively (e.g., via SSH).
Tested locally - scripts run without hanging and produce correct output.
|
|
Add validation to ensure Phase 4 scripts use ngit-grasp service
(with structured logging) instead of ngit-relay service.
Changes:
- Add validate-service.sh helper for reusable service validation
- Add validation to run-migration-analysis.sh before Phase 4
- Add validation to 30-extract-parse-failures.sh
- Add validation to 31-extract-purgatory-expiry.sh
- Update migration guide with clear warnings about service selection
- Expand troubleshooting for 'Phase 4 finds no logs' issue
- Emphasize lesson learned in relay.ngit.dev notes
This prevents the issue where Phase 4 was run against ngit-relay.service
and found no parse failures because structured logging only exists in
ngit-grasp services.
|
|
- Add Gotchas section with common issues: git installation, localhost-only
  archive relays, non-standard git paths, service name variations, and
  permission requirements
- Add relay.ngit.dev-specific migration notes with actual paths, service
  names, and analysis results (315 repos need re-sync, 382 purgatory expired)
- Enhance Running the Analysis section with path discovery guidance
- Expand Troubleshooting section with solutions for git not found, archive
  connection failures, and wrong git paths
- Add git --version to prerequisite checks
- Update examples to use realistic localhost archive URLs
|
|
- 10-check-git-sync.sh: Check for git before running
- run-migration-analysis.sh: Include git in prerequisite checks
- Fixes script failures when git is not installed
|
|
- Rename guide: migrate-ngit-relay-to-ngit-grasp.md → migrate-to-ngit-grasp.md
- Remove ngit-relay and relay.ngit.dev specific references
- Use generic terminology: source/target relay, current implementation
- Add Compatibility section explaining requirements
- Update examples to be implementation-agnostic
- Update script comments to reference GRASP relay (not ngit-relay)
- Update README.md to link to the new guide
Scripts already work with any GRASP implementation via parameters.
|
|
Transforms the guide from a technical reference into a practical
step-by-step guide with:
- Quick Start section at the top with copy-paste commands
- Prerequisites section with verification steps
- Migration Overview explaining the 3-stage process
- Running the Analysis section with all options documented
- Understanding Results section explaining output files
- Troubleshooting section for common issues
- Architecture section (moved from top) for those wanting details
- Next Steps section for post-analysis workflow
The guide now follows a practical flow: get started fast, understand
results, then dive into architecture details if needed.
|
|
Adds run-migration-analysis.sh that orchestrates all 5 phases of the
migration analysis with:
- Parameterized inputs for relay URLs, git paths, and service name
- Phase control (skip, only, from-phase options)
- Dry-run mode to preview execution
- Progress indicators and timing information
- Error handling with continue-on-error option
- Auto-detection of available features (git paths, journalctl)
- Summary display with results overview
|
|
- Combines all data sources from Phases 1-4
- Produces three actionable outputs: no-action, action-required, manual-investigation
- Generates comprehensive summary with recommendations
- Handles missing Phase 4 logs gracefully
- Classification logic for migration decision-making
|
|
- 30-extract-parse-failures.sh: Extracts parse failure events from logs
- 31-extract-purgatory-expiry.sh: Extracts purgatory expiry events from logs
- Both support time range filtering (--since, --until)
- Includes dry-run mode for testing
- Gracefully handles missing logs with dependency notes
- TSV output format for Phase 5 consumption
- Ready for when structured logging is implemented in ngit-grasp
|
|
- Compares state event refs to actual git data on disk
- Uses git show-ref to handle both loose and packed refs
- Outputs TSV format compatible with Phase 3 categorization
- Optional --categorize flag for inline categorization
- Includes progress indicators and ETA (~20 min runtime on VPS)
- Improved error handling and validation over original script
|
|
- 20-categorize.sh: Categorizes git sync status into 4 categories
- 21-compare-relays.sh: Compares prod vs archive to find gaps
- Updated how-to doc with detailed Phase 3 outputs and directory structure
- Tested with Jan 22 data: 231 complete in both, 276 complete in prod but missing from archive
|
|
- Fetches kind 30618 (state), 30617 (announcement), 5 (deletion) events
- Uses nak req --paginate for complete event retrieval
- Outputs JSONL format for downstream processing
- Includes error handling and timing information
|
|
Addresses two problems: empty bare repos misleading clients, and sync
downloading refs for deleted repos. Key design points:
- Bare repo created immediately so git pushes can succeed
- Git data arrival triggers promotion to active status
- Expiry extended in two places: state event arrival and git auth
- Indexed by (pubkey, identifier) for correct uniqueness
- Handles replacement announcements and service changes
|
|
Enables relay operators to back up or archive specific GRASP servers by domain.
Includes configuration, validation, documentation, and integration tests.
|
|
Increases connection limit across all configuration sources:
- src/config.rs: default_value_t = 4096
- docs/reference/configuration.md: updated default and examples
- nix/module.nix: maxConnections default = 4096
- .env.example: updated default and comment
This allows the relay to handle more concurrent connections and reduces
the likelihood of connection exhaustion under normal load. The previous
limit of 2000 was too conservative for production deployments.
|
|
Add comprehensive documentation explaining the defensive features
implemented in ngit-grasp. The detailed analysis of other relay
implementations is now preserved in commit history (e3792b9).
|
|
- Make RateLimit explicit in relay builder (500 subs, 60 events/min)
- Add NGIT_MAX_CONNECTIONS config option (default: 500)
- Update all 4 config locations (src, nix, docs, .env.example)
- Fix documentation error: filter limit 5000→500
- Document Phase 2 deferral decision (per-IP enforcement)
Addresses primary DoS vector (connection exhaustion) with minimal code.
Per-IP rate limiting deferred until abuse detected in production.
Related: issue ff38 (git endpoint throttling - separate concern)
|
|
Comprehensive research on rate limiting and defensive features across major Nostr relay implementations. Documents:
- Current state of ngit-grasp defensive features
- Detailed analysis of strfry, nostr-rs-relay, and khatru
- Concrete defaults and configuration options from each
- Rust rate limiting ecosystem (governor crate)
- Recommendations for ngit-grasp implementation
- Proposed default values and implementation phases
|
|
- Update default bind address in src/config.rs to 127.0.0.1:7334
- Update all four critical config sources per AGENTS.md:
  - src/config.rs (code default and tests)
  - .env.example (development template)
  - docs/reference/configuration.md (user documentation)
  - nix/module.nix (NixOS deployment)
- Update all documentation examples and references:
  - README.md (with note about phone keypad mnemonic)
  - docs/how-to/*.md (deploy, prometheus-setup, test-compliance)
  - docs/explanation/*.md (architecture, comparison)
  - docs/learnings/grasp-audit.md
Port 7334 spells NGIT on a phone keypad, making it memorable and
project-specific.
All tests pass (336 lib tests + 51 integration tests).
|
|
Adds NGIT_EVENT_BLACKLIST option for blocking all events from specific npubs,
taking precedence over all other validation to enable comprehensive moderation
without affecting curation policy.
Key features:
- Simple npub-only format: <npub>,<npub>,...
- Checked FIRST before any other validation (including repository blacklist)
- Blocks ALL event types (announcements, state events, PRs, comments, etc.)
- Events never reach relay storage or purgatory
- Specific rejection reason for operator debugging
Implementation:
- Add EventBlacklistConfig struct with check() method
- Add NGIT_EVENT_BLACKLIST config option and event_blacklist_config() method
- Add config field to PolicyContext for policy access
- Add check_event_blacklist() to Nip34WritePolicy
- Check event blacklist first in admit_event() method (before any other validation)
- 4 new unit tests covering all blacklist behavior
Configuration synced across all four sources:
- src/config.rs: Core implementation with EventBlacklistConfig
- .env.example: Comprehensive documentation with examples
- docs/reference/configuration.md: Complete reference documentation
- nix/module.nix: NixOS module option with environment mapping
README updates:
- Add comprehensive "Curation & Moderation" section
- Document repository whitelists (GRASP-01 and GRASP-05 modes)
- Document repository and event blacklists with precedence order
- Add configuration table for all curation/moderation settings
- Provide real-world examples for different relay configurations
Testing:
- 4 new tests for event blacklist functionality
- All 336 library tests passing
- All 64 integration tests passing
- All 38 filter support tests passing
Verification:
- Repository blacklist confirmed to apply to sync (uses same admit_event flow)
- Sync events validated through process_event_static -> write_policy.admit_event
Use cases:
- Block spam/abusive users completely
- Prevent malicious actors from submitting any events
- Temporary blocks for investigation
- Moderation without affecting whitelist curation policy
|
|
Adds NGIT_REPOSITORY_BLACKLIST option for blocking repositories, taking precedence
over all whitelists (archive and repository) to enable moderation without affecting
curation policy.
Key features:
- Three blacklist formats: <npub>, <npub>/<identifier>, <identifier>
- Blacklist checked first before any other validation
- Overrides archive whitelist and repository whitelist
- Specific rejection reasons based on match type (npub/identifier/both)
- Not flagged in NIP-11 curation (operational, not policy)
Implementation:
- Add BlacklistConfig struct with check() method returning detailed reasons
- Add NGIT_REPOSITORY_BLACKLIST config option and blacklist_config() method
- Update validate_announcement() to check blacklist first with specific reasons
- 12 new unit tests covering all blacklist behavior and precedence
Configuration synced across all four sources:
- src/config.rs: Core implementation with BlacklistConfig
- .env.example: Comprehensive documentation with examples
- docs/reference/configuration.md: Complete reference documentation
- nix/module.nix: NixOS module option with environment mapping
Testing:
- 12 new tests for blacklist functionality (config + validation)
- All 332 library tests passing
- All 38 integration tests passing
Use cases:
- Block spam/malware repos by identifier
- Block abusive users by npub
- Block specific problematic repos by npub/identifier
- Temporary blocks for investigation
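
A minimal sketch of matching against the three blacklist formats. It assumes an entry is an npub when it starts with `npub1`, which would misclassify an identifier that happens to start with that prefix; the reason strings and function names are hypothetical and do not reproduce BlacklistConfig.

```python
def parse_blacklist(raw):
    """Parse '<npub>,<npub>/<identifier>,<identifier>' into (npub, identifier) pairs."""
    entries = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        if "/" in item:
            npub, identifier = item.split("/", 1)
            entries.append((npub, identifier))
        elif item.startswith("npub1"):  # assumed heuristic, see lead-in
            entries.append((item, None))
        else:
            entries.append((None, item))
    return entries


def check_blacklist(entries, npub, identifier):
    """Return a rejection reason for the first matching entry, or None."""
    for entry_npub, entry_id in entries:
        if entry_npub is not None and entry_id is not None:
            if (entry_npub, entry_id) == (npub, identifier):
                return "repository blacklisted"
        elif entry_npub is not None:
            if entry_npub == npub:
                return "npub blacklisted"
        elif entry_id == identifier:
            return "identifier blacklisted"
    return None
```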
|
|
Adds NGIT_REPOSITORY_WHITELIST option for curated relay operation that
accepts only whitelisted repositories while maintaining GRASP-01 compliance
(announcements must list the service). This differs from archive whitelist
which enables GRASP-05 mode and doesn't require service listing.
Key features:
- Supports three whitelist formats: npub, npub/identifier, identifier
- Enforces mutual exclusivity with archive read-only mode
- Updates NIP-11 curation field when whitelist is enabled
- Maintains GRASP-01 compliance (doesn't add GRASP-05 support)
Configuration synced across all four sources: src/config.rs, docs/reference/configuration.md,
nix/module.nix, and .env.example as required by AGENTS.md.
|
|
Implements NGIT_ARCHIVE_READ_ONLY configuration option that defaults to true
when archive mode is enabled, allowing relays to operate as read-only syncs
of archived repositories.
Key changes:
- Add NGIT_ARCHIVE_READ_ONLY config option (defaults to true if archive enabled)
- NIP-11 advertises GRASP-05 support and includes curation field when read-only
- Validation logic rejects non-whitelisted repos in read-only mode
- Comprehensive tests for read-only behavior and defaults
- Full documentation in config reference, .env.example, and NixOS module
Read-only mode enables passive mirroring without being listed in announcements,
useful for backup/archive operations while preventing accidental write acceptance.
|
|
Implements GRASP-05 specification for accepting repository announcements
that don't list this relay, enabling archive, mirror, and backup use cases.
Core Features:
- Three whitelist formats: <npub>, <npub>/<identifier>, <identifier>
- Archive-all mode for complete ecosystem mirrors
- Fail-fast npub validation at startup
- Read-only enforcement (archived repos reject pushes)
- Full GRASP-02 sync (git data + Nostr events)
- Dynamic archive status (no flags/metadata)
Implementation:
- Add ArchiveWhitelistEntry enum with Pubkey/Repository/Identifier variants
- Add ArchiveConfig with validation and matching logic
- Update AnnouncementResult to include AcceptArchive variant
- Refactor validate_announcement() to return AnnouncementResult with archive check
- Update AnnouncementPolicy with catch-all pattern for cleaner code
- Wire archive config through builder and policy layers
Configuration:
- NGIT_ARCHIVE_ALL: Accept all announcements (⚠️ storage risk)
- NGIT_ARCHIVE_WHITELIST: Comma-separated whitelist entries
- Updated docs, .env.example, and nix/module.nix
Testing:
- 28 unit tests for config parsing and whitelist matching
- 7 integration tests for archive mode validation
- All 296 tests passing
Validation Priority:
1. Lists our service → Accept (GRASP-01, read/write)
2. Is maintainer → AcceptMaintainer (multi-maintainer, read/write)
3. Matches archive config → AcceptArchive (GRASP-05, read-only)
4. None of above → Reject
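
The priority order above can be sketched as a first-match chain. The variant names follow the ones listed in this commit; the boolean inputs and the function signature are hypothetical simplifications of validate_announcement().

```python
from enum import Enum


class AnnouncementResult(Enum):
    ACCEPT = "Accept"                      # GRASP-01, read/write
    ACCEPT_MAINTAINER = "AcceptMaintainer" # multi-maintainer, read/write
    ACCEPT_ARCHIVE = "AcceptArchive"       # GRASP-05, read-only
    REJECT = "Reject"


def validate_announcement(lists_our_service, is_maintainer, matches_archive):
    """Return the first matching result in priority order."""
    if lists_our_service:
        return AnnouncementResult.ACCEPT
    if is_maintainer:
        return AnnouncementResult.ACCEPT_MAINTAINER
    if matches_archive:
        return AnnouncementResult.ACCEPT_ARCHIVE
    return AnnouncementResult.REJECT
```

Because the archive check comes last, a repository that lists the service stays read/write even if it also matches the archive whitelist.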
Security Considerations:
- Archive-all mode has storage/bandwidth DoS risk
- Identifier-only format matches any pubkey (use npub/identifier for high-value)
- Invalid npubs cause startup failure (fail-fast)
Documentation:
- Concise explanation focused on rationale
- Reference docs updated with all config options
- README updated to reflect completed feature
- Removed from roadmap, added to compliance section
See docs/explanation/grasp-05-archive.md for details.
|
|
Add mandatory uploadpack.allowFilter capability to support partial clones
and fetches as required by GRASP-01 specification. This enables efficient
git operations for bandwidth-constrained clients (e.g., browser-based git
clients like git-natural-api).
Changes:
- Add uploadpack.allowFilter=true to git subprocess configuration
- Update SmartGitServer test helper with filter support
- Add integration tests for filter capability advertisement and functionality
- Update documentation to reflect filter as required capability
Tests verify:
- Filter capability is advertised in info/refs
- Filtered clones with blob:none work correctly
- Filtered fetches with tree:0 work correctly
|
|
- Add new how-to guide covering hash updates for git dependencies
- Applies to any git dependency (e.g., nostr-sdk fork)
- Add critical note in AGENTS.md linking to this guide
- Emphasize that hash updates in both flake.nix and nix/module.nix are MANDATORY
|
|
- Complete guide for deploying ngit-grasp to NixOS servers
- Step-by-step deployment instructions
- Configuration options reference
- Troubleshooting section
- Security hardening recommendations
- Multiple instance examples
- References nix/example-configuration.nix which has clear examples
|
|
- Correct git protocol: ngit-grasp implements HTTP layer, not full git implementation
- Correct nostr relay: both use libraries (Khatru vs nostr-relay-builder)
- Highlight key difference: ngit-relay has NO nostr event sync (only git sync)
- Explain code size difference: mainly due to event sync (~5k lines) that ngit-relay lacks
- Update when-to-choose: ngit-grasp required for event discovery from relay network
|
|
The sync system uses a 5-second batch window for discovered repos.
Repos discovered late in a 30-second test don't have enough time for
the full Layer 2→3→4 cascade:
- Layer 1: Discover repo announcements (0-5s)
- Layer 2: Send #a, #A, #q filters for repos (5-30s)
- Layer 3: Receive issues, patches, PRs (30-60s)
- Layer 4: Receive comments on root events (40-60s)
Testing confirmed that 60 seconds allows late-discovered repos
(gitworkshop, ngit) to complete all layers, while 30 seconds only
allows 1 second after Layer 2 filters are sent.
Updated all references from 30s to 60s throughout the guide and added
explanation of why this duration is necessary.
|
|
- Update production sync testing workflow to save both sync-raw.log and sync.log
- Raw log contains complete untruncated messages (rejection reasons, event data, etc.)
- Sanitized log remains for quick scanning and pattern recognition
- Add guidance on when to use each log and how to retrieve full details from raw log
- Resolves truncated rejection warning issue by making full details accessible
|
|
Mode 1 (Fix Existing Issues) now requires reviewing the proposed fix
and asking for user permission before implementing changes. This ensures
users have visibility and control over what code changes are made.
Changes:
- Added Step 3: Review Proposed Fix and Get Permission
- Renumbered subsequent steps (4-7)
- Updated both overview and detailed workflow sections
- Updated workflow diagram to show review/permission steps
|
|
- Mode 1: Fix one existing issue, test, commit, report
- Mode 2: Discover new issues with minimal documentation
- Emphasize stopping after each cycle
- Remove detailed investigation requirements
- Simplify issue documentation format
|
|
- Update production-sync-testing.md to document issues as individual markdown files in work/active-issues/ instead of polluting the tracked how-to guide
- Add issue template and workflow for creating, viewing, and resolving issues
- Document active-issues/ purpose in work/README.md
- Prevents accidental commits of transient testing issues
- Makes issue management cleaner and more focused
|
|
- Fix shebang in sanitize-logs.sh from #!/bin/bash to #!/usr/bin/env bash for NixOS compatibility
- Update sanitizer defaults from 100/20 to 200/100 chars for better log readability
- Fix CLI argument names in guide: --sync-bootstrap-relay -> --sync-bootstrap-relay-url
- Fix CLI argument names in guide: --git-path -> --git-data-path
These issues were discovered during first-time testing of the production sync testing guide.
|
|
Add infrastructure for iterative debugging of sync against production data:
- scripts/sanitize-logs.sh: Truncates verbose log lines for LLM analysis
- docs/how-to/production-sync-testing.md: Step-by-step guide for testing
sync against real relays, identifying issues, and improving logging
|
|
- Add rejected events index to architecture.md with two-tier system explanation
- Document NGIT_REJECTED_HOT_CACHE_DURATION_SECS and NGIT_REJECTED_COLD_INDEX_EXPIRY_SECS in configuration.md
- Add comprehensive rejected events metrics section to monitoring.md with Grafana queries and alerts
- Explain negentropy integration with rejected index in grasp-02-proactive-sync.md
- Document state event authorization defense-in-depth and rejection tracking in inline-authorization.md
This integrates information from work/rejected-events-index-summary.md into the main documentation,
ensuring architecture docs accurately reflect the implemented rejected events index system.
|
|
Resolves naming conflict with RelayHealthState::Degraded by using a more
explicit name that clearly indicates the connection status relates to
historic sync failures, not connection health degradation.
Changes:
- ConnectionStatus::ConnectedDegraded → ConnectedHistoricSyncFailures
- Updated all documentation and comments
- Updated Prometheus metric descriptions
- Metric value remains 4 for backward compatibility
This makes it clear that:
- ConnectedHistoricSyncFailures = connection lifecycle (missing historic data)
- RelayHealthState::Degraded = connection health (reliability issues)
These are orthogonal concerns - a relay can be ConnectedHistoricSyncFailures
but Healthy, or Connected but Degraded.
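
To illustrate the two orthogonal dimensions, here is a small sketch. The numeric values 0 to 4 are the metric values named in this log; the Python names, the string values of the health enum, and the `describe` helper are hypothetical.

```python
from enum import Enum, IntEnum


class ConnectionStatus(IntEnum):
    """Connection lifecycle, as exposed by the relay-connected metric."""
    DISCONNECTED = 0
    CONNECTING = 1
    SYNCING = 2
    CONNECTED = 3
    CONNECTED_HISTORIC_SYNC_FAILURES = 4


class RelayHealthState(Enum):
    """Connection health, tracked independently of the lifecycle."""
    HEALTHY = "healthy"
    DEGRADED = "degraded"


def describe(status, health):
    """Combine both dimensions into one human-readable label."""
    return f"{status.name.lower()} / {health.value}"
```

For example, `describe(ConnectionStatus.CONNECTED, RelayHealthState.DEGRADED)` and `describe(ConnectionStatus.CONNECTED_HISTORIC_SYNC_FAILURES, RelayHealthState.HEALTHY)` are both valid combinations.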
|
|
- Add ConnectionStatus::ConnectedDegraded (status=4 in metrics)
- Track batch failures via PendingBatch.failed field
- Track relay-level failures via RelayState.historic_sync_had_failures
- Transition to ConnectedDegraded when any batch fails during historic sync
- Add is_live_sync_active() helper for cleaner match patterns
- Update state machine diagram with ConnectedDegraded transitions
- Update metrics docs with status=4 and example queries
Fixes issue where relays with failed negentropy retries would
incorrectly transition to Connected status despite missing data.
Now operators can distinguish 'fully synced' vs 'degraded (partial data)'.
|
|
- Add ConnectionStatus::Syncing state between Connecting and Connected
- Track historic_sync_completed and historic_sync_completed_at in RelayState
- Auto-detect sync completion via check_and_complete_historic_sync()
- Update metrics: ngit_sync_relay_connected now shows 0-3 (disconnected/connecting/syncing/connected)
- Update Prometheus metric documentation with new status values
- Add state machine diagram showing Syncing transition
- Operators can now distinguish 'connected but catching up' vs 'fully synced'
|
|
process_newly_available_git_data
|
|
Replace EOSE-based sync completion with negentropy reconciliation for:
- Initial connect (fresh sync)
- Daily sync (Layer 1 announcements)
- Stale reconnect (>15 min)
Key changes:
- Add NegentropySyncResult struct with remote_only, local_only, received fields
- Add supports_negentropy() using try-and-fallback approach
- Add negentropy_sync_filter() using nostr-sdk client.sync() API
- Modify handle_connect_or_reconnect() to use negentropy for fresh/stale sync
- Modify daily_sync() to use negentropy for Layer 1
- Single-warning logging per relay when negentropy fails
Quick reconnects (<15 min) unchanged - still use REQ with since filter.
If negentropy unsupported, gracefully falls back to REQ+EOSE flow.
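
The selection logic described above can be sketched as follows. The function name, the string labels, and the handling of a gap of exactly 15 minutes (treated here as a quick reconnect) are assumptions; the commit only specifies the behaviour for gaps above and below that threshold.

```python
STALE_THRESHOLD_SECS = 15 * 60


def choose_sync_strategy(secs_since_disconnect, negentropy_supported):
    """Pick a sync strategy for a (re)connect.

    secs_since_disconnect: None for an initial connect, else the gap in seconds.
    """
    fresh_or_stale = (
        secs_since_disconnect is None
        or secs_since_disconnect > STALE_THRESHOLD_SECS
    )
    if not fresh_or_stale:
        return "req-since"  # quick reconnect: REQ with a since filter
    if negentropy_supported:
        return "negentropy"
    return "req-eose"       # fallback when negentropy is unsupported
```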
|
|