| Age | Commit message | Author |
|
Extends cleanup-empty-repos with a second scan direction (filesystem → DB).
Bare git repos under the git data path that have no corresponding 30617
announcement event are identified as orphans and cleaned up.
Empty orphans are always removed. Non-empty orphans are flagged in the
report but only deleted when --purge-orphans is also passed, preventing
accidental data loss.
|
|
--git-dir must precede the subcommand; passing it after for-each-ref
caused git to ignore it and check the CWD instead, making every repo
appear empty.
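
A minimal sketch of the corrected argument order, assuming the emptiness check shells out to `git for-each-ref`. The helper names and the `--count=1` flag are illustrative and not taken from the commit:

```rust
use std::path::Path;
use std::process::Command;

/// Builds `git --git-dir <repo> for-each-ref --count=1`.
/// `--git-dir` is a global git option, so it is added before the subcommand.
fn for_each_ref_command(bare_repo: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("--git-dir").arg(bare_repo);
    cmd.arg("for-each-ref").arg("--count=1");
    cmd
}

/// Hypothetical emptiness check: the repo counts as empty when git succeeds
/// and prints no refs at all.
fn repo_is_empty(bare_repo: &Path) -> std::io::Result<bool> {
    let output = for_each_ref_command(bare_repo).output()?;
    Ok(output.status.success() && output.stdout.is_empty())
}
```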
|
|
git repos
Adds a maintenance subcommand that scans the LMDB database for kind 30617
(repository announcement) events whose bare git repo on disk is empty or
missing, then removes both the 30617 and any matching 30618 (state) events.
A relay should not serve announcement or state events for a repository with
no git data. This was needed to clean up repos leaked by the bug fixed in
2161e3c, and is useful as an ongoing maintenance tool.
Usage (dry-run by default; stop the relay before using --execute):
ngit-grasp cleanup-empty-repos [--relay-data-path <path>] [--git-data-path <path>] [--execute]
The relay itself is now invoked as an implicit 'serve' subcommand, preserving
full backward compatibility with existing deployments and env-var configuration.
|
|
When a replacement 30617 announcement arrived for an entry already in
purgatory (e.g. the same event fetched from a second relay during sync,
or a user re-submitting a slightly updated announcement), the policy
returned Accept instead of AcceptPurgatory. This caused the event to be
saved to the database immediately, bypassing the purgatory gate, without
the corresponding git data or state events ever arriving.
Fix: return AcceptPurgatory when replacing a purgatory entry so the
updated event stays in purgatory until git data arrives. The purgatory
entry is still updated with the newer event via replace_purgatory_announcement
before the return.
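
The routing rule can be sketched as follows. The `Existing` and `Decision` enums and the `route_announcement` function are hypothetical stand-ins, not the policy's real types:

```rust
/// Where an existing announcement for the same (pubkey, identifier) lives.
/// Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Existing {
    None,
    Purgatory,
    Database,
}

/// Hypothetical stand-in for the policy result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Accept,
    AcceptPurgatory,
}

/// Only a replacement of an already-promoted (database) announcement is saved
/// immediately; everything else stays behind the purgatory gate.
fn route_announcement(existing: Existing) -> Decision {
    match existing {
        Existing::Database => Decision::Accept,
        Existing::Purgatory | Existing::None => Decision::AcceptPurgatory,
    }
}
```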
|
|
NIP-01 places no restriction on d tag characters and NIP-34 only
recommends kebab-case without mandating it. Rejecting identifiers with
whitespace or other URL-unsafe characters was therefore overly strict.
The correct approach (per NIP-34 PR #2312 and GRASP-01) is to store
identifiers verbatim on disk and percent-encode them when constructing
URLs. The previous commit already handled the incoming direction
(percent-decoding URL paths before filesystem lookup); this commit
handles the outgoing direction and removes the validation restriction.
Changes:
- validate_identifier: drop whitespace rejection; only reject chars
that are unsafe as filesystem directory names (/, \, null, . / ..)
- git/mod.rs: add percent_encode() alongside percent_decode()
- landing.rs: percent-encode identifier in nostr:// clone URL and
gitworkshop link (also fixes a pre-existing bug where the clone URL
displayed literal '{npub}' / '{identifier}' instead of the values)
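
A minimal sketch of the outgoing direction and the relaxed validation, using only the standard library. The function names follow the commit, but the bodies, the choice of the RFC 3986 unreserved set, and the error type are assumptions:

```rust
/// Percent-encodes every byte outside the RFC 3986 "unreserved" set
/// (ASCII letters, digits, '-', '.', '_', '~').
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input.as_bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Rejects only what is unsafe as a directory name: '/', '\', NUL, "." and "..".
/// Whitespace is allowed; it is percent-encoded when URLs are built.
fn validate_identifier(identifier: &str) -> Result<(), String> {
    if identifier == "." || identifier == ".." {
        return Err(format!("reserved name: {identifier:?}"));
    }
    if let Some(bad) = identifier.chars().find(|c| matches!(c, '/' | '\\' | '\0')) {
        return Err(format!("unsafe character: {bad:?}"));
    }
    Ok(())
}
```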
|
|
Two bugs allowed a repository announcement with a space-containing
identifier ('kuboslopp by Shakespeare') to enter purgatory and create
a bare repo on disk, but then fail to serve git data over HTTP.
Bug 1 (serving): parse_git_url and parse_repo_url did not percent-decode
the URL path before resolving the filesystem path. A client requesting
/npub.../kuboslopp%20by%20Shakespeare.git/info/refs had the identifier
extracted as 'kuboslopp%20by%20Shakespeare' (literal %20), which did not
match the on-disk directory 'kuboslopp by Shakespeare.git'.
Fix: add percent_decode() in src/git/mod.rs and apply it to the repo
component in both parse_git_url and parse_repo_url.
Bug 2 (validation): validate_announcement did not check that the
identifier is safe as a filesystem path component and URL segment.
Identifiers containing whitespace, path separators, null bytes, or
reserved names (. / ..) should be rejected at acceptance time.
Fix: add validate_identifier() in src/nostr/events.rs and call it from
validate_announcement before any other policy checks.
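
For Bug 1, a standard-library-only sketch of what a `percent_decode` helper might look like. The handling of malformed escapes (kept literally) and of invalid UTF-8 (lossy conversion) is an assumption, not the behaviour documented by the commit:

```rust
/// Decodes %XX escapes in a URL path segment. Malformed escapes such as a
/// trailing '%' or "%zz" are copied through unchanged.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // An escape needs two more bytes after the '%'.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

With this, the segment `kuboslopp%20by%20Shakespeare.git` resolves to the on-disk name `kuboslopp by Shakespeare.git`.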
|
|
When NGIT_MAX_CONNECTIONS is unset the relay imposes no connection cap,
deferring to OS fd limits and infrastructure controls. The option remains
available for operators who want an explicit ceiling.
|
|
Fix pre-existing clippy lints:
- &PathBuf -> &Path in audit_cleanup.rs
- too_many_arguments on process_newly_available_git_data,
process_purgatory_announcements, and HttpService::new
- clone_on_copy for PublicKey (Copy type) in purgatory cleanup loop
|
|
State events (kind 30618) can include refs/tags/<name>^{} entries which
are git's notation for the dereferenced commit behind an annotated tag.
These are not real git refs and are never sent as part of a push.
extract_refs_from_state and RepositoryState::from_event were treating
them as real refs, causing can_satisfy_state to reject valid annotated
tag pushes: the would-be state after the push lacked the spurious ^{}
entry, so the exact-equality check always failed.
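
The filtering step could look roughly like this. The function names and the `(ref name, oid)` tuple representation are assumptions made for the sketch:

```rust
/// `refs/tags/<name>^{}` is git's notation for the object an annotated tag
/// points at; it is not a ref that can be pushed.
fn is_peeled_tag_entry(ref_name: &str) -> bool {
    ref_name.ends_with("^{}")
}

/// Keeps only real refs from a list of (ref name, oid) pairs.
fn real_refs<'a>(entries: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    entries
        .iter()
        .copied()
        .filter(|(name, _)| !is_peeled_tag_entry(name))
        .collect()
}
```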
|
|
Previously push auth failures returned HTTP 403 which git clients
display as a generic transport error. Now they return HTTP 200 with
an ERR pkt-line containing the rejection reason (e.g. 'authorisation
failed: No state events in purgatory'), which git displays directly.
Remove GitError::Unauthorized as it is no longer used. GitError
variants now represent only transport/infrastructure failures; app-level
rejections use ERR pkt-line responses.
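
For reference, a pkt-line is a four-digit lowercase hex length (which counts the four length bytes themselves) followed by the payload. A small sketch of formatting the rejection; the helper names are hypothetical, and the sketch ignores the pkt-line maximum length:

```rust
/// Frames a payload as a git pkt-line: 4 hex digits of total length
/// (payload + the 4 length bytes), then the payload.
fn pkt_line(payload: &str) -> String {
    format!("{:04x}{}", payload.len() + 4, payload)
}

/// Builds the ERR pkt-line carrying a human-readable rejection reason.
fn err_pkt_line(reason: &str) -> String {
    pkt_line(&format!("ERR {reason}\n"))
}
```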
|
|
|
|
Spawns a tokio task that runs every 30 minutes and removes all events
tagged 'grasp-audit-test-event' older than 2 hours from the LMDB
database, along with their associated bare git repositories on disk.
|
|
collect_all_authorized_maintainers
Both were pub functions with no callers. Clippy doesn't flag dead pub
items because the compiler treats them as potentially used by external
crates - only private items trigger the dead_code lint.
|
|
fetch_repository_data_{excluding,with}_purgatory
The old name was ambiguous - it wasn't clear whether purgatory was
included or not. The two variants are now explicitly named:
- fetch_repository_data_excluding_purgatory: DB only
- fetch_repository_data_with_purgatory: DB + purgatory overlay
SyncContext trait method also renamed to fetch_repository_data_with_purgatory
to match the free function it delegates to.
|
|
|
|
state event copy
When git data is fetched into owner A's repo and a state event for owner B
is released from purgatory (copying OIDs from A's repo to B's repo via
process_state_with_git_data), owner B's purgatory announcement was never
promoted. process_purgatory_announcements only promotes the announcement
for the owner derived from source_repo_path (owner A), so owner B's
announcement stayed in purgatory with its 30-minute expiry timer running.
30 minutes later the cleanup task would soft-expire owner B's entry,
deleting the bare repository even though the announcement had been
effectively satisfied.
Fix: after a state event is successfully saved to the database, iterate
over all announcements in db_repo_data and promote any purgatory
announcement for owners whose repos received OIDs via the copy (i.e.
repos other than source_repo_path).
|
|
|
|
Extends purgatory persistence to include announcement purgatory entries.
On graceful shutdown, non-soft-expired announcements are serialised to
purgatory-state.json alongside state/PR/expired events; on startup they
are restored, skipping any entry whose bare repo path no longer exists.
Updates purgatory-design.md to reflect that purgatory persists through
graceful shutdown and documents the new PurgatoryState disk format.
Adds create_announcement_event helper to purgatory_helpers and three new
integration tests in purgatory_persistence covering the full save/restore
cycle, missing-repo skip, and the combined roundtrip with all entry types.
|
|
Kind 5 deletion events referencing a PR or PR-update event by e-tag now
remove the matching purgatory entry, provided the deletion author matches
the PR event author. Placeholders (git data arrived before the event) are
not removed since they have no author to verify against.
PR purgatory is keyed by event ID hex so this is an O(1) lookup, checked
before the O(n) announcement and state event scans.
|
|
The previous tests deleted purgatory announcements (kind 30617) and checked
for bare-repo absence via git ls-remote, which would corrupt shared-mode
test state by destroying repos other tests depend on.
The new approach tests deletion of purgatory state events (kind 30618) instead:
- e-tag test: promotes a repo, creates a unique commit locally, submits a
state event pointing to it (enters purgatory), deletes the state event by
event ID, then verifies git push of that commit is rejected.
- a-tag coordinate test: promotes a repo, generates a fresh maintainer
keypair, sends a replacement announcement adding that maintainer, submits
a state event signed by the new maintainer (enters purgatory), deletes by
coordinate 30618:<new_maintainer_pubkey>:<identifier>, then verifies git
push is rejected.
Also extends DeletionPolicy to handle kind 30618 state events in purgatory
for both e-tag (event ID) and a-tag (coordinate) deletion paths.
|
|
Kind 5 deletion events signed by the announcement author now evict the
corresponding purgatory entry and delete the bare repository from disk.
Both NIP-09 reference styles are supported:
- e tag (event ID): matches the purgatory entry whose event ID equals the tag value
- a tag (coordinate 30617:<pubkey>:<identifier>): matches by coordinate, only
removes entries with created_at <= deletion event created_at per NIP-09 spec
Author-only enforcement: coordinate pubkey and e-tag owner must match the
deletion event pubkey; third-party deletion attempts are silently ignored.
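
The matching rules can be sketched like this. The struct and function names are hypothetical, and pubkeys and event IDs are plain strings here rather than the relay's real types:

```rust
/// Hypothetical view of a purgatory announcement (kind 30617).
struct PurgatoryAnnouncement {
    event_id: String,
    pubkey: String,
    identifier: String,
    created_at: u64,
}

/// Splits "<kind>:<pubkey>:<identifier>"; the identifier may itself contain ':'.
fn parse_coordinate(value: &str) -> Option<(u16, &str, &str)> {
    let mut parts = value.splitn(3, ':');
    let kind = parts.next()?.parse().ok()?;
    let pubkey = parts.next()?;
    let identifier = parts.next()?;
    Some((kind, pubkey, identifier))
}

/// e-tag path: the event ID must match and the deleter must be the author.
fn e_tag_matches(entry: &PurgatoryAnnouncement, deleter: &str, e_tag: &str) -> bool {
    entry.event_id == e_tag && entry.pubkey == deleter
}

/// a-tag path: the coordinate must name the deleter's own announcement, and the
/// entry must not be newer than the deletion event.
fn a_tag_matches(
    entry: &PurgatoryAnnouncement,
    deleter: &str,
    deletion_created_at: u64,
    a_tag: &str,
) -> bool {
    match parse_coordinate(a_tag) {
        Some((30617, pubkey, identifier)) => {
            pubkey == deleter
                && entry.pubkey == pubkey
                && entry.identifier == identifier
                && entry.created_at <= deletion_created_at
        }
        _ => false,
    }
}
```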
Includes 6 unit tests and 2 integration tests (event ID and coordinate paths).
|
|
If remove_dir_all fails, leave the entry untouched so the next cleanup
cycle retries the deletion automatically. Previously a failed deletion
would still set soft_expired=true and extend the expiry, meaning the
bare repo would never be retried.
|
|
Per design doc decision #4: when git auth finds a matching state event
in purgatory that authorizes a push, extend the announcement's expiry.
The repo is actively receiving git data so the announcement should not
expire prematurely. Also triggers revival of soft-expired announcements.
|
|
Per design doc decision #4: state event arrival resets the 30-minute
protocol timer for purgatory announcements. This prevents premature
expiry during slow sync operations where the repo is actively receiving
metadata but git data hasn't arrived yet.
Extends expiry for all owners whose announcement authorized the state
event, and triggers revival if the announcement was soft-expired.
|
|
Two-phase expiry for announcement purgatory entries:
- Phase 1 (initial 30min timeout): delete bare repo, set soft_expired=true,
extend expiry by 24h so the event is retained for potential revival
- Phase 2 (24h extended timeout): fully remove from purgatory
Revival: extend_announcement_expiry() now recreates the bare git repo
when called on a soft-expired entry (triggered by state event or git auth),
clearing soft_expired and resetting the expiry window.
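
A compact sketch of the two phases and of revival. The types and function signatures are simplified assumptions; the real extend_announcement_expiry() also recreates the bare repo, which is left to the caller here:

```rust
use std::time::{Duration, Instant};

const INITIAL_TIMEOUT: Duration = Duration::from_secs(30 * 60);
const EXTENDED_TIMEOUT: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical, simplified purgatory entry.
struct AnnouncementEntry {
    expires_at: Instant,
    soft_expired: bool,
}

#[derive(Debug, PartialEq)]
enum CleanupAction {
    Keep,
    /// Phase 1: caller deletes the bare repo; the event is retained.
    SoftExpire,
    /// Phase 2: caller drops the entry from purgatory.
    Remove,
}

fn cleanup_step(entry: &mut AnnouncementEntry, now: Instant) -> CleanupAction {
    if now < entry.expires_at {
        return CleanupAction::Keep;
    }
    if entry.soft_expired {
        return CleanupAction::Remove;
    }
    entry.soft_expired = true;
    entry.expires_at = now + EXTENDED_TIMEOUT;
    CleanupAction::SoftExpire
}

/// Resets the expiry window. Returns true when the entry was soft-expired,
/// in which case the caller must recreate the bare repo.
fn extend_expiry(entry: &mut AnnouncementEntry, now: Instant) -> bool {
    let revived = entry.soft_expired;
    entry.soft_expired = false;
    entry.expires_at = now + INITIAL_TIMEOUT;
    revived
}
```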
|
|
Remove the redundant inline kind-30617 registration block from the sync
event loop and the three is_generic/recompute_new_sync_filters_for_relay
calls from confirm_batch error paths. The purgatory announcement sync
timer (run_purgatory_announcement_sync) is now the sole registration path.
Consolidate NGIT_SYNC_BATCH_WINDOW_MS and NGIT_PURGATORY_SYNC_INTERVAL_MS
into a single NGIT_TEST=1 flag that sets both timers to 200ms, replacing
two ad-hoc env vars with one reusable test-mode flag.
|
|
When a state event arrived and the required commits already existed in
another maintainer's repo on the same relay, process_state_with_git_data
copied the OIDs across and aligned refs but never called
process_purgatory_announcements for the target repos. Any announcement
waiting in purgatory for that repo stayed there indefinitely.
Fix: after process_state_with_git_data, call process_newly_available_git_data
for each target repo (those that received copied OIDs) so purgatory
announcements are promoted immediately.
|
|
When an owner announcement is promoted from purgatory via a git push,
any maintainer announcements sitting in the rejected_events_index hot
cache were never re-processed. The invalidate_and_get call only existed
in SyncManager::process_event_static (the nostr sync path); the git push
promotion path (http -> handlers -> git::sync) had no access to the
rejected_events_index at all.
Thread rejected_events_index and write_policy through the git push path:
- process_purgatory_announcements: after saving the promoted announcement,
parse its maintainers tag and call invalidate_and_get() for each, then
re-process any returned hot-cache events via admit_event + save
- process_newly_available_git_data: accept optional write_policy and
rejected_events_index, pass them through to process_purgatory_announcements
- handle_receive_pack: accept Arc<Nip34WritePolicy> and
Arc<RejectedEventsIndex>, pass them to process_newly_available_git_data
- HttpService / run_server: carry the two new fields, clone into each
handle_receive_pack call
- main.rs: obtain rejected_events_index from sync_manager before moving
it into its task; wrap write_policy in Arc for the HTTP server
- RealSyncContext::process_newly_available_git_data: pass None for both
new params (purgatory sync path already handles this via
SyncManager::process_event_static)
Also rewrite the maintainer_reprocessing integration tests to correctly
exercise the hot-cache path now that announcements require git data
before being released from purgatory:
- Start relay_b with relay_a as bootstrap so its SyncManager syncs
maintainer announcements via negentropy before the owner git push
- Use push_unique_git_data_to_relay (new helper) to give each maintainer
a distinct commit hash, preventing git from skipping pack transfer
- Make wait_for_event_on_relay poll in a retry loop so transient timing
gaps between DB write and query do not cause false negatives
|
|
Instead of threading repo_sync_index through PolicyContext/builder.rs/main.rs
to handle user-submitted purgatory announcements, add a simple background
timer (run_purgatory_announcement_sync, every 5s) that scans the purgatory
for announcement entries and registers them in repo_sync_index as StateOnly.
This is simpler and covers both flows:
- Sync-path announcements: inline registration still happens during event
processing (sync/mod.rs:1839+), timer provides a safety net
- User-submitted announcements: SelfSubscriber never sees them (rejected
from DB), timer is the primary registration path
The timer calls sync_purgatory_announcements_to_index() which:
1. Snapshots purgatory via new announcements_for_sync() public method
2. Inserts StateOnly entries via or_insert (never downgrades Full entries)
3. Detects newly added relay URLs and calls handle_new_sync_filters to
connect and subscribe - fixing the failing test that expected relay
discovery from a user-submitted purgatory announcement
Removes: repo_sync_index field from PolicyContext, set/get_repo_sync_index
methods, set_repo_sync_index on Nip34WritePolicy, wiring in main.rs, and
the inline AcceptPurgatory registration block in builder.rs.
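
The "never downgrades" property follows from the map entry API: an insert-if-absent leaves an existing Full entry untouched. A sketch, assuming the index can be modelled as a plain `HashMap` keyed by a repo ID string (the real repo_sync_index type is not shown in the commit):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncLevel {
    StateOnly,
    Full,
}

/// Registers a purgatory repo as StateOnly unless an entry already exists.
/// Returns true when a new entry was created.
fn register_purgatory_repo(index: &mut HashMap<String, SyncLevel>, repo_id: &str) -> bool {
    let mut inserted = false;
    index.entry(repo_id.to_string()).or_insert_with(|| {
        inserted = true;
        SyncLevel::StateOnly
    });
    inserted
}
```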
|
|
negentropy fallback
Five targeted changes for purgatory announcement sync:
1. SelfSubscriber sync_level upgrade: After or_insert_with in process_batch,
always set entry.sync_level = SyncLevel::Full so that when a promoted
announcement is broadcast via notify_event and SelfSubscriber receives it,
an existing StateOnly entry gets upgraded to Full and PR event subscriptions
are triggered immediately (not delayed up to 24h).
2. Negentropy fallback filter split: In handle_eose, when falling back from
negentropy to REQ+EOSE, split batch_repos by SyncLevel and call
build_sync_level_aware_filters instead of build_layer2_and_layer3_filters.
Prevents StateOnly (purgatory) repos from getting Layer 2 #a/#A/#q filters
prematurely, which caused nostr-sdk client deduplication to permanently
drop PR events after orphan rejection.
3. Recompute sync filters after announcement batch EOSE: Add
recompute_new_sync_filters_for_relay calls at all three batch-completion
paths in handle_eose for generic filter (announcement) batches. This
triggers state-only subscriptions for any purgatory repos registered during
that batch, fixing the 24h delay before state event sync starts.
4. User-submitted purgatory announcements: Add repo_sync_index field to
PolicyContext with setter/getter, wire in main.rs after SyncManager
creation, and register in AcceptPurgatory handler so user-submitted
announcements get StateOnly sync started immediately.
5. Update archive tests: test_archive_without_state_events_does_not_sync_git
updated to reflect that StateOnly subscription now proactively fetches
state events from source relays. test_archive_read_only_creates_bare_repo
un-ignored as it now works end-to-end.
|
|
premature PR event delivery"
This reverts commit 806936e7d1aab5dfd0c2ad6b98a115122dc1785c.
|
|
after announcement promotion"
This reverts commit d76003b629a4a03dba23a8a1c41da6e4ac4c30cf.
|
|
announcement promotion
When git data arrives for a purgatory announcement and promotes it to the
database, the relay now:
1. Upgrades the announcement's sync level in RepoSyncIndex from StateOnly
to Full (git/sync.rs: process_purgatory_announcements)
2. Sends AddFilters actions to SyncManager for all connected relays, using
Full sync filters (Layer 2 #a/#A/#q) to subscribe to PR events
(purgatory/sync/context.rs: RealSyncContext.process_newly_available_git_data)
3. For user-submitted purgatory announcements, registers the repo in
RepoSyncIndex with StateOnly level and sends AddFilters to SyncManager
so it discovers and connects to relays listed in the announcement tags
(nostr/builder.rs: handle_announcement AcceptPurgatory path)
The RealSyncContext now accepts optional repo_sync_index and sync_action_tx
parameters. main.rs wires these up from SyncManager. PolicyContext gains
repo_sync_index and sync_action_tx fields for the write policy path.
|
|
premature PR event delivery
StateOnly repos in a pending batch had their repo IDs included in the
negentropy REQ+EOSE fallback, which called build_layer2_and_layer3_filters.
This generated #a/#A/#q tag filters for repos whose announcements were
still in purgatory (not yet promoted to the database).
When the remote relay responded with PR events matching those filters,
the write policy correctly rejected them as 'orphan' (no accepted repo
in DB yet). However, nostr-sdk's client-level deduplication then silently
dropped the same event on all subsequent deliveries, making it permanently
unavailable even after the announcement was promoted.
Fix: split batch_repos into full vs state-only by consulting repo_sync_index
at fallback time, then call build_sync_level_aware_filters which only
generates #a/#A/#q filters for Full repos. StateOnly repos only get
the kind 30618 + #d filter they were originally subscribed with.
|
|
purgatory
remove_purgatory_announcement() was unconditionally wiping all state
events for an identifier when one owner's announcement was evicted.
State events are keyed by identifier alone, so this incorrectly
discarded state events belonging to a different owner's repository
sharing the same identifier string. Now only removes state events if
no other owner's announcement remains in purgatory for that identifier.
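
A sketch of the corrected removal logic, with purgatory reduced to two standard collections. The struct layout and the string-typed keys are assumptions for illustration:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified purgatory store.
#[derive(Default)]
struct Purgatory {
    /// (owner pubkey, identifier) pairs with an announcement in purgatory.
    announcements: HashSet<(String, String)>,
    /// State event IDs keyed by identifier alone.
    state_events: HashMap<String, Vec<String>>,
}

impl Purgatory {
    fn remove_announcement(&mut self, owner: &str, identifier: &str) {
        self.announcements
            .remove(&(owner.to_string(), identifier.to_string()));
        // State events are shared per identifier, so keep them while any other
        // owner's announcement for the same identifier is still waiting.
        let still_referenced = self.announcements.iter().any(|(_, id)| id == identifier);
        if !still_referenced {
            self.state_events.remove(identifier);
        }
    }
}
```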
|
|
An older rejected announcement (e.g. a relay replay of a superseded
event) was incorrectly evicting a newer purgatory entry for the same
pubkey+identifier. Now only evict when the incoming event's created_at
is strictly greater than the stored entry's created_at.
|
|
The sync loop calls fetch_repository_data() to get clone URLs so it knows
where to fetch git data from. Previously this only queried the database,
which means an announcement still in purgatory (no git data yet) would
return no clone URLs, so the sync loop could never fetch the git data
needed to promote the announcement - a circular deadlock.
Fix by switching to fetch_repository_data_with_purgatory() which combines
database announcements with purgatory announcements. Update the trait
method's doc comment to document this behaviour.
The mock implementation in tests is unaffected since it returns
pre-configured data rather than delegating to either function.
|
|
Previously, has_active_announcement() only queried the database, so when
a newer announcement arrived for the same (pubkey, identifier) while the
original was still in purgatory, it was incorrectly routed as a brand-new
announcement (AcceptPurgatory) rather than replacing the existing entry.
This change splits the logic into three cases:
- If the existing entry is in the database: return Accept (replacement) as before
- If the existing entry is only in purgatory: replace the purgatory entry via
add_announcement() (which overwrites by key) and extend expiries for both the
announcement and any waiting state events, then return Accept
- If the owner sends a Reject-classified announcement (service removed) but has
a purgatory entry: clear the purgatory entry, delete the bare repo, and remove
any waiting state events before rejecting
Also add an explicit comment to find_accepted_repository() in related.rs
clarifying that it intentionally only checks the database. Related events
should only be accepted after the repository announcement has been promoted
(validated via git data) - this is correct behaviour, not a missing check.
|
|
is_maintainer_in_any_announcement only queried the database, missing
announcements still in purgatory. A maintainer's announcement (which
lists the recursive maintainer) may arrive and enter purgatory before
the recursive maintainer's announcement does, causing the maintainer
exception check to return false and reject the recursive maintainer's
announcement.
|
|
Add comments explaining that PR event processing (both incoming and
purgatory) should only use database announcements, not purgatory ones.
This is intentional because:
- Incoming PR events should only be accepted for validated announcements
- Purgatory PR events should only be released when the announcement is promoted
- This prevents accepting PR events for announcements that fail validation
Differs from state event processing which uses fetch_repository_data_with_purgatory
because state events check authorization without releasing from purgatory.
|
|
When processing state events from purgatory, we need to check
authorization against announcements that may still be in purgatory
(not yet promoted to the database).
Previously, process_purgatory_state_events() used fetch_repository_data()
which only queries the database. This caused authorization failures when:
1. Git data arrives
2. Announcement is promoted from purgatory to database
3. State events are processed from purgatory
4. But db_repo_data was fetched BEFORE the announcement promotion
Now uses fetch_repository_data_with_purgatory() to include both
database and purgatory announcements, ensuring authorization works
correctly regardless of promotion timing.
|
|
Purgatory announcements need state events (kind 30618) synced from
external relays, but not full L2/L3 events (patches, issues, PRs)
which would be rejected anyway. This implements the SyncLevel concept
from the design doc (decision #6):
- Add SyncLevel enum (Full vs StateOnly) to RepoSyncNeeds
- When announcement enters purgatory during sync, register in
RepoSyncIndex with SyncLevel::StateOnly
- Add build_sync_level_aware_filters() that partitions repos by level:
StateOnly repos only get state event filters (kind 30618)
- Update derive_relay_targets to track state_only_repos separately
- Update compute_actions to handle both repo sets
- SelfSubscriber always uses SyncLevel::Full (promoted repos)
|
|
The partial fix treating ProcessResult::Purgatory as confirmed in
pending_sync_index would trigger full L2/L3 sync for purgatory
announcements. Per design (decision #6), purgatory announcements
should only sync state events via SyncLevel::StateOnly (not yet
implemented).
Ignore test_archive_read_only_creates_bare_repo until SyncLevel
is implemented in Phase 3.
|
|
Route new announcements to purgatory instead of accepting immediately.
Announcements are promoted to the database when git data arrives,
ensuring we only serve announcements for repos with actual content.
Implemented:
- AnnouncementPurgatoryEntry type and DashMap store
- Route new announcements to purgatory (replacement announcements skip)
- Promote announcements on git data arrival (process_purgatory_announcements)
- Authorization checks purgatory announcements (fetch_repository_data_with_purgatory)
- State policy uses purgatory announcements for maintainer validation
- Cleanup task handles announcement expiry
- Updated count()/cleanup() to 3-tuples
Known broken:
- test_archive_read_only_creates_bare_repo fails: sync module does not
treat purgatory announcements as confirmed repos, so per-repo sync
(state events, PRs) is never triggered for purgatory announcements
- Announcement persistence (save/restore) not implemented
- SyncLevel (StateOnly vs Full) not implemented
- Soft expiry two-phase not implemented
- Expiry extension on state event / git auth not wired up
|
|
- Derive Default for config structs instead of manual impl
- Fix doc comment formatting in ArchiveConfig::matches
- Collapse nested if statement in validate_announcement
- Allow too_many_arguments for SyncManager::new
|
|
Listen for both SIGINT (Ctrl+C) and SIGTERM (systemd) signals to ensure
graceful shutdown cleanup runs when stopping the service via systemd.
Previously, only SIGINT was handled, causing purgatory state and rejected
events cache to be lost on every systemd restart. Now both signals trigger
the cleanup code that saves state files and removes placeholder refs.
Fixes issue 0f73
|
|
|
|
Previously, some IO errors in git handlers were logged while others were
not, leading to inconsistent observability. Additionally, the HTTP layer
logged all git errors redundantly, adding no useful context beyond what
was already logged at the source.
Changes:
- Add error logging to all previously unlogged IO operations in
handle_upload_pack and handle_receive_pack (stdin writes, stdout/stderr
reads, process waits)
- Remove redundant error logging at HTTP layer since all errors are now
logged at their source with full context
- Ensures consistent error-level logging for all git subprocess failures
This provides complete observability of git operations while eliminating
duplicate log entries that don't add value.
|
|
Only the final summary 'Aligned repository with state' remains at INFO level,
showing the total count of refs_created/refs_updated/refs_deleted.
|
|
Improves observability when pushes are rejected due to state events that
only partially match the pushed refs. Previously, logs only showed 'No
state event found' even when state events existed but didn't match.
Changes:
- Add diagnose_state_mismatch() to explain why state events don't match
- Log specific reasons: missing refs, wrong SHAs, or extra refs
- Update rejection message to 'No matching state event found' (more accurate)
- Add 4 unit tests for diagnostic function
Example diagnostic output:
WARN State event abc123 from authorized author doesn't match push:
refs/heads/main missing (state declares 9cc3d93b)
This addresses the issue where a push with only refs/heads/test was
rejected because the state event also declared refs/heads/main, but
logs didn't explain why the match failed.
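
A rough sketch of such a diagnostic. It assumes both sides are maps from ref name to OID; the message wording for wrong SHAs and extra refs is invented for illustration, and only the "missing" format follows the example output above:

```rust
use std::collections::BTreeMap;

/// Shortens an OID to its first 8 characters for log output.
fn short(oid: &str) -> &str {
    oid.get(..8).unwrap_or(oid)
}

/// Compares the refs a state event declares with the refs the repository
/// would have after the push, and lists every difference.
fn diagnose_state_mismatch(
    state: &BTreeMap<String, String>,
    after_push: &BTreeMap<String, String>,
) -> Vec<String> {
    let mut reasons = Vec::new();
    for (name, oid) in state {
        match after_push.get(name) {
            None => reasons.push(format!("{name} missing (state declares {})", short(oid))),
            Some(actual) if actual != oid => reasons.push(format!(
                "{name} is {} (state declares {})",
                short(actual),
                short(oid)
            )),
            Some(_) => {}
        }
    }
    for name in after_push.keys() {
        if !state.contains_key(name) {
            reasons.push(format!("{name} not declared by state event"));
        }
    }
    reasons
}
```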
|
|
Fixes race condition where user's push becomes no-op after state event
is applied between fetch and push. Now accepts these as successful
no-ops, matching Git's 'Everything up-to-date' behavior.
- Add early detection in get_state_authorization_for_specific_owner_repo
- Return success for all-noop pushes without requiring purgatory event
- Document behavior in inline-authorization.md
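
The early no-op detection might reduce to a check like the one below. The `RefUpdate` struct and the choice to treat an empty push as not a no-op are assumptions of this sketch:

```rust
use std::collections::HashMap;

/// Hypothetical view of one ref update requested by a push.
struct RefUpdate {
    name: String,
    new_oid: String,
}

/// True when every requested update already matches the ref on disk, i.e. the
/// state event was applied between the user's fetch and push.
fn is_all_noop(updates: &[RefUpdate], current_refs: &HashMap<String, String>) -> bool {
    !updates.is_empty()
        && updates
            .iter()
            .all(|u| current_refs.get(&u.name) == Some(&u.new_oid))
}
```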
|
|
This merge includes critical bug fixes and comprehensive migration tooling
developed during the relay.ngit.dev migration effort.
Bug Fixes:
- Fix git protocol error handling to return HTTP 200 with ERR pkt-line
- Fix naughty list false positives and DNS failure identification
- Fix database query filters in load_existing_events (remove .since())
- Fix OID fetch tracking to distinguish 0 OIDs from successful fetches
- Fix purgatory event source tracking for filtered expiry logging
- Implement OID retry logic for 'not our ref' errors
Migration Tools & Documentation:
- Complete 5-phase migration analysis pipeline with orchestration script
- Phase 1: Event fetching from source relay
- Phase 2: Git sync verification
- Phase 3: Categorization and relay comparison
- Phase 4: Log extraction (parse failures, purgatory expiry)
- Phase 5: Action classification for migration decisions
- Comprehensive migration guide with lessons learned
- Troubleshooting guide for permission and corruption issues
Configuration:
- Add NGIT_LOG_LEVEL configuration option
- Update git throttle limits to 60/minute
- Improve logging throughout for better observability
|
|
Add EventSource enum (Direct/Sync) to purgatory entries to distinguish
between user-submitted events and sync-fetched events. This enables:
- WARN-level logging for direct submissions that expire (user should know)
- DEBUG-level logging for sync-fetched expirations (expected behavior)
- Source upgrade from Sync→Direct if user submits after sync
- Expiry timer reset on source upgrade (fresh 30-min window for user)
The source is included in [PURGATORY_EXPIRED] logs as source=direct or
source=sync for easy filtering.
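
A sketch of the upgrade rule and the log-level choice. The entry struct, the seconds-based clock, and the function names are hypothetical simplifications:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventSource {
    Direct,
    Sync,
}

const PURGATORY_WINDOW_SECS: u64 = 30 * 60;

/// Hypothetical, simplified purgatory entry.
struct Entry {
    source: EventSource,
    expires_at_secs: u64,
}

/// Upgrades Sync -> Direct when a user submits an event that sync already
/// fetched, and gives the user a fresh expiry window. Never downgrades.
/// Returns true when an upgrade happened.
fn record_submission(entry: &mut Entry, incoming: EventSource, now_secs: u64) -> bool {
    if entry.source == EventSource::Sync && incoming == EventSource::Direct {
        entry.source = EventSource::Direct;
        entry.expires_at_secs = now_secs + PURGATORY_WINDOW_SECS;
        return true;
    }
    false
}

/// Log level used when an entry expires.
fn expiry_log_level(source: EventSource) -> &'static str {
    match source {
        EventSource::Direct => "WARN",
        EventSource::Sync => "DEBUG",
    }
}
```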
|
|
Previously, sync_identifier_from_url passed all needed OIDs to
process_newly_available_git_data, not just the OIDs that were
successfully fetched. This caused incorrect logging (new_oids_count
would show all needed OIDs, not just fetched ones).
While this didn't break functionality (the actual processing uses
can_apply_state which checks the repository on disk), it made
debugging confusing.
Changes:
- Rename oids_fetched to fetched_oids and change type from usize to Vec<String>
- Return Vec<String> from match arms instead of counts
- Pass fetched_oids (not needed_oids) to process_newly_available_git_data
- Return fetched_oids.len() at the end
This ensures logging accurately reflects which OIDs were actually
fetched from the remote.
|
|
|
|
When fetch_oids returns Ok(vec![]) (all requested OIDs missing from
remote), the log message now says 'Fetch returned no OIDs (not available
on remote)' instead of the misleading 'Fetch succeeded' with oids_fetched=0.
|
|
Add retry loop in fetch_oids that handles git's behavior of stopping
at the first missing OID. When a 'not our ref' error occurs:
- Parse the missing OID from stderr
- Remove it from the fetch list and track it as missing
- Retry with remaining OIDs until success or all OIDs exhausted
This ensures we fetch all available OIDs even when some are missing
from the remote, rather than failing the entire batch.
Also improves error reporting:
- Include URL in all error messages for easier debugging
- Log stderr even when domain is already on naughty list
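
The retry loop can be sketched with the actual fetch abstracted behind a closure that returns git's stderr on failure. Function names, the closure signature, and the stderr parsing are assumptions, not the real fetch_oids implementation:

```rust
/// Extracts the OID that follows "not our ref " in git's stderr, if any.
fn parse_missing_oid(stderr: &str) -> Option<String> {
    let marker = "not our ref ";
    let start = stderr.find(marker)? + marker.len();
    let oid: String = stderr[start..]
        .chars()
        .take_while(|c| c.is_ascii_hexdigit())
        .collect();
    if oid.is_empty() {
        None
    } else {
        Some(oid)
    }
}

/// Retries the fetch, dropping one missing OID per failed attempt.
/// Returns (fetched, missing) or the stderr of an unrecoverable failure.
fn fetch_available<F>(
    mut wanted: Vec<String>,
    mut fetch: F,
) -> Result<(Vec<String>, Vec<String>), String>
where
    F: FnMut(&[String]) -> Result<(), String>,
{
    let mut missing = Vec::new();
    while !wanted.is_empty() {
        match fetch(&wanted) {
            Ok(()) => return Ok((wanted, missing)),
            Err(stderr) => {
                let Some(oid) = parse_missing_oid(&stderr) else {
                    return Err(stderr);
                };
                let before = wanted.len();
                wanted.retain(|w| *w != oid);
                // Guard against looping forever on an OID we never asked for.
                if wanted.len() == before {
                    return Err(stderr);
                }
                missing.push(oid);
            }
        }
    }
    Ok((Vec::new(), missing))
}
```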
|
|
Previously, all git upload-pack/receive-pack failures returned HTTP 500,
but the git smart HTTP protocol requires protocol-level errors (like
"not our ref") to be returned as HTTP 200 OK with an ERR pkt-line in
the response body.
Changes:
- Add build_git_protocol_error_response() to create HTTP 200 responses
with properly formatted ERR pkt-line ("ERR <message>\n")
- Add is_git_protocol_error() to detect protocol errors (exit code 128
with stderr content) vs transport errors
- Update handle_upload_pack() and handle_receive_pack() to return
protocol errors as HTTP 200 with ERR pkt-line
- Keep HTTP 500 for actual transport errors (spawn failures, I/O errors,
signals)
This allows git clients to properly parse and display protocol error
messages instead of seeing generic HTTP 500 errors.
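
The classification rule described above, as a sketch. The signatures are assumptions; in particular, the exit code is modelled as `Option<i32>` so that `None` can represent a process killed by a signal:

```rust
/// A git subprocess failure is treated as a protocol-level error when git
/// exited with code 128 and wrote something to stderr.
fn is_git_protocol_error(exit_code: Option<i32>, stderr: &str) -> bool {
    exit_code == Some(128) && !stderr.trim().is_empty()
}

/// HTTP status for a failed upload-pack/receive-pack run: 200 (with an ERR
/// pkt-line body) for protocol errors, 500 for transport failures.
fn http_status_for_failure(exit_code: Option<i32>, stderr: &str) -> u16 {
    if is_git_protocol_error(exit_code, stderr) {
        200
    } else {
        500
    }
}
```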
|
|
Change protocol error detection to only match WebSocket-specific errors
(websocket, invalid frame) instead of generic 'protocol' keyword which
was incorrectly catching transient git protocol errors.
Git protocol errors like 'fatal: protocol error: bad line length' are
transient network issues that should use backoff/retry, not permanent
naughty list blocking. Only WebSocket/Nostr protocol violations indicate
persistent infrastructure problems.
Fixes production false positive:
- relay.ngit.dev: git protocol error + remote warning misclassified
Add production test cases for git protocol errors and warning combinations.
|
|
Strip URLs (http://, https://, git://, ws://, wss://) from error messages
before classification to prevent false positives from repository names,
paths, or identifiers containing keywords like 'ssl', 'certificate', etc.
- Add strip_urls() function to remove URLs before pattern matching
- Add WebSocket protocol support (ws://, wss://) for relay errors
- Filter remote warnings that don't indicate infrastructure problems
- Use more specific SSL/TLS patterns to avoid npub substring matches
- Reduce test suite from 40 to 13 tests, keeping only edge cases
Fixes false positives seen in production:
- git.shakespeare.diy: 'repository not found' with npub containing 'ssl'
- relay.ngit.dev: HTTP 500 error with npub containing 'ssl'
- gitnostr.com: remote permission warning misclassified as protocol error
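A simplified sketch of the idea, using only the standard library: drop any whitespace-delimited token that begins with one of the listed schemes, then match keywords on what remains. The real strip_urls() may well work differently (for example with a regex); `mentions_tls_problem` and its patterns are hypothetical.

```rust
const SCHEMES: [&str; 5] = ["http://", "https://", "git://", "ws://", "wss://"];

/// Remove URL tokens so that keywords inside hostnames, paths, or npubs
/// cannot influence classification.
fn strip_urls(message: &str) -> String {
    message
        .split_whitespace()
        .filter(|token| {
            // Ignore leading quotes or brackets around the URL.
            let t = token.trim_start_matches(|c: char| !c.is_ascii_alphanumeric());
            !SCHEMES.iter().any(|s| t.starts_with(*s))
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical classifier step: match specific TLS phrases, not bare "ssl".
fn mentions_tls_problem(message: &str) -> bool {
    let cleaned = strip_urls(message).to_lowercase();
    cleaned.contains("ssl certificate") || cleaned.contains("certificate verify failed")
}
```

With this approach a "repository not found" error whose URL happens to contain "ssl" inside an npub no longer looks like a TLS failure.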
|
|
failures
|
|
load_existing_events()
Root cause: `last_connected` was set to Timestamp::now() BEFORE
load_existing_events() was called (line 425), causing the database
query to filter out all existing events with .since(current_time).
The query became: SELECT * FROM events WHERE created_at >= <now>
Result: 0 events returned (nothing has created_at in the future)
Solution: Remove .since() filter from database queries entirely.
The `last_connected` field is now only used for WebSocket subscription
filters to avoid re-fetching events from remote relays on reconnect.
Rationale for this approach over reordering operations:
- Database queries are fast (indexed by kind and created_at)
- Loading all events on startup ensures consistency
- Eliminates subtle ordering dependency that could break in refactoring
- Cleaner mental model: database = full load, WebSocket = incremental
This fixes the issue where ~190 state events weren't being fetched
after deploying the database query fix (commit 4162c90).
Evidence: Production logs showed "Loaded announcements from database
count=0" when there should have been hundreds of announcements.
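The failure mode can be reproduced with a toy model. The sketch below is an in-memory stand-in for the database query; `StoredEvent` and `load_existing` are hypothetical names, not the real types.

```rust
/// Minimal stand-in for a stored event.
struct StoredEvent {
    created_at: u64,
}

/// Count events matching an optional `since` bound (created_at >= since).
fn load_existing(events: &[StoredEvent], since: Option<u64>) -> usize {
    events
        .iter()
        .filter(|e| since.map_or(true, |s| e.created_at >= s))
        .count()
}
```

Passing `Some(now)` as the bound matches nothing, because every stored event was created in the past; passing `None` loads the full set, which is the behaviour after this fix.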
|
|
|
|
Previously, SelfSubscriber only saw events returned by the WebSocket
subscription to the local relay, which has limits on the number of
events returned. This caused repos with announcements in the database
to never get Layer 2/3 filters created, resulting in missing state events.
Now, on startup, we query the database directly with two separate queries:
1. Query announcements (30617) to populate repo_sync_index
2. Query root events (1617/1618/1621) to create Layer 3 filters
Both queries use .since(last_connected) if available for incremental
loading on reconnect.
Filters are created inline and made mutable to support the .since()
clause, rather than using a shared create_event_filter() method.
Fixes the issue where state events were missing for repos like cashbird
and creative-space that had announcements in the database but weren't
returned by the WebSocket subscription.
|
|
Add proper log level configuration following standard approach:
- CLI flag: --log-level <level>
- Environment variable: NGIT_LOG_LEVEL
- Default: info
- Supports simple levels (error, warn, info, debug, trace)
- Supports filter expressions (e.g., ngit_grasp=debug,actix_web=info)
Configuration is now consistent across all four sources:
1. src/config.rs - Config struct with log_level field
2. docs/reference/configuration.md - Full documentation
3. nix/module.nix - NixOS module with logLevel option
4. .env.example - Example configuration file
This replaces the previous RUST_LOG approach with proper integration
into the ngit-grasp configuration system, enabling trace logging from
CLI, environment variables, or NixOS configuration.
|
|
Caught a production bug where an npub in a URL string contained "dns",
triggering a false positive.
|
|
- Add [PARSE_FAIL] logging when event parsing fails
- Add [PURGATORY_EXPIRED] logging when repos expire from purgatory
- Logs include: kind, event_id, repo, npub, reason
- Supports Phase 4 migration scripts (30-extract-*.sh)
- All 382 tests pass
|
|
When git fetch fails with 'upload-pack: not our ref', git stops at the first
missing OID and doesn't attempt to fetch remaining OIDs. This means if we
request 5 OIDs and the first is missing, we never try the other 4 (which may
exist on the remote).
Changes:
- Parse missing OID from stderr for clearer error messages
- Single OID case: 'remote missing only oid requested: <oid>'
- Multi OID case: Log WARNING and indicate other OIDs weren't attempted
- Identifies the bug that needs retry logic to fetch OIDs individually
|
|
The NIP-11 specification requires the pubkey field to be a 64-character
hex string, but we were incorrectly using npub (bech32) format.
Changes:
- Add Config::relay_owner_pubkey_hex() method to get hex format
- Update NIP-11 document to use hex format instead of npub
- Update test to verify 64-char hex string instead of npub format
Fixes nak relay command error:
'must be a hex string of 64 characters'
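A check along the lines of the updated test might look like the sketch below. `is_hex_pubkey` is a hypothetical helper; it only validates the shape of the string and does not perform the bech32-to-hex conversion itself.

```rust
/// True if `s` is a 64-character lowercase hex string, the form used for
/// the NIP-11 `pubkey` field.
fn is_hex_pubkey(s: &str) -> bool {
    s.len() == 64
        && s.bytes()
            .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}
```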
|
|
Modern git clients send Content-Encoding: gzip on POST requests to
/git-upload-pack for efficiency. Without decompression, the compressed
binary data was passed directly to git upload-pack, which expected
pkt-line format, causing:
fatal: protocol error: bad line length character: ??
error: RPC failed; HTTP 500
This was discovered in production when git clone requests consistently
failed with HTTP 500 errors. The fix extracts the Content-Encoding
header and uses flate2::GzDecoder to decompress gzip bodies before
passing them to the git subprocess.
|
|
Refactor internal code to use the mark_negentropy_unsupported() method
instead of direct field access for improved readability.
|
|
When negentropy retry makes no progress (relay returns zero events),
this indicates the relay's negentropy implementation is broken. Instead
of marking the batch as failed, we now:
1. Mark the relay as not supporting NIP-77 so future batches skip
negentropy and use REQ+EOSE directly
2. Fall back to REQ+EOSE using semantic filters (kind/author/tags)
for the current batch, which may succeed where ID-based queries fail
This addresses the issue where some relays (e.g., azzamo.net, snort.social)
return event IDs during negentropy diff but fail to serve those events
when requested by ID.
|
|
Enables relay operators to backup/archive specific GRASP servers by domain.
Includes configuration, validation, documentation, and integration tests.
|
|
NIP-34 specifies single clone/relays tags with multiple values, not multiple
tags with single values. Update test helper to match spec.
|
|
Combined Accept and AcceptArchive match arms in builder.rs to ensure
bare repositories are created for both cases. Previously AcceptArchive
had duplicate code that didn't call ensure_bare_repository().
Also includes:
- Config fix: effective_git_data_path() respects explicit paths with memory backend
- TestRelay: Added git_data_path() and archive config support for testing
- Integration tests for archive_read_only behavior
|
|
Increases connection limit across all configuration sources:
- src/config.rs: default_value_t = 4096
- docs/reference/configuration.md: updated default and examples
- nix/module.nix: maxConnections default = 4096
- .env.example: updated default and comment
This allows the relay to handle more concurrent connections and reduces
the likelihood of connection exhaustion under normal load. The previous
limit of 2000 was too conservative for production deployments.
|
|
- Make RateLimit explicit in relay builder (500 subs, 60 events/min)
- Add NGIT_MAX_CONNECTIONS config option (default: 500)
- Update all 4 config locations (src, nix, docs, .env.example)
- Fix documentation error: filter limit 5000→500
- Document Phase 2 deferral decision (per-IP enforcement)
Addresses primary DoS vector (connection exhaustion) with minimal code.
Per-IP rate limiting deferred until abuse detected in production.
Related: issue ff38 (git endpoint throttling - separate concern)
|
|
shutdown/startup
Implement save/restore functionality for rejected events cache and
integrate persistence with relay shutdown/startup lifecycle. Both
purgatory and rejected cache now survive relay restarts.
Key features:
- Serialize rejected events cache to JSON (rejected-events-cache.json)
- Save both hot cache (2min, full events) and cold index (7day, metadata)
- Restore with downtime adjustment (preserves remaining TTL)
- Graceful degradation (missing/corrupted files don't crash)
- File cleanup after successful restore
- Automatic restoration in SyncManager::new()
Integration:
- Shutdown hook saves both purgatory and rejected cache
- Startup hook restores both and re-queues repositories
- Non-fatal errors (logs warnings, continues on failure)
Files:
- src/sync/rejected_index.rs: save_to_disk/restore_from_disk methods
- src/sync/mod.rs: SyncManager integration and auto-restore
- src/main.rs: Shutdown/startup hooks for both caches
- tests/purgatory_persistence.rs: 17 integration tests
Tests: 13 unit tests + 17 integration tests covering full lifecycle
|
|
Implement save/restore functionality for purgatory state to prevent
event loss during relay restarts. Events in purgatory (state events,
PR events, and expired events) are now saved to disk on graceful
shutdown and restored on startup.
Key features:
- Serialize purgatory state to JSON (purgatory-state.json)
- Time conversion helpers for Instant <-> Duration serialization
- Restore with downtime adjustment (preserves remaining TTL)
- Graceful degradation (missing/corrupted files don't crash)
- File cleanup after successful restore
- get_all_identifiers() for re-queueing after restore
Files:
- src/purgatory/persistence.rs: Time conversion helpers
- src/purgatory/types.rs: Serialization derives
- src/purgatory/mod.rs: save_to_disk/restore_from_disk methods
Tests: 15 unit tests covering serialization, downtime, edge cases
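Because std::time::Instant has no serializable form, one way to persist a deadline is to store the remaining TTL and re-anchor it to a fresh Instant on restore. The sketch below illustrates that idea only. `SavedEntry`, `to_saved`, and `restore` are hypothetical, and whether the real code charges downtime against the TTL is not stated here, so the deduction is left as a parameter.

```rust
use std::time::{Duration, Instant};

/// Serializable form of a deadline: seconds left at save time.
struct SavedEntry {
    remaining_secs: u64,
}

/// Convert an in-memory deadline into its saved form.
fn to_saved(expires_at: Instant, now: Instant) -> SavedEntry {
    SavedEntry {
        remaining_secs: expires_at.saturating_duration_since(now).as_secs(),
    }
}

/// Rebuild a deadline after restart. `charge` is the amount of downtime to
/// deduct (pass Duration::ZERO to keep the saved TTL unchanged).
/// Returns None when nothing is left, i.e. the entry has expired.
fn restore(saved: &SavedEntry, charge: Duration, now: Instant) -> Option<Instant> {
    let remaining = Duration::from_secs(saved.remaining_secs).checked_sub(charge)?;
    if remaining.is_zero() {
        None
    } else {
        Some(now + remaining)
    }
}
```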
|
|
The bug: SelfSubscriber filtered announcements with a lists_our_relay() check,
preventing archive_all mode from discovering relays in announcements that
don't list our relay domain.
The insight: SelfSubscriber only receives events that ALREADY passed
write policy validation (archive_all, archive_whitelist, blacklist, etc.)
via admit_event() before being saved to the database. The event flow:
External relay → process_event_static() → write_policy.admit_event()
→ (validation happens here) → save to DB → notify_event()
→ SelfSubscriber receives via WebSocket
So the lists_our_relay() check was redundant double-validation that
broke archive_all mode by filtering events that had already been
accepted by the write policy.
The fix: Simply remove the lists_our_relay() filtering. Events reaching
SelfSubscriber are pre-validated and should all be processed for relay
discovery according to the configured archive policy.
Changes:
- Removed lists_our_relay() check from process_notification() (4 lines)
- Removed unused lists_our_relay() helper function (9 lines)
- Added comment explaining events are pre-validated (3 lines)
- Total: 13 lines removed, 3 lines added
Fixes #194d
|
|
- Update default bind address in src/config.rs to 127.0.0.1:7334
- Update all four critical config sources per AGENTS.md:
- src/config.rs (code default and tests)
- .env.example (development template)
- docs/reference/configuration.md (user documentation)
- nix/module.nix (NixOS deployment)
- Update all documentation examples and references:
- README.md (with note about phone keypad mnemonic)
- docs/how-to/*.md (deploy, prometheus-setup, test-compliance)
- docs/explanation/*.md (architecture, comparison)
- docs/learnings/grasp-audit.md
Port 7334 spells NGIT on a phone keypad, making it memorable and
project-specific.
All tests pass (336 lib tests + 51 integration tests).
|
|
Adds NGIT_EVENT_BLACKLIST option for blocking all events from specific npubs,
taking precedence over all other validation to enable comprehensive moderation
without affecting curation policy.
Key features:
- Simple npub-only format: <npub>,<npub>,...
- Checked FIRST before any other validation (including repository blacklist)
- Blocks ALL event types (announcements, state events, PRs, comments, etc.)
- Events never reach relay storage or purgatory
- Specific rejection reason for operator debugging
Implementation:
- Add EventBlacklistConfig struct with check() method
- Add NGIT_EVENT_BLACKLIST config option and event_blacklist_config() method
- Add config field to PolicyContext for policy access
- Add check_event_blacklist() to Nip34WritePolicy
- Check event blacklist first in admit_event() method (before any other validation)
- 4 new unit tests covering all blacklist behavior
Configuration synced across all four sources:
- src/config.rs: Core implementation with EventBlacklistConfig
- .env.example: Comprehensive documentation with examples
- docs/reference/configuration.md: Complete reference documentation
- nix/module.nix: NixOS module option with environment mapping
README updates:
- Add comprehensive "Curation & Moderation" section
- Document repository whitelists (GRASP-01 and GRASP-05 modes)
- Document repository and event blacklists with precedence order
- Add configuration table for all curation/moderation settings
- Provide real-world examples for different relay configurations
Testing:
- 4 new tests for event blacklist functionality
- All 336 library tests passing
- All 64 integration tests passing
- All 38 filter support tests passing
Verification:
- Repository blacklist confirmed to apply to sync (uses same admit_event flow)
- Sync events validated through process_event_static -> write_policy.admit_event
Use cases:
- Block spam/abusive users completely
- Prevent malicious actors from submitting any events
- Temporary blocks for investigation
- Moderation without affecting whitelist curation policy
|
|
Adds NGIT_REPOSITORY_BLACKLIST option for blocking repositories, taking precedence
over all whitelists (archive and repository) to enable moderation without affecting
curation policy.
Key features:
- Three blacklist formats: <npub>, <npub>/<identifier>, <identifier>
- Blacklist checked first before any other validation
- Overrides archive whitelist and repository whitelist
- Specific rejection reasons based on match type (npub/identifier/both)
- Not flagged in NIP-11 curation (operational, not policy)
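The three entry formats could be distinguished roughly as sketched below. The names are hypothetical and do not mirror BlacklistConfig. The sketch assumes comma-separated entries, treats anything starting with "npub1" as an npub without bech32 validation, and silently skips malformed entries.

```rust
#[derive(Debug, PartialEq)]
enum BlacklistEntry {
    Npub(String),
    Repo { npub: String, identifier: String },
    Identifier(String),
}

/// Parse one entry: <npub>, <npub>/<identifier>, or <identifier>.
fn parse_entry(raw: &str) -> Option<BlacklistEntry> {
    let raw = raw.trim();
    if raw.is_empty() {
        return None;
    }
    match raw.split_once('/') {
        Some((npub, id)) if npub.starts_with("npub1") && !id.is_empty() => {
            Some(BlacklistEntry::Repo {
                npub: npub.to_string(),
                identifier: id.to_string(),
            })
        }
        // A slash without a valid npub prefix or identifier is malformed.
        Some(_) => None,
        None if raw.starts_with("npub1") => Some(BlacklistEntry::Npub(raw.to_string())),
        None => Some(BlacklistEntry::Identifier(raw.to_string())),
    }
}

/// Parse a comma-separated list, skipping entries that do not parse.
fn parse_list(raw: &str) -> Vec<BlacklistEntry> {
    raw.split(',').filter_map(parse_entry).collect()
}
```

One consequence of this simplification is that an identifier beginning with "npub1" would be read as an npub; a real implementation would need proper bech32 decoding to tell them apart.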
Implementation:
- Add BlacklistConfig struct with check() method returning detailed reasons
- Add NGIT_REPOSITORY_BLACKLIST config option and blacklist_config() method
- Update validate_announcement() to check blacklist first with specific reasons
- 12 new unit tests covering all blacklist behavior and precedence
Configuration synced across all four sources:
- src/config.rs: Core implementation with BlacklistConfig
- .env.example: Comprehensive documentation with examples
- docs/reference/configuration.md: Complete reference documentation
- nix/module.nix: NixOS module option with environment mapping
Testing:
- 12 new tests for blacklist functionality (config + validation)
- All 332 library tests passing
- All 38 integration tests passing
Use cases:
- Block spam/malware repos by identifier
- Block abusive users by npub
- Block specific problematic repos by npub/identifier
- Temporary blocks for investigation
|
|
config methods
Refactors configuration validation to fail fast on fatal errors at startup
while gracefully handling recoverable issues (e.g., malformed whitelist entries).
Changes:
- Add Config::validate() for eager validation called immediately after load
- Remove Result<> from archive_config() and repository_config() methods
- WhitelistEntry::parse_whitelist() skips invalid entries with warnings
- Validate relay_owner_nsec format in Config::validate()
- Update all call sites to remove Result handling from config getters
Benefits:
- Fatal config errors (incompatible settings) fail at startup, not runtime
- Recoverable errors (bad whitelist entries) logged as warnings and skipped
- No Result handling scattered throughout runtime code after validation
- Config methods safe to call without error handling after validate()
Testing:
- Add 7 new tests for validation edge cases and error handling
- Total config tests: 40 (up from 33)
- All 320 library tests passing
Breaking change: Config users must call config.validate() after Config::load()
to ensure configuration is valid. This is enforced in main.rs.
|
|
Adds NGIT_REPOSITORY_WHITELIST option for curated relay operation that
accepts only whitelisted repositories while maintaining GRASP-01 compliance
(announcements must list the service). This differs from archive whitelist
which enables GRASP-05 mode and doesn't require service listing.
Key features:
- Supports three whitelist formats: npub, npub/identifier, identifier
- Enforces mutual exclusivity with archive read-only mode
- Updates NIP-11 curation field when whitelist is enabled
- Maintains GRASP-01 compliance (doesn't add GRASP-05 support)
Configuration synced across all four sources: src/config.rs, docs/reference/configuration.md,
nix/module.nix, and .env.example as required by AGENTS.md.
|
|
Implements NGIT_ARCHIVE_READ_ONLY configuration option that defaults to true
when archive mode is enabled, allowing relays to operate as read-only syncs
of archived repositories.
Key changes:
- Add NGIT_ARCHIVE_READ_ONLY config option (defaults to true if archive enabled)
- NIP-11 advertises GRASP-05 support and includes curation field when read-only
- Validation logic rejects non-whitelisted repos in read-only mode
- Comprehensive tests for read-only behavior and defaults
- Full documentation in config reference, .env.example, and NixOS module
Read-only mode enables passive mirroring without being listed in announcements,
useful for backup/archive operations while preventing accidental write acceptance.
|
|
Implements GRASP-05 specification for accepting repository announcements
that don't list this relay, enabling archive, mirror, and backup use cases.
Core Features:
- Three whitelist formats: <npub>, <npub>/<identifier>, <identifier>
- Archive-all mode for complete ecosystem mirrors
- Fail-fast npub validation at startup
- Read-only enforcement (archived repos reject pushes)
- Full GRASP-02 sync (git data + Nostr events)
- Dynamic archive status (no flags/metadata)
Implementation:
- Add ArchiveWhitelistEntry enum with Pubkey/Repository/Identifier variants
- Add ArchiveConfig with validation and matching logic
- Update AnnouncementResult to include AcceptArchive variant
- Refactor validate_announcement() to return AnnouncementResult with archive check
- Update AnnouncementPolicy with catch-all pattern for cleaner code
- Wire archive config through builder and policy layers
Configuration:
- NGIT_ARCHIVE_ALL: Accept all announcements (⚠️ storage risk)
- NGIT_ARCHIVE_WHITELIST: Comma-separated whitelist entries
- Updated docs, .env.example, and nix/module.nix
Testing:
- 28 unit tests for config parsing and whitelist matching
- 7 integration tests for archive mode validation
- All 296 tests passing
Validation Priority:
1. Lists our service → Accept (GRASP-01, read/write)
2. Is maintainer → AcceptMaintainer (multi-maintainer, read/write)
3. Matches archive config → AcceptArchive (GRASP-05, read-only)
4. None of above → Reject
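The priority order above amounts to a first-match-wins chain. In the sketch below the variant names follow the commit message, while the `Checks` struct and `validate` function are hypothetical simplifications of the real validation inputs.

```rust
#[derive(Debug, PartialEq)]
enum AnnouncementResult {
    Accept,
    AcceptMaintainer,
    AcceptArchive,
    Reject,
}

/// Hypothetical pre-computed answers to the three checks.
struct Checks {
    lists_our_service: bool,
    is_maintainer: bool,
    matches_archive_config: bool,
}

/// Apply the checks in priority order; the first match decides the result.
fn validate(c: &Checks) -> AnnouncementResult {
    if c.lists_our_service {
        AnnouncementResult::Accept // GRASP-01, read/write
    } else if c.is_maintainer {
        AnnouncementResult::AcceptMaintainer // multi-maintainer, read/write
    } else if c.matches_archive_config {
        AnnouncementResult::AcceptArchive // GRASP-05, read-only
    } else {
        AnnouncementResult::Reject
    }
}
```

Because the archive check comes last, a repository that both lists the service and matches the archive whitelist stays read/write.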
Security Considerations:
- Archive-all mode has storage/bandwidth DoS risk
- Identifier-only format matches any pubkey (use npub/identifier for high-value)
- Invalid npubs cause startup failure (fail-fast)
Documentation:
- Concise explanation focused on rationale
- Reference docs updated with all config options
- README updated to reflect completed feature
- Removed from roadmap, added to compliance section
See docs/explanation/grasp-05-archive.md for details.
|
|
Add GRASP-02 to supported_grasps array in NIP-11 relay information
document to advertise proactive sync capability to clients and tools.
|
|
Add mandatory uploadpack.allowFilter capability to support partial clones
and fetches as required by GRASP-01 specification. This enables efficient
git operations for bandwidth-constrained clients (e.g., browser-based git
clients like git-natural-api).
Changes:
- Add uploadpack.allowFilter=true to git subprocess configuration
- Update SmartGitServer test helper with filter support
- Add integration tests for filter capability advertisement and functionality
- Update documentation to reflect filter as required capability
Tests verify:
- Filter capability is advertised in info/refs
- Filtered clones with blob:none work correctly
- Filtered fetches with tree:0 work correctly
|
|
Previously, purgatory sync was using '--depth=1' when fetching OIDs from
remote servers. This created shallow clones with only 1-2 commits instead
of the complete git history.
The fix removes the '--depth=1' flag, allowing git to fetch the complete
commit history chain when fetching specific commit OIDs. This is the
correct behavior for GRASP - users cloning from our relay should get the
full repository history.
Changes:
- Remove '--depth=1' from git fetch command in RealSyncContext::fetch_oids
- Update comment to clarify that full history is fetched
Impact:
- Production repositories will now contain full git history
- Users cloning from the relay will get complete commit chains
- No more 'shallow' files in git repositories
- May be slightly slower due to fetching more data, but correctness is prioritized
Testing:
- All 564 tests pass (276 unit + 288 integration)
- No regressions in existing functionality
Fixes issue documented in work/active-issues/shallow-git-fetch.md
|
|
Implements ngit_repositories_total metric by counting *.git directories
on disk every time /metrics is requested (~15s interval by Prometheus).
This approach is simpler than increment-on-create because:
- No need to pass metrics through the relay builder chain
- Always accurate and self-correcting
- Negligible performance impact (~100-200 dir entries)
Changes:
- Add count_repositories_on_disk() static method to Metrics
- Update Metrics::render() to count repos before encoding metrics
- Pass git_data_path to Metrics::new() in main.rs
- Consolidate metrics tests to avoid global Prometheus registry conflicts
Fixes repository count metric issue from Phase 8 deployment plan.
|
|
When relay_owner_nsec is provided via CLI argument or environment
variable (e.g., read from a file by the NixOS module), trim any
leading/trailing whitespace including newlines. This matches the
behavior when reading from the .relay-owner.nsec file directly.
Fixes issue where NixOS module reads nsec file with 'cat', which
includes the trailing newline, making the nsec invalid when passed
as a CLI argument.
Also reverted the tr workaround in nix/module.nix since ngit-grasp
now handles this correctly.
|
|
Add comprehensive comment explaining why some relays (azzamo.net,
snort.social) return zero events during negentropy retry even when
they have the events. Documents infinite loop prevention logic and
suggests future REQ+EOSE fallback strategy.
|
|
Announcements were being rejected when clone URLs or relay URLs had
trailing slashes that didn't match. Added URL normalization to strip
trailing slashes before comparison, allowing announcements to be
accepted regardless of trailing slash presence.
- Add normalize_url_for_comparison() helper
- Update has_clone_url() and has_relay() to normalize before matching
- Add comprehensive tests for trailing slash scenarios
Fixes issue in work/active-issues/clone-relays-mismatch-validation.md
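A minimal sketch of the comparison, assuming that only surrounding whitespace and trailing slashes are normalized. The function names are hypothetical and the real helper may normalize more than this.

```rust
/// Strip surrounding whitespace and any trailing slashes.
fn strip_trailing_slashes(url: &str) -> &str {
    url.trim().trim_end_matches('/')
}

/// True if any announced URL equals `ours` once both are normalized.
fn lists_url(announced: &[&str], ours: &str) -> bool {
    let ours = strip_trailing_slashes(ours);
    announced.iter().any(|u| strip_trailing_slashes(u) == ours)
}
```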
|
|
Implement domain-level naughty list tracking for git remotes, reusing the
existing NaughtyListTracker from relay sync. This prevents repeated attempts
to fetch from git domains with persistent infrastructure issues (SSL/TLS
certificate errors, DNS failures).
Changes:
- Updated NaughtyListTracker to track both relay URLs and git domains
- Added git_naughty_list field to RealSyncContext for error classification
- Modified fetch_oids() to classify git fetch errors and record naughty domains
- Updated sync_identifier_next_url() to filter out naughty domains during URL selection
- Added git_naughty_list parameter to ThrottleManager for domain queue processing
- Threaded naughty list through start_sync_loop and all sync functions
- Updated all tests to pass naughty list parameter
The naughty list uses 12-hour expiration (configurable) to allow domains to
recover from infrastructure issues. First occurrence logs WARN, repeats log DEBUG.
|
|
|
|
When negentropy sync fails (one or more filters fail during diff), the
code previously left a pending batch and returned early, preventing any
sync from happening. This caused the "No sync targets found" issue.
Changes:
- Track negentropy success with a boolean flag
- On negentropy failure: clean up pending batch and fall through to REQ+EOSE
- Log the fallback at info level for visibility
- Restructure control flow so REQ+EOSE path executes after negentropy failure
This ensures sync always completes using traditional REQ+EOSE when
NIP-77 negentropy is unavailable or fails.
|
|
Add naughty list tracking for relays with persistent infrastructure issues
(DNS failures, TLS certificate errors, protocol violations) to reduce log
noise and provide better visibility via metrics.
Key features:
- Classify errors into naughty (persistent) vs transient (temporary)
- Track naughty relays with category, reason, and occurrence count
- Log WARN on first naughty occurrence, DEBUG on repeats
- Automatic expiration after 12 hours (configurable)
- Prometheus metrics for monitoring naughty relays by category
- Periodic cleanup task integrated with health checker
Components added:
- src/sync/naughty_list.rs: Core naughty list tracker with error classification
- NaughtyListTracker integration in RelayHealthTracker
- Connection error handling updates in sync manager
- Naughty list metrics (total by category, detailed info per relay)
- Config option for naughty_list_expiration_hours (default: 12)
Closes the tracking issues for DNS lookup failures and TLS certificate errors.
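The tracking and expiry behaviour can be sketched as below. This is not the real NaughtyListTracker: the type and method names are hypothetical, categories and reasons are omitted, and expiry is assumed to run from the first occurrence.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry {
    first_seen: Instant,
    occurrences: u32,
}

struct NaughtyList {
    ttl: Duration,
    entries: HashMap<String, Entry>,
}

impl NaughtyList {
    fn new(ttl: Duration) -> Self {
        NaughtyList { ttl, entries: HashMap::new() }
    }

    /// Drop entries older than the TTL.
    fn cleanup(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries
            .retain(|_, e| now.saturating_duration_since(e.first_seen) < ttl);
    }

    /// Record an occurrence. Returns true on the first occurrence (caller
    /// would log at WARN) and false on repeats (caller would log at DEBUG).
    fn record(&mut self, host: &str, now: Instant) -> bool {
        self.cleanup(now);
        let entry = self
            .entries
            .entry(host.to_string())
            .or_insert(Entry { first_seen: now, occurrences: 0 });
        entry.occurrences += 1;
        entry.occurrences == 1
    }

    /// True while the host has an unexpired entry.
    fn is_naughty(&self, host: &str, now: Instant) -> bool {
        self.entries
            .get(host)
            .map_or(false, |e| now.saturating_duration_since(e.first_seen) < self.ttl)
    }
}
```

Taking `now` as a parameter keeps the expiry logic testable without sleeping.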
|
|
Live subscriptions (limit:0, no auto-close) are not tracked in
PendingBatch because they stay open indefinitely for new events.
When they receive EOSE (immediately, since there are no historic events),
handle_eose can't find them in outstanding_subs.
This is expected behavior, not an error. Changed log level from
warn to trace to reduce noise.
Observed in production logs: sync_live() subscriptions with limit:0
complete immediately and trigger this path.
Issue: work/active-issues/eose-unknown-subscription.md
|
|
Removes kind 30618 (state events) from Layer 1 announcement filter and
adds targeted subscriptions using #d (identifier) tags in Layer 2.
Problem: Layer 1 was receiving ALL state events from all relays,
causing 1000+ rejections for repositories we don't host.
Solution:
- Remove Kind::RepoState from build_announcement_filter (Layer 1)
- Add state_event_filters_for_our_repos() function that creates filters
with kind 30618 and #d tags for only our hosted repo identifiers
- Integrate state filters into build_layer2_and_layer3_filters
- Extract unique identifiers from repo refs and batch by 100 per filter
Benefits:
- Dramatically reduces bandwidth and rejection noise (1000+ → ~0)
- More efficient: one filter with multiple identifiers vs broadcast
- Only receive state events for repositories we actually care about
Resolves: work/active-issues/layer1-state-event-oversubscription.md
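The deduplicate-and-batch step can be sketched as follows. `batch_identifiers` is a hypothetical helper that only produces the groups of identifiers; building the actual kind 30618 filters with #d tags from each group is left out.

```rust
use std::collections::BTreeSet;

/// Deduplicate identifiers and split them into groups of at most
/// `batch_size`, one group per filter.
fn batch_identifiers(identifiers: &[&str], batch_size: usize) -> Vec<Vec<String>> {
    let unique: Vec<String> = identifiers
        .iter()
        .map(|s| s.to_string())
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();
    unique
        .chunks(batch_size.max(1)) // guard against a zero batch size
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With the batch size of 100 mentioned above, 250 unique identifiers would yield three filters carrying 100, 100, and 50 identifiers.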
|