| author | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-27 19:14:43 +0000 |
|---|---|---|
| committer | DanConwayDev <DanConwayDev@protonmail.com> | 2026-01-27 20:38:24 +0000 |
| commit | 10eab82164bb91236f2afa6b7919d0710609ba7f (patch) | |
| tree | b24630d4ba2cbd2d7b227ed43d1050bf8fc7c121 /docs | |
| parent | a1573c6018c2e81795dc87d36011604dfed80936 (diff) | |
docs: add ngit-relay troubleshooting guide for permission and corruption issues
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/how-to/migrate-to-ngit-grasp.md | 314 |
1 files changed, 314 insertions, 0 deletions
diff --git a/docs/how-to/migrate-to-ngit-grasp.md b/docs/how-to/migrate-to-ngit-grasp.md
index 62cad87..abe2191 100644
--- a/docs/how-to/migrate-to-ngit-grasp.md
+++ b/docs/how-to/migrate-to-ngit-grasp.md
| @@ -714,3 +714,317 @@ This section documents the specific configuration and lessons learned from migra | |||
| 714 | 2. **Investigate 5 edge cases**: Manual review of unusual states | 714 | 2. **Investigate 5 edge cases**: Manual review of unusual states |
| 715 | 3. **Monitor purgatory**: 382 expired entries indicate sync issues to investigate | 715 | 3. **Monitor purgatory**: 382 expired entries indicate sync issues to investigate |
| 716 | 4. **Plan cutover**: Once re-sync complete, switch DNS/proxy to ngit-grasp | 716 | 4. **Plan cutover**: Once re-sync complete, switch DNS/proxy to ngit-grasp |
| 717 | |||
| 718 | ## ngit-relay Troubleshooting | ||
| 719 | |||
| 720 | This section covers common issues encountered when running ngit-relay in production, including git permission errors and repository corruption. These issues were discovered during the relay.ngit.dev migration and may affect other deployments. | ||
| 721 | |||
| 722 | ### Git Permission Denied Errors | ||
| 723 | |||
| 724 | #### Symptoms | ||
| 725 | |||
| 726 | When cloning repositories, you see: | ||
| 727 | |||
| 728 | ```bash | ||
| 729 | $ git clone https://relay.ngit.dev/npub.../repo.git | ||
| 730 | Cloning into 'repo'... | ||
| 731 | remote: warning: unable to access '/root/.config/git/attributes': Permission denied | ||
| 732 | ``` | ||
| 733 | |||
| 734 | Or in container logs: | ||
| 735 | |||
| 736 | ``` | ||
| 737 | warning: unable to access '/root/.config/git/attributes': Permission denied | ||
| 738 | ``` | ||
| 739 | |||
| 740 | #### Explanation | ||
| 741 | |||
| 742 | This occurs when: | ||
| 743 | 1. Git operations run as a non-root user (typically `nginx` user, UID 101) | ||
| 744 | 2. Git tries to access `/root/.config/git/attributes` for global git configuration | ||
| 745 | 3. The `/root` directory has permissions `0700` (drwx------), preventing non-root users from traversing into it | ||
| 746 | 4. Even though the `attributes` file itself may be world-readable, the nginx user cannot reach it due to parent directory permissions | ||
| 747 | |||
| 748 | **Root cause:** The container runs git commands via fcgiwrap as the nginx user, but `/root` is only accessible by root. | ||
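
The traversal rule described above can be checked with a small helper. This is a minimal sketch, not part of ngit-relay: `first_blocked_dir` is a hypothetical function name, it assumes GNU `stat`, and it inspects only the "other" permission bits, so it assumes the nginx user is neither the owner nor in the group of the directories involved.

```bash
# Print the directory closest to / on the way to FILE that "other" users cannot traverse.
# Prints nothing when every checked directory has the other-execute bit set.
# Usage: first_blocked_dir FILE [BASE]   (only directories below BASE are checked; default /)
first_blocked_dir() (
  dir=$(dirname "$1")
  base=${2:-/}
  blocked=""
  while [ "$dir" != "$base" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    # stat fails when a directory higher up is not traversable;
    # that higher directory is reached later in this upward walk.
    if mode=$(stat -c '%a' "$dir" 2>/dev/null); then
      # The last octal digit holds the "other" bits; an even digit means no execute bit.
      case $mode in
        *[0246]) blocked=$dir ;;
      esac
    fi
    dir=$(dirname "$dir")
  done
  if [ -n "$blocked" ]; then printf '%s\n' "$blocked"; fi
)
```

Run inside the container, `first_blocked_dir /root/.config/git/attributes` would be expected to print `/root` while that directory is `0700`, and nothing once it is traversable.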
| 749 | |||
| 750 | #### Quick Fix (Temporary - Does Not Survive Container Restart) | ||
| 751 | |||
| 752 | This fix resolves the issue immediately but will be lost when containers restart: | ||
| 753 | |||
| 754 | ```bash | ||
| 755 | # For each ngit-relay container, exec in, make /root traversable, and create the git config directory | ||
| 756 | sudo podman exec <container-name> sh -c "chmod 711 /root && mkdir -p /root/.config/git && touch /root/.config/git/attributes && chmod 644 /root/.config/git/attributes" | ||
| 757 | |||
| 758 | # Example for specific containers: | ||
| 759 | sudo podman exec gitnostr-com-ngit-relay sh -c "chmod 711 /root && mkdir -p /root/.config/git && touch /root/.config/git/attributes && chmod 644 /root/.config/git/attributes" | ||
| 760 | |||
| 761 | sudo podman exec relay-ngit-dev-ngit-relay sh -c "chmod 711 /root && mkdir -p /root/.config/git && touch /root/.config/git/attributes && chmod 644 /root/.config/git/attributes" | ||
| 762 | ``` | ||
| 763 | |||
| 764 | **Important:** This fix is temporary and will be lost when the container restarts. For a permanent solution, see the NixOS configuration below. | ||
| 765 | |||
| 766 | #### Permanent Fix (NixOS Configuration) | ||
| 767 | |||
| 768 | For NixOS deployments, add systemd services that automatically fix `/root` permissions after each container start: | ||
| 769 | |||
| 770 | ```nix | ||
| 771 | # In your ngit-relay service configuration (e.g., services/relay-ngit-dev-ngit-relay.nix) | ||
| 772 | |||
| 773 | systemd.services.relay-ngit-dev-fix-root-perms = { | ||
| 774 | description = "Fix /root permissions in relay.ngit.dev container for git access"; | ||
| 775 | after = [ "podman-relay-ngit-dev-ngit-relay.service" ]; | ||
| 776 | requires = [ "podman-relay-ngit-dev-ngit-relay.service" ]; | ||
| 777 | wantedBy = [ "multi-user.target" ]; | ||
| 778 | serviceConfig = { | ||
| 779 | Type = "oneshot"; | ||
| 780 | RemainAfterExit = true; | ||
| 781 | ExecStart = "${pkgs.bash}/bin/bash -c 'sleep 5 && ${pkgs.podman}/bin/podman exec relay-ngit-dev-ngit-relay chmod 711 /root'"; | ||
| 782 | Restart = "on-failure"; | ||
| 783 | RestartSec = "10s"; | ||
| 784 | }; | ||
| 785 | }; | ||
| 786 | ``` | ||
| 787 | |||
| 788 | This changes `/root` permissions from `0700` to `0711`, allowing the nginx user to traverse through `/root` to reach `/root/.config/git/`. | ||
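
The fixed `sleep 5` in the unit above assumes the container accepts `podman exec` within five seconds. One possible alternative is a retry loop. The sketch below is an illustration under that assumption; `retry` is a hypothetical helper, not something shipped with ngit-relay or podman.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}
```

A call such as `retry 10 2 podman exec relay-ngit-dev-ngit-relay chmod 711 /root` would try up to ten times, two seconds apart, instead of relying on a single fixed delay.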
| 789 | |||
| 790 | **Why 711?** | ||
| 791 | - `7` (owner/root): Full read/write/execute | ||
| 792 | - `1` (group): Execute only (traverse) | ||
| 793 | - `1` (other): Execute only (traverse) | ||
| 794 | |||
| 795 | This allows non-root users to traverse through `/root` to access subdirectories, while still protecting `/root` contents from being listed or read. | ||
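
To make the octal breakdown concrete, the following sketch converts an octal mode into the symbolic form that `ls -l` prints. `mode_to_symbolic` is a hypothetical helper written for this guide; it ignores setuid, setgid, and sticky bits.

```bash
# Convert an octal permission mode (e.g. 711 or 0700) to its rwx string.
# Only the last three digits (owner, group, other) are considered.
mode_to_symbolic() (
  mode=$1
  out=""
  # Drop any leading digits beyond the final three.
  while [ "${#mode}" -gt 3 ]; do mode=${mode#?}; done
  while [ -n "$mode" ]; do
    rest=${mode#?}
    digit=${mode%"$rest"}
    case $digit in
      0) out="${out}---" ;;
      1) out="${out}--x" ;;
      2) out="${out}-w-" ;;
      3) out="${out}-wx" ;;
      4) out="${out}r--" ;;
      5) out="${out}r-x" ;;
      6) out="${out}rw-" ;;
      7) out="${out}rwx" ;;
      *) return 1 ;;
    esac
    mode=$rest
  done
  printf '%s\n' "$out"
)
```

`mode_to_symbolic 700` prints `rwx------` and `mode_to_symbolic 711` prints `rwx--x--x`, which are the before and after states of `/root` in this fix.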
| 796 | |||
| 797 | #### Verification | ||
| 798 | |||
| 799 | After applying the fix: | ||
| 800 | |||
| 801 | ```bash | ||
| 802 | # Test that cloning works without permission warnings | ||
| 803 | git clone https://relay.ngit.dev/npub.../repo.git | ||
| 804 | |||
| 805 | # Should clone successfully with no "Permission denied" warnings | ||
| 806 | |||
| 807 | # Verify /root permissions inside container | ||
| 808 | sudo podman exec relay-ngit-dev-ngit-relay ls -ld /root | ||
| 809 | # Should show: drwx--x--x (711) | ||
| 810 | |||
| 811 | # Verify nginx user can access git config | ||
| 812 | sudo podman exec relay-ngit-dev-ngit-relay su -s /bin/sh nginx -c "cat /root/.config/git/attributes" | ||
| 813 | # Should succeed without "Permission denied" | ||
| 814 | ``` | ||
| 815 | |||
| 816 | ### Git Repository Corruption | ||
| 817 | |||
| 818 | #### Symptoms | ||
| 819 | |||
| 820 | When cloning repositories, you see: | ||
| 821 | |||
| 822 | ```bash | ||
| 823 | $ git clone https://relay.ngit.dev/npub.../repo.git | ||
| 824 | Cloning into 'repo'... | ||
| 825 | remote: fatal: bad tree object 8b765235809eb27159657eb4c97fb37d21c29bf0 | ||
| 826 | remote: aborting due to possible repository corruption on the remote side. | ||
| 827 | fatal: early EOF | ||
| 828 | fatal: fetch-pack: invalid index-pack output | ||
| 829 | ``` | ||
| 830 | |||
| 831 | Or when running `git fsck` on the server: | ||
| 832 | |||
| 833 | ``` | ||
| 834 | broken link from tree 7d60270e1904c30ae6cef7b465ef842a9f9f63c3 | ||
| 835 | to tree 8b765235809eb27159657eb4c97fb37d21c29bf0 | ||
| 836 | missing tree 8b765235809eb27159657eb4c97fb37d21c29bf0 | ||
| 837 | ``` | ||
| 838 | |||
| 839 | #### Explanation | ||
| 840 | |||
| 841 | Repository corruption typically occurs due to: | ||
| 842 | |||
| 843 | 1. **Incomplete push operations**: A git push was interrupted mid-transfer, creating a commit that references objects that were never written to disk | ||
| 844 | 2. **Permission issues during push**: The git-receive-pack process couldn't write objects due to permission problems (e.g., files owned by wrong user) | ||
| 845 | 3. **Disk/filesystem issues**: Rare cases of disk errors or filesystem corruption | ||
| 846 | |||
| 847 | **Common pattern:** A commit exists with references to tree objects, but those tree objects are missing from the repository. Sometimes individual blobs (files) exist as "dangling" objects but were never properly linked into the tree structure. | ||
| 848 | |||
| 849 | **Warning signs:** | ||
| 850 | - HEAD file or objects owned by root when they should be owned by the service user (UID 101) | ||
| 851 | - Dangling blobs in `git fsck` output | ||
| 852 | - Recent permission denied errors in logs | ||
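
The first warning sign can be checked mechanically. The sketch below lists files whose owner differs from the expected service UID; `find_wrong_owner` is a hypothetical helper name, and the default of 101 is taken from the UID mentioned above.

```bash
# List every file or directory under DIR whose owner is not EXPECTED_UID.
# Usage: find_wrong_owner DIR [EXPECTED_UID]   (EXPECTED_UID defaults to 101)
find_wrong_owner() {
  find "$1" ! -uid "${2:-101}" -print
}
```

Empty output means everything under the directory is owned by the expected UID. Any path printed (for example a root-owned `HEAD`) is a candidate for the ownership fix described later in this section.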
| 853 | |||
| 854 | #### How to Fix | ||
| 855 | |||
| 856 | **Step 1: Locate the corrupted repository** | ||
| 857 | |||
| 858 | ```bash | ||
| 859 | # SSH to the server | ||
| 860 | ssh dc@ngit.dev | ||
| 861 | |||
| 862 | # Find the repository path | ||
| 863 | # For relay.ngit.dev: /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git | ||
| 864 | # For gitnostr.com: /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git | ||
| 865 | |||
| 866 | cd /persistent/relay-ngit-dev-ngit-relay/data/repos/npub1c03rad0r6q833vh57kyd3ndu2jry30nkr0wepqfpsm05vq7he25slryrnw/axepool.git | ||
| 867 | ``` | ||
| 868 | |||
| 869 | **Step 2: Diagnose the corruption** | ||
| 870 | |||
| 871 | ```bash | ||
| 872 | # Run git fsck to identify missing/corrupted objects | ||
| 873 | git fsck --full | ||
| 874 | |||
| 875 | # Example output: | ||
| 876 | # broken link from tree 7d60270e1904c30ae6cef7b465ef842a9f9f63c3 | ||
| 877 | # to tree 8b765235809eb27159657eb4c97fb37d21c29bf0 | ||
| 878 | # missing tree 8b765235809eb27159657eb4c97fb37d21c29bf0 | ||
| 879 | # dangling blob 94490b902c9bceb6f901cd0c7c25b685e3685d87 | ||
| 880 | |||
| 881 | # List recent commits on all refs to find candidates that reference the missing object | ||
| 882 | git log --all --oneline | head -10 | ||
| 883 | |||
| 884 | # Inspect the broken commit | ||
| 885 | git cat-file -p <commit-hash> | ||
| 886 | # This will show which tree is missing | ||
| 887 | ``` | ||
| 888 | |||
| 889 | **Step 3: Attempt automatic repair** | ||
| 890 | |||
| 891 | Try these in order: | ||
| 892 | |||
| 893 | ```bash | ||
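| 894 | # Back up first: --prune=now would delete the dangling objects that Step 4 relies on | ||
| 895 | cp -a "$PWD" "/tmp/$(basename "$PWD").bak" | ||
| 896 | # Option A: Repack and garbage collect without pruning, then check if corruption is fixed | ||
| 897 | git gc --no-prune && git fsck --full | ||
| 898 | |||
| 899 | # Option B: If that doesn't work, re-unpack the pack files. This is a bare repository, so packs live in objects/pack (not .git/objects/pack). | ||
| 900 | # unpack-objects skips objects that already exist in the repository, so move the packs aside first; -r recovers what it can from a damaged pack | ||
| 901 | mv objects/pack /tmp/pack-aside && mkdir objects/pack && for p in /tmp/pack-aside/*.pack; do git unpack-objects -r < "$p"; done | ||
| 902 | git fsck --full | ||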
| 903 | ``` | ||
| 904 | |||
| 905 | **Step 4: Manual reconstruction (if automatic repair fails)** | ||
| 906 | |||
| 907 | If the missing tree object can be reconstructed from dangling blobs: | ||
| 908 | |||
| 909 | ```bash | ||
| 910 | # 1. Identify what should be in the missing tree | ||
| 911 | # Look at the commit message and nearby commits to understand the structure | ||
| 912 | |||
| 913 | # 2. Find dangling blobs that might belong to the tree | ||
| 914 | git fsck --full | grep "dangling blob" | ||
| 915 | |||
| 916 | # 3. Examine each dangling blob to identify files | ||
| 917 | git cat-file -p 94490b902c9bceb6f901cd0c7c25b685e3685d87 | ||
| 918 | |||
| 919 | # 4. Reconstruct the tree manually | ||
| 920 | # This requires creating a new tree object with the correct structure | ||
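| 921 | # Example (advanced). git mktree expects a TAB between the hash and the filename: | ||
| 922 | git mktree <<EOF | ||
| 923 | 100644 blob <blob-hash>	filename1.rs | ||
| 924 | 100644 blob <blob-hash>	filename2.rs | ||
| 925 | EOF | ||
| 926 | # This outputs a tree hash. If it equals the missing hash, the existing commit is valid again and steps 5-6 can be skipped | ||
| 927 | |||
| 928 | # 5. Otherwise, create a new commit from the fixed root tree (if the rebuilt tree is a subdirectory, rebuild its parent trees with git mktree first) | ||
| 929 | git commit-tree <new-tree-hash> -p <parent-commit> -m "Reconstructed commit message" | ||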
| 930 | # This outputs a new commit hash | ||
| 931 | |||
| 932 | # 6. Update the branch reference | ||
| 933 | git update-ref refs/heads/<branch-name> <new-commit-hash> | ||
| 934 | |||
| 935 | # 7. Clean up | ||
| 936 | git gc --prune=now | ||
| 937 | ``` | ||
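
Because Git object IDs are derived from content, a dangling blob can be matched against a known copy of a file (for example, from a developer's working tree) by hashing that copy the way Git does. The sketch below assumes a SHA-1 repository, which the 40-character hashes in this guide suggest, and that `sha1sum` is available. `git_blob_hash` is a hypothetical helper that mirrors what `git hash-object` computes for a blob.

```bash
# Compute the Git blob ID of FILE without needing a repository.
# A blob ID is the SHA-1 of the header "blob <size>\0" followed by the file content.
git_blob_hash() {
  size=$(wc -c < "$1") || return 1
  # Some wc implementations pad the count with spaces.
  size=$(printf '%s' "$size" | tr -d '[:space:]')
  { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

If the printed ID equals one of the `dangling blob` IDs reported by `git fsck`, that dangling object holds exactly this file's content and can be used in the `git mktree` input.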
| 938 | |||
| 939 | **Step 5: Verify the fix** | ||
| 940 | |||
| 941 | ```bash | ||
| 942 | # Run fsck again - should show no errors | ||
| 943 | git fsck --full | ||
| 944 | |||
| 945 | # Test clone locally | ||
| 946 | git clone /path/to/repo.git /tmp/test-clone | ||
| 947 | |||
| 948 | # Test clone via HTTP | ||
| 949 | git clone https://relay.ngit.dev/npub.../repo.git /tmp/test-clone-http | ||
| 950 | ``` | ||
| 951 | |||
| 952 | **Step 6: Fix ownership and permissions** | ||
| 953 | |||
| 954 | Ensure all repository files are owned by the correct user: | ||
| 955 | |||
| 956 | ```bash | ||
| 957 | # For ngit-relay containers, files should be owned by UID 101 (nginx user) | ||
| 958 | sudo chown -R 101:101 /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git | ||
| 959 | |||
| 960 | # Verify | ||
| 961 | ls -la /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git | ||
| 962 | ``` | ||
| 963 | |||
| 964 | **Step 7: Replicate fix to other instances (if applicable)** | ||
| 965 | |||
| 966 | If you have multiple relay instances (e.g., gitnostr.com and relay.ngit.dev), replicate the fix: | ||
| 967 | |||
| 968 | ```bash | ||
| 969 | # Copy the repaired pack files | ||
| 970 | sudo cp /persistent/relay-ngit-dev-ngit-relay/data/repos/npub.../repo.git/objects/pack/* \ | ||
| 971 | /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git/objects/pack/ | ||
| 972 | |||
| 973 | # Update the branch reference | ||
| 974 | cd /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git | ||
| 975 | git update-ref refs/heads/<branch-name> <new-commit-hash> | ||
| 976 | |||
| 977 | # Fix ownership | ||
| 978 | sudo chown -R 101:101 /persistent/gitnostr-com-ngit-relay/data/repos/npub.../repo.git | ||
| 979 | |||
| 980 | # Clean up | ||
| 981 | git gc --prune=now | ||
| 982 | ``` | ||
| 983 | |||
| 984 | #### Prevention | ||
| 985 | |||
| 986 | To prevent future corruption: | ||
| 987 | |||
| 988 | 1. **Fix permission issues first**: Ensure the permission denied errors are resolved (see previous section) | ||
| 989 | 2. **Monitor for root-owned files**: Files in git repositories should be owned by UID 101, not root | ||
| 990 | 3. **Check disk health**: Run `df -h` to confirm free space and `smartctl -H <device>` to check the drive's SMART health status | ||
| 991 | 4. **Enable git fsck in monitoring**: Periodically run `git fsck` on repositories to catch corruption early | ||
| 992 | |||
| 993 | ```bash | ||
| 994 | # Add to monitoring/cron (example) | ||
| 995 | find /persistent/*/data/repos -name "*.git" -type d | while IFS= read -r repo; do | ||
| 996 | echo "Checking $repo" | ||
| 997 | git -C "$repo" fsck --full 2>&1 | grep -v "^Checking\|^dangling" | ||
| 998 | done | ||
| 999 | ``` | ||
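
For alerting, it can help to turn the `git fsck` output into a pass/fail signal instead of reading it by eye. The filter below is a sketch: `fsck_problems` is a hypothetical helper, and the line prefixes it matches are the `missing` and `broken link` lines shown in the sample output earlier in this section, plus `error` and `fatal` as an assumption about other failure messages.

```bash
# Read `git fsck` output on stdin and print only the lines that indicate damage.
# Exit status is 0 when at least one such line was found and 1 otherwise,
# so the result can drive an alert.
fsck_problems() {
  grep -E '^(missing|broken link|error|fatal)'
}
```

A monitoring job could use it as `git -C "$repo" fsck --full 2>&1 | fsck_problems && echo "corruption in $repo"`, which stays silent for healthy repositories and for repositories that only contain dangling objects.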
| 1000 | |||
| 1001 | #### Real-World Example: axepool.git Corruption | ||
| 1002 | |||
| 1003 | During the relay.ngit.dev migration, the `axepool.git` repository was corrupted: | ||
| 1004 | |||
| 1005 | **Problem:** | ||
| 1006 | - Commit `e84518b` referenced tree `8b765235...` (the `src` directory) | ||
| 1007 | - Tree `8b765235...` was missing from the repository | ||
| 1008 | - Blob `94490b90...` (mint_client.rs) existed as a dangling object but wasn't linked | ||
| 1009 | |||
| 1010 | **Root cause:** | ||
| 1011 | - An incomplete push operation | ||
| 1012 | - Permission issues (HEAD file was owned by root) | ||
| 1013 | - The commit was created but the tree object was never written | ||
| 1014 | |||
| 1015 | **Solution:** | ||
| 1016 | 1. Identified the missing tree should contain: `lib.rs`, `main.rs`, `mint_client.rs` | ||
| 1017 | 2. Found the dangling blob `94490b90...` was `mint_client.rs` | ||
| 1018 | 3. Reconstructed the `src` tree with all three files | ||
| 1019 | 4. Created new commit `e12bc3cf...` with the fixed tree | ||
| 1020 | 5. Updated `refs/heads/add-missing-hooks` to point to the new commit | ||
| 1021 | 6. Ran `git gc --prune=now` to clean up | ||
| 1022 | 7. Replicated fix to gitnostr.com instance | ||
| 1023 | |||
| 1024 | **Result:** Both relays now clone successfully with all files intact. | ||
| 1025 | |||
| 1026 | ### Additional Resources | ||
| 1027 | |||
| 1028 | - **ngit-relay repository**: https://github.com/danconwaydev/ngit-relay | ||
| 1029 | - **Git internals documentation**: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain | ||
| 1030 | - **Podman documentation**: https://docs.podman.io/ | ||