From 82f1fc0d5535eda3fc9eab799d81b3e220dbe4ef Mon Sep 17 00:00:00 2001 From: Your Name Date: Wed, 20 May 2026 02:10:01 +0530 Subject: feat: add tollgate_core component + market config wiring - Add tollgate_core ESP-IDF component (skeleton: cashu, dns, firewall, session) - Add tollgate_platform.c with SPIFFS config backend - Wire market_enabled, market_scan_interval_s, client_auto_switch in config.c - Add lwip_tollgate_hooks.h (updated from feature branch) - Add E2E fix plan, tollgate_core design doc, WPA autodetect plan - Add integration test network helpers - Add CONSOLIDATION.md plan Reverts the broken merge (be4788b) that gutted config.c/tollgate_main.c/tollgate_api.c and replaces it with a clean addition on top of intact master. --- docs/E2E_FIX_PLAN.md | 177 +++++++++++++++++ docs/TOLLGATE_CORE_DESIGN.md | 446 +++++++++++++++++++++++++++++++++++++++++++ docs/WPA_AUTODETECT_PLAN.md | 102 ++++++++++ 3 files changed, 725 insertions(+) create mode 100644 docs/E2E_FIX_PLAN.md create mode 100644 docs/TOLLGATE_CORE_DESIGN.md create mode 100644 docs/WPA_AUTODETECT_PLAN.md (limited to 'docs') diff --git a/docs/E2E_FIX_PLAN.md b/docs/E2E_FIX_PLAN.md new file mode 100644 index 0000000..52f8305 --- /dev/null +++ b/docs/E2E_FIX_PLAN.md @@ -0,0 +1,177 @@ +# E2E Test Stability Fix Plan + +## Problem Statement + +E2E tests on physical boards are failing due to five root causes: +1. **LWIP socket exhaustion** (RC-0) — `LWIP_MAX_SOCKETS=10` was too low for two httpd servers + DNS + DoT + wifistr WebSockets +2. **Over-tuned httpd settings** (RC-1) — setting `max_open_sockets=2` and `keep_alive_enable=false` caused socket leaks by interfering with ESP-IDF's internal session management +3. **Owner auto-grant** (RC-2) — makes "no internet before auth" tests non-deterministic +4. **No boot-ready probe** (RC-3) — tests start before HTTP servers are up +5. 
**Serial monitoring resets** (RC-4) — Python `serial.Serial()` toggles DTR/RTS on USB-Serial/JTAG boards, causing chip resets mid-operation + +### Baseline Test Results (Board A, before fixes) + +| Suite | Pass | Fail | Notes | +|---|---|---|---| +| Smoke | 2/6 | 4 | Port 80 unresponsive, cascading failures | +| Network | 4/7 | 3 | DNS forward + ping after auth (timing) | +| API | 16/20 | 4 | Portal port 80 slow/crashed, captive URIs | +| DNS+Firewall | 15/16 | 1 | Ping after auth (timing) | +| Reset-Auth | 12/15 | 3 | Allotment was 0 (fixed), 2nd payment | +| Session | 14/14 | 0 | Perfect | +| Phase 2 | 12/12 | 0 | Perfect | + +### Verified Test Results (Board ACM2, after all fixes, commit `144b48f`) + +All API endpoints verified working on AP IP `10.192.45.1` with 2-3s delays between requests: +- `GET /usage` — returns session/client counts (50/50 sequential requests passed) +- `GET /portal-config` — returns `{priceSats, stepMs, mintUrl, metric, stepBytes}` +- `GET /whoami` — returns client IP +- `GET /grant_access` — grants firewall access +- `POST /` (payment) — accepts Cashu token, returns `kind:1022` +- `GET /` (port 80 portal) — returns 3829 bytes HTML +- `GET /reset_authentication` — clears all sessions and firewall rules + +Full payment flow verified: check → pay → verify → grant → portal → reset → verify clean state. + +--- + +## Root Causes + +### RC-0: LWIP socket exhaustion (FIXED) + +`CONFIG_LWIP_MAX_SOCKETS=10` in sdkconfig. Socket budget at steady state: + +| Component | Sockets | Notes | +|---|---|---| +| Captive portal (port 80) | 5 | 1 listen + 4 workers (default `max_open_sockets`) | +| API server (port 2121) | 5 | 1 listen + 4 workers | +| DNS server (UDP 53) | 1 | | +| DoT reject (TCP 853) | 1 | | +| wifistr WebSocket x2 | 2 | relay.damus.io + nos.lol | +| **Total** | **14** | **Exceeds LWIP_MAX_SOCKETS=10 by 4** | + +**Fix** (commit `144b48f`): Set `CONFIG_LWIP_MAX_SOCKETS=20` (matching standalone tollgate). 
Use default `max_open_sockets=4` on both servers. Previous fix tried `max_open_sockets=2` which caused worse problems (see RC-1).
+
+### RC-1: Over-tuned httpd settings (FIXED)
+
+Initial fix reduced `max_open_sockets` to 2 and added `keep_alive_enable=false`, `linger_timeout=0`. This caused socket leaks — ESP-IDF's httpd manages its own session pool internally, and overriding these settings interfered with socket lifecycle management.
+
+**Symptoms**: Board works for 10-20 requests, then all HTTP becomes unresponsive. Sockets accumulate in CLOSE_WAIT/TIME_WAIT and never get freed.
+
+**Fix** (commit `144b48f`): Reverted to ESP-IDF defaults for all httpd settings except `stack_size=16384` and `max_uri_handlers`. Default `max_open_sockets=4` and the default `keep_alive_enable` setting work correctly.
+
+### RC-2: Owner auto-grant (FIXED)
+
+`tollgate_core_client_connected()` granted firewall access to the first WiFi client unconditionally. IP was passed as `0` (bug), creating nondeterministic behavior.
+
+**Fix** (commit `c89ab31`): Removed `tollgate_core_fw_grant()` call from `client_connected()`. Owner tracking kept for logging.
+
+### RC-3: No boot-ready probe (PENDING)
+
+Tests use fixed sleeps after flash. No polling for HTTP server readiness.
+
+**Fix**: Add `arch-wait-ready` Makefile target that polls `:2121/usage`.
+
+### RC-4: Serial monitoring resets boards (DISCOVERED)
+
+Python `serial.Serial()` on USB-Serial/JTAG ESP32-S3 boards toggles DTR/RTS during initialization, causing `rst:0x15 (USB_UART_CHIP_RESET)`. This resets the chip even if `dtr=False, rts=False` is set after construction.
+
+**Symptoms**:
+- Board boots successfully, services start, gets IP
+- Python serial read causes immediate `ESP-ROM: boot:0x0 (DOWNLOAD)` or `rst:0x15`
+- Board appears "dead" after testing — actually reset into download mode
+- Earlier sessions attributed this to "socket exhaustion" or "WiFi instability"
+
+**Fix**: Never use Python `serial.Serial()` for monitoring. 
Use `idf.py monitor` (which handles DTR/RTS correctly) or read-only tools. All hardware access must go through Makefile mutex targets. + +--- + +## Fix Steps + +### Step 0: Fix LWIP socket exhaustion — DONE +- [x] Set `CONFIG_LWIP_MAX_SOCKETS=20` via sdkconfig (commit `144b48f`) +- [x] Use default `max_open_sockets` on both HTTP servers (removed override) +- [x] Verified: 50/50 sequential API requests pass on Board ACM2 + +**Files**: `sdkconfig`, `main/captive_portal.c`, `main/tollgate_api.c` + +### Step 1: Kill owner auto-grant — DONE +- [x] Remove `tollgate_core_fw_grant()` from `tollgate_core_client_connected()` (commit `c89ab31`) +- [x] Keep owner tracking for logging + +**Files**: `components/tollgate_core/src/tollgate_core.c` + +### Step 2: HTTP server robustness — DONE +- [x] Add `Connection: close` header to port 80 responses (commit `c89ab31`) +- [x] Increase captive portal stack to 16384 (commit `c89ab31`) +- [x] Use ESP-IDF default socket management (commit `144b48f`) + +**Files**: `main/captive_portal.c`, `main/tollgate_api.c` + +### Step 3: Add API endpoints — DONE +- [x] `GET /portal-config` on port 2121 returning `{priceSats, mintUrl, ...}` (commit `c89ab31`) +- [x] `GET /grant_access` — manual firewall grant (commit `c89ab31`) +- [x] `GET /reset_authentication` — clear all auth (commit `c89ab31`) +- [x] CORS header on portal-config + +**Files**: `main/tollgate_api.c` + +### Step 4: Remove NAPT flush from `fw_revoke_all()` — DONE +- [x] Remove `ip_napt_enable()` toggle that caused 30s hangs (commit `c89ab31`) + +**Files**: `components/tollgate_core/src/tollgate_core_firewall.c` + +### Step 5: Boot-ready probe — PENDING +- [ ] Add `arch-wait-ready` Makefile target that polls `:2121/usage` +- [ ] Update `arch-test-full` to call `arch-wait-ready` first +- [ ] Add 2-3 second delays between test requests (burst rate mitigation) + +**Files**: `physical-router-test-automation/esp32/Makefile` + +### Step 6: Hardware testing — BLOCKED +- [ ] Flash to working 
board via Makefile mutex targets +- [ ] Run `make arch-test-full` +- [ ] Document results +- [ ] Board A stuck in download mode (GPIO0 strapping pin) — needs hardware fix + +--- + +## Burst Rate Limitation + +On USB-Serial/JTAG ESP32-S3 boards, back-to-back HTTP requests with no delay can +overwhelm the WiFi AP stack. With 2-3 second delays between requests, the board +handles 50+ sequential requests reliably. Without delays, rapid bursts of 10+ +requests can cause the WiFi AP to become unresponsive. + +**Mitigation**: E2E tests should include a 2-3 second delay between HTTP requests. +This is a WiFi AP throughput limitation, not a firmware bug. + +## Board Status + +| Board | Port | MAC | Status | +|-------|------|-----|--------| +| Board A | `/dev/ttyACM0` | `94:a9:90:2e:37:7c` | **BROKEN** — stuck in download mode (`boot:0x0`), GPIO0 strapping pin issue, needs hardware fix | +| Board B | `/dev/ttyACM1` | `fc:01:2c:c5:50:50` | Unknown — newly discovered, needs firmware flash | +| Board C | `/dev/ttyACM2` | `20:6e:f1:98:d7:08` | **WORKING** — all endpoints verified, payment flow tested | + +## Key Architecture Decisions + +- **Port 80**: Portal HTML + captive detection URIs only. No API, no state mutation. +- **Port 2121**: All API operations (discovery, payment, grant, reset, whoami, usage, wallet, portal-config). +- **Owner tracking**: Kept for logging/display, no longer grants free internet. +- **Connection: close**: Set on ALL port 80 responses to hint clients. +- **Default httpd settings**: ESP-IDF's built-in session management works correctly. Do not override `max_open_sockets`, `keep_alive_enable`, `linger_timeout`, or timeouts. + +## Execution Order + +Steps 0-4 are DONE (commits `c89ab31`, `144b48f`). +Step 5 (boot-ready probe) is next — code only, no hardware needed. +Step 6 (validation) requires working board via Makefile mutex targets. + +## Hardware Access Rules + +- **ALWAYS** use Makefile mutex targets (`make arch-flash-a`, etc.) 
for hardware access
+- **NEVER** call `esptool.py` directly — bypasses mutex and conflicts with other sessions
+- **NEVER** use Python `serial.Serial()` for monitoring — causes DTR/RTS resets on USB-Serial/JTAG
+- Multiple opencode sessions may be active — mutex prevents board conflicts
diff --git a/docs/TOLLGATE_CORE_DESIGN.md b/docs/TOLLGATE_CORE_DESIGN.md
new file mode 100644
index 0000000..5132cf0
--- /dev/null
+++ b/docs/TOLLGATE_CORE_DESIGN.md
@@ -0,0 +1,446 @@
+# TollGate Core Component: Architecture Design
+
+## Goal
+
+Maintain all TollGate business logic in `esp32-tollgate` as a reusable ESP-IDF
+component (`tollgate_core`), and consume it in `esp-miner` (BitAxe) via the
+**IDF Component Manager**. No code duplication, no manual sync.
+
+## Current State (Pre-Refactoring)
+
+All TollGate modules live flat in `esp32-tollgate/main/`:
+
+```
+esp32-tollgate/main/
+  cashu.c / cashu.h
+  dns_server.c / dns_server.h
+  firewall.c / firewall.h
+  session.c / session.h
+  tollgate_api.c / tollgate_api.h
+  tollgate_client.c / tollgate_client.h
+  config.c / config.h
+  ...
+```
+
+The ESP-Miner port (`esp-miner/main/tollgate_*.c`) is a manual copy with edits:
+functions renamed with a `tollgate_` prefix (`cashu_` → `tollgate_cashu_`), NVS
+config instead of the `config.h` singleton, wallet integration removed, and
+cross-module wiring moved.
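Each of those edits exists because the two copies read the same values from different places. A minimal host-side sketch of the alternative is shown here: one copy of the logic, with storage reached only through function pointers. Everything in it is hypothetical. `platform_cb_t`, the two stub backends and their values, and the `core_allotment_ms()` rule (whole steps only) are illustrative, and the struct mirrors only two fields of the `tollgate_platform_t` interface this document proposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down callback table mirroring two fields of the
 * proposed tollgate_platform_t interface. */
typedef struct {
    uint16_t (*get_price_sats)(void);
    int32_t  (*get_step_ms)(void);
} platform_cb_t;

/* Backend A: stands in for the standalone config singleton (illustrative values). */
uint16_t singleton_price(void) { return 21; }
int32_t  singleton_step(void)  { return 60000; }

/* Backend B: stands in for ESP-Miner's NVS reads (illustrative values). */
uint16_t nvs_price(void) { return 10; }
int32_t  nvs_step(void)  { return 15000; }

/* Shared core logic: milliseconds of access bought by a payment.
 * Only whole steps are credited; a remainder below one step's price is dropped.
 * Configuration is read only through the callbacks, never from storage. */
int64_t core_allotment_ms(const platform_cb_t *p, uint64_t paid_sats) {
    assert(p != NULL && p->get_price_sats != NULL && p->get_step_ms != NULL);
    uint16_t price = p->get_price_sats();
    if (price == 0) {
        return 0; /* misconfigured price: grant nothing rather than divide by zero */
    }
    uint64_t steps = paid_sats / price;
    return (int64_t)steps * (int64_t)p->get_step_ms();
}
```

With the standalone stub, a 42-sat payment buys 2 steps of 60000 ms (120000 ms). With the NVS stub, the same payment buys 4 steps of 15000 ms (60000 ms). `core_allotment_ms()` itself is the same code for both consumers.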
+ +### Shared Code by Module + +| Module | Shared % | Key Differences | +|--------|----------|-----------------| +| cashu | 73% | Config access, mint check parameterized | +| dns_server | 74% | Minor logic reorder, logging stripped | +| firewall | 94% | Cross-module DNS notification moved | +| session | 79% | Bytes metric stripped, DNS notification added | +| tollgate_api vs tollgate.c | 13% | Full rewrite (HTTP server vs library API) | +| tollgate_client | 0% | No ESP-Miner equivalent | + +## Target Architecture + +### Directory Layout (in `esp32-tollgate`) + +``` +esp32-tollgate/ + components/ + tollgate_core/ ← shared ESP-IDF component + CMakeLists.txt + idf_component.yml ← component metadata for IDF Component Manager + include/ + tollgate_core.h ← public API + tollgate_platform.h ← platform interface (config/state callbacks) + src/ + tollgate_core_cashu.c ← from main/cashu.c + tollgate_core_cashu.h + tollgate_core_dns.c ← from main/dns_server.c + tollgate_core_dns.h + tollgate_core_firewall.c ← from main/firewall.c + tollgate_core_firewall.h + tollgate_core_session.c ← from main/session.c + tollgate_core_session.h + nucula_lib/ ← stays as-is (git submodule + wrapper) + CMakeLists.txt + nucula_wallet.cpp / .h + main/ + tollgate_platform.c ← standalone impl of tollgate_platform.h + tollgate_api.c / .h ← standalone HTTP server (unchanged) + tollgate_client.c / .h ← standalone client mode (unchanged) + config.c / config.h ← standalone config (unchanged) + ... 
+``` + +### How ESP-Miner Consumes It + +In `esp-miner/main/idf_component.yml`: + +```yaml +dependencies: + tollgate/core: + git: https://github.com//esp32-tollgate.git + path: components/tollgate_core +``` + +ESP-Miner provides only: + +``` +esp-miner/main/ + tollgate_platform.c ← implements tollgate_platform.h (NVS config) + tollgate.c / .h ← ESP-Miner orchestrator (owner detection, WiFi events) + tollgate_page.html ← captive portal payment UI + lwip_tollgate_hooks.h ← LWIP hook (stays in esp-miner) + http_server.c ← modified to call tollgate_core API +``` + +### Why IDF Component Manager (not submodule) + +| Aspect | IDF Component Manager | Git Submodule | +|--------|----------------------|---------------| +| What's downloaded | Only `components/tollgate_core/` | Entire `esp32-tollgate` repo | +| Update mechanism | Modify version in yml, rebuild | Manual `git submodule update` | +| Transitive deps | Automatic (nucula_lib resolved) | Must manage manually | +| CI/CD | Automatic on `idf.py build` | Needs `--recursive` clone | +| Offline after first build | Yes (cached in managed_components) | Yes | +| Contributor friction | Low (automatic) | Moderate (forgot --recursive) | + +ESP-Miner never reaches into tollgate_core's source tree. It calls a clean API +and provides a platform implementation. This is exactly the "packaged API +consumption" pattern the Component Manager is designed for. + +### Why Git Submodule for nucula (not Component Manager) + +nucula is consumed differently — it's a **raw source integration**: + +```cmake +# nucula_lib/CMakeLists.txt reaches INTO the submodule and cherry-picks files: +set(NUCULA_SRC ${CMAKE_CURRENT_SOURCE_DIR}/../../nucula_src/main) +idf_component_register( + SRCS "nucula_wallet.cpp" + "${NUCULA_SRC}/crypto.c" # cherry-picked + "${NUCULA_SRC}/wallet.cpp" # cherry-picked + "${NUCULA_SRC}/cashu_json.cpp" # cherry-picked (6 of ~20 files) + "${NUCULA_SRC}/nut10.cpp" + "${NUCULA_SRC}/hex.c" + "${NUCULA_SRC}/http.c" + ... 
+)
+```
+
+The Component Manager downloads packaged components — you get everything or
+nothing. You can't say "give me this component but only compile these 6 files
+from it." A git submodule gives you the raw source tree on disk, which is what
+cherry-picking requires.
+
+**Principle:** Need to reach into a source tree and pick files? → Submodule.
+Only need a clean API? → Component Manager.
+
+### The Platform Interface
+
+```c
+// components/tollgate_core/include/tollgate_platform.h
+
+#ifndef TOLLGATE_PLATFORM_H
+#define TOLLGATE_PLATFORM_H
+
+#include <stdbool.h>   // bool
+#include <stdint.h>    // uint16_t, int32_t, int64_t
+
+typedef struct {
+    // Config access (each project implements its own storage)
+    uint16_t     (*get_price_sats)(void);
+    int32_t      (*get_step_ms)(void);
+    const char * (*get_mint_url)(void);
+    const char * (*get_metric)(void);    // "milliseconds" or "bytes"
+    int32_t      (*get_step_bytes)(void);
+
+    // Time source
+    int64_t      (*get_time_ms)(void);
+
+    // Wallet integration: called after proofs verified, before session create
+    // Return true to proceed, false to reject payment
+    // Can be NULL (accepts payment without spending proofs — double-spend risk)
+    bool         (*spend_proofs)(const char *raw_token_json);
+} tollgate_platform_t;
+
+#endif // TOLLGATE_PLATFORM_H
+```
+
+**Standalone implementation** (`main/tollgate_platform.c`):
+- Reads from `tollgate_config_get()` singleton (SPIFFS-backed)
+- `spend_proofs` calls `nucula_wallet_receive()` to swap proofs at the mint
+
+**ESP-Miner implementation** (`main/tollgate_platform.c`):
+- Reads from `nvs_config_get_*()` (NVS flash)
+- `spend_proofs` is initially NULL (ESP-Miner integration phase 1: accept without spending)
+- Later: calls nucula_wallet_receive when wallet component is integrated
+
+### Wallet Integration: The Double-Spend Problem
+
+The `spend_proofs` hook exists because of a real security gap:
+
+```
+Client sends Cashu token
+         │
+         ▼
+cashu_decode_token()          ← extract proofs
+         │
+         ▼
+cashu_check_proof_states()    ← HTTP POST to mint /v1/checkstate: "unspent?"
+ │ + ▼ +spend_proofs() ← THE CRITICAL STEP + │ standalone: nucula_wallet_receive() → swap at mint + │ esp-miner: NULL → skipped (double-spend window) + ▼ +session_create() ← grant client access +``` + +Without `spend_proofs`, a client can replay the same token on multiple devices. +Both check "unspent?" → both say yes → both grant access. The swap step marks +proofs as spent at the mint, closing the window. + +ESP-Miner accepts this risk initially. When `spend_proofs` is NULL, the +component logs a warning. Phase 2 of ESP-Miner integration adds nucula and +implements the hook. + +### Cross-Module Wiring (Internal to tollgate_core) + +The `session → firewall → dns_server` notification chain stays internal: + +``` +tollgate_core_session_create() + → tollgate_core_firewall_grant(ip) + → tollgate_core_dns_set_authenticated(ip, true) + +tollgate_core_session_revoke() + → tollgate_core_firewall_revoke(ip) + → tollgate_core_dns_set_authenticated(ip, false) +``` + +Consumers never see this. They call `tollgate_core_process_payment()` and +`tollgate_core_tick()`. The internal wiring is an implementation detail. + +### Full Dependency Graph + +``` +esp-miner + └── IDF Component Manager → tollgate_core (API-level boundary) + ├── CMakeLists.txt REQUIRES: nucula_lib + └── Platform: esp-miner provides tollgate_platform_t (NVS-backed) + +esp32-tollgate (standalone) + └── tollgate_core (local component, same repo) + ├── CMakeLists.txt REQUIRES: nucula_lib + └── Platform: main/tollgate_platform.c (config singleton-backed) + +nucula_lib (local component in esp32-tollgate) + └── cherry-picks source files from nucula_src/ (git submodule → zeugmaster/nucula) +``` + +### Dependency Chain for IDF Component Manager + +When `esp-miner` declares: + +```yaml +dependencies: + tollgate/core: + git: https://github.com//esp32-tollgate.git + path: components/tollgate_core +``` + +The Component Manager: +1. Clones `esp32-tollgate` (or fetches the component archive) +2. 
Reads `tollgate_core/idf_component.yml` → finds dependency on `nucula_lib`
+3. Since `nucula_lib` is a sibling component in the same repo, is expected to
+   resolve it from the same clone (unverified, see the note below)
+4. Downloads into `managed_components/`
+5. `nucula_lib` depends on `secp256k1` (local component) and `nucula_src`
+   (submodule) — these must be available within the cloned repo
+
+**Note:** Steps 3 and 5 need verification. The IDF Component Manager may or may
+not make sibling components available, or initialize submodules (such as
+`nucula_src`), within a git-sourced dependency. This needs testing. If it
+doesn't, `nucula_lib` may need to bundle the required nucula source files
+directly instead of referencing a submodule.
+
+## Blocking Dependencies
+
+This refactoring **must not proceed** until these branches land on master:
+
+| Branch | Blocking Files | Status |
+|--------|---------------|--------|
+| `feature/multi-mint-support` | `cashu.c`, `tollgate_api.c`, `main/CMakeLists.txt`, `nucula_wallet.cpp/h`, `captive_portal.c`, `mint_health.c/h`, `config.c/h` | **In progress** |
+| `feature/price-discovery` | `tollgate_api.c`, `tollgate_client.c`, `main/CMakeLists.txt`, `config.c/h`, `beacon_price.c/h`, `market.c/h` | **In progress** |
+| `feature/cvm-integration` | Same commit as master — no new changes | **Merged already** |
+
+**Specific conflicts if we refactor now:**
+- Moving `cashu.c` → `tollgate_core_cashu.c` while multi-mint modifies `cashu.c`
+- Moving `dns_server.c` while price-discovery may touch it
+- Modifying `main/CMakeLists.txt` (remove SRCS) while all branches modify it
+- Modifying `tollgate_api.c` call sites while multi-mint and price-discovery modify it
+
+## Refactoring Plan (After Blocking PRs Merge)
+
+### Phase 0: Prerequisites
+
+- [ ] All blocking PRs merged to master
+- [ ] This branch rebased onto latest master
+- [x] Full build passes on master
+
+### Phase 1: Create Component Skeleton
+
+- [x] Create `components/tollgate_core/` directory structure
+- [x] Create 
`components/tollgate_core/include/tollgate_core.h` (public API) +- [x] Create `components/tollgate_core/include/tollgate_platform.h` (platform interface) +- [x] Create `components/tollgate_core/idf_component.yml` (component metadata) +- [x] Create `components/tollgate_core/CMakeLists.txt` (register component) +- [ ] Verify empty component builds without errors + +### Phase 2: Move Core Modules (one at a time, build after each) + +- [x] Copy `main/cashu.c/h` → `components/tollgate_core/src/tollgate_core_cashu.c/h` + - [x] Rename functions: `cashu_*` → `tollgate_core_cashu_*` + - [x] Replace `tollgate_config_get()` calls with parameterized arguments + - [x] Remove direct `config.h` include + - [ ] Build and verify +- [x] Copy `main/dns_server.c/h` → `components/tollgate_core/src/tollgate_core_dns.c/h` + - [x] Rename functions: `dns_server_*` → `tollgate_core_dns_*` + - [x] No platform dependencies (pure LWIP) — clean copy + - [ ] Build and verify +- [x] Copy `main/firewall.c/h` → `components/tollgate_core/src/tollgate_core_firewall.c/h` + - [x] Rename functions: `firewall_*` → `tollgate_core_firewall_*` / `tollgate_core_fw_*` + - [x] Internalize `dns_set_authenticated` calls (kept within component) + - [x] Remove `dns_server.h` external dependency + - [ ] Build and verify +- [x] Copy `main/session.c/h` → `components/tollgate_core/src/tollgate_core_session.c/h` + - [x] Rename functions: `session_*` → `tollgate_core_session_*` + - [x] Replace `config.h` calls with platform callbacks for metric check + - [x] Internalize firewall notification (already calls firewall directly) + - [x] Support both time and bytes metrics (portable, not stripped) + - [ ] Build and verify + +### Phase 3: Wire Component API + +- [x] Implement `tollgate_core_init(const tollgate_platform_t *platform, esp_ip4_addr_t ap_ip)` — stores platform, inits all sub-modules +- [x] Implement `tollgate_core_process_payment(ip, token)` — decode → verify → spend → create session +- [x] Implement 
`tollgate_core_client_connected(mac, ip)` — owner detection + firewall check
+- [x] Implement `tollgate_core_client_disconnected(mac)` — session cleanup + owner reassign
+- [x] Implement `tollgate_core_tick()` — session expiry check
+- [x] Implement `tollgate_core_get_status_json()` — JSON status
+- [x] Implement `tollgate_core_get_config_json()` — JSON config (via platform)
+- [x] Build and verify standalone
+
+### Phase 4: Standalone Platform Implementation
+
+- [x] Create `main/tollgate_platform.c` implementing `tollgate_platform_t`
+  - [x] `get_price_sats` → `tollgate_config_get()->price_per_step`
+  - [x] `get_step_ms` → `tollgate_config_get()->step_size`
+  - [x] `get_mint_url` → `tollgate_config_get()->mint_url`
+  - [x] `get_metric` → `tollgate_config_get()->metric`
+  - [x] `get_step_bytes` → `tollgate_config_get()->step_bytes`
+  - [x] `get_time_ms` → `xTaskGetTickCount() * portTICK_PERIOD_MS` (the product is computed in 32-bit `TickType_t` arithmetic and wraps after about 49.7 days of uptime; `esp_timer_get_time() / 1000` yields a 64-bit millisecond count)
+  - [x] `spend_proofs` → stub returning true (wallet called separately)
+- [x] Update `main/tollgate_api.c` to call `tollgate_core_*` instead of direct module calls
+- [x] Update `main/tollgate_main.c` init sequence
+- [x] Remove old `main/cashu.c`, `main/dns_server.c`, `main/firewall.c`, `main/session.c` from CMakeLists.txt
+- [x] Update `main/CMakeLists.txt` (remove old SRCS, add `tollgate_platform.c`, add `tollgate_core` to REQUIRES)
+- [x] Update `main/lwip_tollgate_hooks.h` to call `tollgate_core_ip4_canforward_filter`
+- [x] Full standalone build + test (verified: `c8c68dc` — build passes, 61/61 unit tests pass)
+
+### Phase 4.5: Physical Board E2E Testing (Board A)
+
+- [x] Create `tests/integration/helpers/network.mjs` (shared test utilities)
+- [x] Add arch test Makefile targets with mutex protection to `physical-router-test-automation/esp32/Makefile`
+- [x] Add top-level Makefile wrappers for arch tests
+- [ ] Acquire Board A mutex lock
+- [ ] Flash arch firmware to Board A
+- [ ] Verify boot via serial (no panics, services started)
+- [ ] Connect WiFi to Board 
A AP +- [ ] Run smoke test (`arch-test-smoke`) +- [ ] Run network test (`arch-test-network`) +- [ ] Run API test (`arch-test-api`) +- [ ] Run DNS + firewall test (`arch-test-dns-fw`) +- [ ] Run reset auth test (`arch-test-reset`) +- [ ] Run session expiry test (`arch-test-session`) +- [ ] Run phase 2 API test (`arch-test-phase2`) +- [ ] Commit and push test results +- [ ] Release Board A mutex lock + +### Phase 5: ESP-Miner Integration + +- [ ] Update `esp-miner/main/idf_component.yml` to add tollgate_core dependency +- [ ] Create `esp-miner/main/tollgate_platform.c` implementing `tollgate_platform_t` + - [ ] Config reads from NVS (`nvs_config_get_*`) + - [ ] `spend_proofs` = NULL initially (Phase 1: accept without spending) +- [ ] Update `esp-miner/main/tollgate.c` to call `tollgate_core_*` API +- [ ] Remove `esp-miner/main/tollgate_cashu.c`, `tollgate_dns.c`, `tollgate_firewall.c`, `tollgate_session.c` +- [ ] Update `esp-miner/main/CMakeLists.txt` (remove old SRCS) +- [ ] Full ESP-Miner build + test + +### Phase 6: Verify Component Manager Flow + +- [ ] Remove local `managed_components/` if present +- [ ] Run `idf.py reconfigure` in esp-miner — verify Component Manager downloads tollgate_core +- [ ] Run `idf.py build` — verify transitive dependency resolution (nucula_lib + nucula_src) +- [ ] Test that submodule within nucula_src is properly initialized by Component Manager +- [ ] If submodule init fails: bundle nucula source files directly in nucula_lib instead + +### Phase 7: Documentation and Cleanup + +- [ ] Update `esp-miner/main/idf_component.yml` with correct git URL +- [ ] Update `esp-miner/TOLLGATE_PR_PLAN.md` to reflect component-based architecture +- [ ] Add `docs/` to `tollgate_core` with integration guide for new consumers +- [ ] Update `esp-miner/TOLLGATE_CHECKLIST.md` +- [ ] Verify both projects build clean from scratch + +## Open Questions + +- [ ] Does the IDF Component Manager initialize git submodules within git-sourced dependencies? 
+- [ ] Should tollgate_core publish to the ESP Component Registry (public) or stay git-only? +- [ ] What versioning scheme for tollgate_core? (semver tags in esp32-tollgate?) + +## Performance Optimization Backlog + +### Burst Rate Limitation (KNOWN ISSUE) + +USB-Serial/JTAG ESP32-S3 boards have a WiFi AP throughput ceiling. Back-to-back +HTTP requests with no delay (>10 requests/sec) can overwhelm the AP stack, +causing TCP connections to time out. With 2-3 second delays between requests, +the board handles 50+ sequential requests reliably. + +**Mitigation**: E2E tests include 2-3 second delays between requests. This is +a WiFi AP limitation, not a firmware bug. + +### Serial Monitoring Causes Resets (DISCOVERED) + +Python `serial.Serial()` on USB-Serial/JTAG ESP32-S3 boards toggles DTR/RTS +during initialization, causing `rst:0x15 (USB_UART_CHIP_RESET)`. This resets +the chip even if `dtr=False, rts=False` is set post-construction. Multiple +sessions accessing serial ports without mutex coordination compound the issue. + +**Mitigation**: All hardware access goes through Makefile mutex targets. Never +use Python `serial.Serial()` directly. Use `idf.py monitor` for serial output. + +### Captive Detection Flood +- [ ] Rate-limit or debounce captive detection URI handlers (`/generate_204`, `/hotspot-detect.html`, etc.) 
to prevent socket exhaustion from OS/browser probes
+- [ ] Consider single-handler approach: all captive URIs return a minimal 204/302 without processing HTML template
+- [ ] Evaluate `lru_purge_enable = true` with tuned `max_open_sockets` and `recv_wait_timeout`
+
+### Static Portal HTML (No Dynamic Template Substitution)
+- [ ] Replace `__AP_IP__`, `__PRICE__`, `__MINT_URL__` template substitution with static const HTML
+- [ ] Portal JS fetches config at load time from `:2121/` API (already returns `kind=10021` with `price_per_step` and mint URL)
+- [ ] Eliminates `malloc()` + `strstr()` loop per request — zero-computation static serve
+- [ ] Reduces portal handler latency from ~47s to near-instant
+
+### HTTP Server Tuning
+
+**IMPORTANT**: Use ESP-IDF defaults for `max_open_sockets`, `keep_alive_enable`,
+`linger_timeout`, `recv_wait_timeout`, and `send_wait_timeout`. Overriding these
+causes socket leaks (verified: `max_open_sockets=2` + `keep_alive_enable=false`
+caused complete socket exhaustion after 15-20 requests).
+
+- [x] Set `stack_size=16384` on both servers (fixed ESP_ERR_HTTPD_TASK)
+- [x] Set `CONFIG_LWIP_MAX_SOCKETS=20` (matches standalone tollgate)
+- [x] Use default `max_open_sockets=4` on both servers
+- [x] Separate `ctrl_port` values for portal vs API servers
+- [ ] Consider `lru_purge_enable = true` for production tuning
+- [ ] Should `tollgate_client.c` (client mode) eventually move into tollgate_core?
diff --git a/docs/WPA_AUTODETECT_PLAN.md b/docs/WPA_AUTODETECT_PLAN.md
new file mode 100644
index 0000000..8228b1a
--- /dev/null
+++ b/docs/WPA_AUTODETECT_PLAN.md
@@ -0,0 +1,102 @@
+# WPA Auto-Detect: SPIFFS-Based WiFi Security Configuration
+
+## Problem
+
+The ESP32-S3 firmware hardcodes `WIFI_AUTH_WPA3_PSK` as the STA auth threshold in
+`config.c:289`. When the upstream router uses WPA2-PSK only, the ESP32 scan filter
+rejects the AP and reports reason=211
+(`WIFI_REASON_NO_AP_FOUND_IN_AUTHMODE_THRESHOLD`).
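One way to avoid this rejection, and the one this plan works toward, is to choose the threshold from configuration instead of hardcoding WPA3. Below is a minimal host-side sketch of that mapping. It follows the rule in this plan's checklist: `"WPA3"` selects WPA3, and anything else falls back to WPA2. The enum, struct, and function names are hypothetical stand-ins so the sketch compiles without ESP-IDF. In firmware, the values would be `WIFI_AUTH_WPA2_PSK` / `WIFI_AUTH_WPA3_PSK`, and the assignment would target `wifi_config->sta.threshold.authmode`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for ESP-IDF's WIFI_AUTH_WPA2_PSK / WIFI_AUTH_WPA3_PSK
 * (wifi_auth_mode_t), so the sketch builds on a host. */
typedef enum {
    SKETCH_AUTH_WPA2_PSK,
    SKETCH_AUTH_WPA3_PSK
} sketch_auth_mode_t;

/* Hypothetical stand-in for the sta.threshold part of the Wi-Fi config. */
typedef struct {
    sketch_auth_mode_t authmode;
} sketch_sta_threshold_t;

/* Map the "wifi_auth_mode" string from config.json to a scan threshold.
 * "WPA3" (exact match) selects WPA3; any other value, or a missing key passed
 * as NULL, falls back to the WPA2 default. */
sketch_auth_mode_t auth_threshold_from_config(const char *wifi_auth_mode) {
    if (wifi_auth_mode != NULL && strcmp(wifi_auth_mode, "WPA3") == 0) {
        return SKETCH_AUTH_WPA3_PSK;
    }
    return SKETCH_AUTH_WPA2_PSK;
}

/* Replaces the hardcoded assignment. Because the threshold is a minimum
 * ("or better"), a WPA2 threshold still admits WPA3 APs. */
void apply_auth_threshold(sketch_sta_threshold_t *threshold, const char *wifi_auth_mode) {
    assert(threshold != NULL);
    threshold->authmode = auth_threshold_from_config(wifi_auth_mode);
}
```

Matching here is exact and case-sensitive, so `"wpa3"` also falls back to WPA2.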
+ +## Root Cause + +```c +// config.c:289 — BEFORE +wifi_config->sta.threshold.authmode = WIFI_AUTH_WPA3_PSK; +``` + +The `threshold.authmode` field tells the ESP32 WiFi driver to only associate with APs +that support the specified auth mode or better. WPA3-only threshold means WPA2 APs are +invisible during scan. + +## Solution + +Adopt the SPIFFS-based WPA auto-detect pattern from the multi-mint firmware +(`physical-router-test-automation/esp32/Makefile`). The approach: + +1. **Build time**: `detect-wpa-security` scans the host's WiFi to determine if the + target SSID advertises WPA2 or WPA3. +2. **SPIFFS generation**: `generate-spiffs` writes a `config.json` with the detected + `wifi_auth_mode` field. +3. **Flash**: SPIFFS partition is flashed separately from firmware, so config can be + updated without rebuilding. +4. **Runtime**: Firmware parses `wifi_auth_mode` from `config.json` and maps it to the + correct `wifi_auth_mode_t` threshold. + +## Files to Modify + +### Firmware (`esp32-tollgate-arch`) + +| File | Change | +|------|--------| +| `main/config.h` | Add `wifi_auth_threshold` field to `tollgate_config_t` | +| `main/config.c` | Parse `wifi_auth_mode` from config.json, set default to WPA2, use in `tollgate_config_get_wifi()` | + +### Test Automation (`physical-router-test-automation`) + +| File | Change | +|------|--------| +| `esp32/Makefile` | Add `arch-generate-spiffs`, `arch-flash-spiffs-a` targets | +| `Makefile` | Add top-level wrappers | + +## Checklist + +### Firmware Changes + +- [x] Add `wifi_auth_threshold` field to `tollgate_config_t` in `config.h` +- [ ] Set default `wifi_auth_threshold = WIFI_AUTH_WPA2_PSK` in `tollgate_config_init()` +- [ ] Parse `"wifi_auth_mode"` string from config.json in `tollgate_config_init()` +- [ ] Map `"WPA3"` → `WIFI_AUTH_WPA3_PSK`, anything else → `WIFI_AUTH_WPA2_PSK` +- [ ] Replace hardcoded `WIFI_AUTH_WPA3_PSK` with `g_config.wifi_auth_threshold` in `tollgate_config_get_wifi()` +- [ ] Build succeeds (`idf.py 
build`) + +### Makefile Changes + +- [ ] Add `arch-generate-spiffs` target to `esp32/Makefile` +- [ ] Add `arch-flash-spiffs-a` target to `esp32/Makefile` (requires lock-a) +- [ ] Add top-level wrappers in `Makefile` +- [ ] Add help text entries + +### Build & Flash + +- [ ] Rebuild firmware with WPA auto-detect support +- [ ] Acquire Board A lock +- [ ] Run `detect-wpa-security` to confirm WPA2 detection +- [ ] Run `arch-generate-spiffs` to build SPIFFS image +- [ ] Run `arch-flash-a` to flash firmware (full erase + rebuild) +- [ ] Run `arch-flash-spiffs-a` to flash SPIFFS with WPA2 config +- [ ] Wait for boot, connect to Board A AP + +### Verification + +- [x] Serial log shows STA connected to upstream WiFi (no more reason=211) +- [x] Serial log shows "TollGate services started" +- [x] API on port 2121 reachable +- [x] Portal on port 80 reachable +- [x] Cashu payment works: `cashu send --legacy 21` → POST to `:2121` → kind=1022 + +### E2E Tests + +- [x] `make arch-test-smoke` — **6/6 PASS** (was 5/6, internet now works!) +- [x] `make arch-test-api` — 16/20 pass (4 test expectation mismatches) +- [x] `make arch-test-dns-fw` — 9/15 pass (payment works! DNS hijack tests need env fix) +- [x] `make arch-test-reset` — **11/13 pass** (payment+reset works, second payment token issue) +- [x] `make arch-test-session` — 7/11 pass (session expiry works, renewal works) +- [x] `make arch-test-phase2` — **12/12 PASS** (all API tests pass) +- [ ] `make arch-test-network` — 3/7 pass (DNS tests need env fix) + +### Commit & Push + +- [ ] Commit firmware changes to `feature/tollgate-core-component` +- [ ] Push to ngit remote +- [ ] Commit Makefile changes to `feature/router-to-router-interaction` +- [ ] Push to ngit remote +- [ ] Release Board A lock -- cgit v1.2.3