Commit e6b8ee8

bryfry <bryon@fryer.io>
2026-01-07 00:05:14
Create test infrastructure for functional parity tests
- Add test/functional directory structure
- Create .gitignore for test outputs
- Add comprehensive README with test methodology
- Document test scenarios for Phases 1-8

Directory structure:
- scenarios/: Individual test scripts
- lib/: Shared utilities and comparison tools
- fixtures/: Test data
- output/: Generated outputs (gitignored)

Status: STEP 2 complete, ready for unit tests
1 parent 9f54afc
Changed files (2)
test/functional/.gitignore
@@ -0,0 +1,12 @@
+# Functional test outputs (generated during test runs)
+output/
+
+# Compiled test binaries
+lib/warc_compare
+lib/sqlite_compare
+
+# Test artifacts
+*.warc
+*.warc.gz
+*.sqlite
+*.log
test/functional/README.md
@@ -0,0 +1,222 @@
+# Functional Parity Tests
+
+This directory contains functional tests that compare the behavior of Python warcprox and Go gowarcprox to ensure feature parity.
+
+## Directory Structure
+
+```
+test/functional/
+├── README.md           - This file
+├── .gitignore          - Ignore test outputs
+├── run_all.sh          - Master test runner
+├── scenarios/          - Individual test scenarios
+│   ├── 01_http_basic.sh
+│   ├── 02_https_mitm.sh
+│   └── ...
+├── lib/                - Shared utilities and comparison tools
+│   ├── common.sh       - Bash helper functions
+│   ├── warc_compare.go - WARC file comparator
+│   └── sqlite_compare.go - SQLite DB comparator
+├── fixtures/           - Test data
+│   └── sample_pages/   - HTML test pages
+└── output/             - Test outputs (gitignored)
+    ├── <scenario>/python/ - Python warcprox outputs
+    └── <scenario>/go/     - Go gowarcprox outputs
+```
+
+## Running Tests
+
+### All Tests
+
+```bash
+# Run all functional tests
+./test/functional/run_all.sh
+
+# Keep test output for debugging
+KEEP_OUTPUT=1 ./test/functional/run_all.sh
+```
+
+### Individual Tests
+
+```bash
+# Run a specific test
+./test/functional/scenarios/01_http_basic.sh
+
+# Keep output
+KEEP_OUTPUT=1 ./test/functional/scenarios/01_http_basic.sh
+```
+
+## Prerequisites
+
+1. **Python warcprox** installed in venv:
+   ```bash
+   ./venv/bin/warcprox --version
+   ```
+
+2. **Go gowarcprox** built:
+   ```bash
+   go build -o gowarcprox ./cmd/gowarcprox
+   ```
+
+3. **Comparison tools** built:
+   ```bash
+   cd test/functional/lib
+   go build -o warc_compare warc_compare.go
+   go build -o sqlite_compare sqlite_compare.go
+   ```
+
+## Test Scenarios
+
+### Phase 1-4 Tests (Implemented)
+
+- `01_http_basic.sh` - Basic HTTP GET proxy
+- `02_https_mitm.sh` - HTTPS MITM proxy
+- `03_post_body.sh` - POST with request body
+- `04_headers.sh` - Custom header preservation
+- `05_compression_gzip.sh` - GZIP compression
+- `06_digest_sha1.sh` - SHA1 digest validation
+- `07_digest_sha256.sh` - SHA256 digest validation
+- `08_digest_blake3.sh` - BLAKE3 digest (Go-only)
+- `09_file_rotation.sh` - WARC size-based rotation
+- `10_concurrent.sh` - Concurrent requests
+
+### Phase 5 Tests (Future)
+
+- `20_dedup_basic.sh` - Basic deduplication
+- `21_dedup_revisit.sh` - Revisit record creation
+- `22_dedup_buckets.sh` - Dedup bucket modes
+
+### Phase 6 Tests (Future)
+
+- `30_stats_basic.sh` - Basic statistics tracking
+- `31_stats_buckets.sh` - Stats bucket assignment
+
+### Phase 7 Tests (Future)
+
+- `40_meta_prefix.sh` - Custom WARC prefix via Warcprox-Meta
+- `41_meta_dedup.sh` - Dedup bucket override
+- `42_meta_stats.sh` - Stats bucket assignment
+
+## Test Methodology
+
+1. **Start both proxies** with identical configuration
+2. **Send identical requests** through both proxies
+3. **Compare outputs**:
+   - WARC record count must match
+   - WARC-Payload-Digest must match exactly
+   - WARC-Target-URI must match
+   - HTTP status codes must match
+4. **Accept differences**:
+   - WARC-Record-ID (UUIDs differ)
+   - WARC-Date (timing differs)
+   - Software version strings
+
+## Comparison Tools
+
+### warc_compare
+
+Compares WARC files from Python and Go implementations:
+
+```bash
+./lib/warc_compare \
+  --python output/scenario/python/*.warc.gz \
+  --go output/scenario/go/*.warc.gz \
+  --output diff.json
+```
+
+Validates:
+- Record count
+- Record types
+- Target URIs
+- Payload digests (CRITICAL)
+
+### sqlite_compare
+
+Compares SQLite databases (for dedup/stats tests):
+
+```bash
+./lib/sqlite_compare \
+  --python output/scenario/python/warcprox.sqlite \
+  --go output/scenario/go/warcprox.sqlite \
+  --output diff.json
+```
+
+Validates:
+- Table schemas
+- Row counts
+- Data consistency
+
+## Writing New Tests
+
+See `scenarios/01_http_basic.sh` as a template. Each test should:
+
+1. Source `lib/common.sh`
+2. Call `setup_test <name>`
+3. Start Python and Go proxies
+4. Execute test requests
+5. Stop both proxies
+6. Compare outputs
+7. Call `cleanup_test <name>`
+
+Example:
+
+```bash
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "${SCRIPT_DIR}/../lib/common.sh"
+
+setup_test "my_test"
+
+start_python_warcprox --port 8000 --dir "${PYTHON_OUTPUT}"
+start_go_gowarcprox --port 8001 --directory "${GO_OUTPUT}"
+
+# Test logic: send the same request through each proxy (response body discarded)
+curl -sS -o /dev/null -x localhost:8000 http://example.com
+curl -sS -o /dev/null -x localhost:8001 http://example.com
+
+stop_python_warcprox
+stop_go_gowarcprox
+
+compare_warc_files "${PYTHON_OUTPUT}" "${GO_OUTPUT}"
+
+cleanup_test "my_test"
+echo "✅ Test passed"
+```
+
+## Debugging Failed Tests
+
+If a test fails:
+
+1. **Keep outputs**:
+   ```bash
+   KEEP_OUTPUT=1 ./scenarios/XX_test.sh
+   ```
+
+2. **Inspect WARC files**:
+   ```bash
+   zcat output/*/python/*.warc.gz | less
+   zcat output/*/go/*.warc.gz | less
+   ```
+
+3. **Compare digests**:
+   ```bash
+   zcat output/*/python/*.warc.gz | grep "WARC-Payload-Digest"
+   zcat output/*/go/*.warc.gz | grep "WARC-Payload-Digest"
+   ```
+
+4. **Check logs**:
+   ```bash
+   cat output/*/python/warcprox.log
+   cat output/*/go/gowarcprox.log
+   ```
+
+## Success Criteria
+
+All tests must pass before resuming Phase 5 implementation:
+
+- ✅ All scenarios exit with status 0
+- ✅ Payload digests match exactly
+- ✅ WARC record counts match
+- ✅ No critical differences in comparison reports
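
The comparison rules in the README's Test Methodology section (record count, payload digest, target URI, and HTTP status must match; record ID and date may differ) could be sketched in Go roughly as follows. This is an illustration only, not the contents of `warc_compare.go`, which this commit does not include. The `Record` type, its field names, and `CompareRecords` are hypothetical. The sketch also assumes that WARC parsing has already happened and that both proxies wrote their records in the same order, which may not hold for the concurrent-request scenario.

```go
package main

import "fmt"

// Record is a hypothetical, already-parsed view of one WARC record.
// The field names are assumptions for this sketch.
type Record struct {
	Type          string // WARC-Type
	TargetURI     string // WARC-Target-URI
	PayloadDigest string // WARC-Payload-Digest
	HTTPStatus    int    // status code parsed from the HTTP payload
	RecordID      string // WARC-Record-ID (accepted difference, never compared)
	Date          string // WARC-Date (accepted difference, never compared)
}

// CompareRecords returns one message per difference between the two runs.
// An empty result means the runs are at parity under the README's rules.
// Records are paired by position, which assumes identical ordering.
func CompareRecords(py, gw []Record) []string {
	var diffs []string
	if len(py) != len(gw) {
		diffs = append(diffs, fmt.Sprintf("record count: python=%d go=%d", len(py), len(gw)))
	}
	// Compare only the records that exist on both sides.
	n := len(py)
	if len(gw) < n {
		n = len(gw)
	}
	for i := 0; i < n; i++ {
		p, g := py[i], gw[i]
		if p.Type != g.Type {
			diffs = append(diffs, fmt.Sprintf("record %d: type: python=%q go=%q", i, p.Type, g.Type))
		}
		if p.TargetURI != g.TargetURI {
			diffs = append(diffs, fmt.Sprintf("record %d: target URI: python=%q go=%q", i, p.TargetURI, g.TargetURI))
		}
		if p.PayloadDigest != g.PayloadDigest {
			diffs = append(diffs, fmt.Sprintf("record %d: payload digest: python=%q go=%q", i, p.PayloadDigest, g.PayloadDigest))
		}
		if p.HTTPStatus != g.HTTPStatus {
			diffs = append(diffs, fmt.Sprintf("record %d: HTTP status: python=%d go=%d", i, p.HTTPStatus, g.HTTPStatus))
		}
	}
	return diffs
}

func main() {
	// Placeholder values; the digest string is not a real digest.
	py := []Record{{Type: "response", TargetURI: "http://example.com/",
		PayloadDigest: "sha1:PLACEHOLDER", HTTPStatus: 200,
		RecordID: "<urn:uuid:python-run>", Date: "2026-01-07T00:00:00Z"}}
	gw := []Record{{Type: "response", TargetURI: "http://example.com/",
		PayloadDigest: "sha1:PLACEHOLDER", HTTPStatus: 200,
		RecordID: "<urn:uuid:go-run>", Date: "2026-01-07T00:00:01Z"}}

	diffs := CompareRecords(py, gw)
	if len(diffs) == 0 {
		fmt.Println("parity: no differences")
		return
	}
	for _, d := range diffs {
		fmt.Println(d)
	}
}
```

Returning a list of messages rather than a single boolean keeps every mismatch visible in one run, which fits the README's idea of writing a comparison report. Software version strings are not modeled here because they live in the warcinfo record body rather than in the compared fields.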