Commit e6b8ee8
Changed files (2)
test/functional/.gitignore
@@ -0,0 +1,12 @@
+# Functional test outputs (generated during test runs)
+output/
+
+# Compiled test binaries
+lib/warc_compare
+lib/sqlite_compare
+
+# Test artifacts
+*.warc
+*.warc.gz
+*.sqlite
+*.log
test/functional/README.md
@@ -0,0 +1,222 @@
+# Functional Parity Tests
+
+This directory contains functional tests that run the same scenarios against Python warcprox and Go gowarcprox and compare the results to verify feature parity.
+
+## Directory Structure
+
+```
+test/functional/
+├── README.md               - This file
+├── .gitignore              - Ignore test outputs
+├── run_all.sh              - Master test runner
+├── scenarios/              - Individual test scenarios
+│   ├── 01_http_basic.sh
+│   ├── 02_https_mitm.sh
+│   └── ...
+├── lib/                    - Shared utilities and comparison tools
+│   ├── common.sh           - Bash helper functions
+│   ├── warc_compare.go     - WARC file comparator
+│   └── sqlite_compare.go   - SQLite DB comparator
+├── fixtures/               - Test data
+│   └── sample_pages/       - HTML test pages
+└── output/                 - Test outputs (gitignored)
+    ├── python/             - Python warcprox outputs
+    └── go/                 - Go gowarcprox outputs
+```
+
+## Running Tests
+
+### All Tests
+
+```bash
+# Run all functional tests
+./test/functional/run_all.sh
+
+# Keep test output for debugging
+KEEP_OUTPUT=1 ./test/functional/run_all.sh
+```
+
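The runner's behavior can be sketched as follows. This is a hypothetical outline, not the contents of `run_all.sh`: the function name `run_scenarios` and the `NN_*.sh` glob are assumptions based on the scenario file names listed above. Because `KEEP_OUTPUT` is set in the environment, each scenario script would inherit it without extra plumbing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a scenario runner; the real run_all.sh may differ.

# run_scenarios DIR
# Runs every NN_*.sh script in DIR in lexical order, prints one PASS/FAIL
# line per script, and returns non-zero if any script failed.
run_scenarios() {
    local dir="$1" failed=0 script
    for script in "$dir"/[0-9][0-9]_*.sh; do
        # With no matches the glob stays literal, so skip anything
        # that is not a regular file.
        [ -f "$script" ] || continue
        if bash "$script" >/dev/null 2>&1; then
            echo "PASS $(basename "$script")"
        else
            echo "FAIL $(basename "$script")"
            failed=$((failed + 1))
        fi
    done
    echo "failed: ${failed}"
    [ "$failed" -eq 0 ]
}
```

Calling `run_scenarios test/functional/scenarios` from the repository root would run each scenario in numeric order and return non-zero if any of them failed.
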
+### Individual Tests
+
+```bash
+# Run a specific test
+./test/functional/scenarios/01_http_basic.sh
+
+# Keep output
+KEEP_OUTPUT=1 ./test/functional/scenarios/01_http_basic.sh
+```
+
+## Prerequisites
+
+1. **Python warcprox** installed in venv:
+   ```bash
+   ./venv/bin/warcprox --version
+   ```
+
+2. **Go gowarcprox** built:
+   ```bash
+   go build -o gowarcprox ./cmd/gowarcprox
+   ```
+
+3. **Comparison tools** built:
+   ```bash
+   cd test/functional/lib
+   go build -o warc_compare warc_compare.go
+   go build -o sqlite_compare sqlite_compare.go
+   ```
+
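A preflight check along these lines could report every missing prerequisite at once, before any proxy is started. It is a sketch only: `require_executable` and `check_prereqs` are hypothetical helper names, and the paths assume the binaries are built exactly where the commands above place them.

```bash
# Hypothetical preflight helpers; not part of lib/common.sh as far as
# this README states.

# require_executable PATH [HINT]
# Returns 0 if PATH is an executable regular file; otherwise prints the
# path (plus an optional hint) to stderr and returns 1.
require_executable() {
    local path="$1" hint="${2:-}"
    if [ ! -f "$path" ] || [ ! -x "$path" ]; then
        echo "missing: ${path}${hint:+ (${hint})}" >&2
        return 1
    fi
}

# check_prereqs REPO_ROOT
# Checks all four binaries and returns non-zero if any is missing,
# without stopping at the first failure.
check_prereqs() {
    local root="$1" missing=0
    require_executable "${root}/venv/bin/warcprox" \
        "install warcprox into ./venv" || missing=1
    require_executable "${root}/gowarcprox" \
        "go build -o gowarcprox ./cmd/gowarcprox" || missing=1
    require_executable "${root}/test/functional/lib/warc_compare" \
        "go build -o warc_compare warc_compare.go" || missing=1
    require_executable "${root}/test/functional/lib/sqlite_compare" \
        "go build -o sqlite_compare sqlite_compare.go" || missing=1
    [ "$missing" -eq 0 ]
}
```

Run as `check_prereqs .` from the repository root. Collecting all failures before returning means one pass lists everything that still needs building.
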
+## Test Scenarios
+
+### Phase 1-4 Tests (Implemented)
+
+- `01_http_basic.sh` - Basic HTTP GET proxy
+- `02_https_mitm.sh` - HTTPS MITM proxy
+- `03_post_body.sh` - POST with request body
+- `04_headers.sh` - Custom header preservation
+- `05_compression_gzip.sh` - GZIP compression
+- `06_digest_sha1.sh` - SHA1 digest validation
+- `07_digest_sha256.sh` - SHA256 digest validation
+- `08_digest_blake3.sh` - BLAKE3 digest (Go-only)
+- `09_file_rotation.sh` - WARC size-based rotation
+- `10_concurrent.sh` - Concurrent requests
+
+### Phase 5 Tests (Future)
+
+- `20_dedup_basic.sh` - Basic deduplication
+- `21_dedup_revisit.sh` - Revisit record creation
+- `22_dedup_buckets.sh` - Dedup bucket modes
+
+### Phase 6 Tests (Future)
+
+- `30_stats_basic.sh` - Basic statistics tracking
+- `31_stats_buckets.sh` - Stats bucket assignment
+
+### Phase 7 Tests (Future)
+
+- `40_meta_prefix.sh` - Custom WARC prefix via Warcprox-Meta
+- `41_meta_dedup.sh` - Dedup bucket override
+- `42_meta_stats.sh` - Stats bucket assignment
+
+## Test Methodology
+
+1. **Start both proxies** with identical configuration
+2. **Send identical requests** through both proxies
+3. **Compare outputs**:
+   - WARC record count must match
+   - WARC-Payload-Digest must match exactly
+   - WARC-Target-URI must match
+   - HTTP status codes must match
+4. **Accept differences** in:
+   - WARC-Record-ID (UUIDs differ)
+   - WARC-Date (timing differs)
+   - Software version strings
+
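The comparison rules above can be approximated in shell by extracting only the headers that must match, which leaves WARC-Record-ID and WARC-Date out of the comparison by construction. This is a rough sketch, not how `warc_compare.go` works: the function names are hypothetical, values are sorted before comparison on the assumption that record order may differ between the proxies, the HTTP status check is omitted, and a payload line that happens to start with a WARC header name would be miscounted.

```bash
# Hypothetical helpers approximating the must-match rules.

# warc_record_count FILE
# Counts records by counting WARC version lines ("WARC/1.0", "WARC/1.1").
warc_record_count() {
    gzip -dc "$1" | LC_ALL=C grep -a -c '^WARC/1\.[0-9]' || true
}

# warc_field FILE FIELD
# Prints the sorted values of one WARC header across all records.
# grep -a treats binary payload bytes as text; tr strips the CR of each
# CRLF line ending; cut drops the leading "Field:" token.
warc_field() {
    local file="$1" field="$2"
    gzip -dc "$file" \
        | LC_ALL=C grep -a "^${field}: " \
        | LC_ALL=C tr -d '\r' \
        | cut -d' ' -f2- \
        | sort
}

# warcs_match PYTHON_WARC GO_WARC
# Returns 0 only if the record count and every must-match header agree.
warcs_match() {
    local py="$1" go="$2" field
    [ "$(warc_record_count "$py")" = "$(warc_record_count "$go")" ] || return 1
    for field in WARC-Type WARC-Target-URI WARC-Payload-Digest; do
        [ "$(warc_field "$py" "$field")" = "$(warc_field "$go" "$field")" ] || return 1
    done
}
```

Comparing an allow-list of headers, rather than diffing whole records and filtering out the volatile ones, keeps the check stable if either implementation adds a new header.
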
+## Comparison Tools
+
+### warc_compare
+
+Compares WARC files produced by the Python and Go implementations:
+
+```bash
+./lib/warc_compare \
+ --python output/scenario/python/*.warc.gz \
+ --go output/scenario/go/*.warc.gz \
+ --output diff.json
+```
+
+Validates:
+- Record count
+- Record types
+- Target URIs
+- Payload digests (CRITICAL)
+
+### sqlite_compare
+
+Compares SQLite databases (for dedup/stats tests):
+
+```bash
+./lib/sqlite_compare \
+ --python output/scenario/python/warcprox.sqlite \
+ --go output/scenario/go/warcprox.sqlite \
+ --output diff.json
+```
+
+Validates:
+- Table schemas
+- Row counts
+- Data consistency
+
+## Writing New Tests
+
+Use `scenarios/01_http_basic.sh` as a template. Each test should:
+
+1. Source `lib/common.sh`
+2. Call `setup_test <name>`
+3. Start Python and Go proxies
+4. Execute test requests
+5. Stop both proxies
+6. Compare outputs
+7. Call `cleanup_test <name>`
+
+Example:
+
+```bash
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "${SCRIPT_DIR}/../lib/common.sh"
+
+setup_test "my_test"
+
+start_python_warcprox --port 8000 --dir "${PYTHON_OUTPUT}"
+start_go_gowarcprox --port 8001 --directory "${GO_OUTPUT}"
+
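+# Test logic here: send the same request through each proxy and
+# discard the response body so it does not clutter the test output
+curl -sS -o /dev/null -x localhost:8000 http://example.com
+curl -sS -o /dev/null -x localhost:8001 http://example.com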
+
+stop_python_warcprox
+stop_go_gowarcprox
+
+compare_warc_files "${PYTHON_OUTPUT}" "${GO_OUTPUT}"
+
+cleanup_test "my_test"
+echo "✅ Test passed"
+```
+
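For orientation, `setup_test` and `cleanup_test` might look roughly like the sketch below. The real definitions live in `lib/common.sh` and may differ; `OUTPUT_ROOT` is an assumed variable, and the `output/<name>/python` and `output/<name>/go` layout is inferred from the paths used in the comparison-tool commands. Only `PYTHON_OUTPUT`, `GO_OUTPUT` and `KEEP_OUTPUT` are names taken from this README.

```bash
# Hypothetical sketch of two helpers from lib/common.sh.

# setup_test NAME
# Creates fresh per-test output directories and sets PYTHON_OUTPUT and
# GO_OUTPUT for the rest of the script.
setup_test() {
    local name="$1"
    local root="${OUTPUT_ROOT:-test/functional/output}"
    PYTHON_OUTPUT="${root}/${name}/python"
    GO_OUTPUT="${root}/${name}/go"
    # Start from empty directories so WARCs from an earlier run never
    # leak into a comparison. ":?" aborts if either value is empty.
    rm -rf "${root:?}/${name:?}"
    mkdir -p "$PYTHON_OUTPUT" "$GO_OUTPUT"
}

# cleanup_test NAME
# Removes the test's output directory unless KEEP_OUTPUT=1 is set.
cleanup_test() {
    local name="$1"
    local root="${OUTPUT_ROOT:-test/functional/output}"
    if [ "${KEEP_OUTPUT:-0}" = "1" ]; then
        echo "keeping output in ${root}/${name}"
    else
        rm -rf "${root:?}/${name:?}"
    fi
}
```

Because the template script runs under `set -e`, a failed comparison would exit before `cleanup_test` is reached; a `trap` on `EXIT` is one way to guarantee cleanup if that matters.
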
+## Debugging Failed Tests
+
+If a test fails:
+
+1. **Keep outputs**:
+   ```bash
+   KEEP_OUTPUT=1 ./scenarios/XX_test.sh
+   ```
+
+2. **Inspect WARC files** (`gzip -dc` behaves like `zcat` and also works where `zcat` expects a `.Z` suffix, as on macOS):
+   ```bash
+   gzip -dc output/*/python/*.warc.gz | less
+   gzip -dc output/*/go/*.warc.gz | less
+   ```
+
+3. **Compare digests**:
+   ```bash
+   gzip -dc output/*/python/*.warc.gz | grep -a "WARC-Payload-Digest"
+   gzip -dc output/*/go/*.warc.gz | grep -a "WARC-Payload-Digest"
+   ```
+
+4. **Check logs**:
+   ```bash
+   cat output/*/python/warcprox.log
+   cat output/*/go/gowarcprox.log
+   ```
+
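When digests differ, it helps to see which record each digest belongs to. The hypothetical `warc_summary` helper below prints one line per record with its type, target URI and payload digest. It is a debugging aid under simple assumptions: it only reads the WARC header block of each record, and it treats any line starting with `WARC/1.` as the start of a record, so a payload containing such a line would confuse it.

```bash
# warc_summary FILE
# Prints "<type> <target-uri> <payload-digest>" for each record, using
# "-" for headers the record does not carry (warcinfo records, for
# example, have no target URI).
warc_summary() {
    gzip -dc "$1" | LC_ALL=C tr -d '\r' | LC_ALL=C awk '
        function emit() { if (seen) print type, uri, digest }
        # A version line starts a new record: print the previous one.
        /^WARC\/1\.[0-9]/ {
            emit(); seen = 1; inhdr = 1
            type = uri = digest = "-"
            next
        }
        # The first blank line ends the WARC header block.
        /^$/                              { inhdr = 0 }
        inhdr && /^WARC-Type: /           { type = $2 }
        inhdr && /^WARC-Target-URI: /     { uri = $2 }
        inhdr && /^WARC-Payload-Digest: / { digest = $2 }
        END { emit() }
    '
}
```

Writing the sorted summary of each side to a file and running `diff` on the two files shows exactly which URI has a mismatched digest.
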
+## Success Criteria
+
+All tests must pass before resuming Phase 5 implementation:
+
+- ✅ All scenarios exit with status 0
+- ✅ Payload digests match exactly
+- ✅ WARC record counts match
+- ✅ No critical differences in comparison reports