Functional Parity Tests
This directory contains functional tests that compare the behavior of Python warcprox and Go gowarcprox to ensure feature parity.
Directory Structure
```
test/functional/
├── README.md              - This file
├── .gitignore             - Ignore test outputs
├── run_all.sh             - Master test runner
├── scenarios/             - Individual test scenarios
│   ├── 01_http_basic.sh
│   ├── 02_https_mitm.sh
│   └── ...
├── lib/                   - Shared utilities and comparison tools
│   ├── common.sh          - Bash helper functions
│   ├── warc_compare.go    - WARC file comparator
│   └── sqlite_compare.go  - SQLite DB comparator
├── fixtures/              - Test data
│   └── sample_pages/      - HTML test pages
└── output/                - Test outputs (gitignored)
    ├── python/            - Python warcprox outputs
    └── go/                - Go gowarcprox outputs
```
Running Tests
All Tests
```bash
# Run all functional tests
./test/functional/run_all.sh

# Keep test output for debugging
KEEP_OUTPUT=1 ./test/functional/run_all.sh
```
Individual Tests
```bash
# Run a specific test
./test/functional/scenarios/01_http_basic.sh

# Keep output
KEEP_OUTPUT=1 ./test/functional/scenarios/01_http_basic.sh
```
Prerequisites
- Python warcprox installed in a venv:
  ```bash
  ./venv/bin/warcprox --version
  ```
- Go gowarcprox built:
  ```bash
  go build -o gowarcprox ./cmd/gowarcprox
  ```
- Comparison tools built:
  ```bash
  cd test/functional/lib
  go build -o warc_compare warc_compare.go
  go build -o sqlite_compare sqlite_compare.go
  ```
Test Scenarios
Phase 1-4 Tests (Implemented)
- 01_http_basic.sh - Basic HTTP GET proxy
- 02_https_mitm.sh - HTTPS MITM proxy
- 03_post_body.sh - POST with request body
- 04_headers.sh - Custom header preservation
- 05_compression_gzip.sh - GZIP compression
- 06_digest_sha1.sh - SHA1 digest validation
- 07_digest_sha256.sh - SHA256 digest validation
- 08_digest_blake3.sh - BLAKE3 digest (Go-only)
- 09_file_rotation.sh - WARC size-based rotation
- 10_concurrent.sh - Concurrent requests
Phase 5 Tests (Future)
- 20_dedup_basic.sh - Basic deduplication
- 21_dedup_revisit.sh - Revisit record creation
- 22_dedup_buckets.sh - Dedup bucket modes
Phase 6 Tests (Future)
- 30_stats_basic.sh - Basic statistics tracking
- 31_stats_buckets.sh - Stats bucket assignment
Phase 7 Tests (Future)
- 40_meta_prefix.sh - Custom WARC prefix via Warcprox-Meta
- 41_meta_dedup.sh - Dedup bucket override
- 42_meta_stats.sh - Stats bucket assignment
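The Phase 7 scenarios revolve around the Warcprox-Meta request header, whose value is a JSON object. A minimal Go sketch of building such a request and routing it through one of the proxies follows. It assumes only the "warc-prefix" key (the dedup and stats keys are not shown) and the localhost ports used in the example scenario later in this README; newMetaRequest and proxyClient are hypothetical helpers, not functions from lib/.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// newMetaRequest builds a GET request carrying a Warcprox-Meta header.
// Only the "warc-prefix" key is set; other keys are out of scope here.
func newMetaRequest(target, warcPrefix string) (*http.Request, error) {
	meta, err := json.Marshal(map[string]string{"warc-prefix": warcPrefix})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Warcprox-Meta", string(meta))
	return req, nil
}

// proxyClient returns an HTTP client that sends every request through
// proxyAddr, e.g. "http://localhost:8000" (hypothetical Python proxy port).
func proxyClient(proxyAddr string) (*http.Client, error) {
	u, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(u)},
		Timeout:   30 * time.Second,
	}, nil
}

func main() {
	req, err := newMetaRequest("http://example.com/", "my-prefix")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Warcprox-Meta"))
	// Actually sending the request needs a running proxy:
	//   client, _ := proxyClient("http://localhost:8000")
	//   resp, err := client.Do(req)
}
```

Sending the same request object shape to both ports keeps the Python and Go runs identical, which is what the parity comparison relies on.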
Test Methodology
- Start both proxies with identical configuration.
- Send identical requests through both proxies.
- Compare outputs. The following must match:
  - WARC record count
  - WARC-Payload-Digest (exact match)
  - WARC-Target-URI
  - HTTP status codes
- The following differences are expected and accepted:
  - WARC-Record-ID (UUIDs differ)
  - WARC-Date (timing differs)
  - Software version strings
Comparison Tools
warc_compare
Compares WARC files from Python and Go implementations:
```bash
./lib/warc_compare \
  --python output/scenario/python/*.warc.gz \
  --go output/scenario/go/*.warc.gz \
  --output diff.json
```
Validates:
- Record count
- Record types
- Target URIs
- Payload digests (CRITICAL)
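Because payload digests are the critical check, it may help to see how a digest label is formed: the algorithm name, a colon, and the encoded hash of the record's payload (for HTTP responses, the entity body). The Go sketch below illustrates this under assumptions: payloadDigest is a hypothetical helper, and whether a given proxy writes the hash as hex or Base32 depends on its configuration, so confirm both proxies use the same encoding before comparing values. BLAKE3 (08_digest_blake3.sh) is omitted because Go's standard library does not provide it.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/base32"
	"encoding/hex"
	"fmt"
	"hash"
)

// payloadDigest formats a digest label such as "sha1:<value>" for payload.
// algo is "sha1" or "sha256"; useBase32 selects Base32 instead of lowercase hex.
func payloadDigest(algo string, payload []byte, useBase32 bool) (string, error) {
	var h hash.Hash
	switch algo {
	case "sha1":
		h = sha1.New()
	case "sha256":
		h = sha256.New()
	default:
		return "", fmt.Errorf("unsupported algorithm %q", algo)
	}
	h.Write(payload) // hash.Hash writes never return an error
	sum := h.Sum(nil)
	if useBase32 {
		return algo + ":" + base32.StdEncoding.EncodeToString(sum), nil
	}
	return algo + ":" + hex.EncodeToString(sum), nil
}

func main() {
	// Print both encodings of the digest of an empty payload.
	for _, b32 := range []bool{false, true} {
		d, err := payloadDigest("sha1", []byte(""), b32)
		if err != nil {
			panic(err)
		}
		fmt.Println(d)
	}
}
```

The same bytes produce visibly different labels in the two encodings, so an encoding mismatch shows up as a digest mismatch even when the payloads are identical.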
sqlite_compare
Compares SQLite databases (for dedup/stats tests):
```bash
./lib/sqlite_compare \
  --python output/scenario/python/warcprox.sqlite \
  --go output/scenario/go/warcprox.sqlite \
  --output diff.json
```
Validates:
- Table schemas
- Row counts
- Data consistency
Writing New Tests
See scenarios/01_http_basic.sh as a template. Each test should:
- Source lib/common.sh
- Call setup_test <name>
- Start the Python and Go proxies
- Execute test requests
- Stop both proxies
- Compare outputs
- Call cleanup_test <name>
Example:
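```bash
#!/bin/bash
set -euo pipefail

# Resolve this script's directory so common.sh is found from any CWD
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/../lib/common.sh"

# PYTHON_OUTPUT and GO_OUTPUT are assumed to be set by setup_test
setup_test "my_test"

# Start both proxies on separate ports, each writing to its own directory
start_python_warcprox --port 8000 --dir "${PYTHON_OUTPUT}"
start_go_gowarcprox --port 8001 --directory "${GO_OUTPUT}"

# Test logic here: send the same request through each proxy
curl -x localhost:8000 http://example.com
curl -x localhost:8001 http://example.com

# Stop both proxies so their WARC files are finalized before comparing
stop_python_warcprox
stop_go_gowarcprox

compare_warc_files "${PYTHON_OUTPUT}" "${GO_OUTPUT}"
cleanup_test "my_test"
echo "✅ Test passed"
```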
Debugging Failed Tests
If a test fails:
- Keep outputs:
  ```bash
  KEEP_OUTPUT=1 ./scenarios/XX_test.sh
  ```
- Inspect WARC files:
  ```bash
  zcat output/*/python/*.warc.gz | less
  zcat output/*/go/*.warc.gz | less
  ```
- Compare digests:
  ```bash
  zcat output/*/python/*.warc.gz | grep "WARC-Payload-Digest"
  zcat output/*/go/*.warc.gz | grep "WARC-Payload-Digest"
  ```
- Check logs:
  ```bash
  cat output/*/python/warcprox.log
  cat output/*/go/gowarcprox.log
  ```
Success Criteria
All tests must pass before resuming Phase 5 implementation:
- ✅ All scenarios exit with status 0
- ✅ Payload digests match exactly
- ✅ WARC record counts match
- ✅ No critical differences in comparison reports