Functional Parity Tests

This directory contains functional tests that send identical requests through Python warcprox and Go gowarcprox and compare the resulting outputs to verify feature parity.

Directory Structure

test/functional/
├── README.md           - This file
├── .gitignore          - Ignore test outputs
├── run_all.sh          - Master test runner
├── scenarios/          - Individual test scenarios
│   ├── 01_http_basic.sh
│   ├── 02_https_mitm.sh
│   ├── ...
├── lib/                - Shared utilities and comparison tools
│   ├── common.sh       - Bash helper functions
│   ├── warc_compare.go - WARC file comparator
│   └── sqlite_compare.go - SQLite DB comparator
├── fixtures/           - Test data
│   └── sample_pages/   - HTML test pages
└── output/             - Test outputs (gitignored)
    ├── python/         - Python warcprox outputs
    └── go/             - Go gowarcprox outputs

Running Tests

All Tests

# Run all functional tests
./test/functional/run_all.sh

# Keep test output for debugging
KEEP_OUTPUT=1 ./test/functional/run_all.sh

Individual Tests

# Run a specific test
./test/functional/scenarios/01_http_basic.sh

# Keep output
KEEP_OUTPUT=1 ./test/functional/scenarios/01_http_basic.sh
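
How the scripts honor KEEP_OUTPUT is not shown in this README. A minimal sketch of one way a cleanup step could respect it, assuming a hypothetical helper named remove_output_unless_kept (not necessarily what lib/common.sh defines):

```bash
#!/bin/bash
# Hypothetical helper (assumption, not taken from lib/common.sh):
# delete a test's output directory unless KEEP_OUTPUT=1 is set.
remove_output_unless_kept() {
    local dir="$1"
    if [ "${KEEP_OUTPUT:-0}" = "1" ]; then
        echo "Keeping output in ${dir}"
    else
        rm -rf -- "${dir}"
    fi
}
```

Defaulting the variable with `${KEEP_OUTPUT:-0}` keeps the helper safe under `set -u`, which the scenario scripts enable.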

Prerequisites

Python warcprox installed in venv:
```
./venv/bin/warcprox --version
```

Go gowarcprox built:

go build -o gowarcprox ./cmd/gowarcprox

Comparison tools built:

cd test/functional/lib
go build -o warc_compare warc_compare.go
go build -o sqlite_compare sqlite_compare.go
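
Before running the suite it can help to confirm that all of these binaries exist. The following is a small sketch of such a preflight check; the function name check_prereqs is an assumption for illustration, and the paths in the example call are the ones used by the commands above, relative to the repository root.

```bash
#!/bin/bash
# Report every listed binary that is missing or not executable.
# Returns 0 only if all of them are present.
check_prereqs() {
    local missing=0 bin
    for bin in "$@"; do
        if [ ! -x "${bin}" ]; then
            echo "missing prerequisite: ${bin}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Example call, run from the repository root:
#   check_prereqs ./venv/bin/warcprox ./gowarcprox \
#       ./test/functional/lib/warc_compare ./test/functional/lib/sqlite_compare
```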

Test Scenarios

Phase 1-4 Tests (Implemented)

01_http_basic.sh - Basic HTTP GET proxy
02_https_mitm.sh - HTTPS MITM proxy
03_post_body.sh - POST with request body
04_headers.sh - Custom header preservation
05_compression_gzip.sh - GZIP compression
06_digest_sha1.sh - SHA1 digest validation
07_digest_sha256.sh - SHA256 digest validation
08_digest_blake3.sh - BLAKE3 digest (Go-only)
09_file_rotation.sh - WARC size-based rotation
10_concurrent.sh - Concurrent requests

Phase 5 Tests (Future)

20_dedup_basic.sh - Basic deduplication
21_dedup_revisit.sh - Revisit record creation
22_dedup_buckets.sh - Dedup bucket modes

Phase 6 Tests (Future)

30_stats_basic.sh - Basic statistics tracking
31_stats_buckets.sh - Stats bucket assignment

Phase 7 Tests (Future)

40_meta_prefix.sh - Custom WARC prefix via Warcprox-Meta
41_meta_dedup.sh - Dedup bucket override via Warcprox-Meta
42_meta_stats.sh - Stats bucket assignment via Warcprox-Meta

Test Methodology

1. Start both proxies with identical configuration
2. Send identical requests through both proxies
3. Compare outputs:
   - WARC record count must match
   - WARC-Payload-Digest must match exactly
   - WARC-Target-URI must match
   - HTTP status codes must match
4. Accept differences:
   - WARC-Record-ID (UUIDs differ)
   - WARC-Date (timing differs)
   - Software version strings
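
The rules above can be approximated by hand with a text filter that keeps only the fields that must match and drops everything else, including the record IDs and dates that are allowed to differ. This is a rough sketch, not how warc_compare works internally. The function name comparable_fields is an assumption, and the filter treats the decompressed WARC as text, so a payload line that happens to resemble a header would also be kept.

```bash
#!/bin/bash
# Keep only the lines that must be identical between implementations:
# record type, target URI, payload digest, and the HTTP status line.
# Reads decompressed WARC data on stdin, writes the kept lines to stdout.
comparable_fields() {
    tr -d '\r' |
        grep -a -E '^(WARC-Type|WARC-Target-URI|WARC-Payload-Digest):|^HTTP/[0-9.]+ [0-9]{3}'
}

# Usage sketch (paths follow the debugging commands in this README):
#   zcat output/*/python/*.warc.gz | comparable_fields > python.fields
#   zcat output/*/go/*.warc.gz     | comparable_fields > go.fields
#   diff python.fields go.fields
```

A plain diff is order-sensitive, so for scenarios where record order is not deterministic (for example concurrent requests) the two field lists may need sorting first.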

Comparison Tools

warc_compare

Compares WARC files from Python and Go implementations:

./lib/warc_compare \
  --python output/scenario/python/*.warc.gz \
  --go output/scenario/go/*.warc.gz \
  --output diff.json

Validates:

Record count
Record types
Target URIs
Payload digests (CRITICAL)
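
For a quick manual cross-check of record counts and record types, the WARC-Type headers can be tallied per implementation. This is a sketch under the assumption that the decompressed WARC can be scanned as text; count_record_types is a hypothetical name and is independent of warc_compare.

```bash
#!/bin/bash
# Tally WARC records by type. Reads decompressed WARC data on stdin and
# prints one "<type> <count>" line per record type, sorted by type.
count_record_types() {
    tr -d '\r' |
        awk -F': ' '/^WARC-Type: / { n[$2]++ } END { for (t in n) print t, n[t] }' |
        sort
}

# Usage sketch:
#   zcat output/*/python/*.warc.gz | count_record_types
#   zcat output/*/go/*.warc.gz     | count_record_types
```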

sqlite_compare

Compares SQLite databases (for dedup/stats tests):

./lib/sqlite_compare \
  --python output/scenario/python/warcprox.sqlite \
  --go output/scenario/go/warcprox.sqlite \
  --output diff.json

Validates:

Table schemas
Row counts
Data consistency
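
Row counts can also be checked by hand with the sqlite3 command-line shell, if it is installed. The sketch below lists every table and its row count. It makes no assumption about which tables warcprox or gowarcprox create, and table_row_counts is a hypothetical helper name.

```bash
#!/bin/bash
# Print "<table> <row count>" for every table in a SQLite database.
# Requires the sqlite3 command-line shell.
table_row_counts() {
    local db="$1" table count
    sqlite3 "${db}" "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;" |
        while IFS= read -r table; do
            # Redirect stdin so the inner sqlite3 cannot consume the table list.
            count="$(sqlite3 "${db}" "SELECT count(*) FROM \"${table}\";" </dev/null)"
            printf '%s %s\n' "${table}" "${count}"
        done
}

# Usage sketch:
#   table_row_counts output/scenario/python/warcprox.sqlite > python.counts
#   table_row_counts output/scenario/go/warcprox.sqlite     > go.counts
#   diff python.counts go.counts
```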

Writing New Tests

See scenarios/01_http_basic.sh as a template. Each test should:

1. Source lib/common.sh
2. Call setup_test <name>
3. Start Python and Go proxies
4. Execute test requests
5. Stop both proxies
6. Compare outputs
7. Call cleanup_test <name>

Example:

#!/bin/bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/../lib/common.sh"

setup_test "my_test"

start_python_warcprox --port 8000 --dir "${PYTHON_OUTPUT}"
start_go_gowarcprox --port 8001 --directory "${GO_OUTPUT}"

# Test logic here
curl -x localhost:8000 http://example.com
curl -x localhost:8001 http://example.com

stop_python_warcprox
stop_go_gowarcprox

compare_warc_files "${PYTHON_OUTPUT}" "${GO_OUTPUT}"

cleanup_test "my_test"
echo "✅ Test passed"
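
The example sends requests right after starting the proxies. This README does not say whether start_python_warcprox and start_go_gowarcprox wait until the proxy is listening. If they do not, a readiness check along these lines may avoid flaky failures. It is a sketch only: wait_for_port is a hypothetical helper, and it relies on bash's /dev/tcp pseudo-device, so it needs bash rather than a plain POSIX shell.

```bash
#!/bin/bash
# Hypothetical helper: block until something accepts TCP connections on
# localhost:<port>, or give up after <timeout> seconds (default 10).
wait_for_port() {
    local port="$1" timeout="${2:-10}" waited=0
    # The subshell opens the connection and closes it again when it exits.
    until (exec 3<>"/dev/tcp/localhost/${port}") 2>/dev/null; do
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "port ${port} not ready after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage sketch, placed between starting the proxies and the first curl:
#   wait_for_port 8000
#   wait_for_port 8001
```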

Debugging Failed Tests

If a test fails:

Keep outputs:
```
KEEP_OUTPUT=1 ./scenarios/XX_test.sh
```

Inspect WARC files:

zcat output/*/python/*.warc.gz | less
zcat output/*/go/*.warc.gz | less

Compare digests:

zcat output/*/python/*.warc.gz | grep "WARC-Payload-Digest"
zcat output/*/go/*.warc.gz | grep "WARC-Payload-Digest"

Check logs:

cat output/*/python/warcprox.log
cat output/*/go/gowarcprox.log

Success Criteria

All implemented (Phase 1-4) tests must pass before Phase 5 implementation resumes:

✅ All scenarios exit with status 0
✅ Payload digests match exactly
✅ WARC record counts match
✅ No critical differences in comparison reports
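
The first criterion can be checked mechanically by running each scenario and counting non-zero exits. The following sketch shows the idea; it is not the contents of run_all.sh, and run_scenarios is a hypothetical name.

```bash
#!/bin/bash
# Run every *.sh script in a directory and report PASS or FAIL for each.
# Returns non-zero if any script exits with a non-zero status.
run_scenarios() {
    local dir="$1" failed=0 script
    for script in "${dir}"/*.sh; do
        [ -e "${script}" ] || continue   # the directory had no *.sh files
        if bash "${script}"; then
            echo "PASS ${script##*/}"
        else
            echo "FAIL ${script##*/}"
            failed=$((failed + 1))
        fi
    done
    echo "${failed} scenario(s) failed"
    [ "${failed}" -eq 0 ]
}

# Usage sketch:
#   run_scenarios ./test/functional/scenarios
```

Running every scenario before reporting, rather than stopping at the first failure, shows the full set of parity gaps in one pass.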