Functional Parity Tests

This directory contains functional tests that compare the behavior of Python warcprox and Go gowarcprox to ensure feature parity.

Directory Structure

test/functional/
├── README.md           - This file
├── .gitignore          - Ignore test outputs
├── run_all.sh          - Master test runner
├── scenarios/          - Individual test scenarios
│   ├── 01_http_basic.sh
│   ├── 02_https_mitm.sh
│   ├── ...
├── lib/                - Shared utilities and comparison tools
│   ├── common.sh       - Bash helper functions
│   ├── warc_compare.go - WARC file comparator
│   └── sqlite_compare.go - SQLite DB comparator
├── fixtures/           - Test data
│   └── sample_pages/   - HTML test pages
└── output/             - Test outputs (gitignored)
    ├── python/         - Python warcprox outputs
    └── go/             - Go gowarcprox outputs

Running Tests

All Tests

# Run all functional tests
./test/functional/run_all.sh

# Keep test output for debugging
KEEP_OUTPUT=1 ./test/functional/run_all.sh

Individual Tests

# Run a specific test
./test/functional/scenarios/01_http_basic.sh

# Keep output
KEEP_OUTPUT=1 ./test/functional/scenarios/01_http_basic.sh

Prerequisites

  1. Python warcprox installed in venv:

    ./venv/bin/warcprox --version
    
  2. Go gowarcprox built:

    go build -o gowarcprox ./cmd/gowarcprox
    
  3. Comparison tools built:

    cd test/functional/lib
    go build -o warc_compare warc_compare.go
    go build -o sqlite_compare sqlite_compare.go
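
The three prerequisites above can be verified in one step before running the suite. The following is a minimal sketch, not part of the repository: `check_prereqs` is a hypothetical helper name, and the paths in the usage comment assume the builds above were run from the repository root.

```shell
#!/bin/bash
# check_prereqs PATH...
# Hypothetical pre-flight helper: reports every path that is missing or not
# executable and returns non-zero if any check failed.
check_prereqs() {
  local missing=0 path
  for path in "$@"; do
    if [[ ! -x "$path" ]]; then
      echo "missing or not executable: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, assuming the repository root as working directory:
# check_prereqs ./venv/bin/warcprox ./gowarcprox \
#   ./test/functional/lib/warc_compare ./test/functional/lib/sqlite_compare
```

Reporting all missing tools at once, rather than stopping at the first, saves a rebuild cycle when several are absent.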
    

Test Scenarios

Phase 1-4 Tests (Implemented)

  • 01_http_basic.sh - Basic HTTP GET proxy
  • 02_https_mitm.sh - HTTPS MITM proxy
  • 03_post_body.sh - POST with request body
  • 04_headers.sh - Custom header preservation
  • 05_compression_gzip.sh - GZIP compression
  • 06_digest_sha1.sh - SHA1 digest validation
  • 07_digest_sha256.sh - SHA256 digest validation
  • 08_digest_blake3.sh - BLAKE3 digest (Go-only)
  • 09_file_rotation.sh - WARC size-based rotation
  • 10_concurrent.sh - Concurrent requests

Phase 5 Tests (Future)

  • 20_dedup_basic.sh - Basic deduplication
  • 21_dedup_revisit.sh - Revisit record creation
  • 22_dedup_buckets.sh - Dedup bucket modes

Phase 6 Tests (Future)

  • 30_stats_basic.sh - Basic statistics tracking
  • 31_stats_buckets.sh - Stats bucket assignment

Phase 7 Tests (Future)

  • 40_meta_prefix.sh - Custom WARC prefix via Warcprox-Meta
  • 41_meta_dedup.sh - Dedup bucket override
  • 42_meta_stats.sh - Stats bucket assignment

Test Methodology

  1. Start both proxies with identical configuration
  2. Send identical requests through both proxies
  3. Compare outputs:
    • WARC record count must match
    • WARC-Payload-Digest must match exactly
    • WARC-Target-URI must match
    • HTTP status codes must match
  4. Accept differences in:
    • WARC-Record-ID (UUIDs differ)
    • WARC-Date (timing differs)
    • Software version strings
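
As a rough illustration of the comparison rule, the sketch below reduces an uncompressed WARC stream to one line per record containing only the fields that must match (type, target URI, payload digest), so that WARC-Record-ID and WARC-Date never enter the comparison. It is an approximation written for this README, not the logic of `warc_compare`; `record_summary` is a hypothetical name, and the sketch assumes header names do not appear at the start of a line inside record bodies.

```shell
#!/bin/bash
# record_summary: read an uncompressed WARC stream on stdin and print
# "type<TAB>uri<TAB>digest" for each record, sorted so that record order
# does not affect the comparison.
record_summary() {
  tr -d '\r' | awk '
    # A version line such as "WARC/1.0" starts a new record.
    /^WARC\/1\.[0-9]/ {
      if (seen) print rtype "\t" ruri "\t" rdigest
      seen = 1; rtype = ""; ruri = ""; rdigest = ""
      next
    }
    /^WARC-Type: /           { rtype = $2 }
    /^WARC-Target-URI: /     { ruri = $2 }
    /^WARC-Payload-Digest: / { rdigest = $2 }
    END { if (seen) print rtype "\t" ruri "\t" rdigest }
  ' | sort
}

# Hypothetical usage; the file names are placeholders:
# diff <(gzip -dc python.warc.gz | record_summary) \
#      <(gzip -dc go.warc.gz | record_summary)
```

An empty diff means the record count, types, URIs, and payload digests agree. HTTP status codes are not covered by this sketch.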

Comparison Tools

warc_compare

Compares WARC files produced by the Python and Go implementations:

./lib/warc_compare \
  --python output/scenario/python/*.warc.gz \
  --go output/scenario/go/*.warc.gz \
  --output diff.json

Validates:

  • Record count
  • Record types
  • Target URIs
  • Payload digests (CRITICAL)

sqlite_compare

Compares SQLite databases (for dedup/stats tests):

./lib/sqlite_compare \
  --python output/scenario/python/warcprox.sqlite \
  --go output/scenario/go/warcprox.sqlite \
  --output diff.json

Validates:

  • Table schemas
  • Row counts
  • Data consistency

Writing New Tests

See scenarios/01_http_basic.sh as a template. Each test should:

  1. Source lib/common.sh
  2. Call setup_test <name>
  3. Start Python and Go proxies
  4. Execute test requests
  5. Stop both proxies
  6. Compare outputs
  7. Call cleanup_test <name>

Example:

#!/bin/bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/../lib/common.sh"

setup_test "my_test"

start_python_warcprox --port 8000 --dir "${PYTHON_OUTPUT}"
start_go_gowarcprox --port 8001 --directory "${GO_OUTPUT}"

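# Test logic here: send the same request through each proxy
# (-sS: quiet but still report errors; -o /dev/null: discard the response body)
curl -sS -o /dev/null -x localhost:8000 http://example.com
curl -sS -o /dev/null -x localhost:8001 http://example.com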

stop_python_warcprox
stop_go_gowarcprox

compare_warc_files "${PYTHON_OUTPUT}" "${GO_OUTPUT}"

cleanup_test "my_test"
echo "✅ Test passed"
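
The example sends requests immediately after starting the proxies. If the start helpers in lib/common.sh do not already wait for the listeners to come up, a readiness check avoids spurious connection failures. The sketch below is an assumption, not an existing helper: `wait_for_port` is a hypothetical name, and it relies on Bash's `/dev/tcp` redirection, so it requires Bash rather than a plain POSIX shell.

```shell
#!/bin/bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: polls until a TCP connection to HOST:PORT succeeds.
# Returns 0 once the port accepts connections, 1 if the timeout (default
# 10 seconds) elapses first.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-10}" deadline
  deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    # The redirection opens a TCP connection; running it in a subshell
    # means the descriptor is closed again as soon as the subshell exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 0.2
  done
  return 1
}

# Possible placement, right after the two start_* calls:
# wait_for_port localhost 8000
# wait_for_port localhost 8001
```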

Debugging Failed Tests

If a test fails (paths below are relative to test/functional/):

  1. Keep outputs:

    KEEP_OUTPUT=1 ./scenarios/XX_test.sh
    
  2. Inspect WARC files:

    zcat output/*/python/*.warc.gz | less
    zcat output/*/go/*.warc.gz | less
    
  3. Compare digests:

    # -a treats the stream as text, so binary payloads do not suppress the matching lines
    zcat output/*/python/*.warc.gz | grep -a "WARC-Payload-Digest"
    zcat output/*/go/*.warc.gz | grep -a "WARC-Payload-Digest"
    
  4. Check logs:

    cat output/*/python/warcprox.log
    cat output/*/go/gowarcprox.log
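
When the digest lists are long, it can help to see only the digests that appear on one side. The following sketch extracts the set of payload digests from a WARC stream; `digests` is a hypothetical helper name, and the `comm` usage in the comment assumes Bash for process substitution.

```shell
#!/bin/bash
# digests: read an uncompressed WARC stream on stdin and print the unique
# payload digest values, one per line, in sorted order.
digests() {
  grep -a '^WARC-Payload-Digest:' | tr -d '\r' | awk '{ print $2 }' | sort -u
}

# Hypothetical usage: column 1 lists digests found only in the Python
# output, column 2 those found only in the Go output.
# comm -3 <(zcat output/*/python/*.warc.gz | digests) \
#         <(zcat output/*/go/*.warc.gz | digests)
```

Because the output is de-duplicated, this shows which digests differ but not how many records carry each one.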
    

Success Criteria

All tests must pass before resuming Phase 5 implementation:

  • ✅ All scenarios exit with status 0
  • ✅ Payload digests match exactly
  • ✅ WARC record counts match
  • ✅ No critical differences in comparison reports