Deterministic Replay

What replay guarantees

Re-executing a plan with ixtract replay produces an identical decision surface: the same worker count, chunk count, chunking strategy, and plan fingerprint. The physical output may differ, because timing and external system state are not controlled.

Replay guarantees identical decisions, not identical results.

How it works

Plan fingerprint

Every plan is serialized to canonical JSON before execution:

Keys sorted alphabetically
No whitespace
Floats rounded to 6 decimal places
NaN and inf → 0

The canonical JSON is hashed with SHA-256. The fingerprint is stored in the state store alongside every run.
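The canonicalization rules above can be sketched in a few lines of standard-library Python. This is a minimal illustration that assumes a plan is a plain dict of JSON-compatible values. The helper names (normalize, canonical_json, plan_fingerprint) are hypothetical, and the exact encoding ixtract uses (for example, whether NaN becomes 0 or 0.0) may differ.

```python
import hashlib
import json
import math
from typing import Any


def normalize(value: Any) -> Any:
    """Apply the float rules recursively: round to 6 places, NaN/inf -> 0."""
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return 0  # NaN, inf and -inf all collapse to 0 (assumed encoding)
        return round(value, 6)
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(item) for item in value]
    return value  # str, int, bool and None pass through unchanged


def canonical_json(plan: dict) -> str:
    """Serialize with sorted keys and no whitespace between tokens."""
    return json.dumps(normalize(plan), sort_keys=True, separators=(",", ":"))


def plan_fingerprint(plan: dict) -> str:
    """SHA-256 hex digest of the canonical JSON (64 hex characters)."""
    return hashlib.sha256(canonical_json(plan).encode("utf-8")).hexdigest()


# Hypothetical plans: different key order and float noise below 6 decimals.
plan_a = {"workers": 8, "chunks": 20, "strategy": "range_chunking", "est_cost": 1.23456789}
plan_b = {"strategy": "range_chunking", "est_cost": 1.2345681, "chunks": 20, "workers": 8}
print(canonical_json(plan_a))
print(plan_fingerprint(plan_a) == plan_fingerprint(plan_b))
```

Because key order and sub-precision float noise are normalized away before hashing, two plans with the same decisions hash to the same fingerprint.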

Planner-free execution

execute_plan() takes a stored ExecutionPlan directly — no re-profiling, no estimator, no enrichment. The plan is executed exactly as stored.
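To illustrate what "planner-free" means, the sketch below runs a stored plan using only the fields recorded in it. StoredPlan and execute_stored_plan are hypothetical stand-ins, not ixtract's ExecutionPlan or execute_plan(), and the (start, end) chunk ranges are an assumed representation of range chunking.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StoredPlan:
    """Hypothetical stand-in for a stored ExecutionPlan."""
    workers: int
    strategy: str
    chunks: tuple[tuple[int, int], ...]  # hypothetical (start, end) key ranges


def execute_stored_plan(plan: StoredPlan, extract_chunk: Callable[[int, int], int]) -> int:
    """Run every stored chunk with the stored worker count; nothing is re-planned."""
    # No profiling, estimation, or enrichment: the pool size and the chunk
    # boundaries come straight from the plan.
    with ThreadPoolExecutor(max_workers=plan.workers) as pool:
        row_counts = pool.map(lambda chunk: extract_chunk(*chunk), plan.chunks)
        return sum(row_counts)


plan = StoredPlan(workers=2, strategy="range_chunking",
                  chunks=((0, 100), (100, 250), (250, 300)))
# A fake extractor that reports one row per key in the range.
print(execute_stored_plan(plan, lambda start, end: end - start))
```

Since the executor never consults the source to make decisions, replaying the same StoredPlan always uses the same pool size and chunk boundaries, even if the source's row counts have changed.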

Version check

Each plan is tagged with a plan_version. On replay, ixtract validates:

The stored fingerprint matches the re-serialized plan
The plan version is supported by the current engine

If either check fails, replay raises an error. For an unsupported plan version, --force lets the replay proceed with a warning instead.
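The two checks can be sketched as a single validation step. This is an illustration under stated assumptions: the exception classes are local stand-ins named after ixtract's errors, SUPPORTED_PLAN_VERSIONS is a hypothetical set, the fingerprint omits the float rules for brevity, and force is assumed to relax only the version check.

```python
import hashlib
import json
import warnings

# Hypothetical set of plan_version values this engine accepts.
SUPPORTED_PLAN_VERSIONS = {1, 2}


class PlanCorruptionError(Exception):
    """Local stand-in for ixtract's error of the same name."""


class UnsupportedPlanVersion(Exception):
    """Local stand-in for ixtract's error of the same name."""


def fingerprint(plan: dict) -> str:
    # Simplified canonical form: sorted keys, no whitespace (float rules omitted).
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_for_replay(plan: dict, stored_fingerprint: str, force: bool = False) -> None:
    # Check 1: the plan must re-serialize to the fingerprint recorded with the run.
    if fingerprint(plan) != stored_fingerprint:
        raise PlanCorruptionError("stored fingerprint does not match re-serialized plan")
    # Check 2: the engine must support the plan's version; force downgrades to a warning.
    version = plan.get("plan_version")
    if version not in SUPPORTED_PLAN_VERSIONS:
        message = f"plan_version {version!r} is not supported by this engine"
        if not force:
            raise UnsupportedPlanVersion(message)
        warnings.warn(message)
```

Checking the fingerprint first means a tampered plan is rejected before its version field is trusted.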

Running a replay

CLI

# Find the run ID
ixtract history orders

# Replay it
ixtract replay --run-id run_001

# Replay to a different directory
ixtract replay --run-id run_001 --output-dir ./replay-output

# Proceed despite an unsupported plan version
ixtract replay --run-id run_001 --force

Python API

from ixtract import replay, ExtractionIntent

# Describe the source the original run extracted from
intent = ExtractionIntent(
    source_type="postgresql",
    source_config={"host": "localhost", "database": "mydb", "user": "app"},
    object_name="orders",
)

# Re-execute the stored plan for run_001; no re-planning takes place
replay_result = replay(run_id="run_001", intent=intent)

Replay output

The replay command shows a side-by-side comparison:

Decision Check
──────────────────────────────────────────────────
                original        replay
Workers         8               8
Chunks          20              20
Strategy        range_chunking  range_chunking
Plan Hash       f6b8048a...     f6b8048a...  ✔ identical

Outcome Delta
──────────────────────────────────────────────────
Throughput      856,410/s  →  858,168/s  (+0.3%)
Duration        11.7s      →  11.6s      (-0.1s)

Determinism: ✔ Verified  (plan_fingerprint match)

The Outcome Delta shows that throughput can vary slightly between runs — the source system, OS scheduling, and network introduce noise. The Decision Check verifies that the plan itself was identical.
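The split between the two panels can be expressed as a comparison that looks only at decision fields and ignores outcome metrics. The sketch below assumes each run is available as a plain dict; the field names and the "fp-example" fingerprint are hypothetical placeholders, not ixtract's storage schema.

```python
from typing import Any

# Fields the Decision Check compares; outcome metrics are deliberately excluded.
DECISION_FIELDS = ("workers", "chunks", "strategy", "plan_fingerprint")


def decision_check(original: dict[str, Any], replayed: dict[str, Any]) -> tuple[dict[str, bool], bool]:
    """Return per-field match flags and an overall determinism verdict."""
    matches = {field: original.get(field) == replayed.get(field) for field in DECISION_FIELDS}
    return matches, all(matches.values())


# Hypothetical run records: throughput differs, decisions do not.
original = {"workers": 8, "chunks": 20, "strategy": "range_chunking",
            "plan_fingerprint": "fp-example", "throughput": 856_410}
replayed = {"workers": 8, "chunks": 20, "strategy": "range_chunking",
            "plan_fingerprint": "fp-example", "throughput": 858_168}

matches, verified = decision_check(original, replayed)
print("Determinism verified" if verified else f"Mismatch: {matches}")
```

Keeping throughput out of DECISION_FIELDS is what makes the verdict "identical decisions, not identical results": outcome noise never flips the determinism check.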

Validation test result

From the 5-run real-world validation:

Run 1: 10M rows, pgbench, 8 workers, 20 chunks, 856K/s, 11.7s
Run 5 (replay of Run 1): same plan hash ✓, 858K/s, 11.6s (+0.3%, -0.1s)
Determinism: ✔ Verified

What replay does not guarantee

Identical throughput — external system state varies
Identical file layout — output segment boundaries may differ under a rotating writer
Identical row order within chunks — database query plans may differ
Identical timing — OS and network introduce noise

Error conditions

| Error | Cause |
|---|---|
| `PlanCorruptionError` | Stored fingerprint does not match the re-serialized plan |
| `UnsupportedPlanVersion` | Plan was created with an incompatible engine version |
| `RunNotFoundError` | Specified run ID does not exist in the state store |

Use --force to proceed past UnsupportedPlanVersion if you understand the risk.

Use cases

Audit: prove that a historical extraction used exactly the plan you expect
Debugging: re-run a failing extraction with the same configuration
Regression testing: verify that a code change does not alter planning behavior
Compliance: demonstrate that data was extracted with a specific, recorded plan