Deterministic Replay
What replay guarantees
Re-executing a plan with ixtract replay produces an identical decision surface: the same worker count, chunk count, chunking strategy, and plan fingerprint. The physical output may differ, because timing and external system state are not controlled.
Replay guarantees identical decisions, not identical results.
How it works
Plan fingerprint
Every plan is serialized to canonical JSON before execution:
- Keys sorted alphabetically
- No whitespace
- Floats rounded to 6 decimal places
- NaN and inf → 0
The canonical JSON is hashed with SHA-256. The fingerprint is stored in the state store alongside every run.
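The canonicalization rules above can be sketched with the Python standard library. This is a minimal illustration, not ixtract's actual serializer: the function names `canonical_json` and `plan_fingerprint` are hypothetical, the plan is assumed to be a plain dict, and replacing NaN and inf with `0.0` (rather than the integer `0`) is an assumption.

```python
import hashlib
import json
import math


def _normalize(value):
    """Recursively apply the float rules: round to 6 places, NaN/inf become 0."""
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return 0.0  # assumption: a float zero stands in for NaN and +/-inf
        return round(value, 6)
    if isinstance(value, dict):
        return {str(key): _normalize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [_normalize(item) for item in value]
    return value


def canonical_json(plan: dict) -> str:
    """Serialize with sorted keys and no whitespace (hypothetical helper)."""
    return json.dumps(
        _normalize(plan),
        sort_keys=True,
        separators=(",", ":"),
        allow_nan=False,  # fail loudly if a NaN slipped past normalization
    )


def plan_fingerprint(plan: dict) -> str:
    """SHA-256 hex digest of the canonical JSON (hypothetical helper)."""
    return hashlib.sha256(canonical_json(plan).encode("utf-8")).hexdigest()
```

Because keys are sorted and floats are normalized before hashing, two plans that differ only in key order or in digits beyond the sixth decimal place produce the same fingerprint.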
Planner-free execution
execute_plan() takes a stored ExecutionPlan directly, with no re-profiling, no estimator, and no enrichment. The plan is executed exactly as stored.
Version check
Each plan is tagged with a plan_version. On replay, ixtract validates:
- The stored fingerprint matches the re-serialized plan
- The plan version is supported by the current engine
If either check fails, replay raises an error (or warns with --force).
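The two checks can be pictured roughly as follows. This is a sketch under stated assumptions, not ixtract's implementation: the exception classes are local stand-ins named after the errors listed under Error conditions, `SUPPORTED_PLAN_VERSIONS`, `fingerprint`, and `validate_for_replay` are hypothetical names, the plan is assumed to be a plain dict that carries its own `plan_version`, and `force` is assumed to downgrade only the version check to a warning.

```python
import hashlib
import json


class PlanCorruptionError(Exception):
    """Local stand-in: stored fingerprint does not match the re-serialized plan."""


class UnsupportedPlanVersion(Exception):
    """Local stand-in: plan version is not supported by the current engine."""


SUPPORTED_PLAN_VERSIONS = {1}  # hypothetical set of versions the engine accepts


def fingerprint(plan: dict) -> str:
    """Simplified fingerprint: sorted keys, no whitespace, SHA-256."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_for_replay(plan: dict, stored_fingerprint: str, force: bool = False) -> list:
    """Run both checks and return any warnings; raise if a check fails."""
    warnings = []
    # Check 1: the plan must re-serialize to the fingerprint recorded with the run.
    if fingerprint(plan) != stored_fingerprint:
        raise PlanCorruptionError("stored fingerprint does not match re-serialized plan")
    # Check 2: the plan version must be one the current engine supports.
    version = plan.get("plan_version")
    if version not in SUPPORTED_PLAN_VERSIONS:
        message = f"plan_version {version!r} is not supported by this engine"
        if not force:
            raise UnsupportedPlanVersion(message)
        warnings.append(message)  # assumption: force turns this failure into a warning
    return warnings
```

Checking the fingerprint first means a corrupted plan is rejected before its version field is trusted.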
Running a replay

    # Find the run ID
    ixtract history orders

    # Replay it
    ixtract replay --run-id run_001

    # Replay to a different directory
    ixtract replay --run-id run_001 --output-dir ./replay-output

    # Force past a version warning
    ixtract replay --run-id run_001 --force

Python API

    from ixtract import replay, ExtractionIntent

    intent = ExtractionIntent(
        source_type="postgresql",
        source_config={"host": "localhost", "database": "mydb", "user": "app"},
        object_name="orders",
    )

    replay_result = replay(run_id="run_001", intent=intent)

Replay output
The replay command shows a side-by-side comparison:

    Decision Check
    ──────────────────────────────────────────────────
                 original          replay
    Workers      8                 8
    Chunks       20                20
    Strategy     range_chunking    range_chunking
    Plan Hash    f6b8048a...       f6b8048a...   ✔ identical

    Outcome Delta
    ──────────────────────────────────────────────────
    Throughput   856,410/s → 858,168/s  (+0.3%)
    Duration     11.7s → 11.6s  (-0.1s)

    Determinism: ✔ Verified (plan_fingerprint match)

The Outcome Delta shows that throughput can vary slightly between runs: the source system, OS scheduling, and network introduce noise. The Decision Check verifies that the plan itself was identical.
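The idea behind the Decision Check can be sketched as a field-by-field comparison of the decision surface. This is an illustrative assumption about the logic, not the actual replay code: the `Decision` dataclass and the functions `decision_check` and `is_deterministic` are hypothetical names, and outcome metrics such as throughput and duration are deliberately left out of the verdict.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical container for the decision surface of one run."""
    workers: int
    chunks: int
    strategy: str
    plan_hash: str


def decision_check(original: Decision, replayed: Decision) -> dict:
    """Map each field to (original value, replay value, identical?)."""
    before, after = asdict(original), asdict(replayed)
    return {name: (before[name], after[name], before[name] == after[name]) for name in before}


def is_deterministic(original: Decision, replayed: Decision) -> bool:
    """True only if every decision field matches; outcomes are not consulted."""
    return all(same for _, _, same in decision_check(original, replayed).values())
```

Keeping throughput and duration out of the comparison reflects the distinction drawn above: decisions are expected to match exactly, while outcomes are expected to vary.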
Validation test result
From the 5-run real-world validation:
- Run 1: 10M rows, pgbench, 8 workers, 20 chunks, 856K/s, 11.7s
- Run 5 (replay of Run 1): same plan hash ✓, 858K/s, 11.6s (+0.3%, -0.1s)
- Determinism: ✔ Verified
What replay does not guarantee
- Identical throughput: external system state varies
- Identical file layout: output segment boundaries may differ under a rotating writer
- Identical row order within chunks: database query plans may differ
- Identical timing: OS and network introduce noise
Error conditions
| Error | Cause |
|---|---|
| PlanCorruptionError | Stored fingerprint does not match re-serialized plan |
| UnsupportedPlanVersion | Plan was created with an incompatible engine version |
| RunNotFoundError | Specified run ID does not exist in the state store |
Use --force to proceed past UnsupportedPlanVersion if you understand the risk.
Use cases
- Audit: prove that a historical extraction used exactly the plan you expect
- Debugging: re-run a failing extraction with the same configuration
- Regression testing: verify that a code change does not alter planning behavior
- Compliance: demonstrate that data was extracted with a specific, recorded plan