Deterministic Replay

Re-executing a plan with ixtract replay produces an identical decision surface: same workers, same chunk count, same chunking strategy, same plan fingerprint. The physical output may differ — timing and external system state are not controlled.

Replay guarantees identical decisions, not identical results.

Every plan is serialized to canonical JSON before execution:

  • Keys sorted alphabetically
  • No whitespace
  • Floats rounded to 6 decimal places
  • NaN and inf → 0

The canonical JSON is hashed with SHA-256. The fingerprint is stored in the state store alongside every run.
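
The canonicalization rules above can be sketched in a few lines of Python. This is an illustration only, not ixtract's actual code: the function names canonical_json and plan_fingerprint are hypothetical, the plan is assumed to be a plain dict, and non-finite floats are assumed to be replaced with 0.

```python
import hashlib
import json
import math


def _normalize(value):
    """Recursively apply the float rules before serialization."""
    if isinstance(value, float):
        # Assumption: NaN and +/-inf are replaced with 0.
        if math.isnan(value) or math.isinf(value):
            return 0
        # Round floats to 6 decimal places.
        return round(value, 6)
    if isinstance(value, dict):
        return {str(k): _normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_normalize(v) for v in value]
    return value


def canonical_json(plan: dict) -> str:
    """Serialize with sorted keys and no whitespace (hypothetical helper)."""
    return json.dumps(
        _normalize(plan),
        sort_keys=True,
        separators=(",", ":"),
        allow_nan=False,  # fail loudly if a non-finite float slips through
    )


def plan_fingerprint(plan: dict) -> str:
    """SHA-256 hex digest of the canonical JSON (hypothetical helper)."""
    return hashlib.sha256(canonical_json(plan).encode("utf-8")).hexdigest()
```

Because keys are sorted and floats are rounded, two dicts that differ only in key order or in floating-point noise beyond the sixth decimal place hash to the same fingerprint.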

execute_plan() takes a stored ExecutionPlan directly — no re-profiling, no estimator, no enrichment. The plan is executed exactly as stored.

Each plan is tagged with a plan_version. On replay, ixtract validates:

  1. The stored fingerprint matches the re-serialized plan
  2. The plan version is supported by the current engine

If either check fails, replay raises an error (or warns with --force).
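
As a rough sketch of the two checks, the logic might look like the following. Everything here is an assumption made for illustration: the SUPPORTED_PLAN_VERSIONS set, the validate_for_replay function, the locally defined exception classes, and the simplified fingerprint helper are not ixtract's API. The sketch also assumes that force downgrades only the version check to a warning.

```python
import hashlib
import json

# Hypothetical set of plan versions the current engine can execute.
SUPPORTED_PLAN_VERSIONS = {1}


class PlanCorruptionError(Exception):
    """Stored fingerprint does not match the re-serialized plan."""


class UnsupportedPlanVersion(Exception):
    """Plan version is not supported by the current engine."""


def fingerprint(plan: dict) -> str:
    # Simplified canonicalization: sorted keys, no whitespace.
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_for_replay(plan: dict, stored_fingerprint: str, force: bool = False) -> list:
    """Run both checks; return any warnings that force allowed past."""
    warnings = []

    # Check 1: the stored fingerprint must match the re-serialized plan.
    if fingerprint(plan) != stored_fingerprint:
        raise PlanCorruptionError("fingerprint mismatch")

    # Check 2: the plan version must be supported by this engine.
    version = plan.get("plan_version")
    if version not in SUPPORTED_PLAN_VERSIONS:
        message = f"unsupported plan_version: {version!r}"
        if not force:
            raise UnsupportedPlanVersion(message)
        warnings.append(message)

    return warnings
```

Re-hashing the stored plan rather than trusting the stored fingerprint is what lets the first check detect a plan that was modified after it was recorded.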

# Find the run ID
ixtract history orders
# Replay it
ixtract replay --run-id run_001
# Replay to a different directory
ixtract replay --run-id run_001 --output-dir ./replay-output
# Force past a version warning
ixtract replay --run-id run_001 --force

from ixtract import replay, ExtractionIntent

intent = ExtractionIntent(
    source_type="postgresql",
    source_config={"host": "localhost", "database": "mydb", "user": "app"},
    object_name="orders",
)
replay_result = replay(run_id="run_001", intent=intent)

The replay command shows a side-by-side comparison:

Decision Check
──────────────────────────────────────────────────
             original          replay
Workers      8                 8
Chunks       20                20
Strategy     range_chunking    range_chunking
Plan Hash    f6b8048a...       f6b8048a...       ✔ identical

Outcome Delta
──────────────────────────────────────────────────
Throughput   856,410/s → 858,168/s (+0.3%)
Duration     11.7s → 11.6s (-0.1s)

Determinism: ✔ Verified (plan_fingerprint match)

The Outcome Delta shows that throughput can vary slightly between runs — the source system, OS scheduling, and network introduce noise. The Decision Check verifies that the plan itself was identical.
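
The split between the two halves of the report can be illustrated with a small sketch. The RunSummary type and both functions are hypothetical and the figures in the usage lines are invented for the example. Decision fields are compared for exact equality, while outcome fields are only reported as differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    # Decision fields: must be identical on replay.
    workers: int
    chunks: int
    strategy: str
    plan_hash: str
    # Outcome fields: expected to vary from run to run.
    throughput_rows_per_s: float
    duration_s: float


def decisions_match(original: RunSummary, replayed: RunSummary) -> bool:
    """True only if every decision field is exactly equal."""
    return (
        original.workers == replayed.workers
        and original.chunks == replayed.chunks
        and original.strategy == replayed.strategy
        and original.plan_hash == replayed.plan_hash
    )


def outcome_delta(original: RunSummary, replayed: RunSummary) -> dict:
    """Report how the outcome moved; no pass/fail judgment is made."""
    change = replayed.throughput_rows_per_s - original.throughput_rows_per_s
    return {
        "throughput_pct": 100.0 * change / original.throughput_rows_per_s,
        "duration_s": replayed.duration_s - original.duration_s,
    }


# Illustrative values only, not measurements.
first = RunSummary(8, 20, "range_chunking", "abc123", 1000.0, 10.0)
second = RunSummary(8, 20, "range_chunking", "abc123", 1010.0, 9.9)
print(decisions_match(first, second))
print(outcome_delta(first, second))
```

Keeping the outcome comparison purely informational reflects the guarantee stated earlier: a replay is verified by its decisions, so a throughput or duration difference is never treated as a failure.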

From the 5-run real-world validation:

  • Run 1: 10M rows, pgbench, 8 workers, 20 chunks, 856K/s, 11.7s
  • Run 5 (replay of Run 1): same plan hash ✓, 858K/s, 11.6s (+0.3%, -0.1s)
  • Determinism: ✔ Verified

Replay does not guarantee:

  • Identical throughput: external system state varies
  • Identical file layout: output segment boundaries may differ under a rotating writer
  • Identical row order within chunks: database query plans may differ
  • Identical timing: OS and network introduce noise

Error                    Cause
PlanCorruptionError      Stored fingerprint does not match re-serialized plan
UnsupportedPlanVersion   Plan was created with an incompatible engine version
RunNotFoundError         Specified run ID does not exist in the state store

Use --force to proceed past UnsupportedPlanVersion if you understand the risk.

Typical uses for replay:

  • Audit: prove that a historical extraction used exactly the plan you expect
  • Debugging: re-run a failing extraction with the same configuration
  • Regression testing: verify that a code change does not alter planning behavior
  • Compliance: demonstrate that data was extracted with a specific, recorded plan