v0.9.2 · MIT · 518 tests

Why did this extraction take three times longer today?

ixtract answers that question. Every time. With evidence.

Deterministic adaptive extraction runtime for PostgreSQL, MySQL, and SQL Server.

$ pip install ixtract

Step 1 of 4 — Profile
$ ixtract profile orders --database mydb --user app

Profile — orders
  rows_estimated:     10,241,847
  pk_range:           1 → 10,241,847
  skew_coefficient:   0.12  (low — balanced)
  latency_p50_ms:     8.1
  recommended_chunks: 20

Step 2 of 4 — Plan
$ ixtract plan orders

Execution Plan
  workers:    8
  chunks:     20
  strategy:   range_chunking
  basis:      controller (run 4 of window)
  estimated:  12.1s @ 846K rows/sec
  plan_hash:  f6b8048a
  verdict:    ✓ SAFE TO RUN

Step 3 of 4 — Execute
$ ixtract execute orders --output ./data/

  [████████████████████] 20/20 chunks

Summary
  rows_extracted:  10,241,847
  duration:        11.7s
  throughput:      875,371 rows/sec
  anomalies:       none
  output:          orders_20260413.parquet

Step 4 of 4 — History
$ ixtract history orders

Run History — orders
  run_017  8w  856K/s  stable
  run_018  8w  847K/s  stable
  run_019  8w  831K/s  stable
  run_020  8w  824K/s  stable
  run_021  8w  806K/s  converged ✓

  Controller: converged at 8 workers (±3.8% drift)

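The convergence call above can be sketched as a sliding-window check over recent run throughputs. This is an illustrative reconstruction, not ixtract's internals; the window size and 5% drift threshold are assumptions.

```python
# Sketch of a convergence check over a window of run throughputs.
# Window size and drift threshold are illustrative assumptions.
def has_converged(throughputs, window=5, drift_threshold=0.05):
    """Converged when every run in the window stays within
    ±drift_threshold of the window mean."""
    if len(throughputs) < window:
        return False
    recent = throughputs[-window:]
    mean = sum(recent) / window
    drift = max(abs(t - mean) / mean for t in recent)
    return drift <= drift_threshold

# The five runs shown above: drift stays within a few percent.
history = [856_000, 847_000, 831_000, 824_000, 806_000]
print(has_converged(history))  # → True
```

With these five runs the maximum drift from the window mean is about 3%, which is why the controller can declare convergence rather than keep probing.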
Questions you ask every week.
That no tool has ever answered.

Not because the data doesn't exist. Because no tool was built to surface it.

  • Why is this job slower today than yesterday?
  • Why did runtime double without any code change?
  • Why does this table always take longer?
  • Why is throughput fluctuating mid-run?
  • Why are some chunks fast and others take 10x longer?
  • How many workers should I actually use?
  • Why does adding more workers make it worse?
  • Am I overloading the source database?
  • Is it safe to run this during business hours?
  • Can I trust this to run unattended?

ixtract was built to answer all of these.

"Why did this get slower today?"

Most tools show you that throughput dropped. ixtract tells you why — with a reasoning chain, not a label.

Every deviation from expected performance is classified. Not with a single word, but with a structured reasoning chain: the root cause, the evidence that supports it, and a recommendation.

When source latency spikes, ixtract knows. When a table is skewed and one chunk is doing 95% of the work, ixtract names it. When throughput drops 50% between runs, you don't have to guess — the diagnosis is waiting for you.

Evidence-based diagnosis. Not guess-based labels.

$ ixtract diagnose --object events

Diagnosis — events (run_021 vs baseline)
────────────────────────────────────────────────────
Deviation      THROUGHPUT_DROP_SEVERE
Confidence     HIGH
Root Cause     DATA_SKEW

Evidence
  chunk_001     1,502,847 rows  2.07s   ← 97% of work
  chunks 002–006  ~10,000 rows    0.03s each
  skew_ratio:     43.2x max/median
  cv:             2.05  (threshold: 1.0)

Work Stealing    ACTIVE — LPT dispatch engaged
Effective Workers  1.1 / 3 planned

Recommendation
  Data distribution is highly non-uniform.
  Range chunking distributes key space, not work.
  Current mitigation: work_stealing active.
────────────────────────────────────────────────────
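The two skew metrics in that diagnosis can be sketched from per-chunk row counts. The chunk counts below are illustrative (chunks 002–006 are only "~10,000" in the output above), so the numbers will not exactly reproduce the recorded 43.2x, which ixtract computes from its own internal data.

```python
import statistics

# Sketch of the skew metrics shown above, computed from per-chunk
# row counts. The sample data is illustrative, not the recorded run.
def skew_metrics(chunk_rows):
    mean = statistics.mean(chunk_rows)
    stdev = statistics.pstdev(chunk_rows)
    cv = stdev / mean                                  # coefficient of variation
    skew_ratio = max(chunk_rows) / statistics.median(chunk_rows)
    return skew_ratio, cv

# One hot chunk holding almost all rows, five near-empty chunks.
rows = [1_502_847, 10_000, 10_000, 10_000, 10_000, 10_000]
ratio, cv = skew_metrics(rows)
print(f"skew_ratio={ratio:.1f}x  cv={cv:.2f}")
```

A cv above 1.0 is the threshold the diagnosis cites: at that point chunk sizes vary more than their mean, and range chunking alone cannot balance the work.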
$ ixtract plan orders \
    --source-load high \
    --network-quality degraded \
    --priority low

RuntimeContext
  source_load:      high       (multiplier: 0.50)
  network_quality:  degraded   (multiplier: 0.75)
  priority:         low

Worker Resolution
  base (controller):       8
  after env multipliers:   3   (×0.38 combined)
  after priority (low):    2
  final:                   2

Cost Comparison
  workers  duration  cost
  2        11.4s    $0.13  ← planned
  3        11.9s    $0.13
  8        14.2s    $0.14  (over-parallelized)

Verdict:  ✅ SAFE TO RUN
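The worker resolution above can be sketched as a multiplier pipeline. The environment multipliers come from the CLI output; the priority rule (drop one worker at low priority, floor at 1) is an illustrative assumption, not ixtract's documented behavior.

```python
# Sketch of the worker-resolution pipeline shown above.
# The low-priority rule is an assumption for illustration.
ENV_MULTIPLIERS = {
    ("source_load", "high"): 0.50,
    ("network_quality", "degraded"): 0.75,
}

def resolve_workers(base, context, priority="normal"):
    combined = 1.0
    for key, value in context.items():
        combined *= ENV_MULTIPLIERS.get((key, value), 1.0)
    workers = max(1, round(base * combined))   # 8 × 0.375 → 3
    if priority == "low":
        workers = max(1, workers - 1)          # shed one more worker
    return workers

ctx = {"source_load": "high", "network_quality": "degraded"}
print(resolve_workers(8, ctx, priority="low"))  # → 2
```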

Stop guessing worker counts.

The right number of workers is not 8. It depends on your source, your table, your network, and your history. ixtract calculates it — and explains why.

Adding workers doesn't always help. Sometimes it makes things worse. ixtract's direction-aware controller tracks whether the last adjustment helped or hurt — not just whether throughput went up.

When you're running against a heavily-loaded source, fewer workers can outperform more. The controller discovers this through feedback, not configuration.
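One way to picture a direction-aware step: keep moving in the last direction while it helps, reverse when it hurts, and never move more than one bounded step. This is a sketch of the idea, not ixtract's controller; the names, step bound, and limits are assumptions.

```python
# Sketch of a direction-aware feedback step: the controller tracks
# whether the last adjustment helped, not just raw throughput.
# Step bound and worker limits are illustrative assumptions.
def next_workers(workers, last_direction, last_throughput, throughput,
                 max_step=1, lo=1, hi=16):
    improved = throughput >= last_throughput
    direction = last_direction if improved else -last_direction
    step = max(-max_step, min(max_step, direction))   # bounded adjustment
    return max(lo, min(hi, workers + step)), direction

# Throughput fell after scaling up, so the controller backs off.
workers, direction = next_workers(8, +1, 856_000, 820_000)
print(workers, direction)  # → 7 -1
```

Because the sign flips on regression, a source where fewer workers perform better pulls the controller downward run after run, which is exactly the high-load behavior described above.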

Real finding from testing:
2 workers on a high-load source: 920,000 rows/sec
8 workers on the same source: 856,000 rows/sec

Over-parallelization confirmed. The controller learned this in 3 runs without a single config change.

Never accidentally overload your source again.

ixtract maintains a conservative bias. Under uncertainty, it uses fewer workers. It will never let a misconfigured extraction kill a production database.

Conservative by default

When ixtract doesn't have enough history to be confident, it starts conservatively. It scales up as evidence accumulates — never the other way around.

Bounded adaptation

No single adjustment exceeds the configured step limits. The controller cannot oscillate and it cannot run away: every move is bounded.

Source load awareness

Declare --source-load high and ixtract automatically constrains parallelism. No manual cap calculation. No guessing what "safe" means for your source.

In testing against Azure SQL Server (30ms p50 latency, 100× slower than local), ixtract flagged the anomaly at 44.3 standard deviations below the local baseline — and correctly constrained its own behavior without being told.
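That sigma figure is a distance-from-baseline score. A minimal sketch, using the local throughput history shown earlier as the baseline; these samples are illustrative rather than the exact recorded runs, so the score will not reproduce 44.3σ precisely.

```python
import statistics

# Sketch of distance-from-baseline anomaly scoring: how many standard
# deviations an observed throughput sits from the recorded baseline.
def sigma_deviation(baseline_samples, observed):
    mean = statistics.mean(baseline_samples)
    stdev = statistics.pstdev(baseline_samples)
    return abs(observed - mean) / stdev

local_baseline = [856_000, 847_000, 831_000, 824_000, 806_000]  # rows/sec
score = sigma_deviation(local_baseline, 8_700)  # cloud run throughput
print(f"{score:.1f}σ below local baseline")
```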

Real runs. Real numbers.

Five test runs across local PostgreSQL and Azure SQL Server. These are the actual results.

Run 1 — Baseline
  Config:  pgbench_accounts (10M rows), 8 workers, default
  Result:  856K rows/sec, 11.7s
  Proves:  clean cold-start with profiler

Run 2 — Source load
  Config:  same table, --source-load high, --network-quality degraded
  Result:  920K rows/sec at 2 workers
  Proves:  fewer workers outperformed 8 at high load

Run 3 — Skewed table
  Config:  skewed_events (1.55M rows, CV=2.05), work stealing active
  Result:  43× skew detected, LPT dispatch engaged
  Proves:  skew detection and mitigation working

Run 4 — Cloud SQL Server
  Config:  cloud_extraction_test (1M rows, Azure, p50=30ms)
  Result:  8.7K rows/sec, anomaly flagged at 44.3σ
  Proves:  cross-environment anomaly detection

Run 5 — Replay
  Config:  pgbench_accounts (Run 1 replayed), --run-id run_001
  Result:  plan hash ✓ identical, +0.3% throughput delta
  Proves:  deterministic replay verified

Test environment: Ubuntu, local PostgreSQL (port 5432), Azure SQL Server (ixtract-db-server-46). 518 simulation tests passing; 12 integration tests passing. No cherry-picked runs — this is the full test sequence.

Every decision is recorded.
Every run can be replayed exactly.

ixtract is not probabilistic. It does not guess. Every plan is produced by the same deterministic rules from the same inputs — and can be reproduced six months later on different hardware.

  • Same inputs → same plan. Always.
  • Every decision has a structured justification you can inspect.
  • Every run stores a plan fingerprint (SHA-256).
  • Replay re-executes against the stored plan — not a reconstruction.
  • Deviation from expected behavior is explained, not hidden.
  • No probabilistic drift. No unsupervised learning. No black box.

"Replay guarantees identical decisions, not identical results."
Timing varies. Hardware varies. The plan does not.
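The SHA-256 plan fingerprint idea can be sketched as hashing a canonical JSON form of the plan, so field order never changes the hash. The field names here are illustrative, not ixtract's stored schema.

```python
import hashlib
import json

# Sketch of a deterministic plan fingerprint: hash the canonical JSON
# form of the plan so identical inputs always yield the same hash,
# regardless of field order. Field names are illustrative.
def plan_fingerprint(plan: dict) -> str:
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:8]

a = plan_fingerprint({"workers": 8, "chunks": 20, "strategy": "range_chunking"})
b = plan_fingerprint({"strategy": "range_chunking", "workers": 8, "chunks": 20})
print(a == b)  # same plan, same fingerprint → True
```

Canonicalizing before hashing is what makes the fingerprint reproducible six months later on different hardware: the hash depends only on the plan's contents, never on serialization order.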

$ ixtract replay --run-id run_001

Replaying run_001 (pgbench_accounts, 2026-04-08)

Plan Integrity
  fingerprint:  f6b8048a4d2e...  ✔ verified
  version:      1.0              ✔ supported

Decision Check
──────────────────────────────────────────────────
              Original        Replay
──────────────────────────────────────────────────
Workers       8               8
Chunks        20              20
Strategy      range_chunking  range_chunking
Plan Hash     f6b8048a...     f6b8048a...  
──────────────────────────────────────────────────

Outcome Delta
  rows:       10,241,847 → 10,241,847  
  throughput: 856,341/s  → 858,284/s  (+0.3%)
  duration:   11.7s      → 11.6s     (-0.1s)

Determinism: ✔ Verified  (plan_fingerprint match)

Up and running in five minutes.

1. Install
$ pip install ixtract
2. Write extract.py
from ixtract import plan, execute, ExtractionIntent

intent = ExtractionIntent(
    source_type="postgresql",
    source_config={
        "host":     "localhost",
        "database": "mydb",
        "user":     "app",
    },
    object_name="orders",
)

result = plan(intent)
if result.is_safe:
    execution = execute(result)
    print(f"{execution.rows_extracted:,} rows in {execution.duration_seconds:.1f}s")
3. Run
10,241,847 rows in 11.7s

Run stored. Diagnosis available. Controller learning.
Next run will be faster.

Built for the full extraction lifecycle.

Each tool does one thing. None of them do each other's job.

ixtract MIT Open Source

Extraction runtime. Self-tuning, deterministic, explainable. Converges to optimal parallelism. Explains every decision.

→ You are here

iPoxy MIT Open Source — Coming Soon

Pipeline reinforcement layer. Pre: gate extractions before they run. Watch: monitor pipelines in production. Gate: CI/CD checks for data pipelines.

ixora Commercial — Coming Soon

Fleet intelligence platform. SLA tracking, cost dashboards, multi-team visibility. Built on ixtract data, scaled to the enterprise.

Single engineer  →  ixtract
Team reliability →  iPoxy
Platform scale   →  ixora