Deterministic adaptive extraction runtime for PostgreSQL, MySQL, and SQL Server.

When an extraction slows down, the answer is rarely missing. It's not that the data doesn't exist; it's that no tool was built to surface it. ixtract was built to answer those questions. Every time. With evidence.
Most tools show you that throughput dropped. ixtract tells you why — with a reasoning chain, not a label.
Every deviation from expected performance is classified. Not with a single word, but with a structured reasoning chain: the root cause, the evidence that supports it, and a recommendation.
When source latency spikes, ixtract knows. When a table is skewed and one chunk is doing 95% of the work, ixtract names it. When throughput drops 50% between runs, you don't have to guess — the diagnosis is waiting for you.
Evidence-based diagnosis. Not guess-based labels.
$ ixtract diagnose --object events
Diagnosis — events (run_021 vs baseline)
────────────────────────────────────────────────────
Deviation THROUGHPUT_DROP_SEVERE
Confidence HIGH
Root Cause DATA_SKEW
Evidence
chunk_001 1,502,847 rows 2.07s ← 97% of work
chunks 002–006 ~10,000 rows 0.03s each
skew_ratio: 43.2x max/median
cv: 2.05 (threshold: 1.0)
Work Stealing ACTIVE — LPT dispatch engaged
Effective Workers 1.1 / 3 planned
Recommendation
Data distribution is highly non-uniform.
Range chunking distributes key space, not work.
Current mitigation: work_stealing active.
────────────────────────────────────────────────────
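The skew metrics in the diagnosis above can be sketched in a few lines. This is an illustration with hypothetical chunk counts, not ixtract's internal code; the exact formulas behind `skew_ratio` and `cv`, and the LPT tie-breaking rule, are assumptions inferred from the report.

```python
import statistics

def skew_metrics(chunk_rows):
    """Assumed definitions: skew_ratio = max/median row count across chunks,
    cv = population std dev / mean (coefficient of variation)."""
    ratio = max(chunk_rows) / statistics.median(chunk_rows)
    cv = statistics.pstdev(chunk_rows) / statistics.mean(chunk_rows)
    return ratio, cv

def lpt_order(chunk_rows):
    """Longest-Processing-Time-first dispatch: hand out the biggest chunks
    first so work stealing can rebalance the tail."""
    return sorted(range(len(chunk_rows)), key=lambda i: chunk_rows[i], reverse=True)

# Hypothetical skewed table: one hot chunk, five small ones.
chunks = [430_000, 9_500, 10_100, 9_900, 10_000, 10_500]
ratio, cv = skew_metrics(chunks)
print(f"skew_ratio: {ratio:.1f}x  cv: {cv:.2f}")
print("dispatch order:", lpt_order(chunks))
```

A range-chunked plan splits the key space evenly, so a hot key range produces exactly this shape: one chunk dominates, and LPT dispatch plus work stealing is the mitigation.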
$ ixtract plan orders \
--source-load high \
--network-quality degraded \
--priority low
RuntimeContext
source_load: high (multiplier: 0.50)
network_quality: degraded (multiplier: 0.75)
priority: low
Worker Resolution
base (controller): 8
after env multipliers: 3 (×0.38 combined)
after priority (low): 2
final: 2
Cost Comparison
workers duration cost
2 11.4s $0.13 ← planned
3 11.9s $0.13
8 14.2s $0.14 (over-parallelized)
Verdict: ✅ SAFE TO RUN
The right number of workers is not 8. It depends on your source, your table, your network, and your history. ixtract calculates it — and explains why.
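The worker resolution in the plan output can be read as multiplier folding. The sketch below reproduces the arithmetic shown above (8 × 0.50 × 0.75 → 3, then a low-priority cap → 2); the multiplier tables and the rounding and cap rules are assumptions inferred from that output, not ixtract's actual configuration.

```python
# Assumed multipliers, inferred from the `ixtract plan` output above.
LOAD = {"low": 1.0, "normal": 1.0, "high": 0.50}
NETWORK = {"good": 1.0, "degraded": 0.75}
PRIORITY_CAP = {"low": 2, "normal": 8, "high": 16}  # hypothetical caps

def resolve_workers(base, source_load, network_quality, priority):
    combined = LOAD[source_load] * NETWORK[network_quality]  # 0.50 * 0.75 = 0.375
    after_env = max(1, round(base * combined))               # 8 * 0.375 -> 3
    return min(after_env, PRIORITY_CAP[priority])            # low priority caps at 2

print(resolve_workers(8, "high", "degraded", "low"))  # → 2
```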
Adding workers doesn't always help. Sometimes it makes things worse. ixtract's direction-aware controller tracks whether the last adjustment helped or hurt — not just whether throughput went up.
When you're running against a heavily-loaded source, fewer workers can outperform more. The controller discovers this through feedback, not configuration.
Over-parallelization confirmed. The controller learned this in 3 runs without a single config change.
ixtract maintains a conservative bias. Under uncertainty, it uses fewer workers. It will never let a misconfigured extraction kill a production database.
When ixtract doesn't have enough history to be confident, it starts conservatively. It scales up as evidence accumulates — never the other way around.
No single adjustment exceeds configured step limits. The controller cannot oscillate. It cannot run away. Every move is bounded.
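A direction-aware, step-bounded adjustment loop might look like the sketch below. This is an illustration of the idea, not ixtract's controller: the step limit of 2 and the simple reversal rule are assumptions.

```python
MAX_STEP = 2  # configured step limit: no single adjustment exceeds this

def next_workers(workers, direction, prev_tput, tput):
    """Direction-aware step: if the last adjustment hurt throughput,
    reverse direction. Every move is clamped to MAX_STEP, so the
    controller can neither oscillate wildly nor run away."""
    if tput < prev_tput:
        direction = -direction
    step = max(-MAX_STEP, min(MAX_STEP, direction))
    return max(1, workers + step), direction

# Against a heavily loaded source, the last scale-up hurt (100 -> 90),
# so the controller steps back down instead of adding more workers:
w, d = next_workers(8, +1, 100, 90)
print(w, d)  # → 7 -1
```

The key point is that the feedback signal is directional ("did the last move help?"), not absolute ("is throughput high?"), which is what lets the controller discover that fewer workers can win.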
Declare --source-load high and ixtract automatically constrains parallelism. No manual cap calculation. No guessing what "safe" means for your source.
In testing against Azure SQL Server (30ms p50 latency, 100× slower than local), ixtract flagged the anomaly at 44.3 standard deviations below the local baseline — and correctly constrained its own behavior without being told.
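A "standard deviations below baseline" check is, at its simplest, a z-score against local history. The sketch below uses hypothetical baseline throughput samples and a plain z-score; ixtract's actual anomaly test and thresholds are not documented here and are assumed.

```python
import statistics

def sigma_below(baseline_samples, observed):
    """How many standard deviations the observed throughput sits below
    the baseline mean (plain z-score; the real test is assumed)."""
    mean = statistics.mean(baseline_samples)
    std = statistics.stdev(baseline_samples)
    return (mean - observed) / std

# Hypothetical local PostgreSQL throughput history (rows/sec):
baseline = [856_000, 861_000, 849_000, 858_000]
# A cloud run at 8,700 rows/sec is unmistakably a different regime:
print(f"{sigma_below(baseline, 8_700):.1f} sigma below baseline")
```

A deviation that large is not noise; it is a signal that the environment changed, which is why constraining behavior (rather than retrying harder) is the right response.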
Five test runs across local PostgreSQL and Azure SQL Server. These are the actual results.
| Run | Table / Config | Result | What it proves |
|---|---|---|---|
| 1 — Baseline | pgbench_accounts (10M rows), 8 workers, default | 856K rows/sec, 11.7s | Clean cold-start with profiler |
| 2 — Source load | Same table, --source-load high, --network-quality degraded | 920K rows/sec at 2 workers | Fewer workers outperformed 8 at high load |
| 3 — Skewed table | skewed_events (1.55M rows, CV=2.05), work stealing active | 43× skew detected, LPT dispatch engaged | Skew detection and mitigation working |
| 4 — Cloud SQL Server | cloud_extraction_test (1M rows, Azure, p50=30ms) | 8.7K rows/sec, anomaly flagged at 44.3σ | Cross-environment anomaly detection |
| 5 — Replay | pgbench_accounts (Run 1 replayed), --run-id run_001 | Plan hash ✓ identical, +0.3% throughput delta | Deterministic replay verified |
Test environment: Ubuntu, local PostgreSQL (port 5432), Azure SQL Server (ixtract-db-server-46). 518 simulation tests passing. 12 integration tests passing. No cherry-picked runs — this is the full test sequence.
ixtract is not probabilistic. It does not guess. Every plan is produced by the same deterministic rules from the same inputs — and can be reproduced six months later on different hardware.
"Replay guarantees identical decisions, not identical results."
Timing varies. Hardware varies. The plan does not.
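One way to get a hardware-independent plan fingerprint is to hash a canonical serialization of the plan's decisions, so field order and machine never matter. This is a sketch of the idea; the field names and the use of SHA-256 over canonical JSON are assumptions, not ixtract's actual scheme.

```python
import hashlib
import json

def plan_fingerprint(plan: dict) -> str:
    """Hash the canonical JSON form of the plan: same inputs, same rules,
    same fingerprint -- on any machine, at any time."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

plan = {"workers": 8, "chunks": 20, "strategy": "range_chunking"}
# Key order doesn't change the fingerprint: serialization is canonical.
assert plan_fingerprint(plan) == plan_fingerprint(dict(reversed(plan.items())))
print(plan_fingerprint(plan)[:12])
```

This is what makes "identical decisions, not identical results" checkable: the fingerprint covers workers, chunks, and strategy, while timing and throughput live outside it.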
$ ixtract replay --run-id run_001
Replaying run_001 (pgbench_accounts, 2026-04-08)
Plan Integrity
fingerprint: f6b8048a4d2e... ✔ verified
version: 1.0 ✔ supported
Decision Check
──────────────────────────────────────────────────
Original Replay
──────────────────────────────────────────────────
Workers 8 8
Chunks 20 20
Strategy range_chunking range_chunking
Plan Hash f6b8048a... f6b8048a... ✔
──────────────────────────────────────────────────
Outcome Delta
rows: 10,241,847 → 10,241,847 ✔
throughput: 856,341/s → 858,284/s (+0.3%)
duration: 11.7s → 11.6s (-0.1s)
Determinism: ✔ Verified (plan_fingerprint match)
$ pip install ixtract

from ixtract import plan, execute, ExtractionIntent
intent = ExtractionIntent(
source_type="postgresql",
source_config={
"host": "localhost",
"database": "mydb",
"user": "app",
},
object_name="orders",
)
result = plan(intent)
if result.is_safe:
execution = execute(result)
print(f"{execution.rows_extracted:,} rows in {execution.duration_seconds:.1f}s")
10,241,847 rows in 11.7s
Run stored. Diagnosis available. Controller learning.
Next run will be faster.
Each tool does one thing. None of them does another's job.
Extraction runtime. Self-tuning, deterministic, explainable. Converges to optimal parallelism. Explains every decision.
→ You are here
Pipeline reinforcement layer. Pre: gate extractions before they run. Watch: monitor pipelines in production. Gate: CI/CD checks for data pipelines.
Fleet intelligence platform. SLA tracking, cost dashboards, multi-team visibility. Built on ixtract data, scaled to the enterprise.