Connectors

ixtract supports three database connectors. Each connector enforces its isolation level as an invariant — it is not user-configurable. If it ever becomes configurable, it becomes a plan field.


PostgreSQL

Driver: psycopg2
Isolation: REPEATABLE READ (connector-level invariant)
Chunking: Range chunking on integer PK boundaries

  • PostgreSQL 12+
  • Integer or bigint primary key
  • User must have SELECT access to the target table

source_config = {
    "host": "localhost",
    "port": 5432,
    "database": "mydb",
    "user": "app",
    "password": "secret",
}

Each worker opens its own connection. REPEATABLE READ isolation guarantees a consistent snapshot across the extraction without blocking writes on the source.
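
As a minimal sketch of the per-worker connection described above, the following shows one way to pin a psycopg2 connection to REPEATABLE READ with `set_session`. The function name `open_worker_connection` and the injectable `connect` parameter are hypothetical and exist only for illustration; this is not ixtract's internal code.

```python
def open_worker_connection(source_config, connect=None):
    """Open a dedicated connection for one worker, pinned to REPEATABLE READ.

    `connect` is a hypothetical injection point so the sketch can be
    exercised without a live database; it defaults to psycopg2.connect.
    """
    if connect is None:
        import psycopg2  # imported lazily so the module loads without the driver
        connect = psycopg2.connect

    conn = connect(
        host=source_config["host"],
        port=source_config["port"],
        dbname=source_config["database"],
        user=source_config["user"],
        password=source_config["password"],
    )
    # set_session must run before the first statement on the connection;
    # every transaction opened afterwards uses this isolation level.
    conn.set_session(isolation_level="REPEATABLE READ")
    return conn
```

Calling this once per worker gives each worker its own transaction and therefore its own snapshot.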

Work-stealing is supported: if the profiler detects a coefficient of variation (CV) above 1.0, which indicates a skewed PK distribution, the engine activates LPT (longest-processing-time-first) dispatch.
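
To make the CV threshold and LPT dispatch concrete, here is an illustrative sketch. It assumes the CV is computed over estimated rows per chunk (the source does not say which quantity the profiler measures), and it uses the textbook greedy form of LPT: sort chunks by cost, largest first, and hand each one to the least-loaded worker. The function names are hypothetical.

```python
import heapq
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 for an all-zero input)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def lpt_assign(chunk_costs, n_workers):
    """Greedy LPT: largest chunks first, each to the currently least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    order = sorted(range(len(chunk_costs)), key=lambda i: chunk_costs[i], reverse=True)
    for idx in order:
        load, w = heapq.heappop(heap)
        assignment[w].append(idx)
        heapq.heappush(heap, (load + chunk_costs[idx], w))
    return assignment


rows_per_chunk = [900, 50, 30, 20]            # made-up estimates for a skewed table
skewed = coefficient_of_variation(rows_per_chunk) > 1.0
plan = lpt_assign(rows_per_chunk, n_workers=2)
```

With these made-up numbers the CV is about 1.5, and LPT gives the 900-row chunk a worker to itself while the three small chunks share the other.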

Range chunking splits the PK range evenly, not by row density. For heavily skewed tables (CV > 1.0), equal PK ranges produce unequal amounts of work. Density-aware chunking is planned for Phase 5.
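
The effect of splitting by PK range rather than row density can be sketched as follows. `range_chunks` is a hypothetical helper that divides an inclusive PK range into near-equal-width pieces; the sample PKs are invented to show how equal ranges can hold very different row counts.

```python
def range_chunks(min_pk, max_pk, n_chunks):
    """Split [min_pk, max_pk] into contiguous, near-equal-width inclusive ranges."""
    span = max_pk - min_pk + 1
    n_chunks = min(n_chunks, span)       # never create empty ranges
    base, extra = divmod(span, n_chunks)
    chunks, lo = [], min_pk
    for i in range(n_chunks):
        width = base + (1 if i < extra else 0)
        chunks.append((lo, lo + width - 1))
        lo += width
    return chunks


# Invented data: 900 rows packed at the low end, 3 rows spread across the rest.
pks = list(range(1, 901)) + [300_000, 600_000, 1_000_000]
chunks = range_chunks(1, 1_000_000, 4)
rows_per_chunk = [sum(lo <= pk <= hi for pk in pks) for lo, hi in chunks]
```

Here the four ranges are equally wide, but the first holds 900 rows and the others hold one each, which is the imbalance that density-aware chunking is meant to address.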


MySQL

Driver: PyMySQL
Isolation: START TRANSACTION WITH CONSISTENT SNAPSHOT per worker
Engine: InnoDB only

  • MySQL 8.0+, InnoDB engine
  • Integer or bigint primary key
  • User must have SELECT access

source_config = {
    "host": "localhost",
    "port": 3306,
    "database": "mydb",
    "user": "app",
    "password": "secret",
}

Use source_type="mysql" in ExtractionIntent or --source-type mysql on the CLI.

Each worker takes its own consistent snapshot. This is a connector-level invariant — ixtract does not support MySQL without per-worker snapshot isolation.
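
A per-worker snapshot can be sketched as below. The function accepts any DB-API connection whose cursors work as context managers (PyMySQL's do); the name `begin_consistent_snapshot` is hypothetical. Setting REPEATABLE READ first is a defensive assumption on our part: MySQL ignores the WITH CONSISTENT SNAPSHOT clause under other isolation levels.

```python
def begin_consistent_snapshot(conn):
    """Start a transaction on `conn` that reads from a fixed InnoDB snapshot."""
    with conn.cursor() as cur:
        # Assumption: force REPEATABLE READ in case the session default was changed.
        cur.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        # The snapshot is established here, not at the first SELECT.
        cur.execute("START TRANSACTION WITH CONSISTENT SNAPSHOT")
```

Each worker would call this on its own connection before reading its chunks, then commit or roll back when done.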

MyISAM tables are explicitly rejected at validation time.
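
One way such a validation check could look is sketched here. It reads the storage engine from `information_schema.TABLES` and raises for anything other than InnoDB. The function name and the exception types are hypothetical; ixtract's actual validation code and error messages may differ.

```python
def validate_innodb(cursor, database, table):
    """Raise if `database.table` is missing or does not use the InnoDB engine."""
    cursor.execute(
        "SELECT ENGINE FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s",
        (database, table),
    )
    row = cursor.fetchone()
    if row is None:
        raise LookupError(f"table {database}.{table} not found")
    engine = row[0]  # assumes a tuple-returning cursor (PyMySQL's default)
    if engine is None or engine.lower() != "innodb":
        raise ValueError(
            f"{database}.{table} uses engine {engine!r}; only InnoDB is supported"
        )
    return engine
```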

Global snapshot coordination, which would ensure that all workers see the same logical point in time, is pending Phase 3D. Currently each worker takes its snapshot when its connection opens, so snapshots can differ by milliseconds, and a row committed in that window may be visible to some workers but not to others.


SQL Server

Driver: pyodbc
Isolation: SNAPSHOT isolation
Tested: Azure SQL and SQL Server 2019+

  • SNAPSHOT isolation enabled on the database (ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON)
  • ODBC driver installed (msodbcsql17 or msodbcsql18)
  • Integer primary key
  • User must have SELECT access

source_config = {
    "server": "myserver.database.windows.net",
    "database": "mydb",
    "user": "app",
    "password": "secret",
    "driver": "ODBC Driver 18 for SQL Server",
}

Use source_type="sqlserver".

The SQL Server connector sets the isolation level while the connection is in autocommit mode (autocommit=True), and only then begins the transaction. This ordering is required; reversing it causes a driver error.
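
The required ordering can be sketched with pyodbc as follows. The helper names, the connection-string layout, and the injectable `connect` parameter are assumptions for illustration. Switching autocommit off after setting the isolation level is our reading of "beginning the transaction", since pyodbc then opens a transaction implicitly on the next statement.

```python
def build_connection_string(source_config):
    """Assemble an ODBC connection string from the source_config keys shown above."""
    return (
        f"DRIVER={{{source_config['driver']}}};"
        f"SERVER={source_config['server']};"
        f"DATABASE={source_config['database']};"
        f"UID={source_config['user']};"
        f"PWD={source_config['password']}"
    )


def open_snapshot_connection(source_config, connect=None):
    """Connect in autocommit mode, set SNAPSHOT isolation, then leave autocommit."""
    if connect is None:
        import pyodbc  # imported lazily so the sketch loads without the driver
        connect = pyodbc.connect

    # Step 1: connect with autocommit on, so no transaction is open yet.
    conn = connect(build_connection_string(source_config), autocommit=True)

    # Step 2: set the isolation level outside any transaction.
    cur = conn.cursor()
    cur.execute("SET TRANSACTION ISOLATION LEVEL SNAPSHOT")
    cur.close()

    # Step 3: turn autocommit off; the next statement starts the transaction.
    conn.autocommit = False
    return conn
```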

For small tables (estimated at fewer than 10,000 rows), the connector uses COUNT(*) instead of statistics-based row count estimation.
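
The fallback logic might look like the sketch below. The source does not say which statistics the connector reads; this example assumes the `rows` column of `sys.partitions`, and the function names are hypothetical. Only the 10,000-row threshold comes from the text above.

```python
SMALL_TABLE_THRESHOLD = 10_000  # threshold taken from the documentation above


def quote_name(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def row_count(cursor, table):
    """Return (count, method): exact COUNT(*) for small tables, else the estimate."""
    # Assumption: sys.partitions supplies the statistics-based estimate.
    cursor.execute(
        "SELECT SUM(p.rows) FROM sys.partitions AS p "
        "WHERE p.object_id = OBJECT_ID(?) AND p.index_id IN (0, 1)",
        table,
    )
    estimate = cursor.fetchone()[0] or 0
    if estimate >= SMALL_TABLE_THRESHOLD:
        return estimate, "statistics"
    # Small table: an exact count is cheap.
    cursor.execute(f"SELECT COUNT(*) FROM {quote_name(table)}")
    return cursor.fetchone()[0], "count"
```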

Tested against Azure SQL (ixtract-db-server-46.database.windows.net):

  • 1M rows, p50 latency 30ms (30× local)
  • Throughput: ~8,700 rows/sec (vs. 856K rows/sec on local PostgreSQL)
  • Anomaly detection correctly flagged the result at 44.3σ below the local baseline

The roughly 100× throughput difference (856K vs. ~8,700 rows/sec) is expected: cloud SQL latency dominates at small chunk sizes. Use larger chunk sizes or fewer workers for cloud SQL sources.
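
The arithmetic behind that ratio, and the reason larger chunks help, can be sketched with a deliberately simple model. It assumes one network round trip per chunk and a fixed per-worker scan rate, and it ignores contention between workers, so it says nothing about the "fewer workers" advice. The scan rate and worker count below are invented for illustration; only the 30 ms latency and the two throughput figures come from the measurements above.

```python
def modeled_throughput(chunk_rows, workers, round_trip_s, rows_per_s_per_worker):
    """Rows/sec when every chunk pays one fixed round trip plus its scan time."""
    per_chunk_s = round_trip_s + chunk_rows / rows_per_s_per_worker
    return workers * chunk_rows / per_chunk_s


# Ratio of the two measured throughputs reported above.
measured_ratio = 856_000 / 8_700

# Illustrative parameters only: 4 workers, 100,000 rows/sec per worker.
small_chunks = modeled_throughput(1_000, 4, 0.030, 100_000)
large_chunks = modeled_throughput(100_000, 4, 0.030, 100_000)
```

Under this model the fixed round trip is amortized over more rows as chunks grow, so `large_chunks` comes out well above `small_chunks`.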


# Python API
intent = ExtractionIntent(
    source_type="postgresql",  # or "mysql" or "sqlserver"
    ...
)

# CLI
ixtract execute orders --source-type mysql ...

Not currently supported:

  • Columnar databases (Redshift, BigQuery, Snowflake): planned as separate connectors
  • Oracle: not planned
  • Non-integer PKs: not supported; UUID/string PKs are on the Phase 5 list
  • Tables without a primary key: not supported