Connectors

ixtract supports three database connectors. Each connector enforces its isolation level as an invariant — it is not user-configurable. If it ever becomes configurable, it becomes a plan field.

PostgreSQL

Driver: psycopg2
Isolation: REPEATABLE READ (connector-level invariant)
Chunking: Range chunking on integer PK boundaries

Requirements

PostgreSQL 12+
Integer or bigint primary key
User must have SELECT access to the target table

Connection config

source_config = {
    "host":     "localhost",
    "port":     5432,
    "database": "mydb",
    "user":     "app",
    "password": "secret",
}

Behavior

Each worker opens its own connection. REPEATABLE READ isolation gives each worker a consistent snapshot for the lifetime of its transaction, without blocking writes on the source.
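A minimal sketch of how one worker could open such a connection with psycopg2. The helper name open_worker_connection and its connect parameter are hypothetical, added only so the sketch can be exercised without a live database. They are not part of ixtract's API.

```python
def open_worker_connection(source_config, connect=None):
    """Open one connection per worker and pin it to REPEATABLE READ."""
    if connect is None:
        import psycopg2  # imported lazily so the sketch loads without the driver
        connect = psycopg2.connect
    conn = connect(
        host=source_config["host"],
        port=source_config["port"],
        dbname=source_config["database"],  # psycopg2 names this keyword "dbname"
        user=source_config["user"],
        password=source_config["password"],
    )
    # Must be called before the first statement: the snapshot is fixed
    # when the transaction runs its first query.
    conn.set_session(isolation_level="REPEATABLE READ")
    return conn
```

Each worker would call this once and keep the transaction open until its chunks are read.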

Work-stealing is supported: if the profiler detects a coefficient of variation (CV) above 1.0, which indicates a skewed PK distribution, the engine activates longest-processing-time-first (LPT) dispatch.
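The skew check and LPT dispatch can be sketched as follows. This assumes CV is the coefficient of variation of per-chunk row estimates and that LPT is the classic greedy longest-processing-time-first rule. The function names and sample numbers are illustrative, not ixtract internals.

```python
import heapq
from statistics import mean, pstdev


def coefficient_of_variation(chunk_rows):
    """Standard deviation divided by the mean of the per-chunk row estimates."""
    avg = mean(chunk_rows)
    return pstdev(chunk_rows) / avg if avg else 0.0


def lpt_assign(chunk_rows, workers):
    """Greedy LPT: give the largest remaining chunk to the least-loaded worker."""
    heap = [(0, w) for w in range(workers)]  # (rows assigned so far, worker id)
    assignment = {w: [] for w in range(workers)}
    largest_first = sorted(
        range(len(chunk_rows)), key=lambda i: chunk_rows[i], reverse=True
    )
    for idx in largest_first:
        load, w = heapq.heappop(heap)
        assignment[w].append(idx)
        heapq.heappush(heap, (load + chunk_rows[idx], w))
    return assignment


chunk_rows = [900_000, 20_000, 30_000, 50_000]  # hypothetical skewed estimates
plan = None
if coefficient_of_variation(chunk_rows) > 1.0:
    plan = lpt_assign(chunk_rows, workers=2)
```

With these numbers the one large chunk gets a worker to itself and the three small chunks share the other.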

Known limitation

Range chunking splits the PK range evenly, not by row density. For heavily skewed tables (CV > 1.0), equal PK ranges produce unequal amounts of work. Density-aware chunking is planned for Phase 5.
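To illustrate the limitation, here is a sketch of even PK-range splitting applied to a skewed key set. The function pk_range_chunks and the sample keys are hypothetical, and the real chunker may compute its boundaries differently.

```python
def pk_range_chunks(min_pk, max_pk, chunks):
    """Split [min_pk, max_pk] into contiguous, near-equal inclusive PK ranges."""
    span = max_pk - min_pk + 1
    step, extra = divmod(span, chunks)
    bounds, lo = [], min_pk
    for i in range(chunks):
        hi = lo + step + (1 if i < extra else 0) - 1
        if hi >= lo:  # skip empty ranges when there are more chunks than keys
            bounds.append((lo, hi))
        lo = hi + 1
    return bounds


# Hypothetical table: a few old rows with low keys, most rows with high keys.
pks = list(range(1, 11)) + list(range(751, 1001))
ranges = pk_range_chunks(1, 1000, 4)
rows_per_chunk = [sum(lo <= pk <= hi for pk in pks) for lo, hi in ranges]
```

All four ranges cover 250 key values, yet nearly all rows fall into the last one, so one worker does almost all of the work.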

MySQL

Driver: PyMySQL
Isolation: START TRANSACTION WITH CONSISTENT SNAPSHOT per worker
Engine: InnoDB only

Requirements

MySQL 8.0+, InnoDB engine
Integer or bigint primary key
User must have SELECT access

Connection config

source_config = {
    "host":     "localhost",
    "port":     3306,
    "database": "mydb",
    "user":     "app",
    "password": "secret",
}

Set source_type="mysql" in ExtractionIntent, or pass --source-type mysql on the CLI.

Behavior

Each worker takes its own consistent snapshot. This is a connector-level invariant — ixtract does not support MySQL without per-worker snapshot isolation.
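A sketch of how a worker could take its snapshot with PyMySQL. The helper open_snapshot_worker and its connect parameter are hypothetical, and setting the session isolation level explicitly is an assumption of this sketch rather than documented ixtract behavior.

```python
def open_snapshot_worker(source_config, connect=None):
    """Open a worker connection and start a consistent-snapshot transaction."""
    if connect is None:
        import pymysql  # imported lazily so the sketch loads without the driver
        connect = pymysql.connect
    conn = connect(
        host=source_config["host"],
        port=source_config["port"],
        database=source_config["database"],
        user=source_config["user"],
        password=source_config["password"],
    )
    with conn.cursor() as cur:
        # WITH CONSISTENT SNAPSHOT only takes effect under REPEATABLE READ
        # (the InnoDB default), so the level is set explicitly first.
        cur.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        cur.execute("START TRANSACTION WITH CONSISTENT SNAPSHOT")
    return conn
```

All reads the worker issues on this connection before it commits see the same InnoDB snapshot.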

MyISAM tables are explicitly rejected at validation time.
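One way such a validation check could look, as a sketch: it reads the storage engine from information_schema and raises for anything other than InnoDB. The function require_innodb is hypothetical and assumes a DB-API cursor that returns rows as tuples, which is PyMySQL's default.

```python
ENGINE_QUERY = (
    "SELECT ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
)


def require_innodb(cursor, database, table):
    """Raise ValueError unless the table exists and uses the InnoDB engine."""
    cursor.execute(ENGINE_QUERY, (database, table))
    row = cursor.fetchone()
    if row is None:
        raise ValueError(f"table {database}.{table} not found")
    engine = row[0]
    if engine is None or engine.upper() != "INNODB":
        raise ValueError(
            f"table {database}.{table} uses engine {engine}; InnoDB is required"
        )
```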

Known limitation

Global snapshot coordination (ensuring all workers see the same logical point in time) is planned for Phase 3D. Current behavior: each worker’s snapshot is taken at connection open time, so snapshots may differ by milliseconds between workers.

SQL Server

Driver: pyodbc
Isolation: SNAPSHOT isolation
Tested: Azure SQL and SQL Server 2019+

Requirements

SNAPSHOT isolation enabled on the database (ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON)
ODBC driver installed (msodbcsql17 or msodbcsql18)
Integer primary key
User must have SELECT access
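The first requirement can be verified before an extraction. The sketch below queries the sys.databases catalog view for the current database. The helper name is hypothetical, and the cursor is assumed to be a pyodbc cursor already connected to the target database.

```python
SNAPSHOT_STATE_QUERY = (
    "SELECT snapshot_isolation_state_desc "
    "FROM sys.databases WHERE name = DB_NAME()"
)


def snapshot_isolation_enabled(cursor):
    """Return True when ALLOW_SNAPSHOT_ISOLATION is ON for the current database."""
    cursor.execute(SNAPSHOT_STATE_QUERY)
    row = cursor.fetchone()
    # Other possible values are OFF and the two IN_TRANSITION_* states.
    return row is not None and row[0] == "ON"
```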

Connection config

source_config = {
    "server":   "myserver.database.windows.net",
    "database": "mydb",
    "user":     "app",
    "password": "secret",
    "driver":   "ODBC Driver 18 for SQL Server",
}

Use source_type="sqlserver".

Isolation ordering

The SQL Server connector sets the isolation level while the connection is in autocommit mode (autocommit=True), and only then begins the transaction. This ordering is required; reversing it causes a driver error.
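The ordering can be sketched with pyodbc as follows. The helper open_snapshot_connection, its connect parameter, and the connection-string layout are assumptions made for illustration. The connector's actual code may differ.

```python
def open_snapshot_connection(source_config, connect=None):
    """Set SNAPSHOT isolation in autocommit mode, then switch to transactions."""
    if connect is None:
        import pyodbc  # imported lazily so the sketch loads without the driver
        connect = pyodbc.connect
    conn_str = (
        f"DRIVER={{{source_config['driver']}}};"
        f"SERVER={source_config['server']};"
        f"DATABASE={source_config['database']};"
        f"UID={source_config['user']};"
        f"PWD={source_config['password']}"
    )
    # Step 1: connect in autocommit mode, so no transaction is open yet.
    conn = connect(conn_str, autocommit=True)
    # Step 2: set the session isolation level outside any transaction.
    cursor = conn.cursor()
    cursor.execute("SET TRANSACTION ISOLATION LEVEL SNAPSHOT")
    cursor.close()
    # Step 3: leave autocommit; the next statement starts the transaction.
    conn.autocommit = False
    return conn
```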

COUNT(*) fallback

For small tables (estimated at fewer than 10,000 rows), the connector uses COUNT(*) instead of statistics-based row count estimation.
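A sketch of the fallback logic. It assumes the statistics-based estimate comes from sys.partitions, which may not be the source ixtract actually uses. The names resolve_row_count and SMALL_TABLE_THRESHOLD are hypothetical.

```python
SMALL_TABLE_THRESHOLD = 10_000

# Approximate row count from catalog metadata (heap or clustered index only).
ESTIMATE_QUERY = (
    "SELECT SUM(p.rows) FROM sys.partitions AS p "
    "WHERE p.object_id = OBJECT_ID(?) AND p.index_id IN (0, 1)"
)


def resolve_row_count(cursor, table):
    """Use the cheap estimate for large tables and an exact count for small ones."""
    cursor.execute(ESTIMATE_QUERY, table)
    estimate = cursor.fetchone()[0] or 0
    if estimate < SMALL_TABLE_THRESHOLD:
        # `table` is interpolated as an identifier, so it must already be
        # validated; it cannot be passed as a query parameter.
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        return cursor.fetchone()[0]
    return estimate
```

An exact count is cheap on a small table, and it avoids planning chunks from an estimate that can be far off in relative terms.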

Real-world performance

Tested against Azure SQL (ixtract-db-server-46.database.windows.net):

1M rows, p50 latency 30ms (30× local)
Throughput: ~8,700 rows/sec (vs. 856K/sec on local PostgreSQL)
Anomaly detection correctly flagged the result at 44.3σ below the local baseline

The roughly 100× throughput difference is expected: cloud SQL latency dominates at small chunk sizes. Use larger chunk sizes or fewer workers for cloud SQL sources.
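The effect of chunk size can be illustrated with a simple cost model: each chunk pays a fixed round-trip overhead plus a per-row streaming cost. The model and its parameter values are assumptions chosen for illustration, not measurements. Only the ratio is computed from the figures reported above.

```python
def effective_rows_per_sec(chunk_rows, overhead_s, stream_rows_per_sec):
    """Throughput when every chunk pays a fixed overhead before streaming rows."""
    return chunk_rows / (overhead_s + chunk_rows / stream_rows_per_sec)


# Ratio of the two reported throughputs: 856K/sec local vs. ~8,700/sec cloud.
reported_ratio = 856_000 / 8_700

# Hypothetical parameters: 0.5 s of fixed overhead per chunk and a
# 100,000 rows/sec streaming rate once rows start flowing.
rates = [
    effective_rows_per_sec(n, overhead_s=0.5, stream_rows_per_sec=100_000)
    for n in (1_000, 10_000, 100_000)
]
```

Under this model the fixed overhead is amortized over more rows as chunks grow, so throughput approaches the streaming rate without ever reaching it.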

Connector selection

# Python API
intent = ExtractionIntent(
    source_type="postgresql",  # or "mysql" or "sqlserver"
    ...
)

# CLI
ixtract execute orders --source-type mysql ...

What’s not supported

Columnar databases (Redshift, BigQuery, Snowflake) — planned as separate connectors
Oracle — not planned
Non-integer PKs — not supported; UUID/string PKs are on the Phase 5 list
Tables without a primary key — not supported