Connectors
ixtract supports three database connectors. Each connector enforces its isolation level as an invariant — it is not user-configurable. If it ever becomes configurable, it becomes a plan field.
PostgreSQL
Driver: psycopg2
Isolation: REPEATABLE READ (connector-level invariant)
Chunking: Range chunking on integer PK boundaries
Requirements
- PostgreSQL 12+
- Integer or bigint primary key
- User must have SELECT access to the target table
Connection config
    source_config = {
        "host": "localhost",
        "port": 5432,
        "database": "mydb",
        "user": "app",
        "password": "secret",
    }
Behavior
Each worker opens its own connection. REPEATABLE READ isolation guarantees a consistent snapshot across the extraction without blocking writes on the source.
Work-stealing is supported: if the profiler detects a coefficient of variation (CV) above 1.0, which indicates a skewed PK distribution, the engine activates LPT (longest-processing-time-first) dispatch.
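To make the skew check and the dispatch order concrete, here is a minimal sketch. It assumes CV is computed as the population standard deviation divided by the mean of estimated per-chunk row counts, and that LPT dispatch hands out chunks largest first, each to the currently least-loaded worker. The function names and sample numbers are hypothetical and are not part of ixtract's API.

```python
import heapq
from statistics import mean, pstdev
from typing import Dict, List


def coefficient_of_variation(costs: List[float]) -> float:
    """Population standard deviation divided by the mean (0.0 for an all-zero list)."""
    mu = mean(costs)
    return pstdev(costs) / mu if mu else 0.0


def lpt_assign(costs: List[float], num_workers: int) -> Dict[int, List[int]]:
    """Assign chunk indices to workers, largest chunk first, least-loaded worker next."""
    heap = [(0.0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    # Visit chunks in descending cost order (stable for equal costs).
    for idx in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(idx)
        heapq.heappush(heap, (load + costs[idx], worker))
    return assignment


# Hypothetical per-chunk row estimates for a heavily skewed table.
estimated_rows = [1002, 3, 2, 3]
cv = coefficient_of_variation(estimated_rows)
use_lpt = cv > 1.0  # threshold taken from the text above
plan = lpt_assign(estimated_rows, num_workers=2) if use_lpt else None
```

In this sample, one worker ends up with the single large chunk while the other drains the three small ones, which is the balancing effect LPT is meant to provide.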
Known limitation
Range chunking splits the PK range evenly, not by row density. For heavily skewed tables (CV > 1.0), equal PK ranges produce unequal amounts of work. Density-aware chunking is planned for Phase 5.
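The limitation can be illustrated with a short sketch of even range chunking. This is an assumption about the general technique, not ixtract's actual implementation: `range_chunks` is a hypothetical helper, and the sample PK values are made up to show how equal-width ranges can hold very different row counts.

```python
from typing import List, Tuple


def range_chunks(min_pk: int, max_pk: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Split the inclusive PK range [min_pk, max_pk] into contiguous
    half-open [start, end) ranges of near-equal width."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    if max_pk < min_pk:
        raise ValueError("max_pk must be >= min_pk")
    span = max_pk - min_pk + 1           # number of PK values covered
    num_chunks = min(num_chunks, span)   # never produce an empty range
    base, extra = divmod(span, num_chunks)
    chunks = []
    start = min_pk
    for i in range(num_chunks):
        width = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append((start, start + width))
        start += width
    return chunks


# Skewed sample: 1,000 dense rows at ids 1..1000, plus 10 rows spread up to id 1,000,000.
pks = list(range(1, 1001)) + list(range(100_000, 1_000_001, 100_000))
chunks = range_chunks(min(pks), max(pks), 4)
rows_per_chunk = [sum(1 for pk in pks if lo <= pk < hi) for lo, hi in chunks]
```

Here all four ranges are 250,000 ids wide, yet the first one holds 1,002 of the 1,010 rows, so the worker that draws it does nearly all of the work.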
MySQL
Driver: PyMySQL
Isolation: START TRANSACTION WITH CONSISTENT SNAPSHOT per worker
Engine: InnoDB only
Requirements
- MySQL 8.0+, InnoDB engine
- Integer or bigint primary key
- User must have SELECT access
Connection config
    source_config = {
        "host": "localhost",
        "port": 3306,
        "database": "mydb",
        "user": "app",
        "password": "secret",
    }
Use source_type="mysql" in ExtractionIntent or --source-type mysql in the CLI.
Behavior
Each worker takes its own consistent snapshot. This is a connector-level invariant: ixtract does not support MySQL without per-worker snapshot isolation.
MyISAM tables are explicitly rejected at validation time.
Known limitation
Global snapshot coordination (ensuring all workers see the same logical point in time) is pending Phase 3D. Current behavior: each worker’s snapshot is taken at connection open time, which may differ by milliseconds.
SQL Server
Driver: pyodbc
Isolation: SNAPSHOT isolation
Tested: Azure SQL and SQL Server 2019+
Requirements
- SNAPSHOT isolation enabled on the database (ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON)
- ODBC driver installed (msodbcsql17 or msodbcsql18)
- Integer primary key
- User must have SELECT access
Connection config
    source_config = {
        "server": "myserver.database.windows.net",
        "database": "mydb",
        "user": "app",
        "password": "secret",
        "driver": "ODBC Driver 18 for SQL Server",
    }
Use source_type="sqlserver".
Isolation ordering
The SQL Server connector sets isolation level with autocommit=True before beginning the transaction. This ordering is required; reversing it causes a driver error.
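The ordering described above might look roughly like the following sketch. It is an illustration under assumptions, not the connector's code: `begin_snapshot_transaction` is a hypothetical helper, and `RecordingConnection` is a stand-in that mimics only the parts of a pyodbc connection used here (`cursor()`, `execute()`, and the `autocommit` attribute) so the sequence can be run without a database.

```python
class RecordingConnection:
    """Hypothetical stand-in for a pyodbc connection opened with autocommit=True.
    It records each action so the ordering is visible without a database."""

    def __init__(self):
        self.events = []
        self._autocommit = True

    @property
    def autocommit(self):
        return self._autocommit

    @autocommit.setter
    def autocommit(self, value):
        self.events.append(f"autocommit={value}")
        self._autocommit = value

    def cursor(self):
        return self  # the stub doubles as its own cursor

    def execute(self, sql):
        self.events.append(sql)

    def close(self):
        pass


def begin_snapshot_transaction(conn) -> None:
    """Assumes conn is currently in autocommit mode."""
    cursor = conn.cursor()
    # 1. Set the isolation level while still in autocommit mode.
    cursor.execute("SET TRANSACTION ISOLATION LEVEL SNAPSHOT")
    cursor.close()
    # 2. Only then leave autocommit, so the next statement starts the transaction.
    conn.autocommit = False


conn = RecordingConnection()
begin_snapshot_transaction(conn)
```

With a real driver, the same helper would presumably be called on the object returned by `pyodbc.connect(conn_str, autocommit=True)`.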
COUNT(*) fallback
For small tables (estimated at fewer than 10,000 rows), the connector uses COUNT(*) instead of statistics-based row count estimation.
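As a rough sketch of this decision, assuming the statistics-based estimate is already available as an integer: `resolve_row_count` and the `exact_count` callback are hypothetical names, and only the 10,000-row threshold comes from the text above.

```python
from typing import Callable

SMALL_TABLE_THRESHOLD = 10_000  # threshold taken from the text above


def resolve_row_count(estimate: int, exact_count: Callable[[], int]) -> int:
    """Return an exact count for small tables, otherwise keep the estimate.

    exact_count is a hypothetical callback that would run
    SELECT COUNT(*) against the target table.
    """
    if estimate < SMALL_TABLE_THRESHOLD:
        return exact_count()
    return estimate


count_calls = []


def fake_exact_count() -> int:
    # Stand-in for the real COUNT(*) query; records that it was invoked.
    count_calls.append(1)
    return 4_187


small = resolve_row_count(4_200, fake_exact_count)      # below threshold: exact count
large = resolve_row_count(2_500_000, fake_exact_count)  # above threshold: estimate kept
```

The exact count is only requested once in this run, for the small table.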
Real-world performance
Tested against Azure SQL (ixtract-db-server-46.database.windows.net):
- 1M rows, p50 latency 30ms (30× local)
- Throughput: ~8,700 rows/sec (vs. 856K/sec on local PostgreSQL)
- Anomaly detection correctly flagged the result at 44.3σ below the local baseline
The roughly 100× throughput difference is expected: cloud SQL latency dominates at small chunk sizes. Use larger chunk sizes or fewer workers for cloud SQL sources.
Connector selection
    # Python API
    intent = ExtractionIntent(
        source_type="postgresql",  # or "mysql" or "sqlserver"
        ...
    )

    # CLI
    ixtract execute orders --source-type mysql ...
What’s not supported
- Columnar databases (Redshift, BigQuery, Snowflake): planned as separate connectors
- Oracle: not planned
- Non-integer PKs: not supported; UUID/string PKs are on the Phase 5 list
- Tables without a primary key: not supported