Writers
ixtract writes extracted data through a BaseWriter interface. The writer is selected by output_format in your intent or --output-format in the CLI.
Parquet (default)
Python:

    intent = ExtractionIntent(
        ...
        output_format="parquet",
        output_dir="./output",
    )

CLI:

    ixtract execute orders --output ./output

Each worker writes a separate .parquet file. Files are named {table}_{chunk_id:04d}.parquet. When the extraction completes, a _manifest.json is written to the output directory.
Requirements: pyarrow
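The naming pattern above can be reproduced and parsed with a few lines of standard-library Python, which is handy when post-processing an output directory. This is a minimal sketch: `chunk_filename` and `parse_chunk_filename` are hypothetical helpers written for illustration and are not part of ixtract.

```python
import re

# Matches names of the form {table}_{chunk_id:04d}.parquet.
# The table name may itself contain underscores.
CHUNK_FILE_RE = re.compile(r"^(?P<table>.+)_(?P<chunk_id>\d{4,})\.parquet$")


def chunk_filename(table, chunk_id):
    """Build a chunk file name following the documented pattern."""
    return f"{table}_{chunk_id:04d}.parquet"


def parse_chunk_filename(name):
    """Return (table, chunk_id) for a chunk file name, or None if it does not match."""
    match = CHUNK_FILE_RE.match(name)
    if match is None:
        return None
    return match.group("table"), int(match.group("chunk_id"))


print(chunk_filename("orders", 1))                       # orders_0001.parquet
print(parse_chunk_filename("order_items_0007.parquet"))  # ('order_items', 7)
```

Names that do not follow the pattern, such as _manifest.json, yield None, so the parser can be used to filter a directory listing.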
CSV
Python:

    intent = ExtractionIntent(
        ...
        output_format="csv",
        output_dir="./output",
    )

CLI:

    ixtract execute orders --output-format csv --output ./output

Each chunk writes a separate .csv file with a header row. Files are UTF-8 encoded and comma-delimited.
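Because every chunk file carries its own header row, combining chunks into a single CSV means keeping the first header and skipping the rest. The sketch below shows one way to do that with the standard library. It assumes CSV chunks follow the same {table}_{chunk_id:04d} naming as Parquet files, which this page does not state explicitly; `merge_csv_chunks` is a hypothetical helper, not an ixtract function.

```python
import csv
from pathlib import Path


def merge_csv_chunks(output_dir, table, merged_path):
    """Concatenate chunk CSVs into one file with a single header row.

    ASSUMPTION: chunk files are named {table}_NNNN.csv.
    Returns the number of data rows written.
    """
    pattern = f"{table}_[0-9][0-9][0-9][0-9].csv"
    chunk_paths = sorted(Path(output_dir).glob(pattern))
    header = None
    rows_written = 0
    with open(merged_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in chunk_paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                chunk_header = next(reader, None)
                if chunk_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    header = chunk_header
                    writer.writerow(header)  # keep only the first header
                elif chunk_header != header:
                    raise ValueError(f"header mismatch in {path.name}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Sorting the paths keeps chunks in chunk-ID order, and the header comparison catches accidental mixing of files from different tables.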
S3
Python:

    intent = ExtractionIntent(
        ...
        output_format="s3",
        output_dir="s3://my-bucket/prefix/",
    )

CLI:

    ixtract execute orders --output-format s3 --output s3://my-bucket/prefix/

Uses S3 multipart upload. Parts are uploaded as data is produced, and the upload is finalized when the chunk completes. On failure, the multipart upload is aborted and cleaned up.
Requirements: boto3
Authentication: Standard AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / IAM role / ~/.aws/credentials)
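The upload, finalize, and abort-on-failure sequence described above maps onto four S3 client calls. The following is a minimal sketch of that pattern, not ixtract's actual implementation: `multipart_upload` is a hypothetical function, and `client` is assumed to behave like a boto3 S3 client (for example `boto3.client("s3")`).

```python
def multipart_upload(client, bucket, key, parts):
    """Upload an iterable of byte chunks as one object via multipart upload.

    `client` is assumed to expose the boto3 S3 client methods used below.
    Note that S3 requires every part except the last to be at least 5 MiB.
    Returns the number of parts uploaded.
    """
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    completed = []
    try:
        for part_number, body in enumerate(parts, start=1):
            response = client.upload_part(
                Bucket=bucket,
                Key=key,
                UploadId=upload_id,
                PartNumber=part_number,
                Body=body,
            )
            # S3 needs each part's number and ETag to assemble the object.
            completed.append({"PartNumber": part_number, "ETag": response["ETag"]})
        client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": completed},
        )
    except Exception:
        # Abort so already-uploaded parts do not linger in the bucket.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(completed)
```

Passing the client in as an argument keeps the function easy to exercise with a stub in tests, without network access or credentials.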
URI format

    s3://bucket-name/optional/prefix/

The table name and chunk ID are appended automatically.
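To illustrate how a destination URI breaks down into a bucket and a key prefix, here is a short sketch using `urllib.parse`. Both helpers are hypothetical. In particular, the key layout in `object_key` is an assumption that mirrors the local Parquet file naming; this page only says that the table name and chunk ID are appended, not in what exact form.

```python
from urllib.parse import urlparse


def split_object_uri(uri):
    """Split s3://bucket/prefix/ (or gs://bucket/prefix/) into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "gs") or not parsed.netloc:
        raise ValueError(f"not a valid object-store URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # normalize so names can be appended directly
    return parsed.netloc, prefix


def object_key(prefix, table, chunk_id, ext="parquet"):
    """ASSUMPTION: keys follow the local naming {table}_{chunk_id:04d}.{ext}."""
    return f"{prefix}{table}_{chunk_id:04d}.{ext}"


bucket, prefix = split_object_uri("s3://my-bucket/prefix/")
print(bucket, prefix)                  # my-bucket prefix/
print(object_key(prefix, "orders", 1)) # prefix/orders_0001.parquet
```

The prefix is optional, so a bare bucket URI such as s3://bucket-name yields an empty prefix and keys land at the bucket root.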
GCS
Python:

    intent = ExtractionIntent(
        ...
        output_format="gcs",
        output_dir="gs://my-bucket/prefix/",
    )

CLI:

    ixtract execute orders --output-format gcs --output gs://my-bucket/prefix/

Uses resumable upload, so an upload interrupted by a network failure can continue rather than start over.
Requirements: google-cloud-storage
Authentication: Application Default Credentials (gcloud auth application-default login or service account key)
URI format

    gs://bucket-name/optional/prefix/

Rotating writer
Wraps any writer and splits output into size-bounded segments.

    from ixtract import WriterConfig

    config = WriterConfig(
        output_format="parquet",
        output_dir="./output",
        max_file_size_bytes=500 * 1024 * 1024,  # 500 MiB per file
    )

When a file exceeds max_file_size_bytes, the writer finalizes the current file and opens a new segment. Segment naming: {table}_{chunk_id:04d}_seg{n:02d}.parquet.
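The rotation rule above (finalize once the size limit is exceeded, then start a new segment) can be sketched in plain Python. This is an illustration of the technique only, not ixtract's writer: `RotatingSegmentWriter` is a hypothetical class, it writes raw bytes rather than real Parquet data, and numbering segments from 1 is an assumption.

```python
import os


class RotatingSegmentWriter:
    """Write bytes into size-bounded segment files.

    ASSUMPTIONS: segments are numbered from 1, and the payload is opaque
    bytes (a real Parquet writer would flush row groups instead).
    """

    def __init__(self, output_dir, table, chunk_id, max_file_size_bytes, ext="parquet"):
        self.output_dir = output_dir
        self.table = table
        self.chunk_id = chunk_id
        self.max_file_size_bytes = max_file_size_bytes
        self.ext = ext
        self.paths = []        # every segment path opened so far
        self._segment = 0
        self._file = None
        self._bytes = 0

    def _open_next(self):
        self._segment += 1
        name = f"{self.table}_{self.chunk_id:04d}_seg{self._segment:02d}.{self.ext}"
        path = os.path.join(self.output_dir, name)
        self._file = open(path, "wb")
        self._bytes = 0
        self.paths.append(path)

    def _finalize(self):
        if self._file is not None:
            self._file.close()
            self._file = None

    def write(self, data):
        if self._file is None:
            self._open_next()  # lazily open, so no empty trailing segment
        self._file.write(data)
        self._bytes += len(data)
        if self._bytes > self.max_file_size_bytes:
            self._finalize()   # the next write starts a new segment

    def close(self):
        self._finalize()
```

Because the check runs after each write, a segment can end up slightly larger than the limit, which matches the "exceeds" wording above. Opening segments lazily avoids leaving an empty file when the last write happens to trigger a rotation.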
Manifest
After every successful extraction, ixtract writes _manifest.json to the output directory:

    {
      "run_id": "run_001",
      "table": "orders",
      "rows_extracted": 10000000,
      "duration_seconds": 11.7,
      "throughput_rows_sec": 856410,
      "files": [
        "orders_0001.parquet",
        "orders_0002.parquet"
      ],
      "plan_fingerprint": "f6b8048a...",
      "completed_at": "2026-04-13T14:22:01Z"
    }

Writer selection summary
| output_format | Destination | Extra dependency |
|---|---|---|
| parquet | Local filesystem | pyarrow |
| csv | Local filesystem | none |
| s3 | AWS S3 | boto3 |
| gcs | Google Cloud Storage | google-cloud-storage |
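For the local writers, the _manifest.json described in the Manifest section can serve as a completeness check before downstream jobs read the output. The sketch below loads the manifest and reports any listed file that is missing on disk. `check_manifest` is a hypothetical helper, and it assumes the entries under "files" are paths relative to the output directory, as the sample manifest suggests.

```python
import json
from pathlib import Path


def check_manifest(output_dir):
    """Load _manifest.json and list manifest entries that are missing on disk.

    ASSUMPTION: entries in "files" are relative to output_dir.
    Returns (manifest_dict, missing_file_names).
    """
    out = Path(output_dir)
    manifest = json.loads((out / "_manifest.json").read_text(encoding="utf-8"))
    missing = [name for name in manifest["files"] if not (out / name).is_file()]
    return manifest, missing
```

A caller might refuse to proceed when the returned list is non-empty, since that indicates the directory was modified after the run completed.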