
Writers

ixtract writes extracted data through a BaseWriter interface. The writer is selected by output_format in your intent or --output-format in the CLI.
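The format-to-writer dispatch can be sketched as a plain lookup. This is a minimal illustration, not ixtract's internals; the writer names and the select_writer helper are assumptions.

```python
# Hypothetical mapping from output_format to a writer implementation.
# The class names here are placeholders, not ixtract's real classes.
WRITERS = {
    "parquet": "ParquetWriter",
    "csv": "CsvWriter",
    "s3": "S3Writer",
    "gcs": "GcsWriter",
}

def select_writer(output_format: str) -> str:
    """Return the writer for a format, rejecting unknown values early."""
    try:
        return WRITERS[output_format]
    except KeyError:
        raise ValueError(f"unknown output_format: {output_format!r}")
```

Failing fast on an unknown format surfaces a typo in the intent before any extraction work starts.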


Parquet

intent = ExtractionIntent(
    ...
    output_format="parquet",
    output_dir="./output",
)

ixtract execute orders --output ./output

Each worker writes a separate .parquet file, named {table}_{chunk_id:04d}.parquet. When the extraction completes, a _manifest.json is written to the output directory.
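The naming scheme above is a plain format string; a quick sketch of how the names come out:

```python
def parquet_name(table: str, chunk_id: int) -> str:
    # {table}_{chunk_id:04d}.parquet -> chunk id zero-padded to four digits
    return f"{table}_{chunk_id:04d}.parquet"

print(parquet_name("orders", 1))    # orders_0001.parquet
print(parquet_name("orders", 123))  # orders_0123.parquet
```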

Requirements: pyarrow


CSV

intent = ExtractionIntent(
    ...
    output_format="csv",
    output_dir="./output",
)

ixtract execute orders --output-format csv --output ./output

Each chunk is written to a separate .csv file with a header row, UTF-8 encoded and comma-delimited.
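A write of that shape can be sketched with the standard library. This is an illustration of the output format, not ixtract's own CSV code:

```python
import csv
import io

def write_csv_chunk(rows: list[dict], out) -> None:
    # Header row taken from the keys of the first record;
    # comma-delimited, one line per row.
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

buf = io.StringIO()
write_csv_chunk([{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}], buf)
```

Opening the real file with encoding="utf-8" and newline="" gives the UTF-8, comma-delimited output described above.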


S3

intent = ExtractionIntent(
    ...
    output_format="s3",
    output_dir="s3://my-bucket/prefix/",
)

ixtract execute orders --output-format s3 --output s3://my-bucket/prefix/

Uses S3 multipart upload. Parts are uploaded as data is produced, and the upload is finalized when the chunk completes. On failure, the multipart upload is aborted and its parts are cleaned up.

Requirements: boto3
Authentication: Standard AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / IAM role / ~/.aws/credentials)

Output paths take the form s3://bucket-name/optional/prefix/; the table name and chunk ID are appended automatically.
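Given the s3://bucket-name/optional/prefix/ form, the final bucket and object key can be derived roughly as below. This is a sketch, not ixtract's key-building code, and the .parquet extension is illustrative:

```python
from urllib.parse import urlparse

def s3_object_key(output_dir: str, table: str, chunk_id: int) -> tuple[str, str]:
    """Split s3://bucket/prefix/ into (bucket, key) with table and chunk appended."""
    parsed = urlparse(output_dir)
    assert parsed.scheme == "s3", "expected an s3:// URL"
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, f"{prefix}{table}_{chunk_id:04d}.parquet"
```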


GCS

intent = ExtractionIntent(
    ...
    output_format="gcs",
    output_dir="gs://my-bucket/prefix/",
)

ixtract execute orders --output-format gcs --output gs://my-bucket/prefix/

Uses resumable uploads, which recover gracefully from network interruptions.

Requirements: google-cloud-storage
Authentication: Application Default Credentials (gcloud auth application-default login or service account key)

Output paths take the form gs://bucket-name/optional/prefix/.

Size-bounded segments

Wraps any writer and splits output into size-bounded segments.

from ixtract import WriterConfig

config = WriterConfig(
    output_format="parquet",
    output_dir="./output",
    max_file_size_bytes=500 * 1024 * 1024,  # 500 MB per file
)

When a file exceeds max_file_size_bytes, the writer finalizes the current file and opens a new segment. Segment naming: {table}_{chunk_id:04d}_seg{n:02d}.parquet.
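The rollover behavior can be sketched with a toy in-memory writer that tracks bytes per segment and follows the naming pattern above (ixtract's real writer works on files, not byte counters):

```python
class SegmentingBuffer:
    """Toy writer that rolls to a new segment when a size bound would be exceeded."""

    def __init__(self, table: str, chunk_id: int, max_bytes: int):
        self.table, self.chunk_id, self.max_bytes = table, chunk_id, max_bytes
        self.segments = []  # list of [name, bytes_written]
        self._open_segment()

    def _open_segment(self):
        n = len(self.segments)
        name = f"{self.table}_{self.chunk_id:04d}_seg{n:02d}.parquet"
        self.segments.append([name, 0])

    def write(self, payload: bytes):
        # Finalize the current segment and open the next one
        # once the size bound would be crossed.
        current = self.segments[-1]
        if current[1] > 0 and current[1] + len(payload) > self.max_bytes:
            self._open_segment()
        self.segments[-1][1] += len(payload)

buf = SegmentingBuffer("orders", 1, max_bytes=100)
for _ in range(5):
    buf.write(b"x" * 40)
```

Five 40-byte writes against a 100-byte bound produce three segments: two full ones and a final partial one.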


Manifest

After every successful extraction, ixtract writes _manifest.json to the output directory:

{
  "run_id": "run_001",
  "table": "orders",
  "rows_extracted": 10000000,
  "duration_seconds": 11.7,
  "throughput_rows_sec": 856410,
  "files": [
    "orders_0001.parquet",
    "orders_0002.parquet"
  ],
  "plan_fingerprint": "f6b8048a...",
  "completed_at": "2026-04-13T14:22:01Z"
}
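Since the manifest is plain JSON, downstream tooling can consume it directly. A sketch that loads the example above and sanity-checks its numbers (duration_seconds is rounded, so the throughput check allows a small tolerance):

```python
import json

manifest = json.loads("""
{
  "run_id": "run_001",
  "table": "orders",
  "rows_extracted": 10000000,
  "duration_seconds": 11.7,
  "throughput_rows_sec": 856410,
  "files": ["orders_0001.parquet", "orders_0002.parquet"],
  "completed_at": "2026-04-13T14:22:01Z"
}
""")

# Throughput should be roughly rows / duration.
implied = manifest["rows_extracted"] / manifest["duration_seconds"]
```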

output_format | Destination          | Extra dependency
------------- | -------------------- | --------------------
parquet       | Local filesystem     | pyarrow
csv           | Local filesystem     | none
s3            | AWS S3               | boto3
gcs           | Google Cloud Storage | google-cloud-storage