#troubleshooting

Emily Cogsdill

03/22/2022, 8:22 PM
• Airbyte version: 0.35.27-alpha
• OS Version / Instance: Ubuntu VM / GCP n1-standard-2 (2 vCPUs, 7.5 GB memory)
• Deployment: Docker
• Source Connector and version: S3 0.1.10
• Destination Connector and version: BigQuery 0.6.7
• Severity: Medium
• Step where error happened: Loading data from source

Hi team! I am trying to run a sync from an S3 source to BigQuery. The data actually seem to be getting delivered successfully to the destination, but I am still seeing an error in the logs, and the app is marking the sync as “Failed” and always retries twice before finally cancelling. The error I am seeing is:
pyarrow.lib.ArrowInvalid: CSV parser got out of sync with chunker
Does anyone have thoughts on what might be going on here & how to troubleshoot? I am running several other similar S3->BigQuery syncs from the same bucket (but with different file paths) without issue - only this one is throwing an error.
Here are the logs:

Marcos Marx (Airbyte)

03/22/2022, 9:52 PM
Does the sync work if you create a sample CSV with fewer data points? Also, based on the Apache discussion, could you try tweaking some of the reading configuration?

Emily Cogsdill

03/22/2022, 11:18 PM
I’ll have to try using a smaller CSV (tomorrow) - can you explain a little more about what configurations I might try and tweak? The thread doesn’t mention anything I might play around with in the S3 source connector 🤔
Quick update: increasing the block size seems to have resolved the error! Had to bump it from 10K to 1M 👀 💦 but we seem to be in business now. Thanks @[DEPRECATED] Marcos Marx!
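For anyone landing here with the same error, the change described above can be sketched roughly as follows. This is only an illustration: the key names (`format`, `filetype`, `block_size`), the helper `with_block_size`, and the reading of "10K" and "1M" as 10,000 and 1,000,000 bytes are all assumptions, not values copied from the thread or from the connector's spec, so check them against your own source settings.

```python
import copy
import json

# Hypothetical excerpt of an S3 source configuration. The key names below
# are assumptions for illustration, not taken from the thread.
current_config = {
    "format": {
        "filetype": "csv",
        "block_size": 10_000,  # assumed meaning of "10K"
    }
}


def with_block_size(config: dict, block_size: int) -> dict:
    """Return a copy of config with the CSV block size replaced."""
    if block_size <= 0:
        raise ValueError("block_size must be a positive number of bytes")
    updated = copy.deepcopy(config)  # leave the original config untouched
    updated.setdefault("format", {})["block_size"] = block_size
    return updated


new_config = with_block_size(current_config, 1_000_000)  # assumed meaning of "1M"
print(json.dumps(new_config, indent=2))
```

A larger block means more of the file is held in memory per read, so it may be worth raising the value in steps instead of jumping straight to the maximum.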

Marcos Marx (Airbyte)

03/24/2022, 1:12 AM