# advice-data-ingestion
Hi team, has anyone built out a connection with BigQuery as the source and Snowflake as the destination? I have a use case where I need to replicate an initial 2 TB of data from BigQuery to Snowflake and then set up continuous ingestion of around 5 GB daily. I ran an initial test yesterday, syncing around 9 GB of data, but my Airbyte server crashed on an EC2 t2.large instance deployed via Docker. From some research I believe it is related to this issue (https://github.com/airbytehq/airbyte/issues/6533#issuecomment-1188663632). Before digging in further I wanted to reach out to the community and see if anyone else has encountered this. Happy Friday!
In case anybody runs into the same issue: I figured out that the sync was crashing because, when using “Internal Stage” on the Snowflake destination connector, the BigQuery source connector opens a single buffer and tries to read the entire BigQuery stream through it (i.e. it doesn’t close it and open a new one when a partition file is written to disk). This eventually triggers a connection timeout from the GCP API and leaves the sync hanging. I had one sync run for 10 hours with this line repeating in the logs:
```
INFO: I/O exception (java.io.IOException) caught when processing request to {s}->https://gcpuscentral1-zb4s5wg-stage.storage.googleapis.com:443: Stream Closed
```
I got around the issue by switching to S3 staging on the Snowflake destination, and now I can see in the logs that a new buffer is opened to read data after every data file is written to S3.
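For anyone who wants the mental model, here's a rough, self-contained Python sketch of the two read patterns as I understand them. This is not Airbyte connector code; `flush_to_stage` and `open_source` are hypothetical stand-ins.

```python
import io

def flush_to_stage(chunk: bytes) -> None:
    # Hypothetical stand-in for writing one staged partition file.
    print(f"staged {len(chunk)} bytes")

def sync_single_buffer(source, chunk_size=1 << 20):
    # Pattern I hit with Internal Stage: one reader is held open for the
    # entire stream, so a multi-hour read can time out on the GCP side.
    while chunk := source.read(chunk_size):
        flush_to_stage(chunk)
    source.close()

def sync_rotating_buffers(open_source, chunk_size=1 << 20):
    # Pattern I see with S3 staging: a fresh reader is opened after every
    # partition file is flushed, so no single connection stays open for long.
    offset = 0
    while True:
        reader = open_source(offset)      # new buffer per partition file
        chunk = reader.read(chunk_size)
        reader.close()
        if not chunk:
            break
        flush_to_stage(chunk)
        offset += len(chunk)

# Toy usage with in-memory data standing in for the BigQuery stream.
data = b"x" * (3 * (1 << 20))
sync_single_buffer(io.BytesIO(data))
sync_rotating_buffers(lambda offset: io.BytesIO(data[offset:]))
```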
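For reference, the relevant part of my Snowflake destination config looks roughly like the sketch below (shown as a Python dict with placeholder values). The field names are from memory and may differ between connector versions, so treat them as assumptions and check the connector spec in your Airbyte UI.

```python
# Rough sketch only -- field names recalled from the Snowflake destination
# spec and may differ between connector versions; values are placeholders.
snowflake_destination_config = {
    "host": "myaccount.snowflakecomputing.com",
    "database": "ANALYTICS",
    "schema": "AIRBYTE_RAW",
    "warehouse": "LOAD_WH",
    "role": "AIRBYTE_ROLE",
    "username": "AIRBYTE_USER",
    "loading_method": {
        "method": "S3 Staging",                      # instead of internal staging
        "s3_bucket_name": "my-airbyte-staging-bucket",
        "s3_bucket_region": "us-east-1",
        "access_key_id": "<AWS_ACCESS_KEY_ID>",
        "secret_access_key": "<AWS_SECRET_ACCESS_KEY>",
        "purge_staging_data": True,                  # clean up staged files after load
    },
}
```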