# ask-community-for-troubleshooting

Ignacio Aranguren Rojas

11/12/2021, 5:48 PM
Hey all 🙂 I just joined a company to help build the data platform. We are planning to start data ingestion into the data warehouse soon (probably BigQuery or Snowflake). The challenge is that the data is stored in a SQL Server instance, with some tables having >1 billion rows. Just the customer table has 21M rows (I know it's a bit crazy that they did not have a DWH before). Airbyte seems like a really good option compared with other tools I have used in the past (like Fivetran / Stitch). I was wondering if any of you have guidance on how to approach this special use case, with >1TB of data to be ingested at first! Thanksss! 🙂

Rytis Zolubas

11/12/2021, 7:23 PM
I would suggest dumping the database or using backup files for the initial load
👍 1
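One way to act on that suggestion is to script a bulk dump per table with SQL Server's `bcp` utility. A minimal sketch that only builds the command strings; the table names, server, and credentials are hypothetical placeholders:

```python
# Sketch: generate bcp export commands for an initial bulk dump from SQL Server.
# Table names, server host, and credentials below are placeholder assumptions.
TABLES = ["dbo.customers", "dbo.orders", "dbo.events"]

def bcp_export_command(table: str, out_dir: str = "/data/dump") -> str:
    """Build a bcp command that dumps one table to a flat file.

    -c        character (text) mode
    -t ","    comma field terminator
    -S/-U/-P  server and credentials (placeholders here)
    """
    out_file = f"{out_dir}/{table.replace('.', '_')}.csv"
    return (
        f'bcp {table} out {out_file} -c -t "," '
        f"-S my-sqlserver-host -U my_user -P my_password"
    )

commands = [bcp_export_command(t) for t in TABLES]
for cmd in commands:
    print(cmd)
```

The flat files can then be bulk-loaded into the warehouse directly, which avoids pulling the initial >1TB through the sync pipeline.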

[DEPRECATED] Marcos Marx

11/15/2021, 1:34 PM
With Airbyte, most users split the largest tables into individual connections, to have better control during syncs, and group the smaller tables into another connection.
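The grouping logic can be sketched as a small planning function: every table over some row-count threshold gets its own connection, and the rest share one. The row counts and the 100M threshold are illustrative assumptions, not Airbyte recommendations:

```python
# Sketch: plan Airbyte connections by table size.
# Each very large table gets its own connection; small tables are batched.
from typing import Dict, List

def plan_connections(row_counts: Dict[str, int],
                     large_threshold: int = 100_000_000) -> List[List[str]]:
    """Return a list of connections, each a list of table names."""
    large = [t for t, n in row_counts.items() if n >= large_threshold]
    small = [t for t, n in row_counts.items() if n < large_threshold]
    # One connection per large table, plus one shared connection for the rest.
    plan = [[t] for t in sorted(large)]
    if small:
        plan.append(sorted(small))
    return plan

counts = {"events": 1_200_000_000, "customers": 21_000_000,
          "orders": 450_000_000, "products": 90_000}
print(plan_connections(counts))
# -> [['events'], ['orders'], ['customers', 'products']]
```

A failed sync then only has to retry one big table, not the whole catalog.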

Ignacio Aranguren Rojas

11/15/2021, 2:37 PM
@[DEPRECATED] Marcos Marx Thanks for the response! Is that something you can do within the Airbyte UI? Also, what size would you recommend for the VM hosting the Airbyte deployment for such large tables?

[DEPRECATED] Marcos Marx

11/15/2021, 8:29 PM
It's not easy to give a size. Airbyte uses a default batch size, but for tables with large column values or a large number of columns you need a high-memory machine.
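The point about wide rows can be made concrete with a back-of-envelope estimate: batch memory grows linearly with average row width. The 10,000-record batch size, row widths, and 2x overhead factor below are assumptions for illustration, not Airbyte's actual internals:

```python
# Sketch: rough in-memory footprint of one sync batch.
def batch_memory_mb(batch_records: int, avg_row_bytes: int,
                    overhead_factor: float = 2.0) -> float:
    """Approximate batch memory in MB, with a multiplier for
    serialization/buffering overhead (assumed, not measured)."""
    return batch_records * avg_row_bytes * overhead_factor / 1_000_000

# A narrow table (~200 B/row) vs. a wide table with big text columns (~20 KB/row):
print(batch_memory_mb(10_000, 200))     # -> 4.0 (MB)
print(batch_memory_mb(10_000, 20_000))  # -> 400.0 (MB)
```

Under these assumptions, the same batch size that is trivial for a narrow table can need hundreds of MB per worker for a wide one, which is why wide tables push you toward high-memory VMs.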

Chandini Nekkantti

11/19/2021, 11:22 AM
@[DEPRECATED] Marcos Marx - we have tried even a GCP E2 instance with 941GB and couldn't sync a 300M-record table from Snowflake into BigQuery. We have been trying to do this for quite a while and are finally close to giving up on Airbyte and instead exporting to CSV and loading manually. The frustrating part is that you wait 10-12 hours and then find out the sync fails. Does Airbyte do any benchmarking on the connectors?
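The manual fallback mentioned here (Snowflake to BigQuery via CSV) can be sketched as two generated commands: unload from Snowflake to a cloud-storage stage, then load with the `bq` CLI. The stage, bucket, and dataset names are hypothetical placeholders:

```python
# Sketch of the manual CSV path: unload from Snowflake to a GCS stage,
# then load into BigQuery with the bq CLI. All identifiers (stage name,
# bucket, dataset) are placeholder assumptions.
from typing import List

def manual_load_steps(table: str, bucket: str = "my-bucket",
                      dataset: str = "my_dataset") -> List[str]:
    unload_sql = (
        f"COPY INTO @my_gcs_stage/{table}/ FROM {table} "
        f"FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);"
    )
    bq_load = (
        f"bq load --source_format=CSV {dataset}.{table} "
        f"gs://{bucket}/{table}/*.csv.gz"
    )
    return [unload_sql, bq_load]

for step in manual_load_steps("big_table"):
    print(step)
```

For a one-off 300M-row table this bulk path sidesteps the long-running sync entirely, at the cost of being manual.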

[DEPRECATED] Marcos Marx

11/20/2021, 1:59 AM
At the moment no; the team has plans to do benchmarking in the next quarter.