Anton Podviaznikov 05/24/2022, 1:26 PM
on k8s and trying to sync one table from PG to Snowflake. The table has 32 million records. It takes Airbyte anywhere from 2h30m to 3h30m to do the initial sync on this table. PipelineWise takes 37 min. I'm not sure how to get the same numbers. Another thing that confuses me is that after the sync is done, both tables in Snowflake have 32 million records, but the table created by PipelineWise is 2.6 GB and the one created by Airbyte is 5 GB. On top of that, why does the Airbyte UI show that 49.25 GB worth of data was processed? Those numbers don't match. Why is that? Any ideas?
Augustin Lafanechere (Airbyte) 05/24/2022, 5:52 PM
Davin Chia (Airbyte) 05/25/2022, 9:19 AM
Liren Tu (Airbyte) 05/25/2022, 9:11 PM
tables, those prefixed with
. And if normalization is enabled, the connector will trigger
to normalize those raw tables into the final normalized tables. Hence the physical size is roughly 2x.
on the UI is the size of the serialized data in JSON format. It is usually an overestimate of the actual data size. For example, the actual data may be a number
, the serialized JSON looks like
, and its size is significantly larger.
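A rough sketch of the serialized-size point above, in Python. This is not how Airbyte measures the number shown in the UI; the row, its field names, and the wrapping envelope are made up for illustration. It only shows that the same values take more bytes as JSON text than in a compact binary encoding, and more again once each row is wrapped in a per-record message.

```python
import json
import struct

# Hypothetical row; the column names and values are illustrative only.
row = {"id": 12345, "amount": 19.99, "active": True}

# Compact binary encoding: 8-byte integer, 8-byte float, 1-byte boolean.
# "<" disables alignment padding, so the result is 8 + 8 + 1 bytes.
binary_size = len(struct.pack("<qd?", row["id"], row["amount"], row["active"]))

# The same row as JSON text: every field name, quote, and digit is a byte.
json_size = len(json.dumps(row).encode("utf-8"))

# Hypothetical per-record envelope (an assumption, simplified): each row
# is wrapped in a message that repeats metadata such as the stream name.
wrapped = {"type": "RECORD", "record": {"stream": "my_table", "data": row}}
wrapped_size = len(json.dumps(wrapped).encode("utf-8"))

print(f"binary: {binary_size} bytes")
print(f"json: {json_size} bytes")
print(f"json with envelope: {wrapped_size} bytes")
```

The ratio grows with the number of columns and the length of the column names, since the names are repeated in every serialized record while a columnar warehouse table stores them once.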
Anton Podviaznikov 05/26/2022, 5:57 PM