Anton Podviaznikov
05/24/2022, 1:26 PM
0.38.4-alpha on k8s and trying to sync one table from PG to Snowflake.
Table has 32 mln records.
It takes Airbyte anywhere from 2h30m to 3h30m to do the initial sync on this table.
Pipelinewise takes 37min.
I'm not sure how to get the same numbers.
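To put the two timings on the same scale, here is a minimal sketch that converts them into average rows per second. It uses only the figures quoted above (32 mln records, 2h30m to 3h30m, 37min); the function name `rows_per_second` is made up for illustration.

```python
# Figures quoted in the message above.
ROWS = 32_000_000  # "32 mln records"


def rows_per_second(rows: int, minutes: float) -> float:
    """Average sync throughput in rows per second."""
    return rows / (minutes * 60)


airbyte_fast = rows_per_second(ROWS, 150)  # 2h30m
airbyte_slow = rows_per_second(ROWS, 210)  # 3h30m
pipelinewise = rows_per_second(ROWS, 37)   # 37min

print(f"Airbyte:      {airbyte_slow:,.0f} to {airbyte_fast:,.0f} rows/s")
print(f"Pipelinewise: {pipelinewise:,.0f} rows/s")
print(f"Gap:          {pipelinewise / airbyte_fast:.1f}x to {pipelinewise / airbyte_slow:.1f}x")
```

By this rough measure the gap is somewhere between 4x and 6x, so it is probably more than run-to-run noise.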
Another thing that confuses me is that after the sync is done, both tables in Snowflake have 32 mln records.
But the table created by Pipelinewise is 2.6GB and the one created by Airbyte is 5GB. On top of that, why does the Airbyte UI show that 49.25GB of data was processed? Those numbers don't match.
Why is that? Any ideas?
Augustin Lafanechere (Airbyte)
05/24/2022, 5:52 PM
Davin Chia (Airbyte)
05/25/2022, 9:19 AM
Liren Tu (Airbyte)
05/25/2022, 9:11 PM
raw tables, those prefixed with _raw. If normalization is enabled, the connector triggers dbt to normalize those raw tables into the final normalized tables. Hence the physical size is roughly 2x.
49.25GB on the UI is the size of the serialized data in JSON format. It is usually an overestimate of the actual data size. For example, the actual data may be the number 1234 from column value; the serialized JSON looks like {"value":1234}, and its size is significantly larger.
Anton Podviaznikov
05/26/2022, 5:57 PM
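The two explanations from Liren Tu above can be sketched with a little arithmetic. This is only an illustration: the 4-byte integer width is an assumption about how the number might be stored, not something stated in the thread, and the size figures are the ones quoted in the original question.

```python
import json

# Single-column record mirroring the example in the thread.
record = {"value": 1234}

# Compact JSON, as in the thread's example: {"value":1234}
serialized = json.dumps(record, separators=(",", ":"))
json_bytes = len(serialized.encode("utf-8"))

# Assumption: the database stores the number as a 4-byte integer.
raw_bytes = 4

print(f"{serialized} is {json_bytes} bytes as JSON vs {raw_bytes} bytes as an int")
print(f"JSON overhead for this record: {json_bytes / raw_bytes:.1f}x")

# Sizes quoted in the thread, in GB.
pipelinewise_gb, airbyte_gb, ui_gb = 2.6, 5.0, 49.25

# Raw + normalized tables: expected to be roughly 2x a single copy.
print(f"Airbyte vs Pipelinewise table size: {airbyte_gb / pipelinewise_gb:.2f}x")
# Serialized JSON volume reported by the UI vs what ends up stored in Snowflake.
print(f"UI figure vs Airbyte table size: {ui_gb / airbyte_gb:.2f}x")
```

The per-record overhead depends heavily on column names and value types, so the ratio for a real table can differ a lot from this toy record.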