# give-feedback
Hello, I’ve been trying for the last two months to create a PostgreSQL-to-PostgreSQL CDC connection, but the initial sync (or rather the Incremental Append + Dedup phase) of the connection is too expensive when we’re talking about hundreds of millions of rows across multiple tables. The lack of parallel initial sync, plus the overhead of the raw jsonb storage and the INSERT INTO with JSON transformations in one big transaction (with no partial commits to ease the load on the target database), ends up turning the connection into a never-ending initial sync. Are there any plans to address this? Or should we consider Airbyte’s architecture unsuitable for large datasets (>100 GB)?
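For context, a minimal sketch of the pattern being described, assuming the classic Airbyte raw-table layout (`_airbyte_raw_<stream>` with an `_airbyte_data` jsonb column); the connection string and target schema are hypothetical:

```python
# Sketch of the single-transaction normalization pattern described above.
# Table/column names follow the pre-V2 Airbyte raw-table convention; the
# DSN and target columns are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("dbname=warehouse user=etl")
with conn:  # one transaction: nothing is visible until the final COMMIT
    with conn.cursor() as cur:
        # The entire raw jsonb table is transformed and inserted in a
        # single statement, so the target must carry the whole load
        # (locks, WAL, bloat) for the duration of the initial sync.
        cur.execute("""
            INSERT INTO public.users (id, email, updated_at)
            SELECT (_airbyte_data->>'id')::bigint,
                   _airbyte_data->>'email',
                   (_airbyte_data->>'updated_at')::timestamptz
            FROM _airbyte_raw_users
        """)
conn.close()
```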
In the docs, we talk about how Postgres is not a great data warehouse and doesn’t do well with bulk operations on large amounts of data. Airbyte focuses on moving data out of Postgres to other destinations (warehouses, lakes, files, etc.). That said, this discussion is about chopping a large load into multiple transactions: losing some of the sync fidelity, but probably being more performant. Chime in on the GitHub discussion with your thoughts!
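To make the trade-off concrete, here is a hedged sketch of that chunked-load idea: keyset-paginate over the raw table and commit after each batch. The table and column names reuse the hypothetical layout from the sketch above, and the batch size is arbitrary; note that partial commits mean a mid-load failure leaves some batches already applied, which is the fidelity loss being discussed.

```python
# Sketch only: batch the normalization by _airbyte_ab_id (a text key on
# Airbyte raw tables) and commit per batch instead of once at the end.
import psycopg2

BATCH = 100_000  # arbitrary batch size; tune for the target database
conn = psycopg2.connect("dbname=warehouse user=etl")
cur = conn.cursor()
last_id = ""  # empty string sorts before every non-empty text key
while True:
    # Find the upper bound of the next batch via keyset pagination.
    cur.execute("""
        SELECT max(_airbyte_ab_id) FROM (
            SELECT _airbyte_ab_id FROM _airbyte_raw_users
            WHERE _airbyte_ab_id > %s
            ORDER BY _airbyte_ab_id
            LIMIT %s
        ) b
    """, (last_id, BATCH))
    hi = cur.fetchone()[0]
    if hi is None:  # raw table exhausted
        break
    # Transform and insert just this slice of the raw table.
    cur.execute("""
        INSERT INTO public.users (id, email, updated_at)
        SELECT (_airbyte_data->>'id')::bigint,
               _airbyte_data->>'email',
               (_airbyte_data->>'updated_at')::timestamptz
        FROM _airbyte_raw_users
        WHERE _airbyte_ab_id > %s AND _airbyte_ab_id <= %s
    """, (last_id, hi))
    conn.commit()  # partial commit: releases lock/WAL pressure per batch
    last_id = hi
cur.close()
conn.close()
```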