Hi 👋
I'm a bit confused about Airbyte's architecture with regard to change data capture. In the blog post Understanding Change Data Capture (CDC) you mention that "To support log-based CDC, Airbyte uses Debezium". But as far as I understand, Airbyte only runs on Temporal. Does the Debezium/Kafka part only apply to CDC? The docs on CDC don't mention it, and neither do the docs on the Postgres source connector.
Thanks for your help! 🙏
Marcos Marx (Airbyte)
06/20/2022, 4:59 PM
Yes, you're correct. Airbyte uses the Debezium SDK for CDC.
Turar Sandybayev
11/22/2022, 7:50 PM
Following up on this, if I need to enable CDC replication from Postgres to S3 via open source Airbyte, do I also need to set up Kafka separately? I’m not finding that mentioned in the docs.
Marcos Marx (Airbyte)
11/22/2022, 8:25 PM
No. You only need to configure CDC on the Postgres side and select CDC as the replication method in the connector configuration.
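For reference, the Postgres-side setup usually comes down to three things: a logical WAL level, a replication slot, and a publication. Below is a minimal sketch that just prints the SQL, assuming the `pgoutput` plugin and hypothetical names (`airbyte_slot`, `airbyte_publication`, `public.users`); check the Postgres source connector docs for the exact steps for your version.

```python
# Sketch only: slot, publication, and table names below are hypothetical.
SLOT = "airbyte_slot"
PUBLICATION = "airbyte_publication"
TABLES = ["public.users"]


def cdc_setup_statements(slot, publication, tables):
    """Return the SQL a DBA would typically run to prepare Postgres for log-based CDC."""
    return [
        # Logical decoding requires wal_level=logical (takes effect after a restart).
        "ALTER SYSTEM SET wal_level = logical;",
        # The slot makes Postgres retain WAL until the consumer confirms it.
        f"SELECT pg_create_logical_replication_slot('{slot}', 'pgoutput');",
        # The publication lists the tables whose changes are exposed.
        f"CREATE PUBLICATION {publication} FOR TABLE {', '.join(tables)};",
    ]


for stmt in cdc_setup_statements(SLOT, PUBLICATION, TABLES):
    print(stmt)
```

The slot and publication names are then entered in the connector configuration when CDC is selected. No Kafka setup appears anywhere in these steps.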
Turar Sandybayev
11/22/2022, 8:53 PM
Got it, thank you! I think I also found an earlier message here saying that Airbyte doesn't actually use Kafka directly for CDC, right? So if I were to install Airbyte on Kubernetes, the only separate system I may need alongside it is a Postgres database to hold Airbyte's internal state, right? Nothing else beyond that.
Jannik Steinmann
12/02/2022, 9:53 AM
I learned that Airbyte uses the Debezium Engine instead of the standalone Debezium-with-Kafka setup. It essentially only uses Debezium's streaming replication/logical decoding features to get the change events from Postgres. The rest (persisting change events and handling LSN management), which Debezium would usually offload to Kafka and Kafka Connect, is instead handled by Airbyte and the Airbyte protocol.
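To make that division of labor concrete, here is a toy sketch in Python. It is not the Debezium API and not Airbyte's code; every class and function name is made up. An in-process engine hands change events to a callback, and the host application (standing in for Airbyte) keeps the last processed LSN in its own state instead of delegating that to Kafka Connect.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class ChangeEvent:
    lsn: int      # log sequence number of the change in the WAL
    table: str
    op: str       # "c" = create, "u" = update, "d" = delete
    row: dict


class EmbeddedEngine:
    """Hypothetical stand-in for an in-process CDC engine (no Kafka involved)."""

    def __init__(self, wal: Iterable[ChangeEvent]):
        self._wal = list(wal)

    def run(self, start_after_lsn: Optional[int],
            handler: Callable[[ChangeEvent], None]) -> None:
        # Deliver only the changes the host has not processed yet.
        for event in self._wal:
            if start_after_lsn is None or event.lsn > start_after_lsn:
                handler(event)


@dataclass
class HostSync:
    """Plays the host application's role: collects records and keeps the offset."""
    state: dict = field(default_factory=dict)    # would be persisted between syncs
    records: list = field(default_factory=list)

    def handle(self, event: ChangeEvent) -> None:
        self.records.append((event.table, event.op, event.row))
        self.state["lsn"] = event.lsn            # offset lives in host state, not Kafka

    def sync(self, engine: EmbeddedEngine) -> None:
        engine.run(self.state.get("lsn"), self.handle)


wal = [
    ChangeEvent(100, "users", "c", {"id": 1}),
    ChangeEvent(101, "users", "u", {"id": 1, "name": "a"}),
]
host = HostSync()
host.sync(EmbeddedEngine(wal))
print(host.state)
```

Running `sync` a second time against the same log delivers nothing new, because the stored LSN tells the engine where to resume.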