why airbyte uses debezium as cdc reader in mongodb...
# ask-ai
h
Why does Airbyte use Debezium as the CDC reader in the MongoDB source?
u
Airbyte uses Debezium as a CDC reader in the MongoDB source because Debezium is an open-source framework for Change Data Capture. It can read MongoDB change streams in near real time and stream every committed document-level operation (insert, update, delete), preserving the order in which the operations were carried out. This makes Debezium a reliable tool for capturing changes in the database and keeping the destination data up to date.

In the case of MongoDB, Airbyte uses the Debezium core engine to connect to the database and parse the changes. This allows Airbyte to pass the changes on to the destination in a format that can be easily consumed. Using Debezium also lets Airbyte hide the complexity of setting up and managing CDC from the user, which makes the connector easier to use and manage.

For more details, you can refer to these sources:
- [MySQL CDC: Build an ELT pipeline from MySQL Database](https://airbyte.com/tutorials/mysql-change-data-capture-cdc)
- [MongoDB source CDC support](https://github.com/airbytehq/airbyte/discussions/34214)
- [Mongo DB Replication Methods](https://docs.airbyte.com/integrations/sources/mongodb-v2#replication-methods)
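To illustrate why preserving the order of operations matters, here is a minimal sketch in plain Python. The event shape (`op`, `_id`, `doc`) and the function name `apply_change_events` are hypothetical and chosen for illustration only; they are not Debezium's event envelope, MongoDB's change stream format, or Airbyte's record format.

```python
from typing import Any


def apply_change_events(events: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    """Replay insert/update/delete events in order to build the destination state."""
    destination: dict[Any, dict[str, Any]] = {}
    for event in events:
        op, key = event["op"], event["_id"]
        if op == "insert":
            # A new document replaces whatever was stored under this key.
            destination[key] = dict(event["doc"])
        elif op == "update":
            # Merge the changed fields into the existing document.
            destination.setdefault(key, {}).update(event["doc"])
        elif op == "delete":
            destination.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return destination


# Hypothetical change events, listed in commit order.
events = [
    {"op": "insert", "_id": 1, "doc": {"name": "a", "qty": 1}},
    {"op": "insert", "_id": 2, "doc": {"name": "b", "qty": 5}},
    {"op": "update", "_id": 1, "doc": {"qty": 3}},
    {"op": "delete", "_id": 2},
]

in_order = apply_change_events(events)
out_of_order = apply_change_events(list(reversed(events)))
```

Replaying the same events in reverse leaves the deleted document in place and loses the update, which is why a CDC reader that keeps the commit order is needed for the destination to match the source.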
h
@kapa.ai are there any specific cases where Airbyte's use of Debezium to handle CDC logs via MongoDB's change stream strategy makes it easier to use and manage?
u
Yes, Airbyte uses Debezium for handling CDC logs in specific cases. For instance, when dealing with MySQL and PostgreSQL databases, Airbyte leverages Debezium to implement Change Data Capture (CDC).

In the case of MySQL, Debezium scans the MySQL binary log (binlog) in near real time and streams every committed row-level operation (insert, update, delete), preserving the order in which the operations were carried out. Airbyte uses the Debezium core engine and the Debezium MySQL connector, which connect to the database and parse the binlog into JSON records. Airbyte then passes the JSON records on to the destination. This process hides the complexity from the user, making it easy to use and manage [source](https://airbyte.com/tutorials/mysql-change-data-capture-cdc).

For PostgreSQL, Airbyte uses Debezium to read from the PostgreSQL Write-Ahead Log (WAL) using the pgoutput plugin. This allows Airbyte to capture changes from a PostgreSQL source and replicate them to various destinations, such as data warehouses or other databases. Again, this makes the CDC process easier to set up and manage than building a custom CDC solution from scratch [source](https://airbyte.com/tutorials/incremental-change-data-capture-cdc-replication).

As for MongoDB, Airbyte's MongoDB connector uses MongoDB's change streams and the replica set oplog for Change Data Capture (CDC). This allows for reliable replication of collections of any size, with checkpointing and chunking of data reads [source](https://docs.airbyte.com/integrations/sources/mongodb-v2#mongo-db). However, the knowledge sources do not explicitly mention the use of Debezium for MongoDB's CDC implementation.
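The idea of checkpointing with chunked reads can be sketched as follows. This is a simplified illustration and not Airbyte's or Debezium's implementation: an integer offset stands in for whatever position marker the real connector stores, and `read_in_chunks` is a hypothetical helper.

```python
from typing import Iterator


def read_in_chunks(
    log: list[str], checkpoint: int, chunk_size: int
) -> Iterator[tuple[list[str], int]]:
    """Yield (chunk, new_checkpoint) pairs, starting at the saved checkpoint."""
    position = checkpoint
    while position < len(log):
        chunk = log[position:position + chunk_size]
        position += len(chunk)
        yield chunk, position


# Hypothetical change log with seven events.
log = [f"event-{i}" for i in range(7)]
delivered: list[str] = []
checkpoint = 0  # Saved position; 0 means nothing has been read yet.

# First sync: deliver one chunk, save the checkpoint, then stop
# (simulating an interrupted sync).
for chunk, new_checkpoint in read_in_chunks(log, checkpoint, chunk_size=3):
    delivered.extend(chunk)
    checkpoint = new_checkpoint
    break

checkpoint_after_first_sync = checkpoint

# Second sync: resume from the saved checkpoint instead of starting over.
for chunk, new_checkpoint in read_in_chunks(log, checkpoint, chunk_size=3):
    delivered.extend(chunk)
    checkpoint = new_checkpoint
```

Because the checkpoint is saved only after a chunk has been delivered, the second sync picks up where the first one stopped, and each event is delivered once in this simplified model.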