https://linen.dev logo
#ask-community-for-troubleshooting
Title
# ask-community-for-troubleshooting
m

Mahdir Ishmam

01/26/2022, 2:24 AM
hey folks, I'm evaluating Airbyte vs. Fivetran for MySQL -> Redshift CDC data replication. Have a few questions about how this works: 1. How does the Redshift connector handle updates and deletes from the MySQL source data on the Redshift side? 2. How fast can it run? What is the latency between MySQL to Redshift at the maximum possible replication speed Airbyte currently allows?
👀 1
a

Augustin Lafanechere (Airbyte)

01/26/2022, 2:49 PM
1. We are using Debezium for CDC ingestion, quoting their doc:
Debezium records in a transaction log all row-level changes committed to each database table. Each application simply reads the transaction logs they’re interested in, and they see all of the events in the same order in which they occurred.
On each job run Airbyte reads data from the transaction log up to the time that the sync was started. You'll find more details here. 2. The bottleneck here in terms of latency is bound to the scheduling time as this is not a real-time extraction and load to redshift but works as the other Airbyte replication, in batch with a minimum interval of 5mn. Let me know if it helped @Mahdir Ishmam.
m

Mahdir Ishmam

01/26/2022, 7:55 PM
Thanks @[DEPRECATED] Augustin Lafanechere that's really helpful. I've used Debezium in the past. Does it parse the binlog updates from the Debezium feed and perform updates/deletes as needed on Redshift tables using recommended best practices? Or is it just loading the binlog feed directly into Redshift and you need to parse it on Redshift side to get the current state of a table?
m

Marcos Marx (Airbyte)

01/26/2022, 9:09 PM
Airbyte have a parser to handle the binlog and generate the output table in Redshift
m

Mahdir Ishmam

01/26/2022, 10:11 PM
so how does Airbyte handle updated/deleted data? Let's say I'm replicating Table x from MySQL into Redshift and the pipeline is set to run every 5 min. After the first initial load, how does Airbyte ensure MySQL data is replicated in Redshift? like is it deleting rows from Redshift tables after consuming the CDC updates? If so, how is it deleting? Like a DELETE statement or DELETE USING? Same question for rows that were updated
m

Marcos Marx (Airbyte)

01/27/2022, 12:45 AM
The CDC will transfer the data with the state like: DELETE, UPDATE, INSERT with this information Airbyte will build the final table with the correct data
m

Mahdir Ishmam

01/27/2022, 1:58 AM
oh I see. That sounds perfect. thanks a lot for the clarifications @Marcos Marx (Airbyte) and rest of the Airbyte team. I'm gonna give this a try
Taking a deeper look I think I can't use it because of this issue - Already some columns are bigger than 65535 characters. I see that the PR addressing this is approved
4 Views