Hello, does anyone know how to get Flink CDC or Flink SQL to propagate schema changes from MySQL -> Iceberg?
I'm using the mysql-cdc connector at the source and the Iceberg connector at the sink.
It's working great, except I can't figure out how to get schema changes propagated.
Schema changes can't be propagated via SQL, because the table schemas are static; this requires a DataStream job wrapper to be developed.
An Iceberg sink is not supported in Flink CDC at the moment (the supported pipeline connectors are the ones listed in the Flink CDC docs).
(I believe the Iceberg sink itself doesn't support dynamic schema changes, though.)
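For reference, a minimal sketch of what that DataStream wrapper could look like, assuming Flink CDC 3.x package names (older releases use com.ververica.cdc.* instead); the host, credentials, and database/table names below are placeholders:
```
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.cdc.connectors.mysql.source.MySqlSource;
import org.apache.flink.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MySqlCdcWrapper {
  public static void main(String[] args) throws Exception {
    MySqlSource<String> source = MySqlSource.<String>builder()
        .hostname("mysql-host")   // placeholder
        .port(3306)
        .databaseList("mydb")     // placeholder
        .tableList("mydb.orders") // placeholder
        .username("flink")
        .password("secret")
        // Emit Debezium schema-change (DDL) events alongside row changes.
        .includeSchemaChanges(true)
        .deserializer(new JsonDebeziumDeserializationSchema())
        .build();

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRestartStrategy(RestartStrategies.noRestart()); // let a DDL event fail the job

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "mysql-cdc")
        // Debezium schema-change events carry a "ddl" field; row-change events don't.
        .map(json -> {
          if (json.contains("\"ddl\"")) {
            // An external supervisor can catch the failure, regenerate the
            // pipeline from the new schema, and resubmit the job.
            throw new IllegalStateException("schema change detected: " + json);
          }
          return json;
        })
        .print(); // stand-in for the actual Iceberg writer
    env.execute("mysql-to-iceberg-cdc-wrapper");
  }
}
```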
Abrar Sher
09/09/2024, 12:53 PM
Thank you @Giannis Polyzos, really appreciate the response here.
I'll look into developing a DataStream job wrapper. We're exploring multiple approaches using Flink for CDC replay, either MySQL -> Iceberg directly or MySQL -> Kafka -> Iceberg (I'm an engineer here at Affirm). Flink CDC with SQL is so clean that I was really hoping to get it working.
Abrar Sher
09/09/2024, 1:00 PM
We can grab the MySQL schema at the start of the job.
I wonder if there's a way to build the SQL "dynamically" right at startup, and force a "restart/fail" every time we see a schema change, since changes are rare. It feels like an ugly hack to me, though. Something like this (rough sketch of step 1 after the list):
1. build the Flink SQL dynamically from the current MySQL schema
2. read from MySQL with the mysql-cdc connector
3. have a filter step before inserting into Iceberg
a. if it's a DDL event -> fail/restart the job back to step 1
b. if it's a data change event -> move forward to step 4
4. insert into Iceberg
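A rough, hypothetical sketch of step 1: read the live schema from information_schema, then run the generated Flink SQL. The connection info, database/table names, `id` primary key, type mapping, and the pre-registered Iceberg catalog named `iceberg` are all assumptions:
```
import java.sql.*;
import java.util.*;
import org.apache.flink.table.api.*;

public class DynamicCdcSqlJob {
  // Rough MySQL -> Flink SQL type mapping; extend for your column types.
  static String flinkType(String t) {
    switch (t) {
      case "int":      return "INT";
      case "bigint":   return "BIGINT";
      case "datetime": return "TIMESTAMP(3)";
      default:         return "STRING";
    }
  }

  public static void main(String[] args) throws Exception {
    // Step 1: read the current schema from information_schema.
    List<String> cols = new ArrayList<>();
    try (Connection c = DriverManager.getConnection(
             "jdbc:mysql://mysql-host:3306/mydb", "flink", "secret");
         Statement s = c.createStatement();
         ResultSet rs = s.executeQuery(
             "SELECT column_name, data_type FROM information_schema.columns"
           + " WHERE table_schema = 'mydb' AND table_name = 'orders'"
           + " ORDER BY ordinal_position")) {
      while (rs.next()) {
        cols.add("`" + rs.getString(1) + "` " + flinkType(rs.getString(2)));
      }
    }
    String schema = String.join(",\n  ", cols);

    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

    // Source table generated from the schema we just read (assumes an `id` PK).
    tEnv.executeSql(
        "CREATE TABLE orders_src (\n  " + schema + ",\n"
      + "  PRIMARY KEY (`id`) NOT ENFORCED\n"
      + ") WITH (\n"
      + "  'connector' = 'mysql-cdc',\n"
      + "  'hostname' = 'mysql-host', 'port' = '3306',\n"
      + "  'username' = 'flink', 'password' = 'secret',\n"
      + "  'database-name' = 'mydb', 'table-name' = 'orders'\n"
      + ")");

    // Sink table in an already-registered Iceberg catalog named `iceberg`.
    tEnv.executeSql(
        "CREATE TABLE IF NOT EXISTS iceberg.db.orders (\n  " + schema + ",\n"
      + "  PRIMARY KEY (`id`) NOT ENFORCED\n"
      + ") WITH ('format-version' = '2')");

    // Steps 2-4: run the CDC pipeline until a DDL event fails the job,
    // after which a supervisor reruns this main() against the new schema.
    tEnv.executeSql("INSERT INTO iceberg.db.orders SELECT * FROM orders_src");
  }
}
```
One caveat: on a restart after a DDL event, the Iceberg table would also need an ALTER TABLE to match the new schema; CREATE TABLE IF NOT EXISTS alone won't evolve an existing table.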
Abrar Sher
09/09/2024, 1:02 PM
Anyways, thanks for the confirmation here!
I know you guys at Ververica are the experts here, so if you have any other advice on doing the MySQL -> Iceberg CDC replay, it would be really appreciated.