Hello, anyone know how to get flink-cdc or Flink S...
# troubleshooting
a
Hello, anyone know how to get flink-cdc or Flink SQL to propagate schema changes from mysql -> iceberg? I'm using mysql-cdc connector at the source, and iceberg connector at the sink. It's working great, except i can't figure out how to get schema changes propagated.
I actually followed the docs here to get it setup, but this isn't propagating schema changes: https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/flink-sources/tutorials/build-real-time-data-lake-tutorial/
@Yuxia Not sure if you're the same person, but I noticed you actually wrote the original docs for this on Alibaba. https://www.alibabacloud.com/blog/flink-cdc-series-part-3-synchronize-mysql-database-and-table-shard-to-build-an-iceberg-real-time-database_598970
g
Schema changes can't be propagated via sql, because the tables are static.. this requires a datastream job wrapper to be developed. An Iceberg sink is not supported in Flink CDC atm and the following should be the supported ones (i believe the iceberg sink itself doesn't support dynamic schema changes though)
gratitude thank you 1
a
Thank you @Giannis Polyzos, really appreciate the response here. I'll look into developing a datastream job wrapper. We're exploring multiple approaches using Flink for CDC replay from MySQL->Iceberg, or MySQL -> Kafka -> Iceberg (I'm an engineer here at Affirm). Flink-cdc with the SQL is so clean so I was really hoping to get it working.
We can grab the MySQL schema in the beginning of run-time. I wonder if there's a way to build the SQL "dynamically" right in the beginning, and force a "restart/fail" every time we see a schema change, since it's rare. But it feels like an ugly hack to me. 1. build flink sql dynamically with the schema 2. read from mysql with mysql-cdc connector 3. have a filter step before inserting into iceberg a. if ddl -> fail/restart the job back to step 1 b. if data change event -> move forward to step 4 4. insert into iceberg
Anyways, thanks for the confirmation here! I know you guys at Ververica are the experts here, so if you have any other advice on doing the MySQL -> Iceberg CDC replay, would be really appreciated.