Hello, does anyone know how to get Flink CDC or Flink SQL to propagate schema changes from MySQL -> Iceberg?
I'm using the mysql-cdc connector at the source and the Iceberg connector at the sink.
It's working great, except I can't figure out how to get schema changes propagated.
Schema changes can't be propagated via SQL, because the table schemas are static; this requires a DataStream job wrapper to be developed.
An Iceberg sink is not supported in Flink CDC at the moment (the supported pipeline connectors are the ones listed in the Flink CDC docs).
(I believe the Iceberg sink itself doesn't support dynamic schema changes, though.)
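For reference, a minimal sketch of what that DataStream wrapper could look like, assuming Flink CDC 3.x package names (older releases use com.ververica.cdc.* instead); the host, credentials, and database/table names below are placeholders:
```
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.cdc.connectors.mysql.source.MySqlSource;
import org.apache.flink.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MySqlCdcWrapper {
  public static void main(String[] args) throws Exception {
    MySqlSource<String> source = MySqlSource.<String>builder()
        .hostname("mysql-host")   // placeholder
        .port(3306)
        .databaseList("mydb")     // placeholder
        .tableList("mydb.orders") // placeholder
        .username("flink")
        .password("secret")
        // Emit Debezium schema-change (DDL) events alongside row changes.
        .includeSchemaChanges(true)
        .deserializer(new JsonDebeziumDeserializationSchema())
        .build();

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRestartStrategy(RestartStrategies.noRestart()); // let a DDL event fail the job

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "mysql-cdc")
        // Debezium schema-change events carry a "ddl" field; row-change events don't.
        .map(json -> {
          if (json.contains("\"ddl\"")) {
            // An external supervisor can catch the failure, regenerate the
            // pipeline from the new schema, and resubmit the job.
            throw new IllegalStateException("schema change detected: " + json);
          }
          return json;
        })
        .print(); // stand-in for the actual Iceberg writer
    env.execute("mysql-to-iceberg-cdc-wrapper");
  }
}
```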
Abrar Sher
09/09/2024, 12:53 PM
Thank you @Giannis Polyzos, really appreciate the response here.
I'll look into developing a DataStream job wrapper. We're exploring multiple approaches using Flink for CDC replay, either MySQL -> Iceberg directly or MySQL -> Kafka -> Iceberg (I'm an engineer here at Affirm). Flink CDC with SQL is so clean that I was really hoping to get it working.
Abrar Sher
09/09/2024, 1:00 PM
We can grab the MySQL schema at the start of the job.
I wonder if there's a way to build the SQL "dynamically" right at startup, and force a "restart/fail" every time we see a schema change, since changes are rare. It feels like an ugly hack to me, though. Something like this (rough sketch of step 1 after the list):
1. build the Flink SQL dynamically from the current MySQL schema
2. read from MySQL with the mysql-cdc connector
3. have a filter step before inserting into Iceberg
a. if it's a DDL event -> fail/restart the job back to step 1
b. if it's a data change event -> move forward to step 4
4. insert into Iceberg
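A rough, hypothetical sketch of step 1: read the live schema from information_schema, then run the generated Flink SQL. The connection info, database/table names, `id` primary key, type mapping, and the pre-registered Iceberg catalog named `iceberg` are all assumptions:
```
import java.sql.*;
import java.util.*;
import org.apache.flink.table.api.*;

public class DynamicCdcSqlJob {
  // Rough MySQL -> Flink SQL type mapping; extend for your column types.
  static String flinkType(String t) {
    switch (t) {
      case "int":      return "INT";
      case "bigint":   return "BIGINT";
      case "datetime": return "TIMESTAMP(3)";
      default:         return "STRING";
    }
  }

  public static void main(String[] args) throws Exception {
    // Step 1: read the current schema from information_schema.
    List<String> cols = new ArrayList<>();
    try (Connection c = DriverManager.getConnection(
             "jdbc:mysql://mysql-host:3306/mydb", "flink", "secret");
         Statement s = c.createStatement();
         ResultSet rs = s.executeQuery(
             "SELECT column_name, data_type FROM information_schema.columns"
           + " WHERE table_schema = 'mydb' AND table_name = 'orders'"
           + " ORDER BY ordinal_position")) {
      while (rs.next()) {
        cols.add("`" + rs.getString(1) + "` " + flinkType(rs.getString(2)));
      }
    }
    String schema = String.join(",\n  ", cols);

    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

    // Source table generated from the schema we just read (assumes an `id` PK).
    tEnv.executeSql(
        "CREATE TABLE orders_src (\n  " + schema + ",\n"
      + "  PRIMARY KEY (`id`) NOT ENFORCED\n"
      + ") WITH (\n"
      + "  'connector' = 'mysql-cdc',\n"
      + "  'hostname' = 'mysql-host', 'port' = '3306',\n"
      + "  'username' = 'flink', 'password' = 'secret',\n"
      + "  'database-name' = 'mydb', 'table-name' = 'orders'\n"
      + ")");

    // Sink table in an already-registered Iceberg catalog named `iceberg`.
    tEnv.executeSql(
        "CREATE TABLE IF NOT EXISTS iceberg.db.orders (\n  " + schema + ",\n"
      + "  PRIMARY KEY (`id`) NOT ENFORCED\n"
      + ") WITH ('format-version' = '2')");

    // Steps 2-4: run the CDC pipeline until a DDL event fails the job,
    // after which a supervisor reruns this main() against the new schema.
    tEnv.executeSql("INSERT INTO iceberg.db.orders SELECT * FROM orders_src");
  }
}
```
One caveat: on a restart after a DDL event, the Iceberg table would also need an ALTER TABLE to match the new schema; CREATE TABLE IF NOT EXISTS alone won't evolve an existing table.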
Abrar Sher
09/09/2024, 1:02 PM
Anyways, thanks for the confirmation here!
I know you guys at Ververica are the experts here, so if you have any other advice on doing the MySQL -> Iceberg CDC replay, it would be really appreciated.