Apache Flink #random

Huib

10/16/2023, 3:07 PM

Related, but not an identical question: is it possible to create a table from an external system without specifying the schema at all? Use case: I have a Kafka topic with CDC data I’d like to store in my archive, and the underlying tables might change without Flink knowing about it (there are databases outside our control for which we do have access to a CDC stream). I’d like the archive to contain all data, including columns that did not exist when the Flink job was first started. Imagine the flow DB -> Kafka -> Flink -> Kafka -> Flink -> … We’re currently using Kafka Connect to take a Kafka topic and dump it into our archive, but this only really works for the first layer. @Anthony Daegele (Anthony) As soon as Flink is involved I would lose data (columns) until we adapt the schema and backfill the data. Is there a way around this?
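
To make the column-loss concern concrete, here is a minimal sketch in plain Python (not Flink code). It assumes a job that only knows a fixed set of columns, and compares that with carrying each event along as an opaque payload. `KNOWN_COLUMNS`, `project`, `passthrough`, and the sample event are all hypothetical names made up for illustration.

```python
import json

# Hypothetical: the columns the job knew about when it was first deployed.
KNOWN_COLUMNS = ("id", "name")


def project(record, columns=KNOWN_COLUMNS):
    """Keep only the columns the job knows; anything else is silently dropped."""
    return {column: record.get(column) for column in columns}


def passthrough(raw_event):
    """Carry the event as an opaque string, so unknown columns survive."""
    return {"payload": raw_event}


# A CDC event that arrives after upstream added a column the job has never seen.
event = json.dumps({"id": 1, "name": "a", "added_later": "x"})

typed = project(json.loads(event))   # schema-bound path: "added_later" is lost
opaque = passthrough(event)          # schema-less path: full event is retained

print(typed)
print(json.loads(opaque["payload"]))
```

The trade-off in this sketch is that the opaque path keeps everything but gives downstream steps no typed columns to work with, so parsing is deferred to whoever reads the archive.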

Anthony Daegele (Anthony)

10/16/2023, 3:24 PM

For our CDC source, we use a schema registry for each of our topics that prohibits any breaking changes to the schema. If our data source wants to make a breaking change, it needs to be coordinated with the pipeline so that we can update our schemas accordingly, otherwise they will simply fail to write data to our Kafka topics. That said, for our use case, we are probably in a similar boat to you where we would need to adapt our schemas in order to process data after the change.

Anthony Daegele (Anthony)

10/16/2023, 3:25 PM

Even though it’s a bit of effort, if we do require a breaking schema change, we have a process for standing up a new version of our pipeline and deploying in a blue/green fashion. Once the newer version has been validated, we decommission the old version and cut our consumers over to the new version.

Huib

10/16/2023, 3:31 PM

What do you define as a breaking change? Removing a non-null column? Or adding an optional column?

Anthony Daegele (Anthony)

10/16/2023, 3:35 PM

The change needs to be backwards compatible in our platform. So yes, removing a non-null column would be breaking. Adding an optional column should be fine, provided you specify a default value.
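
As a rough illustration of the two rules above, the following Python sketch flags a removed non-null column and an added column without a default. It is a simplified model of the rules as stated in this thread, not the algorithm any particular schema registry uses. The dict-based schema representation and the `breaking_changes` helper are assumptions made up for this example, and it treats removing a nullable column as non-breaking, which the thread does not address.

```python
def breaking_changes(old, new):
    """Return a list of breaking changes between two schema versions.

    Schemas are hypothetical dicts: column name -> {"nullable": bool, "default": ...}.
    """
    problems = []
    # Rule 1: removing a non-null column is breaking.
    for name, spec in old.items():
        if name not in new and not spec.get("nullable", False):
            problems.append(f"removed non-null column: {name}")
    # Rule 2: an added column must specify a default value.
    for name, spec in new.items():
        if name not in old and "default" not in spec:
            problems.append(f"added column without default: {name}")
    return problems


old = {
    "id": {"nullable": False},
    "email": {"nullable": True, "default": None},
}
# Adds an optional column with a default: allowed under the rules above.
added_optional = {**old, "phone": {"nullable": True, "default": None}}
# Drops the non-null "id" column: breaking under the rules above.
removed_required = {"email": {"nullable": True, "default": None}}

print(breaking_changes(old, added_optional))
print(breaking_changes(old, removed_required))
```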

Huib

10/16/2023, 3:45 PM

Yeah, same here, except it does somewhat break the application - since it is no longer fully aware of all upstream data…

Huib

10/16/2023, 4:47 PM

Btw sorry for the tag, no idea how that got in there….

Anthony Daegele (Anthony)

10/16/2023, 4:55 PM

Oh no worries!
