Related, but not an identical question, is it poss...
# random
h
Related, but not an identical question: is it possible to create a table from an external system without specifying the schema at all? Use case: I have a Kafka topic with CDC data I’d like to store in my archive, and the underlying tables might change without Flink knowing about it (there are databases not in our control where we do have access to a CDC stream). I’d like the archive to contain all data, including columns that were not there when the Flink job was first started. Imagine the flow DB -> Kafka -> Flink -> Kafka -> Flink -> … We’re currently using Kafka Connect to take a Kafka topic and dump it into our archive, but this only really works for the first layer. @Anthony Daegele (Anthony) As soon as Flink is involved, I would lose data (columns) until we adapt the schema and backfill the data. Is there a way around this?
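To make the requirement concrete, here is a minimal Python sketch of the pass-through behaviour being asked for. It is not Flink code. It assumes the CDC events are JSON-encoded, and the function name `to_archive_record`, the topic name, and the envelope layout are hypothetical. The point is only that the payload is never projected onto a fixed column list, so a column that appears later still reaches the archive.

```python
import json


def to_archive_record(raw_value: bytes, topic: str) -> str:
    """Wrap a CDC event in an envelope without projecting it onto known columns.

    Assumption: the event value is a JSON object. The payload is carried
    through as-is, so fields unknown at deploy time are preserved.
    """
    payload = json.loads(raw_value)
    envelope = {"source_topic": topic, "payload": payload}
    return json.dumps(envelope, sort_keys=True)


# An event written before the upstream table changed...
old_event = b'{"id": 1, "name": "a"}'
# ...and one written after a column was added upstream.
new_event = b'{"id": 2, "name": "b", "added_later": true}'

for event in (old_event, new_event):
    print(to_archive_record(event, "cdc.customers"))
```

The trade-off is that downstream consumers get no schema guarantees: every field lookup on `payload` has to tolerate missing keys.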
a
For our CDC source, we use a schema registry for each of our topics that prohibits any breaking changes to the schema. If our data source wants to make a breaking change, it needs to be coordinated with the pipeline so that we can update our schemas accordingly, otherwise they will simply fail to write data to our Kafka topics. That said, for our use case, we are probably in a similar boat to you where we would need to adapt our schemas in order to process data after the change.
Even though it’s a bit of effort, if we do require a breaking schema change, we have a process for standing up a new version of our pipeline and deploying in a blue/green fashion. Once the newer version has been validated, we decommission the old version and cut our consumers over to the new version.
h
What do you define as a breaking change? Removing a non-null column? Or adding an optional column?
a
The change needs to be backwards compatible in our platform. So yes, removing a non-null column would be breaking. Adding an optional column should be fine, provided you specify a default value.
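As a rough illustration of those two cases, here is a simplified Python sketch of an Avro-style resolution rule: a reader can decode a record if every field it expects is either present in the writer's schema or has a default. The schema representation and the `can_read` helper are assumptions made for this example. Real registries also check types, promotions, and aliases.

```python
def can_read(reader_fields: dict, writer_fields: dict) -> bool:
    """Return True if data written with writer_fields can be decoded by reader_fields.

    Each schema is a dict mapping field name -> spec, where a spec containing
    the key "default" means the field has a default value.
    """
    for name, spec in reader_fields.items():
        if name not in writer_fields and "default" not in spec:
            return False  # reader needs a field the writer never wrote
    return True


v1 = {"id": {}, "email": {}}
v2_added = {"id": {}, "email": {}, "nickname": {"default": None}}
v2_removed = {"id": {}}

# Adding an optional column with a default: safe in both directions.
print(can_read(v2_added, v1))    # True  (new reader, old data)
print(can_read(v1, v2_added))    # True  (old reader, new data)

# Removing a required column: existing readers can no longer decode new data.
print(can_read(v1, v2_removed))  # False
```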
h
Yeah, same here, except it does somewhat break the application - since it is no longer fully aware of all upstream data…
Btw sorry for the tag, no idea how that got in there….
a
Oh no worries!