Hi team, I want to ask: does DataHub maintain schema ...
# ingestion
c
Hi team, I want to ask: does DataHub maintain schema versioning?
g
Hey Anung 👋 could you elaborate on your question? Metadata model schemas do not have versions. However, the DataHub project is intentional about not publishing backward-incompatible changes to models. Once a model is created, you should feel confident that it won't have a backward-incompatible change published.
m
@calm-lawyer-777 if your question was about the schema aspect of a dataset, then yes: each update to the schema results in a new versioned row being written to the metadata store (e.g., MySQL). The UI doesn’t show this version history today.
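To make the "new versioned row per schema update" idea concrete, here is a minimal sketch of an append-only store, using an in-memory SQLite database as a stand-in for MySQL. The table name `aspect_versions`, its columns, and the helpers `make_store`, `write_schema_version`, and `schema_history` are all hypothetical and only illustrate the pattern; they are not DataHub's actual storage schema or API, and the real version numbering may work differently.

```python
import json
import sqlite3

ASPECT = "schemaMetadata"  # aspect name used for illustration


def make_store():
    """Create an in-memory table holding one row per (urn, aspect, version)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aspect_versions ("
        "urn TEXT, aspect TEXT, version INTEGER, metadata TEXT, "
        "PRIMARY KEY (urn, aspect, version))"
    )
    return conn


def write_schema_version(conn, urn, schema):
    """Append the schema as a new row; never overwrite an earlier version."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), -1) + 1 FROM aspect_versions "
        "WHERE urn = ? AND aspect = ?",
        (urn, ASPECT),
    ).fetchone()
    next_version = row[0]
    conn.execute(
        "INSERT INTO aspect_versions (urn, aspect, version, metadata) "
        "VALUES (?, ?, ?, ?)",
        (urn, ASPECT, next_version, json.dumps(schema)),
    )
    return next_version


def schema_history(conn, urn):
    """Return every stored schema version for the dataset, oldest first."""
    rows = conn.execute(
        "SELECT version, metadata FROM aspect_versions "
        "WHERE urn = ? AND aspect = ? ORDER BY version",
        (urn, ASPECT),
    ).fetchall()
    return [(version, json.loads(metadata)) for version, metadata in rows]


if __name__ == "__main__":
    store = make_store()
    dataset = "urn:li:dataset:(urn:li:dataPlatform:mysql,db.users,PROD)"
    write_schema_version(store, dataset, {"id": "INT"})
    write_schema_version(store, dataset, {"id": "INT", "email": "VARCHAR"})
    for version, schema in schema_history(store, dataset):
        print(version, schema)
```

Because rows are only ever appended, the full change history stays queryable from the database even though the UI does not display it.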
c
Hi @green-football-43791 @mammoth-bear-12532, thank you for your kind replies. Yes, it is about the dataset schema: columns and data types. I have a use case to track changes to the metadata so that users can see the history of changes.
Great to know that it is traceable in the backend database.
m
Were you thinking about showing this in the UI as well? Or just offering an API for it?
c
It would be great to have it incorporated in the UI.
c
+1
I have at least one use case for this as well.
Do I understand the documentation correctly that the versioning is fixed to v0, v1, etc.?
I can't choose it myself?
g
Hey @colossal-furniture-76714 - the versioning system at our storage layer is fixed. However, we intend to add a semantic layer on top that partitions versions by column additions or removals, and groups together versions that fall into the same partition. Would that address your use case? Or do you imagine wanting to group multiple schema changes that happened over time into a single version?
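One possible reading of that grouping, as a sketch only: start a new partition whenever the set of column names changes (a column is added or removed), and keep versions that only change data types in the current partition. The function `partition_versions` and the `Schema` type are hypothetical names for illustration; the actual semantic layer is not specified in this thread and may define partitions differently.

```python
from typing import Dict, List

# A schema version is modeled as a mapping of column name -> data type.
Schema = Dict[str, str]


def partition_versions(versions: List[Schema]) -> List[List[int]]:
    """Group consecutive storage versions that share the same column names.

    Returns lists of version indexes. A new group starts whenever a column
    was added or removed relative to the previous version.
    """
    partitions: List[List[int]] = []
    previous_columns = None
    for index, schema in enumerate(versions):
        columns = set(schema)
        if previous_columns is None or columns != previous_columns:
            partitions.append([index])  # column set changed: open a new partition
        else:
            partitions[-1].append(index)  # same columns: stay in the current partition
        previous_columns = columns
    return partitions


if __name__ == "__main__":
    history = [
        {"id": "INT"},
        {"id": "BIGINT"},                      # type change only
        {"id": "BIGINT", "email": "VARCHAR"},  # column added
        {"id": "BIGINT", "email": "TEXT"},     # type change only
        {"id": "BIGINT"},                      # column removed
    ]
    print(partition_versions(history))
```

Under this assumption, storage versions v0 and v1 would surface as one semantic version, v2 and v3 as a second, and v4 as a third.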
c
Hey @green-football-43791, thanks for your answer. This might address my use case. Could you elaborate on "[...] and groups versions together that fall into the same partition"? When do they fall into the same partition? How is a partition defined? Our problem is that we do not hold a schema registry for the table; the engineers can change the schema and data for their technology area on the device, and the database digests the doc string that it receives...
Maybe the "ORIGIN" aspect is usable for the kind of versioning we are looking for, though it was intended a bit differently. Instead of "PROD", I imagine a development phase that sets the boundaries for the engineers as the origin, e.g. "phase 2018-q2".