in Pinot, how do you deal with schema evolution? L...
# general
a
In Pinot, how do you deal with schema evolution? Let’s say a new field or metric is added to the message; is there a way to alter the table on the fly?
For realtime tables, the reload API will skip the consuming segment and may cause partial data to be dropped at query time due to schema inconsistency. So a safer approach would be to restart the servers, which will cause temporary data staleness as servers reconsume from the previous offset.
There are ongoing efforts to address the issue with realtime tables in #CNBLA2M6Y that @Bo Zhang mentioned.
a
> For offline tables, you can call the reload API
Does that mean the segments are recreated with the additional columns beforehand?
h
Correct. They will be replaced, with the additional columns added and default values assigned.
k
Internally it’s not recreating the entire segment. It just creates the new column and updates the metadata file. It’s a lightweight process (< 1s).
a
Oh, nice. So: publish the updated schema, then reload the segments.
k
Yes
We should probably provide an updateschema command that does this automatically
It can also do some validation as needed
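For reference, the two-step flow confirmed above (publish the updated schema, then reload the segments) could be scripted roughly as below. This is only a sketch under assumptions: it assumes a controller at `http://localhost:9000` and the controller REST endpoints `PUT /schemas/{schemaName}` and `POST /segments/{tableName}/reload`, which may differ between Pinot versions, so check your controller's Swagger page first. The function names are made up for illustration and are not part of Pinot.

```python
import json
import urllib.request


def build_schema_update_request(controller: str, schema: dict) -> urllib.request.Request:
    """Build the request that publishes the updated schema.

    Assumes the controller accepts PUT /schemas/{schemaName} with a JSON body.
    """
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        f"{controller}/schemas/{schema['schemaName']}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_reload_request(controller: str, table: str) -> urllib.request.Request:
    """Build the request that reloads all segments of a table.

    Assumes the controller exposes POST /segments/{tableName}/reload.
    """
    return urllib.request.Request(
        f"{controller}/segments/{table}/reload", data=b"", method="POST"
    )


def evolve_schema(controller: str, table: str, schema: dict,
                  send=urllib.request.urlopen) -> list:
    """Publish the schema first, then trigger the reload; return HTTP statuses."""
    statuses = []
    for request in (
        build_schema_update_request(controller, schema),
        build_reload_request(controller, table),
    ):
        with send(request) as response:
            statuses.append(response.status)
    return statuses
```

Per the caveat earlier in the thread, this only covers offline tables; the reload skips the consuming segment of a realtime table.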
h
Good point. Right now we use the same API for schema creation and update.
We can add a compatibility check and hook it up to the reload API.
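To make the compatibility-check idea concrete, here is one possible shape for it. It is a sketch, not Pinot's actual validation: the rule that existing columns may be neither removed nor retyped (only additions allowed) is an assumption about what "compatible" should mean, and the helper names are hypothetical. It relies only on the `name` and `dataType` fields of the field specs in a Pinot schema JSON.

```python
from typing import Dict, List

# Top-level schema keys that hold lists of column specs in a Pinot schema JSON.
FIELD_SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def column_types(schema: dict) -> Dict[str, str]:
    """Map each column name to its declared data type."""
    columns = {}
    for key in FIELD_SPEC_KEYS:
        for spec in schema.get(key, []):
            columns[spec["name"]] = spec["dataType"]
    return columns


def compatibility_errors(old: dict, new: dict) -> List[str]:
    """Return the reasons the new schema is not a pure extension of the old one.

    Assumed rule: every existing column must still exist with the same type.
    An empty list means the update only adds columns.
    """
    old_columns, new_columns = column_types(old), column_types(new)
    errors = []
    for name, data_type in old_columns.items():
        if name not in new_columns:
            errors.append(f"column removed: {name}")
        elif new_columns[name] != data_type:
            errors.append(f"type changed for {name}: {data_type} -> {new_columns[name]}")
    return errors
```

An update command could run a check like this before publishing the schema and only trigger the reload when the returned list is empty.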
c
@Haibo Wang let's add that to the issue (updateschema command suggestion by Kishore)
h
will do