# general
m
Hi folks! I'm interested in Pinot for our real-time process mining product. We ingest clickstream-like events from applications and analyze them to produce process visualizations. The one challenge I see with Pinot is that our events do not have a fixed schema, which means our database solution must support schema evolution. We only ever add new columns; we never remove them. I read through https://github.com/apache/pinot/issues/4225 and I can't make out the current state of the issue. What is the proper way to gracefully evolve a Pinot schema for realtime tables?
m
Schema evolution is supported for real-time tables. cc: @User
Also, as a side note, you can explore JSON indexing if you don't have a fixed schema.
m
Thanks! I saw the JSON indexing and that definitely looks valuable. From my limited understanding, though, it doesn't completely remove the need for schema evolution.
m
Right, it is not supposed to be a replacement for schema evolution.
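One way the two features can complement each other, sketched under assumptions: keep the well-known event fields as real schema columns, and pack every unrecognized key into a single JSON string column that could be listed under `jsonIndexColumns` in the table's `tableIndexConfig`. The column names below (`FIXED_COLUMNS`, `attributes`) are hypothetical, and the split happens in the producer before events reach Pinot; this is an illustration, not a Pinot API.

```python
import json
from typing import Any

# Hypothetical set of columns already declared in the Pinot schema.
FIXED_COLUMNS = {"event_id", "case_id", "activity", "ts"}


def split_event(event: dict[str, Any], json_column: str = "attributes") -> dict[str, Any]:
    """Keep declared columns as-is; pack every other key into one JSON string column."""
    row = {k: v for k, v in event.items() if k in FIXED_COLUMNS}
    extras = {k: v for k, v in event.items() if k not in FIXED_COLUMNS}
    # sort_keys makes the serialized string deterministic across producers
    row[json_column] = json.dumps(extras, sort_keys=True)
    return row


if __name__ == "__main__":
    print(split_event({"event_id": "e1", "activity": "click", "browser": "firefox"}))
```

When a key in `attributes` turns out to be queried often, it can later be promoted to a first-class column through schema evolution, which is why JSON indexing does not replace it.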
j
Realtime table schema evolution is supported, but data for the new columns can only be ingested after the current consuming segments are committed. For the current consuming segments and already sealed segments, default values will be set for the new columns.
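A minimal sketch of the accretion-only change described above, assuming the schema is handled as Pinot's schema JSON (`schemaName`, `dimensionFieldSpecs`, `metricFieldSpecs`, `dateTimeFieldSpecs`). The helper `accrete_columns` is hypothetical: it only appends new dimension columns and never removes or retypes existing ones. The result would then be uploaded to the controller (for example through its schema update endpoint; check the controller REST API docs for your Pinot version).

```python
import copy
from typing import Any

SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def accrete_columns(schema: dict[str, Any], new_dims: dict[str, str]) -> dict[str, Any]:
    """Return a copy of a Pinot schema dict with new dimension columns appended.

    new_dims maps column name -> Pinot data type (e.g. "STRING", "LONG").
    Names already present in any field spec list are skipped, so existing
    columns are never removed or retyped.
    """
    evolved = copy.deepcopy(schema)
    existing = {spec["name"] for key in SPEC_KEYS for spec in evolved.get(key, [])}
    dims = evolved.setdefault("dimensionFieldSpecs", [])
    for name, data_type in new_dims.items():
        if name not in existing:
            # No defaultNullValue given: per the thread, rows in existing
            # segments get the default value for the new column.
            dims.append({"name": name, "dataType": data_type})
            existing.add(name)
    return evolved
```

Returning a copy keeps the original schema untouched, so the old and new versions can be diffed before uploading.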
m
Can the current consuming segment be force closed? Or, do we have to wait until it closes naturally?
m
It cannot be force closed, but you could restart the server so that it resumes consuming from the previous checkpoint.
m
I see. I think this is the solution described in the GitHub issue. The issue indicates that it can take a long time for the server to reload the database after a restart.
m
It should be fast in general.
I also see that quite a few of the PRs linked in the issue were merged.
m
Thanks. This gives me confidence to move forward with a POC.