Apache Pinot #general

Mark Addleman

02/03/2022, 4:49 PM

Hi folks! I'm interested in Pinot for our real-time process mining product. We ingest clickstream-like events from applications and analyze them to produce process visualizations. The one challenge I see with Pinot is that our events do not have a fixed schema, which means that our database solution must support schema evolution. We always accrete new columns, never remove them. I read through https://github.com/apache/pinot/issues/4225 and I can't make out the current state of the issue. What is the proper way to gracefully evolve a Pinot schema in realtime tables?

➕ 2

Mayank

02/04/2022, 3:39 PM

Schema evolution is supported for real-time tables. cc: @User

Mayank

02/04/2022, 3:40 PM

Also, as a side note, you can explore JSON indexing if you don't have a fixed schema.
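
For readers following along, here is a minimal sketch of what the JSON indexing suggestion could look like. It assumes the schemaless part of each event lands in a single JSON column and that the table config lists that column under `tableIndexConfig.jsonIndexColumns`, queried with `JSON_MATCH`. The table name `clickstream`, the column name `attributes`, and both helper functions are hypothetical; check the Pinot docs for your version before relying on any of this.

```python
import json


def json_index_fragment(columns):
    """Build the table-config fragment that asks Pinot to JSON-index the given columns."""
    return {"tableIndexConfig": {"jsonIndexColumns": list(columns)}}


def json_match_query(table, column, path, value):
    """Build a JSON_MATCH query for a simple equality filter on one JSON path.

    Hypothetical helper: it only handles values without quote characters.
    """
    if "'" in value or '"' in value:
        raise ValueError("this sketch does not handle quotes in values")
    # Filter expression as JSON_MATCH expects it: "$.path"='value'
    inner = f"\"{path}\"='{value}'"
    # The expression is passed as a SQL string literal, so single quotes are doubled.
    literal = inner.replace("'", "''")
    return f"SELECT COUNT(*) FROM {table} WHERE JSON_MATCH({column}, '{literal}')"


fragment = json_index_fragment(["attributes"])
query = json_match_query("clickstream", "attributes", "$.page", "checkout")
print(json.dumps(fragment, indent=2))
print(query)
```

As noted later in the thread, this complements schema evolution rather than replacing it.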

Mark Addleman

02/04/2022, 5:02 PM

Thanks! I saw the json data indexing and that definitely looks valuable. From my limited understanding, I don't think it completely removes the need for schema evolution.

Mayank

02/04/2022, 6:50 PM

Right, it is not supposed to be a replacement for schema evolution.

Jackie

02/04/2022, 9:49 PM

Realtime table schema evolution is supported, but actual data for the new columns can only be ingested after the current consuming segments are committed. For the current consuming segments and already sealed segments, the default value will be set for the new columns.
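
A minimal sketch of the additive ("accrete only") schema change discussed here, assuming the usual Pinot schema JSON layout with `dimensionFieldSpecs`, `metricFieldSpecs`, and `dateTimeFieldSpecs`. The schema name, the column names, and the `add_columns` helper are hypothetical. The explicit `defaultNullValue` is the value that consuming and already sealed segments would report for the new column, per the message above. Uploading the evolved schema to the controller and reloading segments is not shown.

```python
import copy
import json

FIELD_SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def add_columns(schema, new_dimensions):
    """Return a copy of `schema` with new dimension columns appended.

    Existing columns are never removed or modified; a name clash raises.
    """
    evolved = copy.deepcopy(schema)
    existing = {
        spec["name"] for key in FIELD_SPEC_KEYS for spec in evolved.get(key, [])
    }
    for spec in new_dimensions:
        if spec["name"] in existing:
            raise ValueError(f"column already exists: {spec['name']}")
        evolved.setdefault("dimensionFieldSpecs", []).append(dict(spec))
        existing.add(spec["name"])
    return evolved


# Hypothetical starting schema for clickstream-like events.
schema_v1 = {
    "schemaName": "clickstream",
    "dimensionFieldSpecs": [{"name": "userId", "dataType": "STRING"}],
    "dateTimeFieldSpecs": [
        {
            "name": "ts",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}

# New column with an explicit default for rows ingested before the change.
schema_v2 = add_columns(
    schema_v1,
    [{"name": "pageName", "dataType": "STRING", "defaultNullValue": "unknown"}],
)
print(json.dumps(schema_v2, indent=2))
```

Keeping the change strictly additive matches the constraint in the original question (columns are added, never removed).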

Mark Addleman

02/04/2022, 9:54 PM

Can the current consuming segment be force closed? Or, do we have to wait until it closes naturally?

Mayank

02/05/2022, 4:52 PM

It cannot be force closed, but you could restart the server so that it restarts consuming from the previous checkpoint.

Mark Addleman

02/05/2022, 4:53 PM

I see. I think this is the solution described in the GitHub issue. The issue indicates that it can take a long time for the server to reload the database after a restart.

Mayank

02/05/2022, 9:50 PM

It should be fast in general.

Mayank

02/05/2022, 9:51 PM

I do see quite a few PRs in the issue that were merged.

Mark Addleman

02/05/2022, 11:54 PM

Thanks. This gives me confidence to move forward with a POC.
