# troubleshooting
j
Hello again! Not a Pinot-only question, but I'm sure most of you have had to deal with this issue, so here I go: given a limited Kafka retention, how do you handle recreating a table with past data that is no longer available in Kafka? Basically, what is the "workflow" that you use to repopulate a Pinot table from past data?
m
Typically, folks ETL the Kafka data into a source-of-truth store like HDFS. You can then backfill via an offline pipeline. However, this pattern is only applicable to hybrid tables.
For realtime-only tables, you are going to be limited by Kafka retention
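A minimal sketch of that archive step, assuming a hypothetical JSON-valued topic named `events` and the kafka-python client; local files stand in for HDFS/S3 here:

```python
# Sketch: drain a Kafka topic into date-partitioned JSONL files that an
# offline Pinot ingestion job can later read for backfill.
# Topic name, broker address, and output paths are all assumptions.
import json
from datetime import datetime, timezone

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "events",                          # hypothetical topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=10_000,        # stop iterating once the topic is drained
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for record in consumer:
    # Partition by event day so each offline segment covers one day.
    day = datetime.fromtimestamp(record.timestamp / 1000, tz=timezone.utc)
    path = f"events-{day:%Y-%m-%d}.jsonl"  # in practice: HDFS/S3 via pyarrow/boto3
    with open(path, "a") as f:
        f.write(json.dumps(record.value) + "\n")
```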
j
> For realtime-only tables, you are going to be limited by Kafka retention
Probably not a smart solution, but dumping past data (from some object store) back into Kafka to feed the REALTIME table could be a solution too, I guess? Otherwise, does using a hybrid table add a lot of complexity / limitations?
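A sketch of that replay idea, using the same hypothetical `events` topic and archive files as above; one caveat is that replayed rows get fresh Kafka offsets and timestamps, so the table's time column has to come from the payload itself:

```python
# Sketch: push archived rows from the object store back into the topic
# feeding the REALTIME table. File glob and topic name are assumptions.
import glob
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for path in sorted(glob.glob("events-*.jsonl")):
    with open(path) as f:
        for line in f:
            producer.send("events", json.loads(line))  # hypothetical topic

producer.flush()  # make sure everything is actually written before exiting
```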
s
j
Thanks @Subbu Subramaniam, I'll give it a good read 🙂
k
Thanks @Subbu Subramaniam - I didn’t know about this built-in support for auto-offlining old data.
🤑 1
j
Hello again @Subbu Subramaniam ^^ Is this solution "enough" in the face of backward-incompatible table changes? Typically the case where we end up having to drop the whole table and "recreate" it (from something <- This is my question 😄)
m
Schema evolution does not support backward-incompatible changes (e.g., changing a column's data type), and that does require re-bootstrapping the table
➕ 1
j
Yes @Mayank That's why I'm wondering if https://docs.pinot.apache.org/operators/operating-pinot/pinot-managed-offline-flows "solves" that re-bootstrapping step (but I think it does not?)
m
It does not. The managed flow helps you avoid setting up an offline pipeline, but it does not help with backward-incompatible schema evolution
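For anyone finding this later: enabling the managed flow boils down to adding a `RealtimeToOfflineSegmentsTask` section to the realtime table config (and running Pinot minions). A rough sketch via the controller REST API; the addresses, table name, and periods are assumptions, so treat the linked docs as authoritative:

```python
# Sketch: add the RealtimeToOfflineSegmentsTask to an existing realtime
# table config through the controller API. Endpoint shapes are based on
# the standard Pinot controller REST API; verify against your version.
import requests

CONTROLLER = "http://localhost:9000"  # hypothetical controller address
TABLE = "events"                      # hypothetical table name

# Fetch the current config; the response is keyed by table type.
cfg = requests.get(
    f"{CONTROLLER}/tables/{TABLE}", params={"type": "realtime"}
).json()["REALTIME"]

cfg.setdefault("task", {}).setdefault("taskTypeConfigsMap", {})[
    "RealtimeToOfflineSegmentsTask"
] = {
    "bucketTimePeriod": "1d",  # each run moves one day of data to OFFLINE
    "bufferTimePeriod": "2d",  # stay this far behind the realtime head
}

resp = requests.put(f"{CONTROLLER}/tables/{TABLE}", json=cfg)
resp.raise_for_status()
```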
j
I see, thanks @Mayank 🙂