# troubleshooting
j
Hello again! Not a Pinot-only question, but I'm sure most of you have had to deal with this issue, so here I go: given a limited Kafka retention, how do you handle recreating a table with past data that is no longer available in Kafka? Basically, what is the "workflow" that you use to repopulate a Pinot table from past data?
m
Typically, folks ETL the Kafka data into a source-of-truth store like HDFS. You can then backfill via an offline pipeline. However, this pattern is only applicable to hybrid tables.
For realtime-only tables, you are going to be limited by Kafka retention
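A minimal sketch of that archive step, assuming a hypothetical JSON-valued topic named `events` and the kafka-python client; local files stand in for HDFS/S3 here:

```python
# Sketch: drain a Kafka topic into date-partitioned JSONL files that an
# offline Pinot ingestion job can later read for backfill.
# Topic name, broker address, and output paths are all assumptions.
import json
from datetime import datetime, timezone

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "events",                          # hypothetical topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=10_000,        # stop iterating once the topic is drained
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for record in consumer:
    # Partition by event day so each offline segment covers one day.
    day = datetime.fromtimestamp(record.timestamp / 1000, tz=timezone.utc)
    path = f"events-{day:%Y-%m-%d}.jsonl"  # in practice: HDFS/S3 via pyarrow/boto3
    with open(path, "a") as f:
        f.write(json.dumps(record.value) + "\n")
```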
j
> For realtime-only tables, you are going to be limited by Kafka retention
Probably not a smart solution, but dumping past data (from some object store) back into Kafka to feed the REALTIME table could be a solution too, I guess? Otherwise, does using a hybrid table add a lot of complexity / limitations?
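A sketch of that replay idea, using the same hypothetical `events` topic and archive files as above; one caveat is that replayed rows get fresh Kafka offsets and timestamps, so the table's time column has to come from the payload itself:

```python
# Sketch: push archived rows from the object store back into the topic
# feeding the REALTIME table. File glob and topic name are assumptions.
import glob
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for path in sorted(glob.glob("events-*.jsonl")):
    with open(path) as f:
        for line in f:
            producer.send("events", json.loads(line))  # hypothetical topic

producer.flush()  # make sure everything is actually written before exiting
```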
s
j
Thanks @Subbu Subramaniam, I'll give it a good read 🙂
k
Thanks @Subbu Subramaniam - I didn’t know about this built-in support for auto-offlining old data.
🤑 1
j
Hello again @Subbu Subramaniam ^^ Is this solution "enough" in the face of backward-incompatible table changes? Typically the case where we end up having to drop the whole table and "recreate" it (from something <- This is my question 😄)
m
Schema evolution does not support backward-incompatible changes (e.g., changing a column's data type), and that does require re-bootstrapping the table
➕ 1
j
Yes @Mayank That's why I'm wondering if https://docs.pinot.apache.org/operators/operating-pinot/pinot-managed-offline-flows "solves" that re-bootstrapping step (but I think it does not?)
m
It does not. The managed flow helps you avoid setting up an offline pipeline, but it does not help with backward-incompatible schema evolution
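For anyone finding this later: enabling the managed flow boils down to adding a `RealtimeToOfflineSegmentsTask` section to the realtime table config (and running Pinot minions). A rough sketch via the controller REST API; the addresses, table name, and periods are assumptions, so treat the linked docs as authoritative:

```python
# Sketch: add the RealtimeToOfflineSegmentsTask to an existing realtime
# table config through the controller API. Endpoint shapes are based on
# the standard Pinot controller REST API; verify against your version.
import requests

CONTROLLER = "http://localhost:9000"  # hypothetical controller address
TABLE = "events"                      # hypothetical table name

# Fetch the current config; the response is keyed by table type.
cfg = requests.get(
    f"{CONTROLLER}/tables/{TABLE}", params={"type": "realtime"}
).json()["REALTIME"]

cfg.setdefault("task", {}).setdefault("taskTypeConfigsMap", {})[
    "RealtimeToOfflineSegmentsTask"
] = {
    "bucketTimePeriod": "1d",  # each run moves one day of data to OFFLINE
    "bufferTimePeriod": "2d",  # stay this far behind the realtime head
}

resp = requests.put(f"{CONTROLLER}/tables/{TABLE}", json=cfg)
resp.raise_for_status()
```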
j
I see, thanks @Mayank 🙂