# general
b
Curious: Are there any plans of introducing per doc TTL?
m
Not at the moment. What's the use case?
b
Within a table, we have some events of interest (they have a special value in a column, say) that we want to retain for a much longer duration. Probably 1% of the total data needs long-term retention.
Obviously, using a separate table is another approach, so this isn’t a blocker, but that forces us to introduce additional jobs and workflows in the layers above. Per-doc TTL is the best way to leverage the store. HBase, as an example, has this concept.
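For context, the HBase analogue is a per-mutation TTL. A minimal sketch of what we do there today (row key, column family, and qualifier names below are made up):
```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PerCellTtlExample {
  // HBase lets each mutation carry its own TTL, so the 99% of writes can use
  // the table default while the 1% "interesting" events get a longer lifetime.
  static Put longLivedEvent() {
    Put put = new Put(Bytes.toBytes("event-123"));            // row key (made up)
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
        Bytes.toBytes("..."));                                // family/qualifier (made up)
    put.setTTL(365L * 24 * 3600 * 1000);                      // keep ~1 year, in ms
    return put;
  }
}
```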
m
Perhaps a minion job can help with that
Set up a job that purges records that are no longer of interest
You won't need a separate data pipeline in that case.
So we do have per-doc TTL of sorts 🤔
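Roughly something like this in the table config (a sketch from memory; the task name, keys, and cron support may vary by Pinot version, and the actual purge predicate is a custom plugin on the minion side):
```json
{
  "tableName": "events_OFFLINE",
  "task": {
    "taskTypeConfigsMap": {
      "PurgeTask": {
        "schedule": "0 0 2 * * ?"
      }
    }
  }
}
```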
b
So, the purger rewrites the segments after deleting the unwanted records?
m
Yep
b
Is that done in an atomic way so that the query results aren’t messed up?
m
Think of it as a segment refresh
The purger will download segments, purge records, and upload the new segments.
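The per-record decision is pluggable. A minimal sketch, assuming the RecordPurger hook from Pinot's SegmentPurger (import paths differ across Pinot versions, and the column names here are invented):
```java
import org.apache.pinot.core.minion.SegmentPurger;
import org.apache.pinot.spi.data.readers.GenericRow;

// Sketch of a purge predicate: keep flagged events forever, expire the rest
// after 30 days. Column names ("eventType", "timestampMs") are made up.
public class ShortTtlRecordPurger implements SegmentPurger.RecordPurger {
  private static final long RETENTION_MS = 30L * 24 * 3600 * 1000;

  @Override
  public boolean shouldPurge(GenericRow row) {
    boolean keepForever = "IMPORTANT".equals(row.getValue("eventType"));
    long eventTimeMs = (long) row.getValue("timestampMs");
    return !keepForever
        && eventTimeMs < System.currentTimeMillis() - RETENTION_MS;
  }
}
```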
b
Yah. I’m trying to understand how the old segment is replaced with the new one. Is it by switching the segment link in ZK, or is there some segment-replace API that’s used?
m
Upload API
k
@Buchi Reddy segment name is the unique identifier
if you upload a segment with the same name, it gets replaced
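For what it's worth, the upload can be driven from the admin tool, something like this (flags from memory; check your version's help output):
```sh
bin/pinot-admin.sh UploadSegment \
  -controllerHost localhost \
  -controllerPort 9000 \
  -segmentDir /path/to/rewritten/segments
```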
b
Cool. thanks
So, how stable are the minions and the framework around them? Are people running them in prod? The documentation for Minion doesn’t include any startup commands.
n
LinkedIn uses it in prod to purge records from segments based on GDPR requirements
b
Cool. Good to know that.