# general
b
Curious: Are there any plans of introducing per doc TTL?
m
Not at the moment. What's the use case?
b
Within a table, we have some events of interest (they have a special value in a column, say) that we want to retain for a much longer duration. Probably 1% of the total data needs long-term retention.
Obviously, using a separate table is another approach, so this isn’t a blocker, but that forces us to introduce additional jobs and workflows in the layers above. Per-doc TTL is the best way to leverage the store. HBase, as an example, has this concept.
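For context, the HBase analogue is a per-mutation TTL. A minimal sketch of what we do there today (row key, column family, and qualifier names below are made up):
```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PerCellTtlExample {
  // HBase lets each mutation carry its own TTL, so the 99% of writes can use
  // the table default while the 1% "interesting" events get a longer lifetime.
  static Put longLivedEvent() {
    Put put = new Put(Bytes.toBytes("event-123"));            // row key (made up)
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
        Bytes.toBytes("..."));                                // family/qualifier (made up)
    put.setTTL(365L * 24 * 3600 * 1000);                      // keep ~1 year, in ms
    return put;
  }
}
```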
m
Perhaps a minion job can help with that
Set up a job that purges records that are no longer of interest
You won't need a separate data pipeline in that case.
So we do have per-doc TTL of sorts 🤔
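Roughly something like this in the table config (a sketch from memory; the task name, keys, and cron support may vary by Pinot version, and the actual purge predicate is a custom plugin on the minion side):
```json
{
  "tableName": "events_OFFLINE",
  "task": {
    "taskTypeConfigsMap": {
      "PurgeTask": {
        "schedule": "0 0 2 * * ?"
      }
    }
  }
}
```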
b
So, the purger rewrites the segments after deleting the unwanted records?
m
Yep
b
Is that done in an atomic way so that the query results aren’t messed up?
m
Think of it as a segment refresh
The purger will download segments, purge records, and upload the new segments.
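The per-record decision is pluggable. A minimal sketch, assuming the RecordPurger hook from Pinot's SegmentPurger (import paths differ across Pinot versions, and the column names here are invented):
```java
import org.apache.pinot.core.minion.SegmentPurger;
import org.apache.pinot.spi.data.readers.GenericRow;

// Sketch of a purge predicate: keep flagged events forever, expire the rest
// after 30 days. Column names ("eventType", "timestampMs") are made up.
public class ShortTtlRecordPurger implements SegmentPurger.RecordPurger {
  private static final long RETENTION_MS = 30L * 24 * 3600 * 1000;

  @Override
  public boolean shouldPurge(GenericRow row) {
    boolean keepForever = "IMPORTANT".equals(row.getValue("eventType"));
    long eventTimeMs = (long) row.getValue("timestampMs");
    return !keepForever
        && eventTimeMs < System.currentTimeMillis() - RETENTION_MS;
  }
}
```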
b
Yah. I’m trying to understand how the old segment is replaced with the new one. Is it by switching the segment link in ZK, or is there some segment-replace API that’s used?
m
Upload API
k
@Buchi Reddy segment name is the unique identifier
if you upload a segment with the same name, it gets replaced
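For what it's worth, the upload can be driven from the admin tool, something like this (flags from memory; check your version's help output):
```sh
bin/pinot-admin.sh UploadSegment \
  -controllerHost localhost \
  -controllerPort 9000 \
  -segmentDir /path/to/rewritten/segments
```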
b
Cool. thanks
So, how stable are the minions and the framework around them? Are people running them in prod? The documentation for Minion doesn’t include any startup commands.
n
LinkedIn uses it in prod to purge records from segments based on GDPR requirements
b
Cool. Good to know that.