# getting-started
m
Hi everyone, I'm facing an issue with segment retention. I used the RealtimeTable.json example from the Pinot 0.10.0 docs with "retentionTimeUnit": "DAYS", "retentionTimeValue": "5", but after 5 days the segments are still there in the controller data path and in the server data path. Did I miss something in my table JSON?
m
Retention is a periodic job that runs every few hours in the background. Also, a segment is deleted only when all of its records are older than the retention period. And as a good practice, you should always use an explicit time filter in your queries rather than relying on retention to serve data for a particular period.
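For reference, a minimal sketch of where retention is configured in a real-time table config (table name and time column here are placeholders, not from the thread):

```json
{
  "tableName": "myTable_REALTIME",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "5"
  }
}
```

Retention applies per segment: the whole segment is removed only once every record's `ts` value is older than 5 days.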
m
Hi, if a segment is deleted only when all of its records are older than the retention period, how does retention work on real-time tables, since new data arrives continuously?
m
Segments are periodically sealed and committed in the background for real-time tables as well. Sealed segments are deleted as and when they become older than the retention period.
m
Hi, thank you for the reply. Can you share an example table JSON, if possible, where segments get deleted from local/HDFS after the retention time?
m
Any sample config on the docs page will have a retention time and unit.
m
Hi, to simplify our discussion, I've attached my table JSON. The table has almost 30 million records, and this is where I'm facing the retention issue. As I understand it, retention means that when I query the table, I should only get results within the retention window, because data older than that will have been deleted. Is there any gap in my understanding?
m
The retention in your config is 2 days. So if the data you pushed has timestamp values (for the time column you specified) older than 2 days, the data will be deleted the next time the retention task kicks in.
m
Can we trigger the retention task manually? My data is not getting deleted.
m
If it is not getting deleted, that means there are records in the segment that are not old enough.
Can you query the min and max time?
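The check suggested above can be done with a simple aggregation over the table's time column (`ts` and `myTable` here are placeholders for the actual timeColumnName and table name):

```sql
SELECT MIN(ts), MAX(ts) FROM myTable
```

If MIN(ts) is within the retention window, no segment is fully expired yet, which would explain why nothing has been deleted.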
m
Sure, I'll check and let you know!
Hi, I got your point, but when will the segment files be deleted from the physical layer (controller path and server index path)?
Also, one thing I noticed: there are no tar/zip files in the server/segmentTar folder, although my table now contains more than 40 million records. When are the tar/zip files created?
m
The tar/zip files are on the deep store, not on the server. Segments are periodically flushed to disk based on whatever settings you specified in the table config (e.g., the flush threshold for rows).
m
Here I've set "realtime.segment.flush.threshold.rows": "0" — is that why it's not getting deleted from disk?
m
There are a few settings; I think you have probably set either a time or a size threshold.
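A sketch of how the flush-related settings sit together in the stream config (the values shown are illustrative assumptions, not from the thread; property names follow the table configuration reference linked below):

```json
{
  "tableIndexConfig": {
    "streamConfigs": {
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.threshold.segment.size": "200M"
    }
  }
}
```

With rows set to 0, the size threshold takes effect, and the time threshold caps how long a consuming segment can stay open before being committed regardless of size.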
m
I am following this doc: https://docs.pinot.apache.org/configuration-reference/table, but I'm unable to find any such setting except realtime.segment.flush.threshold.time.
And I've set flush.threshold.rows to '0', as mentioned in the same doc: "Desired size of the completed segments. This value can be set as a human readable string such as 150M, or 1.1G, etc. This value is used when realtime.segment.flush.threshold.rows is set to 0. Default is 200M i.e. 200 MegaBytes"
m
Yeah, so when a segment reaches around that size it will be flushed. You probably also have a time threshold set.
m
So, say I've set a 50M segment size (which may contain about 1 million records) and set realtime.segment.flush.threshold.time to 1d. Then once a segment reaches 50M a new segment is created, and the previous segment stays on disk for 1d and then gets deleted. Please correct me if I'm wrong!
m
In your setting above, 1d is the threshold for flushing the in-memory segment to disk. That is different from retention. If you want data deleted after X days, then set the retention value and unit to that; those are different configs.
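To make the distinction concrete, a sketch contrasting the two independent settings in one table config (values are illustrative assumptions): the flush time controls when a consuming segment is committed to disk, while the retention settings control when old, committed segments are deleted.

```json
{
  "segmentsConfig": {
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "5"
  },
  "tableIndexConfig": {
    "streamConfigs": {
      "realtime.segment.flush.threshold.time": "1d"
    }
  }
}
```

Here a segment is committed after at most 1 day of consuming, and then deleted by the retention task only once all of its records are older than 5 days.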
👍 1
m
Thank you @Mayank for clearing up my questions. As I'm a newbie at this, I sometimes ask silly questions 😄
👍 1
m
No worries