Hi team, I’ve a question about RealtimeToOfflineSe...
# troubleshooting
a
Hi team, I have a question about RealtimeToOfflineSegmentsTask. If I configure this task like this: If I’m not misunderstanding the properties, it means that every time this task is executed, 1 hour of data older than 24h will be moved from the realtime table to the offline table. If that’s the case, when I keep bucketTimePeriod the same and change the task to be executed every 2 hours, will a longer and longer span of data in the realtime table not be moved to the offline table, given that stream data constantly comes into this realtime table?
m
Bucket time is the min-max time range for a segment. Buffer time is the look-back window used to search for records in the bucket interval
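To make the bucket/buffer relationship concrete, here is a minimal sketch of how one run might pick its time window. It assumes a run processes exactly one bucket starting at a stored watermark, and only when that whole bucket is older than the buffer. The function name `next_window` and the `watermark` argument are hypothetical and are not taken from Pinot's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_window(
    watermark: datetime,
    bucket: timedelta,
    buffer: timedelta,
    now: datetime,
) -> Optional[Tuple[datetime, datetime]]:
    """Return the [start, end) window one run would process, or None.

    Assumption: a run handles exactly one bucket starting at the watermark,
    and only if the whole bucket is older than the buffer.
    """
    window_start = watermark
    window_end = watermark + bucket
    if window_end > now - buffer:
        return None  # bucket is still inside the buffer, nothing to do yet
    return window_start, window_end


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
old_watermark = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
recent_watermark = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)

# Bucket of 1h, buffer of 24h, as in the question above.
print(next_window(old_watermark, timedelta(hours=1), timedelta(hours=24), now))
print(next_window(recent_watermark, timedelta(hours=1), timedelta(hours=24), now))
```

With these inputs the first call yields the 10:00 to 11:00 window, and the second yields None because its bucket would end inside the 24h buffer.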
a
So no matter how often the task is executed, only 24h of data is kept in the realtime table?
m
ah you already found it
but the data isn't actually removed from the real-time table as part of this job. It's only written to the offline table. There is then a separate retention job that deletes data from the real-time table
a
I think I’m still not very clear on what exactly “at a time” or “for each run” means.
m
you can schedule it to run (i.e. schedule) but you can also run it manually if you want
so every time the task runs is 'a run'
a
If so, when bucketTimePeriod is set to 1h and the task runs every 2 hours, more and more data will be kept in the realtime table.
m
AFAIK, this task doesn't delete data from the real-time table
it only creates it in the offline table
but @Mayank can confirm on that
a
I think I see what you mean. Data older than 25 hours will be deleted by the retention task: "retentionTimeUnit": "HOURS", "retentionTimeValue": "25"
m
yeah. But it is a good question: should the scheduling frequency and bucket time be the same?
l
I can confirm that, yes, it moves the data but it doesn’t delete it from RT
as you have pointed out the retention setup is what will remove it from RT
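On the open question of whether the scheduling frequency and bucket time should match, a rough simulation can help. This is a sketch under stated assumptions: each run copies at most one bucket, only buckets fully older than the buffer are eligible, and time is counted in whole hours. The function `simulate_backlog` and its parameters are hypothetical. Whether a real run can catch up on several windows at once should be checked against the Pinot documentation.

```python
from typing import List


def simulate_backlog(
    run_every_h: int, bucket_h: int, buffer_h: int, total_h: int
) -> List[int]:
    """Return the eligible-but-uncopied hours observed right after each run.

    Assumption: each run copies at most one bucket to the offline table,
    and only a bucket that lies entirely before (now - buffer).
    """
    watermark = 0  # hour up to which data has been copied to offline
    backlog = []
    for now in range(run_every_h, total_h + 1, run_every_h):
        eligible_end = now - buffer_h
        if watermark + bucket_h <= eligible_end:
            watermark += bucket_h  # one bucket copied in this run
        backlog.append(max(0, eligible_end - watermark))
    return backlog


# Bucket 1h, buffer 24h, 48 hours of continuous ingestion.
print(simulate_backlog(run_every_h=1, bucket_h=1, buffer_h=24, total_h=48))
print(simulate_backlog(run_every_h=2, bucket_h=1, buffer_h=24, total_h=48))
```

Under these assumptions the hourly schedule keeps the backlog at 0, while the 2-hour schedule ends the 48 hours with 12 eligible hours not yet copied. If that holds for the real task, the offline table would fall further behind over time, and the retention setup would still remove data from RT on its own schedule.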