# troubleshooting
s
Another question on managed offline flows. We're getting exactly the results we want in our OFFLINE table, with the data nicely rolled up. What we're seeing, though, is that the REALTIME segment is not getting destroyed. The logs say `Trying to destroy segment : immutable_events__0__0__20220620T1729Z`, and I can't find any indication that anything failed, but the REALTIME segment is still there.
n
Realtime table segments don't get deleted based on when they've been moved to the offline table. They follow their own `retentionTimeValue`, set in the realtime table config.
Are you seeing them stick around even after the retention in the realtime table has passed?
s
Well, maybe I'm confused about the realtime-to-offline flows then, because my assumption was that it would roll up the data into the offline table and then delete the realtime data, which is what the logs seem to be saying it's trying to do.
Like, this makes perfect sense to me: `Trying to destroy segment : immutable_events__0__0__20220620T1729Z`
I would expect that once the data is rolled up and in the offline table, it would then destroy the realtime segments. But those segments are not getting destroyed,
and I see no errors in the logs (I'm looking at the minion logs).
n
> it would roll up the data into the offline table and then delete the realtime data

This doesn't happen. What's the retention set in your realtime table?
s
`"retentionTimeUnit": "DAYS", "retentionTimeValue": "7"`
Both the realtime and offline table configs for my table have the same retention of 7 days.
n
And are the startTime and endTime in the segment ZK metadata for `immutable_events__0__0__20220620T1729Z` older than 7 days?
s
No, it's not older than 7 days.
n
s
OK, I'm kind of understanding now. If we want a realtime table with, say, 2d of data, we retain that, and we configure the offline flows job to move anything older than 2d to offline and roll it up?
n
Yup. Typically R2O has bucket "1d" (or "1h" if you need hourly), buffer "1d", and then you can set the realtime table's retention to "3d" (some headroom beyond the buffer in case there are issues catching up; start with an even more generous number until you're sure this is working out).
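For reference, a minimal sketch of how that advice might look in the REALTIME table config. It assumes the Pinot `RealtimeToOfflineSegmentsTask` keys `bucketTimePeriod`, `bufferTimePeriod`, and `mergeType` under `task.taskTypeConfigsMap` (verify them against the Pinot docs for your version). This is a fragment, not a complete table config: the schema, stream config, and the per-metric aggregation settings a rollup needs are left out, and the table name is taken from the segment name above.

```json
{
  "tableName": "immutable_events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "3"
  },
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "1d",
        "bufferTimePeriod": "1d",
        "mergeType": "rollup"
      }
    }
  }
}
```

With these values, the task should only move a 1-day bucket once it is at least 1 day old, while the realtime segments holding that data stay around until the separate 3-day realtime retention expires.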
s
OK, thanks, I'll try that and see!
What about overlaps, though? During the period where the data is in both the realtime and offline tables? It did seem Pinot was smart enough to query the realtime segment when querying the table.
OK, so it's now coming together in my brain. The R2O process is nothing more than an offline segment builder (rollups or dedup) that orders the data nicely in those segments for later use. Which segments Pinot actually queries, though, is based on retention configurations.
n