So, in total would I need both `RealtimeToOfflineS...
# general
s
So, in total would I need both `RealtimeToOfflineSegmentsTask` and `MergeRollupTask`?
n
this config
        "bucketTimePeriod": "1h",
        "bufferTimePeriod": "2h",
        "roundBucketTimePeriod": "1m",
will create segments containing 1 hour of data per segment. It won’t really be an hourly rollup because the `roundBucketTimePeriod` is `1m`. So from the data rows perspective, it will be a minute-level rollup
to additionally do a daily segment rollup, you would have to set up `MergeRollupTask`
that config will create both hourly and daily segments. there’s a periodic task, which will clean up any hourly segments that have been merged into a daily segment
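To make the difference between `bucketTimePeriod` and `roundBucketTimePeriod` concrete, here is a minimal Python sketch. It is not Pinot's implementation: the `rollup` and `floor_to` helpers, the `country` dimension, and the sample rows are all hypothetical. It only illustrates the idea described above, that the bucket decides which rows land in the same window while the rounding period decides which rows collapse into one.

```python
from collections import defaultdict
from datetime import datetime, timezone

HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


def floor_to(ts_ms: int, period_ms: int) -> int:
    """Round a millisecond timestamp down to a multiple of period_ms."""
    return ts_ms - (ts_ms % period_ms)


def rollup(rows, bucket_ms, round_ms):
    """Hypothetical helper: group rows into windows of bucket_ms, then sum
    revenue for rows that share the same rounded time and dimension value.

    Returns {window_start: {(rounded_ts, country): summed_revenue}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts_ms, country, revenue in rows:
        window_start = floor_to(ts_ms, bucket_ms)  # plays the role of bucketTimePeriod
        rounded_ts = floor_to(ts_ms, round_ms)     # plays the role of roundBucketTimePeriod
        windows[window_start][(rounded_ts, country)] += revenue
    return {start: dict(merged) for start, merged in windows.items()}


def ms(hour: int, minute: int, second: int) -> int:
    """Epoch milliseconds for a time of day on 2024-01-01 (UTC)."""
    dt = datetime(2024, 1, 1, hour, minute, second, tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


# Made-up sample rows: (timestamp, country, revenue)
rows = [
    (ms(10, 0, 5), "US", 10),
    (ms(10, 0, 40), "US", 5),
    (ms(10, 1, 10), "US", 7),
    (ms(11, 30, 0), "US", 3),
]

# 1h bucket with 1m rounding: the 10:00 window keeps two rows (10:00 and 10:01).
minute_level = rollup(rows, HOUR_MS, MINUTE_MS)

# 1h bucket with 1h rounding: the 10:00 window collapses to a single row.
hour_level = rollup(rows, HOUR_MS, HOUR_MS)
```

With `1m` rounding the 10:00 window still holds one row per minute, which is why the first config is a minute-level rollup even though each window spans an hour.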
s
So, in order to have an hourly rollup and a daily rollup, the idea is to have:
"RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "1h",
        "bufferTimePeriod": "2h",
        "roundBucketTimePeriod": "1h",
        "mergeType": "rollup",
        "revenue.aggregationType": "sum",
        "maxNumRecordsPerSegment": "100000"
      }
and
"MergeRollupTask": {
        "1day.mergeType": "rollup",
        "1day.bucketTimePeriod": "1d",
        "1day.bufferTimePeriod": "1d",
        "1day.roundBucketTimePeriod": "1d",
        "1day.maxNumRecordsPerSegment": "1000000",
        "1day.maxNumRecordsPerTask": "5000000",
        "metricColA.aggregationType": "sum",
        "metricColB.aggregationType": "max"
      }
So we do the hourly segment rollup through `RealtimeToOfflineSegmentsTask` and the daily rollup through `MergeRollupTask`. Is that it?
n
yes this sounds right (assuming that when you say rollup, you want the time column to get rolled up to that granularity)
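As a rough illustration of the second stage, the sketch below takes rows that are already at hourly granularity and rolls them up to daily granularity with a per-metric aggregation, mirroring the `metricColA` (sum) and `metricColB` (max) entries in the `MergeRollupTask` config above. The `merge_rollup` helper, the `dim` column, and the sample values are assumptions for illustration only, not Pinot code.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS

# Aggregation functions keyed by the names used in the config above.
AGGREGATORS = {
    "sum": lambda a, b: a + b,
    "max": max,
}


def merge_rollup(rows, round_ms, aggregation_types):
    """Hypothetical helper: round each row's time down to round_ms and merge
    rows that share the same rounded time and dimension value."""
    merged = {}
    for row in rows:
        key = (row["ts"] - row["ts"] % round_ms, row["dim"])
        if key not in merged:
            merged[key] = {metric: row[metric] for metric in aggregation_types}
            continue
        for metric, agg in aggregation_types.items():
            merged[key][metric] = AGGREGATORS[agg](merged[key][metric], row[metric])
    return merged


# Made-up hourly rows, e.g. what the first stage might have produced.
hourly_rows = [
    {"ts": 0 * HOUR_MS, "dim": "a", "metricColA": 1, "metricColB": 5},
    {"ts": 1 * HOUR_MS, "dim": "a", "metricColA": 2, "metricColB": 3},
    {"ts": 25 * HOUR_MS, "dim": "a", "metricColA": 4, "metricColB": 9},
]

daily = merge_rollup(
    hourly_rows,
    DAY_MS,
    {"metricColA": "sum", "metricColB": "max"},
)
```

The first two hourly rows fall on the same day and merge into one daily row; the third lands on the next day and stays separate.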
s
Thank you 🙏. Yes, I want the data to be rolled up to the time granularity windows.
d
@Neha Pawar If 10 day old data is ingested by a realtime table with a retention period of 5 days, will this data be handled by the realtime to offline task? Or will the data be cleaned up before the data can be moved to an offline table?
n
it will likely be cleaned up. The retention cleanup and realtimeToOffline are independent of each other. Retention cleanup happens every 6 hours, r2o happens as frequently as you’ve configured. so you can’t tell in advance whether the data will be moved or deleted, and it’s not something i would rely on.
d
Ok. In our situation we want to ingest 90 days of historical data to our realtime table and have this data be converted to offline segments. But we also want the realtime table to have a retention period of 1 day. To achieve this, would you advise to first create our realtime table without a retention period and once the data has been churned through and moved to offline segments we can add the 1 day retention value?
n
yes ^ . also depending on the frequency of the r2o job, you may want the retention to be slightly longer (if frequency is 1d, then at least 3d retention. if frequency is 1h, then 1 day retention is fine)
d
Alright, that makes sense. Additional question: if data for a time period is queried and both realtime and offline segments exist that cover this time period, which segments are used for the query? Do offline or realtime segments take precedence?