# troubleshooting
a
Hi team, I have a question about the r2o (RealtimeToOfflineSegments) task. To make my question clear, here is my table config. As I read it, this r2o task runs every 2 hours; each run takes all realtime segments that contain rows with timestamps earlier than the current time minus 6h, and creates new offline segments. Each new segment covers 1h of data and has no more than 2,000,000 rows, and the hour is split into more segments if its row count exceeds 2,000,000. Could you help me confirm whether I'm understanding this task config correctly? Thanks.
"taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "1h",
        "bufferTimePeriod": "6h",
        "schedule": "0 0 0/2 * * ?",
        "roundBucketTimePeriod": "1m",
        "mergeType": "rollup",
        "value.aggregationType": "max",
        "maxNumRecordsPerSegment": "2000000"
      }
    }
m
Yes, correct
a
If 1h of data is split into 3 segments, I would expect these 3 segments to have the same start and end timestamps and differ only by the sequence number in the segment name. According to my test, most segments follow this pattern, but some don't. Is it designed to be so?
l
> 3 segments should have the same start timestamp and end timestamp
Not necessarily. The start time and end time in the segment name are determined by the timestamps of the first and last events in that segment.
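For example (this is just an illustrative sketch, not Pinot's actual naming code), two segments produced from the same 1h bucket can carry different start/end times in their names, because each name reflects the min/max event timestamps present in that particular segment:

```python
# Hypothetical sketch of the naming rule described above: the start/end times in a
# segment name come from the min/max event timestamps inside that segment, not from
# the 1h bucket boundaries. The name format here is only illustrative.
def segment_name(table: str, event_ts_millis: list, seq: int) -> str:
    return f"{table}_{min(event_ts_millis)}_{max(event_ts_millis)}_{seq}"

# Two segments produced from the same 08:00-09:00 UTC bucket:
print(segment_name("myTable", [1704096000123, 1704097500456], 0))
# -> myTable_1704096000123_1704097500456_0
print(segment_name("myTable", [1704097500789, 1704099599999], 1))
# -> myTable_1704097500789_1704099599999_1   (different start/end than seq 0)
```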
a
Ok. It's just that I think it makes backfilling a little harder.