Hi folks. I have two questions? 1. Why is Realtime...
# troubleshooting
k
Hi folks. I have two questions. 1. Why is RealtimeToOfflineSegmentsTask executed in a single thread? It times out easily when there is a large amount of data. Are there any restrictions? 2. Is there an API for converting a segment into records (GenericRow) directly from S3?
m
@Xiaobing ^^. For 2, do you mean you want to convert a Pinot segment into another format? There’s a CLI for that (e.g. PinotSegmentToAvroConverter).
k
It is not about converting to other formats. When we process data with RealtimeToOfflineSegmentsTask, we often time out. So, as an alternative, we thought of reading the segment tar file directly from S3 and processing it ourselves. For that, the segment needs to be parsed into records, and the fields in each record need to be processed, for example to do aggregation.
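A minimal sketch of the aggregation step described above, assuming the segment has already been downloaded from S3 and decoded into rows. Plain Python dicts stand in for Pinot's GenericRow here, and `aggregate_rows`, the column names, and the sample data are all hypothetical; none of this is a Pinot API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def aggregate_rows(
    rows: Iterable[dict],
    dimensions: Sequence[str],
    metrics: Sequence[str],
) -> List[dict]:
    """Group rows by their dimension values and sum each metric column."""
    # Hypothetical helper: each dict stands in for one decoded record.
    totals: Dict[Tuple, Dict[str, float]] = defaultdict(
        lambda: {m: 0 for m in metrics}
    )
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for m in metrics:
            totals[key][m] += row[m]
    # Rebuild one output row per distinct dimension combination.
    return [
        {**dict(zip(dimensions, key)), **sums}
        for key, sums in totals.items()
    ]


# Made-up sample rows, for illustration only.
sample_rows = [
    {"country": "US", "device": "ios", "clicks": 3},
    {"country": "US", "device": "ios", "clicks": 2},
    {"country": "DE", "device": "web", "clicks": 7},
]

aggregated = aggregate_rows(sample_rows, ["country", "device"], ["clicks"])
```

The two US/ios rows collapse into one row with their clicks summed, while the DE/web row passes through unchanged.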
m
I’d rather vote for fixing RT2OFF
k
That would of course be good, though it will take some time.
x
Re 1: we have an in-house parallel version of RT2OFF to overcome that single-thread limitation (FYI, docs about it). Re 2: we use SegmentProcessorFramework (a utility in the OSS repo) to convert segments into GenericRows, which are then used to generate new segments.
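To illustrate the general idea of getting past a single-thread limit, here is a minimal sketch that splits a time range into fixed-size buckets and processes them concurrently. It is not the in-house implementation mentioned above, which is not shown in this thread. `split_into_windows`, `process_window`, `process_in_parallel`, and the timestamps are hypothetical, and `process_window` only returns a placeholder string where real per-window segment processing would go.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Window = Tuple[int, int]  # half-open range [start_ms, end_ms)


def split_into_windows(start_ms: int, end_ms: int, bucket_ms: int) -> List[Window]:
    """Cut [start_ms, end_ms) into consecutive buckets of at most bucket_ms."""
    if bucket_ms <= 0:
        raise ValueError("bucket_ms must be positive")
    return [
        (s, min(s + bucket_ms, end_ms))
        for s in range(start_ms, end_ms, bucket_ms)
    ]


def process_window(window: Window) -> str:
    """Hypothetical placeholder for converting one window's data."""
    start_ms, end_ms = window
    return f"processed [{start_ms}, {end_ms})"


def process_in_parallel(windows: List[Window], max_workers: int = 4) -> List[str]:
    """Run process_window over all windows on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_window, windows))


HOUR_MS = 60 * 60 * 1000
windows = split_into_windows(0, 3 * HOUR_MS, HOUR_MS)
results = process_in_parallel(windows)
```

Because the windows do not overlap, each one can be handled independently, which is what makes this kind of fan-out possible in the first place.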
k
@Xiaobing Excuse me, is there a release schedule for the internal parallel version of RT2OFF?