Rather than converting the segments to another format, we process them with RealtimeToOfflineSegmentsTask, but the task often times out. As an alternative, we are thinking of reading the segment tar files directly from S3 and processing them ourselves. To do that, each segment needs to be parsed into records, and the fields in those records then need to be processed, for example to run an aggregation.
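To make the idea concrete, here is a minimal sketch of the two parts that do not depend on Pinot internals: opening a segment tarball that is already in memory, and aggregating a field over records. Everything in it is an assumption for illustration. The function names `list_segment_files` and `sum_by_key` are hypothetical, the records are plain dicts, and the step that decodes the segment files into records is not shown, since it would need Pinot's own segment reader.

```python
import io
import tarfile
from collections import defaultdict
from typing import Dict, Iterable, List


def list_segment_files(tar_bytes: bytes) -> List[str]:
    """Return the names of the regular files inside a segment tarball.

    `tar_bytes` is assumed to be the raw object body downloaded from S3.
    Mode "r:*" lets tarfile detect gzip compression on its own.
    """
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


def sum_by_key(
    records: Iterable[Dict[str, object]], key_field: str, value_field: str
) -> Dict[object, int]:
    """Sum `value_field` per distinct `key_field` value.

    `records` stands in for whatever the segment decoding step would
    produce; here each record is just a dict of column name to value.
    """
    totals: Dict[object, int] = defaultdict(int)
    for record in records:
        totals[record[key_field]] += record[value_field]
    return dict(totals)
```

In a real pipeline the bytes would come from an S3 client and the records from a segment reader, so only the aggregation logic carries over as written. Streaming the records through a generator rather than building a list would keep memory use flat for large segments.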