Hi, I'm trying to batch ingest a lot of data stored in ORC files. What is the recommended way of doing this? I'm currently using the SegmentCreationAndMetadataPush job with the command line interface.
08/16/2021, 3:24 PM
That's a good way to get started. In prod, you would use Spark to set up these jobs.
08/16/2021, 3:26 PM
Thanks! Also, is there a way to configure segment generation with batch ingest? For example, is it possible to pass in 1 ORC file and have it create N segments, or segments of a specific size?
08/16/2021, 3:33 PM
Not as of now. Right now it's one input file -> one Pinot segment.
There is a segment processor framework (WIP) that can allow you to do some of these things.
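Since one input file currently maps to one segment, one workaround is to split the input into several files before running the job. Here is a rough sketch of how you might plan that split. Everything in it is an assumption for illustration: `plan_splits` is a made-up helper (not part of Pinot), the bytes-per-row estimate is something you would measure yourself, and the 300 MB default is just a midpoint of the 100-500MB range mentioned in the FAQ. Input size is also only a proxy, since the final segment size depends on encoding and indexes.

```python
import math


def plan_splits(total_rows, est_bytes_per_row, target_segment_bytes=300 * 1024**2):
    """Plan row ranges for splitting one input file into several files.

    Hypothetical helper, not a Pinot API. Returns a list of half-open
    (start_row, end_row) ranges, one per output file, sized as evenly
    as possible so the resulting segments come out roughly equal.
    """
    total_bytes = total_rows * est_bytes_per_row
    # Number of output files needed to stay near the target size.
    n = math.ceil(total_bytes / target_segment_bytes)
    # Never plan more files than rows, and always plan at least one.
    n = max(1, min(n, total_rows))

    # Spread the remainder over the first `extra` ranges.
    base, extra = divmod(total_rows, n)
    ranges = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Assumed example: 1M rows at roughly 1 KiB per row.
    for start, end in plan_splits(1_000_000, 1024):
        print(f"rows [{start}, {end})")
```

Each range would then be written out as its own ORC file with whatever tool you already use, and the ingestion job pointed at the directory containing those files.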
08/16/2021, 3:40 PM
OK, got it. How important are segment sizes in Pinot? I saw on the FAQ that the recommended size is 100-500MB. Should I try to make it so that all the segments are roughly the same size?