#getting-started

Tiger Zhao

08/16/2021, 3:19 PM
Hi, I'm trying to batch ingest a lot of data in some ORC files, what is the recommended way of doing this? I'm currently using the SegmentCreationAndMetadataPush job with the command line interface.

Kishore G

08/16/2021, 3:24 PM
That's a good way to get started. In prod, you would use Spark to set up these jobs.

Tiger Zhao

08/16/2021, 3:26 PM
Thanks! Also, is there a way to configure segment generation in batch ingestion? For example, is it possible to pass in one ORC file and have it create N segments, or create segments of a specific size?

Kishore G

08/16/2021, 3:33 PM
Not as of now. Right now it's one input file -> one Pinot segment.
There is a segment processing framework (WIP) that can allow you to do some of these things.
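Because each input file currently becomes one segment, one way to approximate "N segments" or "segments of a given size" is to split the data into N input files yourself before running the job. The sketch below only does the planning arithmetic: it picks a file count from a target segment size and spreads rows evenly into half-open row ranges. The helper name `plan_row_splits` and the `est_segment_bytes_per_row` ratio are hypothetical; the ratio is an assumption you would have to measure (for example by building one test segment), since ORC file size and Pinot segment size are not the same thing. Actually writing the ranges out as ORC files is left to whatever tooling you already use.

```python
import math


def plan_row_splits(total_rows: int, est_segment_bytes_per_row: float,
                    target_segment_bytes: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) row ranges, one per output input file.

    Hypothetical helper: with one input file -> one segment, writing one file
    per range should give roughly one segment of the target size per range.
    est_segment_bytes_per_row is an assumed, measured ratio, not a Pinot value.
    """
    if total_rows <= 0:
        return []
    est_total_bytes = total_rows * est_segment_bytes_per_row
    # At least one file, and never more files than rows (no empty ranges).
    num_files = max(1, math.ceil(est_total_bytes / target_segment_bytes))
    num_files = min(num_files, total_rows)
    # Spread rows as evenly as possible: the first `extra` ranges get one more row.
    base, extra = divmod(total_rows, num_files)
    ranges = []
    start = 0
    for i in range(num_files):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # 10M rows at an assumed ~50 bytes/row, aiming for ~250 MB segments.
    print(plan_row_splits(10_000_000, 50, 250 * 1024 * 1024))
```

With those assumed numbers the estimate is about 500 MB total, so the plan is two files of 5M rows each. Splitting by row count rather than by bytes keeps the ranges easy to compute; the byte ratio only decides how many ranges there are.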

Tiger Zhao

08/16/2021, 3:40 PM
OK, got it. How important are segment sizes in Pinot? I saw in the FAQ that the recommended size is 100-500MB. Should I try to make all the segments roughly the same size?

Mayank

08/16/2021, 5:23 PM
As long as you are in the ballpark, it is fine.
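If you want to see whether a batch of generated segments is "in the ballpark", a quick check against the 100-500MB range from the FAQ is enough; exact uniformity is not the goal. The sketch below assumes you have already collected segment sizes in bytes yourself (how you gather them is up to you and not shown). The helper name `flag_segment_sizes` is hypothetical, and treating MB as 1024 * 1024 bytes is an assumption.

```python
MB = 1024 * 1024  # assumption: MB taken as 1024 * 1024 bytes


def flag_segment_sizes(sizes_bytes: dict[str, int],
                       low: int = 100 * MB,
                       high: int = 500 * MB) -> dict[str, str]:
    """Label each segment 'small', 'ok', or 'large' relative to [low, high].

    Hypothetical helper; the default band mirrors the 100-500MB FAQ guidance,
    used here as a rough ballpark rather than a hard limit.
    """
    labels = {}
    for name, size in sizes_bytes.items():
        if size < low:
            labels[name] = "small"
        elif size > high:
            labels[name] = "large"
        else:
            labels[name] = "ok"
    return labels


if __name__ == "__main__":
    print(flag_segment_sizes({"seg_0": 50 * MB, "seg_1": 200 * MB, "seg_2": 600 * MB}))
```

Segments labeled "small" or "large" are the ones worth a second look; a spread of sizes within the band is fine.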