# getting-started
t
Hi, I'm trying to batch ingest a lot of data from some ORC files. What is the recommended way of doing this? I'm currently using the SegmentCreationAndMetadataPush job with the command-line interface.
k
That's a good way to get started. In production, you would use Spark to set up these jobs.
t
Thanks! Also, is there a way to configure segment generation with batch ingestion? For example, is it possible to pass in one ORC file and have it create N segments, or create segments of a specific size?
k
Not as of now. Right now it's one input file -> one Pinot segment.
There is a segment process framework in progress (WIP) that can allow you to do some of these things.
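Since each input file currently becomes exactly one Pinot segment, one workaround is to split a large ORC file into several smaller input files before running the job. Below is a minimal sketch of the planning arithmetic only. It assumes the resulting segment size roughly tracks the input file size, which is an approximation: encoding, compression, and indexes can make a Pinot segment larger or smaller than its ORC source. The helpers `plan_split` and `row_ranges` are hypothetical and not part of Pinot.

```python
import math
from typing import List, Tuple

MB = 1024 * 1024


def plan_split(input_bytes: int, target_bytes: int) -> int:
    """Number of input files to split into so each is at most target_bytes.

    Assumption: segment size roughly follows input file size (approximate).
    """
    if input_bytes <= 0 or target_bytes <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(input_bytes / target_bytes))


def row_ranges(total_rows: int, num_files: int) -> List[Tuple[int, int]]:
    """Split rows [0, total_rows) into num_files contiguous, near-equal ranges."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    base, extra = divmod(total_rows, num_files)
    ranges = []
    start = 0
    for i in range(num_files):
        # The first `extra` files each take one additional row.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    # Hypothetical example: a 1200 MB ORC file with 10 million rows,
    # aiming for roughly 300 MB per segment.
    n = plan_split(1200 * MB, 300 * MB)
    print(n)
    print(row_ranges(10_000_000, n))
```

Each row range would then be written out as its own ORC file (with whatever tool you already use), and the ingestion job would produce one segment per file.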
t
OK, got it. How important are segment sizes in Pinot? I saw in the FAQ that the recommended size is 100-500MB. Should I try to make all the segments roughly the same size?
m
As long as you are in the ballpark, it is fine.
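Since the guidance is only to stay "in the ballpark" rather than to make every segment the same size, a quick sanity check is simply to flag segments that fall far outside the 100-500MB range mentioned in the FAQ. This is a minimal sketch; the `outliers` helper and the idea of collecting sizes into a dict are hypothetical, and in practice you would read sizes from wherever your segments are stored.

```python
from typing import Dict, List

MB = 1024 * 1024
# Range quoted from the Pinot FAQ in this thread; treat it as a rough guide.
LOW, HIGH = 100 * MB, 500 * MB


def outliers(segment_sizes: Dict[str, int], low: int = LOW, high: int = HIGH) -> List[str]:
    """Return the names of segments whose size is outside [low, high], sorted."""
    return sorted(
        name for name, size in segment_sizes.items() if not (low <= size <= high)
    )


if __name__ == "__main__":
    # Hypothetical segment sizes in bytes.
    sizes = {"seg_0": 250 * MB, "seg_1": 40 * MB, "seg_2": 480 * MB, "seg_3": 900 * MB}
    print(outliers(sizes))
```

Segments inside the range can differ in size from one another without needing any action.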