Hi All, I am new to Pinot. I am trying to ingest/u...
# general
r
Hi All, I am new to Pinot. I am trying to ingest/upload a 7 GB TPC-H Lineitem table into Pinot. The entire file is uploaded as a single segment. Does Pinot support any configuration to specify a segmentation column (a column that determines how segments are created from the ingested file/data)? When I explicitly split the file into multiple files, multiple segments are created. Does Pinot expect data to be pre-segmented before it is ingested/uploaded?
m
IIRC, one input file translates to one segment. But translating one directory with m files into k segments of optimal size would be a good feature to have (I think it was in the works).
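Since one input file maps to one segment, the workaround mentioned above (splitting the file by hand) can be scripted. A minimal sketch, assuming a line-oriented text file such as a TPC-H `.tbl` export; the function name `split_file`, the `part-NNNNN.tbl` naming, and the `rows_per_file` parameter are made up for illustration and are not part of Pinot:

```python
import os


def split_file(input_path, output_dir, rows_per_file, has_header=False):
    """Split a line-oriented file into chunks of at most rows_per_file rows.

    Returns the list of chunk paths in order. Each chunk would then be
    ingested as its own input file (hypothetical helper, not a Pinot API).
    """
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    os.makedirs(output_dir, exist_ok=True)

    paths = []
    out = None
    header = None
    with open(input_path, "r", newline="") as src:
        if has_header:
            header = src.readline()
        for i, line in enumerate(src):
            # Start a new chunk every rows_per_file data rows.
            if i % rows_per_file == 0:
                if out is not None:
                    out.close()
                path = os.path.join(output_dir, f"part-{len(paths):05d}.tbl")
                out = open(path, "w", newline="")
                if header:
                    out.write(header)  # repeat the header in every chunk
                paths.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return paths
```

For example, `split_file("lineitem.tbl", "lineitem_parts", 5_000_000)` would write chunks of up to five million rows each; the right chunk size depends on the segment size you are aiming for.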
r
Is "segmentPartitionConfig" applicable to partitioning within a segment?
m
No, it is for partitioning data across segments
r
I am confused about segment partitioning. From your earlier answer, it seemed that Pinot does not segment/split a huge file. Then how does partitioning happen across segments? Does Pinot combine the contents of multiple files during ingestion to partition them across segments?
m
SegmentPartitionConfig is only to inform Pinot that the data has already been partitioned (outside of Pinot). It is not about telling Pinot to partition it
Does that make sense?
r
Ok. So is this additional metadata, provided during ingestion, to help Pinot optimize query execution?
m
Correct
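To make that concrete, here is a rough sketch of partitioning rows outside Pinot and then declaring that partitioning in the table config. The helper names are hypothetical, and it assumes a non-negative integer key in a `|`-delimited row (e.g. `l_orderkey` as the first Lineitem column) and that Pinot's `Modulo` partition function agrees with `key % numPartitions` for such keys; verify that against the Pinot docs before relying on it:

```python
import json


def partition_id(key: int, num_partitions: int) -> int:
    # Assumption: mirrors Pinot's "Modulo" function for non-negative integers.
    return key % num_partitions


def partition_rows(rows, key_index, num_partitions, delimiter="|"):
    """Group delimited rows by partition id of the key column.

    Each bucket would be written to its own input file, so every resulting
    segment holds rows from exactly one partition.
    """
    buckets = {p: [] for p in range(num_partitions)}
    for row in rows:
        key = int(row.split(delimiter)[key_index])
        buckets[partition_id(key, num_partitions)].append(row)
    return buckets


def segment_partition_config(column, num_partitions):
    """Build the table-config fragment that tells Pinot about the partitioning."""
    return {
        "tableIndexConfig": {
            "segmentPartitionConfig": {
                "columnPartitionMap": {
                    column: {
                        "functionName": "Modulo",
                        "numPartitions": num_partitions,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(segment_partition_config("l_orderkey", 4), indent=2))
```

The config does not make Pinot repartition anything; it only describes how the input files were already laid out, so the data and the declared function have to match.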
r
Thanks, now it is clear.