Hi All,
I am new to Pinot. I am trying to ingest/upload a 7 GB TPC-H lineitem table into Pinot, and the entire file is being uploaded as a single segment.
Does Pinot support any configuration to specify a segmentation column (the column used to decide which segment each row of the ingested file/data goes into)?
When I explicitly split the file into multiple files, multiple segments are created. Does Pinot expect pre-segmented data to be ingested/uploaded?
Mayank
08/10/2020, 2:53 AM
IIRC, one input file translates to one segment. But turning one directory with m files into k segments of optimal size would be a good feature to have (I think it was in the works).
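Since (per the reply above) one input file becomes one segment, a workaround is to split the big file yourself before ingestion. Below is a minimal sketch, assuming a pipe-delimited TPC-H `.tbl` file; the output file names and both helper functions are hypothetical and not part of Pinot. Rows are dealt out round-robin so the file is streamed rather than loaded into memory.

```python
import math
import os


def chunks_for_target_size(src_path, target_bytes):
    """Pick a chunk count so each output file is roughly target_bytes."""
    size = os.path.getsize(src_path)
    return max(1, math.ceil(size / target_bytes))


def split_by_rows(src_path, out_dir, num_chunks):
    """Split a large delimited text file into num_chunks files of roughly
    equal row counts (round-robin), streaming line by line."""
    os.makedirs(out_dir, exist_ok=True)
    # Hypothetical naming scheme: one output file (one future segment) per chunk.
    outs = [
        open(os.path.join(out_dir, f"lineitem_part_{i:03d}.tbl"), "w")
        for i in range(num_chunks)
    ]
    try:
        with open(src_path) as src:
            for line_no, line in enumerate(src):
                outs[line_no % num_chunks].write(line)
    finally:
        for f in outs:
            f.close()
    return [f.name for f in outs]
```

For example, `split_by_rows("lineitem.tbl", "lineitem_split", chunks_for_target_size("lineitem.tbl", 500 * 1024**2))` would aim for files of about 500 MB each; the right target segment size depends on your cluster.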
Ravikiran Katneni
08/11/2020, 4:40 AM
Is "segmentPartitionConfig" applicable for partitioning within the segment?
Mayank
08/11/2020, 4:41 AM
No, it is for partitioning data across segments
Ravikiran Katneni
08/11/2020, 4:46 AM
I am confused by segment partitioning. From your earlier answer, it seemed that segmenting/splitting a huge file does not happen in Pinot. Then how does partitioning across segments happen? Does Pinot combine the contents of multiple files during ingestion to partition them across segments?
Mayank
08/11/2020, 4:48 AM
SegmentPartitionConfig is only to inform Pinot that the data has already been partitioned (outside of Pinot). It is not about telling Pinot to partition it
Mayank
08/11/2020, 4:48 AM
Does that make sense?
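To make the "partitioned outside of Pinot" idea concrete, here is a minimal sketch, assuming TPC-H `lineitem` with `l_orderkey` as the first pipe-delimited column and a simple modulo partition function. The helpers and file names are hypothetical. The returned dict mirrors the shape of a Pinot table-config `segmentPartitionConfig` block as I understand it (`columnPartitionMap`, `functionName`, `numPartitions`); double-check the key names and supported function names (e.g. `Modulo`, `Murmur`) against the docs for your Pinot version, and make sure the function you declare matches the one you actually used. Python's `%` differs from Java's for negative numbers, which is harmless here because TPC-H keys are positive.

```python
import json
import os


def partition_by_column(src_path, out_dir, col_index, num_partitions, sep="|"):
    """Route each row to file (key % num_partitions), so every output file
    (and thus every segment built from it) holds exactly one partition."""
    os.makedirs(out_dir, exist_ok=True)
    outs = [
        open(os.path.join(out_dir, f"lineitem_p{p}.tbl"), "w")
        for p in range(num_partitions)
    ]
    try:
        with open(src_path) as src:
            for line in src:
                key = int(line.split(sep)[col_index])
                outs[key % num_partitions].write(line)
    finally:
        for f in outs:
            f.close()
    return [f.name for f in outs]


def segment_partition_config(column, num_partitions):
    """Table-config fragment that only *declares* the existing partitioning."""
    return {
        "tableIndexConfig": {
            "segmentPartitionConfig": {
                "columnPartitionMap": {
                    column: {"functionName": "Modulo", "numPartitions": num_partitions}
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(segment_partition_config("l_orderkey", 4), indent=2))
```

The split itself happens entirely outside Pinot; the config just tells Pinot what was done.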
Ravikiran Katneni
08/11/2020, 4:49 AM
Ok. Is this additional metadata provided during ingestion to help Pinot optimize query execution?
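The usual motivation for declaring partitions is segment pruning: for a filter like `l_orderkey = X`, only segments whose partition matches `X`'s partition can contain matching rows. The sketch below illustrates that idea only, assuming modulo partitioning and a hypothetical map from segment name to partition id; it is not Pinot's broker code. In Pinot this pruning is, as far as I know, enabled through the table's routing config (`segmentPrunerTypes` containing `"partition"` in recent releases), so check the docs for the version you run.

```python
def prune_segments(segment_partitions, key, num_partitions):
    """Return only the segments that can hold rows where column == key,
    given each segment's partition id (assumes key % num_partitions)."""
    target = key % num_partitions
    return [seg for seg, part in segment_partitions.items() if part == target]


# Hypothetical segment names, one partition per segment.
segments = {"lineitem_p0": 0, "lineitem_p1": 1, "lineitem_p2": 2, "lineitem_p3": 3}
print(prune_segments(segments, 10, 4))
```

With 4 partitions, a lookup for key 10 only has to touch `lineitem_p2` instead of all four segments.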