#general

Ravikiran Katneni

08/10/2020, 2:51 AM
Hi all, I am new to Pinot. I am trying to ingest/upload a 7 GB TPC-H lineitem table into Pinot, but the entire file is getting uploaded as a single segment. Does Pinot support any configuration to specify a segmentation column (a column based on which segments are created from the ingested file/data)? When I explicitly split the file into multiple files, multiple segments are created. Does Pinot expect pre-segmented data to be ingested/uploaded?

Mayank

08/10/2020, 2:53 AM
IIRC one input file translates to one segment. But translating one directory with m files into k segments of optimal size would be a good feature to have (I think it was in the works).
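Since (per the reply above) one input file becomes one segment, a workaround is to split the large file into several smaller files before ingestion. Below is a minimal sketch, assuming a header-less, line-oriented file such as a TPC-H `.tbl` file; the function name `split_file` and the `part-NNNNN.tbl` naming are hypothetical, not part of Pinot.

```python
from pathlib import Path


def split_file(src, out_dir, rows_per_file):
    """Split one header-less, line-oriented file into files of at most rows_per_file lines.

    Hypothetical helper: each output file would then be ingested as its own segment.
    Returns the list of written paths, in order.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be >= 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    try:
        with open(src, "r", newline="") as f:
            for i, line in enumerate(f):
                # Start a new output file every rows_per_file lines.
                if i % rows_per_file == 0:
                    if out is not None:
                        out.close()
                    path = out_dir / f"part-{len(paths):05d}.tbl"
                    out = open(path, "w", newline="")
                    paths.append(path)
                # Copy the row verbatim, including its line terminator.
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

`rows_per_file` is the knob for roughly targeting a segment size; the sketch splits by row count only and does not group rows by any column.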

Ravikiran Katneni

08/11/2020, 4:40 AM
"segmentPartitionConfig": is this applicable for partitioning within the segment?

Mayank

08/11/2020, 4:41 AM
No, it is for partitioning data across segments

Ravikiran Katneni

08/11/2020, 4:46 AM
I am confused about segment partitioning. From your earlier answer, it seemed like segmenting/splitting a huge file does not happen in Pinot. Then how does partitioning across segments happen? Does Pinot combine the contents of multiple files during ingestion to partition them across segments?

Mayank

08/11/2020, 4:48 AM
SegmentPartitionConfig is only to inform Pinot that the data has already been partitioned (outside of Pinot). It is not about telling Pinot to partition it.
Does that make sense?
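To illustrate "already partitioned outside of Pinot", here is a minimal sketch that buckets rows by a key column before ingestion and builds the matching config fragment. Assumptions: a simple modulo partition function (`value % numPartitions`) on non-negative integer keys such as `l_orderkey`; the fragment shape (`segmentPartitionConfig` → `columnPartitionMap` → `functionName`/`numPartitions`, placed under `tableIndexConfig`) is my understanding of the table config and should be checked against the Pinot docs. The helper names are hypothetical.

```python
import json
from collections import defaultdict


def modulo_partition(value, num_partitions):
    # Assumed partition function: value % numPartitions for non-negative integer keys.
    return int(value) % num_partitions


def partition_rows(rows, key_index, num_partitions, delimiter="|"):
    """Hypothetical helper: group delimited rows by the partition id of column key_index."""
    buckets = defaultdict(list)
    for row in rows:
        key = row.split(delimiter)[key_index]
        buckets[modulo_partition(key, num_partitions)].append(row)
    return dict(buckets)


def segment_partition_config(column, num_partitions, function_name="Modulo"):
    # Fragment describing how the data was ALREADY partitioned; Pinot does not
    # partition anything based on it.
    return {
        "segmentPartitionConfig": {
            "columnPartitionMap": {
                column: {"functionName": function_name, "numPartitions": num_partitions}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(segment_partition_config("l_orderkey", 4), indent=2))
```

Each bucket would then be written to its own file(s), so every resulting segment holds rows from a single partition.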

Ravikiran Katneni

08/11/2020, 4:49 AM
Ok. Is this additional metadata, provided during ingestion, meant to help Pinot optimize query execution?
m

Mayank

08/11/2020, 4:49 AM
Correct
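One way such metadata can speed up queries is segment pruning. For an equality filter on the partition column, segments whose partitions cannot contain the value are skipped. The sketch below is a conceptual illustration, not Pinot's actual routing code. It assumes the same modulo partition function as above, and `SegmentMeta` and `prune_segments` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SegmentMeta:
    name: str
    # Hypothetical metadata: column name -> partition ids present in this segment.
    partitions: Dict[str, Set[int]] = field(default_factory=dict)


def prune_segments(segments: List[SegmentMeta], column: str, value: int,
                   num_partitions: int) -> List[SegmentMeta]:
    """Keep only segments that could contain rows with column == value."""
    target = value % num_partitions  # assumed modulo partition function
    kept = []
    for seg in segments:
        ids = seg.partitions.get(column)
        # Without partition metadata for this column, the segment cannot be ruled out.
        if ids is None or target in ids:
            kept.append(seg)
    return kept
```

For example, with two partitions on `l_orderkey`, a filter `l_orderkey = 7` maps to partition 1. Only segments tagged with partition 1, or lacking metadata, would be queried.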

Ravikiran Katneni

08/11/2020, 4:50 AM
Thanks, now it is clear.