Hi,
I want to ask a question about Pinot's data structure, because I am a bit confused.
First, some feedback about the Apache Pinot documentation: I think it can be improved. I could not find some necessary information in the docs. This could be my fault, so please correct me if I missed something.
Pinot segments are created incrementally based on time. For example, with daily granularity, a segment is created for each day.
This is a default and necessary feature of segments, as with Apache Druid segments, so we get the best performance when we use time-based queries. Can we define segment configurations such as a maximum segment size? For example, suppose we have 10 GB of data for Sunday. I want to create five segments of 2 GB each instead of one 10 GB segment. Is this operation supported?
Segments have metadata. We can get all segment names of a table from the controller API, and then get the metadata (segment URI etc.) of each segment from the controller API.
• In the segment metadata, what location does "segment URI" represent? Deep storage or the offline server location?
• Can I get the metadata of all segments from the controller in one query? (I ask because the API I saw accepts only one segment per metadata request.)
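The two-step lookup described above (list the segment names, then fetch metadata one segment at a time) can be sketched roughly as follows. This is only a sketch under assumptions: the endpoint paths `/segments/{table}` and `/segments/{table}/{segment}/metadata`, and the shape of the listing response, are assumed here and may differ between Pinot versions. The function names are hypothetical helpers, not part of any Pinot client.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def list_segment_names(controller, table, get_json=http_get_json):
    """Return all segment names of a table.

    Assumption: the listing endpoint returns a list of objects keyed by
    table type, e.g. [{"OFFLINE": ["seg_a", "seg_b"]}].
    """
    payload = get_json(f"{controller}/segments/{quote(table)}")
    names = []
    for group in payload:
        for segment_list in group.values():
            names.extend(segment_list)
    return names


def all_segment_metadata(controller, table, get_json=http_get_json):
    """Fetch metadata for every segment, one request per segment."""
    return {
        name: get_json(
            f"{controller}/segments/{quote(table)}/{quote(name)}/metadata"
        )
        for name in list_segment_names(controller, table, get_json)
    }
```

For example, `all_segment_metadata("http://localhost:9000", "myTable")` would issue one listing request plus one metadata request per segment (the host, port, and table name are placeholders). This per-segment loop is exactly what a single bulk query would replace.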
Also, I want to read a segment. Apache Pinot creates the segment file and then compresses it to .tar.gz format. Can I read a compressed segment file (e.g. segment_1.tar.gz) with PinotSegmentRecordReader directly, or do I have to untar the compressed file first? In short, can I read a compressed segment file directly from the offline server or deep storage, or do I have to download it to local disk first, then untar it and read it? What should the flow for reading a segment look like?
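If unpacking does turn out to be required, the local "download, then untar" step could look roughly like the sketch below. It uses only the Python standard library and assumes the archive has already been copied to local disk. `untar_segment` is a hypothetical helper name, and reading the unpacked directory with PinotSegmentRecordReader would still be a separate step in Java that is not shown here.

```python
import tarfile
from pathlib import Path


def untar_segment(archive_path, dest_dir):
    """Unpack a local .tar.gz segment archive into dest_dir.

    Returns the sorted names of the top-level entries that end up in
    dest_dir, typically a single segment directory.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return sorted(p.name for p in dest.iterdir())
```

For example, `untar_segment("segment_1.tar.gz", "/tmp/segments")` would leave the unpacked contents under `/tmp/segments` (both paths are placeholders).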
Thank you so much!