# general
e
Is there any performance implication associated with the number of segments that compose a given table?
m
Great question. I thought it was documented somewhere. Let me find it, and if not, let me create a FAQ.
In general, too many small segments is not great, and too few large segments is not great either. You want to size your segments at around 100MB-500MB each (Pinot segment size).
👍 1
e
ah my apologies if it’s documented and I missed it. RTFM right? 😉
@User is that size recommendation before or after compression? I’ve seen most segments as .tar.gz in documentation
m
My apologies if it is not 😀
Before tar gz
👍 1
e
so then is it recommended to choose a segmenting strategy (time interval) that results in segments within the 100-500MB size range, or can a given interval be split into multiple segments? E.g. let’s say it’s preferred to segment data by day, but that would yield segments of 800MB. So instead each day is represented by something like:
/var/pinot/airlineStats/segments/2014/01/15/airlineStats_batch_2014-01-15_2014-01-15_0001.tar.gz

/var/pinot/airlineStats/segments/2014/01/15/airlineStats_batch_2014-01-15_2014-01-15_0002.tar.gz
m
To your example, it won't make a huge difference either way, unless you are trying to tune p99 latency in the milliseconds range. In that case, let's chat more.
e
Awesome, appreciate the offer! I’m not quite at that level of competency but will try to keep this chat in mind for later 🙏
👍 1
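For anyone working through the same arithmetic, here is a rough sketch of the split discussed above: an 800MB day divided into the fewest equal parts that fit under the 500MB upper bound. The function names `plan_segments` and `segment_names` are hypothetical helpers for illustration and are not part of Pinot. The file name pattern just mirrors the one in the question. The sizes are before tar.gz, per the answer in the thread.

```python
import math

# Target range from the thread: roughly 100MB-500MB per segment, before tar.gz.
MAX_SEGMENT_MB = 500


def plan_segments(interval_mb, max_mb=MAX_SEGMENT_MB):
    """Return (segment_count, size_per_segment_mb) for one time interval.

    Hypothetical helper: splits the interval into the fewest equal parts
    that each fit under max_mb.
    """
    if interval_mb <= 0:
        raise ValueError("interval_mb must be positive")
    count = max(1, math.ceil(interval_mb / max_mb))
    return count, interval_mb / count


def segment_names(table, day, count):
    """Hypothetical helper: build file names shaped like those in the question."""
    return [f"{table}_batch_{day}_{day}_{i:04d}.tar.gz" for i in range(1, count + 1)]


count, size_mb = plan_segments(800)  # one day of data, 800MB uncompressed
print(count, size_mb)
for name in segment_names("airlineStats", "2014-01-15", count):
    print(name)
```

With 800MB per day this gives two segments of 400MB each, which sits inside the recommended range. Equal splitting is only one assumption; any split that keeps each segment within 100MB-500MB matches the advice given in the thread.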