# troubleshooting
a
Hi team, I see the suggested segment size is between 100MB and 500MB. But in my case, based on daily data volume, a 500MB segment is generated every 15 minutes per partition. I see no way to reduce the segment count other than increasing the segment size. Could you please recommend something?
m
100-500MB is the guideline for realtime ingestion. You can use the minion merge rollup task in the background to get larger segments. You also want to see if you have scope to rollup or pre-aggregate during ingestion by using a lower time granularity. (For example, if you only care about querying at hourly granularity, then it's not a good idea to store at millisecond granularity.)
I have personally seen offline segments to be 2GB each as well (using minion based ingestion).
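For anyone reading this later: the merge rollup mentioned above is the MergeRollupTask, enabled in the table config. A minimal sketch, assuming hourly rollup is acceptable (the values here are illustrative, not tuned recommendations; check the Pinot docs for your version):

```json
{
  "task": {
    "taskTypeConfigsMap": {
      "MergeRollupTask": {
        "1hour.mergeType": "rollup",
        "1hour.bucketTimePeriod": "1h",
        "1hour.bufferTimePeriod": "3h",
        "1hour.maxNumRecordsPerSegment": "1000000"
      }
    }
  }
}
```

The `bufferTimePeriod` keeps the task away from still-consuming data, and `maxNumRecordsPerSegment` is the main knob for producing larger merged segments.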
a
We gave up on minion merge because it's too slow to roll up millis to minute or hour granularity.
m
Hmm, we have been able to ingest 1 TB in 20min or less. It depends on right sizing the minions.
Also, you can rollup during ingestion as well.
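To sketch the ingestion-time rollup idea: truncate the time column with a transform function and turn on metric aggregation during realtime consumption. Relevant table-config pieces, assuming a millisecond source column named `tsMillis` and that hourly granularity is acceptable (column names are made up for illustration):

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "tsHours",
        "transformFunction": "toEpochHours(tsMillis)"
      }
    ]
  },
  "tableIndexConfig": {
    "aggregateMetrics": true
  }
}
```

With `aggregateMetrics` on, rows sharing the same dimension and (truncated) time values are aggregated as they are consumed, so fewer rows ever reach the segments.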
a
Glad to hear that. Could you give some suggestions on how to right-size the minions?
In our test, we found it took more than 1 hour to process 15 minutes of realtime data to offline.
m
What’s your current minion setup? Also cc: @Haitao Zhang @Xiaobing
x
I assume you were using RealtimeToOfflineSegmentsTask? That task uses a single task and a single thread for segment conversion, by design for simplicity, so it's likely unable to keep up with the realtime table.
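For context, RealtimeToOfflineSegmentsTask is configured per table roughly like this (values illustrative); since each run is single-threaded, a run has to finish faster than one `bucketTimePeriod` worth of data arrives, or the task falls behind:

```json
{
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "1h",
        "bufferTimePeriod": "2h",
        "roundBucketTimePeriod": "1h",
        "mergeType": "rollup",
        "maxNumRecordsPerSegment": "1000000"
      }
    }
  }
}
```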
a
Yes. And your suggestion?
v
Keep the off-heap setting on, and move all the double datatype columns to no-dictionary columns. The latter will help you reduce the segment size.
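The no-dictionary change above goes in `tableIndexConfig`. A sketch, assuming the double column is named `metricValue` (hypothetical name for illustration):

```json
{
  "tableIndexConfig": {
    "noDictionaryColumns": ["metricValue"]
  }
}
```

Skipping the dictionary avoids storing a large per-segment dictionary for high-cardinality double values, which is where the size savings come from.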
a
Thanks, @Vibhor Jaiswal. Off-heap is already on, and there's only one double-type column.