Hi team I am trying to optimise the realtime ingestion of a Apache Pinot #troubleshooting


Prashant Pandey

08/13/2022, 4:20 AM

Hi team, I am trying to optimise the realtime ingestion of a table with the following config:


Input Partitions: 96
Average Ingestion Rate (30 days): 154k
Max Ingestion Rate (30 days): 390k (spike)
Avg. ingestion rate / partitions ~2000 (200,000 / 96 with some buffer)
Retention Hours (on realtime servers): 1h; after this, segments are relocated to OFFLINE servers.
Max Usable Host Memory: 100G (128G total, 28G for query processing)
Most Queried Time Interval: 1h (hence retention = 1h)

Command:


./pinot-admin.sh RealtimeProvisioningHelper -tableConfigFile /var/pinot/tableConfig.json -sampleCompletedSegmentDir /var/pinot/mySegment/ -numPartitions 96 -numHosts 4,6,8,10,12 -numHours 1 -ingestionRate 2000 -maxUsableHostMemory 100G -retentionHours 1

Results:


numHosts --> 4               |6               |8               |10              |12              |
numHours
 1 --------> 46.53G/46.53G   |31.02G/31.02G   |23.26G/23.26G   |19.39G/19.39G   |15.51G/15.51G   |

numHosts --> 4               |6               |8               |10              |12              |
numHours
 1 --------> 1.68G           |1.68G           |1.68G           |1.68G           |1.68G           |

numHosts --> 4               |6               |8               |10              |12              |
numHours
 1 --------> 46.53G          |31.02G          |23.26G          |19.39G          |15.51G          |

numHosts --> 4               |6               |8               |10              |12              |
numHours
 1 --------> 24              |16              |12              |10              |8               |

I wanted to understand why memory is being mapped when total consuming memory + mapped memory < total memory available for ingestion in every case.
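The arithmetic behind the question can be checked with a short sketch. It only re-adds the numbers pasted above: the two figures in each cell of the first results table, the 100G usable memory, and the 200,000 / 96 per-partition rate. The variable names are illustrative, and treating the pair as two quantities to be summed follows the wording of the question rather than any documented definition.

```python
# Figures copied from the first results table (GB per host), keyed by numHosts.
# Each cell holds two values separated by "/"; they are summed here as the
# question describes (an assumption about how they should be combined).
usable_gb = 100.0
per_host_gb = {
    4: (46.53, 46.53),
    6: (31.02, 31.02),
    8: (23.26, 23.26),
    10: (19.39, 19.39),
    12: (15.51, 15.51),
}

totals_gb = {hosts: round(a + b, 2) for hosts, (a, b) in per_host_gb.items()}
for hosts, total in totals_gb.items():
    print(f"{hosts} hosts: {total}G of {usable_gb}G usable")

# Per-partition ingestion rate before rounding down to the 2000 passed to the tool.
rate_per_partition = 200_000 / 96
print(f"rate per partition: {rate_per_partition:.0f}")
```

Under these assumptions the sum stays below the usable memory for every host count, which is the observation the question is built on.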

Mayank

08/15/2022, 5:50 PM

@Sajjad Moradi can you comment? ^^

Sajjad Moradi

08/15/2022, 6:13 PM

Non-consuming segments are loaded into memory either by memory-mapping or into direct memory. That's controlled by the "loadMode" parameter in the "tableIndexConfig" section. The output of the tool is labelled "mapped memory" for non-consuming segments, but it is in fact the amount of memory needed for non-consuming segments, whichever load mode is used. I guess "mapped memory" was used because that was the common case when the tool was added.
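For anyone checking which mode a table uses, here is a minimal sketch that reads "loadMode" from the "tableIndexConfig" section of a table config. The sample config, the table name, and the `load_mode` helper are hypothetical; only the two key names come from the message above, and "MMAP" is used as an example value.

```python
import json

# Hypothetical, trimmed-down table config; a real one has many more fields.
sample_config = json.loads("""
{
  "tableName": "myTable_REALTIME",
  "tableType": "REALTIME",
  "tableIndexConfig": {
    "loadMode": "MMAP"
  }
}
""")


def load_mode(table_config):
    """Return tableIndexConfig.loadMode, or None if it is not set."""
    return table_config.get("tableIndexConfig", {}).get("loadMode")


mode = load_mode(sample_config)
print(mode)
```

The same helper could be pointed at the file passed via -tableConfigFile by loading it with `json.load` instead of the inline string.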

Mayank

08/15/2022, 6:14 PM

Thanks, perhaps clarify this in the docs (unless already done)?
