Apache Pinot

Hello, I am looking for general guidance here. We are loading data for an offline table from AWS S3 using a job spec for a Spark job. Each segment is ~400 MB on disk. I am noticing that the servers run into OOM while transitioning segment state after downloading the segment to the server disk. We are using 15 servers with 4 CPUs and 32 GB RAM each, with 16 GB for heap, and we are also using off-heap. The servers have 2 TB of disk each, i.e. 30 TB of disk space in total, and we are loading a total of 2 TB of data. We have also configured inverted indexes on some fields in the data.
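For a rough sense of scale, the numbers in the message above can be worked through as follows. This is only a back-of-the-envelope sketch: it assumes a replication factor of 1 (the thread does not state one), decimal units (1 GB = 1000 MB), and segments spread evenly across servers. The function name `estimate_load` and its parameters are hypothetical and not part of Pinot.

```python
def estimate_load(total_data_gb, segment_mb, servers, ram_gb, heap_gb, replication=1):
    """Rough per-server sizing from cluster totals (illustrative only)."""
    # Total number of segments, using decimal units (assumption).
    segments = (total_data_gb * 1000) // segment_mb
    # Assumes segments are assigned evenly across servers.
    segments_per_server = segments * replication / servers
    data_per_server_gb = total_data_gb * replication / servers
    # Memory left outside the JVM heap for off-heap use and the OS page cache.
    non_heap_gb = ram_gb - heap_gb
    return {
        "segments": segments,
        "segments_per_server": segments_per_server,
        "data_per_server_gb": data_per_server_gb,
        "non_heap_gb": non_heap_gb,
    }


# Figures from the message: 2 TB of data, ~400 MB segments, 15 servers,
# 32 GB RAM and 16 GB heap per server. replication=1 is an assumption.
est = estimate_load(total_data_gb=2000, segment_mb=400, servers=15, ram_gb=32, heap_gb=16)
print(est)
```

Under these assumptions each server would hold on the order of a few hundred segments, with far more segment data on disk than memory available outside the heap. A higher replication factor would scale the per-server figures proportionally.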

<@U03QVGLAGJV> can you please share the parameters for the ingestion job?

image.png

Please note that the S3 bucket is in us-east-1 and the servers are in us-west-2. In the prod setup, the bucket and the servers will both be in the same region.