# pinot-perf-tuning
k
Hi team, I am trying to run an ingestion job on EMR for about 87k Parquet files (about 4.4 TiB total) in a single S3 folder, but I got a timeout error during the S3 list (error below; see the attachment for the full stack trace). I believe the compute resources assigned to Pinot and to the EMR cluster (used for the spark-submit ingestion job) are adequate, and the same job works fine on smaller datasets.

24/09/25 14:59:55 INFO S3PinotFS: Listed 8000 files from URI: s3://location/8000_files/, is recursive: true
24/09/25 15:01:33 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]

This looks similar to the issue @somanath joglekar reported last September, and I have the same questions he asked:
1. Is there a memory limit when listing files for ingestion from S3?
2. Is there a limit on the number or total size of files when ingesting data?
Can anyone help me with these questions?
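One possible workaround sketch, under an assumption: the "Futures timed out after [100000 milliseconds]" thrown by the YARN ApplicationMaster matches Spark's default `spark.yarn.am.waitTime` of 100s, which bounds how long the AM waits for the SparkContext to initialize in cluster mode. If the driver spends longer than that listing 87k S3 objects before the context comes up, raising that timeout may get the job past the listing phase. The jar path and job-spec path below are placeholders; the main class is the one the Pinot docs use for Spark ingestion.

```shell
# Hedged sketch, not a verified fix: raise the AM wait so a long S3 listing
# does not exceed the default 100s before SparkContext initialization.
spark-submit \
  --deploy-mode cluster \
  --conf spark.yarn.am.waitTime=600s \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  /path/to/pinot-all-jar-with-dependencies.jar \
  -jobSpecFile /path/to/ingestionJobSpec.yaml
```

This only helps if the timeout really is the AM waiting on context initialization; if the listing itself fails inside S3PinotFS, splitting the input into several smaller folders and running one job per folder is another avenue to try.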
x
Which job are you using for ingestion? I'd suggest trying the Minion FileIngestionTask.
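For reference, a hypothetical sketch of what enabling such a Minion task in the table config might look like. The `taskTypeConfigsMap` section is standard Pinot table config, but the specific keys under `FileIngestionTask` are assumptions for illustration; check the docs for your distribution before using them.

```json
{
  "task": {
    "taskTypeConfigsMap": {
      "FileIngestionTask": {
        "inputDirURI": "s3://location/8000_files/",
        "inputFormat": "parquet",
        "tableMaxNumTasks": "100"
      }
    }
  }
}
```

Because Minion tasks process the input in bounded batches per task run, they tend to avoid the single giant listing/driver step that a one-shot Spark job performs.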