# pinot-perf-tuning
s
Hi everyone, I am trying to run an ingestion job on EMR for about 8,000 files with a total size of 460 GB in a single folder (Avro format, ~50 MB per file), but I get a timeout error on the S3 list (error below; see the attachment for detailed errors):

24/09/25 14:59:55 INFO S3PinotFS: Listed 8000 files from URI: s3://location/8000_files/, is recursive: true
24/09/25 15:01:33 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)

The same setup (including EMR compute resources and commands) works for 6,600 files with a total size of 190 GB of the same data.

Questions:
1. Is there a memory limit when listing files on S3 for ingestion?
2. Is there a limit on the number or size of files when ingesting data?
k
Hello Somanath, I hope this message finds you well. I am getting this error as well when trying to load a large dataset from S3 into Pinot. I also suspect there might be a limit on file listing during data ingestion from S3. May I ask how you eventually resolved this error?