Hi everyone,
I am trying to run an ingestion job on EMR over about 8,000 files (460 GB total, all in a single folder; each file is ~50 MB, Avro format), but I get a timeout error on the S3 list step (error below; see the attachment for the detailed logs).
24/09/25 14:59:55 INFO S3PinotFS: Listed 8000 files from URI: s3://location/8000_files/, is recursive: true
24/09/25 15:01:33 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
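For reference, S3's ListObjectsV2 returns at most 1,000 keys per page, so listing 8,000 objects should only take ~8 paginated calls; the listing itself appears to finish (the INFO line above shows all 8,000 files listed) before the timeout hits. A minimal sketch of that pagination loop, with a stubbed page function standing in for the real `list_objects_v2` call since I can't share the bucket:

```python
# Sketch of S3-style paginated listing (ListObjectsV2 caps each page at 1000 keys).
# fetch_page is a stub standing in for a real call such as
# s3.list_objects_v2(Bucket=..., Prefix=..., ContinuationToken=...).

PAGE_SIZE = 1000

def fetch_page(all_keys, token):
    """Return (page_of_keys, next_token); stub for list_objects_v2."""
    start = token or 0
    page = all_keys[start:start + PAGE_SIZE]
    next_token = start + PAGE_SIZE if start + PAGE_SIZE < len(all_keys) else None
    return page, next_token

def list_all(all_keys):
    """Drain all pages, mirroring how a client follows continuation tokens."""
    keys, token, pages = [], None, 0
    while True:
        page, token = fetch_page(all_keys, token)
        keys.extend(page)
        pages += 1
        if token is None:
            return keys, pages

fake_keys = [f"8000_files/part-{i:05d}.avro" for i in range(8000)]
keys, pages = list_all(fake_keys)
print(len(keys), pages)  # 8000 keys in 8 pages
```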
The same setup (same EMR compute resources and same commands) works fine for 6,600 files totaling 190 GB of the same data.
Questions:
1. Is there a memory limit when listing files for ingestion from S3?
2. Is there a limit on the number of files, or on total file size, when ingesting data?
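One guess I'm considering trying: the 100000 ms in the trace matches Spark's default `spark.yarn.am.waitTime` (100s), the time the YARN ApplicationMaster waits for the SparkContext to initialize, so raising it might buy the driver enough time. A hedged sketch (the timeout value is arbitrary; the rest of my submit command is unchanged):

```shell
# Assumption: the AM timeout is spark.yarn.am.waitTime hitting its 100s default.
# Raise it and rerun the same ingestion job:
spark-submit \
  --conf spark.yarn.am.waitTime=600s \
  ...   # rest of the original ingestion command unchanged
```

Does that sound like the right knob, or is the limit somewhere in S3PinotFS itself?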