Hi team, I would like some suggestions on what the Pinot batch ingestion story looks like in a production environment. Ideally we want to use Spark cluster mode for ingestion in production, but we ran into lots of issues when submitting jobs in a distributed fashion to our production Spark clusters on YARN. Currently we only have Spark local mode and Pinot standalone ingestion working for batch data, but we are worried this will not be sustainable for ingesting larger production tables. What do people generally use for ingesting Pinot data in production? Asking because I don't see much documentation or discussion around running the Spark segment generation job with a YARN master and cluster deploy mode.
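For context, the kind of submission we are attempting looks roughly like the one below, adapted from the Pinot batch ingestion docs. Paths, the distribution directory, and the job spec file name are placeholders for our setup; part of our problem is making the plugin jars and spec file reachable from the driver container once `--deploy-mode cluster` is used (e.g. via `--jars`/`--files` or HDFS paths) rather than from the local filesystem of the submitting host:

```shell
export PINOT_VERSION=0.9.2
# Placeholder: wherever the Pinot distribution is unpacked on the submitting host
export PINOT_DISTRIBUTION_DIR=/opt/apache-pinot-${PINOT_VERSION}-bin

spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile /path/to/sparkIngestionJobSpec.yaml
```

This works for us with `--master local`, but in cluster mode the driver runs on an arbitrary YARN node where these local paths don't exist, which is where things fall apart for us.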
For reference, we are on Hadoop 2.9.1, Spark 2.4.6 on YARN, and Pinot 0.9.2. Also interested to hear whether anyone has successfully set up cluster-mode batch ingestion on a similar Hadoop/Spark stack 👀.