Apache Pinot

Hi team,
Any thoughts, suggestions, or preferences on using the Pinot Minion framework vs. Spark + Airflow for offline batch ingestion into Pinot?

What’s your data size? And what kind of transformations do you do before pushing data to Pinot?

These are daily/hourly partitioned tables stored as Parquet, ranging from GBs to PBs in size. The transformations needed before ingestion into Pinot are light, though: we just drop certain columns.
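Since the only transformation is dropping columns, one option is to derive the keep-list from the Pinot schema itself instead of maintaining a separate list per pipeline. A minimal sketch, assuming the schema JSON uses the usual `dimensionFieldSpecs`, `metricFieldSpecs`, and `dateTimeFieldSpecs` sections; the helper names and example columns are hypothetical. The resulting list could then feed a column projection such as `pyarrow.parquet.read_table(path, columns=...)` or a Spark `DataFrame.select(...)`.

```python
from typing import Any, Dict, List

# Schema sections assumed to hold column definitions (hypothetical choice;
# extend this tuple if your schema uses other sections).
SCHEMA_SECTIONS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def columns_to_keep(pinot_schema: Dict[str, Any]) -> List[str]:
    """Return the column names declared in a Pinot schema JSON document."""
    names: List[str] = []
    for section in SCHEMA_SECTIONS:
        for spec in pinot_schema.get(section, []):
            names.append(spec["name"])
    return names


def columns_to_drop(parquet_columns: List[str], pinot_schema: Dict[str, Any]) -> List[str]:
    """Columns present in the Parquet files but not declared in the Pinot schema."""
    keep = set(columns_to_keep(pinot_schema))
    return [c for c in parquet_columns if c not in keep]


if __name__ == "__main__":
    # Made-up schema and Parquet columns, for illustration only.
    schema = {
        "schemaName": "events",
        "dimensionFieldSpecs": [{"name": "country", "dataType": "STRING"}],
        "metricFieldSpecs": [{"name": "clicks", "dataType": "LONG"}],
        "dateTimeFieldSpecs": [
            {"name": "ts", "dataType": "LONG",
             "format": "1:MILLISECONDS:EPOCH", "granularity": "1:HOURS"}
        ],
    }
    parquet_cols = ["country", "clicks", "ts", "raw_payload", "debug_info"]
    print(columns_to_keep(schema))                # ['country', 'clicks', 'ts']
    print(columns_to_drop(parquet_cols, schema))  # ['raw_payload', 'debug_info']
```

Reading only the kept columns also means the dropped ones are never loaded, which matters more at PB scale than a post-load drop would.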

In theory, both Minion and Spark are valid options. If you have PBs of data to process per job, though, maybe start with Spark.

Okay. We also want to schedule these jobs periodically, and the number of schedules could be high (hourly, across hundreds of different batch pipelines), so I was thinking of a standard Spark + Airflow setup.
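With hundreds of hourly pipelines, it probably helps to template the per-run ingestion job spec so the orchestrator only passes a table name and a partition hour. A minimal sketch under stated assumptions: the field names follow Pinot's batch ingestion job spec as we understand it and should be checked against the docs for your Pinot version; the runner class name depends on your Spark plugin version; the `raw/` and `segments/` directory layout and the helper names are hypothetical; and sections a real spec needs, such as `pinotFSSpecs`, are left out.

```python
from datetime import datetime
from typing import Any, Dict, List


def hourly_job_spec(table: str, hour: datetime, base_uri: str,
                    controller_uri: str) -> Dict[str, Any]:
    """Build a Pinot batch-ingestion job spec (as a dict) for one hourly partition."""
    # Hypothetical partition layout: <base>/raw/<table>/dt=YYYY-MM-DD/hr=HH
    partition = hour.strftime("dt=%Y-%m-%d/hr=%H")
    return {
        "executionFrameworkSpec": {
            "name": "spark",
            # Assumed runner class; it differs between Spark plugin versions.
            "segmentGenerationJobRunnerClassName":
                "org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner",
        },
        "jobType": "SegmentCreationAndTarPush",
        "inputDirURI": f"{base_uri}/raw/{table}/{partition}",
        "includeFileNamePattern": "glob:**/*.parquet",
        "outputDirURI": f"{base_uri}/segments/{table}/{partition}",
        "overwriteOutput": True,
        "recordReaderSpec": {
            "dataFormat": "parquet",
            "className": "org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader",
        },
        "tableSpec": {"tableName": table},
        "pinotClusterSpecs": [{"controllerURI": controller_uri}],
    }


def specs_for_run(tables: List[str], hour: datetime, base_uri: str,
                  controller_uri: str) -> List[Dict[str, Any]]:
    """One spec per pipeline for the same hour; a scheduler could fan out over this list."""
    return [hourly_job_spec(t, hour, base_uri, controller_uri) for t in tables]


if __name__ == "__main__":
    spec = hourly_job_spec("events", datetime(2024, 1, 15, 7),
                           "s3://example-bucket", "http://pinot-controller:9000")
    print(spec["inputDirURI"])  # s3://example-bucket/raw/events/dt=2024-01-15/hr=07
```

In an Airflow setup, each generated dict would be serialized to YAML and handed to a Spark submit task; the point of the template is that adding a pipeline means adding a table name, not a new spec file.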

My concern is that in the Minion framework, the Pinot controller acts as the scheduler, which might affect overall cluster throughput if the number of jobs is high. Is that assumption right?
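To put "hundreds of hourly pipelines" into numbers: whichever component does the scheduling has to trigger every pipeline once per hour. With N pipelines, the daily count of scheduled runs R is roughly (the values of N here are illustrative, not measured):

$$R_{\text{day}} = 24N, \qquad N = 100 \Rightarrow R_{\text{day}} = 2{,}400, \qquad N = 500 \Rightarrow R_{\text{day}} = 12{,}000.$$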