
Dan Hill

08/05/2020, 5:27 AM
I'm about to start writing a daily MapReduce job to prepare segments for offline ingestion. I'm using Flink for the streaming ingestion. Any design tips for using Flink?
• Should I have Flink write the files to S3 and then run LaunchDataIngestionJob using a workflow tool? (Sketch below.)
• What's the status of the batch plugins? Do they make it easy to encapsulate the client-side parts of LaunchDataIngestionJob? https://docs.pinot.apache.org/plugins/pinot-batch-ingestion
I'm also fine with writing this in Spark if it makes it a lot easier, but I'd prefer Flink to keep the implementation consistent.
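For the first bullet, one common shape is exactly that: the streaming/batch job lands files in S3, and a workflow step then triggers Pinot's LaunchDataIngestionJob CLI. A minimal Python sketch of that workflow step, assuming pinot-admin.sh is available on the worker and the job spec YAML (the path here is a hypothetical placeholder) already points at the S3 input:

```python
# Sketch: trigger Pinot batch ingestion once input files are in S3.
# The spec path and pinot-admin.sh location are hypothetical placeholders.
import subprocess

def launch_ingestion_job(job_spec_path: str) -> None:
    """Run Pinot's LaunchDataIngestionJob for a prepared job spec YAML."""
    subprocess.run(
        ["bin/pinot-admin.sh", "LaunchDataIngestionJob",
         "-jobSpecFile", job_spec_path],
        check=True,  # fail this workflow step if ingestion fails
    )

if __name__ == "__main__":
    launch_ingestion_job("specs/daily-offline-ingestion.yaml")
```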

Kishore G

08/05/2020, 5:29 AM
That's one option. We should have a job that automatically moves data from real-time to offline tables this quarter.
Spark is probably better

Dan Hill

08/05/2020, 5:31 AM
Sweet!
Are there any example Spark projects that write to Pinot?
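As a sketch of what the Spark side can look like: instead of writing to Pinot directly, a batch job typically writes input files (e.g. Avro or Parquet, both of which Pinot's batch ingestion can read) to a location that the ingestion job spec points at. A minimal PySpark example; the bucket, paths, and column names are hypothetical:

```python
# Sketch: a Spark batch job that prepares daily input files for Pinot's
# offline ingestion. Bucket, paths, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-pinot-prep").getOrCreate()

# Read the raw events for the day.
events = spark.read.parquet("s3a://my-bucket/raw/events/2020-08-05/")

# Aggregate into the shape of the offline table.
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "user_id")
    .agg(F.count("*").alias("event_count"))
)

# LaunchDataIngestionJob can then read this directory as its input URI.
daily.write.mode("overwrite").parquet("s3a://my-bucket/pinot-input/2020-08-05/")
```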

Dan Hill

08/05/2020, 5:11 PM
I'm not as familiar with Spark. It looks like the examples are separate Spark jobs. Does Spark come with a workflow tool that makes it easy to run the ingestion job after doing other batch processing, or do people usually use a tool like Apache Airflow, Oozie, etc. to do this?
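Spark itself doesn't ship a workflow engine, so an external orchestrator is the usual answer. A minimal Airflow sketch that chains the Spark prep job and the Pinot ingestion command; the script paths, spec file, and schedule are hypothetical, and the import path assumes Airflow 2.x:

```python
# Sketch: an Airflow DAG that runs the Spark prep job, then triggers Pinot's
# batch ingestion. Paths, spec file, and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_pinot_offline_ingestion",
    start_date=datetime(2020, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    prepare = BashOperator(
        task_id="spark_prepare_input",
        bash_command="spark-submit --master yarn jobs/daily_pinot_prep.py",
    )
    ingest = BashOperator(
        task_id="pinot_launch_ingestion",
        bash_command=(
            "bin/pinot-admin.sh LaunchDataIngestionJob "
            "-jobSpecFile specs/daily-offline-ingestion.yaml"
        ),
    )
    # Run ingestion only after the Spark job succeeds.
    prepare >> ingest
```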