I'm about to start writing a daily mapreduce to prepare segments for offline ingestion. I'm using Flink for the streaming ingestion.
Any design tips for using Flink?
• Should I have Flink write the files to S3 and then run LaunchDataIngestionJob via a workflow tool? (Rough sketch of the S3-writing side after this list.)
• What's the status of the batch plugins? Do they make it easy to encapsulate the client-side parts of LaunchDataIngestionJob?
https://docs.pinot.apache.org/plugins/pinot-batch-ingestion
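For concreteness, here's a minimal sketch of the Flink half of the first option, assuming the records can be serialized to newline-delimited text and staged under an S3 prefix (the bucket/prefix and class names are made up, and the LaunchDataIngestionJob/workflow wiring isn't shown):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DailySegmentStaging {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // FileSink only commits part files on checkpoints, so checkpointing must be enabled.
        env.enableCheckpointing(60_000);

        // Placeholder source; in practice this would be the existing stream (e.g. Kafka).
        DataStream<String> events = env.fromElements("event-1", "event-2");

        // Row-encoded sink that writes newline-delimited records under an S3 prefix
        // (bucket and prefix here are hypothetical).
        FileSink<String> s3Sink = FileSink
                .forRowFormat(new Path("s3://my-bucket/pinot-staging/daily/"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        events.sinkTo(s3Sink);

        // A workflow tool (Airflow, cron, etc.) would then point LaunchDataIngestionJob
        // at the staging prefix once the day's files are complete.
        env.execute("daily-segment-staging");
    }
}
```

This assumes the Flink S3 filesystem plugin (flink-s3-fs-hadoop or flink-s3-fs-presto) is installed so the `s3://` scheme resolves, and that the records are written in a format Pinot's segment generation can read (CSV/JSON/Avro/Parquet) rather than arbitrary strings.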
I'm also fine with writing this in Spark if it makes things a lot easier, but I'd prefer Flink to keep the implementation consistent with the streaming side.