Hello team, 2 quick questions on best practices:
1. I'd like to run a Flink job that polls/runs pipelines on a custom schedule. I've seen a few SO posts such as this, but I can't find any docs, examples, or blog posts showing a more production-ready solution. My main concern is fault tolerance: e.g., if I poll once per minute on time-bound data and our app goes down for 10 minutes, I'd want to recover those 10 minutes in their original 1-minute intervals, not lose them. We currently use Airflow for this, which seems more naturally suited to the task, but it would simplify our workflow if we could run this within Flink. A rough sketch of what I have in mind follows below.
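For concreteness, here's a minimal sketch of the idea, assuming a fixed 1-minute interval (the class name is made up, and it uses the legacy SourceFunction API for brevity; I know the newer unified Source API is the recommended path). The timestamp of the last emitted tick is checkpointed, so after an outage the source replays every missed interval before catching back up to real time:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/** Emits one Long "tick" per minute; replays missed intervals after recovery. */
public class MinuteTickSource implements SourceFunction<Long>, CheckpointedFunction {

    private static final long INTERVAL_MS = 60_000L;

    private volatile boolean running = true;
    private long lastEmittedTs;                       // last interval we emitted
    private transient ListState<Long> checkpointedTs; // survives restarts

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        if (lastEmittedTs == 0L) {
            // first start: align to the current minute boundary
            lastEmittedTs = (System.currentTimeMillis() / INTERVAL_MS) * INTERVAL_MS;
        }
        while (running) {
            long next = lastEmittedTs + INTERVAL_MS;
            long now = System.currentTimeMillis();
            if (next <= now) {
                // catch-up path: emit each missed interval one by one
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(next);
                    lastEmittedTs = next;
                }
            } else {
                Thread.sleep(next - now);
            }
        }
    }

    @Override
    public void cancel() { running = false; }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        checkpointedTs.clear();
        checkpointedTs.add(lastEmittedTs);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedTs = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("lastTick", Long.class));
        for (Long ts : checkpointedTs.get()) {
            lastEmittedTs = ts;   // restore after a failure
        }
    }
}
```

Is this a reasonable pattern, or is there a more idiomatic way to do scheduled polling with this recovery guarantee?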
2. Is it an anti-pattern to store data long-term within Flink state? My idea is a pipeline that reduces each day's data to a few metrics and keeps the last x days of those metrics in a FIFO data structure: once a new day's worth of data is in, we drop the oldest day's entry and add the new one. I believe this should be possible with a keyed stream and checkpointing (sketch below), but I'm concerned about fault tolerance and whether it would be better to store this in a database and just query the db to get the data.
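Something like this is what I'm picturing (class name and x = 30 are made up for illustration):

```java
import java.util.ArrayDeque;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Keeps the last MAX_DAYS daily metrics per key in checkpointed keyed state. */
public class RollingDailyMetrics
        extends KeyedProcessFunction<String, Double, ArrayDeque<Double>> {

    private static final int MAX_DAYS = 30;   // the "last x days"
    private transient ValueState<ArrayDeque<Double>> window;

    @Override
    public void open(Configuration parameters) {
        window = getRuntimeContext().getState(new ValueStateDescriptor<>(
                "dailyMetrics",
                TypeInformation.of(new TypeHint<ArrayDeque<Double>>() {})));
    }

    @Override
    public void processElement(Double dailyMetric, Context ctx,
                               Collector<ArrayDeque<Double>> out) throws Exception {
        ArrayDeque<Double> deque = window.value();
        if (deque == null) {
            deque = new ArrayDeque<>(MAX_DAYS);
        }
        deque.addLast(dailyMetric);       // newest day in
        if (deque.size() > MAX_DAYS) {
            deque.removeFirst();          // oldest day out
        }
        window.update(deque);             // included in checkpoints
        out.collect(deque);
    }
}
```

My understanding is that this state is included in checkpoints/savepoints, so it survives restarts as long as we recover from one, but I'd appreciate a sanity check on whether holding weeks of (reduced) data in Flink state like this is considered sound.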
Thanks