# troubleshooting
m
Hello team, two quick questions on best practices.

1. I'd like to run a Flink job that polls/runs pipelines on a custom schedule. I've seen a few SO posts such as this, but I can't find any docs, examples, or blogs showing a more prod-ready solution. My main concern is fault tolerance: e.g. if I poll once per minute on time-bound data and our app goes down for 10 minutes, I'd want to recover those 10 minutes in their original 1-minute intervals. We use Airflow for this currently, which seems more naturally suited to the task, but it would simplify our workflow if we could run this within Flink.

2. Is it an anti-pattern to store data longish-term within Flink? My idea is a pipeline that reduces daily data to a few metrics and stores the last x days in a FIFO data structure: once a new day's worth of data is in, we drop the oldest day's data and add the new data. I believe this should be possible with a keyed stream and checkpointing, but I'm concerned about fault tolerance and whether it would be better to store this in a database and just query the db to get the data.

Thanks
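(For reference, a minimal sketch of what question 2 could look like with keyed state: a KeyedProcessFunction that keeps the last N days of reduced metrics in checkpointed ListState and drops the oldest entry once the limit is exceeded. `DailyMetric` and `MAX_DAYS` are hypothetical stand-ins for the real metric type and retention.)

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.util.ArrayList;
import java.util.List;

// Hypothetical record type: one reduced metric per key per day.
public class RollingDailyMetrics
        extends KeyedProcessFunction<String, DailyMetric, List<DailyMetric>> {

    private static final int MAX_DAYS = 30;          // keep the last 30 days (assumption)

    private transient ListState<DailyMetric> lastDays;

    @Override
    public void open(Configuration parameters) {
        // Keyed ListState is included in checkpoints/savepoints, so it survives restarts.
        lastDays = getRuntimeContext().getListState(
                new ListStateDescriptor<>("last-days", DailyMetric.class));
    }

    @Override
    public void processElement(DailyMetric metric, Context ctx,
                               Collector<List<DailyMetric>> out) throws Exception {
        List<DailyMetric> window = new ArrayList<>();
        Iterable<DailyMetric> existing = lastDays.get();
        if (existing != null) {
            existing.forEach(window::add);
        }
        window.add(metric);
        while (window.size() > MAX_DAYS) {
            window.remove(0);                        // FIFO: drop the oldest day
        }
        lastDays.update(window);
        out.collect(window);                         // downstream sees the rolling window
    }
}
```

Since keyed state is part of every checkpoint and savepoint, the rolling window survives restarts as long as the job can restore from them; whether that is acceptable as a long-term store is the business-requirements question discussed below.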
m
What is your use case? Are you talking about batch or streaming jobs?
m
Streaming; in both cases the data is unbounded.
m
So something like: your business logic is external, and you want to poll it every X minutes to see if it has changed and then use the new config?
m
We're running pipelines doing anomaly detection; latency is important, so we want to streamline everything as much as possible, and we have to get frequent updates. Business logic is within the Flink job.

For 1, we need to hit an endpoint to retrieve new data. Currently we have an Airflow job that hits the endpoint and forwards the data to Kafka (which we then read from Flink). Our use case is expanding, so I wanted to look into eliminating Airflow.

For 2, we're reading from Kafka.
m
Why not create a custom connector for Flink that calls your endpoint? That would eliminate Airflow and Kafka for that use case
That would probably work better with Flink's fault tolerance overall.
On 2, it's not an anti-pattern question but a business-requirements one, imho. If you treat Flink state as your system of record, it probably means you have additional requirements in terms of stability, backup and restore, etc. What happens if you can't upgrade Flink because it would break savepoint compatibility and you're stuck on an older version, etc.?
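(For context, a minimal sketch of the custom connector suggested above, polling the endpoint once per minute. `PollResult` and `fetchFromEndpoint()` are hypothetical stand-ins for the real record type and HTTP call; note that `SourceFunction` is deprecated in recent Flink releases in favor of the newer `Source` API (SplitEnumerator/SourceReader), but it shows the shape of the idea compactly.)

```java
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Hypothetical polling source that replaces the Airflow job: call the endpoint
// once per minute and emit the result directly into the Flink pipeline.
public class MinutePollingSource implements SourceFunction<PollResult> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<PollResult> ctx) throws Exception {
        while (running) {
            PollResult result = fetchFromEndpoint();   // the HTTP call Airflow makes today
            synchronized (ctx.getCheckpointLock()) {   // emit under the checkpoint lock
                ctx.collect(result);
            }
            Thread.sleep(60_000);                      // poll once per minute
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private PollResult fetchFromEndpoint() {
        // hypothetical: call the REST endpoint and parse the response
        throw new UnsupportedOperationException("wire up your HTTP client here");
    }
}
```

On its own this gives no recovery of intervals missed while the job was down; the checkpointed-cursor variant further down sketches that part.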
m
Yeah, those are good points. Had a feeling my solution was trying to be too clever.

My thought process was to add a custom connector. I've seen stuff like this, but I'm worried our use case doesn't apply in the same way. Not having a default connector for hitting an API on a schedule (or something similar) made me concerned there was something I was missing. Since missing any amount of data isn't acceptable (eventually catching up is fine, just not completely skipping data), I'm not fully sure how to implement this in a way that guarantees we don't miss anything.

Anyway, any information you have that can help us would be fantastic.
Essentially, implementing the connector in such a way that it meets all the expectations we'd have of an out-of-the-box connector.
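(To make the missed-interval recovery concrete: extending the polling sketch above, the source could keep a "next interval" cursor in operator state via CheckpointedFunction. After a 10-minute outage the restored cursor is 10 minutes behind, so the catch-up loop re-fetches each missed 1-minute interval before resuming normal polling. `PollResult` and `fetchRange()` are again hypothetical.)

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Hypothetical source that polls in fixed 1-minute intervals and checkpoints
// how far it has gotten, so missed intervals are replayed after a failure.
public class CatchUpPollingSource
        implements SourceFunction<PollResult>, CheckpointedFunction {

    private static final long INTERVAL_MS = 60_000;    // 1-minute intervals

    private volatile boolean running = true;
    private long nextIntervalStart;                     // the cursor we checkpoint
    private transient ListState<Long> cursorState;

    @Override
    public void run(SourceContext<PollResult> ctx) throws Exception {
        while (running) {
            long now = System.currentTimeMillis();
            // Catch-up loop: if the job was down for 10 minutes, this replays
            // each missed 1-minute interval individually before resuming.
            while (running && nextIntervalStart + INTERVAL_MS <= now) {
                PollResult result =
                        fetchRange(nextIntervalStart, nextIntervalStart + INTERVAL_MS);
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(result);
                    nextIntervalStart += INTERVAL_MS;   // advance only after emitting
                }
            }
            Thread.sleep(1_000);                        // wait for the next interval to close
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        cursorState.clear();
        cursorState.add(nextIntervalStart);             // cursor goes into the checkpoint
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        cursorState = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("next-interval", Long.class));
        nextIntervalStart = System.currentTimeMillis(); // default for a fresh start
        for (Long restored : cursorState.get()) {       // restore the cursor after a failure
            nextIntervalStart = restored;
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private PollResult fetchRange(long startMs, long endMs) {
        // hypothetical: query the endpoint for data in [startMs, endMs)
        throw new UnsupportedOperationException("wire up your HTTP client here");
    }
}
```

With checkpointing enabled, a restart restores the cursor from the last completed checkpoint, so intervals emitted after that checkpoint are simply fetched again rather than lost; end-to-end guarantees then depend on the sinks being idempotent or transactional.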