Apache Flink

Can anyone share their experience with running Flink jobs across data centers?

I am trying to set up a multi-site (geo-replicated) Kafka cluster, and I want my Flink job to be colocated with it. If the Flink job is bound to a single data center, I expect to see high client latency whenever it has to reach a broker in another DC.

If I can instead make my Flink Kafka consumers rack-aware, so that they fetch data from the closest Kafka broker, I should get better results.

I will be deploying Flink 1.16 on Kubernetes with Strimzi-managed Apache Kafka. Thanks.

I think you would benefit once <https://github.com/apache/flink-connector-kafka/pull/20> is merged and released.
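
For context, here is a rough sketch of the consumer-side setting that rack awareness relies on in plain Kafka. `client.rack` is the standard Kafka consumer property for fetching from the closest replica; the helper class, broker address, group ID, and rack name below are made up for illustration. It also assumes the brokers have `broker.rack` set and `replica.selector.class` pointing to `org.apache.kafka.common.replica.RackAwareReplicaSelector`. How each TaskManager would pick up its own rack ID is not covered here.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Flink or Kafka.
final class RackAwareConsumerProps {

    // Builds Kafka consumer properties that include the client's rack ID.
    static Properties build(String bootstrapServers, String groupId, String rackId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Standard Kafka consumer setting: tells the broker which rack
        // (here: which data center) this client sits in, so a rack-aware
        // replica selector on the broker can serve fetches from a nearby replica.
        props.setProperty("client.rack", rackId);
        return props;
    }

    public static void main(String[] args) {
        // All three values are placeholders (assumptions), not real endpoints.
        Properties props = build("my-cluster-kafka-bootstrap:9092", "flink-geo-job", "dc-east");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Properties like these could then be handed to whatever Kafka consumer the job uses. Note that a single static value would be the same for every consumer instance, so on its own it does not give each data center its own rack ID.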

This is awesome!

So with this supported and my TaskManagers spread across multiple DCs, this should behave like a geo-redundant cluster, right?

Are there any counterarguments, for example around cross-DC TM-JM network latency?

No idea to be honest :smile: not my expertise