# random
m
Can I run ~200 Flink jobs in streaming mode that read data from Kafka and sink into Postgres? Every job will be joining two or more Kafka topics with around 1 million records each. Anything I should take note of before doing this?
🎉 1
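For context, here is a minimal sketch of what one of these jobs might look like, using the Flink Table API with the Kafka and JDBC connectors. All topic, table, column, host, and credential names are made-up placeholders, and it assumes `flink-connector-kafka`, `flink-connector-jdbc`, and the Postgres driver are on the classpath:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class KafkaJoinToPostgres {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Source 1: an "orders" topic (schema is an assumption for this sketch).
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id STRING, user_id STRING, amount DOUBLE," +
                "  ts TIMESTAMP(3), WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'orders'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json')");

        // Source 2: a "users" topic to join against.
        tEnv.executeSql(
                "CREATE TABLE users (" +
                "  user_id STRING, email STRING," +
                "  ts TIMESTAMP(3), WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'users'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json')");

        // Sink: a Postgres table via the JDBC connector; the declared primary
        // key gives upsert semantics instead of append-only inserts.
        tEnv.executeSql(
                "CREATE TABLE enriched_orders (" +
                "  order_id STRING, email STRING, amount DOUBLE," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'jdbc'," +
                "  'url' = 'jdbc:postgresql://postgres:5432/analytics'," +
                "  'table-name' = 'enriched_orders'," +
                "  'username' = 'flink'," +
                "  'password' = 'secret')");

        // An interval join bounds how much join state each job must keep,
        // which matters a lot when you are running ~200 of these.
        tEnv.executeSql(
                "INSERT INTO enriched_orders " +
                "SELECT o.order_id, u.email, o.amount " +
                "FROM orders o JOIN users u ON o.user_id = u.user_id " +
                "AND o.ts BETWEEN u.ts - INTERVAL '1' HOUR AND u.ts + INTERVAL '1' HOUR");
    }
}
```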
r
That's going to be a nice system to build in itself, and it also depends on volume and load (bursty vs. constant). Here are some of the things I would consider:
• Session cluster vs. dedicated (per-job) cluster for Flink.
• Kube/EKS/Karpenter or scalable managed node groups.
• Savepoints + checkpoints (see the sketch after this list).
• Distribution of JMs + TMs across the 200 Flink jobs.
• Schema validation: assuming you will be using a schema registry.
• Deployment via the Flink Kubernetes Operator vs. Helm charts.
• Custom Flink build on a distroless Docker image vs. an existing Flink image off public repos.
• Stateful vs. stateless jobs, depending on what these 200 jobs are doing.
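On the savepoint/checkpoint point, a minimal sketch of the checkpoint settings I'd start from (the bucket path and intervals are assumptions you'd tune per job):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s with exactly-once semantics; tune the interval
        // per job based on state size and how fast you need recovery to be.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Durable checkpoint storage (path is a placeholder) so a job can
        // recover after JM/TM loss -- essential with 200 jobs on Kubernetes.
        env.getCheckpointConfig()
                .setCheckpointStorage("s3://flink-state/checkpoints/job-42");

        // Retain checkpoints on cancellation so you can restore manually,
        // much like a savepoint, if a deployment goes wrong.
        env.getCheckpointConfig().setExternalizedCheckpointCleanup(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Give the job breathing room between checkpoints under bursty load.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);

        // ... build the Kafka -> join -> Postgres pipeline here, then:
        // env.execute("job-42");
    }
}
```

Multiply whatever you pick here by 200: checkpoint storage volume, S3 request rates, and recovery fan-in all scale with the job count, which is why the session vs. dedicated cluster decision above matters so much.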