That is going to be a nice system to build in its own right, and the right design also depends on volume and load (bursty vs. constant).
Here are some of the things I would consider:
• Session cluster vs. dedicated clusters for Flink.
• Kubernetes/EKS with Karpenter vs. scalable managed node groups.
• Savepoints and checkpoints.
• Distribution of JobManagers (JM) and TaskManagers (TM) across the 200 Flink jobs.
• Schema validation, assuming you will be using a schema registry.
• Deployment via an operator vs. Helm charts.
• Custom Flink build on a distroless Docker image vs. an existing Flink image from public repos.
• Stateful vs. stateless jobs, depending on what you are doing in these 200 jobs.
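
To make the session vs. dedicated and JM/TM distribution points a bit more concrete, here is a rough back-of-the-envelope sketch. Every number in it (parallelism per job, slots per TaskManager, memory sizes) is a made-up assumption, and the `Sizing`, `dedicated_clusters` and `session_cluster` names are hypothetical, so plug in your own figures:

```python
import math
from dataclasses import dataclass


@dataclass
class Sizing:
    # All defaults are illustrative assumptions, not recommendations.
    jobs: int = 200            # number of Flink jobs
    slots_per_job: int = 2     # assumed parallelism (task slots) each job needs
    slots_per_tm: int = 4      # assumed task slots per TaskManager
    jm_mem_gib: float = 1.0    # assumed memory per JobManager
    tm_mem_gib: float = 4.0    # assumed memory per TaskManager


def dedicated_clusters(s: Sizing):
    """One cluster per job: each job gets its own JM and its own TMs."""
    tms_per_job = math.ceil(s.slots_per_job / s.slots_per_tm)
    jms = s.jobs
    tms = s.jobs * tms_per_job
    return jms, tms, jms * s.jm_mem_gib + tms * s.tm_mem_gib


def session_cluster(s: Sizing):
    """One shared cluster: a single JM, with TM slots pooled across all jobs."""
    jms = 1
    tms = math.ceil(s.jobs * s.slots_per_job / s.slots_per_tm)
    return jms, tms, jms * s.jm_mem_gib + tms * s.tm_mem_gib


s = Sizing()
for name, fn in (("dedicated", dedicated_clusters), ("session", session_cluster)):
    jms, tms, mem = fn(s)
    print(f"{name}: {jms} JM, {tms} TM, ~{mem:.0f} GiB")
```

This only compares raw pod counts and memory. It ignores isolation (in a session cluster one bad job can affect its neighbours), the fact that you would probably size TaskManagers smaller in the dedicated case, and state size, so treat it as a starting point for the conversation rather than a sizing tool.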