[flink-k8s-operator] Hi community, I would like to...
# troubleshooting
k
[flink-k8s-operator] Hi community, I would like to test the Flink k8s operator's HA capabilities for TM and JM failover. The simple test I did for TM failover was as follows:
- run a Flink session cluster in native mode
- submit a FlinkSessionJob resource with SAVEPOINT upgrade mode
- kill the task manager pod
It turns out that after I killed the TM, the k8s operator did not create a new TM to replace the killed one. The job was canceled and ended up in Job Status -> Failed. I was under the impression that no extra configuration is needed for TM HA. I have found [1] and [2], but I'm not sure whether they apply to JM failover only or to both TM and JM. It is also not clear to me whether I still need to configure [1] when using the Flink k8s operator.
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#kubernetes-ha-services
[2] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/#leader-election-and-high-availability
The thing is that when I deployed an application cluster like in example [1] without any extra configuration and then killed the TM, the submitted job moved to the "RESTARTING" state, a new TM was created, and the job was running again. This is different from the behavior I see when I'm running a session cluster [2]. How can I enable TM HA for a session cluster?
[1] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#application-deployments
[2] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#session-cluster-deployments
g
Please stop crossposting questions on both Slack and the mailing list
it's not very respectful towards the community
Also this question has been answered already on the mailing list
k
OK, my apologies. I assumed that not everyone is on both Slack and the mailing list. I will not do that from now on.
Btw, do you mean my question was answered, or a similar one? If the former, I don't think so; I've provided more info there. If the latter, could you share a link please? Before I posted my question I tried to search for a similar one on the list, but I did not manage to find one.
I didn't want to be disrespectful to anyone. Just trying to find answers. The answer on the user list that I got from Chen Zhanghao did not fully answer my questions, especially since I see different behavior depending on whether I'm using a session or an application cluster.
For a moment I was thinking that maybe TM HA for a session cluster is not supported or requires additional configuration. For the application cluster it worked out of the box.
g
There is no such thing as TaskManager HA; the JobManager starts new TaskManagers if they die
Regardless of whether this is application or session mode
At least in native deployment mode. In standalone mode there is a Kubernetes Deployment for the TMs, so Kubernetes itself restarts them when they die
k
Thank you