Apache Flink #troubleshooting

[flink-k8s-operator] Hi community, I would like to...

Krzysztof Chmielewski

09/16/2023, 11:59 PM

[flink-k8s-operator] Hi community, I would like to test the Flink k8s operator's HA capabilities for TM and JM failover. The simple test I did for TM failover was as follows:
- run a Flink session cluster in native mode
- submit a FlinkSessionJob resource with SAVEPOINT upgrade mode
- kill the task manager pod
It turns out that after I killed the TM, the k8s operator does not create a new TM to replace the killed one. The job was canceled and landed in Job Status -> Failed. I was under the impression that no extra configuration is needed for TM HA. I have found [1] and [2], but I'm not sure whether they apply to JM failover only or to both TM and JM. It is also not clear to me whether I still need to configure [1] when using the Flink k8s operator.
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#kubernetes-ha-services
[2] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/#leader-election-and-high-availability

Krzysztof Chmielewski

09/17/2023, 10:02 AM

The thing is that when I deployed an application cluster like in example [1] without any extra configuration and then killed the TM, the submitted job moved to the "RESTARTING" state, a new TM was created, and the job was running again. This is different from the behavior I see when running a session cluster [2]. How can I enable TM HA for a session cluster?
[1] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#application-deployments
[2] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#session-cluster-deployments

Gyula Fóra

09/17/2023, 1:28 PM

Please stop crossposting questions on both Slack and the mailing list

Gyula Fóra

09/17/2023, 1:28 PM

it’s not very respectful towards the community

Gyula Fóra

09/17/2023, 1:28 PM

Also this question has been answered already on the mailing list

Krzysztof Chmielewski

09/17/2023, 1:51 PM

OK, my apologies. I assumed that not everyone is on both Slack and the mailing list. I will not do that from now on.

Krzysztof Chmielewski

09/17/2023, 1:55 PM

Btw, do you mean that my question was answered, or a similar one? If the former, then I don't think so; I've provided more info there. If the latter, then could you share a link please? Before I posted my question I tried to search for a similar one on the list, but I did not manage to find one.

Krzysztof Chmielewski

09/17/2023, 2:08 PM

I didn't want to be disrespectful to anyone. I'm just trying to find answers. The answer I got on the user list from Chen Zhanghao did not fully answer my questions, especially since I do see different behavior depending on whether I'm using a session or an application cluster.

Krzysztof Chmielewski

09/17/2023, 2:11 PM

For a moment I was thinking that maybe TM HA for a session cluster is not supported or that it requires additional configuration. For an application cluster it worked out of the box.

Gyula Fóra

09/17/2023, 4:41 PM

There is no such thing as TaskManager HA; the JobManager starts new TaskManagers if they die

Gyula Fóra

09/17/2023, 4:42 PM

Regardless of whether this is application or session mode

Gyula Fóra

09/17/2023, 4:42 PM

At least in native deployment mode. In standalone mode there is a Kubernetes Deployment for the TMs, so Kubernetes itself restarts them when they die

Krzysztof Chmielewski

09/17/2023, 8:10 PM

Thank you
