# troubleshooting
a
I'm trying to run Flink using the Native Kubernetes approach (the Operator + FlinkDeployments), but I'm unable to configure high availability. In fact, running a FlinkDeployment using the PyFlink entry class and python_demo.py works fine until I attempt to add:
high-availability.type: kubernetes
high-availability.storageDir: file:///flink/recovery
At that point the JobManager fails to locate the TaskManager, and no useful logs are offered. The job simply fails once these two configs are added, reporting that there are no task slots available to it. How can I fix this? I'd like to have HA turned on :)
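(For context, a minimal sketch of roughly what such a FlinkDeployment looks like, modelled on the operator's stock PyFlink example; the image, jar path, resources, and parallelism below are assumptions, and only the two HA keys match what was quoted above.)

```yaml
# Illustrative sketch only, based on the operator's stock PyFlink example;
# image, paths, resources, and parallelism are assumptions, not the real deployment.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: python-demo
spec:
  image: <a Flink 1.16 image with PyFlink installed>   # assumed
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # the two settings that trigger the failure described above
    high-availability.type: kubernetes
    high-availability.storageDir: file:///flink/recovery
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    # PyFlink jobs are launched through the PythonDriver entry class
    jarURI: local:///opt/flink/opt/flink-python_2.12-1.16.1.jar   # assumed path
    entryClass: "org.apache.flink.client.python.PythonDriver"
    args: ["-pyclientexec", "/usr/local/bin/python3", "-py", "/opt/flink/usrlib/python_demo.py"]
    parallelism: 2
```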
g
I think the HA storage dir has to be on durable storage, or at least somewhere accessible from everywhere
not sure if this is the cause, but that for sure looks problematic
but file:///flink-data is mounted as a shared volume
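(If the point is that the dir must be reachable from every pod, one way to get there is to mount a shared ReadWriteMany volume through the deployment's podTemplate; the sketch below assumes a PVC named flink-data already exists.)

```yaml
# Fragment of a FlinkDeployment spec: mounts an existing ReadWriteMany PVC
# (assumed here to be named "flink-data") into every Flink pod at /flink-data,
# so that high-availability.storageDir: file:///flink-data resolves everywhere.
spec:
  podTemplate:
    spec:
      containers:
        # the operator merges this entry into the main Flink container by name
        - name: flink-main-container
          volumeMounts:
            - name: flink-data
              mountPath: /flink-data
      volumes:
        - name: flink-data
          persistentVolumeClaim:
            claimName: flink-data
```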
a
Thanks, I tried that also. I think this field is missing from the HA docs I'm reading: high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
I just enabled that in my config and my logs changed.
g
what operator / Flink version are you using?
high-availability.type: kubernetes is a newer syntax, only supported recently
you can try simply:
high-availability: kubernetes
a
Flink 1.16 😞 it may be a simple documentation-reading issue on my part
g
the .type suffix is a config change in 1.17 I believe, to make it YAML compliant
so for 1.16 you need to go without the .type, but you can still set it to kubernetes nevertheless 🙂
you don't need to specify the legacy factory name
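(So, side by side, the two spellings look like this; treat the exact version split as approximate, since the 1.17 attribution above is from memory.)

```yaml
# Flink 1.16 and earlier: the legacy key; the value "kubernetes" is a shortcut
# for the Kubernetes HA services factory, so no class name is needed
high-availability: kubernetes
high-availability.storageDir: file:///flink/recovery

# Flink 1.17 and later: the YAML-compliant replacement key
high-availability.type: kubernetes
high-availability.storageDir: file:///flink/recovery
```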
a
That's helpful, thanks. I just wanted some logs to work with...
g
we should update the example
a
Now at least I can see it trying and failing to construct HA
@Gyula Fóra Thanks for the pointers, I now have HA in Kubernetes running, pointed at Azure using the abfss protocol
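(For anyone landing here later, a working setup presumably ends up looking something like the sketch below; the storage account, container, and plugin jar version are placeholders, and in practice the account key would come from a Secret rather than plain config.)

```yaml
# Fragment of a FlinkDeployment spec: 1.16-style Kubernetes HA backed by
# Azure Data Lake Storage Gen2 via abfss. Account/container names are placeholders.
spec:
  flinkConfiguration:
    high-availability: kubernetes
    high-availability.storageDir: abfss://<container>@<account>.dfs.core.windows.net/flink/recovery
    # credentials for the ABFS filesystem (better injected from a Secret)
    fs.azure.account.key.<account>.dfs.core.windows.net: <storage-account-key>
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            # load the built-in Azure filesystem plugin shipped with the Flink image
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-azure-fs-hadoop-1.16.1.jar  # match your Flink version
```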