# troubleshooting
a
I'm trying to run Flink using the Native Kubernetes approach (the Operator + FlinkDeployments), but I'm unable to configure high availability. In fact, running a FlinkDeployment using the PyFlink entry class and python_demo.py works fine until I attempt to add:
high-availability.type: kubernetes
high-availability.storageDir: file:///flink/recovery
At that point the JobManager fails to locate the TaskManager, and no useful logs are offered. The job simply fails once these two configs are added, reporting that there are no task slots available to it. How can I fix this? I'd like to have HA turned on :)
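(For context, a minimal sketch of roughly what such a FlinkDeployment looks like, modelled on the operator's stock PyFlink example; the image, jar path, resources, and parallelism below are assumptions, and only the two HA keys match what was quoted above.)

```yaml
# Illustrative sketch only, based on the operator's stock PyFlink example;
# image, paths, resources, and parallelism are assumptions, not the real deployment.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: python-demo
spec:
  image: <a Flink 1.16 image with PyFlink installed>   # assumed
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # the two settings that trigger the failure described above
    high-availability.type: kubernetes
    high-availability.storageDir: file:///flink/recovery
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    # PyFlink jobs are launched through the PythonDriver entry class
    jarURI: local:///opt/flink/opt/flink-python_2.12-1.16.1.jar   # assumed path
    entryClass: "org.apache.flink.client.python.PythonDriver"
    args: ["-pyclientexec", "/usr/local/bin/python3", "-py", "/opt/flink/usrlib/python_demo.py"]
    parallelism: 2
```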
g
I think the HA storage dir has to be on durable storage, or at least somewhere accessible from everywhere
not sure if this is the cause, but that for sure looks problematic
but file:///flink-data is mounted as a shared volume
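(If the point is that the dir must be reachable from every pod, one way to get there is to mount a shared ReadWriteMany volume through the deployment's podTemplate; the sketch below assumes a PVC named flink-data already exists.)

```yaml
# Fragment of a FlinkDeployment spec: mounts an existing ReadWriteMany PVC
# (assumed here to be named "flink-data") into every Flink pod at /flink-data,
# so that high-availability.storageDir: file:///flink-data resolves everywhere.
spec:
  podTemplate:
    spec:
      containers:
        # the operator merges this entry into the main Flink container by name
        - name: flink-main-container
          volumeMounts:
            - name: flink-data
              mountPath: /flink-data
      volumes:
        - name: flink-data
          persistentVolumeClaim:
            claimName: flink-data
```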
a
Thanks, I tried that also. I think this field is missing from the HA docs I'm reading: high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
I just enabled that in my config and my logs changed.
g
what operator / Flink version are you using?
high-availability.type: kubernetes is a newer syntax, only supported recently
you can try simply:
high-availability: kubernetes
a
Flink 1.16 😞 it may be a simple documentation-reading issue on my part
g
the .type suffix is a config change in 1.17 I believe, to make it YAML compliant
so for 1.16 you need to go without the .type, but you can still set it to kubernetes nevertheless 🙂
you don't need to specify the legacy factory name
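(So, side by side, the two spellings look like this; treat the exact version split as approximate, since the 1.17 attribution above is from memory.)

```yaml
# Flink 1.16 and earlier: the legacy key; the value "kubernetes" is a shortcut
# for the Kubernetes HA services factory, so no class name is needed
high-availability: kubernetes
high-availability.storageDir: file:///flink/recovery

# Flink 1.17 and later: the YAML-compliant replacement key
high-availability.type: kubernetes
high-availability.storageDir: file:///flink/recovery
```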
a
That's helpful, thanks. I just wanted some logs to work with...
g
we should update the example
a
Now at least I can see it trying and failing to construct HA
@Gyula Fóra Thanks for the pointers, I now have HA in Kubernetes running, pointed at Azure using the abfss protocol
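(For anyone landing here later, a working setup presumably ends up looking something like the sketch below; the storage account, container, and plugin jar version are placeholders, and in practice the account key would come from a Secret rather than plain config.)

```yaml
# Fragment of a FlinkDeployment spec: 1.16-style Kubernetes HA backed by
# Azure Data Lake Storage Gen2 via abfss. Account/container names are placeholders.
spec:
  flinkConfiguration:
    high-availability: kubernetes
    high-availability.storageDir: abfss://<container>@<account>.dfs.core.windows.net/flink/recovery
    # credentials for the ABFS filesystem (better injected from a Secret)
    fs.azure.account.key.<account>.dfs.core.windows.net: <storage-account-key>
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            # load the built-in Azure filesystem plugin shipped with the Flink image
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-azure-fs-hadoop-1.16.1.jar  # match your Flink version
```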