Hello All I am using flink kubernetes operator with FlinkDep Apache Flink #troubleshooting


Chetan Patel

09/18/2024, 12:22 PM

Hello All, I am using the Flink Kubernetes Operator with the FlinkDeployment CRD to spin up the job manager and task manager pods. Everything worked fine for quite some time, then the task manager pod suddenly restarted and stayed in the ContainerCreating state indefinitely. It did not report any error explaining why it was stuck in ContainerCreating. As a temporary workaround, I deleted the whole namespace, and then it started working again. Any idea what the issue could be? Thanks in advance.

D. Draco O'Brien

09/18/2024, 1:18 PM

To troubleshoot further: Run kubectl describe pod <pod-name> to get more details about the pod's status and events.

Check the Kubernetes cluster's logs, especially the kubelet logs on the node where the pod was scheduled to run. These logs often contain more specific error messages. Inspect recent events across the cluster with kubectl get events --all-namespaces.

If you’re using a managed Kubernetes service, check the cloud provider’s console for any related alerts or issues. If the issue recurs, try setting up more comprehensive logging and monitoring so you can catch and diagnose it early.
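
As a rough sketch of the steps above, the commands could be collected into a small script. The namespace and pod name here (flink, my-taskmanager-pod) are placeholders, not values from this thread, and the stuck_pods helper assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with your own namespace and pod.
NAMESPACE="${NAMESPACE:-flink}"
POD="${POD:-my-taskmanager-pod}"

# Reads `kubectl get pods` output on stdin and prints the names of pods
# whose STATUS column (third field) is ContainerCreating.
stuck_pods() {
  awk 'NR > 1 && $3 == "ContainerCreating" { print $1 }'
}

# Runs the checks suggested in the thread against one pod.
diagnose() {
  # Which pods in the namespace are stuck?
  kubectl get pods -n "$NAMESPACE" | stuck_pods
  # Pod status and its Events section.
  kubectl describe pod "$POD" -n "$NAMESPACE"
  # Namespace events, oldest first, so the latest ones are at the bottom.
  kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
  # Only the events that refer to this pod.
  kubectl get events -n "$NAMESPACE" --field-selector "involvedObject.name=$POD"
}
```

Nothing runs until diagnose is called, for example with POD=<pod-name> NAMESPACE=<namespace> set in the environment first.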

Chetan Patel

09/18/2024, 1:25 PM

OK, let me look at the kubelet logs, because the pod events do not show any errors.
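
One possible way to get at the kubelet logs is sketched below. It assumes shell access to the node and that the kubelet runs as a systemd unit named kubelet, which is common but not universal; on managed clusters the logs may only be available through the provider's tooling. The function names and default values are illustrative only.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with your own namespace and pod.
NAMESPACE="${NAMESPACE:-flink}"
POD="${POD:-my-taskmanager-pod}"

# Prints the name of the node the pod was scheduled on.
node_of_pod() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.nodeName}'
}

# Filters log lines on stdin down to those that mention the given pod name.
kubelet_lines_for() {
  grep -F -- "$1"
}

# Step 1, from a machine with cluster access:
#   node_of_pod "$POD" "$NAMESPACE"
# Step 2, on that node (assumes a systemd-managed kubelet):
#   journalctl -u kubelet --since "2 hours ago" --no-pager | kubelet_lines_for "$POD"
```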

Chetan Patel

09/18/2024, 1:44 PM

Thanks for the suggestion.

D. Draco O'Brien

09/18/2024, 3:46 PM

Yes, I would look through the logs. Basically, a few things could cause this. One is a lack of resources (CPU, disk, etc.).

D. Draco O'Brien

09/18/2024, 3:47 PM

I have usually seen this error with an image pull issue

D. Draco O'Brien

09/18/2024, 3:47 PM

This can be either a network connectivity problem or a problem with access to the Docker image, etc.
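
A minimal sketch for checking the two causes mentioned above, node resources and image access, might look like this. The namespace and pod name are placeholders, and pressure_conditions is a hypothetical helper that only reports node conditions ending in "Pressure" whose status is True.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with your own namespace and pod.
NAMESPACE="${NAMESPACE:-flink}"
POD="${POD:-my-taskmanager-pod}"

# Reads "Type=Status" lines on stdin and prints any *Pressure condition
# (MemoryPressure, DiskPressure, PIDPressure) that is currently True.
pressure_conditions() {
  awk -F= '$1 ~ /Pressure$/ && $2 == "True" { print $1 }'
}

check_node_and_image() {
  # Node the pod was scheduled on.
  node="$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}')"
  # Node conditions as Type=Status lines, filtered to active pressure.
  kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
    | pressure_conditions
  # Image reference(s) the pod is trying to pull.
  kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.spec.containers[*].image}'
}
```

If pressure_conditions prints nothing, the node is not reporting resource pressure, and the printed image reference can be compared with one that pulls successfully elsewhere.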

Chetan Patel

09/18/2024, 5:32 PM

I checked the resources for the node. There are plenty of resources available. Regarding image pull, other application pods are running fine, and they pull images from the same source with image sizes larger than the Flink pods' images.

Chetan Patel

09/18/2024, 5:33 PM

I have asked for kubelet logs to check further
