# troubleshooting
b
Hi everyone. We are using the Flink Kubernetes operator. A few months ago Gyula recommended FlinkSessionJob over the REST API for submitting Flink jobs, so we recently tried the FlinkSessionJob custom resource and noticed that it does not consistently update the state of the job in the CR status. A few issues:
1. Say we submit a job, it reaches a running state, and then it fails. The CR status is marked as reconciling (we have set kubernetes.operator.job.restart.failed: "true") and sometimes gets stuck in this state indefinitely. From the logs, the operator resubmits the Flink job and keeps checking its status. If the Flink JM then restarts, it logs at WARN level that jobID <job-id> was not found, and gets stuck there. Ideally it should mark the CR job status as a terminal state.
2. Sometimes the job status remains in the UPGRADING state with the error below, even when the Job Manager is stable (say, after a restart). Ideally it should reconcile and try to submit the job.
{
  "type": "org.apache.flink.kubernetes.operator.exception.ReconciliationException",
  "message": "java.util.concurrent.TimeoutException",
  "throwableList": [
    {
      "type": "java.util.concurrent.TimeoutException"
    }
  ]
}
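For anyone reading this error programmatically: the payload above is plain JSON, so the exception chain can be pulled out with the standard library. This is only a minimal sketch, assuming the error always has the same shape as the sample above; `summarize_error` is a hypothetical helper name, not something the operator provides.

```python
import json


def summarize_error(error_json):
    """Return (top-level type, message, nested throwable types), or None if there is no error.

    Assumes the error is a JSON string shaped like the sample in this thread.
    """
    if not error_json:
        return None
    err = json.loads(error_json)
    # Collect the type of every nested throwable, in the order reported.
    nested = [t.get("type") for t in err.get("throwableList", [])]
    return err.get("type"), err.get("message"), nested


# The error from this thread, copied as-is.
sample = """
{
  "type": "org.apache.flink.kubernetes.operator.exception.ReconciliationException",
  "message": "java.util.concurrent.TimeoutException",
  "throwableList": [
    {
      "type": "java.util.concurrent.TimeoutException"
    }
  ]
}
"""

top_type, message, nested_types = summarize_error(sample)
print(top_type)
print(nested_types)
```

This only extracts the exception types. It does not tell you whether the error is transient, which is the open question here.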
When a failed job is retried, it would be helpful to have a restart count in the FlinkSessionJob CR status so we know exactly what is going on. Currently, if the CR's status.jobStatus.state is null, it is hard to determine the state of a job. An external observer watching these session jobs can't determine the state just from whether status.error is null, because we don't know whether the error is transient. If anyone is already using FlinkSessionJob, please let me know how you determine the Flink job state from it.
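To make the question concrete, here is roughly the kind of check an external observer could run today. It is a rough sketch under stated assumptions, not a recommended approach: `classify` and the `seconds_in_state` input are hypothetical (the observer would have to track how long the status has been unchanged itself), and the 600-second threshold is arbitrary. The only fields it reads are status.jobStatus.state and status.error, as described above.

```python
# Flink job states that do not change without outside intervention.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def classify(status, seconds_in_state, stuck_after=600):
    """Coarsely classify a FlinkSessionJob from its CR status dict.

    status           -- the CR's `status` object as a dict
    seconds_in_state -- how long the observer has seen this status unchanged
                        (hypothetical input, tracked by the observer)
    stuck_after      -- arbitrary threshold in seconds (assumption)
    """
    job_state = (status.get("jobStatus") or {}).get("state")
    has_error = bool(status.get("error"))

    if job_state in TERMINAL_STATES:
        return "terminal"
    if job_state == "RUNNING" and not has_error:
        return "healthy"
    # Null state, RECONCILING, or an error present: we cannot tell whether
    # this is transient, so fall back to a timeout.
    if seconds_in_state >= stuck_after:
        return "stuck"
    return "in-progress"


print(classify({"jobStatus": {"state": None}, "error": "{...}"}, seconds_in_state=900))
```

The timeout fallback is exactly the part that feels like guesswork, which is why a restart count or a clearer terminal state in the CR status would help.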
@Gyula Fóra since you recommended trying the SessionJob over the REST API, let me know your thoughts. Sorry for the direct tagging.
g
Please do not tag me on these new questions in the future, even if you believe that I can answer them :) I simply don't have the time to answer these questions all the time. When I have time, I will monitor the channel.
👍 1
There are a lot of very knowledgeable devs and users who can also answer in most cases
r
@Bhupendra Yadav I am also facing the same issue. Do you have any suggestions? How did you get past it?