Apache Flink

Hello, I'm looking to migrate the deployment method of a running job (A) from standalone to <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/|native Kubernetes deployment>. However, the job broke with multiple errors, such as "akka.framesize" being too small and Java running out of heap space. I managed to get job A back to a steady running state on standalone, but I'm at a standstill and worried that the job will break again if I retry the migration.

The job has a fairly large checkpoint state (~10 GB), and I cannot afford to dump/clean that state. I also noticed that the <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#upgrading-to-the-latest-connector-version|docs> say not to upgrade Flink and the Kafka connector at the same time. I missed this and upgraded both at once. I'm not sure whether this caused the issue, but I'm mentioning it as a possibility.

My question:
• What steps would best ensure a safe upgrade without the job going down again?