Nicholas Erasmus
09/19/2023, 9:22 AM
high-availability directory (which is where the reference to the last checkpoint is)
• doing kubectl delete flinkdeployment also deletes the contents in the high-availability directory
• doing kubectl delete deployment leads to the Flink operator recreating the job/task pods, which means they'll run in both Kubernetes clusters and lead to duplicates
How is one supposed to do this? It seems like an extremely common requirement. There is a very laborious process one can follow: suspend with a savepoint, copy the last savepoint, and redeploy the new job with a reference to that savepoint. This surely can't be the standard way of doing it.
Any help/advice would be appreciated.
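For context, the manual workflow described above corresponds roughly to the following sketch using the Flink Kubernetes Operator's FlinkDeployment resource. This is only an illustration: the name my-job, the image, the jar URI and the S3 paths are placeholders, not values from this thread.

```yaml
# Step 1 (old cluster): with upgradeMode: savepoint, setting state: suspended
# makes the operator stop the job with a final savepoint under state.savepoints.dir.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-job                                    # hypothetical name
spec:
  image: flink:1.17                               # illustrative
  flinkVersion: v1_17
  serviceAccount: flink
  flinkConfiguration:
    state.savepoints.dir: s3://flink/savepoints   # illustrative path
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/my-job.jar  # illustrative
    parallelism: 2
    upgradeMode: savepoint
    state: suspended                              # was "running"
---
# Step 2 (new cluster): deploy the same spec, but start the job from the
# savepoint copied over from the old cluster.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-job
spec:
  image: flink:1.17
  flinkVersion: v1_17
  serviceAccount: flink
  flinkConfiguration:
    state.savepoints.dir: s3://flink/savepoints
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/my-job.jar
    parallelism: 2
    upgradeMode: savepoint
    initialSavepointPath: s3://flink/savepoints/savepoint-xxxx  # path of the copied savepoint
    state: running
```

The savepoint produced by the suspend is typically recorded in the FlinkDeployment's status and also lands under state.savepoints.dir, which is what has to be copied across before applying the second manifest.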
Flaviu Cicio
09/19/2023, 10:43 AM

Nicholas Erasmus
09/19/2023, 10:46 AM
high-availability.type: kubernetes
and
high-availability.storageDir: s3://flink/recovery
The issue is that this directory in S3 is cleared out every time we do a suspend. So we haven't been able to find a way of stopping the Job while leaving the contents in the high-availability directory.
Hope that makes sense
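For completeness, a sketch of where those two keys usually sit in a FlinkDeployment managed by the operator, showing only the relevant portion of the spec (paths illustrative). If the job uses upgradeMode: last-state, the operator relies on exactly this HA metadata to restore the job from its last checkpoint, which is why losing the directory on suspend is the crux of the problem.

```yaml
spec:
  flinkConfiguration:
    high-availability.type: kubernetes                  # Kubernetes-based HA services
    high-availability.storageDir: s3://flink/recovery   # HA metadata, incl. the reference to the last checkpoint
    state.checkpoints.dir: s3://flink/checkpoints       # illustrative; where the checkpoint data itself lives
  job:
    upgradeMode: last-state   # restores from the checkpoint referenced in the HA metadata
```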