Hi Team, I am trying to scale my Flink application using Horizontal Pod Autoscaling (HPA). When resource usage crosses the configured threshold, the TaskManager pods are restarted.
My job consumes records from Hudi, processes them, and writes the results to a Kafka topic. However, whenever the job restarts, duplicate records show up in the sink. For context, my setup looks roughly like the sketch below.
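This is a simplified sketch, not my actual code: the bootstrap servers, topic name, and checkpoint interval are placeholders, and the Hudi source and processing logic are elided.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HudiToKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s; source offsets and operator state are
        // only persisted at these points.
        env.enableCheckpointing(60_000);

        // Hudi streaming source and my transformations are omitted here;
        // assume 'processed' is the resulting stream.
        DataStream<String> processed = readAndProcessFromHudi(env);

        // Kafka sink with the default AT_LEAST_ONCE guarantee, which
        // (as I understand it) permits duplicates after a restart.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("kafka:9092")                // placeholder
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic")                 // placeholder
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build();

        processed.sinkTo(sink);
        env.execute("hudi-to-kafka");
    }

    private static DataStream<String> readAndProcessFromHudi(StreamExecutionEnvironment env) {
        throw new UnsupportedOperationException("Hudi source elided in this sketch");
    }
}
```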
My question is as follows: if the Flink job restarts between two checkpoints, will it reprocess the records that were already processed (and already emitted to Kafka) after the last completed checkpoint?
Furthermore, since savepoints are also taken at an interval, does restoring from a savepoint likewise mean that records processed after that savepoint will be reprocessed?
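If reprocessing after a restore is indeed expected, I assume the fix on my side would be to switch the sink to exactly-once delivery, so that records written between checkpoints are only committed to Kafka once the checkpoint completes. Something like the sketch below is what I have in mind; the transactional-id prefix and timeout are made-up example values on my part:

```java
// Sketch only: EXACTLY_ONCE uses Kafka transactions, so the broker's
// transaction.max.timeout.ms must be >= the sink's transaction timeout.
KafkaSink<String> exactlyOnceSink = KafkaSink.<String>builder()
        .setBootstrapServers("kafka:9092")                        // placeholder
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")                         // placeholder
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .setTransactionalIdPrefix("hudi-to-kafka-txn")            // made-up prefix
        .setProperty("transaction.timeout.ms", "900000")          // 15 min, example value
        .build();
```

Would that be the recommended approach here, assuming downstream consumers read with isolation.level=read_committed?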