# troubleshooting
Hi Team, I am attempting to scale my Flink application using Horizontal Pod Autoscaling (HPA). When resource usage surpasses a predefined threshold, the task manager is restarted. My job consumes records from Hudi, processes them, and produces the results to a Kafka topic. However, when the job restarts, it generates duplicate records in the sink. My question is: if the Flink job restarts between two checkpoints, will it reprocess the records that were already processed after the last checkpoint? And since savepoints are also taken at intervals, does that mean records will similarly be reprocessed when restoring from a savepoint?
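(For context on the checkpoint part of the question: yes, on failure Flink rewinds to the last completed checkpoint, or to the savepoint it restores from, and replays everything processed after it. Whether those replays become visible duplicates depends on the sink: with an at-least-once sink they do, while a transactional sink only commits writes when the enclosing checkpoint completes. A minimal sketch of the usual approach with Flink's `KafkaSink` from `flink-connector-kafka`; the bootstrap servers, topic, transactional-id prefix, and timeout value below are placeholders, not anything from this thread:)

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExactlyOnceSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoints define the rollback point: on restart, Flink replays
        // every record processed after the last completed checkpoint.
        env.enableCheckpointing(60_000L); // e.g. every 60s

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("kafka:9092") // placeholder address
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic") // placeholder topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // EXACTLY_ONCE uses Kafka transactions: records written between
                // two checkpoints are committed only when the checkpoint
                // completes, so replayed records are aborted, not duplicated.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                // Required for EXACTLY_ONCE; should stay stable across restarts.
                .setTransactionalIdPrefix("my-job-sink") // placeholder prefix
                // Must not exceed the broker's transaction.max.timeout.ms.
                .setProperty("transaction.timeout.ms", "900000")
                .build();

        // Stand-in for the real Hudi source + processing pipeline.
        env.fromElements("a", "b", "c").sinkTo(sink);
        env.execute("exactly-once-sink-sketch");
    }
}
```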
Hi @Martijn Visser, any thoughts? Events processed between two checkpoints (or between two savepoints) are processed twice if the pod is killed and restarted. Appreciate any inputs. Thanks
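(One caveat worth noting, as an assumption about the downstream setup rather than something stated in this thread: even with a transactional sink, Kafka consumers only skip the aborted replays if they read with `isolation.level=read_committed`; the default `read_uncommitted` still surfaces them as duplicates. A small sketch with a plain Kafka consumer; servers, group id, and topic are placeholders:)

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "downstream-group");    // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Without read_committed, the consumer also sees records from
        // transactions that Flink aborted after a restart (i.e. duplicates).
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("output-topic")); // placeholder topic
            consumer.poll(Duration.ofSeconds(1)).forEach(r -> System.out.println(r.value()));
        }
    }
}
```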
I don't know if Hudi supports exactly once, so I can't answer that question
Thanks @Martijn Visser, checking with Apache Hudi - #general-hudi-question