# troubleshooting
z
I am facing the same issue as described in this Stack Overflow post. Does anyone know what might cause this? Flink 1.17, Java 11, Scala 2.12.17 https://stackoverflow.com/questions/75940903/flink-kafkasink-takes-long-time-to-start
m
Flink doesn’t support Scala 2.12.17
Is this also a new job that you are starting, or restarting from a savepoint?
z
From a savepoint. I think it is due to some corrupted state of Kafka transactions. After I cleared the state by removing eos and re-enabling it, the job is running fine now.
m
How old was the savepoint that you tried to recover from?
z
It was pretty recent. It started to happen after an upgrade. I use flink-k8s-operator to handle the upgrade process.
so it happens again after an upgrade.
t
@Zhong Chen I believe the long startup time is caused by a loop at restore time that attempts to abort lingering transactions from previous executions of the job. Flink’s KafkaSink currently doesn’t have a good way to query Kafka directly and list lingering transactions, so instead it has to iterate through all possible transaction IDs and try to abort each one. We’re collaborating with folks on the Kafka side to see if we can improve this.
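To make the cost of that loop concrete, here is a rough sketch in plain Java. It is not Flink's actual code: the class name, the `prefix-subtask-checkpoint` ID layout, and the in-memory set standing in for the Kafka broker are all assumptions for illustration. The point is only that every candidate ID gets probed, so the work grows with parallelism times the range of checkpoint IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LingeringTxnSketch {

    // Hypothetical ID layout, assumed for illustration only.
    static String transactionalId(String prefix, int subtaskId, long checkpointId) {
        return prefix + "-" + subtaskId + "-" + checkpointId;
    }

    // Brute-force enumeration: build every candidate ID and try to abort it.
    // openOnBroker is an in-memory stand-in for transactions still open in Kafka;
    // in a real job each probe would be a network round trip.
    static List<String> abortLingering(String prefix, int parallelism,
                                       long firstCheckpoint, long lastCheckpoint,
                                       Set<String> openOnBroker) {
        List<String> probed = new ArrayList<>();
        for (int subtask = 0; subtask < parallelism; subtask++) {
            for (long cp = firstCheckpoint; cp <= lastCheckpoint; cp++) {
                String id = transactionalId(prefix, subtask, cp);
                probed.add(id);          // we cannot know in advance whether this ID is in use
                openOnBroker.remove(id); // "abort" it if it happens to be open
            }
        }
        return probed;
    }

    public static void main(String[] args) {
        Set<String> open = new HashSet<>();
        open.add(transactionalId("my-sink", 3, 42)); // one lingering transaction

        List<String> probed = abortLingering("my-sink", 8, 1, 100, open);
        System.out.println("probed " + probed.size() + " IDs, still open: " + open.size());
    }
}
```

With 8 subtasks and 100 candidate checkpoint IDs, this probes 800 IDs to clean up a single lingering transaction, which is why being able to list open transactions on the Kafka side would help.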
just curious: did you start seeing the long startup times only after upgrading to 1.17? Or was this observed in older versions as well?
z
After upgrading to 1.17, I believe.
I am just starting to use Flink, so I can’t confidently say it only happens on 1.17. I only briefly used 1.16. Since I noticed that autoscaling is only available for 1.17, I decided to use the latest version.