Zhong Chen

05/24/2023, 10:56 PM
I am facing the same issue as described in this Stack Overflow post. Does anyone know what might cause this? Flink 1.17, Java 11, Scala 2.12.17 https://stackoverflow.com/questions/75940903/flink-kafkasink-takes-long-time-to-start

Martijn Visser

05/25/2023, 4:57 AM
Flink doesn’t support Scala 2.12.17
Is this also a new job that you are starting, or restarting from a savepoint?

Zhong Chen

05/25/2023, 5:55 AM
From a savepoint. I think it is due to some corrupted state in the Kafka transactions. After I cleared the state by removing `eos` and re-enabling it, the job is running fine now.

Martijn Visser

05/25/2023, 8:03 AM
How old was the savepoint that you tried to recover from?

Zhong Chen

05/25/2023, 4:02 PM
It was pretty recent. It started to happen after an upgrade. I use `flink-k8s-operator` to handle the upgrade process.
So it happens again after an upgrade.

Tzu-Li (Gordon) Tai

05/25/2023, 9:05 PM
@Zhong Chen I believe the long startup time is caused by a loop at restore time that attempts to abort lingering transactions from previous executions of the job. Flink’s KafkaSink currently doesn’t have a good way to query Kafka directly and list lingering transactions, so instead it has to iterate through all possible transactional IDs and try to abort each one. We’re collaborating with folks on the Kafka side to see if we can improve this.
Just curious: did you start seeing the long startup times only after upgrading to 1.17? Or was this observed in older versions as well?
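To illustrate why that restore-time loop can get slow, here is a minimal sketch that only enumerates candidate transactional IDs. It is not the actual KafkaSink code: the class name, the `prefix-subtask-checkpointOffset` ID layout, and the `checkpointsToProbe` bound are all assumptions made for illustration. The point is that the number of abort attempts grows with parallelism times the number of checkpoint offsets probed.

```java
import java.util.ArrayList;
import java.util.List;

public class AbortLoopSketch {

    // Hypothetical ID layout; the real KafkaSink naming scheme may differ.
    public static String transactionalId(String prefix, int subtask, long checkpointOffset) {
        return prefix + "-" + subtask + "-" + checkpointOffset;
    }

    // Enumerates every ID that a restore would have to probe when it cannot
    // ask the broker which transactions are actually still open.
    public static List<String> candidateIds(String prefix, int parallelism, long checkpointsToProbe) {
        List<String> ids = new ArrayList<>();
        for (int subtask = 0; subtask < parallelism; subtask++) {
            for (long offset = 0; offset < checkpointsToProbe; offset++) {
                ids.add(transactionalId(prefix, subtask, offset));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // In a real sink each candidate would cost a round trip to Kafka
        // (initialize a producer with that ID to fence and abort the old
        // transaction); here we only count the candidates.
        List<String> ids = candidateIds("my-sink", 4, 3);
        System.out.println(ids.size() + " candidate IDs, first: " + ids.get(0));
    }
}
```

If each attempt involves a broker round trip, a job with high parallelism can spend a noticeable amount of time here before it starts processing records.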

Zhong Chen

05/25/2023, 9:18 PM
After upgrading to 1.17, I believe.
I am just starting to use Flink, so I can’t confidently say it only happens on 1.17. I only briefly used 1.16. Since I noticed that autoscaling is only available on 1.17, I decided to use the latest version.