Hey everyone, using Flink 1.17.1, currently encountering an issue (Apache Flink #troubleshooting)

Or Keren

06/08/2023, 11:11 AM

Hey everyone. Using Flink 1.17.1, I'm currently encountering an issue where I use a KafkaSink with the EXACTLY_ONCE delivery guarantee and a KafkaSource with isolation.level="read_committed". With "read_uncommitted" I can see messages in the consumer, but with "read_committed" I don't get any messages. The only explanation I can think of is that the producer does not commit the transactions, and all of them get aborted even though the checkpoint completed (this is a theory, I have no way of confirming it). Does anyone have any idea what I can do to fix this?
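For readers following along, here is a toy model of why the two isolation levels can show different results. It is not Kafka client code: the `Record` type and the `visible` function are hypothetical names made up for illustration. The model captures only two documented consumer behaviours: with `read_committed`, records from aborted transactions are skipped, and the consumer does not read past the first transaction that is still open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    value: str
    txn_state: str  # "committed", "aborted", or "open" (hypothetical labels)


def visible(log: List[Record], isolation_level: str) -> List[str]:
    """Return the values a consumer would see in this toy model."""
    if isolation_level == "read_uncommitted":
        # Everything in the log is returned, including aborted records.
        return [r.value for r in log]
    if isolation_level != "read_committed":
        raise ValueError(f"unknown isolation level: {isolation_level}")
    seen = []
    for r in log:
        if r.txn_state == "open":
            break  # cannot read past the first still-open transaction
        if r.txn_state == "committed":
            seen.append(r.value)  # aborted records are filtered out
    return seen


# Hypothetical log in which every transaction ended up aborted.
all_aborted = [Record("a", "aborted"), Record("b", "aborted")]
print(visible(all_aborted, "read_uncommitted"))  # ['a', 'b']
print(visible(all_aborted, "read_committed"))    # []
```

In this model, the symptom described above (messages with `read_uncommitted`, nothing with `read_committed`) appears whenever no transaction has been committed, which is consistent with the theory in the question but does not prove it.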

Martijn Visser

06/08/2023, 11:45 AM

If the checkpoint has completed successfully, all of the KafkaSink transactions are committed as part of the 2-phase commit protocol in Flink

Or Keren

06/08/2023, 11:47 AM

The checkpoint has completed successfully, but I get this error message in the logs:

[2:26 PM] org.apache.flink.connector.kafka.sink.KafkaCommitter - Unable to commit transaction (org.apache.flink.streaming.runtime.operators.sink.committables.CommitRequestImpl@7df51821) because it's in an invalid state. Most likely the transaction has been aborted for some reason. Please check the Kafka logs for more details.

I don't see any errors in the Kafka logs as suggested by this error.

Martijn Visser

06/08/2023, 11:48 AM

That sounds like the Kafka transaction timeout window has passed

Martijn Visser

06/08/2023, 11:48 AM

From the Kafka broker perspective of things

Or Keren

06/08/2023, 11:49 AM

Thought so as well, but it was set to 15 minutes, and the checkpoint completes in a matter of ms

Or Keren

06/08/2023, 11:50 AM

Or Keren

06/08/2023, 11:51 AM

I know that this configuration should work, because when I tried a bigger number it crashed due to the max timeout allowed by the Confluent broker
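The crash mentioned here matches how Kafka caps transaction timeouts: the producer's `transaction.timeout.ms` may not exceed the broker's `transaction.max.timeout.ms`, whose Kafka default is 15 minutes. A minimal sketch of a pre-flight check follows. The function name `check_producer_timeout` is hypothetical, and whether the Confluent broker in this thread uses the default cap is an assumption.

```python
# Kafka's default for the broker setting transaction.max.timeout.ms (15 minutes).
# Assumption: the broker in question has not overridden it.
BROKER_DEFAULT_MAX_TXN_TIMEOUT_MS = 15 * 60 * 1000


def check_producer_timeout(
    transaction_timeout_ms: int,
    broker_max_timeout_ms: int = BROKER_DEFAULT_MAX_TXN_TIMEOUT_MS,
) -> None:
    """Raise ValueError if the producer timeout would be rejected by the broker cap."""
    if transaction_timeout_ms <= 0:
        raise ValueError("transaction.timeout.ms must be positive")
    if transaction_timeout_ms > broker_max_timeout_ms:
        raise ValueError(
            f"transaction.timeout.ms={transaction_timeout_ms} exceeds the broker's "
            f"transaction.max.timeout.ms={broker_max_timeout_ms}"
        )


# Producer properties as they might be passed to the sink (15 minutes, as in the thread).
producer_props = {"transaction.timeout.ms": str(15 * 60 * 1000)}
check_producer_timeout(int(producer_props["transaction.timeout.ms"]))  # passes silently
```

Raising the producer timeout beyond 15 minutes would therefore also require raising `transaction.max.timeout.ms` on the broker side.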

Martijn Visser

06/08/2023, 12:02 PM

Can you verify that your KafkaSink is set up as explained at https://docs.immerok.cloud/docs/how-to-guides/development/exactly-once-with-apache-kafka-and-apache-flink/ ?

Or Keren

06/08/2023, 5:07 PM

It seems that the checkpoints took too long. When checkpoints take around 5 minutes, the transaction reaches its timeout (which is 15 minutes) for some reason

Or Keren

06/08/2023, 5:08 PM

I will try to reduce the checkpoint time, but it seems that this shouldn't happen when the checkpoint duration is 10 minutes shorter than the transaction timeout
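One possible explanation, offered as an assumption that is not confirmed anywhere in this thread: the sink's transaction may be open for longer than the checkpoint itself takes, because records written between two checkpoints already belong to the transaction that the next checkpoint commits. Under that assumption, the transaction's age at commit time is roughly the checkpoint interval plus the checkpoint duration. The sketch below works through the arithmetic. The 10-minute interval is a hypothetical value (the thread does not state it), and both function names are made up for illustration.

```python
from datetime import timedelta


def transaction_age_at_commit(
    checkpoint_interval: timedelta,
    checkpoint_duration: timedelta,
    commit_delay: timedelta = timedelta(0),
) -> timedelta:
    """Rough upper bound on how long a transaction stays open (assumed model)."""
    return checkpoint_interval + checkpoint_duration + commit_delay


def fits_in_timeout(age: timedelta, transaction_timeout: timedelta) -> bool:
    """True if the commit would arrive before the broker-side timeout expires."""
    return age < transaction_timeout


timeout = timedelta(minutes=15)   # transaction timeout, from the thread
duration = timedelta(minutes=5)   # observed checkpoint duration, from the thread
interval = timedelta(minutes=10)  # hypothetical checkpoint interval

age = transaction_age_at_commit(interval, duration)
print(age, fits_in_timeout(age, timeout))  # 0:15:00 False
```

If this model applies, comparing the checkpoint duration alone against the timeout underestimates the risk. The budget to check is interval plus duration plus any delay before the commit is issued, which is why leaving generous headroom between that sum and the transaction timeout is the safer setup.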

Martijn Visser

06/09/2023, 1:02 PM

Are you using aligned checkpoints? You can consider enabling unaligned checkpoints

Martijn Visser

06/09/2023, 1:02 PM

Perhaps there's data skew, causing checkpoints to fail

Or Keren

06/09/2023, 1:15 PM

Do you have any suggestions on which metrics to look at and how to act on each of them? Thanks for your replies!
