# troubleshooting
a
Hi folks, I have a question about the number of task managers available versus the number of topic partitions on Kafka sources. Is it a problem to have more task managers than partitions on my several sources? I'm using Flink 1.17.0, and these sources don't have `withIdleness` enabled on their event-time-based watermark strategies.
m
d
If you have instances of the KafkaSource without partitions assigned to them, they will never advance their watermarks, which causes problems with any event-time logic in your job. You'll need to either
• use `withIdleness`
• decrease the parallelism of the source to match the # of partitions
• implement a custom watermark strategy that somehow works around this problem
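To make the stalled-watermark problem concrete, here is a toy model in plain Java. It is not Flink code: `WatermarkModel` and `combinedWatermark` are made-up names for illustration, and the model assumes a downstream operator takes the minimum watermark over all inputs that are not marked idle. In a real job the idleness marking would come from calling `withIdleness(Duration)` on the `WatermarkStrategy`.

```java
import java.util.OptionalLong;

/** Toy model (not Flink code) of how a downstream operator combines input watermarks. */
final class WatermarkModel {

    /** Watermark of a source subtask that has not emitted any watermark yet. */
    static final long NO_WATERMARK = Long.MIN_VALUE;

    /**
     * Combined watermark = minimum over all inputs that are not marked idle.
     * Returns empty if every input is idle.
     */
    static OptionalLong combinedWatermark(long[] watermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (idle[i]) {
                continue; // idle inputs are ignored when taking the minimum
            }
            anyActive = true;
            min = Math.min(min, watermarks[i]);
        }
        return anyActive ? OptionalLong.of(min) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Assumed scenario: 4 partitions, 6 source subtasks.
        // Subtasks 4 and 5 never get a partition, so they never emit a watermark.
        long[] watermarks = {1_000L, 1_200L, 900L, 1_100L, NO_WATERMARK, NO_WATERMARK};

        // Without idleness detection the empty subtasks hold the minimum back.
        boolean[] nobodyIdle = new boolean[6];
        System.out.println("no idleness:   " + combinedWatermark(watermarks, nobodyIdle));

        // With the empty subtasks marked idle, the minimum is taken over the active four.
        boolean[] emptyMarkedIdle = {false, false, false, false, true, true};
        System.out.println("with idleness: " + combinedWatermark(watermarks, emptyMarkedIdle));
    }
}
```

In this model the first call stays at `Long.MIN_VALUE` and the second advances to 900, which is why event-time windows and timers downstream would never fire in the first case.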
a
Got it. Thanks for the confirmations @Martijn Visser and @David Anderson
Following up: does it make sense to increase parallelism (and thus the number of task managers) and introduce `withIdleness` to spread the task load among the available TMs? I understand that the source tasks wouldn't benefit from the extra TMs (because the number of partitions is an upper bound), but the tasks of the inner operators would. Is that correct?
d
Yes, that can be a good idea.
a
Follow-up question: each task only has to have one partition assigned to it, correct? If a source consumes from 20 topics, each with 4 partitions, in theory we could have up to 80 units of parallelism. Is that a fair assumption?
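The arithmetic behind that estimate, as a rough sketch in plain Java. The class and method names are hypothetical, and the only assumption is that each partition is read by exactly one source subtask, so the total partition count caps the useful source parallelism. It says nothing about how the partitions actually get distributed across subtasks, which may be uneven.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Back-of-the-envelope helper (hypothetical, not part of Flink). */
final class SourceParallelismBound {

    /** Upper bound on useful source parallelism: the sum of partitions over all topics. */
    static int maxUsefulParallelism(Map<String, Integer> partitionsPerTopic) {
        int total = 0;
        for (int partitions : partitionsPerTopic.values()) {
            total += partitions;
        }
        return total;
    }

    /** At least this many source subtasks end up with no partition at all. */
    static int minSubtasksWithoutPartitions(int sourceParallelism, int totalPartitions) {
        return Math.max(0, sourceParallelism - totalPartitions);
    }

    public static void main(String[] args) {
        // 20 topics with 4 partitions each, as in the question; topic names are made up.
        Map<String, Integer> partitionsPerTopic = new LinkedHashMap<>();
        for (int i = 0; i < 20; i++) {
            partitionsPerTopic.put("topic-" + i, 4);
        }

        int bound = maxUsefulParallelism(partitionsPerTopic);
        System.out.println("max useful source parallelism: " + bound);
        System.out.println("empty subtasks at parallelism 100: "
                + minSubtasksWithoutPartitions(100, bound));
    }
}
```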