Honza Drazil
09/08/2023, 8:06 AM
I noticed that the Kafka sink creates a new transactional.id for every checkpoint, and I'm finding it challenging to comprehend the purpose of this design. Why isn't there a single transaction (per task) for writing new data and multiple transactions for finalizing old ones (to support multiple concurrent checkpoints)?
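For context on what a new transactional.id per checkpoint means in practice: Flink's exactly-once Kafka sink derives a fresh id per subtask and checkpoint from a user-supplied prefix. A minimal sketch of that scheme, assuming the prefix-subtaskId-checkpointId shape the connector documents (the class and helper names here are hypothetical):

```java
// Sketch of the per-checkpoint transactional.id scheme. The helper name is
// hypothetical; the "prefix-subtaskId-checkpointId" shape follows the pattern
// used by Flink's Kafka connector for exactly-once writes.
public class TransactionalIds {
    static String transactionalId(String prefix, int subtaskId, long checkpointId) {
        return prefix + "-" + subtaskId + "-" + checkpointId;
    }
    // transactionalId("my-app", 3, 42) -> "my-app-3-42"
    // Because the checkpoint id is baked into the name, every checkpoint
    // registers a brand-new transactional.id with the transaction coordinator.
}
```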
My concern arises from the fact that I have a producer writing to 100 topics, each with an average of 20 partitions. In this setup, a single transaction record in Kafka's transaction coordinator amounts to approximately 1 MB. With 32 tasks and a checkpoint occurring every 2 minutes, that is 32 × 30 = 960 new transaction records per hour, i.e. nearly 1 GB of required RAM per hour on a broker. Kafka typically retains transactions for 7 days by default, implying a need for approximately 160 GB just for transaction mapping.
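The arithmetic behind those figures, as a back-of-the-envelope check (all inputs come from the message above; the ~1 MiB per record is the poster's own estimate for the ~2,000 topic-partitions involved):

```java
public class TxnMemoryEstimate {
    public static void main(String[] args) {
        long bytesPerTxnRecord = 1L << 20; // ~1 MiB per transactional.id (poster's estimate)
        long tasks = 32;                   // sink parallelism
        long checkpointsPerHour = 60 / 2;  // one checkpoint every 2 minutes -> 30 per hour
        long retentionHours = 7 * 24;      // transactional.id.expiration.ms default: 7 days

        long perHour = tasks * checkpointsPerHour * bytesPerTxnRecord;
        long retained = perHour * retentionHours;

        System.out.printf("per hour: ~%d MiB%n", perHour >> 20);  // ~960 MiB
        System.out.printf("retained: ~%d GiB%n", retained >> 30); // ~157 GiB
    }
}
```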
After understanding how transactions work internally in Kafka and how the Kafka sink implements its exactly-once delivery guarantee, I still cannot figure out the reasoning behind using so many transactions.
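For readers following along, this is roughly where those transactions come from: a minimal exactly-once sink configuration, assuming the KafkaSink builder API from flink-connector-kafka (the broker address, topic, and prefix are placeholders):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class ExactlyOnceSinkSketch {
    static KafkaSink<String> buildSink() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("broker:9092")        // placeholder address
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic")          // placeholder topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                // Required for EXACTLY_ONCE; per-checkpoint transactional.ids
                // are derived from this prefix.
                .setTransactionalIdPrefix("my-app")
                // Must stay within the broker's transaction.max.timeout.ms and
                // should comfortably exceed the checkpoint interval.
                .setProperty("transaction.timeout.ms", "900000")
                .build();
    }
}
```

With EXACTLY_ONCE, the sink opens a transaction per subtask per checkpoint and commits it once the checkpoint completes, which is exactly the many-transactions behavior being questioned here.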
Martijn Visser
09/08/2023, 8:08 AM
> Kafka typically retains transactions for 7 days by default
That's not the case, the default is 15 mins: https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html#transaction-max-timeout-ms
Martijn Visser
09/08/2023, 8:10 AM

Honza Drazil
09/08/2023, 8:11 AM
transactional.id.expiration.ms
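These are two different settings, which is the crux of the disagreement above: transaction.max.timeout.ms caps how long a single transaction may stay open (default 15 minutes), while transactional.id.expiration.ms controls how long the coordinator keeps state for an idle transactional.id (default 7 days), and the latter is what drives the memory figures earlier in the thread. A sketch for inspecting both on a broker, assuming Kafka's AdminClient, a placeholder bootstrap address, and broker id 0:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Properties;
import java.util.Set;

public class ShowTxnConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config cfg = admin.describeConfigs(Set.of(broker)).all().get().get(broker);
            // Upper bound on how long one transaction may remain open (default 15 min).
            System.out.println(cfg.get("transaction.max.timeout.ms"));
            // How long the coordinator remembers an idle transactional.id (default 7 days).
            System.out.println(cfg.get("transactional.id.expiration.ms"));
        }
    }
}
```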
Honza Drazil
09/08/2023, 8:12 AM

Honza Drazil
09/08/2023, 8:13 AM
After we lowered transactional.id.expiration.ms to a single day, the brokers released all the memory almost instantly.

Martijn Visser
09/08/2023, 8:14 AM

Honza Drazil
09/08/2023, 9:03 AM