Kai Sun
12/01/2023, 8:52 PMpersistExecutor
is 0, which makes it a basically a SynchronizedQueue. The code of ingestion path in SeekableStreamIndexTaskRunner.runInternal()
is async with this push path running in the persistExecutor
thread pool if the queue size is greater than 0. When the queue size is 0, it is synced. Basically, when persist action happens, the ingestion path is blocked.
The question is this:
If the queue size is 0, the restore logic in SeekableStreamIndexTaskRunner.runInternal()
would keep the exact once semantic of ingestion of Kafka record.
If the queue size is greater than 0, the restore logic in SeekableStreamIndexTaskRunner.runInternal()
would potentially ingest some Kafka record more than once?
If my understanding is right, tuning the queue size of persistExecutor
thread pool has very important semantic difference in terms of ingestion duplication. It would be great to note this in the config docs,
The current config doc states the following for maxPendingPersists
parameter which is the queue size of the thread pool.
This seems not be accurate. In fact, it should be stated as:no (default = 0, meaning one persist can be running concurrently with ingestion, and none can be queued up)
no (default = 0, meaning persist would block the ingestion. This would ensure ingestion task recovery would have no duplication of Kafka record ingested.)