# general
t
Hi everyone, I am new to this community as well! Do you have a set of best practices for running real-time queries? I went through one of the videos on YouTube explaining the architecture of Pinot and its use cases. In the example, the presenter uses a Kafka topic as the source and runs queries on the real-time table. I am wondering whether idempotency is built into these tables, or whether we need to write the queries in a way that accounts for idempotency, accuracy of the metrics, quality checks, etc. What happens if I reset the offset for the consumer, or the source publishes events twice or thrice?
k
Idempotency is built in at the event level, i.e. Pinot will ensure that each message is ingested only once. But if the source publishes an event twice, then Pinot will also consume it twice. You can use the upsert feature in Pinot to dedup. It's a bit heavyweight, but we are in the process of adding support for dedup shortly.
❤️ 1
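To make the distinction in the reply above concrete, here is a toy Python sketch. It is not Pinot code and does not reflect how Pinot tracks offsets internally; the event shape `(offset, primary_key, value)` and both function names are hypothetical. It only illustrates why consuming each message once does not remove duplicates that the source itself published, and how keeping the latest row per primary key (the idea behind an upsert) does.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (offset, primary_key, value).
Event = Tuple[int, str, int]


def ingest_once_per_offset(events: List[Event]) -> List[Event]:
    """Consume each stream offset at most once; redelivered offsets are skipped."""
    seen_offsets = set()
    rows = []
    for offset, key, value in events:
        if offset in seen_offsets:
            continue  # same message delivered again, e.g. after a restart
        seen_offsets.add(offset)
        rows.append((offset, key, value))
    return rows


def upsert_by_key(rows: List[Event]) -> Dict[str, int]:
    """Keep only the latest value per primary key (upsert-like behaviour)."""
    latest: Dict[str, int] = {}
    for _offset, key, value in rows:
        latest[key] = value
    return latest


# The source publishes key "a" twice (offsets 0 and 1); offset 2 is redelivered.
stream = [(0, "a", 10), (1, "a", 10), (2, "b", 5), (2, "b", 5)]

rows = ingest_once_per_offset(stream)
naive_total = sum(value for _, _, value in rows)    # "a" is counted twice
deduped_total = sum(upsert_by_key(rows).values())   # each key is counted once
```

In this sketch the redelivered offset 2 is dropped by the once-per-offset step, but the two distinct messages for key "a" both survive it, so only the key-based step removes that duplicate.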
t
Thanks @Kishore G !
y
Welcome @Tanay Karmarkar 👋
👋 1
h
@Kishore G - Is there an ETA for the dedup feature? Also, how would dedup differ from upsert? We have been testing upsert to handle duplicate events coming in on a stream.