Is there anything similar to a WAL or CDC stream i...
# getting-started
p
Is there anything similar to a WAL or CDC stream in Pinot? If I want to trigger events related to a record when it lands (for example, trigger an aggregation query to get the latest state), is there a way to be notified by Pinot? The alternative is to listen to the same event stream that feeds Pinot and add a delay equal to how long the event takes to be ingested by Pinot.
b
Are you looking to trigger events when any record is added or filter for something?
p
Correct. If I have a service that is interested in events of type ‘x’, I want a source to watch that will tell me a new event of type ‘x’ has been added, and it can query the new state.
m
I don’t think so. But if you know the offset of the record in, e.g., Kafka, you can see when Pinot has ingested it via https://startree.ai/blog/apache-pinot-tm-0-12-consumer-record-lag
j
@Phil Sheets Can you share the use case behind this need? It seems more akin to what a transactional system would offer than an OLAP one.
p
Thinking about 2 different use cases:
1. Event-driven dashboard updates. If we have a dashboard tracking events of type ‘x’, we could trigger a refresh every time a new type ‘x’ event lands.
2. Pre-caching. We want very tight latency SLAs on certain metrics (5-10ms). Having an event log would allow us to update caches ahead of time.
j
For use case 1: could a max(timestamp) query every 1-2 seconds help? Basically, query to see whether a new event exists, then refresh.
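A minimal sketch of that polling idea, for illustration only. It assumes a broker reachable at http://localhost:8099 and a hypothetical table `events` with columns `ts` and `eventType`; none of those names come from this thread. The query goes to the broker's /query/sql endpoint, and the detector only remembers the largest timestamp it has seen so far.

```python
import json
import math
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical table and column names; adjust to your schema.
POLL_SQL = "SELECT MAX(ts) FROM events WHERE eventType = 'x'"


def query_broker(sql: str, broker_url: str = "http://localhost:8099") -> dict:
    """POST a SQL query to the Pinot broker and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{broker_url}/query/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def latest_timestamp(response: dict) -> Optional[float]:
    """Return the single aggregated value in the response, or None if unusable."""
    rows = (response.get("resultTable") or {}).get("rows") or []
    if not rows or rows[0][0] is None:
        return None
    value = float(rows[0][0])
    # An empty match may come back as a non-finite value; treat it as "no data".
    return value if math.isfinite(value) else None


class NewEventDetector:
    """Remembers the highest timestamp seen and reports when it advances."""

    def __init__(self) -> None:
        self.last_seen: Optional[float] = None

    def has_new_event(self, response: dict) -> bool:
        latest = latest_timestamp(response)
        if latest is None:
            return False
        if self.last_seen is None or latest > self.last_seen:
            self.last_seen = latest  # first observation also counts as "new"
            return True
        return False


def watch(refresh: Callable[[], None], interval_s: float = 1.0) -> None:
    """Poll forever, calling refresh() whenever max(ts) moves forward."""
    detector = NewEventDetector()
    while True:
        if detector.has_new_event(query_broker(POLL_SQL)):
            refresh()
        time.sleep(interval_s)
```

Calling `watch(my_dashboard_refresh)` would run the loop; `my_dashboard_refresh` is a placeholder for whatever triggers the update. Note that the first poll reports a new event, which doubles as an initial refresh.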
p
I think there is a solution using that idea + watching the source Kafka topic.
• Subscribe to the Kafka topic and filter on the event type I care about.
• When a new event arrives, start polling Pinot until I see the eventId or timestamp.
• Trigger the update.
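The middle step above (poll Pinot until the event shows up) could be sketched roughly as follows. The Kafka subscription is deliberately left out; `fetch_latest_ts` is a hypothetical callable that runs a cheap query such as max(timestamp) against Pinot and returns the result. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional


def wait_until_ingested(
    event_ts: float,
    fetch_latest_ts: Callable[[], Optional[float]],
    timeout_s: float = 5.0,
    interval_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until Pinot reports a timestamp >= event_ts, or give up at the timeout.

    Returns True if the event (or something newer) is visible, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        latest = fetch_latest_ts()
        if latest is not None and latest >= event_ts:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A Kafka consumer callback would call `wait_until_ingested(event_ts, fetch)` for each matching event and trigger the update when it returns True. Comparing timestamps assumes events become visible roughly in timestamp order; with several partitions a newer event can land first, so checking for the specific eventId is the stricter variant.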
j
The above process would take about 1-2 seconds in total, I think. Would it not be easier to preemptively refresh those specific metrics every 1-2 seconds?
p
That is a good point.
👍 1
m
Shouldn’t you also just be able to use minConsumingFreshnessTimeMs with a very simple, fast query? Just an idea I’m throwing out there
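A rough sketch of that idea: run any cheap query (for example a COUNT(*) on a hypothetical `events` table), then read `minConsumingFreshnessTimeMs` from the broker's JSON response and compare it with the time of the event you are waiting for. This assumes the field is comparable to the event's timestamp in epoch milliseconds, and it treats a missing or non-positive value as "unknown"; both are assumptions to verify against your Pinot version.

```python
from typing import Optional


def consuming_freshness_ms(response: dict) -> Optional[int]:
    """Read minConsumingFreshnessTimeMs from a broker response.

    Returns None when the field is absent or non-positive, which this sketch
    assumes means no consuming segment contributed to the query.
    """
    value = response.get("minConsumingFreshnessTimeMs")
    if value is None:
        return None
    value = int(value)
    return value if value > 0 else None


def has_caught_up(response: dict, event_time_ms: int) -> bool:
    """True if the consuming segments are at least as fresh as the event."""
    freshness = consuming_freshness_ms(response)
    return freshness is not None and freshness >= event_time_ms
```

The response dict here is whatever the broker returns for the cheap query, so this check adds no extra round trip beyond the query itself.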
👍 3
k
This has come up a few times. Could you file a GitHub issue? We can consider implementing it if there is enough interest.
👍 1