# troubleshooting
j
Hello all ^^ I was looking through the Slack history for an answer to my question and couldn't find one, so here I go.
We're running standalone batch integrations (apache/pinot Docker image in a WF), and we're seeing queries hit Pinot before the integrated data is available ("stale data").
My use case: we integrate data, fire off a Kafka event once the pinot-admin step has finished, then query Pinot, and that's where we see stale data.
Is there any way to:
• Have ./bin/pinot-admin.sh LaunchDataIngestionJob wait until the data is fully queryable?
• Have Pinot somehow notify us when the data becomes fully queryable?
NOTE: the job type is SegmentCreationAndTarPush.
m
Is this for production or for testing? If for testing, you probably have some options:
1. Wait for a fixed time (might be a bit brittle).
2. Wait for IS == EV, i.e. until the table's ideal state matches its external view (might need to write some checks for this).
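A minimal sketch of what the "IS == EV" check in option 2 could look like, in Python. It assumes the controller exposes the table's ideal state and external view at /tables/{table}/idealstate and /tables/{table}/externalview, and that the response wraps a segment -> {server -> state} map under an "OFFLINE" key; both are assumptions to verify against your controller's Swagger UI. The function names, the `expected_segment` guard, and the timeouts are made up for illustration.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional

# segment name -> {server instance -> state}
SegmentMap = Dict[str, Dict[str, str]]


def is_converged(ideal: SegmentMap, external: SegmentMap) -> bool:
    """True when every replica listed in the ideal state reports the same state in the external view."""
    for segment, replicas in ideal.items():
        served = external.get(segment, {})
        for instance, state in replicas.items():
            if served.get(instance) != state:
                return False
    return True


def fetch_view(controller: str, table: str, view: str) -> SegmentMap:
    """Fetch 'idealstate' or 'externalview' for an offline table.

    The URL layout and the "OFFLINE" wrapper key are assumptions, not confirmed in this thread.
    """
    url = f"{controller}/tables/{table}/{view}?tableType=OFFLINE"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("OFFLINE") or {}


def wait_until_converged(
    fetch: Callable[[str], SegmentMap],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    expected_segment: Optional[str] = None,
) -> bool:
    """Poll until the two views agree; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        ideal = fetch("idealstate")
        external = fetch("externalview")
        # Optionally require that the newly pushed segment is already registered.
        registered = expected_segment is None or expected_segment in ideal
        if registered and is_converged(ideal, external):
            return True
        time.sleep(interval_s)
    return False
```

It could be called as `wait_until_converged(lambda view: fetch_view("http://localhost:9000", "myTable", view))`, where the controller address and table name are placeholders. Note that this only compares segment states, so it may not detect a segment that is being replaced under the same name and is still reloading.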
j
It is for production
m
In production, what does it mean to be fully queryable? You will have data constantly being pushed, right?
j
We've got batch data coming in, and a materialized view that needs to be updated to take the newly integrated data into account.
m
So you need atomic push? If so, @User added this feature?
s
@User the building blocks are there but we need to implement the client. Also, we support REFRESH only.
j
"So you need atomic push?"
Not quite. The flow is: data -> Pinot job -> Kafka message saying "new data available" -> another service queries a service backed by Pinot (which is "stale" until ingestion / indexing completes).
s
@User If you have a realtime table, you won’t have this staleness, since the data is updated in near real time whenever new data is ingested. You will only have this issue with an offline-only table. We currently don’t provide a way to notify that new data is available. A generic way to provide this functionality would be an interface through which the user supplies a function that is executed after the offline ingestion. Feel free to file an issue on GitHub for the feature request.
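Until something like that exists in Pinot, the same idea could be approximated on the client side. Below is a rough Python sketch, not a Pinot feature: a wrapper runs the ingestion command, then a caller-supplied readiness check, and only then a caller-supplied notification (for example, publishing the Kafka event). `run_then_notify`, `wait_ready`, and `notify` are hypothetical names introduced for illustration.

```python
import subprocess
from typing import Callable, Sequence


def run_then_notify(
    command: Sequence[str],
    wait_ready: Callable[[], bool],
    notify: Callable[[], None],
) -> bool:
    """Run the ingestion command, wait for readiness, then notify.

    Returns True if the notification was sent, False if the readiness
    check gave up. Raises CalledProcessError if the command itself fails.
    """
    # check=True makes a non-zero exit code raise instead of being ignored.
    subprocess.run(list(command), check=True)
    if not wait_ready():
        # Data never became queryable within the caller's limits: do not notify.
        return False
    notify()
    return True
```

A call might look like `run_then_notify(["./bin/pinot-admin.sh", "LaunchDataIngestionJob", "-jobSpecFile", "job-spec.yaml"], wait_ready=my_check, notify=send_kafka_event)`, where `job-spec.yaml`, `my_check`, and `send_kafka_event` are placeholders. Keeping the check and the notification as plain callables means the wrapper does not need to know about Kafka or the controller API.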
j
Yes, my use case is with an offline table. What you propose sounds like a good way to solve my issue. Thanks @User, will do!