# troubleshooting

Jonathan Meyer

04/04/2022, 8:08 AM
Hello all ^^ I was looking through the Slack history for an answer to my question but couldn't find one, so here I go.
We're doing standalone batch integrations (apache/pinot Docker image in a WF), and we're seeing queries hit Pinot before the integrated data is available ("stale data").
My use case: we run the data integration, fire off a Kafka event once the `pinot-admin` step has finished, and then query Pinot. That's where we see stale data.
Is there any way to:
• Have `./bin/pinot-admin.sh LaunchDataIngestionJob` wait for the data to be fully queryable?
• Have Pinot somehow notify us when the data becomes fully queryable?
NOTE: The job type is `SegmentCreationAndTarPush`.

Mayank

04/04/2022, 4:52 PM
Is this for production or for testing? If for testing, you probably have some options:
1. Wait for a fixed time (might be a bit brittle).
2. Wait for IS == EV, i.e. until the table's ideal state matches its external view (might need to write some checks for this).
👍 1
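Option 2 above can be sketched as a small polling loop against the controller's `/tables/{tableName}/idealstate` and `/tables/{tableName}/externalview` endpoints. This is only a minimal sketch, not an official client: the helper names, the timeout and poll values, and the assumption that the response is shaped like `{"OFFLINE": {segment: {server: state}}}` should all be checked against your Pinot version. It also assumes the newly pushed segments are already listed in the ideal state by the time the ingestion job returns.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# segment name -> {server instance -> state}, e.g. {"seg_0": {"Server_a_8098": "ONLINE"}}
SegmentStates = Dict[str, Dict[str, str]]


def fetch_view(controller_url: str, table: str, view: str) -> SegmentStates:
    """Fetch 'idealstate' or 'externalview' for the offline table.

    Assumption: the controller answers with {"OFFLINE": {...}, "REALTIME": ...}.
    """
    url = f"{controller_url}/tables/{table}/{view}?tableType=OFFLINE"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    return body.get("OFFLINE") or {}


def is_converged(ideal: SegmentStates, external: SegmentStates) -> bool:
    """True when every segment replica in the ideal state has the same state
    in the external view. An empty ideal state counts as not converged."""
    if not ideal:
        return False
    return all(external.get(seg, {}) == replicas for seg, replicas in ideal.items())


def wait_until_converged(
    fetch: Callable[[str], SegmentStates],
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until IS == EV; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_converged(fetch("idealstate"), fetch("externalview")):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

With a hypothetical controller at `http://localhost:9000` and a hypothetical table `myTable`, this could be called as `wait_until_converged(lambda view: fetch_view("http://localhost:9000", "myTable", view))` right after the ingestion job exits and before the Kafka event is sent.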

Jonathan Meyer

04/04/2022, 5:36 PM
It is for production

Mayank

04/04/2022, 6:03 PM
In production, what does it mean to be fully queryable? You will have data constantly being pushed, right?

Jonathan Meyer

04/04/2022, 6:05 PM
We've got batch data that comes in, and a materialized view that needs to be updated to take the newly integrated data into account.

Mayank

04/04/2022, 6:47 PM
So you need atomic push? If so, @User added this feature?

Seunghyun

04/04/2022, 6:48 PM
@User the building blocks are there, but we still need to implement the client. Also, we support REFRESH only.

Jonathan Meyer

04/04/2022, 7:27 PM
> So you need atomic push? If so, @User added this feature?
Not quite. The flow is: data -> Pinot job -> Kafka message saying "new data available" -> another service queries a service backed by Pinot (which is "stale" until Pinot completes ingestion / indexing).

Seunghyun

04/04/2022, 7:39 PM
@User If you have a realtime table, you won't see this staleness, since the data is updated in near real time whenever new data gets ingested. You will have this issue when you have an offline table only. We currently don't provide a way to notify that new data is available. A generic way to provide this functionality would be an interface that lets the user supply a function to be executed after the offline ingestion. Feel free to file an issue on GitHub for the feature request.
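Until such an interface exists in Pinot, the same "run a user function after offline ingestion" idea can be approximated on the client side. The sketch below is hypothetical and not a Pinot API: `ingest_then_notify` and its three callables are names made up for illustration. The caller supplies how to run the job, how to decide that the data is queryable, and what to do afterwards (for example, publish the Kafka event).

```python
from typing import Callable


def ingest_then_notify(
    run_job: Callable[[], None],
    wait_until_queryable: Callable[[], bool],
    notify: Callable[[], None],
) -> bool:
    """Run the ingestion job, wait for the data to be queryable, then notify.

    All three callables are supplied by the caller (hypothetical hooks):
      run_job              e.g. a subprocess call to pinot-admin.sh LaunchDataIngestionJob
      wait_until_queryable returns True once the new data can be queried
      notify               e.g. publishes the "new data available" Kafka message
    Returns False, without notifying, if the wait gives up.
    """
    run_job()
    if not wait_until_queryable():
        return False
    notify()
    return True
```

The point of the design is ordering: the notification is only sent once the readiness check has passed, so downstream services never receive "new data available" while Pinot is still loading segments.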

Jonathan Meyer

04/04/2022, 7:43 PM
Yes, my use case is with an offline table. What you propose sounds like a good way to solve my issue. Thanks @User, will do!