Hi Team - I am trying to create a hybrid table. I created the OFFLINE and _REALTIME versions of the table. I had the Kafka stream loaded with a few events, so as soon as I created the tables, the realtime data showed up in the query console. After that I uploaded a CSV with some data using the CLI tool LaunchDataIngestionJob. It was successful. But when I queried the table again, it didn't show the data I loaded through batch ingestion. However, when I query the OFFLINE table separately, the data shows up. Not sure what could be wrong. Could you please guide?
Attaching some screenshots -
So the use case is: I have historical data in my warehouse and new data in a stream. I want the hybrid table so that I can bootstrap using the batch job and then consume data from the stream going forward.
05/07/2021, 2:50 PM
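For reference, the batch step described above is usually driven by an ingestion job spec passed to the Pinot admin tool. The sketch below is illustrative only: the file paths, directory URIs, and table name are assumptions, not the actual values used in this thread.

```shell
# Hedged sketch: launch a standalone batch ingestion job that builds OFFLINE
# segments from CSV files. All paths and the spec contents are hypothetical.
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /path/to/ingestion-job-spec.yaml

# The job spec (YAML) would point at the CSV input and the OFFLINE table, e.g.:
#   inputDirURI: '/path/to/csv/'
#   includeFileNamePattern: 'glob:**/*.csv'
#   recordReaderSpec:
#     dataFormat: 'csv'
#   tableSpec:
#     tableName: 'myTable'
```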
What is the time column? Is the data uploaded to the offline table older than the data in the real-time table?
05/07/2021, 2:51 PM
realtime ts: Tue May 18 2021 00:00:00
csv ts: Mon May 17 2021 00:00:00
Since Kafka already had events, the data got loaded into the REALTIME table first
then I ran the offline load job
Discussed with Kishore in a separate chat. This is working as expected: since there should be at least one day of overlap between the batch data and the real-time data, May 17th data will be looked up in the realtime table. This time-boundary behavior is explained in this doc - https://docs.pinot.apache.org/basics/components/broker . I uploaded May 16th data via the batch upload and verified that it shows up, and the May 17th data in Kafka also shows up as part of the query against the realtime table.
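The time-boundary rule described above can be sketched in a few lines. This is a hedged illustration (assuming a daily segment push, per the linked broker doc): the broker takes the latest end time across OFFLINE segments, subtracts one push interval, and routes rows at or before that boundary to the OFFLINE table and everything later to REALTIME. The function names here are made up for the sketch, not Pinot APIs.

```python
from datetime import datetime, timedelta

# Hedged sketch of the hybrid-table time boundary, assuming a daily push:
#   boundary = (max end time of OFFLINE segments) - 1 day
# Rows with ts <= boundary are served from OFFLINE; later rows from REALTIME.

def time_boundary(offline_max_end: datetime, push_frequency_days: int = 1) -> datetime:
    """Boundary the broker uses to split a hybrid-table query."""
    return offline_max_end - timedelta(days=push_frequency_days)

def serving_table(row_ts: datetime, boundary: datetime) -> str:
    """Which physical table a row's timestamp is served from."""
    return "OFFLINE" if row_ts <= boundary else "REALTIME"

# Latest date in the CSV batch upload was May 17th, so the boundary is May 16th;
# that is why the May 17th CSV rows were hidden behind the REALTIME side.
boundary = time_boundary(datetime(2021, 5, 17))
print(serving_table(datetime(2021, 5, 17), boundary))  # REALTIME
print(serving_table(datetime(2021, 5, 16), boundary))  # OFFLINE
```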