Hi Team - I am trying to create a hybrid table. I created the OFFLINE and _REALTIME versions of the table. I had the Kafka stream loaded with a few events, so as soon as I created the tables, the realtime data showed up in the query console. After that I uploaded a CSV with some data using the CLI tool LaunchDataIngestionJob. It was successful. But when I queried the table again, it didn't show the data I loaded through batch ingestion. However, when I query the OFFLINE table separately, the data shows up. Not sure what could be wrong. Could you please guide?
Attaching some screenshots -
So the use case is: I have historical data in my warehouse and new data in a stream. I want the hybrid table so that I can bootstrap using the batch job and then consume data from the stream going forward.
05/07/2021, 2:50 PM
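For reference, the batch step described above is usually driven by an ingestion job spec passed to the Pinot admin tool. The sketch below is illustrative only: the file paths, directory URIs, and table name are assumptions, not the actual values used in this thread.

```shell
# Hedged sketch: launch a standalone batch ingestion job that builds OFFLINE
# segments from CSV files. All paths and the spec contents are hypothetical.
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /path/to/ingestion-job-spec.yaml

# The job spec (YAML) would point at the CSV input and the OFFLINE table, e.g.:
#   inputDirURI: '/path/to/csv/'
#   includeFileNamePattern: 'glob:**/*.csv'
#   recordReaderSpec:
#     dataFormat: 'csv'
#   tableSpec:
#     tableName: 'myTable'
```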
What is the time column? Is the data uploaded to the offline table older than the data in the real-time table?
05/07/2021, 2:51 PM
realtime ts: Tue May 18 2021 00:00:00
csv ts: Mon May 17 2021 00:00:00
Since Kafka already had events, the data got loaded into the REALTIME table first
then I ran the offline load job
Discussed with Kishore in a separate chat. This is working as expected: since there should be at least one day of overlap between the batch data and the real-time data, May 17th data will be looked up in the realtime table. This time-boundary behavior is explained in this doc - https://docs.pinot.apache.org/basics/components/broker . I uploaded May 16th data via the batch upload and verified that it shows up, and the May 17th data in Kafka also shows up as part of the query against the realtime table.
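The time-boundary rule described above can be sketched in a few lines. This is a hedged illustration (assuming a daily segment push, per the linked broker doc): the broker takes the latest end time across OFFLINE segments, subtracts one push interval, and routes rows at or before that boundary to the OFFLINE table and everything later to REALTIME. The function names here are made up for the sketch, not Pinot APIs.

```python
from datetime import datetime, timedelta

# Hedged sketch of the hybrid-table time boundary, assuming a daily push:
#   boundary = (max end time of OFFLINE segments) - 1 day
# Rows with ts <= boundary are served from OFFLINE; later rows from REALTIME.

def time_boundary(offline_max_end: datetime, push_frequency_days: int = 1) -> datetime:
    """Boundary the broker uses to split a hybrid-table query."""
    return offline_max_end - timedelta(days=push_frequency_days)

def serving_table(row_ts: datetime, boundary: datetime) -> str:
    """Which physical table a row's timestamp is served from."""
    return "OFFLINE" if row_ts <= boundary else "REALTIME"

# Latest date in the CSV batch upload was May 17th, so the boundary is May 16th;
# that is why the May 17th CSV rows were hidden behind the REALTIME side.
boundary = time_boundary(datetime(2021, 5, 17))
print(serving_table(datetime(2021, 5, 17), boundary))  # REALTIME
print(serving_table(datetime(2021, 5, 16), boundary))  # OFFLINE
```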