Hi Team, I have few queries regarding Pinot hybri...
# general
a
Hi Team, I have a few questions about Pinot hybrid tables. 1) Say we have a primary key pk1 that is present in both the realtime and the offline table. At query time, which table does Pinot prefer, i.e. which table's data is shown in the output? 2) Can I append a single record to an existing offline table? If yes, how soon will it be available to query? Thanks
m
1. Pinot queries both the offline and realtime components, each for a specific time window. For example, it queries the realtime table for the latest data (say, the last 1 day) and the offline table for the rest. It is not a function of the primary key.
2. Data ingestion into the offline table happens at the segment level, not the record level. For realtime, ingestion is at the record level, and a record is available to query as soon as it is ingested into Pinot.
s
Hello Mayank, thanks for the response. Anish and I are working together on the same product. I would like to add the exact use case behind Anish's question: I ingest segments older than 7 days into my offline table (via a Spark job) and keep the latest data in realtime (via Kafka). In my use case I often need to update data that is more than 7 days old as well. For that I can just add the record (new version) to the realtime table. In this case the older state of the record will already be in the offline table and the newer version will be in realtime. What will the final output be in such a scenario?
Anish's 2nd question: if I receive data that is older than what the realtime table currently covers and want to append it to a segment, how do I do this? We currently use Apache Druid and want to replace it for this very reason: we have to overwrite an entire segment even when we only need to append 1 record.
m
@User For the first question, by updating a record do you mean mutating column values for a record identified by a primary key? If so, this is called upsert in Pinot, and currently it works only if you have just the realtime table.
For 2: If you have a realtime-only table, then older data can be consumed without a problem. If you have a hybrid table, the older data still gets ingested into Pinot, but if it is older than the time boundary derived from the offline data, it is currently filtered out of query results.
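To make the upsert idea concrete, here is a rough sketch of the semantics only, not Pinot's implementation. It assumes each record carries a primary key and an ordering value such as a timestamp, and that the newest record per key is the one a query sees. The function name `upsert_view` and the field names `pk`, `ts` and `value` are made up for illustration.

```python
def upsert_view(records, key="pk", order_by="ts"):
    """Return the newest record per primary key (simplified upsert model)."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        # Keep the record with the larger ordering value; on a tie the
        # record seen later replaces the earlier one.
        if current is None or rec[order_by] >= current[order_by]:
            latest[rec[key]] = rec
    return list(latest.values())


# Hypothetical sample stream: pk1 arrives twice, the second time with new values.
stream = [
    {"pk": "pk1", "ts": 1, "value": "old"},
    {"pk": "pk2", "ts": 3, "value": "only"},
    {"pk": "pk1", "ts": 5, "value": "new"},
]
view = upsert_view(stream)
```

In this model only the `ts=5` version of `pk1` is visible. It works because all versions of a key pass through the same stream, which fits the restriction that upsert is limited to realtime-only tables.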
a
@User, for the mutation use case we are looking into the upsert property of the realtime table. For case 2, let's say we have data up to 26th Oct in the offline table. Now a record for 25th Oct arrives in realtime today. Will it be filtered out?
m
Yes, for 2 it will be filtered out, because the offline table will be queried for data <= 10/25 and the realtime table for the rest.
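A small sketch of that routing, as a model of the behavior described in this thread and not Pinot's actual broker code. It assumes daily granularity and a boundary one day before the newest offline data, which matches the 26th Oct / 10/25 example. The year 2021, the function names and the `day` field are placeholders.

```python
from datetime import date, timedelta


def time_boundary(offline_max_day: date) -> date:
    # Assumption: the boundary sits one day before the newest offline data.
    return offline_max_day - timedelta(days=1)


def serving_table(record_day: date, boundary: date) -> str:
    # Offline answers for days <= boundary, realtime for everything after it.
    return "offline" if record_day <= boundary else "realtime"


def visible_realtime_rows(rows, boundary):
    # Realtime rows at or before the boundary are still stored,
    # but a hybrid query never reads them from the realtime side.
    return [r for r in rows if r["day"] > boundary]


boundary = time_boundary(date(2021, 10, 26))  # offline data up to 26th Oct
late_row = {"pk": "pk1", "day": date(2021, 10, 25)}   # arrives late in realtime
fresh_row = {"pk": "pk2", "day": date(2021, 10, 27)}
visible = visible_realtime_rows([late_row, fresh_row], boundary)
```

Here `boundary` is 25th Oct, so the late 25th Oct row is dropped from the realtime side and only the 27th Oct row is served from it. In this model the late row would be queryable only if it were also part of the offline segments for that day.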