Any plans for pinot to support 'RealtimeToOfflineS...
# general
p
Any plans for Pinot to support 'RealtimeToOfflineSegmentsTask' for upsert-enabled tables? The use case is this:
1. Normal queries: the key contains a business date; we only want to see the last value for a day (the current upsert config works well for this).
2. Detail / as-of queries: we want to see records up to a given timestamp for a given business date (skipUpsert and LASTWITHTIME(col, asOfTime)), e.g. compare 11am yesterday vs. today.
3. Historical: we want to move data older than a week to OFFLINE.
4. Historical: we want to drop data points, e.g. milliseconds -> hours per day, i.e. reduce data resolution over time through some data cleaning job.
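For reference, the as-of query in item 2 could look roughly like the sketch below. It only builds the SQL string. The table and column names (`trades`, `instrumentId`, `price`, `eventTime`, `businessDate`) are hypothetical, the `'DOUBLE'` type argument is an assumption about the value column, and the `OPTION(skipUpsert=true)` form should be checked against the Pinot version in use.

```python
def as_of_query(table, key_col, value_col, time_col, date_col,
                business_date, as_of_millis):
    """Build a Pinot SQL string: last value per key up to as_of_millis.

    All identifiers are illustrative. Values are interpolated directly,
    so this is only suitable for trusted inputs.
    """
    return (
        f"SELECT {key_col}, "
        f"LASTWITHTIME({value_col}, {time_col}, 'DOUBLE') AS last_value "
        f"FROM {table} "
        f"WHERE {date_col} = '{business_date}' "
        f"AND {time_col} <= {as_of_millis} "
        f"GROUP BY {key_col} "
        # skipUpsert makes the query see all records, not only the latest per key
        f"OPTION(skipUpsert=true)"
    )


if __name__ == "__main__":
    print(as_of_query("trades", "instrumentId", "price", "eventTime",
                      "businessDate", "2023-01-10", 1673348400000))
```

Comparing 11am yesterday with 11am today would then be two calls with different `business_date` and `as_of_millis` values.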
l
numbers 3 and 4 i believe you can do with the current RealtimeToOffline job
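As a rough sketch of what that could look like for a non-upsert table, here is the task section of a realtime table config written as a Python dict. The key names are as I recall them from the Pinot minion task docs and should be verified against your version. The values (7-day buffer, hourly rounding, `max` rollup) and the metric column `price` are assumptions picked to match items 3 and 4.

```python
import json

# Hypothetical "task" section for a REALTIME table config.
realtime_to_offline_task = {
    "task": {
        "taskTypeConfigsMap": {
            "RealtimeToOfflineSegmentsTask": {
                # process one day of data per task run
                "bucketTimePeriod": "1d",
                # only move windows older than a week (item 3)
                "bufferTimePeriod": "7d",
                # round time values to the hour before merging (item 4)
                "roundBucketTimePeriod": "1h",
                # collapse rows that share dimensions and rounded time
                "mergeType": "rollup",
                # "price" is an illustrative metric column
                "price.aggregationType": "max",
            }
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(realtime_to_offline_task, indent=2))
```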
p
I see StarTree have a recipe for upsert to offline, but it seems to hit a validation error saying it's not supported for upsert. Will try again this week and also explicitly create the offline table, which I hadn't done. https://github.com/startreedata/pinot-recipes/tree/main/recipes/upserts-real-time-offline-job
l
oh sorry i totally missed upsert 😄
but upsert only happens on RT tables, right?
wouldn’t you do the upsert operation first and then move that segment to offline?
p
we seem to have found a way around this using the LASTWITHTIME operator and removing upsert. memory usage is also much better without upsert enabled; i think it was caching the key-to-last-value mapping for our 500m-entry dataset
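To make the workaround concrete, here is a small pure-Python sketch of what a last-with-time aggregation computes at query time: the latest value per key, optionally cut off at an as-of timestamp. It only illustrates the semantics and is not how Pinot implements it. The row layout `(key, timestamp, value)` is an assumption.

```python
def last_with_time(rows, as_of):
    """Return {key: value} using the latest row per key with timestamp <= as_of.

    rows is an iterable of (key, timestamp, value) tuples in any order.
    """
    latest = {}  # key -> (timestamp, value)
    for key, ts, value in rows:
        if ts > as_of:
            continue  # ignore records after the as-of cutoff
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, value)
    return {key: value for key, (_, value) in latest.items()}


if __name__ == "__main__":
    rows = [("a", 1, 10.0), ("a", 5, 11.0), ("a", 9, 12.0), ("b", 3, 7.0)]
    print(last_with_time(rows, as_of=6))
    print(last_with_time(rows, as_of=100))
```

With a large cutoff this gives the same "last value per key" view as the upsert table, and with an earlier cutoff it gives the as-of view, so one non-upsert table can serve both query styles.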