# general
a
Question for all: Pinot seems really heavily focused on realtime/streaming tables. Does anyone use it for JUST batch/offline data?
d
Not JUST for that, but I do have a project being built where I only use OFFLINE ingestion. It's still under heavy development. But I have another one, already in production, which uses REALTIME only.
n
Yes, I've seen use cases in practice that use just offline (several at LinkedIn, for example).
d
In this project I'm building, load jobs will be triggered every day, but each job will always regenerate the same segment for the month in which it runs, so that no segment spans more than a month of data (for optimization purposes). This isn't convenient to do with REALTIME ingestion (if it's feasible at all).
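A rough sketch of the "same segment per month" idea, in case it helps. This is not Pinot API; `month_window` and `segment_name` are hypothetical helpers, and the naming scheme is just an assumption. The point is that every daily run within a month computes the same time window and the same segment name, so (assuming an upload under an existing segment name replaces the old segment) each run refreshes that month's segment instead of adding a new one.

```python
from datetime import date, timedelta
from typing import Tuple


def month_window(run_date: date) -> Tuple[date, date]:
    """Return (first day of run_date's month, first day of the following month)."""
    start = run_date.replace(day=1)
    # Adding 32 days to the 1st always lands in the next month; snap back to day 1.
    next_month = (start + timedelta(days=32)).replace(day=1)
    return start, next_month


def segment_name(table: str, run_date: date) -> str:
    """Deterministic, hypothetical segment name: identical for every run in a month."""
    start, _ = month_window(run_date)
    return f"{table}_{start:%Y_%m}"


if __name__ == "__main__":
    # Two runs in the same month target the same segment name.
    print(segment_name("events", date(2021, 3, 1)))
    print(segment_name("events", date(2021, 3, 31)))
```

The daily job would then read only rows with a timestamp in `[start, next_month)` and build the segment under that name.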
l
Our main workflow is a combination of both offline and realtime, so a hybrid setup, and Pinot works just fine. You do have to get your data in order, though, so it matches what Pinot expects.