vmarchaud

01/22/2021, 1:30 PM
Hey, quick question: we have a realtime segment marked as completed and we would like to move it to an offline table. However, the endpoint to download the segment (GET /segments/{tableName}/{segmentName}) tries to fetch it from the deep store. I was thinking of downloading it and uploading it to the offline table directly. How could I achieve this? Thanks
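For reference, a minimal sketch of the download request being described, in Python. The controller address, table name, and segment name are hypothetical placeholders; only the path shape GET /segments/{tableName}/{segmentName} comes from the message above. Per the same message, the controller serves this from the deep store, so this illustrates the call rather than working around that.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical controller address; replace with your own.
CONTROLLER = "http://localhost:9000"


def segment_download_url(controller: str, table_name: str, segment_name: str) -> str:
    """Build the URL for GET /segments/{tableName}/{segmentName}."""
    base = controller.rstrip("/")
    # Percent-encode both path parts so unusual characters cannot break the path.
    return f"{base}/segments/{quote(table_name, safe='')}/{quote(segment_name, safe='')}"


def download_segment(controller: str, table_name: str, segment_name: str, dest_path: str) -> str:
    """Stream the segment file to dest_path (needs a reachable controller)."""
    url = segment_download_url(controller, table_name, segment_name)
    with urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path


# Hypothetical names, for illustration only.
print(segment_download_url(CONTROLLER, "events", "events_segment_0"))
```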
@vmarchaud This should be possible using a minion based on the second link

vmarchaud

01/22/2021, 1:47 PM
Yeah, I saw that, but I don't really want the minion to rebuild the segments; I just want to move them as-is.
Is there a way to do this, @Will Briggs?
I'm actually digging into how the task works and I'm seeing realtimeSegmentZKMetadata.getDownloadUrl(), which I haven't followed yet, but I guess I could find my answer there?

Will Briggs

01/22/2021, 1:50 PM
I’m not sure. The tricky part is that the offline table might not have the same partitioning / sorting / indexing as the realtime table, and the RealtimeToOfflineSegmentsTask handles that generically for you. By simply moving the segments as-is, you are kind of shoehorning yourself into never diverging the offline table.

vmarchaud

01/22/2021, 1:51 PM
Well, I configured the realtime table exactly the same as the offline one, so I should be fine, right?

Will Briggs

01/22/2021, 1:52 PM
For the short-term, yes

vmarchaud

01/22/2021, 1:52 PM
What would be the problem in the long term? I mean, if I have an issue, I can just re-index the segment and re-upload it?

Will Briggs

01/22/2021, 1:52 PM
It’s not an approach I would put into production, though

vmarchaud

01/22/2021, 1:54 PM
From my understanding, the only difference between realtime and offline once a segment is completed is that the realtime table stores it locally while the offline table stores it in the deep store.
I guess I'm missing something?

Will Briggs

01/22/2021, 1:56 PM
Realtime tables also push segments to the deep store

vmarchaud

01/22/2021, 1:58 PM
Hmmm, that's surely something I missed.
Is it automatic when the segment is completed?

vmarchaud

01/22/2021, 2:01 PM
Thanks. Something I haven't mentioned is that my streams are high level. I'm seeing in the task code that it only works with low level.

Will Briggs

01/22/2021, 2:01 PM
Ah, I have no experience using the high level stream consumer, unfortunately. I went straight to low-level based on the limitations of the high level streams.

vmarchaud

01/22/2021, 2:03 PM
Thanks anyway, you were very helpful

Neha Pawar

01/22/2021, 4:21 PM
We don't recommend or maintain high level any more. Any particular reason you are using high level?
Simply moving segments to offline table by downloading and reuploading can give you incorrect results in the time boundary calculation at the brokers
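To make the time boundary concern concrete, here is a rough sketch in Python. It assumes the commonly documented rule for hybrid tables (boundary = latest offline segment end time minus one push period, with the offline table answering everything at or before the boundary and the realtime table everything after); the exact rule may differ by Pinot version, and all timestamps and function names below are made up for illustration.

```python
from datetime import datetime, timedelta


def time_boundary(offline_end_times, push_period=timedelta(days=1)):
    """Assumed rule: latest offline segment end time minus one push period."""
    return max(offline_end_times) - push_period


def table_for(ts, boundary):
    """Offline answers ts <= boundary; realtime answers everything after it."""
    return "OFFLINE" if ts <= boundary else "REALTIME"


# Hypothetical offline segment end times.
offline_ends = [datetime(2021, 1, 19), datetime(2021, 1, 20)]
before = time_boundary(offline_ends)  # 2021-01-19 00:00

# A completed realtime segment copied as-is into the offline table.
moved_end = datetime(2021, 1, 22, 13, 0)
after = time_boundary(offline_ends + [moved_end])  # 2021-01-21 13:00

# The same row is served by a different table once the boundary jumps.
probe = datetime(2021, 1, 21, 6, 0)
print(table_for(probe, before))  # REALTIME
print(table_for(probe, after))   # OFFLINE
```

In this sketch, once the boundary jumps, rows around the probe time are answered only by the offline table. If the offline table holds just the one moved segment for that range, any rows that lived in other realtime segments silently drop out of query results.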

vmarchaud

01/22/2021, 4:56 PM
@Neha Pawar We are currently using GCP's Pub/Sub system for Pinot and it doesn't have any "partition" system.
You only get one subscription for every consumer.