#general

Diogo Baeder

02/12/2022, 10:59 PM
Hi folks! Just a question: when a server instance has a consuming segment, where is the data for that segment stored? Is it in memory? Or local disk? Or something else?

Mayank

02/12/2022, 11:15 PM
Consuming is in memory. If the server restarts, it resumes consuming from the previously saved checkpoint.

Diogo Baeder

02/12/2022, 11:17 PM
Got it. But even within a consuming segment it has checkpoints, so that if a server goes down, it doesn't just lose the whole segment, right?

Mayank

02/12/2022, 11:33 PM
Checkpoints are created when segments are committed, not for in-memory data.

Diogo Baeder

02/12/2022, 11:33 PM
Ah, got it
Alright, thanks man! 🙂

Kishore G

02/13/2022, 12:05 AM
Minor correction - Consuming is a mix of memory and disk. Indexes are in memory but actual data is written to disk and mmapped

Priyank Bagrecha

02/13/2022, 12:55 AM
What about the committed segment then? Does the index still live in memory? Or is that also written to disk?

Mayank

02/13/2022, 12:56 AM
Committed segments are entirely mmapped, so they get paged in/out as needed.
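To illustrate what "mmapped, paged in/out as needed" means in general, here is a minimal Python sketch. It is not Pinot code (Pinot is written in Java and has its own segment format); the file name `column.bin` and the helpers `write_column` and `read_value` are hypothetical, and the fixed-width integer layout is only an assumption made for the example.

```python
import mmap
import os
import struct
import tempfile


def write_column(path, values):
    """Write 32-bit little-endian ints to a file (hypothetical column layout)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(struct.pack("<i", v))


def read_value(path, doc_id):
    """Read one value through a read-only memory map.

    The file is not loaded up front; the OS pages in only the part
    of the file that is actually touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return struct.unpack_from("<i", mm, doc_id * 4)[0]


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "column.bin")  # hypothetical file name
        write_column(path, [10, 20, 30, 40])
        return read_value(path, 2)
```

The point of the sketch is only that the data lives in a file on disk and memory holds whatever pages the OS has chosen to cache, which is why such data can be larger than RAM.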

Priyank Bagrecha

02/13/2022, 1:16 AM
Thanks guys!
👍 1

Diogo Baeder

02/13/2022, 4:23 AM
The reason I asked my question here is that we had to move a Pinot cluster to another set of nodes in AWS, and for some reason I thought we could do that without losing data, because I thought that even consuming segments would keep their data on disk. We lost some data after the move, however. For example, a table that had more than 97M rows dropped to a bit more than 95M rows, which I assume happened because we lost the consuming segments. I reckon I should have done this more carefully though.

Mayank

02/13/2022, 4:35 AM
@User But the new table should have resumed consumption from the committed checkpoint and caught up, right? From that perspective, there should not be any data loss.

Diogo Baeder

02/13/2022, 4:38 AM
I don't know... what I did was, I stopped the consumption by no longer sending events to Kafka, and then proceeded with the migration, keeping the EBS volumes available and mounting them on the new nodes, with Pinot still using S3 as deep store. After deploying the new cluster, though, I noticed the drop in rows. I made a few mistakes, like not taking note of exactly how many rows we had before we moved. But anyway, none of this is critical, it's only our internal usage data, so we can live with that. But it's important for us to know that something like that could have happened if we didn't commit the segments before moving.
I should have committed the consuming segments just before moving the cluster, I think.

Kishore G

02/13/2022, 4:44 AM
No, you don’t need to do that. Pinot will catch up from the last committed offset. You might see the number of rows go down while it’s catching up, but that should be minimal.
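The catch-up behaviour described here can be sketched with a toy model in Python. This is not Pinot's implementation: the `Stream` and `Server` classes, the `flush_threshold` parameter, and the worst-case assumption that every uncommitted row is dropped on restart are all hypothetical simplifications. It also assumes the stream (Kafka in this thread) still retains the events after the last committed offset.

```python
class Stream:
    """Hypothetical stand-in for a Kafka partition: an append-only list of rows."""

    def __init__(self, rows):
        self.rows = list(rows)


class Server:
    """Toy model: committed rows survive a restart, consuming rows do not."""

    def __init__(self, stream, flush_threshold):
        self.stream = stream
        self.flush_threshold = flush_threshold
        self.committed = []        # rows in committed segments
        self.consuming = []        # rows in the current consuming segment
        self.committed_offset = 0  # checkpoint, advanced only on commit

    def consume(self):
        """Read everything available after the rows already held."""
        start = self.committed_offset + len(self.consuming)
        for row in self.stream.rows[start:]:
            self.consuming.append(row)
            if len(self.consuming) >= self.flush_threshold:
                self.commit()

    def commit(self):
        """Seal the consuming segment and move the checkpoint forward."""
        self.committed.extend(self.consuming)
        self.committed_offset += len(self.consuming)
        self.consuming = []

    def restart(self):
        """Worst case assumed here: all uncommitted rows are dropped."""
        self.consuming = []

    def row_count(self):
        return len(self.committed) + len(self.consuming)


def demo():
    server = Server(Stream(range(25)), flush_threshold=10)
    server.consume()
    before = server.row_count()
    server.restart()
    after_restart = server.row_count()
    server.consume()  # re-reads from the last committed offset
    caught_up = server.row_count()
    return before, after_restart, caught_up
```

In this model `demo()` returns `(25, 20, 25)`: the count dips after the restart and recovers once the server re-reads from the checkpoint. If the stream no longer held the rows past offset 20, the count would stay at 20.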

Diogo Baeder

02/13/2022, 4:46 AM
Oh... so I should expect it to eventually go back up then?
It hasn't gone up since I wrote my question in this thread though. We're at 95772133 now, whereas we had more than 97M rows before we moved the cluster.

Priyank Bagrecha

02/13/2022, 4:51 AM
Could it have something to do with data retention? Since that runs periodically, maybe it ended up kicking in around the same time?

Diogo Baeder

02/13/2022, 4:52 AM
I don't think so, we didn't configure data retention because we want to keep all the data