# general
d
Hi folks! Just a question: when a server instance has a consuming segment, where is the data for that segment stored? Is it in memory? Or local disk? Or something else?
m
Consuming is in memory. If the server restarts, it starts consuming again from the previously saved checkpoint
d
Got it. But even within a consuming segment it has checkpoints, so that if a server goes down, it doesn't just lose the whole segment, right?
m
Checkpoints are created when segments are committed, not for the in-memory consuming data
d
Ah, got it
Alright, thanks man! 🙂
k
Minor correction - Consuming is a mix of memory and disk. Indexes are in memory but actual data is written to disk and mmapped
p
What about the committed segment then? Does the index still live in memory, or is that also written to disk?
m
Committed segments are entirely mmapped, so they get paged in/out as needed.
p
Thanks guys!
👍 1
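For reference, the behavior described above (consuming-segment indexes held in memory, segment data mmapped from disk, checkpoints saved when segments commit) is largely driven by the realtime table config. A minimal sketch of the relevant fragment, written as a Python dict; the table name, topic, and flush thresholds are illustrative assumptions, not values from this thread:

```python
import json

# Fragment of a Pinot realtime table config; names and thresholds are
# illustrative, not taken from this thread.
realtime_table_config = {
    "tableName": "myTable",
    "tableType": "REALTIME",
    "tableIndexConfig": {
        # How servers load segment data: memory-mapped from disk rather than on-heap.
        "loadMode": "MMAP",
        "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "my-topic",
            "stream.kafka.broker.list": "kafka:9092",
            "stream.kafka.consumer.type": "lowlevel",
            # These thresholds decide when a consuming segment is committed,
            # i.e. when the offset checkpoint mentioned above gets saved.
            "realtime.segment.flush.threshold.rows": "0",
            "realtime.segment.flush.threshold.time": "6h",
            "realtime.segment.flush.threshold.segment.size": "200M",
        },
    },
}

print(json.dumps(realtime_table_config, indent=2))
```

The flush thresholds control how often consuming segments are committed, which in turn bounds how much recent data has to be re-consumed from the stream after a restart.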
d
The reason I asked my question here was that we had to move a Pinot cluster onto another set of nodes in AWS, and for some reason I thought we could do that without losing data, because I assumed that even consuming segments kept their data on disk. However, we did lose some data after the move: for example, a table that had more than 97M rows dropped to a bit more than 95M rows, which I assume happened because we lost the consuming segments. I reckon I should probably have done this more carefully, though.
m
@User But the new table should have resumed consumption from the committed checkpoint and caught up, right? From that perspective, there should not be any data loss.
d
I don't know... what I did was stop consumption by no longer sending events to Kafka, and then proceed with the migration, keeping the EBS volumes and mounting them on the new nodes, with Pinot still using S3 as the deep store. After deploying the new cluster, though, I noticed the drop in rows. I made a few mistakes, like not noting exactly how many rows we had before the move. Anyway, none of this is critical, it's only our internal usage data, so we can live with it. But it's important for us to know that something like this could happen if we don't commit the segments before moving.
I should have committed the consuming segments just before moving the cluster, I think.
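As an aside, recent Pinot releases expose a controller endpoint to force-commit the consuming segments of a realtime table; whether it is available depends on the version. A hedged sketch, where the controller address and table name are assumptions:

```python
import requests

CONTROLLER = "http://localhost:9000"  # assumption: controller address
TABLE = "myTable"                     # assumption: illustrative table name

# Ask the controller to commit the currently consuming segments
# (available only in recent Pinot versions; check your release first).
resp = requests.post(f"{CONTROLLER}/tables/{TABLE}/forceCommit")
resp.raise_for_status()
print(resp.json())
```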
k
No, you don’t need to do that.. Pinot will catch up from the last committed offset. You might see the number of rows go down while it’s catching up, but that should be minimal
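One way to check whether the table is actually catching up is to watch the row count from the broker and, on versions that expose it, the per-partition consuming offsets from the controller. A sketch, assuming default broker/controller ports and an illustrative table name:

```python
import requests

BROKER = "http://localhost:8099"      # assumption: broker address
CONTROLLER = "http://localhost:9000"  # assumption: controller address
TABLE = "myTable"                     # assumption: illustrative table name

# Row count as seen by the broker; rerun periodically to see whether it climbs back up.
count = requests.post(
    f"{BROKER}/query/sql",
    json={"sql": f"SELECT COUNT(*) FROM {TABLE}"},
).json()
print(count["resultTable"]["rows"][0][0])

# Per-partition consuming status and offsets, if your Pinot version exposes this endpoint.
info = requests.get(f"{CONTROLLER}/tables/{TABLE}/consumingSegmentsInfo")
print(info.json())
```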
d
Oh... so I should expect it to eventually go back up then?
It hasn't gone up since I wrote my question in this thread though. We're at 95772133 now, whereas we had more than 97M rows before we moved the cluster.
p
Could it have something to do with data retention? Since that runs periodically, maybe it ended up kicking in around the same time?
d
I don't think so, we didn't configure data retention because we want to keep all the data
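For completeness, retention is configured per table under segmentsConfig; if these keys are absent, the retention manager does not purge the table's segments. The values below are illustrative only, not from this cluster:

```python
# Fragment of segmentsConfig showing where retention would be set.
# This thread's table has no retention configured, so nothing is purged.
segments_config = {
    "timeColumnName": "eventTime",   # assumption: illustrative time column
    "retentionTimeUnit": "DAYS",     # example only
    "retentionTimeValue": "365",     # example only; omit both keys to keep everything
}
```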