# troubleshooting
e
Hi! We are trying to use tiered storage with an NFS volume mounted on "server-b". When we trigger the rebalance and segments move from server-a to server-b, we get a lot of errors like:
Caused by: java.nio.file.FileSystemException: /var/pinot/server/data/index/environment_OFFLINE/environment_OFFLINE_1618208070664_1649743939567_7/v3/.nfs000000000134004000000058: Device or resource busy
in the logs from server-b. Could this be a problem with how the server is implemented or is it strictly an NFS problem on our end? The end result is that some or all segments go into an error state and the data goes missing during a rebalance.
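A note on the file name in that trace: files named `.nfsXXXX` are placeholders that the NFS client creates when a file is deleted while some process still has it open (often called a "silly rename"). They disappear once the last open handle is closed, and trying to remove them before then typically fails with "Device or resource busy". Below is a minimal Python sketch for listing such placeholders under a segment directory. The function name `find_nfs_placeholders` and the demo file names are made up for illustration and are not part of Pinot.

```python
import os
import tempfile


def find_nfs_placeholders(root):
    """Return sorted paths of files under root whose names start with '.nfs'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Demo on a throwaway directory; point it at the server's data dir instead.
    with tempfile.TemporaryDirectory() as demo:
        os.makedirs(os.path.join(demo, "v3"))
        # Hypothetical file names, only here to give the walk something to find.
        open(os.path.join(demo, "v3", ".nfs000000000134004000000058"), "w").close()
        open(os.path.join(demo, "v3", "some_segment_file"), "w").close()
        for path in find_nfs_placeholders(demo):
            print(path)
```

If placeholders show up, something like `lsof` on the NFS client can show which process still holds them open, assuming it is installed on that host.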
d
Seems like a Linux mounting / NFS issue. Pinot could be responsible for overloading the NFS service if it’s not scaled as it needs to be.
m
Do server-a and server-b share the same NFS? And if so what’s the dataDir specified in this server? Wondering if both are trying to overwrite each other
e
No, server-a uses a local disk.
m
Hmm, I am wondering if Pinot-server is doing operations that work OK on local disk but not on NFS? If so, it should be an easy fix, because all it has to do is move a bunch of files around for rebalance.
e
I don't know much about NFS but I asked our system admin guy and he said moving files around on NFS is more expensive (something like they have to be copied and then deleted rather than just moved). It looks like the segments are all first downloaded into /tmp and then moved to the proper location so maybe this is a part of the problem? I'm gonna get our resident linux expert to look into our setup after Easter. I guess the pinot server runs many concurrent threads all interacting with the filesystem at once, maybe NFS just handles concurrency poorly.
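The point about moves can be shown in a few lines: a rename is a cheap metadata operation only when source and destination are on the same file system. Going from a local /tmp to an NFS mount crosses file systems, so the rename fails with EXDEV and the data has to be copied and the source deleted. Here is a rough Python sketch of that fallback. It is illustrative only: `move_file` is a made-up helper, and this is not a description of how the Pinot server implements segment moves.

```python
import errno
import os
import shutil
import tempfile


def move_file(src, dst):
    """Move src to dst; return 'rename' or 'copy+delete' depending on the path taken."""
    try:
        os.rename(src, dst)  # cheap, but only works within one file system
        return "rename"
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV = source and target on different devices
            raise
    # Different file systems (e.g. local /tmp -> NFS mount): copy the bytes, then delete.
    shutil.copy2(src, dst)
    os.remove(src)
    return "copy+delete"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "segment.tmp")
        dst = os.path.join(d, "segment")
        with open(src, "w") as f:
            f.write("data")
        print(move_file(src, dst))
```

The copy+delete path is not atomic and costs a full transfer of the data over the network, which fits with what our admin described.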
d
Pinot is quite intensive on File System IO. I would certainly not recommend using NFS for local persistence. NFS for deep store is well supported, but running a database on NFS is not very scalable in my opinion.
m
@User we are trying to see if NFS can be used, and if so, how; if not, why. Pinot is definitely going to do IO during serving, but this one is more about less frequent operations like rebalance.
d
Makes sense. Thank you for the context 👍
e
additional note: it seems to work fine on iSCSI (which may or may not be an option for our production setup). I'm not sure about the differences between NFS and iSCSI myself and haven't had time to look into it but maybe it helps someone!
m
ack