# general
k
My ops guy is setting up Docker containers, and wants to know why the base Pinot Dockerfile has
VOLUME ["${PINOT_HOME}/configs", "${PINOT_HOME}/data"]
since he sees that there’s nothing being stored in the
/data
directory. Any input?
m
Servers will store local copy of segments there?
k
But normally local copies of segments are stored in
/tmp/xxx
, or so I thought?
d
By default, the OSS helm chart will configure $HOME/data as the data dir for Pinot
It’s in line with the default value of
controller.data.dir
of the helm chart.
k
Hmm, OK. So since we’re using HDFS as the deep store, this wouldn’t be getting used, right?
d
Indeed
But keep in mind that servers will use that path
So the volume defined in the docker image is relevant for the segments stored by the servers.
k
But wouldn’t you want that to be temp storage, and not mapped outside of Docker?
d
Nope
It’s the same as kafka
sure
brokers can rebuild their data from other replicas and deepstore and everything
But, trust me, if you want to avoid network jitter when your servers are restarting, you’ll be happy with a persistent volume for the servers’ segments
The segment FS hosted by a server should not be considered temporary
Deepstore download is a fallback in case of loss
k
I’ll have to poke around in one of our server processes to see why the ops guy thinks there’s nothing in /data
Thanks for the input
d
Check how your server data dir is configured
If you want to speed up server restarts and avoid redownloading segments from deepstore, putting the server data dir on a persistent volume will greatly improve the stability of your cluster when things go wrong
k
Right. So this would be a
server.data.dir
configuration value?
d
pinot.server.instance.dataDir
🙃
the takeaway is that the volume defined in the dockerfile is opinionated toward the OSS helm chart and not aligned with the default values from the… dockerfile itself…
k
Nice. I guess
pinot.server.instance.segmentTarDir
can be a temp dir then.
d
not exactly
turns out it’s more subtle than that 🙂
dataDir: /var/pinot/server/data/index
segmentTarDir: /var/pinot/server/data/segment
pinot.server.instance.dataDir
is the index storage location, and
pinot.server.instance.segmentTarDir
is the tgz dir
helm chart stores them both in the same
data
volume of the dockerfile
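(Editor's note: a minimal sketch, not part of Pinot, for checking whether both server directories discussed above fall under a persistent mount. The property keys and the two example paths come from this thread; the helper names `parse_properties`, `is_under` and `check_server_dirs`, the `key=value` config format, and the mount path passed in are assumptions for illustration.)

```python
from pathlib import PurePosixPath

# Keys named in this thread; everything else here is a hypothetical helper.
SERVER_DIR_KEYS = (
    "pinot.server.instance.dataDir",
    "pinot.server.instance.segmentTarDir",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def is_under(path, mount):
    """Return True if path equals mount or lies somewhere beneath it."""
    p, m = PurePosixPath(path), PurePosixPath(mount)
    return p == m or m in p.parents


def check_server_dirs(props_text, mounts):
    """Map each server dir key to whether its value sits under any mount."""
    props = parse_properties(props_text)
    result = {}
    for key in SERVER_DIR_KEYS:
        value = props.get(key)
        result[key] = value is not None and any(is_under(value, m) for m in mounts)
    return result


# Example values taken from the helm chart snippet quoted in the thread.
example = """
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
"""

# The mount path is an assumption; substitute whatever your container mounts.
print(check_server_dirs(example, ["/var/pinot/server/data"]))
```

A key that is missing from the config is reported as `False`, which is the case worth investigating first: the server would then fall back to its built-in default location rather than the mounted volume.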
k
OK - seems like https://docs.pinot.apache.org/developers/advanced/advanced-pinot-setup could use some editing love. Currently says for
pinot.server.instance.dataDir
“Directory to hold all the data”, and for
pinot.server.instance.segmentTarDir
“Directory to hold temporary segments downloaded from Controller or Deep Store”. But based on above, it’s not “all the data”, and it’s not (really) “temporary segments”.
d
The definition of temporary can be loose maybe? 😄
If you need to rebuild the segment indexes, there’s value in having the tgz persisted.
If the definition of
all the data
is what is on the query path, it is accurate 😛
k
But if only indexes go into
pinot.server.instance.dataDir
, then you’d need to access the tgz to get data in a column that doesn’t have an index on it.
d
@User to the rescue for that last one 🙂
k
🙂 I’ll see what he says when I’m back online after dinner…
Thanks again
d
my pleasure!
x
the purpose of mounting the data directory is so that users can plug in a data directory of their own choosing (e.g. local disk, SSD, or RAID)
VOLUME ["${PINOT_HOME}/configs", "${PINOT_HOME}/data"]
is mainly there so users can quickly plug in and test their own data/configs, especially for quickstart-style tests
m
Could we add this to faq please
d
Ken’s question was rather if segmentTarDir is on the query path