#general

Ken Krugler

06/02/2021, 12:27 AM
My ops guy is setting up Docker containers, and wants to know why the base Pinot Dockerfile has
VOLUME ["${PINOT_HOME}/configs", "${PINOT_HOME}/data"]
since he sees that there’s nothing being stored in the /data directory. Any input?

Mayank

06/02/2021, 12:29 AM
Servers will store a local copy of segments there?

Ken Krugler

06/02/2021, 12:29 AM
But normally local copies of segments are stored in /tmp/xxx, or so I thought?

Daniel Lavoie

06/02/2021, 12:29 AM
By default, the OSS helm chart will configure $HOME/data as the data dir for Pinot.
It’s in line with the default value of controller.data.dir in the helm chart.

Ken Krugler

06/02/2021, 12:31 AM
Hmm, OK. So since we’re using HDFS as the deep store, this wouldn’t be getting used, right?

Daniel Lavoie

06/02/2021, 12:31 AM
Indeed
But keep in mind that servers will use that path
So the volume defined in the docker image is relevant for the segments stored by the servers.

Ken Krugler

06/02/2021, 12:32 AM
But wouldn’t you want that to be temp storage, and not mapped outside of Docker?

Daniel Lavoie

06/02/2021, 12:32 AM
Nope
It’s the same as kafka
sure
brokers can rebuild their data from other replicas and deepstore and everything
But, trust me, if you want to avoid network jitter when your servers are restarting, you’ll be happy with a persistent volume for your servers’ segments
Segment FS hosted by server should not be considered temporary
Deepstore download is a fallback in case of loss
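
To make the fallback order described above concrete, here is a minimal Python sketch. It is not Pinot's actual loading code: `locate_segment`, `fake_deep_store_fetch`, and the one-directory-per-segment layout are hypothetical names and assumptions chosen for illustration. It only shows the idea that a server looks in its local data dir first and goes to the deep store when the local copy is missing.

```python
import os
import tempfile


def locate_segment(name, data_dir, fetch_from_deep_store):
    """Return (path, source) for a segment, preferring the local copy.

    Hypothetical helper for illustration only; not part of Pinot.
    """
    local_path = os.path.join(data_dir, name)
    if os.path.isdir(local_path):
        # The persistent volume still has the segment: no network transfer needed.
        return local_path, "local"
    # The local copy is missing (e.g. the volume was lost): fall back to the deep store.
    fetch_from_deep_store(name, local_path)
    return local_path, "deep store"


def fake_deep_store_fetch(name, dest):
    """Stand-in for a real download; it just creates the destination directory."""
    os.makedirs(dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        # First start with an empty data dir, then a restart with the data dir kept.
        print(locate_segment("seg_0", data_dir, fake_deep_store_fetch)[1])
        print(locate_segment("seg_0", data_dir, fake_deep_store_fetch)[1])
```

With a persistent data dir, only the first call takes the deep store path; if the data dir were wiped on every restart, every call would.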

Ken Krugler

06/02/2021, 12:36 AM
I’ll have to poke around in one of our server processes to see why the ops guy thinks there’s nothing in /data
Thanks for the input

Daniel Lavoie

06/02/2021, 12:36 AM
Check how your server data dir is configured
If you want to speed up server restarts and avoid redownloading segments from deepstore, putting the server’s data dir on a persistent volume will greatly improve the stability of your cluster when things go wrong

Ken Krugler

06/02/2021, 12:38 AM
Right. So this would be a server.data.dir configuration value?

Daniel Lavoie

06/02/2021, 12:38 AM
pinot.server.instance.dataDir
🙃
the takeaway is that the volume defined in the Dockerfile is opinionated toward the OSS helm chart and not aligned with the default values from the… Dockerfile itself…

Ken Krugler

06/02/2021, 12:40 AM
Nice. I guess pinot.server.instance.segmentTarDir can be a temp dir then.

Daniel Lavoie

06/02/2021, 12:40 AM
not exactly
turns out it’s more subtle than that 🙂
dataDir: /var/pinot/server/data/index
segmentTarDir: /var/pinot/server/data/segment
pinot.server.instance.dataDir is the index storage location, and pinot.server.instance.segmentTarDir is the tgz dir
the helm chart stores them both in the same data volume of the Dockerfile
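
One way to check a setup against the point above is a small script that reads the server config and reports whether both directories sit under the mounted volume. The following is a rough sketch under stated assumptions: the config is taken to be a plain `key=value` properties text, the mount point `/var/pinot/server/data` is inferred from the paths quoted above rather than confirmed, and `parse_properties` and `check_dirs` are hypothetical helper names, not Pinot tooling.

```python
from pathlib import PurePosixPath

# The two server settings discussed in this thread.
DIR_KEYS = (
    "pinot.server.instance.dataDir",
    "pinot.server.instance.segmentTarDir",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_dirs(props, volume_mount, keys=DIR_KEYS):
    """Map each key to 'inside', 'outside', or 'unset' relative to the mount."""
    mount = PurePosixPath(volume_mount)
    report = {}
    for key in keys:
        value = props.get(key)
        if value is None:
            # No default is assumed here; the thread does not settle what it is.
            report[key] = "unset"
            continue
        path = PurePosixPath(value)
        inside = path == mount or mount in path.parents
        report[key] = "inside" if inside else "outside"
    return report


# Paths taken from the snippet quoted above; the properties layout is an assumption.
EXAMPLE_CONF = """
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
"""

if __name__ == "__main__":
    print(check_dirs(parse_properties(EXAMPLE_CONF), "/var/pinot/server/data"))
```

A key reported as 'unset' or 'outside' would be one explanation for an empty /data volume, since the server would then be writing its segments somewhere else.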

Ken Krugler

06/02/2021, 12:45 AM
OK - seems like https://docs.pinot.apache.org/developers/advanced/advanced-pinot-setup could use some editing love. It currently says for pinot.server.instance.dataDir “Directory to hold all the data”, and for pinot.server.instance.segmentTarDir “Directory to hold temporary segments downloaded from Controller or Deep Store”. But based on above, it’s not “all the data”, and it’s not (really) “temporary segments”.

Daniel Lavoie

06/02/2021, 12:47 AM
The definition of temporary can be loose maybe? 😄
If you need to rebuild the segment indexes, there’s value in having the tgz persisted.
If the definition of “all the data” is what is on the query path, it is accurate 😛

Ken Krugler

06/02/2021, 12:48 AM
But if only indexes go into pinot.server.instance.dataDir, then you’d need to access the tgz to get data in a column that doesn’t have an index on it.

Daniel Lavoie

06/02/2021, 12:49 AM
@User to the rescue for that last one 🙂

Ken Krugler

06/02/2021, 12:50 AM
🙂 I’ll see what he says when I’m back online after dinner…
Thanks again

Daniel Lavoie

06/02/2021, 12:50 AM
my pleasure!

Xiang Fu

06/02/2021, 3:54 AM
the purpose of mounting the data directory is so that users can plug in a data directory of their own choosing (e.g. local disk, SSD, or RAID)
VOLUME ["${PINOT_HOME}/configs", "${PINOT_HOME}/data"]
is mainly there so users can quickly plug in and test their own data/configs, especially for quickstart-style tests

Mayank

06/02/2021, 10:50 AM
Could we add this to the FAQ please?

Daniel Lavoie

06/02/2021, 12:03 PM
Ken’s question was rather whether segmentTarDir is on the query path