# general
l
Hi everyone, I was looking at the docs for using an HDFS instance as the segment store for Pinot, though I can't find much in the docs on how to connect the two. Does anyone know how to do it? This could be an improvement to the documentation for beginners. Thanks.
t
I suppose you want to use HDFS to store both realtime and offline segments? We have been using this setup at Uber for the past year. I can share a doc later today.
a
actually, is it possible to have layered storage? Like, store cached or the last 30 days of batch/stream segments on local SSDs, and everything else -> GCS?
t
Pinot now has a storage interface called PinotFS. I think it is possible to do your own layered storage by extending that interface.
but there is no off-the-shelf class with such an impl as far as I know.
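Roughly, a custom PinotFS gets wired in through config by mapping a URI scheme to the implementation class. Just a sketch -- the class name `com.example.LayeredPinotFS`, the `layered` scheme, and the property values below are made up for illustration, not existing code:

```properties
# Register a hypothetical custom PinotFS for the "layered" URI scheme
pinot.controller.storage.factory.class.layered=com.example.LayeredPinotFS
# Implementation-specific settings are passed through under the same prefix
pinot.controller.storage.factory.layered.local.cache.dir=/mnt/ssd/pinot-cache
pinot.controller.storage.factory.layered.cold.store.uri=gs://my-bucket/pinot
```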
a
got it
s
@Ting Chen would it be possible if you can add your documents to https://readthedocs.org/projects/pinot/ or modify as needed? It is under the source tree.
t
@Subbu Subramaniam will do.
l
@Ting Chen exactly, there is nothing out-of-the-box to start with. I'm new to Pinot and I would like to start using it one step at a time. Once I've understood it better I'll contribute to the docs and tutorials, but for beginners it is very difficult.
t
@Isabi In fact there is a section in Pinot doc on how to plug in different storage options: https://pinot.readthedocs.io/en/latest/pluggable_storage.html#
It does not talk about realtime tables though. I just filed a PR: https://github.com/apache/incubator-pinot/pull/4783 to add the user guide. We tested in Uber using a variation of org.apache.pinot.filesystem.HadoopPinotFS. You should be able to use it directly.
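For anyone following along, the pluggable-storage doc boils down to controller config entries along these lines. Treat this as a sketch: the class name comes from this thread, but exact keys and class paths may differ between Pinot versions, and the paths/URIs are placeholders for your environment:

```properties
# Use HDFS as the deep store for segments
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.filesystem.HadoopPinotFS
# Point Pinot at your Hadoop client configs (core-site.xml, hdfs-site.xml)
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf
# Allow the controller to fetch segments over the hdfs:// scheme
pinot.controller.segment.fetcher.protocols=file,http,hdfs
# Store controller data on HDFS (placeholder namenode URI)
controller.data.dir=hdfs://namenode:9000/user/pinot/controller
```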
@Islam abdelaziz
@Subbu Subramaniam Pls take a look at the PR when free.
Thanks.
l
@Ting Chen so I cannot have Pinot consuming/creating segments in real-time, then storing them into Hadoop and querying them in the future?
Also, the docs page only mentions that hadoopfs and azure data lake are supported, but it does not specify how to connect them. The alternative is to create a configuration on your own, but to me it is not super clear (I would love to experiment with it, but I only have an old laptop)
t
The connection to HDFS (or any other such storage) follows the standard HDFS client setup. You can check the source class code if you want more detailed info. The docs provide sample configs that are almost identical to what we have tested in our prod envs already.
You can still set up your local test environment on a single laptop though.
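The same pattern applies on the server side so servers can download segments from the deep store. Again just a sketch under the same assumptions as the controller config in the doc (key names and class paths may vary by Pinot version; the conf path is a placeholder):

```properties
# Mirror the controller-side HDFS setup on each server
pinot.server.storage.factory.class.hdfs=org.apache.pinot.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf
# Let servers fetch segments over the hdfs:// scheme
pinot.server.segment.fetcher.protocols=file,http,hdfs
```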
l
That's my point: how to connect Pinot to HDFS is not super clear to me. How can I connect them?
@Ting Chen any news about the docs on how to do your Pinot setup? Thanks
s
@Ting Chen can u share your doc with @isabi