# general
r
What is the process to use HDFS as Pinot deep storage?
c
@User ^^ looks like we don't have a good doc. Mind updating it?
t
ok.
r
Thanks @User @User
Have you read the above tutorial?
The HDFS setup is similar, except the storage is now HDFS instead of S3.
r
Ok, thanks @Ting Chen. Will read this and try it; will reach out again in case of any issue.
c
@User might be useful to copy that and modify it into a working example,
because I'm sure others will have similar questions
t
Sure, will do that. It's just that the S3 tutorial looks very close to the HDFS setup too.
r
@User @User I followed the tutorials you shared and created 3 files inside apache-pinot-version-bin/bin:
1. Controller.conf
2. Server.conf
3. IngestionJobSpec.yaml
I couldn't understand some of the properties, so I need your help to modify them. Kindly guide me. I am attaching the 3 files I created, in PDF format. Kindly review and help.
P.S. I have not done the Spark job and spark-submit steps, since as part of the PoC I am not creating any Spark job. I already have a Kafka topic integrated with Pinot, and now I want to use Hadoop (HDFS) as deep storage.
@User @User @User
k
@User - I think you want something more like this for the controller:
```
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
Note you're setting up the various `.hdfs=xxx` configurations, NOT the `.s3=xxx` ones from the tutorial, since you want to use HDFS, right?
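To also point the controller's data dir at HDFS, a minimal sketch could look like the following. The namenode host/port, paths, and the Hadoop conf dir here are placeholder assumptions, not values from this thread:
```
# Hypothetical values - replace the namenode host/port and paths with your own.
controller.data.dir=hdfs://namenode-host:8020/pinot/deep-storage
# Directory containing your Hadoop core-site.xml / hdfs-site.xml
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf
```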
And something like this for the server.conf:
```
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
Also, your job spec isn't going to work for URI push: the resulting segments have to be sent to a shared file system (HDFS, in your case). So the output dir has to be `hdfs://some/path/to/dir`, which is what's meant by the comment `expected to have schema configured in PinotFS`. You need to ensure HDFS access is set up properly on the server where you're running your standalone batch job, so that it's able to push segments to HDFS.
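As a concrete sketch of what such a job spec might look like for HDFS (the paths, table name, and controller address below are placeholder assumptions, not taken from this thread):
```yaml
executionFrameworkSpec:
  name: 'standalone'
jobType: SegmentCreationAndUriPush
# Placeholder paths - both input and output live on HDFS here.
inputDirURI: 'hdfs://namenode-host:8020/pinot/raw-data'
outputDirURI: 'hdfs://namenode-host:8020/pinot/segments'
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      # Directory containing core-site.xml / hdfs-site.xml
      hadoop.conf.path: '/path/to/hadoop/conf'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```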
m
@User thanks a lot. Are the Pinot docs missing these steps for HDFS (I saw Ali Cloud only)? Since you have first-hand experience, it would be super helpful if you could help add the docs (I can work with you on that).
r
@User but in the case of HDFS I need to pass the namenode details, right? How should I provide those details, and in which file? And are these 3 extra files (server.conf, controller.conf, and the job spec) required, or do I need to update existing files?
k
@User I’ve got this on my to-do list, but sadly it’s gotten pushed down by things like “get my sewer connection working” 🙂
m
Uh-oh. Feel free to ping me and I can work with you on adding the docs.
k
Assuming I do want to document this end-to-end, what’s the right location? Or is there an existing page I should just update?
m
Would appreciate your input as a user on whether this is where you would look for it
r
@User I followed this link and created these 3 files (attached in earlier comments), but it's for S3. I am trying to store on HDFS.
k
@User - that location seems reasonable
m
Thanks @User. Let's add it there once you get a chance. I saw several folks asking about it, and I was surprised we didn't have any instructions on HDFS as deep storage.
t
I am working on an example for an HDFS setup based on our installation and will share it shortly.
@User The Hadoop HDFS config files should be referenced from the server and controller conf. E.g., the following is our config (I will post a more detailed tutorial later).
```
controller.data.dir=root_dir_to_your_hdfs_dir
pinot.controller.segment.fetcher.protocols=file,http,hdfs,viewfs
pinot.controller.segment.fetcher.viewfs.hadoop.conf.path=/pathToYourHDFSConfigDir
pinot.controller.segment.fetcher.viewfs.class=YourVersionOfSegmentFetcher (check out its subclasses)
pinot.controller.segment.fetcher.viewfs.hadoop.kerberos.principle=XXXXX (if you need secure access)
pinot.controller.segment.fetcher.viewfs.hadoop.kerberos.keytab=XXXXX (if you need secure access)

pinot.controller.storage.factory.class.viewfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.viewfs.llc.hdfs.config.dir=/pathToYourHDFSConfigDir
pinot.controller.storage.factory.viewfs.hadoop.kerberos.principle=XXXXX (if you need secure access)
pinot.controller.storage.factory.viewfs.hadoop.kerberos.keytab=XXXXX (if you need secure access)
```
And the corresponding server conf:
```
pinot.server.segment.fetcher.protocols=file,http,hdfs,viewfs
pinot.server.segment.fetcher.viewfs.hadoop.conf.path=/pathToYourHDFSConfigDir
pinot.server.segment.fetcher.viewfs.class=YourVersionOfSegmentFetcher (check out its subclasses)
pinot.server.segment.fetcher.viewfs.hadoop.kerberos.principle=XXXXX (if you need secure access)
pinot.server.segment.fetcher.viewfs.hadoop.kerberos.keytab=XXXXX (if you need secure access)
```
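Regarding where the namenode details go: Pinot does not take them directly; it picks them up from the standard Hadoop client config files in the directory referenced by `hadoop.conf.path` above. A minimal sketch of such a `core-site.xml` (the namenode host/port is a placeholder, not from this thread):
```xml
<!-- /pathToYourHDFSConfigDir/core-site.xml (placeholder namenode host/port) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```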
m
Thanks @User, that would be super useful.
r
@User thanks a lot. Where can I find the available versions of SegmentFetcher?
a
@User I have created a doc for the whole setup here: https://github.com/SleepyThread/pinot-docs/blob/master/basics/getting-started/hdfs-as-deepstorage.md Please validate; once you are OK with the docs, I will open a pull request against the main Pinot docs.