# general
r
Hello everyone, I am creating a hybrid table which can ingest data from a Kafka topic (streaming data) as well as from an HDFS location (batch ingestion). I am aware of the stream ingestion process for Kafka topics and have created multiple real-time tables. Now I am creating a hybrid table for one of the Kafka topics whose data is also available at an HDFS location. I am going through the documents, but in offline-config-table.json I couldn't find any property where we pass the HDFS location as the source. Kindly suggest the process to also ingest from HDFS into the same table.
x
it’s in the server config; you should already have it in your server conf
r
In the server.conf file we have 2 locations, pinot.server.instance.dataDir and pinot.server.instance.segmentTarDir, and both are local paths which I provided while creating the configuration file for deep storage. Is this the same location you are talking about? @User
When I check on git I am able to see multiple files here, i.e. the Hadoop ingestion .yaml, schema file, config file, etc.
So I am not sure which files are required for creating a hybrid table.
x
right, it’s a local path
pinot queries are served from the servers, which have the segments locally
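for reference, those two entries in server.conf look roughly like this (the paths are placeholders, not your actual ones); they only control where the server keeps its local copies of segments, they are not where you point the batch job at HDFS:
```
# local directory the server loads and serves segments from (placeholder path)
pinot.server.instance.dataDir=/path/to/server/index
# local directory where downloaded segment tars are kept before untarring (placeholder path)
pinot.server.instance.segmentTarDir=/path/to/server/segmentTar
```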
r
Oh okay @User, so for loading the Pinot hybrid table I need to load data from the pinot.server.instance.dataDir location, right? And which files are required?
x
Yes
You need to push segments to the offline table; all the segment files are backed up in the deep store and also loaded by every server
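concretely, a hybrid table is just a REALTIME table config plus an OFFLINE table config sharing the same table name; a minimal sketch of the offline side, assuming a table called myTable with a millisecond time column (adjust the names, replication and push settings to match your realtime table):
```
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "myTable",
    "timeColumnName": "timestampColumn",
    "timeType": "MILLISECONDS",
    "segmentPushType": "APPEND",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}
```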
r
These files are available.
Which configuration files do I need to create @User? I am able to see these 5 files on git.
x
for the batch job, you need to run an ingestion job to push data to the pinot offline table
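the job is kicked off with the LaunchDataIngestionJob command pointed at your job spec file; a sketch with placeholder paths (for running it as a MapReduce job on the cluster, the batch ingestion docs for your Pinot version also show a hadoop jar variant):
```
# launch the batch ingestion job using the job spec (paths are placeholders)
/path/to/pinot/bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /path/to/hadoopIngestionJobSpec.yaml
```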
r
Kindly suggest if anything needs to be updated in this file, since my source location is HDFS, i.e. in the hybrid table I want to load data from the HDFS location as well as the Kafka topic. So I have created this hadoopIngestionJobSpec.yaml. I have data inside the /user/hdfs/rawdata directory. P.S. I have already loaded data from the Kafka topic; now I am trying to load from HDFS into the same table. @User
x
where do you want to run this job? Do you have a Hadoop or Spark cluster?
r
@User Hadoop
x
Then you can just update this file following the docs
r
Ok Thanks @User
Hi @User, I have created the files as per the document. Kindly review the properties I have mentioned in the note (last 3 lines) and let me know if any changes are required.
x
inputDirURI/outputDirURI/stagingDirURI should all be HDFS paths
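something like this for those fields, a partial sketch of the spec only (the namenode host/port, output and staging paths and hadoop.conf.path here are placeholders you'd replace with your own):
```
executionFrameworkSpec:
  name: 'hadoop'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'
jobType: SegmentCreationAndTarPush

# all three URIs should live on HDFS, using the same scheme/authority as fs.defaultFS
inputDirURI: 'hdfs://namenode-host:port/user/hdfs/rawdata'
outputDirURI: 'hdfs://namenode-host:port/user/hdfs/pinot/segments'
stagingDirURI: 'hdfs://namenode-host:port/user/hdfs/pinot/staging'

pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '/etc/hadoop/conf'
```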
r
@User When I execute this file, the logs show this error after successfully loading all the jars.
This is my file, where hdfs://10.190.135.180:8030 is the value of my fs.defaultFS and /user/hdfs is the HDFS directory.
x
the error log says it failed to list the input files
the inputDirURI is invalid
can you try to replace = with %3d
not sure if that is the issue
r
After the table SI_TRANSATION I used %3d, and it's showing "illegal character in path at index 51"; it's showing the same issue now for =
In the HDFS path it's not taking '='
x
hmm, ok, so it's still =
i remember it's ok to use = in s3
haven't tried in hdfs though
r
Sorry @User, it is my mistake. I was putting a space after =, but the actual path doesn't have a space. Let me retry.
x
ok
r
I am able to fetch the list of files, but while running the map reduce job it's showing this error. @User
x
why is it .txt?
oic
r
At the HDFS location we have files with a .txt extension
x
ic
which version of pinot are you using
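in the meantime, one thing worth double-checking (just a guess from the .txt extension, I can't confirm it from this error alone): make sure includeFileNamePattern matches the .txt files and that recordReaderSpec explicitly names a reader that can parse them, e.g. the CSV/delimited reader:
```
# pick up the .txt files under the input dir
includeFileNamePattern: 'glob:**/*.txt'

recordReaderSpec:
  # treat the .txt files as delimited text (the delimiter is an assumption, adjust to your data)
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    delimiter: ','
```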
r
0.7.1
x
also, do you have mapper logs?
r
Where can I collect the mapper logs from?
x
your hadoop should give you the mapper id, and from that you can see the mapper log
r
Ok @User let me check
x
sure, you can find the task id in your screenshot
those attempts
the error stack in the logs points to the top-level exception, so it doesn't give enough information to debug
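if log aggregation is enabled on your cluster, you can also pull the full container logs from the command line instead of the UI, using the application id your job printed when it started (the id below is a placeholder):
```
# fetch aggregated mapper/reducer container logs for the job (placeholder application id)
yarn logs -applicationId application_1234567890123_0001
```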
r
From that job id I could only find these 4 files at the HDFS location
I am able to see these files at the staging location, and from the input directory it is also reading all the file names. I guess something is wrong with the output directory. @User
x
hmm, i cannot tell without seeing the mapper log
can you provide the job conf again
r
Ok
@User kindly check
x
hmm, i think this is fine
need to check mapper log for detailed stacktrace
r
So I am able to see these 4 files for the job id at the HDFS location
Which one do I need to check? I checked all 4 of these files but couldn't find any error message @User
x
hmm, but can you open the hadoop ui and see the logs there?
when you start the job, the hadoop logs should give you a link to access the ui
the hadoop job tracker
try to find your hadoop job history log
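if you're not sure where the history UI lives, you can read it out of the Hadoop config; the MapReduce JobHistory web UI address is normally under mapreduce.jobhistory.webapp.address (this assumes the standard Hadoop CLI is on your path):
```
# print the configured JobHistory web UI host:port, then open it in a browser
# and drill into the job -> map tasks -> attempt -> logs
hdfs getconf -confKey mapreduce.jobhistory.webapp.address
```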
r
@User PFA log details from Hadoop UI.