# general
r
Hello everyone, I am creating a hybrid table which can ingest data from a Kafka topic (streaming data) as well as from an HDFS location (batch ingestion). I am aware of the stream ingestion process for Kafka topics and have created multiple real-time tables. Now I am creating a hybrid table for one of the Kafka topics whose data is also available at an HDFS location. I am going through the documents, but in offline-config-table.json I couldn't find any property where we pass the HDFS location as the source. Kindly suggest the process to also ingest from HDFS into the same table.
x
it’s in the server config; you should already have it in your server conf
r
In the server.conf file we have 2 locations, pinot.server.instance.dataDir and pinot.server.instance.segmentTarDir, and both are local paths which I provided while creating the configuration file for deep storage. Is this the same location you are talking about? @User
When I check on git I am able to see multiple files here, i.e. the Hadoop ingestion .yaml, schema file, config file, etc.
So I am not sure which files are required for creating a hybrid table.
x
right, it’s a local path
pinot queries are served from the servers, which have the segments locally
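for reference, those two entries in server.conf look roughly like this (the paths are placeholders, not your actual ones); they only control where the server keeps its local copies of segments, they are not where you point the batch job at HDFS:
```
# local directory the server loads and serves segments from (placeholder path)
pinot.server.instance.dataDir=/path/to/server/index
# local directory where downloaded segment tars are kept before untarring (placeholder path)
pinot.server.instance.segmentTarDir=/path/to/server/segmentTar
```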
r
Oh okay @User, so for loading the Pinot hybrid table I need to load data from the pinot.server.instance.dataDir location, right? And which files are required?
x
Yes
You need to push segments to the offline table; all the segment files are backed up in the deep store and also loaded by every server
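concretely, a hybrid table is just a REALTIME table config plus an OFFLINE table config sharing the same table name; a minimal sketch of the offline side, assuming a table called myTable with a millisecond time column (adjust the names, replication and push settings to match your realtime table):
```
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "myTable",
    "timeColumnName": "timestampColumn",
    "timeType": "MILLISECONDS",
    "segmentPushType": "APPEND",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}
```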
r
These files are available.
Which configuration files do I need to create @User? I am able to see these 5 files on git.
x
for the batch job, you need to run an ingestion job to push data to the pinot offline table
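the job is kicked off with the LaunchDataIngestionJob command pointed at your job spec file; a sketch with placeholder paths (for running it as a MapReduce job on the cluster, the batch ingestion docs for your Pinot version also show a hadoop jar variant):
```
# launch the batch ingestion job using the job spec (paths are placeholders)
/path/to/pinot/bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /path/to/hadoopIngestionJobSpec.yaml
```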
r
Kindly suggest if anything needs to be updated in this file, since my source location is HDFS, i.e. in the hybrid table I want to load data from the HDFS location as well as the Kafka topic. So I have created this hadoopIngestionJobSpec.yaml. I have data inside the /user/hdfs/rawdata directory. P.S. I have already loaded data from the Kafka topic; now I am trying to load from HDFS into the same table. @User
x
where do you want to run this job? Do you have a Hadoop or Spark cluster?
r
@User Hadoop
x
Then you can just update this file following the docs
r
Ok Thanks @User
Hi @User, I have created the files as per the document. Kindly review the properties I have mentioned in the note (last 3 lines) and let me know if any changes are required.
x
inputDirURI/outputDirURI/stagingDirURI should all be HDFS paths
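something like this for those fields, a partial sketch of the spec only (the namenode host/port, output and staging paths and hadoop.conf.path here are placeholders you'd replace with your own):
```
executionFrameworkSpec:
  name: 'hadoop'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner'
jobType: SegmentCreationAndTarPush

# all three URIs should live on HDFS, using the same scheme/authority as fs.defaultFS
inputDirURI: 'hdfs://namenode-host:port/user/hdfs/rawdata'
outputDirURI: 'hdfs://namenode-host:port/user/hdfs/pinot/segments'
stagingDirURI: 'hdfs://namenode-host:port/user/hdfs/pinot/staging'

pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '/etc/hadoop/conf'
```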
r
@User When I execute this file, the logs show this error after successfully loading all the jars.
This is my file, where hdfs://10.190.135.180:8030 is the value of my fs.defaultFS and /user/hdfs is the HDFS directory.
x
the error log says it failed to list the input files
the inputDirURI is invalid
can you try to replace = with %3d
not sure if that is the issue
r
After the table SI_TRANSATION I used %3d, and it's showing "illegal character in path at index 51"; it's showing the same issue now for =
In the HDFS path it's not taking '='
x
hmm, ok, so it's still =
i remember it's ok to use = in s3
haven't tried in hdfs though
r
Sorry @User, it is my mistake. I was putting a space after =, but the actual path doesn't have a space. Let me retry.
x
ok
r
I am able to fetch the list of files, but while running the map reduce job it's showing this error. @User
x
why is it .txt?
oic
r
At the HDFS location we have files with a .txt extension
x
ic
which version of pinot are you using
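in the meantime, one thing worth double-checking (just a guess from the .txt extension, I can't confirm it from this error alone): make sure includeFileNamePattern matches the .txt files and that recordReaderSpec explicitly names a reader that can parse them, e.g. the CSV/delimited reader:
```
# pick up the .txt files under the input dir
includeFileNamePattern: 'glob:**/*.txt'

recordReaderSpec:
  # treat the .txt files as delimited text (the delimiter is an assumption, adjust to your data)
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    delimiter: ','
```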
r
0.7.1
x
also, do you have mapper logs?
r
Where can I collect the mapper logs from?
x
your hadoop should give you the mapper id, and from that you can see the mapper log
r
Ok @User let me check
x
sure, you can find the task id in your screenshot
those attempts
the error stack in the logs points to the top-level exception, so it doesn't give enough information to debug
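if log aggregation is enabled on your cluster, you can also pull the full container logs from the command line instead of the UI, using the application id your job printed when it started (the id below is a placeholder):
```
# fetch aggregated mapper/reducer container logs for the job (placeholder application id)
yarn logs -applicationId application_1234567890123_0001
```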
r
From that job id I could only find these 4 files at the HDFS location
I am able to see these files at the staging location, and from the input directory it is also reading all the file names. I guess something is wrong with the output directory. @User
x
hmm, i cannot tell without seeing the mapper log
can you provide the job conf again
r
Ok
@User kindly check
x
hmm, i think this is fine
need to check mapper log for detailed stacktrace
r
So I am able to see these 4 files for the job id at the HDFS location
Which one do I need to check? I checked all 4 of these files but couldn't find any error message @User
x
hmm, but can you open the hadoop ui and see the logs there?
when you start the job, the hadoop logs should give you a link to access the ui
the hadoop job tracker
try to find your hadoop job history log
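if you're not sure where the history UI lives, you can read it out of the Hadoop config; the MapReduce JobHistory web UI address is normally under mapreduce.jobhistory.webapp.address (this assumes the standard Hadoop CLI is on your path):
```
# print the configured JobHistory web UI host:port, then open it in a browser
# and drill into the job -> map tasks -> attempt -> logs
hdfs getconf -confKey mapreduce.jobhistory.webapp.address
```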
r
@User PFA log details from Hadoop UI.