# troubleshooting
t
Hi team, I’m getting a ClassNotFoundException when running a SegmentCreationAndUriPush job: the `org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner` class cannot be found. Below is my job config:
```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: '/root/fetrace_biz/data/'
includeFileNamePattern: 'glob:**/*'
outputDirURI: 'hdfs://pinot/controller/fetrace_biz/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '/opt/hdfs/'
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'fetrace_biz'
  schemaURI: 'http://10.168.0.88:31645/tables/fetrace_biz/schema'
  tableConfigURI: 'http://10.168.0.88:31645/tables/fetrace_biz'
pinotClusterSpecs:
  - controllerURI: 'http://10.168.0.88:31645'
```
exception stack is:
```
2021/01/27 03:53:03.942 ERROR [PinotAdministrator] [main] Exception caught:
java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:137) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:117) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:123) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_275]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_275]
	at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:80) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:293) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:264) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:245) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:135) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	... 4 more
```
k
Does your Pinot distribution directory have a `plugins` sub-dir, which contains a `pinot-batch-ingestion` sub-dir?
t
Yes, it has all the plugins
k
Including `pinot-batch-ingestion-standalone-*.jar`?
t
Yes
k
What’s the command line you’re using to launch the ingest job?
t
```
/opt/pinot/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile fetrace_biz/fetrace_biz-job-spec.yml
JAVA_OPTS=-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs CLASSPATH_PREFIX=/root/hadoop-lib/*
```
k
I’m running successfully without setting those Java options, just executing `bin/pinot-admin.sh`. Though I did have to copy some of the Hadoop jars into my Pinot lib sub-dir. Wondering what happens if you get rid of the `-Dplugins.include` parameter, as I thought Pinot would include everything in the plugins dir by default.
I think if you specify `plugins.include` then it only includes those plugins (comma-separated list).
x
Can you try `-Dplugins.include=pinot-hdfs,pinot-json,pinot-batch-ingestion-standalone`, or just remove `-Dplugins.include`? Then the ingestion job will load all the plugins.
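For example, something like this (reusing the paths from your earlier command; exact quoting may vary with your shell):
```
JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs,pinot-json,pinot-batch-ingestion-standalone" \
CLASSPATH_PREFIX=/root/hadoop-lib/* \
/opt/pinot/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile fetrace_biz/fetrace_biz-job-spec.yml
```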
t
If I don’t specify JAVA_OPTS, I get a Wrong FS exception:
```
java.lang.IllegalArgumentException: Wrong FS: hdfs:/pinot/controller/fetrace_biz/, expected: file:///
```
x
Do you have the full stack trace?
k
I think you need to specify the protocol (`file:/`) for the inputDirURI.
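e.g. in your spec that would presumably become:
```yaml
inputDirURI: 'file:///root/fetrace_biz/data/'
```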
t
full stack trace:
```
Exception caught:
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:117) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:123) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs:/pinot/controller/fetrace_biz/, expected: file:///
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:548) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:534) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:705) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.pinot.plugin.filesystem.HadoopPinotFS.mkdir(HadoopPinotFS.java:78) ~[pinot-hdfs-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:130) ~[pinot-batch-ingestion-standalone-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
```
x
Hmm, does it mean that we need to put the namespace inside the output dir URI?
Btw, something is off here:
```yaml
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
```
Your data format is csv, but the class name is the JSON record reader.
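If the data is really CSV, I’d expect you need the CSV reader instead, something like this (class name taken from the pinot-csv input-format plugin):
```yaml
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
```
(and then `pinot-csv` rather than `pinot-json` in `plugins.include`, if you keep that flag)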
t
I see, thanks
k
FWIW, I wound up having to fully specify the HDFS URI, as in `outputDirURI: 'hdfs://namenode/user/hadoop/pinot-segments/'`
t
The job should be able to read the HDFS config from the HDFS config dir `/opt/hdfs`.
k
calling it a night, good luck
t
thanks, I will try it
x
So you also mount the HDFS config at path `/opt/hdfs`?
@Chinmay Soman @Ting Chen do you recall what has been done at Uber?
t
Yes, it’s mounted under `/opt/hdfs`.
@Xiang Fu It turned out the HDFS XML config had an error; providing a correct config solved it. Thanks for your help!
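For reference, that `Wrong FS: ... expected: file:///` error usually means `fs.defaultFS` isn’t being picked up, so Hadoop falls back to the local filesystem; core-site.xml under the conf dir would normally carry something like this (host/port here are placeholders for your cluster):
```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode:8020</value>
</property>
```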
x
👍
Is there anything we can do to prevent this happening again? Like giving clearer stack traces / error messages?
t
I think the trace is unclear, and the docs need more detail about setting up HDFS.
@Xiang Fu Maybe we can validate the `hadoop.conf.path` before the job launches, so if the conf path doesn’t exist, the user should provide the namenode instead?
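Something like this rough sketch (hypothetical, not actual Pinot code; the names are made up):
```java
import java.io.File;

// Hypothetical pre-flight check: fail fast when hadoop.conf.path is bad,
// instead of surfacing a confusing "Wrong FS" error mid-job.
public class HadoopConfCheck {
  static void validateConfDir(String confPath) {
    File dir = new File(confPath);
    if (!dir.isDirectory()) {
      throw new IllegalArgumentException(
          "hadoop.conf.path is not an existing directory: " + confPath);
    }
    // Without core-site.xml, fs.defaultFS silently falls back to file:///,
    // which is exactly what produced the "Wrong FS ... expected: file:///" above.
    for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
      if (!new File(dir, name).isFile()) {
        throw new IllegalArgumentException(
            "Missing " + name + " under hadoop.conf.path: " + confPath);
      }
    }
  }
}
```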
x
agreed
I think FS validation can help root-cause the problem.