troywinter (01/27/2021, 3:59 AM)
Hi team, I'm getting a class not found exception when running a SegmentCreationAndUriPush job: the org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner class cannot be found. Below is my job config:
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: '/root/fetrace_biz/data/'
includeFileNamePattern: 'glob:**/*'
outputDirURI: 'hdfs://pinot/controller/fetrace_biz/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '/opt/hdfs/'
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'fetrace_biz'
  schemaURI: 'http://10.168.0.88:31645/tables/fetrace_biz/schema'
  tableConfigURI: 'http://10.168.0.88:31645/tables/fetrace_biz'
pinotClusterSpecs:
  - controllerURI: 'http://10.168.0.88:31645'
The exception stack trace is:
2021/01/27 03:53:03.942 ERROR [PinotAdministrator] [main] Exception caught:
java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:137) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:117) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:123) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_275]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_275]
	at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:80) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:293) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:264) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:245) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:135) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	... 4 more

Ken Krugler (01/27/2021, 4:03 AM)
Does your Pinot distribution directory have a plugins sub-dir, which contains a pinot-batch-ingestion sub-dir?
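A quick way to check, assuming the default distribution layout under /opt/pinot (paths here are illustrative):
# Assumed layout: the standalone runner ships as a shaded jar under the pinot-batch-ingestion plugin group
ls /opt/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/
# expect something like: pinot-batch-ingestion-standalone-<version>-shaded.jar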

troywinter (01/27/2021, 4:04 AM)
Yes, it has all the plugins

Ken Krugler (01/27/2021, 4:05 AM)
Including pinot-batch-ingestion-standalone-*.jar?

troywinter (01/27/2021, 4:07 AM)
Yes

Ken Krugler (01/27/2021, 4:07 AM)
What’s the command line you’re using to launch the ingest job?

troywinter (01/27/2021, 4:09 AM)
/opt/pinot/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile fetrace_biz/fetrace_biz-job-spec.yml
JAVA_OPTS=-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs CLASSPATH_PREFIX=/root/hadoop-lib/*

Ken Krugler (01/27/2021, 4:12 AM)
I'm running successfully without setting those Java options, just executing bin/pinot-admin.sh. Though I did have to copy some of the Hadoop jars into my Pinot lib sub-dir. Wondering what happens if you get rid of the -Dplugins.include parameter, as I thought Pinot would include everything in the plugins dir by default.
I think if you specify plugins.include then it only includes those plugins (comma-separated list).

Xiang Fu (01/27/2021, 4:15 AM)
Can you try -Dplugins.include=pinot-hdfs,pinot-json,pinot-batch-ingestion-standalone, or just remove -Dplugins.include? Then the ingestion job will load all the plugins.
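For reference, a sketch of the full launch command with that include list, reusing the paths from the command shared above:
# Same command as above, with the expanded plugins.include list suggested here
JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs,pinot-json,pinot-batch-ingestion-standalone" \
CLASSPATH_PREFIX="/root/hadoop-lib/*" \
/opt/pinot/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile fetrace_biz/fetrace_biz-job-spec.yml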

troywinter (01/27/2021, 4:16 AM)
If I don't specify JAVA_OPTS, I get a Wrong FS exception:
java.lang.IllegalArgumentException: Wrong FS: hdfs:/pinot/controller/fetrace_biz/, expected: file:///

Xiang Fu (01/27/2021, 4:18 AM)
Do you have the full stack trace?

Ken Krugler (01/27/2021, 4:18 AM)
I think you need to specify the protocol (file:/) for the inputDirURI.
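Against the spec above, that would be a tweak along these lines (a sketch, reusing the local path from the original config):
# Hypothetical change: make the local scheme explicit for the input dir
inputDirURI: 'file:///root/fetrace_biz/data/'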

troywinter (01/27/2021, 4:18 AM)
full stack trace:
Exception caught:
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:117) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:123) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs:/pinot/controller/fetrace_biz/, expected: file:///
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:548) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:534) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:705) ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?]
	at org.apache.pinot.plugin.filesystem.HadoopPinotFS.mkdir(HadoopPinotFS.java:78) ~[pinot-hdfs-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:130) ~[pinot-batch-ingestion-standalone-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-255202ec4fc7df2283f7c275d8e9025a26cf3274]

Xiang Fu (01/27/2021, 4:22 AM)
Hmm, does that mean we need to put the namespace inside the output dir URI?
Btw, something is off here:
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
Your data format is csv, but the class name is the JSON record reader.
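For a CSV input, the spec would look more like this (a sketch, assuming the CSV reader classes shipped in the pinot-csv input-format plugin):
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'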

troywinter (01/27/2021, 4:24 AM)
I see, thanks

Ken Krugler (01/27/2021, 4:25 AM)
FWIW, I wound up having to fully specify the HDFS URI, as in outputDirURI: 'hdfs://namenode/user/hadoop/pinot-segments/'
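Applied to the spec above, that would be something like (a sketch; the namenode host and port are placeholders for the cluster's actual address):
# Fully-qualified HDFS output URI; <namenode-host>:<port> are placeholders
outputDirURI: 'hdfs://<namenode-host>:<port>/pinot/controller/fetrace_biz/'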

troywinter (01/27/2021, 4:25 AM)
The job should be able to read the HDFS config from the HDFS config dir /opt/hdfs.
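A quick sanity check of that directory (a sketch, assuming it holds the standard HDFS client configs):
# The dir referenced by hadoop.conf.path should contain core-site.xml / hdfs-site.xml,
# with fs.defaultFS resolving to the namenode
ls /opt/hdfs
grep -A 1 'fs.defaultFS' /opt/hdfs/core-site.xml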

Ken Krugler (01/27/2021, 4:26 AM)
calling it a night, good luck

troywinter (01/27/2021, 4:26 AM)
thanks, I will try it

Xiang Fu (01/27/2021, 4:27 AM)
So you also mount the HDFS config to the path /opt/hdfs?
@Chinmay Soman @Ting Chen do you recall what was done at Uber?

troywinter (01/27/2021, 4:29 AM)
Yes, it's mounted under /opt/hdfs.
@Xiang Fu It turned out the HDFS XML config had an error; it was solved by providing a correct config. Thanks for your help.

Xiang Fu (01/27/2021, 4:39 AM)
👍
Is there anything we can do to prevent this from happening again? Like giving clearer stack traces / error messages?

troywinter (01/27/2021, 4:48 AM)
I think the trace is unclear, and the docs need more detail about setting up HDFS.
@Xiang Fu Maybe we can validate the hadoop.conf.path before the job launches, so if the conf path doesn't exist, the user should provide a namenode instead?

Xiang Fu (01/27/2021, 5:11 AM)
Agreed.
I think FS validation can help root-cause the problem.