# general
t
Another question regarding using HDFS as Pinot deep storage: I have put hadoop-client-3.1.1.3.1.0.0-78.jar, hadoop-common-3.1.1.3.1.0.0-78.jar, hadoop-hdfs-3.1.1.3.1.0.0-78.jar, and hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar on the Pinot controller’s classpath, but the controller is still reporting class not found for org/apache/hadoop/fs/FSDataInputStream. What other jars should I include? Below is the stack trace for this error:
2021/01/18 10:26:32.704 INFO [ControllerStarter] [main] Initializing PinotFSFactory
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
	at java.lang.Class.getConstructor0(Class.java:3075)
	at java.lang.Class.getConstructor(Class.java:1825)
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:295)
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:264)
	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:245)
	at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:53)
	at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74)
	at org.apache.pinot.controller.ControllerStarter.initPinotFSFactory(ControllerStarter.java:481)
	at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:329)
	at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:287)
	at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116)
	at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91)
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234)
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286)
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233)
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183)
	at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130)
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:162)
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:182)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 21 more
And below are the startup opts:
JAVA_OPTS	-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log -Dlog4j2.configurationFile=/opt/pinot/conf/pinot-controller-log4j2.xml -Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs -classpath /opt/hadoop-lib/hadoop-common-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-client-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-hdfs-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar
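For reference, the PinotFSFactory init in the log above is driven by controller properties along these lines (a sketch following the Pinot HDFS docs for this kind of setup; the hdfs:// URI and local paths below are placeholders, not actual values):

    # register HadoopPinotFS from the pinot-hdfs plugin for the hdfs:// scheme
    pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.controller.storage.factory.hdfs.hadoop.conf.path=/opt/hadoop/etc/hadoop
    pinot.controller.segment.fetcher.protocols=file,http,hdfs
    pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcherFactory
    controller.data.dir=hdfs://namenode:8020/pinot/controller-data
    controller.local.temp.dir=/tmp/pinot-tmp-data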
k
I believe Pinot is built against Hadoop 2.7, so I’d try putting jars for that version of Hadoop on the classpath, not 3.1.1
Though the Hadoop 3.1.1 source also has FSDataInputStream in hadoop-common, in the same package, so switching to 2.7 seems unlikely to fix your issue, sorry.
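If you want to confirm which jar actually contains the class, listing the jar contents should show it (using the path from your startup opts):

    jar tf /opt/hadoop-lib/hadoop-common-3.1.1.3.1.0.0-78.jar | grep FSDataInputStream
    # expect a line like: org/apache/hadoop/fs/FSDataInputStream.class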
t
From the documentation (https://docs.pinot.apache.org/basics/data-import/pinot-file-system/import-from-hdfs): “You will also need to provide proper Hadoop dependencies jars from your Hadoop installation to your Pinot startup scripts.” I think this means I should provide the version that matches my Hadoop version?
Solved by setting the CLASSPATH_PREFIX environment variable instead of passing the jars via -classpath in the startup args.
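For anyone hitting the same thing, the working setup looks roughly like this (the -configFileName path is illustrative; the jar list is the same one from the opts above):

    export CLASSPATH_PREFIX=/opt/hadoop-lib/hadoop-common-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-client-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-hdfs-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar
    bin/pinot-admin.sh StartController -configFileName /opt/pinot/conf/pinot-controller.conf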
k
That seems odd (that -classpath doesn’t work, but CLASSPATH_PREFIX does work).
t
The startup script overrides any -classpath args you pass; it reads CLASSPATH_PREFIX, appends it to CLASSPATH, and then adds that to the program args.
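So the effective launch ends up looking something like this (a simplified sketch of the behavior, not the literal pinot-admin.sh):

    # CLASSPATH_PREFIX is folded into the final classpath; any -classpath you pass yourself is replaced
    CLASSPATH="${CLASSPATH_PREFIX}:${CLASSPATH}"
    exec java ${JAVA_OPTS} -classpath "${CLASSPATH}" org.apache.pinot.tools.admin.PinotAdministrator StartController "$@"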