# pinot-dev
n
were there some recent version changes made to the hadoop/parquet dependencies? I’m unable to upload a Parquet-format file via this API anymore:
http://localhost:9000/help#/Table/ingestFromFile
This was working a few weeks ago. Now I get this exception during segment creation:
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/input/FileInputFormat
	at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_282]
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756) ~[?:1.8.0_282]
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_282]
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) ~[?:1.8.0_282]
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_282]
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_282]
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_282]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_282]
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_282]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_282]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_282]
	at org.apache.parquet.HadoopReadOptions$Builder.<init>(HadoopReadOptions.java:95) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.parquet.HadoopReadOptions.builder(HadoopReadOptions.java:79) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.parquet.hadoop.ParquetReader$Builder.<init>(ParquetReader.java:198) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.parquet.avro.AvroParquetReader$Builder.<init>(AvroParquetReader.java:107) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.parquet.avro.AvroParquetReader$Builder.<init>(AvroParquetReader.java:99) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.parquet.avro.AvroParquetReader.builder(AvroParquetReader.java:48) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.plugin.inputformat.parquet.ParquetUtils.getParquetAvroReader(ParquetUtils.java:51) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.plugin.inputformat.parquet.ParquetAvroRecordReader.init(ParquetAvroRecordReader.java:52) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader.init(ParquetRecordReader.java:47) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:149) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:122) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:98) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.controller.util.FileIngestionUtils.buildSegment(FileIngestionUtils.java:129) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.controller.util.FileIngestionHelper.buildSegmentAndPush(FileIngestionHelper.java:101) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.controller.api.resources.PinotIngestionRestletResource.ingestData(PinotIngestionRestletResource.java:197) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at org.apache.pinot.controller.api.resources.PinotIngestionRestletResource.ingestFromFile(PinotIngestionRestletResource.java:127) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-89a22f097c5ff26396e58950c90d764066a56121]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_282]
n
@Amrish Lal any idea what I should do to get around this?
a
@Neha Pawar The Parquet version was updated to 1.11.1 due to its dependency on Avro, as part of upgrading to Avro 1.9.2. Are these old files? Is there a way I can reproduce the issue here?
We did this upgrade so that we could be more consistent with other tools that use Avro.
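roughly, the bump in the root pom looks like this (a sketch — the property names here are illustrative, not necessarily the exact ones in the pom):
<properties>
  <avro.version>1.9.2</avro.version>
  <parquet.version>1.11.1</parquet.version>
</properties>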
n
yeah to reproduce, you can try this API
http://localhost:9000/help#/Table/ingestFromFile
with a Parquet data file
this was my exact call:
curl -i -X POST -F file=@data.parquet "http://localhost:9000/ingestFromFile?tableNameWithType=foo_OFFLINE&batchConfigMapStr=%7B%0A%20%20%22inputFormat%22%3A%22parquet%22%0A%7D"
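for reference, the URL-encoded batchConfigMapStr there just decodes to:
{
  "inputFormat":"parquet"
}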
a
@Jack ^^ FYI
n
and any basic table config and schema. My sample files were
I will also check if I can add a Parquet case to the test. I think it was only checking JSON.
a
Thanks. I will try it out. I am wondering if migrating these files to the new version of Parquet is feasible. The exception seems to be thrown within the Parquet code, so it doesn't look like the versions are backward compatible.
n
oh sure, so you’re saying the exception is because the data file was generated using an older version of Parquet?
a
Yes, I think so. Avro seems to be a bit finicky when changing versions, and that's why we are trying to move to version 1.9.2, so that we have better binary compatibility, especially among different tools that use Avro.
j
NoClassDefFoundError basically means that a class your code depends on was present at compile time but cannot be found at runtime. @Amrish Lal can you try to find out whether it's due to an incorrect scope on your newly added dependency?
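a minimal illustration (the Demo class is made up, but FileInputFormat is the class from the stack trace):
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class Demo {
  public static void main(String[] args) {
    // Compiles fine because hadoop-mapreduce-client-core is on the
    // compile classpath; if that jar is missing at runtime, the first
    // use of the class fails with
    // java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/input/FileInputFormat
    System.out.println(FileInputFormat.class.getName());
  }
}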
a
Yeah, I think it's for this:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
j
so you’d probably need to add <scope>compile</scope> back to hadoop-common, and the same for hadoop-mapreduce-client-core, in the pinot-parquet pom.xml
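e.g., something like this (a sketch; the module path is assumed to be pinot-plugins/pinot-input-format/pinot-parquet):
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop.version}</version>
  <scope>compile</scope>
</dependency>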
a
@Neha Pawar I tested using ./pinot-tools/target/pinot-tools-pkg/bin/quick-start-batch.sh. I had to add the following jars to the classpath to make it work: hadoop-auth-2.7.0.jar, hadoop-mapreduce-client-core-2.7.0.jar, hadoop-common-2.7.0.jar. @Jack seems like this is an extra step besides fixing the dependency in pom.xml.
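one way to get them onto the classpath (a sketch; the jar paths assume a standard local maven repo layout) is to drop them into the lib dir the script already picks up:
# copy the missing hadoop jars into the pinot-tools lib dir,
# which is on the quickstart script's classpath
M2=~/.m2/repository/org/apache/hadoop
cp $M2/hadoop-auth/2.7.0/hadoop-auth-2.7.0.jar \
   $M2/hadoop-common/2.7.0/hadoop-common-2.7.0.jar \
   $M2/hadoop-mapreduce-client-core/2.7.0/hadoop-mapreduce-client-core-2.7.0.jar \
   pinot-tools/target/pinot-tools-pkg/lib/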
n
interesting. This step should’ve been necessary even before the PR changes, right? I wonder how it worked before
a
@Neha Pawar are you using quick-start-batch.sh as well while getting these errors?
n
yes
and also started seeing these errors in an internal environment, where it used to work like 10ish days ago
a
hadoop-mapreduce-client-core-2.7.0.jar seems to have changed things. I am not quite sure how to copy jars and set the classpath for quick-start-batch.sh, but I will send a PR to fix the dependencies in pom.xml.
n
Thanks @Amrish Lal, will take a look
a
From what I can see, if we include the dependencies in pinot-tools/pom.xml, the jars should get copied into pinot-tools/target/pinot-tools-pkg/lib, which is in the classpath for quick-start-batch.sh. Not sure if that's the right/desired approach though.
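i.e., roughly this in pinot-tools/pom.xml (a sketch; hadoop-auth should come along transitively via hadoop-common):
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop.version}</version>
</dependency>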
n
can we merge this?