# general
p
hello there, getting the following exception..
Caused by: java.lang.IllegalArgumentException: Parameter 'Bucket' must not be null
I am using 0.5.0.
…GenerationJobRunner,
segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.parquet
inputDirURI: s3://edp-pinot-data/nem13/
jobType: SegmentCreationAndUriPush
outputDirURI: s3://edp-pinot-segments/nem13/segments
overwriteOutput: true
pinotClusterSpecs:
  - {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
  - {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
  - className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs: {region: ap-southeast-2}
    scheme: s3
pushJobSpec: {pushAttempts: 1, pushParallelism: 1, pushRetryIntervalMillis: 1000, segmentUriPrefix: 's3://edp-pinot-segments', segmentUriSuffix: null}
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader, configClassName: null, configs: null, dataFormat: parquet}
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/nem13/schema', tableConfigURI: 'http://localhost:9000/tables/nem13', tableName: nem13}
Am I missing anything? Please help!!!
k
@Kartik Khare ^^
k
@Prakash Tirumalareddy Can you share the proper YAML config? I am seeing that recordReaderSpec contains '{}' and that configClassName and configs are set to null.
p
Please see the attached file. Sorry for the late reply. It was late at night :)
d
@Prakash Tirumalareddy can you also provide a complete stack trace rather than just the cause message? It usually helps greatly with the analysis.
p
sure
Just in case, I removed all comments from the jobSpec file to make it simpler to read.
@Daniel Lavoie any findings? Please suggest.
@Kartik Khare any hints pls?
n
looks like someone had reported this same issue: https://github.com/apache/incubator-pinot/issues/5835
and it has been fixed by @Kartik Khare on master: https://github.com/apache/incubator-pinot/pull/5836
this commit is not part of 0.5.0.
Could you try with the build from source @Prakash Tirumalareddy?
p
oh ok
Sure I will try from source. Thank you very much @Neha Pawar
k
Hi
p
hello
k
Actually, there is a small hack: you can just set the prefix to s3:// and it should work
In 0.5.0
p
you mean this :
pushJobSpec:
pushAttempts: 1
pushRetryIntervalMillis: 1000
segmentUriPrefix: "s3://"
segmentUriSuffix: ""
k
My bad. This "hack" was for some other issue. Right now, you'll have to build from master. You can simply clone the repo and run:
mvn clean package -DskipTests -Pbin-dist
p
ok sure. I will build
@Kartik Khare yes, that worked, but I got another issue (sorry for the inconvenience).
2020/10/07 00:44:54.420 INFO [PinotFSFactory] [main] Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
2020/10/07 00:44:54.891 INFO [S3PinotFS] [main] mkdir <s3://edp-pinot-segments/nem13/segments>
2020/10/07 00:44:55.598 INFO [S3PinotFS] [main] Listed 1 files from URI: <s3://edp-pinot-data/nem13/>, is recursive: true
2020/10/07 00:44:56.043 INFO [S3PinotFS] [main] Copy <s3://edp-pinot-data/nem13/currentregisterreaddate=2002-06-14/active_ind=Y/part-00000-2c2c776c-12f8-45a0-96fa-e402b13fdb57.c000.snappy.parquet> to local /var/folders/xs/bknv88ln05g5z3dgzss7whw80000gn/T/pinot-956cf81e-458b-45f3-9669-c24019eeacd3/input/part-00000-2c2c776c-12f8-45a0-96fa-e402b13fdb57.c000.snappy.parquet
2020/10/07 00:44:56.176 WARN [SegmentIndexCreationDriverImpl] [main] Using class: org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader to read segment, ignoring configured file format: AVRO
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
at org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader.init(ParquetRecordReader.java:46)
at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:133)
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:120)
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:96)
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:104)
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:190)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:117)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:123)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:65)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Path
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 10 more
Any thoughts/pointers?
k
That's strange. Can you share the ingestion config again along with the java version?
p
apache-pinot-incubating-0.6.0-SNAPSHOT-bin prakash$ java -version
openjdk version "13.0.2" 2020-01-14
OpenJDK Runtime Environment (build 13.0.2+8)
OpenJDK 64-Bit Server VM (build 13.0.2+8, mixed mode, sharing)
k
Can you set
segmentUriPrefix: ""
and try again
d
You need to build Pinot with Java 11
I see JDK 13.
k
@Daniel Lavoie We do have checks for quickstart on JDK 13 as well as 14 in GitHub, but the build has to be on JDK 11?
d
Indeed, sorry, I read the stack a bit too quickly:
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
p
Tried with the following, same error:
segmentUriPrefix: ""
segmentUriSuffix: ""
I see there is an open issue; is this related? https://github.com/apache/incubator-pinot/issues/5387
k
Yes, it is related. We will find a long-term fix for it. For now, can you try the solution mentioned in the issue?
p
yes trying now
Attached logs..
d
Caused by: java.lang.IllegalArgumentException: INT96 not yet implemented.
k
That seems like a avro specific error
d
Yes, I think your schema is more advanced than the ones supported by the org.apache.parquet.avro.AvroSchemaConverter that Pinot uses.
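For context on what the converter is choking on: Parquet's legacy INT96 timestamp packs 8 bytes of nanoseconds-of-day plus a 4-byte Julian day number, while Pinot ingests plain epoch-millisecond longs. A minimal stdlib-only Python sketch of the decoding (the function name is mine, purely for illustration):

```python
import struct
from datetime import datetime, timezone

JULIAN_EPOCH_DAY = 2_440_588  # Julian day number of 1970-01-01

def int96_to_epoch_millis(raw: bytes) -> int:
    """Decode a 12-byte Parquet INT96 timestamp (little-endian:
    8-byte nanos-of-day, then 4-byte Julian day) to epoch millis."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_EPOCH_DAY
    return days_since_epoch * 86_400_000 + nanos_of_day // 1_000_000

# Round-trip check: encode 2002-06-14T00:00:00Z and decode it back.
dt = datetime(2002, 6, 14, tzinfo=timezone.utc)
days = (dt - datetime(1970, 1, 1, tzinfo=timezone.utc)).days
raw = struct.pack("<qi", 0, days + JULIAN_EPOCH_DAY)
print(int96_to_epoch_millis(raw))  # → 1024012800000
```

An INT64 column already holds that final epoch-millis value directly, which is why the converter has nothing extra to do for it.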
p
See the attached schema. I converted it from an Athena DDL to a JSON schema.
k
This schema is fine. The issue is with the Avro schema, i.e. the .avsc file.
p
.avsc? Is this file generated by Pinot?
Please let me know if there is anything I am doing wrong.
k
My bad. What Parquet version are you using to write the data?
p
python-snappy==0.5.4 fastparquet==0.3.3
k
p
Sorry, I didn't get it. Do you mean a change in the schema file, or in the generation of the parquet file?
{ "name": "datetime", "dataType": "INT64" },
k
No, in the Python parquet writer. See the last config on the link I mentioned.
p
OK boss, that is a big change, because it may impact other things currently running. I need to discuss it with the team, and I will try this tomorrow morning. Time to go to bed, it's 1.40am now 🙂. Please let me know if there is anything I can do without changing the source data. Thanks kindly for all the support and help.
n
Hi @Prakash Tirumalareddy, it looks like INT96 is deprecated in future Parquet versions, so changing it to INT64 will be the only solution here. https://github.com/apache/parquet-mr/pull/579/files
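On the writer side, the change amounts to emitting timestamps as plain epoch-millisecond longs instead of INT96. A small stdlib-only sketch of the conversion (the helper name is mine; it assumes the source timestamps are UTC):

```python
from datetime import datetime, timezone

def to_epoch_millis(ts: str) -> int:
    """Parse an ISO-8601 timestamp string into epoch milliseconds,
    i.e. the plain INT64 value a Pinot time column can ingest."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return int(dt.timestamp() * 1000)

print(to_epoch_millis("2002-06-14T00:00:00"))  # → 1024012800000
```

Most Python Parquet writers can also do this natively; for example, fastparquet's `write()` accepts a `times` option and pyarrow's `write_table()` has a `use_deprecated_int96_timestamps` flag that should stay off. Check the docs for the versions you actually run.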
p
@Kartik Khare @Daniel Lavoie @Neha Pawar it worked. Thanks again for your kind help. Is there any performance guide for loading data in a faster way?
inputDirURI: "s3://edp-pinot-data/nem13/"
includeFileNamePattern: "glob:**/*.parquet"
outputDirURI: "s3://edp-pinot-segments/nem13/segments"
n
What's the raw data size right now? How many files? How long did it take? Also, please share the table config/schema.
Btw, can we move to the troubleshooting channel?
p
Sure, I will send all the details about the data.