Ilya Yatsishin
07/04/2022, 1:36 PMTrying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Creating an executor service with 10 threads(Job parallelism: 10, available cores: 16.)
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@73ab3aac] for table hits
Mayank
Ilya Yatsishin
07/04/2022, 3:34 PMPINOT_VERSION=0.10.0
wget <https://downloads.apache.org/pinot/apache-pinot-$PINOT_VERSION/apache-pinot-$PINOT_VERSION-bin.tar.gz>
tar -zxvf apache-pinot-$PINOT_VERSION-bin.tar.gz
./apache-pinot-$PINOT_VERSION-bin/bin/pinot-admin.sh QuickStart -type batch &
sleep 30
./apache-pinot-$PINOT_VERSION-bin/bin/pinot-admin.sh AddTable -tableConfigFile offline_table.json -schemaFile schema.json -exec
./apache-pinot-$PINOT_VERSION-bin/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile local.yaml
Rong R
07/04/2022, 3:40 PMJAVA_OPTS
is set? if it is then pinot will use your env var setting log4j configuration instead of using the one in the conf directoryIlya Yatsishin
07/04/2022, 3:47 PMRong R
07/04/2022, 3:53 PMat org.apache.pinot.spi.filesystem.LocalPinotFS.listFiles(LocalPinotFS.java:113) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:169) ~[pinot-batch-ingestion-standalone-0.10.0-shaded.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:146) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
... 13 more
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:148)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:117)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:121)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:161)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:192)
Caused by: java.nio.file.NoSuchFileException: /home/ubuntu
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:220)
at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:277)
at java.base/java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:323)
at java.base/java.nio.file.FileTreeIterator.<init>(FileTreeIterator.java:71)
at java.base/java.nio.file.Files.walk(Files.java:3918)
at java.base/java.nio.file.Files.walk(Files.java:3973)
at org.apache.pinot.spi.filesystem.LocalPinotFS.listFiles(LocalPinotFS.java:113)
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:169)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:146)
... 13 more
Ilya Yatsishin
07/04/2022, 3:54 PMIlya Yatsishin
07/04/2022, 3:54 PMRong R
07/04/2022, 3:56 PMGonzalo Ortiz
07/04/2022, 4:05 PMIlya Yatsishin
07/04/2022, 4:06 PMGonzalo Ortiz
07/04/2022, 4:06 PMStart pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@390877d2] for table hits
and then the program finished with status 0Ilya Yatsishin
07/04/2022, 4:07 PMmini.tsv
is just a head of the real table.Gonzalo Ortiz
07/04/2022, 4:08 PMGonzalo Ortiz
07/04/2022, 4:08 PMRong R
07/04/2022, 4:10 PMRong R
07/04/2022, 4:23 PM'glob:**/*.tsv'
instead of just the *.tsv
long answer:
b/c we can't find the proper tsv file. it tries to ingest []
files into segments. which means empty.
b/c the list of files to be ingest is empty, there's no success message for each of the segments being generated.Rong R
07/04/2022, 4:24 PMSummary, successfully pushed segments: [ xxx, yyy ]. created from list of files [ xxx, yyy]
then we clearly knew that we didn't filter any files in our processing listIlya Yatsishin
07/04/2022, 4:27 PM'glob:**/*.tsv'
means all subdirectories.
what about glob:.tsv
it should work based on your docs https://docs.pinot.apache.org/configuration-reference/job-specification#top-level-spec
https://www.digitalocean.com/community/tools/glob?comments=true&glob=%2A.js&matches=fal[…]ld.js&tests=%2F%2F%20This%20won%27t%20match%21&tests=globsRong R
07/04/2022, 4:29 PMglob:*.tsv
meaning tsv files in your root directory.Ilya Yatsishin
07/04/2022, 4:31 PMRong R
07/04/2022, 4:33 PMKen Krugler
07/04/2022, 11:45 PMglob:/absolute/path/to/your/input/directory/*.tsv
?Ken Krugler
07/04/2022, 11:46 PMIlya Yatsishin
07/05/2022, 10:44 AMinputDir
are really confusing. You need to add a note at least to docs/examples about it.
inputDirURI: '/home/ubuntu/pinot'
includeFileNamePattern: 'glob:/home/ubuntu/pinot/mini.tsv'
But if you also made a next step you encountered the next issue.
1. I don’t get why it logs something about file format AVRO, when i explicitly have
recordReaderSpec:
dataFormat: 'csv'
2. It does not work with string column that has space in it? Or I don’t understand what is the issue.
Using class: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader to read segment, ignoring configured file format: AVRO
RecordReaderSegmentCreationDataSource is used
Caught exception while gathering stats
java.lang.RuntimeException: Caught exception while transforming data type for column: URL
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:95) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.recordtransformer.CompositeTransformer.transform(CompositeTransformer.java:83) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:80) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:42) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:173) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:155) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:104) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.10.0-shaded.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.IllegalArgumentException: Cannot read single-value from Object[]: [<http://bonprix.ru/index.ru/cinema/art/A00387,3797>), ru)&bL] for column: URL
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:145) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:63) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
... 13 more
Failed to generate Pinot segment for file - file:/home/ubuntu/ClickHouse/benchmark/pinot/mini.tsv
java.lang.RuntimeException: Caught exception while transforming data type for column: URL
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:95) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.recordtransformer.CompositeTransformer.transform(CompositeTransformer.java:83) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:80) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:42) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:173) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:155) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:104) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.10.0-shaded.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.IllegalArgumentException: Cannot read single-value from Object[]: [<http://bonprix.ru/index.ru/cinema/art/A00387,3797>), ru)&bL] for column: URL
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:145) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:63) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
... 13 more
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@65ae095c] for table hits
Ken Krugler
07/05/2022, 3:11 PMKen Krugler
07/05/2022, 3:13 PMCannot read single-value from Object[]: [<http://bonprix.ru/index.ru/cinema/art/A00387,3797>), ru)&bL] for column: URL
Then I’d guess you haven’t encoded the CSV data properly, and the ‘,’ in the URL is confusing the CSV parser.Ilya Yatsishin
07/05/2022, 3:36 PMdelimeter
in this value, so quotes are not required. For example last one was Druid.
I got the same dataset in Parquet and tried to import without this CSV parser.
1. It again writes something about AVRO.
2. It sees this amount of records RecordReader initialized will read a total of 99997497 records.
But then it tries to push data after 43172732
- not sure if it was planning to process the rest or failed earlier.
3. There is still empty list of segments that it is trying to push. I don’t understand if something is wrong or not. Nothing is written
4. It looks job finished and no data can be seen. I can’t see error
at row 43172732. reading next block
block read in memory in 1682 ms. row count = 262144
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@7b7b3edb] for table hits
Full log https://pastila.nl/?017b92b4/361513a635141c533150a5b79f6a4848Ken Krugler
07/05/2022, 3:38 PMRong R
07/05/2022, 3:51 PMRong R
07/05/2022, 3:52 PMHamza
07/08/2022, 3:30 PM