# troubleshooting
m
It took me a little time to figure out how to "scroll back" to see the log messages from over the weekend; they were just info level - no warnings or errors - but otherwise very much like the above - they appeared to skip over the .avro.gz files entirely. And yet (I thought) we configured pinot-minion to accept either .avro or .avro.gz files:
"ingestionConfig": {
  "batchIngestionConfig": {
    "batchConfigMaps": [
      {
        "inputDirURI": "gs://pinot-ingestion/transaction",
        "includeFileNamePattern": "glob:**/*.avro*",
        "excludeFileNamePattern": "glob:**/*.tmp",
        "inputFormat": "avro"
      }
    ],
    "segmentIngestionType": "APPEND",
    "segmentIngestionFrequency": "DAILY"
  }
},
h
this looks good to me
m
I'm glad that (at least) the configuration file seems OK, @Haitao Zhang. I tried rerunning some of the segments, and I was surprised that the errors seem to happen between the END and the START (naively, I would have thought they would appear between START and END). Here's the latest error I see (in trying to extract the .avro file pushed to the Google Cloud bucket that Pinot is watching):
2022-09-08T14:38:41.072435473Z. Caught exception while gathering stats

2022-09-08T14:38:41.072469443Z. org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync! at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:223) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.inputformat.avro.AvroRecordReader.hasNext(AvroRecordReader.java:61) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:63) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:37) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:178) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:152) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:101) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at 
org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.generateAndPushSegment(SegmentGenerationAndPushTaskExecutor.java:132) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.executeTask(SegmentGenerationAndPushTaskExecutor.java:118) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:113) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:89) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:829) [?:?] Caused by: java.io.IOException: Invalid sync! 
at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:318) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:212) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] ... 18 more
2022-09-08T14:38:41.073685703Z. Caught exception while executing task: Task_SegmentGenerationAndPushTask_e3496893-4bcc-4b48-a479-17b0412b2824_1662647820418_59
2022-09-08T14:38:41.073714533Z. java.lang.RuntimeException: Failed to execute SegmentGenerationAndPushTask at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.executeTask(SegmentGenerationAndPushTaskExecutor.java:120) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:113) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:89) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:829) [?:?] Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync! 
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:223) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.inputformat.avro.AvroRecordReader.hasNext(AvroRecordReader.java:61) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:63) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:37) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:178) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:152) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:101) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at 
org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.generateAndPushSegment(SegmentGenerationAndPushTaskExecutor.java:132) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.executeTask(SegmentGenerationAndPushTaskExecutor.java:118) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] ... 9 more Caused by: java.io.IOException: Invalid sync! at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:318) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:212) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.inputformat.avro.AvroRecordReader.hasNext(AvroRecordReader.java:61) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:63) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:37) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:178) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at 
org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:152) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:101) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.generateAndPushSegment(SegmentGenerationAndPushTaskExecutor.java:132) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.executeTask(SegmentGenerationAndPushTaskExecutor.java:118) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-54d2813c00093e1baada9f7c8627c9360f133328] ... 9 more
h
Thanks for the detailed info
I just realized that we are using avro to read avro.gz files, could we use avro format instead of avro.gz. I think that should be the issue
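One way to check this suspicion before changing anything is to look at the first bytes of the files in the bucket. An Avro object container file starts with the four bytes `Obj\x01`, while a gzip stream starts with `\x1f\x8b`. Avro raises "Invalid sync!" when the 16-byte sync marker it expects after a data block does not match, so a file that is not plain Avro from start to finish (or that is truncated or corrupted) is a plausible cause. The sketch below is only a diagnostic aid: `sniff_format` is a hypothetical helper name, and it makes no claim about what the Pinot Avro reader in this build does with gzip input.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
AVRO_MAGIC = b"Obj\x01"    # first four bytes of an Avro object container file


def sniff_format(data: bytes) -> str:
    """Classify file contents as 'avro', 'gzip+avro', 'gzip', or 'unknown'.

    Hypothetical helper for illustration; it only inspects magic bytes.
    """
    if data.startswith(AVRO_MAGIC):
        return "avro"
    if data.startswith(GZIP_MAGIC):
        try:
            # Decompress just enough to see what the gzip stream wraps.
            with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
                inner = fh.read(len(AVRO_MAGIC))
        except (OSError, EOFError, zlib.error):
            # Truncated or corrupted gzip: report the outer format only.
            return "gzip"
        return "gzip+avro" if inner == AVRO_MAGIC else "gzip"
    return "unknown"


# Example use on a file downloaded from the bucket (the path is a placeholder):
# with open("some-file.avro.gz", "rb") as fh:
#     print(sniff_format(fh.read()))
```

If a file that the task picks up reports `gzip+avro` or `gzip` while `inputFormat` is `avro`, that would be consistent with the explanation above. If it reports `avro`, the file itself may be truncated or corrupted instead.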