Phúc Huỳnh
04/20/2021, 4:40 AM
21/04/20 03:03:42 ERROR org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand: Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:117)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.nio.file.FileSystemNotFoundException: Provider "gs" not installed
at java.nio.file.Paths.get(Paths.java:147)
at org.apache.pinot.plugin.filesystem.GcsPinotFS.copy(GcsPinotFS.java:262)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner.run(SparkSegmentGenerationJobRunner.java:344)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142)
... 15 more
Maybe java.nio.file.Path needs to be converted to org.apache.hadoop.fs.Path? Any ideas?
Xiang Fu
Can you try adding the google-cloud-nio dependency into the pinot-gcs pom file?
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-nio</artifactId>
  <version>0.120.0-alpha</version>
</dependency>
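For reference, the failing call in GcsPinotFS.copy goes through java.nio.file.Paths, which resolves a URI against the installed NIO FileSystemProvider for its scheme; google-cloud-nio ships such a provider for gs and registers it via the ServiceLoader mechanism. A minimal probe of that mechanism (the bucket name is hypothetical):

import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GcsNioProbe {
  public static void main(String[] args) {
    // Paths.get(URI) looks up a java.nio.file.spi.FileSystemProvider matching
    // the URI scheme. With no provider for "gs" on the classpath, this line
    // throws java.nio.file.FileSystemNotFoundException: Provider "gs" not installed.
    // With google-cloud-nio on the classpath, the same call resolves.
    Path path = Paths.get(URI.create("gs://some-bucket/some/object")); // hypothetical bucket
    System.out.println(path);
  }
}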
Phúc Huỳnh
04/20/2021, 7:18 AM
Should we avoid java.nio.file.Paths in pinot-gcs and write a custom Paths.get function instead?
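One way to do that without touching the NIO provider machinery at all is plain java.net.URI arithmetic; a minimal sketch (hypothetical paths, not the actual GcsPinotFS code):

import java.net.URI;

public class UriRelativize {
  public static void main(String[] args) {
    // Computes the relative path between two gs:// URIs without Paths.get,
    // so no FileSystemProvider lookup for the "gs" scheme is triggered.
    URI base = URI.create("gs://some-bucket/input/");
    URI file = URI.create("gs://some-bucket/input/year=2020/part-0001.avro");
    System.out.println(base.relativize(file).getPath()); // prints: year=2020/part-0001.avro
  }
}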
Phúc Huỳnh
04/20/2021, 10:38 AM
Caused by: java.lang.IllegalStateException: PinotFS for scheme: gs has not been initialized
at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518)
at org.apache.pinot.spi.filesystem.PinotFSFactory.create(PinotFSFactory.java:80)
at org.apache.pinot.plugin.ingestion.batch.common.SegmentPushUtils.sendSegmentUris(SegmentPushUtils.java:158)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner$1.call(SparkSegmentUriPushJobRunner.java:122)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner$1.call(SparkSegmentUriPushJobRunner.java:117)
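For context: PinotFSFactory registrations live per JVM, and Spark executors are separate JVMs from the driver, which is why the stack above shows SparkSegmentUriPushJobRunner$1.call failing inside an executor even though the scheme was registered at launch. A minimal sketch of the registration each executor JVM needs before create("gs") can succeed (the empty config is illustrative; a real one carries the GCS projectId/key settings):

import java.util.HashMap;

import org.apache.pinot.spi.env.PinotConfiguration;
import org.apache.pinot.spi.filesystem.PinotFS;
import org.apache.pinot.spi.filesystem.PinotFSFactory;

public class RegisterGcsFs {
  public static void main(String[] args) {
    // Without this per-JVM registration, PinotFSFactory.create("gs") fails the
    // Preconditions.checkState seen above with
    // "PinotFS for scheme: gs has not been initialized".
    PinotConfiguration config = new PinotConfiguration(new HashMap<>()); // illustrative empty config
    PinotFSFactory.register("gs", "org.apache.pinot.plugin.filesystem.GcsPinotFS", config);
    PinotFS fs = PinotFSFactory.create("gs");
    System.out.println(fs.getClass().getName());
  }
}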
Phúc Huỳnh
04/22/2021, 5:11 AM
21/04/22 07:25:48 ERROR org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner: Failed to tar plugins directory
java.io.IOException: Request to write '4096' bytes exceeds size in header of '12453302' bytes for entry './pinot-plugins.tar.gz'
at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.write(TarArchiveOutputStream.java:449)
Xiang Fu
Did you set plugins.dir in your java cmd?
public static void main(String[] args) {
  try {
    TarGzCompressionUtils.createTarGzFile(
        new File("/Users/xiangfu/workspace/pinot-dev/pinot-distribution/target/apache-pinot-incubating-0.8.0-SNAPSHOT-bin/apache-pinot-incubating-0.8.0-SNAPSHOT-bin/plugins"),
        new File("/tmp/plugin.tar.gz"));
  } catch (IOException e) {
    e.printStackTrace();
  }
}
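The tar failure itself is reproducible outside Pinot: commons-compress writes each entry's size into the tar header up front, so an entry that grows while being copied overruns its declared size. That can happen when the output pinot-plugins.tar.gz is created inside the directory being tarred, which would fit the './pinot-plugins.tar.gz' entry name above and the [./] plugins root dir in the log below. A minimal sketch of the overrun:

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class TarHeaderOverrun {
  public static void main(String[] args) throws IOException {
    try (TarArchiveOutputStream tar =
        new TarArchiveOutputStream(new FileOutputStream("/tmp/demo.tar"))) {
      TarArchiveEntry entry = new TarArchiveEntry("entry.bin");
      entry.setSize(10); // header declares 10 bytes, as if the file was stat'ed before it grew
      tar.putArchiveEntry(entry);
      // Writing more than the declared size throws:
      // java.io.IOException: Request to write '4096' bytes exceeds size in header of '10' bytes ...
      tar.write(new byte[4096]);
      tar.closeArchiveEntry();
    }
  }
}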
Phúc Huỳnh
04/23/2021, 2:36 AM
21/04/23 02:33:34 INFO org.apache.pinot.spi.plugin.PluginManager: Plugins root dir is [./]
21/04/23 02:33:34 INFO org.apache.pinot.spi.plugin.PluginManager: Trying to load plugins: [[pinot-gcs]]
Full log:
:: retrieving :: org.apache.spark#spark-submit-parent-adf0fd1c-d000-4782-8499-d41f1396e726
confs: [default]
0 artifacts copied, 9 already retrieved (0kB/17ms)
21/04/23 02:33:34 INFO org.apache.pinot.spi.plugin.PluginManager: Plugins root dir is [./]
21/04/23 02:33:34 INFO org.apache.pinot.spi.plugin.PluginManager: Trying to load plugins: [[pinot-gcs]]
21/04/23 02:33:35 INFO org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher: SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
authToken: null
cleanUpOutputDir: false
excludeFileNamePattern: null
executionFrameworkSpec:
  extraConfigs: {stagingDir: 'gs://bucket_name/tmp/'}
  name: spark
  segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
  segmentMetadataPushJobRunnerClassName: null
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner
failOnEmptySegment: false
includeFileNamePattern: glob:**/*.avro
inputDirURI: gs://bucket_name/rule_logs/
jobType: SegmentCreationAndUriPush
outputDirURI: gs://bucket_name/data/
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:8080'}
pinotFSSpecs:
- {className: org.apache.pinot.plugin.filesystem.GcsPinotFS, configs: null, scheme: gs}
pushJobSpec: {pushAttempts: 2, pushParallelism: 2, pushRetryIntervalMillis: 1000,
  segmentUriPrefix: null, segmentUriSuffix: null}
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.avro.AvroRecordReader,
  configClassName: null, configs: null, dataFormat: avro}
segmentCreationJobParallelism: 0
segmentNameGeneratorSpec:
  configs: {segment.name.prefix: rule_logs_uat, exclude.sequence.id: 'true'}
  type: simple
tableSpec: {schemaURI: 'http://localhost:8080/tables/RuleLogsUAT/schema',
  tableConfigURI: 'http://localhost:8080/tables/RuleLogsUAT', tableName: RuleLogsUAT}
tlsSpec: null
-Dplugins.include=pinot-gcs
I found some other jar files. Maybe that's the root cause?
Phúc Huỳnh
04/23/2021, 3:55 AM
/tmp/2694644d46744db78cbe27e6dd833f2a
so it's hard to get an absolute path
Xiang Fu
Does $(pwd)/plugins work?
Phúc Huỳnh
04/23/2021, 7:00 AM
$(pwd)/plugins will be the current dir on the remote control machine, not the worker exec machine.
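A quick way to see that mismatch (a hypothetical diagnostic, not part of the ingestion job): print the working directory on the driver and on each executor; on the workers it is a container-local scratch dir like the /tmp/... path above, so $(pwd) expanded at submit time does not exist there.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CwdProbe {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("cwd-probe"));
    // Driver-side working directory: wherever spark-submit ran, i.e. what $(pwd) expands to.
    System.out.println("driver cwd: " + System.getProperty("user.dir"));
    // Executor-side working directory: a per-container scratch dir on the worker,
    // so a relative plugins dir resolves differently there.
    jsc.parallelize(Arrays.asList(1, 2, 3), 3)
        .foreach(i -> System.out.println("executor cwd: " + System.getProperty("user.dir")));
    jsc.stop();
  }
}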
Xiang Fu
org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
Phúc Huỳnh
04/23/2021, 7:51 AM
Got "PinotFS for scheme: gs has not been initialized" again 😞, with pinot-batch-ingestion-spark from the release branch.
Phúc Huỳnh
04/24/2021, 2:32 AM
v2/segments seems to conflict with the Spark sendSegmentUris: something is null, which makes the API return an internal server error.
Start sending table RuleLogsUAT segment URIs: [gs://{bucket}/data/year=2020/RuleLogsUAT_OFFLINE_18316_18627_0.tar.gz] to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@499782c3]
Sending table RuleLogsUAT segment URI: gs://{bucket}/data/year=2020/RuleLogsUAT_OFFLINE_18316_18627_0.tar.gz to location: https://{domain}
Sending request: https://{domain}/v2/segments to controller: pinot-controller-0.pinot-controller-headless.analytics.svc.cluster.local, version: Unknown
Caught temporary exception while pushing table: RuleLogsUAT segment uri: gs://{bucket}/data/year=2020/RuleLogsUAT_OFFLINE_18316_18627_0.tar.gz to https://{domain}, will retry
Got error status code: 500 (Internal Server Error) with reason: "Caught internal server exception while uploading segment" while sending request: https://{domain}/v2/segments to controller: pinot-controller-0.pinot-controller-headless.analytics.svc.cluster.local, version: Unknown
at org.apache.pinot.common.utils.FileUploadDownloadClient.sendRequest(FileUploadDownloadClient.java:451)
at org.apache.pinot.common.utils.FileUploadDownloadClient.sendSegmentUri(FileUploadDownloadClient.java:771)
at org.apache.pinot.segment.local.utils.SegmentPushUtils.lambda$sendSegmentUris$1(SegmentPushUtils.java:178)
at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:50)
at org.apache.pinot.segment.local.utils.SegmentPushUtils.sendSegmentUris(SegmentPushUtils.java:175)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner$1.call(SparkSegmentUriPushJobRunner.java:127)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner$1.call(SparkSegmentUriPushJobRunner.java:117)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:352)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:352)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1012)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1012)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
SEGMENT -> exception when the multipart file is null
Xiang Fu
Can you try metadata push instead?
segmentMetadataPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner
jobType: SegmentCreationAndMetadataPush
Phúc Huỳnh
04/26/2021, 2:48 AM
4: *519 client sent invalid header line: "DOWNLOAD_URI: gs://my-bucket-test-data-np/data/RuleLogsUAT_OFFLINE_18117_18731_0.tar.gz" while reading client request headers, client: 10.255.160.94
From the nginx docs: when the use of underscores is disabled, request header fields whose names contain underscores are marked as invalid and become subject to the ignore_invalid_headers directive. (So the nginx in front of the controller drops the DOWNLOAD_URI header unless underscores_in_headers is enabled.)