Facundo Bianco
03/16/2022, 8:18 PM
"dateTimeFieldSpecs": [{
"name": "timestampCustom",
"dataType": "STRING",
"format" : "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSZZ",
"granularity": "1:MILLISECONDS"
}]
The table is generated successfully but the POST command returns:
{ "code": 500, "error": "Caught exception when ingesting file into table: foo_OFFLINE. null" }
I discovered it is related to the date format. Could you kindly indicate what it should be? I used this site to generate the custom format. Thanks in advance!
Xiaobing
03/16/2022, 8:51 PM
Should it be SSSZ instead? As noted on the website.
Mayank
Diana Arnos
03/17/2022, 1:10 PM
{
"name": "deletedAt",
"dataType": "STRING",
"format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"granularity": "1:MILLISECONDS"
}
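As a sanity check that this pattern matches values like the ones being ingested, here is a rough Python analog of the Java SimpleDateFormat pattern yyyy-MM-dd'T'HH:mm:ss.SSSZ (strptime's %f and %z only approximate SSS and Z, so this is illustrative, not Java's actual parser):

```python
from datetime import datetime

# Rough Python analog of the Java SimpleDateFormat pattern
# "yyyy-MM-dd'T'HH:mm:ss.SSSZ": %f approximates SSS (fractional
# seconds) and %z the RFC-822 offset such as -0400.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

parsed = datetime.strptime("2020-12-31T19:59:21.522-0400", FMT)
print(parsed.isoformat())  # 2020-12-31T19:59:21.522000-04:00
```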
Facundo Bianco
03/17/2022, 1:55 PM
• table-schema.json
{
"schemaName": "ads13",
"dimensionFieldSpecs": [
{
"name": "id",
"dataType": "INT"
},
{
"name": "value",
"dataType": "STRING"
}
],
"dateTimeFieldSpecs": [
{
"name": "timestampCustom",
"dataType": "STRING",
"format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"granularity": "1:MILLISECONDS"
}
]
}
• table-config.json
{
"tableName": "ads13",
"tableType": "OFFLINE",
"segmentsConfig": {
"replication": 1,
"timeColumnName": "timestampCustom",
"timeType": "MILLISECONDS",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": 365
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP"
},
"ingestionConfig": {
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "DAILY"
}
},
"metadata": {}
}
• data.csv
id,value,timestampCustom
1,foo,2020-12-31T19:59:21.522-0400
And then I run:
/opt/pinot/bin/pinot-admin.sh AddTable -tableConfigFile table-config.json -schemaFile table-schema.json -exec
curl -X POST -F file=@data.csv -H "Content-Type: multipart/form-data" "http://localhost:9000/ingestFromFile?tableNameWithType=ads13_OFFLINE&batchConfigMapStr=%7B%22inputFormat%22%3A%22csv%22%7D"
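The batchConfigMapStr query parameter in the curl call above is just URL-encoded JSON; a small sketch of producing that encoding (Python used purely for illustration):

```python
from urllib.parse import quote

# batchConfigMapStr must be URL-encoded JSON; quote() with safe=""
# escapes every reserved character ({, }, ", :).
batch_config = '{"inputFormat":"csv"}'
encoded = quote(batch_config, safe="")
print(encoded)  # %7B%22inputFormat%22%3A%22csv%22%7D
```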
Mayank
Facundo Bianco
03/17/2022, 2:36 PM
It seems related to the time part (HH:mm:ss) because the colon mark does some trick.
For example, this format works 1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HHmm
(removed colon mark) with this _data.csv_:
id,value,timestampCustom
1,foo,2020-12-31T1959
But this format doesn't work: 1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm
(added colon mark after hour) with this _data.csv_:
id,value,timestampCustom
1,foo,2020-12-31T19:59
What do you recommend? Thanks in advance!
Mayank
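Both variants are valid date patterns in themselves; a quick check with Python analogs of the two formats (an illustration, not Java's actual parser) suggests the parsing is fine and the colon trips something downstream:

```python
from datetime import datetime

# Python strptime analogs of the two patterns tried above.
# Both parse their sample values without trouble.
no_colon = datetime.strptime("2020-12-31T1959", "%Y-%m-%dT%H%M")
with_colon = datetime.strptime("2020-12-31T19:59", "%Y-%m-%dT%H:%M")
print(no_colon == with_colon)  # True: same instant either way
```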
Eduardo Cusa
03/17/2022, 2:51 PM
Could we use an ingestionJobSpec.yaml instead of using the API? Is it worth it to try?
Xiaobing
03/17/2022, 4:15 PMjava.lang.IllegalArgumentException: null
at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[startree-pinot-all-0.10.0-ST.36-jar-with-dependencies.jar:0.10.0-ST.36-565e66063a82d0b4a61c73bfcddbbb3cd0d436ac]
at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:53) ~[startree-pinot-all-0.10.0-ST.36-jar-with-dependencies.jar:0.10.0-ST.36-565e66063a82d0b4a61c73bfcddbbb3cd0d436ac]
The ingestFromFile endpoint is hard-coded to use the 'simple' segment name generator (it was added mainly for testing purposes), and the simple generator doesn't work with a date-formatted time column. For date-formatted time columns, it's recommended to use the 'normalizedDate' segment name generator type (docs).
@User when using an ingestion job, the generator type can be configured to 'normalizedDate' (docs), hopefully overcoming this issue.
Eduardo Cusa
03/17/2022, 6:29 PM
We're running PINOT_VERSION=0.10.0-SNAPSHOT. Is it safe to download the 0.8.0 jars and try again?
Eduardo Cusa
03/17/2022, 7:18 PMCaused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
Attached the yaml used.
Xiaobing
03/17/2022, 8:33 PM
Eduardo Cusa
03/17/2022, 9:08 PMCaused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
Eduardo Cusa
03/17/2022, 9:33 PM
export PINOT_VERSION=0.8.0
export PINOT_ROOT_DIR=/opt/pinot
export SPARK_HOME=/root/spark-2.4.8-bin-hadoop2.7
export PINOT_DISTRIBUTION_DIR=/opt/pinot
cd ${PINOT_DISTRIBUTION_DIR}
${SPARK_HOME}/bin/spark-submit \
--class org.apache.pinot.tools.admin.PinotAdministrator \
--master "local[2]" \
--deploy-mode client \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
LaunchDataIngestionJob \
-jobSpecFile '/opt/pinot/data/ingestJob.yml'
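For reference, the ingestJob.yml referenced above is not shown in the thread; per Xiaobing's earlier suggestion, the relevant piece would be the segment name generator. A minimal, unverified sketch of that fragment (field names assumed from the Pinot batch ingestion docs):

```yaml
# Fragment of an ingestion job spec: use the normalizedDate segment name
# generator so a date-formatted time column produces valid segment names.
segmentNameGeneratorSpec:
  type: normalizedDate
```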
Xiaobing
03/17/2022, 9:51 PM
Eduardo Cusa
03/18/2022, 7:54 PM
SparkSegmentGenerationJobRunner isn't included in the jar. Doing a grep inside the jar, I only found the IngestionJobRunner interface:
jar tvf ../lib/pinot-all-0.8.0-jar-with-dependencies.jar | grep JobRunner
318 Tue Aug 24 23:32:56 UTC 2021 org/apache/pinot/spi/ingestion/batch/runner/IngestionJobRunner.class
I was thinking of re-building the 0.8.0 version locally and pushing it into the k8s cluster. Another option could be to re-deploy the Pinot Helm chart using version 0.9.3.
What do you recommend?
Thanks
Facundo Bianco
03/23/2022, 12:26 PM
Mayank
Xiang Fu
org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand
It will be available in the 0.10.0 release.
Xiang Fu
Eduardo Cusa
04/07/2022, 6:20 PM
I did cp -r plugins-external/pinot-batch-ingestion plugins/ and moved forward. 😄
Then got the following error:
Caused by: shaded.com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.0
at shaded.com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at shaded.com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at shaded.com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:808)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
I'm debugging version 0.10.0 locally with spark-2.4.0-bin-hadoop2.7 from here: https://archive.apache.org/dist/spark/spark-2.4.0/
Any suggestion is welcomed!
Eduardo Cusa
04/07/2022, 6:27 PM
export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=/opt/pinot
export SPARK_HOME=/root/spark-2.4.0-bin-hadoop2.7
cd ${PINOT_DISTRIBUTION_DIR}
${SPARK_HOME}/bin/spark-submit \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master "local[2]" \
--deploy-mode client \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-${PINOT_VERSION}-shaded.jar" \
local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
-jobSpecFile '/opt/pinot/data/ingestJob.yml'
Xiang Fu
Eduardo Cusa
04/08/2022, 12:24 PM
root@pinot-controller:/opt/pinot/data# ls ../plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/
pinot-batch-ingestion-spark-0.10.0-shaded.jar
Xiang Fu
Eduardo Cusa
04/08/2022, 5:45 PM
After rebuilding pinot-batch-ingestion-spark using the same Jackson version as Spark, I was able to move forward.
Now I get an error when the job is trying to push the segment metadata:
java.io.IOException: Failed to find file: metadata.properties in: /tmp/segmentTar-cb3750db-872e-4bbe-9a04-7ce859a18581.tar.gz
at org.apache.pinot.common.utils.TarGzCompressionUtils.untarOneFile(TarGzCompressionUtils.java:198) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.utils.SegmentPushUtils.generateSegmentMetadataFile(SegmentPushUtils.java:344) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.segment.local.utils.SegmentPushUtils.sendSegmentUriAndMetadata(SegmentPushUtils.java:238) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner$1.call(SparkSegmentMetadataPushJobRunner.java:124) ~[pinot-batch-ingestion-spark-0.10.0-shaded.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner$1.call(SparkSegmentMetadataPushJobRunner.java:112) ~[pinot-batch-ingestion-spark-0.10.0-shaded.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
I checked and the file doesn't exist:
root@pinot-controller:/tmp# ls -lah
total 51M
drwxrwxrwt 1 root root 4.0K Apr 8 17:15 .
drwxr-xr-x 1 root root 4.0K Apr 8 17:03 ..
-rw-r--r-- 1 root root 26M Apr 8 17:15 adv1_OFFLINE_2022-03-01_2022-03-01_0.tar.gz
-rw-r--r-- 1 root root 26M Apr 8 17:15 adv1_OFFLINE_2022-03-01_2022-03-01_1.tar.gz
drwxr-xr-x 4 root root 4.0K Apr 6 18:30 data
drwxr-xr-x 4 root root 4.0K Apr 8 17:15 pinot-49d1110b-8481-4dfa-a058-3a22348445ce
drwxr-xr-x 4 root root 4.0K Apr 8 17:04 pinot-4f8933d2-c89c-487e-8737-ebce2a72bcfc
drwxr-xr-x 4 root root 4.0K Apr 8 17:05 pinot-87b83a51-068c-4cac-b714-5f627bb0ac58
drwxr-xr-x 4 root root 4.0K Apr 8 17:15 pinot-dbcff700-a3d7-4445-bc87-b3179e1b87c8
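For what it's worth, the failing check can be reproduced in miniature: the push job untars each segment and expects a metadata.properties entry inside. A hedged sketch of that expectation (Python, hypothetical paths; loosely mimicking the behavior, not Pinot's actual TarGzCompressionUtils):

```python
import io
import tarfile

def has_metadata(tar_path: str) -> bool:
    # Loosely mimics the segment push: look for a metadata.properties
    # entry inside the segment tarball.
    with tarfile.open(tar_path, "r:gz") as tar:
        return any(m.name.endswith("metadata.properties") for m in tar.getmembers())

# Build a tiny segment-like tarball that does contain metadata.properties.
with tarfile.open("/tmp/seg_ok.tar.gz", "w:gz") as tar:
    data = b"segment.name=adv1_OFFLINE_2022-03-01_2022-03-01_0\n"
    info = tarfile.TarInfo("adv1_OFFLINE_2022-03-01_2022-03-01_0/metadata.properties")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

print(has_metadata("/tmp/seg_ok.tar.gz"))  # True
```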
Xiang Fu
Eduardo Cusa
04/11/2022, 12:21 PM
Switching to SegmentCreationAndTarPush, the ingestion finished OK.
I will open another thread about segments.
Thanks!