Ken Krugler
06/02/2021, 9:33 PM
multiValueDelimiter
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    multiValueDelimiter: '\ufff0'
With 0.6.0 this works fine. With 0.7.1 I get:
shaded.com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `char` out of VALUE_STRING token
at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig["multiValueDelimiter"])
at shaded.com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1442) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1216) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1126) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.deser.std.NumberDeserializers$CharacterDeserializer.deserialize(NumberDeserializers.java:448) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.deser.std.NumberDeserializers$CharacterDeserializer.deserialize(NumberDeserializers.java:405) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1719) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at shaded.com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1350) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at org.apache.pinot.spi.utils.JsonUtils.jsonNodeToObject(JsonUtils.java:117) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:88) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$run$0(SegmentGenerationJobRunner.java:199) ~[pinot-batch-ingestion-standalone-0.7.1-shaded.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_291]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_291]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_291]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_291]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_291]
Mayank
Ken Krugler
06/02/2021, 11:29 PM
multiValueDelimiter: 'a', but it’s not OK if I do something like multiValueDelimiter: '\u0040'. Where in the code is the job yaml file converted to a RecordReaderSpec?
Mayank
IngestionJobLauncher.java
Ken Krugler
06/02/2021, 11:41 PM
Mayank
Ken Krugler
06/03/2021, 3:20 PM
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    multiValueDelimiter: "\ufff0"
(double-quotes for multiValueDelimiter value) fixed the problem.
Mayank
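A likely reason the double quotes matter: in YAML, a single-quoted scalar does not process backslash escapes, so the single-quoted delimiter is read as six literal characters, while the double-quoted form is decoded to the one character U+FFF0. A six-character string cannot be bound to a Java `char`, which matches the `MismatchedInputException` above. The sketch below is a simplified model of that behavior, not snakeyaml or Jackson code; `decodeScalar` and `toSingleChar` are hypothetical helpers written only for illustration.

```java
class DelimiterQuoting {

    // Simplified model of YAML quoted scalars (an assumption for illustration,
    // not a YAML parser): single quotes keep backslashes literally, double
    // quotes decode a backslash + 'u' + four hex digits into one character.
    static String decodeScalar(String raw) {
        char quote = raw.charAt(0);
        String body = raw.substring(1, raw.length() - 1);
        if (quote == '\'') {
            // The only escape inside single quotes is a doubled single quote.
            return body.replace("''", "'");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 6 <= body.length() && body.charAt(i + 1) == 'u') {
                // Decode the four hex digits that follow the 'u'.
                out.append((char) Integer.parseInt(body.substring(i + 2, i + 6), 16));
                i += 5; // skip the rest of the escape sequence
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical stand-in for binding a string to a char field:
    // only a string of exactly one character fits.
    static char toSingleChar(String value) {
        if (value.length() != 1) {
            throw new IllegalArgumentException(
                "Cannot convert " + value.length() + "-character string to char");
        }
        return value.charAt(0);
    }

    public static void main(String[] args) {
        String single = decodeScalar("'\\ufff0'");   // as written with single quotes
        String dbl = decodeScalar("\"\\ufff0\"");    // as written with double quotes
        System.out.println(single.length());          // prints 6
        System.out.println(dbl.length());             // prints 1
        System.out.println((int) toSingleChar(dbl));  // prints 65520 (0xFFF0)
        try {
            toSingleChar(single);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model a single-quoted 'a' still works because it is already one character, which is consistent with the observation earlier in the thread.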
Ken Krugler
06/03/2021, 3:21 PM
snakeyaml `1.16`…hmmm
Mayank
Ken Krugler
06/03/2021, 3:25 PM
Mayank
Ken Krugler
06/11/2021, 6:40 PM
Mayank