# troubleshooting
k
I’m running into an issue when building segments with 0.7.1 that didn’t occur with 0.6.0, due to (I think) using a Unicode code point for my `multiValueDelimiter`. The relevant bit of my job file is:
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    multiValueDelimiter: '\ufff0'
With 0.6.0 this works fine. With 0.7.1 I get:
shaded.com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `char` out of VALUE_STRING token
 at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig["multiValueDelimiter"])
	at shaded.com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1442) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1216) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1126) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.deser.std.NumberDeserializers$CharacterDeserializer.deserialize(NumberDeserializers.java:448) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.deser.std.NumberDeserializers$CharacterDeserializer.deserialize(NumberDeserializers.java:405) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1719) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at shaded.com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1350) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.utils.JsonUtils.jsonNodeToObject(JsonUtils.java:117) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:88) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$run$0(SegmentGenerationJobRunner.java:199) ~[pinot-batch-ingestion-standalone-0.7.1-shaded.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_291]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_291]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_291]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_291]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_291]
m
I am guessing we moved to a newer version of Jackson that is having trouble reading the delimiter into a `char`?
k
Well, it’s OK if I use `multiValueDelimiter: 'a'`, but it’s not OK if I do something like `multiValueDelimiter: '\u0040'`. Where in the code is the job YAML file converted to a RecordReaderSpec?
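One plausible reading of those two cases, offered as a sketch rather than a description of Jackson's actual code: a single-character field can only be filled from a string that is exactly one character long. If the escape in `'\u0040'` reaches the binder undecoded, the value is six literal characters rather than `@`. The helper `to_delimiter` below is hypothetical and only mimics that length check.

```python
def to_delimiter(value: str) -> str:
    """Hypothetical stand-in for binding a config string to a one-character field."""
    if len(value) != 1:
        raise ValueError(
            f"expected exactly one character, got {len(value)}: {value!r}"
        )
    return value


# An undecoded escape: backslash, 'u', '0', '0', '4', '0' (six characters).
literal_escape = r"\u0040"

# The same escape after decoding: the single character '@'.
decoded_escape = "\u0040"

for candidate in ("a", decoded_escape, literal_escape):
    try:
        print(repr(candidate), "->", repr(to_delimiter(candidate)))
    except ValueError as err:
        print(repr(candidate), "-> rejected:", err)
```

Under this assumption, `'a'` and a decoded escape both bind cleanly, while the undecoded six-character form is rejected, which matches the behavior described above.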
m
Check `IngestionJobLauncher.java`, assuming that you are using it.
k
Yes, thanks - working on a unit test to see if I can find the issue :)
m
Cool, thanks
Either a code change or a library change means it can no longer handle your delimiter.
k
Looks like the YAML parser used by Pinot 0.6.0 had a bug where it would treat `\ufff0` as a Unicode escape sequence inside single quotes, but according to the latest spec that’s only supposed to happen inside double quotes. So changing my job spec to look like:
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    multiValueDelimiter: "\ufff0"
(double quotes for the `multiValueDelimiter` value) fixed the problem.
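The quoting rule can be illustrated with a small sketch. In YAML, a single-quoted scalar treats backslashes literally (its only escape is `''` for a literal single quote), while a double-quoted scalar decodes backslash escapes such as `\uXXXX`. The function `decode_quoted_scalar` below is a hypothetical, deliberately simplified model of that rule. It is not the parser Pinot uses, and it handles only `\uXXXX` among the double-quote escapes.

```python
import re


def decode_quoted_scalar(token: str) -> str:
    """Simplified model of how YAML decodes a quoted scalar (illustration only)."""
    if len(token) < 2 or token[0] != token[-1] or token[0] not in "'\"":
        raise ValueError(f"not a quoted scalar: {token!r}")
    body = token[1:-1]
    if token[0] == "'":
        # Single quotes: backslashes are literal; '' is the only escape.
        return body.replace("''", "'")
    # Double quotes: decode \uXXXX escapes (other escapes omitted in this sketch).
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        body,
    )


single = decode_quoted_scalar(r"'\ufff0'")  # six literal characters
double = decode_quoted_scalar(r'"\ufff0"')  # one character, U+FFF0
print(len(single), len(double))
```

With single quotes the value stays six characters long, which cannot fit a one-character delimiter field. With double quotes it becomes the single code point U+FFF0.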
m
Thanks @Ken Krugler for finding this.
k
Though in checking the 0.6.0 vs 0.7.1 pom.xml, it seems like both used `snakeyaml` `1.16`… hmmm
m
Maybe a transitive dependency?
Would you mind adding this to FAQ?
k
Sure
m
Thanks
k
Done
m
Thank you