Yash Agarwal
07/23/2020, 4:48 PM
{
  "name": "sls_d",
  "dataType": "STRING",
  "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
  "granularity": "1:DAYS"
}
but I am getting
Caused by: java.lang.IllegalArgumentException: Invalid format: "null"
at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[pinot-all.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
Neha Pawar
dateTimeFieldSpec object?
Yash Agarwal
07/23/2020, 4:50 PM
"dateTimeFieldSpecs": [
  {
    "name": "sls_d",
    "dataType": "STRING",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
    "granularity": "1:DAYS"
  }
]
Neha Pawar
Yash Agarwal
07/23/2020, 4:51 PM
2020/07/23 09:52:25.681 INFO [DAGScheduler] [dag-scheduler-event-loop] ResultStage 0 (foreach at SparkSegmentGenerationJobRunner.java:214) failed in 91.117 s due to Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 16, brdn2451.target.com, executor 4): java.lang.IllegalArgumentException: Invalid format: "null"
at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:399)
at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:360)
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:216)
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:199)
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:102)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:278)
at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:214)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:921)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:921)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Neha Pawar
timeColumnName mentioned?
Yash Agarwal
07/23/2020, 4:59 PM
"timeColumnName": "sls_d",
"timeType": "DAYS",
Neha Pawar
if (config.getTimeColumnType() == SegmentGeneratorConfig.TimeColumnType.SIMPLE_DATE) {
  // For TimeColumnType.SIMPLE_DATE_FORMAT, convert time value into millis since epoch
  DateTimeFormatter dateTimeFormatter = DateTimeFormat.forPattern(config.getSimpleDateFormat());
  startTime = dateTimeFormatter.parseMillis(startTimeStr);
  endTime = dateTimeFormatter.parseMillis(endTimeStr);
  timeUnit = TimeUnit.MILLISECONDS;
}
Can you confirm that the raw data has a sls_d column in the correct format? It looks like this piece of code is receiving a null value for time.
Yash Agarwal
07/23/2020, 5:06 PM
sls_d
2019-05-18
2019-05-19
2019-05-20
2019-05-21
2019-05-24
2020-05-21
2020-05-22
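One way to test the "null value for time" theory is to scan the raw sls_d values before running the job. The sketch below is illustrative only: it uses java.time rather than the Joda-Time formatter seen in the stack trace, and the class and method names (SlsDateCheck, findInvalid) are hypothetical. A null value is turned into the string "null", which does not parse under yyyy-MM-dd; this is presumably the same situation that produces Invalid format: "null" above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pinot: flags values that would not parse as yyyy-MM-dd.
class SlsDateCheck {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /** Returns the string form of every value that is null or does not parse as yyyy-MM-dd. */
    static List<String> findInvalid(List<String> values) {
        List<String> invalid = new ArrayList<>();
        for (String value : values) {
            // String.valueOf turns a null reference into the literal text "null".
            String text = String.valueOf(value);
            try {
                LocalDate.parse(text, FORMAT);
            } catch (DateTimeParseException e) {
                invalid.add(text);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Sample values: two from the column listing above plus one null.
        List<String> sample = Arrays.asList("2019-05-18", null, "2020-05-22");
        System.out.println("Invalid sls_d values: " + findInvalid(sample));
    }
}
```

If the list comes back empty for the real input, the values themselves are well formed and the problem is more likely in how the column is read or typed.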
Neha Pawar
Kartik Khare
07/23/2020, 6:31 PM
sls_d column?
Neha Pawar
sls_d has any nulls
2. date in hive gets stored as daysSinceEpoch INT in avro https://avro.apache.org/docs/1.8.0/spec.html#Date. Try with sls_d format as 1:DAYS:EPOCH
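To make the daysSinceEpoch point concrete: the Avro date logical type stores an int counting days from 1970-01-01, so a column that displays as 2019-05-18 in Hive would arrive as a number, not as a yyyy-MM-dd string. The following is a minimal sketch of that mapping using java.time; the class and method names are hypothetical and nothing here is Pinot API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical demo of the days-since-epoch encoding used by Avro's date logical type.
class EpochDayDemo {
    /** Converts days since 1970-01-01 into a yyyy-MM-dd string. */
    static String epochDayToString(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch).format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    /** Converts a yyyy-MM-dd string into days since 1970-01-01. */
    static int stringToEpochDay(String date) {
        return Math.toIntExact(LocalDate.parse(date).toEpochDay());
    }

    public static void main(String[] args) {
        System.out.println(epochDayToString(18034));        // 2019-05-18
        System.out.println(stringToEpochDay("2019-05-18")); // 18034
    }
}
```

If the Avro files do carry such integers, a SIMPLE_DATE_FORMAT spec would be applied to values it cannot parse, which is consistent with the suggestion to switch sls_d to 1:DAYS:EPOCH.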
Yash Agarwal
07/24/2020, 5:14 AM