Thomas Steinholz
10/17/2022, 6:40 PM
"dateTimeFieldSpecs" : [ {
  "name" : "time_string",
  "dataType" : "STRING",
  "format" : "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS",
  "granularity" : "1:MILLISECONDS"
} ]
However, I am getting the following error when running a Realtime to Offline segment task in the minion-stateless pod:
java.lang.IllegalArgumentException: Invalid minTimeValue: 2022-09-26T14:46:40.760 for SimpleSegmentNameGenerator
From the code, it seems like it is automatically creating a segment name based on the .toString method of the datetime field spec, which is outputting "2022-09-26T14:46:40.760", and that value then gets rejected by the validator because it matches the following regex of characters disallowed in segment names: .*[\\\\/:\\*?\"<>|].*
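To make the rejection concrete, here is a minimal standalone sketch of that check. It assumes the regex quoted above is the one being applied; the class name, the helper method, and the colon-to-hyphen replacement are hypothetical and only for illustration, not Pinot's actual code.

```java
import java.util.regex.Pattern;

public class SegmentNameCheck {
    // Regex as quoted in the message above, in Java string-literal form.
    // It matches any value containing one of: \ / : * ? " < > |
    private static final Pattern INVALID_CHARS =
            Pattern.compile(".*[\\\\/:\\*?\"<>|].*");

    /** Returns true when the value contains none of the rejected characters. */
    static boolean isValidNamePart(String value) {
        return !INVALID_CHARS.matcher(value).matches();
    }

    public static void main(String[] args) {
        // The minTimeValue from the error message.
        String isoValue = "2022-09-26T14:46:40.760";
        // Hypothetical normalization for illustration only: swap ':' for '-'.
        String normalized = isoValue.replace(':', '-');

        System.out.println(isoValue + " valid? " + isValidNamePart(isoValue));
        System.out.println(normalized + " valid? " + isValidNamePart(normalized));
    }
}
```

The ISO 8601 value fails only because of its colons, so any name generation that drops or replaces them would pass this particular check.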
Is there a way to specify different name generation logic? Or do I have to ETL my offline data and update my real time data to publish with a different format?
The data I loaded into Offline tables separately automatically got segment names like <table name>_OFFLINE_2021-03-15-06_2022-08-01-14_11, which seems to convert the colons ":" to hyphens "-" automatically. This logic does not seem to be consistent between the batch load and Realtime to Offline Segment jobs.
Thomas Steinholz
10/17/2022, 6:41 PM
java.lang.IllegalArgumentException: Invalid minTimeValue: 2022-09-26T14:46:40.760 for SimpleSegmentNameGenerator
at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:62) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:248) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.core.segment.processing.framework.SegmentProcessorFramework.process(SegmentProcessorFramework.java:154) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.plugin.minion.tasks.realtimetoofflinesegments.RealtimeToOfflineSegmentsTaskExecutor.convert(RealtimeToOfflineSegmentsTaskExecutor.java:163) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:165) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:62) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:113) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:89) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Thomas Steinholz
10/17/2022, 6:47 PM
Xiang Fu
Xiang Fu
Thomas Steinholz
10/17/2022, 10:00 PM
segmentNameGeneratorType to the OFFLINE table’s indexing config
Xiang Fu
Thomas Steinholz
10/17/2022, 10:09 PM
inferSegmentNameGeneratorType method, it would have actually worked for me out of the box for ISO 8601 strings
https://github.com/apache/pinot/blob/0a442b90a337b6dd777a1d55a4361e9654e7e91d/pino[…]rg/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
Jackie
10/17/2022, 10:17 PM
NORMALIZED_DATE by default, or at least infer the type. @Thomas Steinholz Can you help create a GitHub issue to track this?
Jackie
10/17/2022, 11:05 PM
Tim Santos
10/17/2022, 11:48 PM
Jackie
10/17/2022, 11:57 PM
Tim Santos
10/17/2022, 11:58 PM
Jackie
10/18/2022, 12:19 AM
Tim Santos
10/19/2022, 10:06 PM
Xiang Fu