# troubleshooting
t
Hi all, is there a way to use an ISO 8601 datetime string format? I have the following datetime field spec defined:
"dateTimeFieldSpecs" : [ {
    "name" : "time_string",
    "dataType" : "STRING",
    "format" : "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS",
    "granularity" : "1:MILLISECONDS"
  } ]
However, I'm getting the following error when running a Realtime to Offline Segments task in the minion-stateless pod:
java.lang.IllegalArgumentException: Invalid minTimeValue: 2022-09-26T14:46:40.760 for SimpleSegmentNameGenerator
From the code, it seems to build the segment name automatically from the .toString value of the datetime field, which outputs “2022-09-26T14:46:40.760”. That value is then rejected by the validator because it matches the following regular expression
.*[\\\\/:\\*?\"<>|].*
for segment names. Is there a way to specify different name generation logic? Or do I have to ETL my offline data and update my realtime data to publish in a different format? The data I loaded into the Offline tables separately automatically got segment names like this example:
<table name>_OFFLINE_2021-03-15-06_2022-08-01-14_11
which seems to convert the colons “:” to hyphens “-” automatically. This logic does not seem to be consistent between the batch load and the Realtime to Offline Segments jobs.
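A minimal Java sketch of why the name is rejected, assuming the check is a plain full-string match against the regex quoted above (the class and method names are made up for illustration; they are not Pinot's). The ISO 8601 value contains ':' and matches the "invalid" pattern, while the hyphenated batch-style value does not.

```java
import java.util.regex.Pattern;

// Hypothetical reproduction of the segment-name check discussed above.
public class SegmentNameCheck {
    // Same Java string literal as quoted in the thread: a name containing
    // any of \ / : * ? " < > | is treated as invalid.
    private static final Pattern INVALID_SEGMENT_NAME_REGEX =
        Pattern.compile(".*[\\\\/:\\*?\"<>|].*");

    public static boolean isValid(String value) {
        return !INVALID_SEGMENT_NAME_REGEX.matcher(value).matches();
    }

    public static void main(String[] args) {
        // minTimeValue from the error message: contains ':' so it is rejected
        System.out.println(isValid("2022-09-26T14:46:40.760")); // false
        // Hyphenated value like the batch-generated segment name: accepted
        System.out.println(isValid("2021-03-15-06")); // true
    }
}
```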
Full Trace
java.lang.IllegalArgumentException: Invalid minTimeValue: 2022-09-26T14:46:40.760 for SimpleSegmentNameGenerator
	at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:62) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:248) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.core.segment.processing.framework.SegmentProcessorFramework.process(SegmentProcessorFramework.java:154) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.plugin.minion.tasks.realtimetoofflinesegments.RealtimeToOfflineSegmentsTaskExecutor.convert(RealtimeToOfflineSegmentsTaskExecutor.java:163) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:165) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:62) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:113) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:89) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
x
Pinot uses the time value in the segment name. You can either use normalizedSegmentNameGenerator, or have two timestamp fields: one of TIMESTAMP type and one in your ISO 8601 format.
cc: @Jackie I feel we should make iso8601 a reserved format keyword?
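For the two-field option, a hypothetical Java sketch of deriving the epoch-millis value for the TIMESTAMP column from the ISO 8601 string. It assumes the strings carry no offset and are in UTC; in practice the second field would be filled by your producer or an ingestion transform rather than by this class.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: ISO 8601 string -> epoch millis for a TIMESTAMP field.
public class IsoToEpochMillis {
    // Same pattern as the SIMPLE_DATE_FORMAT in the field spec from the question
    private static final DateTimeFormatter ISO_MILLIS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");

    public static long toEpochMillis(String isoValue) {
        LocalDateTime local = LocalDateTime.parse(isoValue, ISO_MILLIS);
        // Assumption: the string has no offset and represents UTC
        return local.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2022-09-26T14:46:40.760")); // 1664203600760
    }
}
```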
t
This isn’t documented on the following pages:
• Indexing https://docs.pinot.apache.org/basics/indexing
• Table https://docs.pinot.apache.org/configuration-reference/table
However, I found my answer here: https://github.com/apache/pinot/blob/0a442b90a337b6dd777a1d55a4361e9654e7e91d/pino[…]ore/segment/processing/framework/SegmentProcessorFramework.java
I added segmentNameGeneratorType to the OFFLINE table’s indexing config.
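A hypothetical sketch of what a normalized-date style name looks like, modeled only on the batch-generated example name earlier in the thread: hour-level yyyy-MM-dd-HH values joined with underscores, so no ':' survives. The table name and the hourly granularity are assumptions; this is an illustration, not the generator that segmentNameGeneratorType actually selects.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch modeled on: <table name>_OFFLINE_2021-03-15-06_2022-08-01-14_11
// NOT Pinot's implementation; hourly output format is an assumption.
public class NormalizedNameSketch {
    private static final DateTimeFormatter INPUT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");
    private static final DateTimeFormatter OUTPUT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

    // Re-format a time value so it contains no ':' characters
    public static String normalize(String isoValue) {
        return LocalDateTime.parse(isoValue, INPUT).format(OUTPUT);
    }

    public static String segmentName(String table, String minTime, String maxTime, int sequenceId) {
        return String.join("_", table, "OFFLINE", normalize(minTime), normalize(maxTime),
            Integer.toString(sequenceId));
    }

    public static void main(String[] args) {
        // "myTable" is a hypothetical table name
        System.out.println(segmentName("myTable", "2022-09-26T14:46:40.760",
            "2022-09-26T15:10:00.000", 0));
        // myTable_OFFLINE_2022-09-26-14_2022-09-26-15_0
    }
}
```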
x
Thanks for pointing this out, we will improve the docs for this.
t
j
We should probably use NORMALIZED_DATE by default, or at least infer the type. @Thomas Steinholz can you help create a GitHub issue to track this?
cc @Tim Santos
t
@Jackie this PR should correctly infer the segment name generator type if the time column is a string in simple date format. https://github.com/apache/pinot/pull/9550
j
@Tim Santos Is this the default behavior?
t
The inference will now happen by default (without any configuration)
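A hypothetical Java sketch of the kind of inference described for the PR: choose NORMALIZED_DATE when the time column is a STRING whose format is SIMPLE_DATE_FORMAT, otherwise fall back to SIMPLE. The enum and the way the format string is split are assumptions for illustration, not the code in the PR.

```java
// Hypothetical sketch: infer the segment name generator type from a
// dateTimeFieldSpec's dataType and format (not the PR's actual code).
public class GeneratorTypeInference {
    public enum NameGeneratorType { SIMPLE, NORMALIZED_DATE }

    public static NameGeneratorType inferType(String dataType, String format) {
        if (!"STRING".equals(dataType) || format == null) {
            return NameGeneratorType.SIMPLE;
        }
        // Format looks like "1:MILLISECONDS:SIMPLE_DATE_FORMAT:<pattern>";
        // limit the split to 4 parts because the pattern itself may contain ':'
        String[] tokens = format.split(":", 4);
        boolean simpleDateFormat = tokens.length >= 3 && "SIMPLE_DATE_FORMAT".equals(tokens[2]);
        return simpleDateFormat ? NameGeneratorType.NORMALIZED_DATE : NameGeneratorType.SIMPLE;
    }

    public static void main(String[] args) {
        // The field spec from the original question
        System.out.println(inferType("STRING",
            "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS")); // NORMALIZED_DATE
        // An epoch-millis LONG column keeps the simple generator
        System.out.println(inferType("LONG", "1:MILLISECONDS:EPOCH")); // SIMPLE
    }
}
```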
j
Shall we update the doc for this change?
t
@Jackie PR to improve the doc based on this recent change: https://github.com/pinot-contrib/pinot-docs/pull/133
x
I'd also suggest adding examples of how each type generates segment names, along with the best use case for each, e.g. full table refresh / daily append / daily bootstrap.