# troubleshooting
s
Hi Pinot friends. We are trying out the Timestamp Index and it's working great except for one portion. The realtime to offline task is now failing with the following error:
Caught exception while executing task: Task_RealtimeToOfflineSegmentsTask_653223a4-b56a-4625-a899-c956b3ed77f0_1667244900023_0 
java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0  
at org.apache.pinot.segment.local.segment.creator.impl.stats.LongColumnPreIndexStatsCollector.getMinValue(LongColumnPreIndexStatsCollector.java:71) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936
at org.apache.pinot.segment.local.segment.creator.impl.stats.LongColumnPreIndexStatsCollector.getMinValue(LongColumnPreIndexStatsCollector.java:27) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936
at org.apache.pinot.segment.spi.creator.ColumnIndexCreationInfo.getMin(ColumnIndexCreationInfo.java:55) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.init(SegmentColumnarIndexCreator.java:205) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:216) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.core.segment.processing.framework.SegmentProcessorFramework.process(SegmentProcessorFramework.java:154) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.plugin.minion.tasks.realtimetoofflinesegments.RealtimeToOfflineSegmentsTaskExecutor.convert(RealtimeToOfflineSegmentsTaskExecutor.java:163) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422
at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:165) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056
at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:62) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f
at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:113) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:89) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Task: Task_RealtimeToOfflineSegmentsTask_653223a4-b56a-4625-a899-c956b3ed77f0_1667244900023_0 completed in: 2015ms
Here are the configs that I put on both my realtime and offline tables:
"fieldConfigList": [
    {
      "name": "event_timestamp",
      "encodingType": "DICTIONARY",
      "indexType": "TIMESTAMP",
      "indexTypes": [
        "TIMESTAMP"
      ],
      "timestampConfig": {
        "granularities": [
          "HOUR",
          "DAY",
          "WEEK",
          "MONTH"
        ]
      }
    }
  ]
If I remove that config from my offline table, the realtime-to-offline job works fine, but I no longer get my timestamp index columns on my offline table.
If I remove the timestamp config from the offline table and run a query using the timestamp $ columns, I also get this error:
There are 13 invalid segment/s. This usually means that they were created with an older schema. Please reload the table in order to refresh these segments to the new schema.
The query still seems to return results, though.
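For anyone reading along, here is a minimal sketch of what "the timestamp $ columns" refers to. It assumes the naming convention from the Pinot timestamp index docs, `$<column>$<granularity>`, and the helper name `timestamp_index_columns` is made up for illustration; it is not a Pinot API.

```python
# Sketch only: lists the derived column names a timestamp index is expected
# to generate, assuming the $<column>$<granularity> naming convention.
# timestamp_index_columns is a hypothetical helper, not part of Pinot.
def timestamp_index_columns(column, granularities):
    return [f"${column}${granularity}" for granularity in granularities]


# Granularities taken from the fieldConfigList above.
cols = timestamp_index_columns("event_timestamp", ["HOUR", "DAY", "WEEK", "MONTH"])
for name in cols:
    print(name)
```

With the config above, that would give one derived column per granularity on each table that carries the `timestampConfig`.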
My rollup config:
"task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "1d",
        "bufferTimePeriod": "1d",
        "mergeType": "concat",
        "maxNumRecordsPerSegment": "5000000",
        "schedule": "$REALTIME_TO_OFFLINE_SEGMENT_TASK_SCHEDULE"
      }
    }
  }
m
@Jackie for timestamp index ^^
j
@Stuart Millholland This seems to be a bug in Pinot where the auto-generated virtual column is mistakenly included. Could you please file a GitHub issue with the information above?
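A rough illustration of the failure mode in the trace (`Index 0 out of bounds for length 0` inside `getMinValue`): reading the minimum from a sorted array that never received any values. This is not Pinot's code; the class and method names below are hypothetical, and the assumption that the extra column's collector saw no values is a guess based on the trace and the comment above.

```python
class LongStatsCollectorSketch:
    """Hypothetical stand-in for a per-column stats collector (not Pinot code)."""

    def __init__(self):
        self._values = set()
        self._sorted = []

    def collect(self, value):
        # Record one value seen for this column.
        self._values.add(value)

    def seal(self):
        # Freeze the collected values into a sorted list.
        self._sorted = sorted(self._values)

    def min_value(self):
        # Reads index 0 without checking for emptiness, like the failing call.
        return self._sorted[0]


collector = LongStatsCollectorSketch()
collector.seal()  # assumption: no values were ever collected for this column

try:
    collector.min_value()
    failed = False
except IndexError:
    # Python's analogue of Java's ArrayIndexOutOfBoundsException.
    failed = True

print("min lookup failed on empty collector:", failed)
```

If that assumption holds, the fix would be to keep such a column out of the stats collection (or guard the empty case) rather than to change the table config.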
s
Yes will do
@Jackie what are the chances this one gets fast-tracked? It seems like a P1 to me: basically, you can't move from realtime to offline if you use the timestamp index. We may try to find time to work on it ourselves, but I'm curious whether you have a resource to throw at it.
😮 1
j
Let me take a look now and see if there is an easy fix
👍 1
s
@Jackie any additional thoughts on this one?
j
I found the issue and am working on a fix now
Merged the fix, please try it out
s
Awesome, will try it tomorrow!