Has anybody noticed an index out of bounds excepti...
# troubleshooting
j
Has anybody noticed an index out of bounds exception when trying to create a native text index over a column?
Created text index for column: value in segment: global_values_OFFLINE_0
Failed to instantiate Lucene text index reader for column value, exception Index 2048 out of bounds for length 2048
Failed to load segment: global_values_OFFLINE_0 with SegmentDirectory
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: Index 2048 out of bounds for length 2048
	at org.apache.pinot.segment.local.segment.index.readers.text.NativeTextIndexReader.<init>(NativeTextIndexReader.java:60) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.segment.index.readers.DefaultIndexReaderProvider.newTextIndexReader(DefaultIndexReaderProvider.java:168) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.spi.index.IndexingOverrides$Default.newTextIndexReader(IndexingOverrides.java:254) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.segment.index.column.PhysicalColumnIndexContainer.<init>(PhysicalColumnIndexContainer.java:101) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader.load(ImmutableSegmentLoader.java:189) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader.load(ImmutableSegmentLoader.java:127) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader.load(ImmutableSegmentLoader.java:92) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.core.data.manager.BaseTableDataManager.addSegment(BaseTableDataManager.java:216) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.core.data.manager.BaseTableDataManager.addOrReplaceSegment(BaseTableDataManager.java:409) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addOrReplaceSegment(HelixInstanceDataManager.java:385) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:163) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:278) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2048 out of bounds for length 2048
	at java.lang.invoke.VarHandle$1.apply(VarHandle.java:2011) ~[?:?]
	at java.lang.invoke.VarHandle$1.apply(VarHandle.java:2008) ~[?:?]
	at jdk.internal.util.Preconditions$1.apply(Preconditions.java:159) ~[?:?]
	at jdk.internal.util.Preconditions$1.apply(Preconditions.java:156) ~[?:?]
	at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:62) ~[?:?]
	at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) ~[?:?]
	at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) ~[?:?]
	at java.lang.invoke.VarHandleObjects$Array.setVolatile(VarHandleObjects.java:445) ~[?:?]
	at java.lang.invoke.VarHandleGuards.guard_LIL_V(VarHandleGuards.java:659) ~[?:?]
	at java.util.concurrent.atomic.AtomicReferenceArray.set(AtomicReferenceArray.java:111) ~[?:?]
	at org.apache.pinot.segment.local.realtime.impl.dictionary.OffHeapMutableBytesStore.add(OffHeapMutableBytesStore.java:102) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.utils.nativefst.ImmutableFST.readRemaining(ImmutableFST.java:207) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.utils.nativefst.ImmutableFST.<init>(ImmutableFST.java:200) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.utils.nativefst.FST.read(FST.java:91) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.utils.nativefst.FST.read(FST.java:114) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.segment.index.readers.text.NativeTextIndexReader.populateIndexes(NativeTextIndexReader.java:88) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	at org.apache.pinot.segment.local.segment.index.readers.text.NativeTextIndexReader.<init>(NativeTextIndexReader.java:56) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
	... 22 more
m
@Atri Sharma ^^
@Josh Clum Could you try the non-native index in the meanwhile?
j
The error message is also a little confusing: it says "Lucene text index reader", but I confirmed it is logged from NativeTextIndexReader, so the message looks like a copy-paste bug.
a
Looking
Oh yeah, I notice the issue. Sorry, bad error message copy :D
j
So is the problem with my error message or with the Pinot code?
Here is my table config:
{
  "tableName": "global_values",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "global_values",
    "replication": "2",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "fieldConfigList": [
    {
      "name": "value",
      "encodingType": "RAW",
      "indexType": "TEXT",
      "properties": {
        "fstType": "native"
      }
    }
  ],
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {
    "customConfigs": {}
  }
}
@Atri Sharma
a
Let me try and confirm, thanks
@Josh Clum did you try the same config with a Lucene index (i.e., with the fstType config removed)?
j
@Atri Sharma will try that
a
Let me know if it works. I am debugging this now
j
Trying with 10 files of ~128 MB each, totaling ~75 million text values
importing from s3 via spark3
import seems to go just fine
a
Got it. So it's an issue with the native index. Let me try it out
j
Not setting "fstType": "native" works great, but it would be cool to see the speedups for native if you find a fix for it
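For reference, a minimal sketch of the fallback described in this message, assuming the rest of the table config posted earlier in the thread stays unchanged: the same fieldConfigList entry with the properties block (and its fstType setting) dropped, so the text index on the value column is built with the Lucene-based implementation instead of the native one. Only this fragment is shown; it is not a complete table config.

```json
{
  "fieldConfigList": [
    {
      "name": "value",
      "encodingType": "RAW",
      "indexType": "TEXT"
    }
  ]
}
```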
a
Yes, I am debugging the same
j
Also, one thing I noticed is that the native text index took much longer to build than the Lucene one. I wasn't timing it, but the difference was significant.
a
Let's discuss offline - I would love to investigate more
j
@Atri Sharma any luck on this?
a
Travelling, let me get back by Sunday
@Josh Clum As discussed offline, I tried reproducing this but was unable to. I would love to get some input from you to understand what you tried, and whether it failed after a certain point or right at the beginning. Let me know when we can chat