# general
t
What are the limitations when using noDictionaryColumns? I got the following exception when doing an ORDER BY on a noDictionaryColumn:
[
  {
    "errorCode": 200,
    "message": "QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat java.nio.Buffer.checkBounds(Buffer.java:571)\n\tat java.nio.DirectByteBuffer.get(DirectByteBuffer.java:264)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:80)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:60)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:34)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:465)\n\tat org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.computePartiallyOrdered(SelectionOrderByOperator.java:237)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:178)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:73)"
  }
]
m
Hmm, from a query perspective everything should work. My guess is the offset overflowed, but I thought we already switched to long-based offsets. Can you provide more context?
Cc @User @User
j
We are using `int` to store the offset within the chunk, but in the normal case that should not overflow.
@User Can you share the segment metadata? What is the longest entry for this column? Does it contain special characters?
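The int-vs-long offset concern above can be illustrated with a minimal standalone sketch (the entry length and count are hypothetical, chosen only to cross the `int` range; this is not Pinot's actual reader code):

```java
// Minimal sketch: accumulating variable-length entry offsets in an int
// silently wraps negative once the cumulative byte size exceeds
// Integer.MAX_VALUE, while a long accumulator stays correct. A negative
// (wrapped) offset passed to a buffer read would trigger an
// IndexOutOfBoundsException like the one in the stack trace above.
public class OffsetOverflowDemo {
    public static void main(String[] args) {
        int intOffset = 0;
        long longOffset = 0L;
        int entryLength = 512;          // max msg length mentioned in the thread
        long numEntries = 5_000_000L;   // hypothetical count, enough to overflow

        for (long i = 0; i < numEntries; i++) {
            intOffset += entryLength;   // wraps past Integer.MAX_VALUE
            longOffset += entryLength;  // stays correct
        }

        System.out.println("int offset:  " + intOffset);   // negative (wrapped)
        System.out.println("long offset: " + longOffset);  // 2560000000
    }
}
```

Here 5,000,000 × 512 = 2,560,000,000 bytes, which exceeds `Integer.MAX_VALUE` (2,147,483,647), so the `int` accumulator wraps to a negative value.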
t
{
  "segment.realtime.endOffset": "9223372036854775807",
  "segment.start.time": "-1",
  "segment.time.unit": null,
  "segment.flush.threshold.size": "1666666",
  "segment.realtime.startOffset": "5722880729",
  "segment.end.time": "-1",
  "segment.total.docs": "-1",
  "segment.table.name": "product_log2_REALTIME",
  "segment.realtime.numReplicas": "1",
  "segment.creation.time": "1620879704453",
  "segment.realtime.download.url": null,
  "segment.name": "product_log2__0__20__20210513T0421Z",
  "segment.index.version": null,
  "segment.flush.threshold.time": null,
  "segment.type": "REALTIME",
  "segment.crc": "-1",
  "segment.realtime.status": "IN_PROGRESS"
}
Here is the metadata for the consuming segment.
The max length of msg is only 512, so maybe it's not caused by the length of the column but by an offset miscalculation?