# general
t
What are the limitations when using noDictionaryColumns? I got the following exception when doing an ORDER BY on a noDictionaryColumn:
[
  {
    "errorCode": 200,
    "message": "QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat java.nio.Buffer.checkBounds(Buffer.java:571)\n\tat java.nio.DirectByteBuffer.get(DirectByteBuffer.java:264)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:80)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:60)\n\tat org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:34)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:465)\n\tat org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.computePartiallyOrdered(SelectionOrderByOperator.java:237)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:178)\n\tat org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:73)"
  }
]
m
Hmm, from a query perspective everything should work. My guess is the offset overflowed, but I thought we already switched to long-based offsets. Can you provide more context?
Cc @User @User
j
We are using `int` to store the offset within the chunk, but in the normal case that should not overflow.
@User Can you share the segment metadata? What is the longest entry for this column? Does it contain special characters?
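The int-vs-long offset concern above can be illustrated with a minimal standalone sketch (the entry length and count are hypothetical, chosen only to cross the `int` range; this is not Pinot's actual reader code):

```java
// Minimal sketch: accumulating variable-length entry offsets in an int
// silently wraps negative once the cumulative byte size exceeds
// Integer.MAX_VALUE, while a long accumulator stays correct. A negative
// (wrapped) offset passed to a buffer read would trigger an
// IndexOutOfBoundsException like the one in the stack trace above.
public class OffsetOverflowDemo {
    public static void main(String[] args) {
        int intOffset = 0;
        long longOffset = 0L;
        int entryLength = 512;          // max msg length mentioned in the thread
        long numEntries = 5_000_000L;   // hypothetical count, enough to overflow

        for (long i = 0; i < numEntries; i++) {
            intOffset += entryLength;   // wraps past Integer.MAX_VALUE
            longOffset += entryLength;  // stays correct
        }

        System.out.println("int offset:  " + intOffset);   // negative (wrapped)
        System.out.println("long offset: " + longOffset);  // 2560000000
    }
}
```

Here 5,000,000 × 512 = 2,560,000,000 bytes, which exceeds `Integer.MAX_VALUE` (2,147,483,647), so the `int` accumulator wraps to a negative value.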
t
{
  "segment.realtime.endOffset": "9223372036854775807",
  "segment.start.time": "-1",
  "segment.time.unit": null,
  "segment.flush.threshold.size": "1666666",
  "segment.realtime.startOffset": "5722880729",
  "segment.end.time": "-1",
  "segment.total.docs": "-1",
  "segment.table.name": "product_log2_REALTIME",
  "segment.realtime.numReplicas": "1",
  "segment.creation.time": "1620879704453",
  "segment.realtime.download.url": null,
  "segment.name": "product_log2__0__20__20210513T0421Z",
  "segment.index.version": null,
  "segment.flush.threshold.time": null,
  "segment.type": "REALTIME",
  "segment.crc": "-1",
  "segment.realtime.status": "IN_PROGRESS"
}
Here is the metadata for the consuming segment.
The max length of msg is only 512, so maybe it's not caused by the length of the column but by an offset miscalculation?