# troubleshooting
l
hello friends, me again with a different issue: we are executing queries in pinot and are getting the following exception at query time, not for all of them, just a few:
QueryExecutionError:
java.lang.IndexOutOfBoundsException
	at java.base/java.nio.Buffer.checkBounds(Buffer.java:714)
	at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:288)
	at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:81)
	at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:61)
query looks like this:
SELECT SUM(impression_count) as imp_count, stemmed_query FROM query_metrics WHERE user_id = xxx AND product_id = xxx AND serve_time BETWEEN 1660622400 AND 1661227199 GROUP BY stemmed_query ORDER BY impression_count LIMIT 100000
stats:
"numServersQueried": 2,
    "numServersResponded": 2,
    "numSegmentsQueried": 11,
    "numSegmentsProcessed": 10,
    "numSegmentsMatched": 10,
    "numConsumingSegmentsQueried": 1,
    "numDocsScanned": 16241,
    "numEntriesScannedInFilter": 5862,
    "numEntriesScannedPostFilter": 64964,
    "numGroupsLimitReached": false,
    "totalDocs": 77203847,
    "timeUsedMs": 133,
    "offlineThreadCpuTimeNs": 0,
    "realtimeThreadCpuTimeNs": 0,
    "offlineSystemActivitiesCpuTimeNs": 0,
    "realtimeSystemActivitiesCpuTimeNs": 0,
    "offlineResponseSerializationCpuTimeNs": 0,
    "realtimeResponseSerializationCpuTimeNs": 0,
    "offlineTotalCpuTimeNs": 0,
    "realtimeTotalCpuTimeNs": 0,
    "segmentStatistics": [],
    "traceInfo": {},
    "minConsumingFreshnessTimeMs": 1661283161852,
    "numRowsResultSet": 100
m
Hmm, are you using a no-dictionary column?
l
in the select statement? yes
these 2 are not dictionary columns
SUM(impression_count) as imp_count, stemmed_query
m
Is this a real-time table?
l
it’s a real-time table, yes
m
And do you have upsert? Trying to triage what might be causing the issue
@Jackie for any thoughts
l
no upsert
j
@Luis Fernandez Which version are you running? And what compression type did you use for the no-dictionary column?
l
running
0.10.0
I just configured
"noDictionaryColumns":
i guess default compression? Snappy?
j
Is it a dimension?
l
it’s a dimension
should be a dictionary column in that case right?
j
That is fine. The default compression type for dimensions is Snappy
Could the index for
stemmed_query
be very large?
Might be related to this issue: https://github.com/apache/pinot/issues/8701
🌟 1
l
oh i’m asking should we make the stemmed_query a dictionary column or would it cause it to fail too
but this def looks like the issue
in noDictionaryColumns we usually just leave the metrics columns that we want to aggregate so was just wondering that
What would be your recommendation?
also just to check again the exception happens in the
VarByteChunkSVForwardIndexReader
but we should change the version of the Writer? just confused cause there are different versions of the Reader too.
j
If this is indeed the issue, that means the writer wrote the corrupted data, which caused the exception on the reader side
If the
stemmed_query
is almost all unique, no dictionary should have better performance
You may try the newer version of the writer and see if it solves the problem
l
how do we configure that?
and this wouldn’t fix the existing records right? or would it
j
You may refer to this part: https://docs.pinot.apache.org/configuration-reference/table#field-config-list, and change the
rawIndexWriterVersion
to 3 or 4 (4 might not be available in the version you are running)
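For reference, a minimal sketch of the field config being described here, using the column from this thread (treat the values as illustrative, not the exact config in use):
"fieldConfigList": [
  {
    "name": "stemmed_query",
    "encodingType": "RAW",
    "properties": {
      "rawIndexWriterVersion": "3"
    }
  }
]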
r
👋 hi! wanted to follow up on this thread. i’ve tried
rawIndexWriterVersion
both 3 and 4 for our text column, but we’re still receiving the same error as originally posted
j
I believe the issue is from the
stemmed_query
column. Can you try
select DISTINCTCOUNTHLL(stemmed_query) FROM query_metrics
and see if you get the same exception?
If sharing the data is not possible, you may also try directly creating the column index using
SingleValueVarByteRawIndexCreator
then read it with
VarByteChunkSVForwardIndexReader
and see if you can reproduce the issue
r
select DISTINCTCOUNTHLL(stemmed_query) FROM query_metrics
gave the same exception, yes
j
@Jackie 👋 we can reproduce this issue and
DISTINCTCOUNTHLL
doesn't fix the problem
When I check some shops that cause this issue, their
stemmed_query
doesn't look particularly problematic
For example, I was expecting them to be extremely long or have weird characters like (
%
or
!
) but they don't...
j
DISTINCTCOUNTHLL
won't fix the problem. I asked you to try it to validate if
stemmed_query
caused the issue.
The issue could be due to the size of it
j
I think
stemmed_query
is likely the cause because when I remove that column from the query, it works well.
j
Do you want to have a quick debug session?
j
Oh you mean through video chat or something?
j
Yeah, zoom
j
ah gotcha. Let me ask the team first!
cool! How about directly from slack? And thanks!
j
Do you have the raw data available? Want to see if we can reproduce it using the low level index creator and reader
j
Yeah. we have the raw data.
If you mean raw data being the data in pinot table?
j
Yeah, I'll need all the
stemmed_query
values
j
Yep. We have all the values
j
Cool. Do you have the Pinot source code available? Want to write a small java program to debug it
How many segments do you have? We want to find the segment that is having problem
j
there are 201 segments 👀
What kind of Pinot source code do you need exactly?
We can voice maybe? That might be easier.
j
Sharing screen will be easier if that is okay
j
Yeah that should be good!
How about like in an hour? at 4PM EST? I'd like to try something else before we dig more into it..
j
Sure
thankyou 1
Sorry I'll run a little late. Will ping you when I'm ready
j
Sounds good! Thank you!
j
@Jinny Cho I can talk now
j
Awesome!
hmm it looks like there's no video chat here 😅 I'll try to set up a quick zoom
Hmm, just want to give you a heads up that we created a new table with an updated
rawIndexWriterVersion
. But it still doesn't fix the problem. 🤔
j
What is the maximum length of the
stemmed_query
?
If possible, you can try with the latest release
0.11.0
which has more error handling and the V4 raw index. If that is not possible, you may also add
deriveNumDocsPerChunkForRawIndex: true
along with the
rawIndexWriterVersion
and see if it fixes the problem
l
i think it’s configured to the default (?)
{
  "name": "stemmed_query",
  "dataType": "STRING"
}
and we added this
"fieldConfigList": [
      {
        "name": "stemmed_query",
        "encodingType": "RAW",
        "indexType": "TEXT",
        "indexTypes": [
          "TEXT"
        ],
        "properties": {
          "rawIndexWriterVersion": "3"
        }
      }
    ],
so add that under properties yes?
do we need to create the table again for that?
j
Yeah. something like this?
"fieldConfigList": [
      {
        "name": "stemmed_query",
        "encodingType": "RAW",
        "indexType": "TEXT",
        "indexTypes": [
          "TEXT"
        ],
        "properties": {
          "rawIndexWriterVersion": "4",
          "deriveNumDocsPerChunkForRawIndex": true
        }
      }
    ],
j
Oh, you didn't change the
maxLength
of the field? then it means the maximum length is 512
l
yeah that’s correct @Jackie
j
yeah. Maybe we should do that.
But IIRC when I checked some specific problematic cases, the length of the stemmed queries was far below 512.
do we need to create the table again for that?
Yeah. Have the same question. Do we need to create the table again to add
deriveNumDocsPerChunkForRawIndex
or upgrade
rawIndexWriterVersion
to V4?
j
Yes, you'll need to recreate the table to add them
Can you upgrade to the latest release? There are several changes introduced in this release, and it is a little bit hard for me to track the old code
l
"deriveNumDocsPerChunkForRawIndex": true
this we can do with rawIndexWriterVersion V3 yes?
you can always do
git checkout release-0.10.0
to check the older code
j
Yes, but then we lose the newly introduced error handling code...
"deriveNumDocsPerChunkForRawIndex": true
is available in V3 yes
🌟 1
l
another question: what happens in pinot if we try to ingest a record that’s longer than the default? does it error out or does it ingest it anyway?
j
You mean the default
maxLength
? Pinot will truncate the value to the max length and ingest it
🍷 1
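For reference, raising that limit is done per column in the schema's dimensionFieldSpecs; a minimal sketch, extending the field spec shown earlier in this thread (the 2048 here is just an example value):
{
  "name": "stemmed_query",
  "dataType": "STRING",
  "maxLength": 2048
}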
j
cool
j
Then we can directly try V4. V4 doesn't require
deriveNumDocsPerChunkForRawIndex
j
Nice..! Then I'll try V4 now.
l
seems like we are getting the following error:
[
  {
    "message": "QueryExecutionError:\nnet.jpountz.lz4.LZ4Exception: Error decoding offset 588892 of input buffer\n\tat net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:70)\n\tat net.jpountz.lz4.LZ4DecompressorWithLength.decompress(LZ4DecompressorWithLength.java:145)\n\tat org.apache.pinot.segment.local.io.compression.LZ4WithLengthDecompressor.decompress(LZ4WithLengthDecompressor.java:44)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReaderV4$CompressedReaderContext.processChunkAndReadFirstValue(VarByteChunkSVForwardIndexReaderV4.java:241)",
    "errorCode": 200
  },
  {
    "message": "QueryExecutionError:\nnet.jpountz.lz4.LZ4Exception: Error decoding offset 529140 of input buffer\n\tat net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:70)\n\tat net.jpountz.lz4.LZ4DecompressorWithLength.decompress(LZ4DecompressorWithLength.java:145)\n\tat org.apache.pinot.segment.local.io.compression.LZ4WithLengthDecompressor.decompress(LZ4WithLengthDecompressor.java:44)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReaderV4$CompressedReaderContext.processChunkAndReadFirstValue(VarByteChunkSVForwardIndexReaderV4.java:241)",
    "errorCode": 200
  }
]
do you have any clue as to why this may be?
we are also getting this:
[
  {
    "message": "QueryExecutionError:\njava.lang.IllegalArgumentException: newPosition > limit: (540794 > 528446)\n\tat java.base/java.nio.Buffer.createPositionException(Buffer.java:318)\n\tat java.base/java.nio.Buffer.position(Buffer.java:293)\n\tat java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1094)\n\tat java.base/java.nio.MappedByteBuffer.position(MappedByteBuffer.java:226)",
    "errorCode": 200
  }
]
j
Interesting... So the issue might be from the LZ4 compression
👀 1
Can you try switching it to SNAPPY?
j
ah instead of murmur?
l
instead of raw
m
LZ4
j
You may add
"compressionCodec": "SNAPPY"
to the field config
j
gotcha! yeah we can try that
just to be clear, it probably requires recreating the whole table?
j
You may either wipe the current table, or create a new one. For real-time table, dropping the existing one and creating a new one might be simpler
j
Hi team. It took some time to propagate the full data. When we tried V4 & Snappy encoding, we still got the same decoding problem (got the following error messages):
[
  {
    "message": "QueryExecutionError:\nnet.jpountz.lz4.LZ4Exception: Error decoding offset 123239 of input buffer\n\tat net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:70)\n\tat net.jpountz.lz4.LZ4DecompressorWithLength.decompress(LZ4DecompressorWithLength.java:145)\n\tat org.apache.pinot.segment.local.io.compression.LZ4WithLengthDecompressor.decompress(LZ4WithLengthDecompressor.java:44)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReaderV4$CompressedReaderContext.processChunkAndReadFirstValue(VarByteChunkSVForwardIndexReaderV4.java:241)",
    "errorCode": 200
  },
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
    "errorCode": 425
  },
  {
    "message": "1 servers [pinot-server-0_R] not responded",
    "errorCode": 427
  }
]
Should we consider SNAPPY decoding as well? At least snappy encoding didn't help this problem.
(We also tried V3 + deriveNumDocsPerChunkForRawIndex = true, and it didn't work)
l
hey friends, just bumping this thread, we are still facing this issue. it’s not many errors, but the clients that get this error cannot load the UI properly. we have tried a number of things but nothing has worked out so far. Anyone have an idea how we can fix this?
[
  {
    "message": "QueryExecutionError:\nnet.jpountz.lz4.LZ4Exception: Error decoding offset 123239 of input buffer\n\tat net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:70)\n\tat net.jpountz.lz4.LZ4DecompressorWithLength.decompress(LZ4DecompressorWithLength.java:145)\n\tat org.apache.pinot.segment.local.io.compression.LZ4WithLengthDecompressor.decompress(LZ4WithLengthDecompressor.java:44)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReaderV4$CompressedReaderContext.processChunkAndReadFirstValue(VarByteChunkSVForwardIndexReaderV4.java:241)",
    "errorCode": 200
  },
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
    "errorCode": 425
  },
  {
    "message": "1 servers [pinot-server-0_R] not responded",
    "errorCode": 427
  }
]
m
Did you try snappy?
l
we tried but apparently we still saw that exception ^ which is weird cause it still says LZ4
should we try Snappy with either of the versions? V3/V4?
m
If it says LZ4, then it is likely LZ4. Try v3
j
2 follow-up questions:
• Is there anything that we need to be aware of before using SNAPPY encoding? I read that it could be slower than LZ4?
• How do we check what encoding we use? I think it’s possible that V3 silently uses LZ4 even if we set it to use Snappy?
l
[
  {
    "message": "QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat java.base/java.nio.Buffer.checkBounds(Buffer.java:714)\n\tat java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:288)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:81)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:61)",
    "errorCode": 200
  }
]
with v3 + SNAPPY
different exception
m
@Jackie ^^
l
w v4 and snappy I see this:
[
  {
    "message": "QueryExecutionError:\nProcessingException(errorCode:450, message:InternalError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:236)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:119)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:50)",
    "errorCode": 200
  }
]
and this for any query
[
  {
    "message": "QueryExecutionError:\njava.lang.IllegalArgumentException: newPosition > limit: (511477 > 505284)\n\tat java.base/java.nio.Buffer.createPositionException(Buffer.java:318)\n\tat java.base/java.nio.Buffer.position(Buffer.java:293)\n\tat java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1094)\n\tat java.base/java.nio.MappedByteBuffer.position(MappedByteBuffer.java:226)",
    "errorCode": 200
  }
]
j
Can you check the server log and see if there are related
ERROR
logs?
Also, can you share some example value of
stemmed_query
if possible?
l
what do you want me to try with first
v3, v4 and SNAPPY (?)
which version
will try with v3 first.
"fieldConfigList": [{
        "name": "stemmed_query",
        "encodingType": "RAW",
        "indexType": "TEXT",
        "compressionCodec": "SNAPPY",
        "indexTypes": [
          "TEXT"
        ],
        "properties": {
          "rawIndexWriterVersion": "3"
        }
      }],
example of stemmed_queries:
glow in the dark starbucks tumbler
square foam block 3 inch
kemono fursuit eyes
minnie mouse 2nd birthday decoration
in memory gifts mom
custom signs for home
unicorn design bundle
polo shirt men vintage
hochzeits gastgeschenke
8x10 birthday cards
as shown above this is what we are seeing with v3 + SNAPPY
[
  {
    "message": "QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat java.base/java.nio.Buffer.checkBounds(Buffer.java:714)\n\tat java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:288)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:81)\n\tat org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:61)",
    "errorCode": 200
  }
]
in the server we see the following:
Caught exception while processing query: QueryContext{_tableName='product_query_metrics_REALTIME', _selectExpressions=[sum(impression_count), sum(click_count), sum(order_count), stemmed_query], _aliasList=[impression_count, click_count, order_count, null], _filter=(user_id = '123123' AND product_id = '123123123' AND serve_time BETWEEN '1663214400' AND '1663819199'), _groupByExpressions=[stemmed_query], _havingFilter=null, _orderByExpressions=[sum(impression_count) ASC], _limit=100000, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:product_query_metrics_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:product_query_metrics_REALTIME), selectList:[Expression(type:FUNCTION, functionCall:Function(operator:AS, operands:[Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:FUNCTION, functionCall:Function(operator:AS, operands:[Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:FUNCTION, functionCall:Function(operator:AS, operands:[Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:IDENTIFIER, identifier:Identifier(name:stemmed_query))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:AND, operands:[Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:user_id)), Expression(type:LITERAL, literal:<Literal longValue:32466758>)])), Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:product_id)), Expression(type:LITERAL, literal:<Literal longValue:1261981010>)])), Expression(type:FUNCTION, functionCall:Function(operator:BETWEEN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:serve_time)), Expression(type:LITERAL, literal:<Literal longValue:1663214400>), Expression(type:LITERAL, literal:<Literal longValue:1663819199>)]))])), groupByList:[Expression(type:IDENTIFIER, identifier:Identifier(name:stemmed_query))], orderByList:[Expression(type:FUNCTION, functionCall:Function(operator:ASC, operands:[Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))]))]))], limit:100000, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}))}
trace:
java.lang.IndexOutOfBoundsException: null
	at java.nio.Buffer.checkBounds(Buffer.java:714) ~[?:?]
	at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:288) ~[?:?]
	at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:81) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:61) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:35) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:515) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:204) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:243) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.query.aggregation.groupby.NoDictionarySingleColumnGroupKeyGenerator.generateKeysForBlock(NoDictionarySingleColumnGroupKeyGenerator.java:100) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.query.aggregation.groupby.DefaultGroupByExecutor.process(DefaultGroupByExecutor.java:123) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.operator.query.AggregationGroupByOrderByOperator.getNextBlock(AggregationGroupByOrderByOperator.java:109) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.operator.query.AggregationGroupByOrderByOperator.getNextBlock(AggregationGroupByOrderByOperator.java:46) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.processSegments(GroupByOrderByCombineOperator.java:137) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:100) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at shaded.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at shaded.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
r
it looks like there might just be an off by one bug somewhere in the V4 index. I wrote it so I'll try to find some time to take a look.
l
just to clarify yes we are having issues with V4 Index as well, but this is V3 + SNAPPY
r
yes that's what I saw sent to the channel
🙏 1
I will add some more testing for V4 to isolate any issues in its implementation
l
do you have any idea as to what may be happening in general 😄 i just want to understand the issue
r
how large is the data? If it's >= 4GB it will overflow
which will cause weird stuff like this
l
like the index you mean? i can get that from the segment information if I ssh into the server yes?
r
yes, but i don't recall if there are any logs or metrics which will tell you how large the column is
l
there’s this index_map file but i don’t think that’s what you need is it? the size of the index?
r
the fact that you have problems either way makes me think it's not particular to any one index implementation. one thing V2 and V4 have in common is size limits, though V3 should be effectively unlimited
j
The index size is actually small (less than 100MB), so it shouldn't be an overflow
I suspect there is some special value in the stemmed_query column that triggers the bug
l
does that stacktrace tell you anything at all? on the server w v3
select DISTINCT stemmed_query from etsyads_listing_query_metrics limit 100000
yeah a query like this triggers it
r
you mean triggers a bug in both compression libraries?
l
we get different errors depending on the writer we use, right now that query triggers the bug for v3, with snappy compressioncodec
j
Might not be the compression library, but how pinot handles the value write/read
r
it's interesting that it's similar across several implementations too
j
We have tried a lot of combinations of version + compression, and all of them run into issues, which makes me think it might be some common code bug
r
I got involved thinking this was a V4 bug, which I would have been happy to diagnose and fix, but it feels like it's a bigger issue than that and might even need access to data under NDA or something... Please ping me if help is needed with V4
l
tryna think how can i catch the faulty stemmed_query that’s causing this, i guess i can find a way to query the topic itself with ksqlDB, cause all the data is in the topic…
🤔 1
j
Thanks Richard! Will let you know if we find something
l
thank you Richard 🙏
r
perhaps you can do a binary search on another column to narrow it down, e.g. if you have numeric identifiers add a filter
where x > mid_value
and if it doesn't error, check the lower quarters of the range and so on
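For reference, a sketch of that bisection using the table and columns from the query at the top of this thread (the mid_value of 50000000 is purely illustrative; if the query succeeds, the corrupted row is in the other half of the range, so halve again):
SELECT DISTINCT stemmed_query
FROM query_metrics
WHERE user_id > 50000000
LIMIT 100000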
j
Good point. To narrow it down, we can actually use the virtual column
$segmentName
and
$docId
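For example, something along these lines could narrow it down to a document range within the segments that fail; $segmentName and $docId are Pinot's built-in virtual columns, and the doc-id range here is just a placeholder to bisect over:
SELECT $segmentName, $docId, stemmed_query
FROM query_metrics
WHERE $docId BETWEEN 0 AND 50000
LIMIT 100000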
l
noob question… what’s the docId
j
The document id (row id) within the segment
So you can filter out any document
l
is that like the order in which it was ingested (?) and like an auto increment?
and can i order by it or use it in where clauses (?)
j
Let's have a zoom chat so we can run some queries together?
@Richard We found the root cause, which is actually fixed in 0.11.0 with this PR: https://github.com/apache/pinot/pull/9059
🌟 1
r
Pleased to hear it!
l
thank you so much @Jackie for all your support 🙏