Hi I was issuing partitioning in Pinot When I query select w Apache Pinot #troubleshooting

sunny

03/29/2022, 1:43 AM

Hi, I am having an issue with partitioning in Pinot. When I query with 'select where in' on the partition column, it doesn't return any records. But when I query with 'select where not in' on the partition column, it seems OK. After the segment is flushed, the 'select where in' query returns the right records, but rows produced after that (before the next segment flush) don't show up.
*) realtime table
*) partition column: subject
*) Kafka topic partitions = 3
*) Pinot partition function: Murmur

Mayank

03/29/2022, 2:23 AM

Hmm, this doesn’t make sense, what version of Pinot are you using?

Mayank

03/29/2022, 2:23 AM

Also, can you try count(*) instead, to see if the problem still happens?

sunny

03/29/2022, 2:41 AM

We are using Pinot 0.9.3. This is a screenshot of the count(*) query. It shows the same behavior.

Mayank

03/29/2022, 2:50 AM

This is definitely very strange. Can you share the query response metadata? Wondering if partitioning is pruning out the segment (not sure why that would happen).

Mayank

03/29/2022, 4:00 AM

@User What’s the data type of the partition column? Also are you saying that once segment is committed, then the problem goes away?

Mayank

03/29/2022, 4:00 AM

And is there any way you can try the latest master code?

sunny

03/29/2022, 4:02 AM

String type. And yes, after the segment is committed, I can see the records. But data produced after that doesn't show up. I am trying to get the query response metadata via the broker API, but I get a syntax error, so I am checking further :)

sunny

03/29/2022, 4:15 AM

@User I think it is an easy problem, but I can't figure out why I get the syntax error. Could you help me? 🥲

Mayank

03/29/2022, 4:16 AM

You can check the JSON response format in the UI that will return the metadata

Mayank

03/29/2022, 4:38 AM

The syntax error is because the single quotes need to be escaped. But you can just use the UI and enable JSON format for the result @User
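
For reference, a minimal sketch of one way to avoid hand-escaping quotes when calling the broker directly: build the request body with a JSON library instead of typing it into a shell string. It assumes the broker accepts POST /query/sql with a JSON body containing an "sql" field. The host and port are hypothetical placeholders, authentication is omitted, and the table name is the one used later in this thread.

```python
import json
import urllib.request

# Hypothetical broker address; replace with your own broker host and port.
BROKER_URL = "http://localhost:8099/query/sql"


def build_query_payload(sql: str) -> bytes:
    """Serialize the SQL text into a JSON request body.

    json.dumps takes care of the quoting, so single quotes inside the SQL
    (for example 'Math') need no manual escaping.
    """
    return json.dumps({"sql": sql}).encode("utf-8")


def run_query(sql: str, url: str = BROKER_URL) -> dict:
    """POST the query to the broker and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=build_query_payload(sql),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_query_payload(
    "select * from transcript_key_0329_sunny where subject in ('Math')"
)
print(payload.decode("utf-8"))
```

Only the payload is built here; run_query is not called, so nothing is sent until you point it at a real broker.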

sunny

03/29/2022, 4:43 AM

Thank you :) I didn't know about enabling JSON format for the result in the UI. This is the result for the query

select * from transcript_key_0329_sunny where subject in ('Math')

Mayank

03/29/2022, 2:29 PM

I think somehow partitioning is not set up correctly, or is getting confused due to the data type, and is pruning out the segment

Mayank

03/29/2022, 2:30 PM

Have you tried latest code from master?

ahsen m

03/29/2022, 9:06 PM

@User Version 0.2.5 of the chart was released just now. Try updating, it might fix your issue?

sunny

03/30/2022, 12:21 AM

Before trying the update, could you check whether partitioning is set up correctly? I set up partitioning by referring to the Pinot docs: https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing Thank you for your careful help :)

Mayank

03/30/2022, 12:52 AM

Can you share the segment metadata from the swagger api?

Mayank

03/30/2022, 12:55 AM

Also, the partition config is for telling Pinot that the data is already partitioned (using the same implementation of the partition function as used in Pinot; a matching name is not enough). Is your Kafka topic partitioned by the exact same Murmur implementation that Pinot uses?
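
For context, a sketch of what the partition-related parts of such a table config might look like, written as a Python dict and printed as JSON. The key names follow my reading of the routing docs linked above and should be checked against your Pinot version. The column name, function name, and partition count are the ones mentioned in this thread, and the actual table config was not shared here.

```python
import json

# Sketch of the partition-related parts of a realtime table config.
# Key names are an assumption based on the routing docs; values come
# from this thread (column "subject", function "Murmur", 3 partitions).
table_config_fragment = {
    "tableIndexConfig": {
        "segmentPartitionConfig": {
            "columnPartitionMap": {
                "subject": {"functionName": "Murmur", "numPartitions": 3}
            }
        }
    },
    # Enables partition-based segment pruning on the broker.
    "routing": {"segmentPrunerTypes": ["partition"]},
}

print(json.dumps(table_config_fragment, indent=2))
```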

sunny

03/30/2022, 1:23 AM

curl -X GET "http://pay-poc-pinot.sandbox.onkakao.net:9001/segments/transcript_key_0329_sunny/transcript_key_0329_sunny__1__1__20220329T0837Z/metadata" -H "accept: application/json" -u 'admin:verysecret'
{"segment.creation.time":"1648543046947","segment.flush.threshold.size":"6","segment.name":"transcript_key_0329_sunny__1__1__20220329T0837Z","segment.partition.metadata":"{\"columnPartitionMap\":{\"subject\":{\"numPartitions\":3,\"partitions\":[1],\"functionName\":\"Murmur\"}}}","segment.realtime.numReplicas":"1","segment.realtime.startOffset":"2","segment.realtime.status":"IN_PROGRESS","segment.table.name":"transcript_key_0329_sunny","segment.type":"REALTIME"}
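
A small sketch for reading the partition details out of a response like the one above. The "segment.partition.metadata" field is a JSON document stored as a string, so it needs a second decode. The helper name partition_info is made up for illustration, and the response is trimmed to the fields used.

```python
import json

# Trimmed copy of the segment metadata response shown above.
segment_metadata_response = r'''{"segment.name":"transcript_key_0329_sunny__1__1__20220329T0837Z","segment.partition.metadata":"{\"columnPartitionMap\":{\"subject\":{\"numPartitions\":3,\"partitions\":[1],\"functionName\":\"Murmur\"}}}","segment.realtime.status":"IN_PROGRESS"}'''


def partition_info(metadata_json: str, column: str) -> dict:
    """Return the partition metadata recorded for one column of a segment."""
    metadata = json.loads(metadata_json)
    # The partition metadata is itself JSON encoded as a string value,
    # so it has to be decoded a second time.
    partition_metadata = json.loads(metadata["segment.partition.metadata"])
    return partition_metadata["columnPartitionMap"][column]


info = partition_info(segment_metadata_response, "subject")
print(info["functionName"], info["numPartitions"], info["partitions"])
# prints: Murmur 3 [1]
```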

sunny

03/30/2022, 1:26 AM

Yes. I produced the Kafka topic data with the producer CLI (without setting any other partitioner). I know that the default Kafka partitioning algorithm is murmur. I also checked that if I set the Pinot partition function to Modulo or ByteArray, no records from the Kafka topic show up in Pinot at all. So I think setting Murmur in Pinot is not the problem.

Mayank

03/30/2022, 2:40 AM

I feel that the partition function implementation might be different (even though the name matches)

Mayank

03/30/2022, 2:40 AM

Can you give me the exact string value of the column and the partition it belongs to on Kafka side? I will check if Pinot also thinks it is the same partition id

sunny

03/30/2022, 3:12 AM

This is the record on Kafka (key, value)

"Math":{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"Math","score":3.6,"timestampInEpoch":1572854500000}

I produced the Kafka topic data via the Kafka console CLI like this

/home/deploy/kafka_2.11-2.4.1/bin/kafka-console-producer.sh --broker-list pay-poc-pinot-m3.ay1.krane.9rum.cc:9092 --topic transcript-key-0329-sunny --property "parse.key=true" --property "key.separator=:" --property "print.key=true"

Mayank

03/30/2022, 4:30 AM

Assuming the partition column “subject” will always match the key, I see that the partition id of “Math” as computed in Pinot is 3. But from what I see in your segment partition metadata, the partition id is 1. This is why the segment gets pruned and no records are found.

Mayank

03/30/2022, 4:32 AM

Do you have a committed segment? If so, I anticipate that it doesn’t have 1 single partition in it (ie has > 1), and so it doesn’t get pruned and you get the result.
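
The reasoning above can be sketched as a simplified model of partition-based pruning. This is not Pinot's actual pruner code: the function name can_prune and the partition ids are hypothetical and only illustrate the idea that a segment is skipped when none of the partition ids in its metadata match the ids computed from the filter values.

```python
def can_prune(segment_partitions: set, query_partitions: set) -> bool:
    """Return True if the segment cannot contain any queried value.

    segment_partitions: partition ids recorded in the segment metadata.
    query_partitions: partition ids computed from the values in the filter.
    """
    return segment_partitions.isdisjoint(query_partitions)


# Hypothetical ids: the filter value hashes to partition 2 on the query side,
# while the consuming segment's metadata records only partition 1.
print(can_prune({1}, {2}))        # True: segment skipped, no rows returned

# A segment whose metadata lists several partitions, including 2, is kept.
print(can_prune({0, 1, 2}, {2}))  # False: segment is scanned
```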

Mayank

03/30/2022, 4:32 AM

@User

Mayank

03/30/2022, 4:42 AM

Also, can you try removing the partition config from the table config and see if the problem still happens? If not, then it is definitely due to a partition function implementation mismatch

sunny

03/30/2022, 5:13 AM

How can I check that the partition id of "Math" as computed in Pinot is 3? @User For reference: when I check in Kafka, the rows including "Math" are in partition 1. When checking in Pinot, the rows including "Math" are in segment

transcript_key_0329_sunny__1__0_~~

If I remove the partition config from the table, it doesn't happen. So as you mentioned, it is due to a partition function mismatch.

Mayank

03/30/2022, 5:20 AM

You can look at https://github.com/apache/pinot/blob/21632dadb8cd2d8b77aec523a758d73a64f70b07/pino[…]apache/pinot/segment/spi/partition/MurmurPartitionFunction.java

Mayank

03/30/2022, 5:21 AM

So it is indeed due to a mismatch in the implementation. But it is still confusing because, iirc, if Pinot sees data as not partitioned in a consuming segment, it will not be considered as partitioned. So this does seem unexpected.

sunny

03/30/2022, 7:16 AM

Yes. This is completely due to the mismatch. When I produced the data, the key included double quotation marks (""). This is the cause of the mismatch 🥲 It seems OK if the double quotation marks are not included in the key.
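
To see why the quotes matter, here is a Python port of 32-bit MurmurHash2 as I understand Kafka's default partitioner to use it, with the partition taken as the hash with the sign bit cleared, modulo the partition count. That Pinot's Murmur function follows the same steps is an assumption to check against the MurmurPartitionFunction source linked above. The helper name partition_for is made up for illustration. The point is that the key bytes differ once the quotes are part of the key, so the two keys are hashed as different inputs and may land in different partitions.

```python
def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 (seed 0x9747b28c), returned as an unsigned value."""
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask

    # Mix the input four bytes at a time, read as little-endian integers.
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Fold in the remaining 1 to 3 bytes, if any.
    tail = data[length - length % 4:]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: clear the sign bit, then take the modulo."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


# Compare the unquoted key with the key that includes the quote characters.
for key in ("Math", '"Math"'):
    print(repr(key), key.encode("utf-8"), partition_for(key, 3))
```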

Mayank

03/30/2022, 10:20 PM

Thanks for confirming
