# troubleshooting
s
Hi, I have an issue with partitioning in Pinot. When I run a 'select where in' query on the partition column, it doesn't return any records. But when I run a 'select where not in' query on the partition column, it seems OK. After the segment is flushed, the 'select where in' query returns the right records, but rows produced after that (before the next segment is flushed) don't show up.
*) realtime table
*) partition column: subject
*) Kafka topic partitions = 3
*) Pinot partition function: Murmur
m
Hmm, this doesn’t make sense, what version of Pinot are you using?
Also, can you try count(*) instead of * to see if the problem still happens?
s
We are using Pinot 0.9.3. This is a screenshot of the count(*) query. It shows the same behavior.
m
This is definitely very strange. Can you share the query response metadata? Wondering if partitioning is pruning out the segment (not sure why that would happen).
@User What’s the data type of the partition column? Also, are you saying that once the segment is committed, the problem goes away?
And is there any way you can try the latest master code?
s
String type. And after the segment is committed, I can see the records, but data produced after that doesn't show up. I am trying to get the query response metadata via the broker API, but I get a syntax error, so I am checking further :)
@User I think it is an easy problem, but I can't find the cause of the syntax error. Could you help me? 🥲
m
You can check the JSON response format in the UI; that will return the metadata.
The syntax error is because the single quotes need escaping. But you can just use the UI and enable the JSON format for the result @User
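For anyone hitting the same syntax error from the command line: the single quotes around 'Math' collide with shell quoting when the query is embedded in a curl command. One way to sidestep manual escaping, sketched here in Python, is to let a JSON encoder build the request body. The broker address is a placeholder (an assumption, not taken from this thread), authentication is left out, and the `/query/sql` path with a `{"sql": ...}` body follows the Pinot broker query API as documented, so check it against your version.

```python
import json
import urllib.request

# Hypothetical broker address; replace with your own broker host and port.
BROKER_URL = "http://localhost:8099/query/sql"


def build_query_request(sql: str, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the SQL text as JSON."""
    # json.dumps handles all quoting, so the SQL can contain single quotes as-is.
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "select * from transcript_key_0329_sunny where subject in ('Math')"
)
# To actually send it: urllib.request.urlopen(request).read()
```

The JSON response from the broker carries the same metadata fields that the UI shows in JSON mode.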
s
Thank you :) I didn't know about enabling the JSON format for the result in the UI. This is the result for the query
select * from transcript_key_0329_sunny where subject in ('Math')
m
I think somehow partitioning is not set up correctly, or is getting confused due to the data type, and is pruning out the segment.
Have you tried the latest code from master?
a
@User Version 0.2.5 of the chart was released just now. Try updating; it might fix your issue.
s
Before trying the update, could you check whether partitioning is set up correctly? I set up partitioning by referring to the Pinot docs: https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing Thank you for your careful help :)
m
Can you share the segment metadata from the swagger api?
Also, the partition config is for telling Pinot that the data is already partitioned, using the same implementation of the partition function as Pinot uses (a matching name is not enough). Is your Kafka topic partitioned by the exact same Murmur implementation that Pinot uses?
s
curl -X GET "http://pay-poc-pinot.sandbox.onkakao.net:9001/segments/transcript_key_0329_sunny/transcript_key_0329_sunny__1__1__20220329T0837Z/metadata" -H "accept: application/json" -u 'admin:verysecret'
{"segment.creation.time":"1648543046947","segment.flush.threshold.size":"6","segment.name":"transcript_key_0329_sunny__1__1__20220329T0837Z","segment.partition.metadata":"{\"columnPartitionMap\":{\"subject\":{\"numPartitions\":3,\"partitions\":[1],\"functionName\":\"Murmur\"}}}","segment.realtime.numReplicas":"1","segment.realtime.startOffset":"2","segment.realtime.status":"IN_PROGRESS","segment.table.name":"transcript_key_0329_sunny","segment.type":"REALTIME"}
Yes. I produced the Kafka topic data with the producer CLI (without setting any other partitioner). I know that the default Kafka partitioning algorithm is murmur. And I checked that if I set the Pinot partition function to Modulo or ByteArray, Pinot doesn't show any records from the Kafka topic. So I think setting Murmur in Pinot is not the problem.
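A note on reading the metadata response above: `segment.partition.metadata` is itself a JSON document stored as a string, so it has to be decoded a second time. The sketch below does that for the response pasted in this thread; `partition_info` is a hypothetical helper name introduced only for illustration.

```python
import json

# Segment metadata exactly as returned by the curl call in this thread.
response_text = r'''{"segment.creation.time":"1648543046947","segment.flush.threshold.size":"6","segment.name":"transcript_key_0329_sunny__1__1__20220329T0837Z","segment.partition.metadata":"{\"columnPartitionMap\":{\"subject\":{\"numPartitions\":3,\"partitions\":[1],\"functionName\":\"Murmur\"}}}","segment.realtime.numReplicas":"1","segment.realtime.startOffset":"2","segment.realtime.status":"IN_PROGRESS","segment.table.name":"transcript_key_0329_sunny","segment.type":"REALTIME"}'''


def partition_info(segment_metadata: dict, column: str) -> dict:
    """Decode the nested partition metadata and return the entry for one column."""
    nested = json.loads(segment_metadata["segment.partition.metadata"])
    return nested["columnPartitionMap"][column]


info = partition_info(json.loads(response_text), "subject")
print(info["functionName"], info["numPartitions"], info["partitions"])
# prints: Murmur 3 [1]
```

So this consuming segment declares that every `subject` value in it hashes to partition 1 out of 3 under the function named Murmur.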
m
I feel that the partition function implementation might be different (even though the name matches)
Can you give me the exact string value of the column and the partition it belongs to on the Kafka side? I will check whether Pinot also computes the same partition id.
s
This is the value on Kafka (key, value)
"Math":{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"Math","score":3.6,"timestampInEpoch":1572854500000}
I produced the Kafka topic data via the Kafka console CLI like this
/home/deploy/kafka_2.11-2.4.1/bin/kafka-console-producer.sh --broker-list pay-poc-pinot-m3.ay1.krane.9rum.cc:9092 --topic transcript-key-0329-sunny --property "parse.key=true" --property "key.separator=:" --property "print.key=true"
m
Assuming the partition column “subject” will always match the key, I see that the partition id of “Math” as computed in Pinot is 3. But from what I see in your segment partition metadata, the partition id is 1. This is why the segment gets pruned and no records are found.
@User Do you have a committed segment? If so, I expect that it contains more than one partition (i.e. > 1), so it doesn’t get pruned and you get the result.
Also, can you try removing the partition config from the table config and see if the problem still happens? If not, then it is definitely due to a mismatch in the partition function implementation.
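The pruning behaviour described above can be modelled roughly as follows. This is a simplified sketch of the idea, not Pinot's actual pruner code: `segment_may_match` is a hypothetical name, the partition function is replaced by a lookup table, and the ids (segment partition 1, computed id 3 for "Math") are taken as reported in this thread.

```python
from typing import Callable, Iterable, Set


def segment_may_match(
    in_values: Iterable[str],
    segment_partitions: Set[int],
    partition_of: Callable[[str], int],
) -> bool:
    """Return True if a segment has to be scanned for `column IN (in_values)`.

    The segment is skipped only when none of the IN values hashes to a
    partition that the segment metadata claims to contain.
    """
    return any(partition_of(value) in segment_partitions for value in in_values)


# Stand-in for the real partition function: the id reported in the thread.
reported_ids = {"Math": 3}
partition_of = reported_ids.__getitem__

# Consuming segment: metadata lists only partition 1, so it is pruned.
print(segment_may_match(["Math"], {1}, partition_of))     # False
# Hypothetical segment whose metadata lists several partitions, including
# the computed id: it is kept and the rows are found.
print(segment_may_match(["Math"], {1, 3}, partition_of))  # True
```

A `NOT IN` predicate cannot rule out a segment by partition id in this way, which would be consistent with the `NOT IN` query returning rows while the `IN` query returned none.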
s
How can I check that the partition id of "Math" as computed in Pinot is 3? @User For reference, when I check in Kafka, the rows including "Math" are in partition 1. When I check in Pinot, the rows including "Math" are in segment transcript_key_0329_sunny__1__0_~~
If I remove the partition config from the table, it doesn't happen. So as you mentioned, it is due to a partition function mismatch.
m
So it is indeed due to a mismatch in the implementation. But it is still confusing because, iirc, if Pinot sees that the data in a consuming segment is not partitioned, it will not treat the segment as partitioned. So this does seem unexpected.
s
Yes. This is completely due to the mismatch. When I produce data, the key includes double quotation marks (""). This is the cause of the mismatch 🥲 It seems OK if the key does not include the double quotes.
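To see why the quotes matter, here is a sketch of the partition computation. It assumes the producer used Kafka's default partitioning for keyed records (murmur2 over the key bytes, sign bit cleared, modulo the partition count) and that Pinot's Murmur function applies the same steps to the UTF-8 bytes of the column value. The Python port below was written for illustration and is not code from either project; `kafka_style_partition` is a hypothetical helper name.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 with the seed used by Kafka, returned as an unsigned int."""
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask

    # Mix the input four bytes at a time (little-endian).
    for i in range(length // 4):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Fold in the remaining 1 to 3 bytes, if any.
    tail = length & ~3
    remaining = length % 4
    if remaining == 3:
        h ^= data[tail + 2] << 16
    if remaining >= 2:
        h ^= data[tail + 1] << 8
    if remaining >= 1:
        h ^= data[tail]
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def kafka_style_partition(key: str, num_partitions: int) -> int:
    """Clear the sign bit of the hash, then take it modulo the partition count."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


# The column value Pinot hashes versus the key the console producer sent.
for key in ("Math", '"Math"'):
    print(repr(key), len(key.encode("utf-8")), kafka_style_partition(key, 3))
```

The two inputs are different byte sequences (4 bytes versus 6 bytes), so they are hashed independently and can land in different partitions, which matches what was observed in this thread.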
m
Thanks for confirming