Is it deliberate that we don’t call... | Apache Pinot #general

Mohemmad Zaid Khan

02/17/2022, 7:11 AM

@User @User Is it deliberate that we don’t call toString() before calling hashCode() here in HashCodePartitionFunction? If it is not, then it’s a bug. https://github.com/apache/pinot/blob/master/pinot-segment-spi/src/main/java/org/ap[…]ache/pinot/segment/spi/partition/HashCodePartitionFunction.java Since we don’t call toString(), a different hash code is generated for the same value when segment pruning is done by PartitionSegmentPruner, because it always calls toString() on the literal value before invoking getPartitionId.

Mayank

02/17/2022, 7:15 AM

Could be. I do see that MurmurPartitionFunction does call toString(), and it is the more commonly used one. cc: @User

Mohemmad Zaid Khan

02/17/2022, 7:17 AM

I have partitioned my offline table on studentID, which is an integer-type column, and I have generated some segments for this table. Now if I run the query below, I don’t see any results, because the partitionId generated by the SegmentPruner does not match the partitionIds present in the segment metadata, which were generated when new values were being added to the column index:

select * from transcript_OFFLINE where studentID=200
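
To make the mismatch concrete, here is a minimal sketch in plain Java. It is not Pinot source: the class name, the method, and the partition count of 4 are all assumptions for illustration. It only shows that hashing an Integer directly and hashing its toString() form can land in different partitions.

```java
// Simplified stand-in for a hashCode-based partition function.
// NOT Pinot source; the class and method names here are hypothetical.
final class HashCodeMismatchSketch {

  // Maps a value to a partition using the value's own hashCode().
  // Masking the sign bit keeps the result non-negative.
  static int partitionId(Object value, int numPartitions) {
    return (value.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    int numPartitions = 4; // assumed partition count, for illustration only

    // Segment-creation side: the column value arrives as an Integer.
    int atIndexing = partitionId(Integer.valueOf(200), numPartitions);

    // Pruning side: the query literal is converted with toString() first.
    int atPruning = partitionId(Integer.valueOf(200).toString(), numPartitions);

    System.out.println("indexing partition = " + atIndexing);
    System.out.println("pruning partition  = " + atPruning);
  }
}
```

Under these assumptions the Integer 200 maps to partition 0 and the String "200" maps to partition 2, so a pruner that looks for partition 2 would skip a segment whose metadata records partition 0.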

Mohemmad Zaid Khan

02/17/2022, 7:19 AM

Essentially, anyone using the HashCode partition function on a non-string column is not seeing complete data 🙂

Seunghyun

02/17/2022, 8:06 AM

@User @User I think this is a bug. Can you file an issue on GitHub? This should be an easy fix. In production at LinkedIn, we use the murmur partition function, and it has been working well without any issues. I recommend using the murmur partition function for now.
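
One plausible shape for such a fix, sketched under the assumption that the goal is simply for both sides to hash the same representation of the value. This is a hypothetical illustration in plain Java, not necessarily what the actual change in Pinot does; the class and method names are made up.

```java
// Hypothetical sketch: normalize the value with toString() before hashing,
// so the indexing side and the pruning side agree. NOT Pinot source.
final class NormalizedHashSketch {

  // Hashes the string form of the value rather than the value itself.
  // Masking the sign bit keeps the result non-negative.
  static int partitionId(Object value, int numPartitions) {
    return (value.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    int numPartitions = 4; // assumed partition count, for illustration only

    int fromInteger = partitionId(Integer.valueOf(200), numPartitions);
    int fromString = partitionId("200", numPartitions);

    // Both calls hash the same string "200", so the partitions match.
    System.out.println(fromInteger == fromString);
  }
}
```

With this normalization, an Integer and its string literal always resolve to the same partition, which is the property the pruner relies on.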

Mohemmad Zaid Khan

02/17/2022, 8:08 AM

Sure, I will create an issue on GitHub and a PR to fix it.

Mohemmad Zaid Khan

02/17/2022, 11:25 AM

Issue: https://github.com/apache/pinot/issues/8215
PR: https://github.com/apache/pinot/pull/8216/files

👍 1

Mayank

02/17/2022, 3:19 PM

Even after the fix, I’d still recommend using the murmur partition function; it’s the more commonly used one.
