# general
m
@User @User Is it deliberate that we don’t call toString() before calling hashCode() in HashCodePartitionFunction? If not, it’s a bug. https://github.com/apache/pinot/blob/master/pinot-segment-spi/src/main/java/org/ap[…]ache/pinot/segment/spi/partition/HashCodePartitionFunction.java Because we don’t call toString(), a different hashCode is generated for the same value when PartitionSegmentPruner prunes segments, since it always calls toString() on the literal value before invoking getPartitionId.
m
Could be. I do see that MurmurPartitionFunction calls toString(), and it is the more commonly used one. cc: @User
m
I have partitioned my offline table on studentID, which is an integer-type column, and generated some segments for it. When I run the query below, I get no results, because the partitionId computed by the SegmentPruner does not match the partitionIds stored in the segment metadata, which were generated as values were added to the column index:
select * from transcript_OFFLINE where studentID=200
Essentially, anyone using the HashCode partition function on a non-string column is not seeing complete data 🙂
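To make the mismatch concrete, here is a minimal Java sketch. It is not Pinot's actual code: the class HashCodePartitionSketch, the method partitionId, and the partition count of 4 are all hypothetical, chosen only for illustration. It hashes the same studentID value once as a raw Integer (roughly what the ingestion path appears to do) and once after toString() (what the pruner is described as doing).

```java
// Simplified stand-in for a hashCode-based partition function.
// NOT Pinot's actual code: the class and method names here are hypothetical.
class HashCodePartitionSketch {

    // Maps a value to a partition using its own hashCode().
    // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
    static int partitionId(Object value, int numPartitions) {
        return Math.floorMod(value.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // assumed partition count, for illustration only

        // Ingestion-style path: the raw Integer column value is hashed directly.
        int atIngestion = partitionId(Integer.valueOf(200), numPartitions);

        // Pruning-style path: the query literal is converted with toString() first.
        int atPruning = partitionId(Integer.valueOf(200).toString(), numPartitions);

        System.out.println("raw Integer -> partition " + atIngestion);
        System.out.println("toString()  -> partition " + atPruning);
        System.out.println("match: " + (atIngestion == atPruning));
    }
}
```

With these assumed numbers, Integer 200 hashes to 200 (partition 0) while the string "200" hashes to 49586 (partition 2), so a pruner comparing against the stored partition would skip the segment. Calling toString() on both paths, or on neither, would make the two sides agree.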
s
@User @User I think this is a bug. Can you file an issue on GitHub? It should be an easy fix. In production at LinkedIn we use the murmur partition function, and it has been working well without any issues. I recommend using the murmur partition function for now.
m
Sure, I will create the issue on GitHub and open a PR to fix it.
m
Even after the fix, I’d still recommend using the murmur partition function; it’s the more commonly used one.