# general
j
I had a general question about Upsert. Are the resources required expected to be “significantly” higher than for a normal Realtime table? I ask because our Upsert table seems to take significantly more resources. Our Upsert table is considerably wider, but I’d like to understand whether that width is contributing the bulk of the load, or whether it could be Upsert itself.
k
yes, upsert needs more resources because of the primary key to row id mapping it keeps. But the number of columns in the table should not increase that overhead.
y
also, consider keeping primary key values simple (e.g. a single value rather than a composite). or use this
hashFunction
https://github.com/apache/pinot/pull/7246
j
Thanks. Our keys are UUID or UUID+UUID. The first problem we found was that they were not uniformly distributed, so we hashed them with XXH3 (xxhash). That definitely helped with the balance and turned them into longs. But we continue to use the tuple of UUIDs for the partitionKeyColumns.
Oh, and to add a little more detail, we found that the lack of uniformity started with the Kafka key when we used UUIDs. So we weren’t getting an even spread across the servers and we ultimately had hot nodes.
y
right, then you need to fix the distribution by reshuffling the data upstream, with Flink or something similar
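
For anyone reading this thread later, here is a minimal sketch of the key hashing j describes: collapsing a UUID or UUID+UUID key into a single 64-bit long and checking how evenly the result spreads over partitions. It is only an illustration under stated assumptions. `hashlib.blake2b` stands in for XXH3 so the snippet runs with the standard library alone, `NUM_PARTITIONS` and all function names are hypothetical, and the simple modulo assignment does not reproduce what the Kafka or Pinot partitioners actually compute.

```python
import hashlib
import uuid
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, not taken from the thread


def key_to_long(*uuids: uuid.UUID) -> int:
    """Hash one or more UUIDs into a single signed 64-bit integer.

    blake2b is a stand-in for XXH3 here (an assumption made so the
    example needs no third-party package).
    """
    raw = b"".join(u.bytes for u in uuids)
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def partition_for(key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a hashed key to a partition index with a plain modulo.

    Python's % returns a non-negative result for a positive modulus,
    so negative longs are handled without extra work.
    """
    return key % num_partitions


def partition_counts(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Composite UUID+UUID keys, as in the thread.
    keys = [key_to_long(uuid.uuid4(), uuid.uuid4()) for _ in range(10_000)]
    counts = partition_counts(keys)
    for p in range(NUM_PARTITIONS):
        print(f"partition {p}: {counts[p]} keys")
```

If the per-partition counts come out roughly equal, the hashed key is a reasonable candidate for the stream key. As y points out, the data still has to be repartitioned upstream for the servers to see that balance.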