I had a general question about Upsert. Are the resource requi... (Apache Pinot #general)

Jai Patel

08/16/2021, 4:39 PM

I had a general question about Upsert. Are the resources required expected to be “significantly” higher than for a normal Realtime table? I ask because our Upsert table seems to take significantly more resources. Our Upsert table is considerably wider, but I’d like to understand whether that width is contributing the bulk of the load, or whether it could be Upsert itself.

Kishore G

08/16/2021, 4:42 PM

Yes, upsert needs more resources because of the primary key to row ID mapping it maintains. But the number of columns in the table should not increase that overhead.
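
A minimal sketch of the point above, not Pinot's actual implementation: an upsert table keeps one map entry per distinct primary key, pointing at the latest row location, so the overhead grows with the number of distinct keys and is unaffected by how many columns each row has. The class and method names are hypothetical, and the sketch assumes rows arrive in order, so a newer row simply overwrites the older location.

```python
from typing import Dict, Hashable, Tuple


class UpsertKeyMap:
    """Toy primary-key -> latest-row-location map (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # One entry per distinct primary key: key -> (segment name, doc id).
        self._latest: Dict[Hashable, Tuple[str, int]] = {}

    def record(self, key: Hashable, segment: str, doc_id: int) -> None:
        # Assumes in-order arrival: the newest row for a key replaces the old location.
        self._latest[key] = (segment, doc_id)

    def location(self, key: Hashable) -> Tuple[str, int]:
        return self._latest[key]

    def __len__(self) -> int:
        return len(self._latest)


key_map = UpsertKeyMap()
key_map.record("order-1", "seg_0", 0)
key_map.record("order-2", "seg_0", 1)
key_map.record("order-1", "seg_1", 0)  # update to an existing key
```

Three rows were ingested, but the map holds only two entries, one per distinct key. Nothing in the entry depends on the width of the row.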

Yupeng Fu

08/16/2021, 7:51 PM

Also, consider keeping primary key values simple (e.g. a single value rather than a composite), or use the hashFunction option:

https://github.com/apache/pinot/pull/7246
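
To illustrate why hashing helps, here is a rough sketch of collapsing a single or composite UUID key into a fixed-size digest, so every map key costs the same number of bytes regardless of how many parts the primary key has. MD5 is an assumption here, picked only because it ships with Python's standard library; the function name is hypothetical, and the hash functions the hashFunction option actually supports are described in the linked PR.

```python
import hashlib
import uuid


def hashed_key(*parts: uuid.UUID) -> bytes:
    """Collapse one or more UUIDs into a fixed 16-byte digest (illustrative only)."""
    digest = hashlib.md5()
    for part in parts:
        digest.update(part.bytes)  # each UUID contributes its 16 raw bytes
    return digest.digest()


a = uuid.UUID("12345678-1234-5678-1234-567812345678")
b = uuid.UUID("87654321-4321-8765-4321-876543218765")
single = hashed_key(a)
composite = hashed_key(a, b)
```

Both `single` and `composite` are 16 bytes, whereas the raw composite key would be 32 bytes before any object overhead.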

Jai Patel

08/16/2021, 11:54 PM

Thanks. Our keys are UUID or UUID+UUID. The first problem we found was that they were not uniformly distributed, so we hashed them with XXH3 (xxHash). That definitely helped with the balance and turned them into longs. But we continue to use the tuple of UUIDs for the partitionKeyColumns.

Jai Patel

08/17/2021, 12:43 AM

Oh, and to add a little more detail, we found that the lack of uniformity started with the Kafka key when we used UUIDs. So we weren’t getting an even spread across the servers and we ultimately had hot nodes.
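
One way to check for the kind of uneven spread described above is to count records per partition and compare the busiest partition with the mean. This is only a sketch: the partitioner below (MD5 modulo the partition count) is a stand-in assumption, not Kafka's default partitioner, and the sample keys are synthetic.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_of(key: str, num_partitions: int) -> int:
    # Stand-in partitioner (assumption): first 8 bytes of MD5, modulo partition count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def skew(keys: Iterable[str], num_partitions: int) -> float:
    """Busiest partition's record count divided by the mean; 1.0 is perfectly even."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    total = sum(counts.values())
    return max(counts.values()) / (total / num_partitions)


# Synthetic data: many distinct keys vs. one key that dominates the stream.
even = [f"key-{i}" for i in range(10_000)]
hot = ["key-0"] * 9_000 + [f"key-{i}" for i in range(1_000)]
```

A `skew` value close to 1.0 means the load is balanced; a value several times larger points to a hot partition, and therefore a hot server.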

Yupeng Fu

08/17/2021, 2:07 AM

Right, then you need to fix the distribution by shuffling the data upstream, e.g. with Flink.
