I had a general question about Upsert. Are the resources required expected to be “significantly” higher than for a normal Realtime table? I ask because our Upsert table seems to consume far more resources. It is a considerably wider table, but I’d like to understand whether that width accounts for the bulk of the load, or whether it could be Upsert itself.
Kishore G
08/16/2021, 4:42 PM
Yes, upsert needs more resources because of the primary key to row ID mapping it maintains. But the number of columns in the table should not increase that overhead.
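To illustrate why the overhead tracks the number of distinct keys rather than the table width, here is a minimal sketch of a key to row location map. It is a simplified, hypothetical model (the `UpsertKeyMap` class and the segment names are made up for illustration), not Pinot's actual implementation.

```python
from typing import Dict, Hashable, Optional, Tuple

# Hypothetical, simplified model of an upsert key map; not Pinot's real classes.
RecordLocation = Tuple[str, int]  # (segment name, doc id within that segment)


class UpsertKeyMap:
    """Tracks, for each primary key, where its latest row lives."""

    def __init__(self) -> None:
        self._locations: Dict[Hashable, RecordLocation] = {}

    def upsert(self, primary_key: Hashable, segment: str, doc_id: int) -> Optional[RecordLocation]:
        """Point the key at its newest row and return the location it replaced, if any."""
        previous = self._locations.get(primary_key)
        self._locations[primary_key] = (segment, doc_id)
        return previous

    def __len__(self) -> int:
        return len(self._locations)


key_map = UpsertKeyMap()
key_map.upsert("order-1", "seg_0", 0)
key_map.upsert("order-2", "seg_0", 1)
# A newer row for an existing key replaces the old location instead of adding an entry.
replaced = key_map.upsert("order-1", "seg_1", 0)
```

Each entry stores only the key and a row location, so in this model a wider row costs nothing extra in the map, while a larger or more complex key makes every entry bigger.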
Yupeng Fu
08/16/2021, 7:51 PM
Also, consider keeping primary key values simple (e.g. a single value rather than a composite), or use this
Thanks. Our keys are UUID or UUID+UUID. The first problem we found was that they were not uniformly distributed, so we hashed them with XXH3 (xxHash). That definitely helped with the balance and turned them into longs. But we continue to use the tuple of UUIDs for the partitionKeyColumns.
Jai Patel
08/17/2021, 12:43 AM
Oh, and to add a little more detail, we found that the lack of uniformity started with the Kafka key when we used UUIDs. So we weren’t getting an even spread across the servers and we ultimately had hot nodes.
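A rough sketch of the hashing approach described above: collapse a UUID or a UUID+UUID key into a single long, then derive a partition from it. This assumes nothing about the actual pipeline. `key_to_long` and `partition_for` are hypothetical helpers, and `hashlib.blake2b` stands in for XXH3 only because it ships with the Python standard library.

```python
import hashlib
import uuid
from collections import Counter


def key_to_long(*parts: uuid.UUID) -> int:
    """Hash one or more UUIDs into a signed 64-bit integer.

    blake2b is a stand-in here; the thread used XXH3 from xxHash.
    """
    raw = b"".join(p.bytes for p in parts)
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def partition_for(key_long: int, num_partitions: int) -> int:
    """Map a hashed key to a partition; Python's % is non-negative for a positive modulus."""
    return key_long % num_partitions


NUM_PARTITIONS = 8  # assumed partition count, for illustration only

# Count how many composite (UUID, UUID) keys land on each partition.
counts = Counter(
    partition_for(key_to_long(uuid.UUID(int=i), uuid.UUID(int=i * 7 + 1)), NUM_PARTITIONS)
    for i in range(8000)
)
```

Inspecting `counts` gives a quick check for hot partitions. Note that this does not reproduce Kafka's or Pinot's partition functions; in a real setup the producer and the table would need to agree on the same function and partition count.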
Yupeng Fu
08/17/2021, 2:07 AM
Right, then you need to fix the distribution by reshuffling the data, with Flink or something similar.