Ilya Sterin
09/13/2023, 2:54 PM
SELECT ...
FROM markings
LEFT JOIN users AS created_user
ON created_user.userid = markings.userid
Unfortunately, the users who create markings aren't evenly distributed: we have a system process that's responsible for 90% of them. So 90% of the data gets joined on the same worker.
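To make the skew concrete, here is a minimal Python sketch of what I think is happening. It assumes rows are hash-partitioned on userid across 8 workers. The worker count, the "system" userid, and the "user-N" ids are all made up for illustration, and this is not the engine's actual partitioner:

```python
from collections import Counter
import zlib

NUM_WORKERS = 8  # assumed parallelism, not our real setting


def worker_for(userid: str, num_workers: int = NUM_WORKERS) -> int:
    """Stable hash partitioning: a given userid always maps to the same worker."""
    return zlib.crc32(userid.encode("utf-8")) % num_workers


def simulate(num_rows: int = 10_000, hot_share: float = 0.9) -> Counter:
    """Count how many markings rows each worker receives for the join."""
    hot_rows = round(num_rows * hot_share)
    load = Counter()
    # Every row created by the system process carries the same (hypothetical) userid,
    # so all of them hash to a single worker.
    load[worker_for("system")] += hot_rows
    # The remaining rows come from distinct hypothetical users and spread out.
    for i in range(num_rows - hot_rows):
        load[worker_for(f"user-{i}")] += 1
    return load


load = simulate()
for worker in range(NUM_WORKERS):
    print(f"worker {worker}: {load[worker]} rows")
```

Whichever worker the hot userid hashes to ends up with at least 90% of the rows, no matter how many workers there are.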
I thought about using a broadcast join, since the users table is small enough to just index on every worker, but broadcast joins only work in batch mode.
Is there a way to better distribute the processing of this join?