# troubleshooting
What are some ways to solve a data skew issue? We have a query that joins to a user table as below (I simplified the query to show the problem).
SELECT ...
FROM markings
LEFT JOIN users AS created_user
  ON created_user.userid = markings.userid
Unfortunately, markings aren't evenly distributed across the users who create them: we have a system process that's responsible for 90% of them, so 90% of the data gets joined on the same worker. I thought about using a broadcast join, given the users table is small enough to just index on every worker, but broadcast joins only work in batch mode. Is there a way to better distribute the processing of this join?
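To make the skew concrete, here is a rough Python simulation of what I think is happening. It is only a sketch under assumptions: the worker count, the system user's id, and the modulo routing (a stand-in for whatever hash partitioning the engine really uses) are all made up for illustration.

```python
from collections import Counter

NUM_WORKERS = 4      # hypothetical parallelism, not our real setting
SYSTEM_USER_ID = 0   # hypothetical userid of the system process


def worker_for(userid: int, num_workers: int = NUM_WORKERS) -> int:
    """Route a row to a worker by its join key (modulo as a stand-in for a hash)."""
    return userid % num_workers


def simulate(num_rows: int = 1000) -> Counter:
    """Count rows per worker when 90% of markings share a single userid."""
    load = Counter()
    for i in range(num_rows):
        # 9 of every 10 rows belong to the system user;
        # the remaining rows are spread evenly over userids 1..100.
        userid = SYSTEM_USER_ID if i % 10 != 0 else 1 + (i // 10) % 100
        load[worker_for(userid)] += 1
    return load


load = simulate()
for worker in range(NUM_WORKERS):
    print(f"worker {worker}: {load[worker]} rows")
```

With these made-up numbers, worker 0 receives 925 of the 1000 rows and the other three workers get 25 each, which matches the pattern we see: every row with the system userid lands on the same worker no matter how many workers there are.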