Ran into a severe performance issue yesterday I have a model Prisma #prisma-client

Ran into a severe performance issue yesterday. I h...

Sean Sullivan

03/09/2022, 6:35 PM

Ran into a severe performance issue yesterday. I have a model like

Copy code

model User {
  id           Int      @id @default(autoincrement())
  primaryEmail String   @unique @map("primary_email")
  emails        Email[]
  @@map("users")
}

model Email {
  id                 Int      @id @default(autoincrement())
  userId             Int      @map("user_id")
  user               User     @relation(fields: [userId], references: [id])
  email              String   @unique
  @@map("emails")
}

And I'd like to get all users who don't have an email, per the docs, I'm running:

Copy code

prisma.user.findMany({
  where: { emails: { none: {} } }
});

This creates SQL along the lines of:

Copy code

SELECT * FROM users
WHERE (users.id) NOT IN (
    SELECT users.id
    FROM users
    INNER JOIN emails
        ON (emails.user_id) = (users.id)
    WHERE (users.id IS NOT NULL)
)

The query plan for this

Copy code

EXPLAIN SELECT * FROM users
WHERE (users.id) NOT IN (
    SELECT users.id
    FROM users
    INNER JOIN emails
        ON (emails.user_id) = (users.id)
    WHERE (users.id IS NOT NULL)
);
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=2057.26..59509346.61 rows=24028 width=26)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=2057.26..4394.34 rows=55777 width=4)
           ->  Hash Join  (cost=2057.26..3897.46 rows=55777 width=4)
                 Hash Cond: (emails.user_id = users_1.id)
                 ->  Seq Scan on emails  (cost=0.00..1069.77 rows=55777 width=4)
                 ->  Hash  (cost=1268.56..1268.56 rows=48056 width=4)
                       ->  Seq Scan on users users_1  (cost=0.00..1268.56 rows=48056 width=4)
                             Filter: (id IS NOT NULL)
(10 rows)

is prohibitively expensive to run because it materializes a table of all users who have emails and then scans it for each row in users. Our users table is relatively small (~52,000 rows) but these queries take forever (i.e. they don't terminate) and started clogging up our db connection pool. I can write a handwritten query that joins a table of all users without an email that runs almost instantly. I'm wondering if there is an explanation for this behavior so I can understand and avoid it in the future, or if this is sub-optimal performance that could be fixed with a PR?

Vladi Stevanovic

03/10/2022, 7:44 AM

@Sean Sullivan thank you for raising this! I've moved the convo to #C036EJQ7J6N

2 Views

Open in Slack

Previous Next