# orm-help
a
Hey, @Amit and I have been having connection pool issues (it happens rarely).
PrismaClientKnownRequestError: Timed out fetching a new connection from the connection pool. (More info: http://pris.ly/d/connection-pool, Current connection limit: 1)
This happened multiple times to different Node processes on the same container. After closing and restarting the container, the issue didn't reoccur.
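For context, the pool size and the time a request waits for a free connection can be tuned through connection-string parameters. A minimal sketch, assuming the datasource in schema.prisma is named db and DATABASE_URL has no query string yet (the values are illustrative, not recommendations):

```typescript
import { PrismaClient } from '@prisma/client'

// connection_limit: maximum number of pooled connections per Prisma Client instance
// pool_timeout: seconds a query waits for a free connection before throwing the
//               "Timed out fetching a new connection from the connection pool" error
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: `${process.env.DATABASE_URL}?connection_limit=5&pool_timeout=30`,
    },
  },
})
```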
p
hasn't worked for me even once since yesterday
very frustrating 😞
j
We have had similar experiences, also resolved by restarting the container(s). It is difficult to reproduce though, so we can't be certain how to prevent it from happening or how to debug it at the moment
r
@potatoxchip This should not happen if you are using the Data Proxy. @Amit Goldfarb Could you share more about your setup, i.e. where you have deployed your app and the Prisma version?
a
We are using Prisma 3.2.0, deployed on an AWS EC2 instance. We are running on this Docker image -> node:12.22.1-buster-slim
a
Our gut feeling, which might not be accurate, is that there's some scenario where the connection is not returned to the connection pool. Does that make sense? Is there a timeout on the connection pool such that a connection is taken back into the pool after said timeout?
j
@Amit, we have reached a similar conclusion. We have autoscaling on our Fargate tasks (Docker-like containers on AWS). When one closes after scaling down again, it seems like the connections are not returned (hence our application is very slow)
a
Thanks @Jonathan, did you find some hack to mitigate it in the meantime? I'm thinking about something like resetting the pool once in a while? It's annoying and brings me back to how we fixed memory leaks in Ruby five years ago
j
Not yet, we disabled autoscaling for now and just add more resources to one container. We plan on finding out how to reproduce / fix it but haven't had time and resources yet
It's not ideal though, and it is a pretty big hazard for us to solve. I would be interested to see whether other users have solved this somehow
It seems that Prisma disconnects on a SIGINT signal, so maybe your Node process is not being closed with this signal?
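A graceful-shutdown handler along those lines, as a sketch assuming a long-running Node service (not a Lambda) with a single shared Prisma Client:

```typescript
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

async function shutdown(signal: string) {
  console.log(`received ${signal}, disconnecting Prisma`)
  // Returns the pooled connections to the database before the process exits.
  await prisma.$disconnect()
  process.exit(0)
}

// Fargate/ECS sends SIGTERM on scale-down; SIGINT covers a local Ctrl+C.
process.on('SIGTERM', () => void shutdown('SIGTERM'))
process.on('SIGINT', () => void shutdown('SIGINT'))
```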
a
Is this still relevant with N-API?
j
I am not certain to be honest. Maybe @Ryan knows?
prisma.$on('beforeExit', async () => { console.log('beforeExit hook') })
We are going to add this debug statement to see if it exits cleanly after autoscaling down
r
Ideally it should. Have you tried performing a disconnect on SIGINT and checking if that changes anything?
j
We only did one for SIGTERM since that is the signal we get from the Fargate task manager (container). I wasn't certain whether a SIGTERM sent to 'node app.js' would also be recognized by Prisma in some way. But perhaps not?
a
The problem with debugging this is that at least we (@Amit Goldfarb and I) aren't sure when it happens, so for whatever we tried or will try to implement, we can only say that it might have fixed it. It would be way easier if we at least knew the situation in which a connection is not returned to the pool
For us, BTW, it also happened in the past on a λ function. This was particularly bad because it was a function from our provisioned concurrency pool. In that case, we verified with AWS support that there was indeed some intermittent network issue that might have caused it
j
One thing that helped us in some way is to run
select * from pg_stat_activity
and check how many connections our Prisma client has to our database at that moment
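The same check can be run from the application itself with a raw query, so the count can be logged on a schedule. A sketch for PostgreSQL; the query text and grouping are illustrative:

```typescript
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

async function logConnectionCounts() {
  // Counts server-side connections to the current database, grouped by state
  // (active, idle, idle in transaction, ...).
  const rows = await prisma.$queryRaw<{ state: string | null; count: bigint }[]>`
    select state, count(*) as count
    from pg_stat_activity
    where datname = current_database()
    group by state`
  console.log('pg_stat_activity:', rows)
}
```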
a
That's a good idea, are you naming your connections?
j
I wasn't aware that is possible? I only count the number of connections, and then count them again after autoscaling. I notice that it is capped and can't grow afterwards
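Connections can be "named" by setting application_name, which is a standard PostgreSQL connection parameter; whether it can simply be appended to the Prisma connection string depends on the setup, so treat this sketch as an assumption to verify:

```typescript
import { PrismaClient } from '@prisma/client'

// Assumes the datasource is named db and DATABASE_URL has no query string yet.
const prisma = new PrismaClient({
  datasources: {
    db: { url: `${process.env.DATABASE_URL}?application_name=api-worker` },
  },
})

// pg_stat_activity rows could then be attributed per service:
//   select application_name, count(*) from pg_stat_activity
//   where datname = current_database()
//   group by application_name;
```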
a
Interesting
t
Just had this happen. Really don't know how to debug it. Did you guys get any further? Had to destroy the AWS ECS task and let it restart
a
@Timo, just now saw your message. For our λ instances, we basically
exit 1
whenever we bump into something like that. This is very high level; of course it includes error handling, specific use cases, replays, etc. But the fastest way we have found to deal with it is to just destroy the entire λ container and restart it (done automatically)
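A sketch of that exit-on-pool-timeout approach (user is a placeholder model; P2024 should be the error code behind the pool-timeout message, but it is worth verifying against your Prisma version):

```typescript
import { Prisma, PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

async function runQuery() {
  try {
    return await prisma.user.findMany()
  } catch (e) {
    if (e instanceof Prisma.PrismaClientKnownRequestError && e.code === 'P2024') {
      // Connection pool timed out: exit so the platform (Lambda/ECS) replaces
      // this instance with a fresh one instead of serving slow/failed requests.
      console.error('connection pool timeout, recycling this instance', e)
      process.exit(1)
    }
    throw e
  }
}
```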
t
Interesting, @Amit. Where do you catch the error?