# off-topic
d
opened a critical ticket @admins
b
What do you mean? Nothing is down that I'm aware of.
d
Opened a ticket
our API requests are dropping on our go live
b
Investigating now
d
Thank you sir
b
You had a huge spike in usage on your api server, where network usage went from 0 to over 300MB/sec. Something's using your server bigtime. Check your support ticket.
d
yea we have a go live today
with over 5000 active users
Can you guys spin up some more dinos?
b
You're running this on a t4g.micro instance with one gig of ram!
d
how do we upgrade
b
You can upgrade your instance yourself immediately from the dashboard.
Let me show you:
Settings / Billing & Usage / Change Subscription / Edit Plan Configuration.
d
alright just updated lmfao
Idk how we didn't figure this out from the last time
but thank you
b
I'm so happy that this problem is a super successful project and not an attack or anything. This is easy to solve! Just jump up to a big instance...
you can always dial it down later -- but I'd go big then see how it goes and back off if you have too much power.
Happy to help, and happy to see developers be successful!
d
can you double check
some traffic now?
b
Let me look
I sent you a new Grafana shot in your support ticket. Looks MUCH better now!
You're still handling 1-3GB/sec network traffic but the server is barely sweating now
What kind of application is this that's using so much network?
crazy!
CPU on the API server is going higher now, though
How's the app seem to be running?
cpu 93% busy
d
crashing every once in a while
cant go higher can we?
b
Is that the biggest instance available?
What's doing the 1.5GB/sec of network traffic? That's huge
d
most likely multiple servers pulling API data
its a pretty big launch
hmmm
yea we are on the biggest at $400 right now
b
I can see if any of our backend engineers are available now to look at this.
d
dope
b
one of our other team members is looking at your server configuration to try to optimize it
d
thank you
tell them we are fine with spending $$$ for this
up to $2000
actually no budget really just bill us
b
Our team is upgrading you now: 12xlarge
d
sounds good
b
From our engineer: Requests look fine right now but I’d like to upsize their DB to 4 or 8xlarge to be on the safe side. @Mark Burggraf when can I do this?
Can they upgrade you right now? It may mean a little downtime.
d
as long as it's less than 3 mins we should be good
b
Probably 2-3 minutes, max 5
is that good?
@darora is coming in
d
Hey Darora!
d
Ok so your traffic looks fine and healthy right now, but to be on the safe side I'd like to bump your DB up a tier. Max 5 min downtime. My suggestion would be: unless you're expecting bursty traffic in the next 24 hours (e.g. there's a new wave of users coming in in an hour), we can just leave it as-is for now, and upsize if we see error rates start to spike. But if you know you're expecting larger traffic later, we should probably do it before that comes in.
d
we should be good then
the main spike should be taken care of
d
Cool, in that case I'll leave it as-is.
b
@darora will we have any problems with EBS balance or anything like that?
d
Their current usage pattern seems fine
b
API server is starting to spike again
90% CPU
and it's coming back down... ok
d
Didn't cause any errors afaict. If that happens, our hand will be forced into upsizing the DB. Even for a 5k-user drop, the amount of egress traffic your project is seeing is pretty substantial. If I had to take a stab in the dark, I'd guess there's a lot of unnecessary `select *`s happening
b
I saw it peak at 3.28GB/sec. I've never seen that kind of traffic.
DB load is still around 200% -- is that ok?
d
It's at the upper limit of its perf ability
(signing off discord; your DB could still use a resize whenever most opportune)
r
Also from the Supabase team here, just sent over a friend request @Dots
d
accepted if we need to set up DMs
Also just wanted to say sorry to the team here for not knowing about the upgradeability of the slugs - absolutely stupid on that one
b
We're happy to help, any time! Just reach out whenever you know you're going to have a need šŸ˜€
d
Yea def should have opened a ticket earlier as well to get some of this dialogue going. If you guys collect NFTs at all - got some free ones on me.
b
I don't, but that sounds like a really nice offer!
d
Hey guys how are we holding up? Seeing some errors
@here any ideas? Looks like we are hard down
b
Div still wanted to upgrade your db server. When is the best time for us to do that?
d
Now
Go for it
b
Let me see if div is here now.
I don't see him online. Do you know what size you wanted to upgrade the DB to?
db is now m6g.2xlarge; api is now m6g.12xlarge
I'll see if Inian can weigh in on this, hold on.
How has it been over the last 12 hours or so?
d
db and api were upgraded?
or thats what they are currently
b
yes that's current
Sorry I'm not sure how big they wanted to go, so I'll hold off for now unless you're having big problems?
I just want to make sure it doesn't fall through the cracks.
d
Yea I think it's less of a DB issue and more just network requests - we are working on some things on our end
b
Ok once Div gets back and has a chance to analyze things we can upgrade that box.
d
hmmm looks like still getting some intermittent issues with upserts
b
Ok, when is a good time to upgrade your db instance?
Your API instance is on 12xl, your DB is on 2xl -- did you want to go all the way up to 12xl for the DB just to be safe? Div said you might be ok with 8x or 4x. It's your call.
d
sorry was sleeping
we have 8 hours left on our DB usage
might as well bump it since we keep going down
let me know when you have it back up if you bounced it
b
So did you want to go to 4xl, 8xl, or all the way to 12xl?
d
whatever is gonna stop the connection drops - maybe just bump to 4xl and pray
b
ok, I will upgrade to 4xl now, good?
d
sounds good
b
upgrade in progress
Ok, this is done.
m6g.4xlarge with 16 CPUs and 64 gigs of RAM
that's double the CPU and double the RAM, I think, from what you had
d
yea seems just taking a second to get connected
b
It's coming online, but looks good so far.
I'll watch it for a while.
It looks real good now, CPU now maxes around 50-52%. So far, so good.
d
@burggraf if you are around can you check - seems we are down
b
Will look
It looks up to me, and I don't see any down periods
d
hmmm no clue why we see the drop off on those
weird
b
High usage on the CPU for the API server, but it's up and taking requests as far as I can see. Is your app responding?
DB, last 3 hours
API last 3 hours
Those high CPU spikes are still concerning. Are you still working on optimizing things on your end?
d
weird the API is getting hammered after the changes we have made
API closes soon
b
You might want to move this thread over to our shared Slack channel.
That'll make it easier for me to bring in other teams if we need to, moving forward
I'm still seeing a reference to
select *
in your postgrest logs, which is super inefficient
d
fair but the table is only 3 columns and decently small
if you have a recommendation on the change shoot
and maybe we should just bounce the API server if possible because we are down
zero requests getting through to the DB
b
ScientistTokens has 40k+ rows
I can reboot the API server, you want me to do that?
I'm ready to reboot API, just say the word.
d
word
ready
b
rebooting
d
also maybe this is poor understanding - but we are using a where clause and not just returning the entire DB
unless im trippin
b
I don't see a where clause in the log
d
alright let me check which query is doing that if thats the case
b
here's a sample log entry: May 18 15:48:58 bpzricpsuaeqlvanvnxf postgrest[2209]: 127.0.0.1 - anon [18/May/2022:15:48:54 +0000] "GET /ScientistTokens?select=* HTTP/1.1" 503 - "" "python-httpx/0.21.3"
that looks to me like it's getting the whole table, but I could be wrong
here's a log record with an eq comparison:
May 18 15:49:03 bpzricpsuaeqlvanvnxf postgrest[2209]: 127.0.0.1 - anon [18/May/2022:15:48:54 +0000] "GET /ScientistTokens?select=*&tokenId=eq.5364 HTTP/1.1" 500 - "" "python-httpx/0.21.3"
but there are very few of those, most look like the first one I sent
That would explain the huge network load
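One way to confirm how many requests are doing a full-table pull is to scan the postgrest log for GETs whose query string has no filter key. A minimal sketch against the log format shown above (the allowlist of non-filter keys is an assumption, not an exhaustive list of PostgREST reserved params):

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches the quoted request in a postgrest log line, e.g.
#   ... "GET /ScientistTokens?select=* HTTP/1.1" 503 ...
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/1\.1"')

def is_unfiltered(log_line: str) -> bool:
    """True when the logged GET carries no row filter (only select/order/
    limit/offset), i.e. it pulls the entire table."""
    m = REQUEST_RE.search(log_line)
    if not m:
        return False
    params = parse_qs(urlparse(m.group(1)).query)
    # Row filters show up as extra query keys, e.g. tokenId=eq.5364
    return not (set(params) - {"select", "order", "limit", "offset"})

full_scan = ('127.0.0.1 - anon [18/May/2022:15:48:54 +0000] '
             '"GET /ScientistTokens?select=* HTTP/1.1" 503 - "" "python-httpx/0.21.3"')
filtered = ('127.0.0.1 - anon [18/May/2022:15:48:54 +0000] '
            '"GET /ScientistTokens?select=*&tokenId=eq.5364 HTTP/1.1" 500 - "" "python-httpx/0.21.3"')
```

Run over the whole log file, a tally of `is_unfiltered` hits shows the ratio of full-table pulls to filtered reads.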
Is it looking better since the reboot?
d
yea, looks better
if you have an example of the call we are making
I can see if we can fix it
nah, that should do our where call - the record you sent
will see what I have on our end
b
Basically it looks like you're grabbing the entire
ScientistTokens
table with no where clause.
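For reference, the filtered form being described here looks like this as a URL. This is a sketch assuming the standard Supabase REST endpoint for the project ref seen in the logs, and the column list is hypothetical (select only what the app actually needs):

```python
from urllib.parse import urlencode

# Standard Supabase REST base for the project ref from the log hostname
# (an assumption -- substitute your project's actual URL).
BASE = "https://bpzricpsuaeqlvanvnxf.supabase.co/rest/v1"

def token_url(token_id: int, columns: str = "tokenId") -> str:
    """Build a PostgREST URL that fetches one row with a where clause
    (tokenId=eq.<id>) instead of GET /ScientistTokens?select=*."""
    qs = urlencode({"select": columns, "tokenId": f"eq.{token_id}"})
    return f"{BASE}/ScientistTokens?{qs}"
```

With the supabase-py client, the equivalent is roughly `supabase.table("ScientistTokens").select("tokenId").eq("tokenId", 5364).execute()`.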
d
we extended our window for an hour - it all closes down at 7pm CST - might want to keep an eye on it if possible and you're available
or ill gladly take access to reboot the instance haha
b
I will keep an eye on it.
You can reboot the entire project from the dashboard at any time, but only I can reboot just the API server, I think.
I've asked our postgres expert to confirm what I'm seeing. But check your code for this.
d
I got a gift for you after this is over burg haha
b
Thanks, but I really live for doing stuff like this. šŸ˜€
d
2Gig a second constant stream is no joke
b
down again?
d
hmmm if you see us spiking maybe
I'd check the app but don't want to be part of the problem
b
looks down, let me boot it again
d
on reboot are we just instantly hitting max CPU again?
b
wait, could just be a lag in the logging
CPU is spiked, yes
96%
d
alright we pushed new code
you bounce - and lets see
we removed a call that does a count
b
reboot the API server now?
d
yea
b
done
That syntax will let you count all rows without returning any data (except the count) and is 48,000 times faster in this case, LOL.
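The count syntax itself isn't captured in this transcript (it was likely shared as a screenshot). One standard PostgREST way to count without transferring rows is a HEAD request with `Prefer: count=exact`, reading the total from the `Content-Range` response header. A sketch with placeholder URL and key:

```python
# Request shown as data only; send it with any HTTP client (e.g. python-httpx).
# URL and key are placeholders, not values from this conversation.
COUNT_REQUEST = {
    "method": "HEAD",
    "url": "https://<project-ref>.supabase.co/rest/v1/ScientistTokens",
    "headers": {"apikey": "<anon-key>", "Prefer": "count=exact"},
}

def parse_total(content_range: str) -> int:
    """PostgREST answers with e.g. 'Content-Range: 0-24/48000';
    the total row count is the part after the slash."""
    return int(content_range.rsplit("/", 1)[1])
```

Because the body is empty, the server does the counting and ships back a header instead of 40k+ rows.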
d
haha
our code is pushed- lets see how this goes
b
Look at CPU now
Is your app up and running now?
Because CPU % dropped off a cliff
d
yea it should be
haha
If you are interested in the app
what monitoring app do you use?
b
The server stats are night and day now
Grafana
I'm looking at your app and I have no idea what I'm looking at. LOL
d
haha
b
So can you confirm everything is running smoothly? The server is hardly registering any usage.
you sure your code is ok?
d
let me double check
b
In your app, I get an error when I hit "connect"
Trying to connect to your Metamask wallet
551-0c1df362475b9617.js:1 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'request')
But I don't have any wallets, not sure if that makes a difference.
d
yea def need metamask to connect if you want to give it a shot
will send you some ETH so you can mint something haha
but yea our APP is up
we were using a python wrapper for calls that I don't think implemented the count correctly
so that might have been the smoking gun 😦
b
Wow, if it's working, then this may just be a 48,000 times improvement šŸ™‚
You're now peaking at 20-40kbps in network traffic, orders of magnitude lower than before.
Crisis averted!
d
it was that simple all along
b
Yep. You should be in great shape now
You can go from 5,000 to 500,000 users now
d
Alright we should be able to scale everything down now - should see greatly reduced traffic
b
You should be able to do this yourself from the dashboard I think, if not let me know.
d
will do!