# off-topic
d
opened a critical ticket @admins
b
What do you mean? Nothing is down that I'm aware of.
d
Opened a ticket
our API requests are dropping on our go live
b
Investigating now
d
Thank you sir
b
You had a huge spike in usage on your api server, where network usage went from 0 to over 300MB/sec. Something's using your server bigtime. Check your support ticket.
d
yea we have a go live today
with over 5000 active users
Can you guys spin up some more dinos?
b
You're running this on a t4g.micro instance with one gig of ram!
d
how do we upgrade
b
You can upgrade your instance yourself immediately from the dashboard.
Let me show you:
Settings / Billing & Usage / Change Subscription / Edit Plan Configuration.
d
alright just updated lmfao
Idk how we didn't figure this out from the last time
but thank you
b
I'm so happy that this problem is a super successful project and not an attack or anything. This is easy to solve! Just jump up to a big instance...
you can always dial it down later -- but I'd go big then see how it goes and back off if you have too much power.
Happy to help, and happy to see developers be successful!
d
can you double check
some traffic now?
b
Let me look
I sent you a new Grafana shot in your support ticket. Looks MUCH better now!
You're still handling 1-3GB/sec network traffic but the server is barely sweating now
What kind of application is this that's using so much network?
crazy!
CPU on the API server is going higher now, though
How's the app seem to be running?
cpu 93% busy
d
crashing every once in a while
cant go higher can we?
b
Is that the biggest instance available?
What's doing the 1.5GB/sec of network traffic? That's huge
d
most likely multiple servers pulling API data
its a pretty big launch
hmmm
yea we are on the biggest at $400 right now
b
I can see if any of our backend engineers are available now to look at this.
d
dope
b
one of our other team members is looking at your server configuration to try to optimize it
d
thank you
tell them we are fine with spending $$$ for this
up to $2000
actually no budget really just bill us
b
Our team is upgrading you now: 12xlarge
d
sounds good
b
From our engineer: Requests look fine right now but I’d like to upsize their DB to 4 or 8xlarge to be on the safe side. @Mark Burggraf when can I do this?
Can they upgrade you right now? It may mean a little downtime.
d
as long as it's less than 3 mins we should be good
b
Probably 2-3 minutes, max 5
is that good?
@darora is coming in
d
Hey Darora!
d
Ok so your traffic looks fine and healthy right now, but to be on the safe side I'd like to bump your DB up a tier. Max 5 min downtime. My suggestion would be: unless you're expecting bursty traffic in the next 24 hours (e.g. there's a new wave of users coming in in an hour), we can just leave it as-is for now, and upsize if we see error rates start to spike. But if you know you're expecting larger traffic later, we should probably do it before that comes in.
d
we should be good then
the main spike should be taken care of
d
Cool, in that case I'll leave it as-is.
b
@darora will we have any problems with EBS balance or anything like that?
d
Their current usage pattern seems fine
b
API server is starting to spike again
90% CPU
and it's coming back down... ok
d
Didn't cause any errors afaict. If that happens, our hand will be forced into upsizing the DB. Even for a 5k-user drop, the amount of egress traffic your project is seeing is pretty substantial. If I had to take a stab in the dark, I'd guess there's a lot of unnecessary `select *`s happening
b
I saw it peak at 3.28GB/sec. I've never seen that kind of traffic.
DB load is still around 200% -- is that ok?
d
It's at the upper limit of its perf ability
(signing off discord; your DB could still use a resize whenever most opportune)
r
Also from the Supabase team here, just sent over a friend request @Dots
d
accepted if we need to set up DMs
Also just wanted to say sorry to the team here for not knowing about the upgradeability of the slugs - absolutely stupid on that one
b
We're happy to help, any time! Just reach out whenever you know you're going to have a need šŸ˜€
d
Yea def should have opened a ticket earlier as well to get some of this dialogue going. If you guys collect NFTs at all - got some free ones on me.
b
I don't, but that sounds like a really nice offer!
d
Hey guys how are we holding up? Seeing some errors
@here any ideas? Looks like we are hard down
b
Div still wanted to upgrade your db server. When is the best time for us to do that?
d
Now
Go for it
b
Let me see if div is here now.
I don't see him online. Do you know what size you wanted to upgrade the DB to?
db is now m6g.2xlarge; api is now m6g.12xlarge
I'll see if Inian can weigh in on this, hold on.
How has it been over the last 12 hours or so?
d
db and api were upgraded?
or thats what they are currently
b
yes that's current
Sorry I'm not sure how big they wanted to go, so I'll hold off for now unless you're having big problems?
I just want to make sure it doesn't fall through the cracks.
d
Yea I think it's less of a DB issue and more just network requests - we are working on some things on our end
b
Ok once Div gets back and has a chance to analyze things we can upgrade that box.
d
hmmm looks like still getting some intermittent issues with upserts
b
Ok, when is a good time to upgrade your db instance?
Your API instance is on 12xl, your DB is on 2xl -- did you want to go all the way up to 12xl for the DB just to be safe? Div said you might be ok with 8x or 4x. It's your call.
d
sorry was sleeping
we have 8 hours left on our DB usage
might as well bump it since we keep going down
let me know when you have it back up if you bounced it
b
So did you want to go to 4xl, 8xl, or all the way to 12xl?
d
whatever is gonna stop the connection drops - maybe just bump to 4xl and pray
b
ok, I will upgrade to 4xl now, good?
d
sounds good
b
upgrade in progress
Ok, this is done.
m6g.4xlarge with 16 CPUs and 64 gigs of RAM
that's double the CPU and double the RAM, I think, from what you had
d
yea seems just taking a second to get connected
b
It's coming online, but looks good so far.
I'll watch it for a while.
It looks real good now, CPU now maxes around 50-52%. So far, so good.
d
@burggraf if you are around can you check - seems we are down
b
Will look
It looks up to me, and I don't see any down periods
d
hmmm no clue why we see the drop off on those
weird
b
High usage on the CPU for the API server, but it's up and taking requests as far as I can see. Is your app responding?
DB, last 3 hours
API last 3 hours
Those high CPU spikes are still concerning. Are you still working on optimizing things on your end?
d
weird the API is getting hammered after the changes we have made
API closes soon
b
You might want to move this thread over to our shared Slack channel.
That'll make it easier for me to bring in other teams if we need to, moving forward
I'm still seeing a reference to
select *
in your postgrest logs, which is super inefficient
d
fair but the table is only 3 columns and decently small
if you have a recommendation on the change shoot
and maybe we should just bounce the API server if possible because we are down
zero requests getting through to the DB
b
ScientistTokens has 40k+ rows
I can reboot the API server, you want me to do that?
I'm ready to reboot API, just say the word.
d
word
ready
b
rebooting
d
also maybe this is poor understanding - but we are using a where clause and not just returning the entire DB
unless im trippin
b
I don't see a where clause in the log
d
alright let me check which query is doing that if thats the case
b
here's a sample log entry: May 18 15:48:58 bpzricpsuaeqlvanvnxf postgrest[2209]: 127.0.0.1 - anon [18/May/2022:15:48:54 +0000] "GET /ScientistTokens?select=* HTTP/1.1" 503 - "" "python-httpx/0.21.3"
that looks to me like it's getting the whole table, but I could be wrong
here's a log record with an eq comparison:
May 18 15:49:03 bpzricpsuaeqlvanvnxf postgrest[2209]: 127.0.0.1 - anon [18/May/2022:15:48:54 +0000] "GET /ScientistTokens?select=*&tokenId=eq.5364 HTTP/1.1" 500 - "" "python-httpx/0.21.3"
but there are very few of those, most look like the first one I sent
That would explain the huge network load
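One way to confirm how many requests are doing a full-table pull is to scan the postgrest log for GETs whose query string has no filter key. A minimal sketch against the log format shown above (the allowlist of non-filter keys is an assumption, not an exhaustive list of PostgREST reserved params):

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches the quoted request in a postgrest log line, e.g.
#   ... "GET /ScientistTokens?select=* HTTP/1.1" 503 ...
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/1\.1"')

def is_unfiltered(log_line: str) -> bool:
    """True when the logged GET carries no row filter (only select/order/
    limit/offset), i.e. it pulls the entire table."""
    m = REQUEST_RE.search(log_line)
    if not m:
        return False
    params = parse_qs(urlparse(m.group(1)).query)
    # Row filters show up as extra query keys, e.g. tokenId=eq.5364
    return not (set(params) - {"select", "order", "limit", "offset"})

full_scan = ('127.0.0.1 - anon [18/May/2022:15:48:54 +0000] '
             '"GET /ScientistTokens?select=* HTTP/1.1" 503 - "" "python-httpx/0.21.3"')
filtered = ('127.0.0.1 - anon [18/May/2022:15:48:54 +0000] '
            '"GET /ScientistTokens?select=*&tokenId=eq.5364 HTTP/1.1" 500 - "" "python-httpx/0.21.3"')
```

Run over the whole log file, a tally of `is_unfiltered` hits shows the ratio of full-table pulls to filtered reads.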
Is it looking better since the reboot?
d
yea, looks better
if you have an example of the call we are making
I can see if we can fix it
nah, that should do our where call - the record you sent
will see what I have on our end
b
Basically it looks like you're grabbing the entire
ScientistTokens
table with no where clause.
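For reference, the filtered form being described here looks like this as a URL. This is a sketch assuming the standard Supabase REST endpoint for the project ref seen in the logs, and the column list is hypothetical (select only what the app actually needs):

```python
from urllib.parse import urlencode

# Standard Supabase REST base for the project ref from the log hostname
# (an assumption -- substitute your project's actual URL).
BASE = "https://bpzricpsuaeqlvanvnxf.supabase.co/rest/v1"

def token_url(token_id: int, columns: str = "tokenId") -> str:
    """Build a PostgREST URL that fetches one row with a where clause
    (tokenId=eq.<id>) instead of GET /ScientistTokens?select=*."""
    qs = urlencode({"select": columns, "tokenId": f"eq.{token_id}"})
    return f"{BASE}/ScientistTokens?{qs}"
```

With the supabase-py client, the equivalent is roughly `supabase.table("ScientistTokens").select("tokenId").eq("tokenId", 5364).execute()`.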
d
we extended our window for an hour - it all closes down at 7pm CST - might want to keep an eye on it if possible and you're available
or ill gladly take access to reboot the instance haha
b
I will keep an eye on it.
You can reboot the entire project from the dashboard at any time, but only I can reboot just the API server, I think.
I've asked our postgres expert to confirm what I'm seeing. But check your code for this.
d
I got a gift for you after this is over burg haha
b
Thanks, but I really live for doing stuff like this. šŸ˜€
d
2Gig a second constant stream is no joke
b
down again?
d
hmmm if you see us spiking maybe
I'd check the app but don't want to be part of the problem
b
looks down, let me boot it again
d
on reboot are we just instantly hitting max CPU again?
b
wait, could just be a lag in the logging
CPU is spiked, yes
96%
d
alright we pushed new code
you bounce - and lets see
we removed a call that does a count
b
reboot the API server now?
d
yea
b
done
That syntax will let you count all rows without returning any data (except the count) and is 48,000 times faster in this case, LOL.
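The count syntax itself isn't captured in this transcript (it was likely shared as a screenshot). One standard PostgREST way to count without transferring rows is a HEAD request with `Prefer: count=exact`, reading the total from the `Content-Range` response header. A sketch with placeholder URL and key:

```python
# Request shown as data only; send it with any HTTP client (e.g. python-httpx).
# URL and key are placeholders, not values from this conversation.
COUNT_REQUEST = {
    "method": "HEAD",
    "url": "https://<project-ref>.supabase.co/rest/v1/ScientistTokens",
    "headers": {"apikey": "<anon-key>", "Prefer": "count=exact"},
}

def parse_total(content_range: str) -> int:
    """PostgREST answers with e.g. 'Content-Range: 0-24/48000';
    the total row count is the part after the slash."""
    return int(content_range.rsplit("/", 1)[1])
```

Because the body is empty, the server does the counting and ships back a header instead of 40k+ rows.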
d
haha
our code is pushed- lets see how this goes
b
Look at CPU now
Is your app up and running now?
Because CPU % dropped off a cliff
d
yea it should be
haha
If you are interested in the app
what monitoring app do you use?
b
The server stats are night and day now
Grafana
I'm looking at your app and I have no idea what I'm looking at. LOL
d
haha
b
So can you confirm everything is running smoothly? The server is hardly registering any usage.
you sure your code is ok?
d
let me double check
b
In your app, I get an error when I hit "connect"
Trying to connect to your Metamask wallet
551-0c1df362475b9617.js:1 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'request')
But I don't have any wallets, not sure if that makes a difference.
d
yea def need metamask to connect if you want to give it a shot
will send you some ETH so you can mint something haha
but yea our APP is up
we were using a python wrapper for calls that I don't think implemented the count correctly
so that might have been the smoking gun 😦
b
Wow, if it's working, then this may just be a 48,000 times improvement šŸ™‚
You're now peaking at 20-40kbps in network traffic, orders of magnitude lower than before.
Crisis averted!
d
it was that simple all along
b
Yep. You should be in great shape now
You can go from 5,000 to 500,000 users now
d
Alright we should be able to scale everything down now - should see greatly reduced traffic
b
You should be able to do this yourself from the dashboard I think, if not let me know.
d
will do!