# general
r
Not related to SST or Seed, but with the knowledge in this group I thought I'd chance my arm on some DynamoDB help. We have a Lambda that performs some GetItems and PutItems against a DynamoDB table and we're seeing some weird performance issues. We're using Epsagon to gather the statistics. One of the operations results in >1000 GetItem calls; another results in >1000 PutItem calls. Some invocations result in sub-100ms execution for both Gets and Puts. However, sometimes they take as much as 7000ms - sometimes it's all the Gets, sometimes it's all the Puts. If the Puts take 7000ms then the Gets are fast, and vice versa. There is no throttling going on, the table is using On-Demand billing, and we're using eventually consistent reads. I'm at a loss to explain what's going on. Not sure if it's relevant, but there's a stream attached to the table which is consumed by a separate Lambda, and item size is about 2 KB. Can anyone recommend an avenue of investigation?
t
The invocation does both 1000 gets and 1000 puts every time? And you've verified when it's fast it's not doing a smaller number?
and you're sure when it's slow it's not the lambda cold starting?
Also how much memory is your lambda allocated?
r
The Lambda has 2 GB of memory, and we see both fast and slow runs, cold start or not. Yeah, it's basically a batch data processor: it calls a 3rd-party external API which returns a big old XML doc and processes that data. It looks up some cached data in DDB as part of that processing, then writes some data.
f
Just wanted to chime in, I had some weird DynamoDB issue last time, and posted in the AWS developers slack. And someone on the DDB team was able to take a deeper look.
r
Yeah, I've posted on there too. Was hoping Alex DeBrie might chip in 😄
Even better if a core team member does
o
Could be networking between Lambda and Dynamo - maybe it sometimes takes a while to set up the HTTP connection because of throttling somewhere along the route. Assuming there's no VPC shenanigans
Also curious if these BatchGetItem or GetItem calls?
Do those latency spikes show up on the dynamodb metrics dashboard?
r
We set keepAlive on the https.Agent as part of the DynamoDB initialisation parameters. These are GetItem rather than BatchGetItem - I'll look into batch as a potential alternative; it sounds like it may be better suited. Yes, the metrics show the same
o
ah ok, then it's probably internal to DynamoDB… which is a black box
f
Also worth checking if DynamoDB calls are getting retried internally?
We had occasional DDB timeouts previously because the AWS SDK keeps retrying.
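One way to check this is to make the retry and timeout behaviour explicit, so a single slow call can't actually be several retried calls hidden inside one. A sketch of the relevant AWS SDK v2 options (the values here are illustrative, not a recommendation):

```javascript
// Sketch: explicit retry/timeout options for an AWS SDK v2 DynamoDB client,
// to surface the first failure instead of silently retrying.
const clientOptions = {
  maxRetries: 0, // disable SDK-internal retries while diagnosing
  httpOptions: {
    connectTimeout: 1000, // ms to establish the TCP/TLS connection
    timeout: 2000,        // ms for the whole request
  },
};

// Illustrative wiring (commented out; requires the aws-sdk package):
// const AWS = require('aws-sdk');
// const ddb = new AWS.DynamoDB.DocumentClient(clientOptions);
```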
o
I use dataloader in all my lambdas (not just graphql) to convert GetItems to BatchGetItems - useful in cases like this where you’re mapping stuff (or a lambda is processing an SQS batch or something in parallel)
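A minimal sketch of the batching idea DataLoader implements - individual `load(key)` calls made in the same tick get coalesced into one batch call (e.g. a single BatchGetItem instead of N GetItems). This is illustrative plain JS, not the dataloader library; `batchFn` is a hypothetical function that would wrap the actual BatchGetItem call:

```javascript
// Sketch: coalesce same-tick load(key) calls into one batchFn(keys) call.
function createLoader(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length === 1) {
        // flush once all of this tick's load() calls have queued up
        process.nextTick(async () => {
          const batch = queue;
          queue = [];
          try {
            // batchFn must return results in the same order as its keys
            const results = await batchFn(batch.map((e) => e.key));
            batch.forEach((e, i) => e.resolve(results[i]));
          } catch (err) {
            batch.forEach((e) => e.reject(err));
          }
        });
      }
    });
  };
}
```

With DynamoDB, `batchFn` would map the keys into a BatchGetItem `RequestItems` payload - bearing in mind BatchGetItem takes at most 100 keys per request, so 1000 lookups still need chunking.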
r
Great info guys, thank-you
t
What is this aws slack you guys are referencing 👀
r
DM'ed you an invite. It's the one set up by Corey Quinn, the Last Week in AWS guy, founder of Duckbill Group
Here's a trace example - all of these are GetItems, first 4 are quick, then it jumps to 2s and increases up to 5s.
f
Are the 1000 GetItems calls fired at the same time?
r
Yes, although here there was only 30
f
Yeah.. I also see something like this on our side too i think
we never make more than 5 at the same time, so it's not as bad as 2s
r
interesting, and AWS shouts about consistent single-digit ms read performance!
f
I was thinking this might be network-related? but I never looked into it
r
That was where my thinking was going, we have functions in a VPC and a DDB endpoint in play
f
are you consistently getting this issue?
r
yeah
Well, more accurately I'd say we are regularly getting this issue but it's not consistent
f
Maybe try:
• writing a test Lambda function that just makes 30 concurrent DDB calls (in a VPC with a DDB endpoint); if reproducible then try
• writing a test Lambda function that just makes 30 concurrent DDB calls (without the VPC and DDB endpoint); if reproducible then try
• GetItem on a non-existent key so the response payload is tiny; if reproducible then try
• setting the AWS SDK DynamoDB client retry count to 0; if reproducible then try
• setting memory to 10 GB; …
lol until u get to the bottom
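The "30 concurrent calls" repro step above could be sketched like this - `getItem` here is a stand-in stub so the harness is self-contained; in the real test it would be the actual DynamoDB GetItem call:

```javascript
// Sketch: fire n calls concurrently and record per-call latency in ms,
// to see whether later calls queue up behind earlier ones.
async function timeConcurrent(n, getItem) {
  return Promise.all(
    Array.from({ length: n }, async (_, i) => {
      const start = Date.now();
      await getItem(i);
      return Date.now() - start;
    })
  );
}

// Example with a stubbed 10ms "call" in place of a real GetItem:
// timeConcurrent(30, () => new Promise((r) => setTimeout(r, 10)))
//   .then((timings) => console.log(Math.min(...timings), Math.max(...timings)));
```

If the max latency climbs with `n` while the min stays flat (like the trace where calls step up from fast to 2s to 5s), that points at queueing on the client side rather than DynamoDB itself.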
r
😄 A good list of steps
we've just specified the max retry and timeout options and now it's not finding the DDB endpoint!
f
hmmm… that smells fishy 🕵️‍♂️
r
Actually, it's not finding it for 1 in 30 GetItems
t
have you tried tweaking the maxSockets param and seeing if that changes anything? https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-maxsockets.html
what happens if you set it to like 500 or something?
r
No, but worth a try, thank you
t
good luck
Did this help?
r
I'm not sure yet, I'll definitely report back though when I am
t
@Ross Coundon how did it go in the end?
r
Unfortunately I haven't been able to properly test, as the project where it was appearing is currently on hiatus due to some upstream problems