# general
r
Not related to SST or Seed, but with the knowledge in this group I thought I'd chance my arm on some DynamoDB help. We have a Lambda that performs some GetItems and PutItems against a DynamoDB table and we're seeing some weird performance issues. We're using Epsagon to gather the statistics. One of the operations results in >1000 GetItem calls; another results in >1000 PutItem calls. Some invocations result in sub-100ms execution for both Gets and Puts. However, sometimes they take as much as 7000ms - sometimes it's all the Gets, sometimes it's all the Puts. If the Puts take 7000ms then the Gets are fast, and vice versa. There is no throttling going on, the table is using On-Demand billing, and we're using eventually consistent reads. I'm at a loss to explain what's going on. Not sure if it's relevant, but there's a stream attached to the table which is consumed by a separate Lambda, and item size is about 2 KB. Can anyone recommend an avenue of investigation?
t
The invocation does both 1000 gets and 1000 puts every time? And you've verified when it's fast it's not doing a smaller number?
and you're sure when it's slow it's not the lambda cold starting?
Also how much memory is your lambda allocated?
r
The Lambda has 2 GB of memory, and we see both fast and slow runs, cold start or not. Yeah, it's basically a batch data processor: it calls a 3rd-party external API which returns a big old XML doc and processes that data. It looks up some cached data in DDB as part of that processing, then writes some data.
f
Just wanted to chime in, I had some weird DynamoDB issue last time, and posted in the AWS developers slack. And someone on the DDB team was able to take a deeper look.
r
Yeah, I've posted on there too. Was hoping Alex DeBrie might chip in 😄
Even better if a core team member does
o
Could be networking between Lambda and Dynamo - maybe it sometimes takes a while to set up the HTTP connection because of throttling somewhere along the route. Assuming there's no VPC shenanigans
Also curious if these BatchGetItem or GetItem calls?
Do those latency spikes show up on the dynamodb metrics dashboard?
r
We set keepAlive on the https.Agent as part of the DynamoDB initialisation parameters. These are GetItem rather than BatchGetItem - I'll look into batch as a potential alternative; it sounds like it may be better suited. Yes, the metrics show the same
o
ah ok, then it's probably internal to DynamoDB… which is a black box
f
Also worth checking if DynamoDB calls are getting retried internally?
We had occasional DDB timeouts previously because the AWS SDK keeps retrying.
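One way to check this is to make the retry and timeout behaviour explicit, so a single slow call can't actually be several retried calls hidden inside one. A sketch of the relevant AWS SDK v2 options (the values here are illustrative, not a recommendation):

```javascript
// Sketch: explicit retry/timeout options for an AWS SDK v2 DynamoDB client,
// to surface the first failure instead of silently retrying.
const clientOptions = {
  maxRetries: 0, // disable SDK-internal retries while diagnosing
  httpOptions: {
    connectTimeout: 1000, // ms to establish the TCP/TLS connection
    timeout: 2000,        // ms for the whole request
  },
};

// Illustrative wiring (commented out; requires the aws-sdk package):
// const AWS = require('aws-sdk');
// const ddb = new AWS.DynamoDB.DocumentClient(clientOptions);
```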
o
I use dataloader in all my lambdas (not just graphql) to convert GetItems to BatchGetItems - useful in cases like this where you’re mapping stuff (or a lambda is processing an SQS batch or something in parallel)
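A minimal sketch of the batching idea DataLoader implements - individual `load(key)` calls made in the same tick get coalesced into one batch call (e.g. a single BatchGetItem instead of N GetItems). This is illustrative plain JS, not the dataloader library; `batchFn` is a hypothetical function that would wrap the actual BatchGetItem call:

```javascript
// Sketch: coalesce same-tick load(key) calls into one batchFn(keys) call.
function createLoader(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length === 1) {
        // flush once all of this tick's load() calls have queued up
        process.nextTick(async () => {
          const batch = queue;
          queue = [];
          try {
            // batchFn must return results in the same order as its keys
            const results = await batchFn(batch.map((e) => e.key));
            batch.forEach((e, i) => e.resolve(results[i]));
          } catch (err) {
            batch.forEach((e) => e.reject(err));
          }
        });
      }
    });
  };
}
```

With DynamoDB, `batchFn` would map the keys into a BatchGetItem `RequestItems` payload - bearing in mind BatchGetItem takes at most 100 keys per request, so 1000 lookups still need chunking.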
r
Great info guys, thank-you
t
What is this aws slack you guys are referencing 👀
r
DM'ed you an invite. It's the one set up by Corey Quinn, the Last Week in AWS guy, founder of Duckbill Group
Here's a trace example - all of these are GetItems, first 4 are quick, then it jumps to 2s and increases up to 5s.
f
Are the 1000 GetItems calls fired at the same time?
r
Yes, although here there was only 30
f
Yeah.. I also see something like this on our side too i think
we never make more than 5 at the same time, so it's not as bad as 2s
r
interesting, and AWS shouts about consistent single-digit ms read performance!
f
I was thinking this might be network-related? but I never looked into it
r
That was where my thinking was going, we have functions in a VPC and a DDB endpoint in play
f
are you consistently getting this issue?
r
yeah
Well, more accurately I'd say we are regularly getting this issue but it's not consistent
f
Maybe try:
• writing a test Lambda function that just makes 30 concurrent DDB calls (in a VPC with a DDB endpoint); if reproducible then try
• writing a test Lambda function that just makes 30 concurrent DDB calls (without the VPC and DDB endpoint); if reproducible then try
• GetItem on a non-existent key so the response payload is tiny; if reproducible then try
• setting the AWS SDK DynamoDB client retry count to 0; if reproducible then try
• setting memory to 10 GB; …
lol until u get to the bottom
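The "30 concurrent calls" repro step above could be sketched like this - `getItem` here is a stand-in stub so the harness is self-contained; in the real test it would be the actual DynamoDB GetItem call:

```javascript
// Sketch: fire n calls concurrently and record per-call latency in ms,
// to see whether later calls queue up behind earlier ones.
async function timeConcurrent(n, getItem) {
  return Promise.all(
    Array.from({ length: n }, async (_, i) => {
      const start = Date.now();
      await getItem(i);
      return Date.now() - start;
    })
  );
}

// Example with a stubbed 10ms "call" in place of a real GetItem:
// timeConcurrent(30, () => new Promise((r) => setTimeout(r, 10)))
//   .then((timings) => console.log(Math.min(...timings), Math.max(...timings)));
```

If the max latency climbs with `n` while the min stays flat (like the trace where calls step up from fast to 2s to 5s), that points at queueing on the client side rather than DynamoDB itself.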
r
😄 A good list of steps
we've just specified the max retry and timeout options and now it's not finding the DDB endpoint!
f
hmmm… that smells fishy 🕵️‍♂️
r
Actually, it's not finding it for 1 in 30 GetItems
t
have you tried tweaking the maxSockets param and seeing if that changes anything? https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-maxsockets.html
what happens if you set it to like 500 or something?
r
No, but worth a try, thank you
t
good luck
Did this help?
r
I'm not sure yet, I'll definitely report back though when I am
t
@Ross Coundon how did it go in the end?
r
Unfortunately I haven't been able to properly test, as the project where it was appearing is currently on hiatus due to some upstream problems