# random
s
I have a very small data set (~2MB total across ~100 items) in DynamoDB with a service that is fetching the data at fairly high velocity. That data changes very infrequently once it's put into DynamoDB. Since it changes so rarely, I'm wondering if another strategy is warranted (e.g. a cache, storing the data in S3, something else)?
t
Could even consider bundling it with your function
S3 is a good choice as well
s
Yeah, that was my first inclination. Just read the damn thing into memory and be done with it
t
Functions should cache it in all situations if changes are infrequent. But eviction can be tricky
Bundling it in the function solves that issue since it'll be part of a deploy
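A minimal sketch of that, assuming the items get exported to a `data.json` that ships in the deployment package (file name is made up):
```python
import json
import os

# Loaded once per container at import time. The file ships with the deploy,
# so "eviction" is just the next deployment. data.json is a hypothetical
# export of the ~100 items.
_DATA_PATH = os.path.join(os.path.dirname(__file__), "data.json")

with open(_DATA_PATH) as f:
    ITEMS = json.load(f)

def handler(event, context):
    # No DynamoDB call on the hot path; everything is already in memory.
    return {"count": len(ITEMS)}
```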
s
yeah, the team that is building this came to me with questions about using DAX, which seems wayyy overkill for this use case
t
Yeah definitely overkill
a
Not sure if your main concern is about hitting Dynamo read throughput, but another option (probably way cheaper than DAX) is to use partition key sharding: https://www.dynamodbguide.com/leaderboard-write-sharding/ Quite simple to implement and can give you "unlimited" throughput with enough shards. Writes become slightly more complex, but reads just read a random shard.
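Roughly like this (table name, key schema, and shard count are all made up):
```python
import random

import boto3

SHARDS = 10  # more shards = more read throughput
table = boto3.resource("dynamodb").Table("my-config-table")

def put_config(item_id, attrs):
    # Writes fan out: every shard gets a full copy of the item
    # (the "slightly more complex" part).
    for shard in range(SHARDS):
        table.put_item(Item={"pk": f"{item_id}#{shard}", **attrs})

def get_config(item_id):
    # Reads pick one shard at random, spreading load across partitions.
    shard = random.randrange(SHARDS)
    return table.get_item(Key={"pk": f"{item_id}#{shard}"}).get("Item")
```
Since the data set is tiny, duplicating every item across shards costs basically nothing.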
s
@Akos You make a really good point re: hitting Dynamo vs. read throughput. I asked the team the very same question. It seems like they want to avoid hitting DDB so frequently for data that rarely changes. Also, I love the write sharding technique 🙂
a
I mean, short of embedding the data into the lambda as Dax recommended, you need to load the data from somewhere. It then just becomes a question of how quickly you need the data and at what cost. Was curious about the S3 vs DynamoDB costs: S3 is cheaper than DynamoDB when simulating 100 reads per second and 1 write per hour. AFAIK DynamoDB has lower latency, but I haven't tested that yet.
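The S3 read side is about as simple as it gets (bucket and key names are hypothetical):
```python
import json

import boto3

s3 = boto3.client("s3")

def load_data():
    # One GET pulls the whole ~2MB data set in a single object.
    resp = s3.get_object(Bucket="my-config-bucket", Key="data.json")
    return json.loads(resp["Body"].read())
```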
s
I'm pushing for more data about the usage. It's not clear to me that we need to do anything at all. Perhaps we don't need to do much more than caching the response to the DDB request outside the handler, but local to the lambda. The data would be "cached" for the duration of the warmed lambda, which might be enough 🤷
I don't like implementing caching before I have evidence there is a reason to do so
You make a good point: that data needs to come from somewhere
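Something like this is what I have in mind, just a sketch with a made-up table name and an optional TTL:
```python
import time

import boto3

table = boto3.resource("dynamodb").Table("my-config-table")

_cache = None
_loaded_at = 0.0
TTL_SECONDS = 300  # optional: re-fetch at most every 5 min per container

def _scan_all():
    # Scan pages are capped at 1MB, so ~2MB of items needs pagination.
    items, resp = [], table.scan()
    items.extend(resp["Items"])
    while "LastEvaluatedKey" in resp:
        resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])
        items.extend(resp["Items"])
    return items

def get_items():
    # Module scope survives across invocations of a warm Lambda, so this
    # hits DynamoDB once per container (or once per TTL window).
    global _cache, _loaded_at
    if _cache is None or time.time() - _loaded_at > TTL_SECONDS:
        _cache = _scan_all()
        _loaded_at = time.time()
    return _cache

def handler(event, context):
    return {"count": len(get_items())}
```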