# random
s
I have a very small data set (~2MB total across ~100 items) in DynamoDB with a service that is fetching the data at fairly high velocity. That data changes very infrequently once it's put into DynamoDB. Since it changes so rarely, I'm wondering if another strategy is warranted (e.g. a cache, storing the data in S3, something else)?
t
Could even consider bundling it with your function
S3 is a good choice as well
s
Yeah, that was my first inclination. Just read the damn thing into memory and be done with it
t
Functions should cache it in all situations if changes are infrequent. But eviction can be tricky
Bundling it in the function solves that issue since it'll be part of a deploy
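A minimal sketch of that, assuming the items get exported to a `data.json` that ships in the deployment package (file name is made up):
```python
import json
import os

# Loaded once per container at import time. The file ships with the deploy,
# so "eviction" is just the next deployment. data.json is a hypothetical
# export of the ~100 items.
_DATA_PATH = os.path.join(os.path.dirname(__file__), "data.json")

with open(_DATA_PATH) as f:
    ITEMS = json.load(f)

def handler(event, context):
    # No DynamoDB call on the hot path; everything is already in memory.
    return {"count": len(ITEMS)}
```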
s
yeah, the team that is building this came to me with questions about using DAX, which seems wayyy overkill for this use case
t
Yeah definitely overkill
a
Not sure if your main concern is about hitting Dynamo read throughput, but another option (probably way cheaper than DAX) is to use partition key sharding: https://www.dynamodbguide.com/leaderboard-write-sharding/ Quite simple to implement and can give you "unlimited" throughput with enough shards. Writes become slightly more complex, but reads just read a random shard.
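Roughly like this (table name, key schema, and shard count are all made up):
```python
import random

import boto3

SHARDS = 10  # more shards = more read throughput
table = boto3.resource("dynamodb").Table("my-config-table")

def put_config(item_id, attrs):
    # Writes fan out: every shard gets a full copy of the item
    # (the "slightly more complex" part).
    for shard in range(SHARDS):
        table.put_item(Item={"pk": f"{item_id}#{shard}", **attrs})

def get_config(item_id):
    # Reads pick one shard at random, spreading load across partitions.
    shard = random.randrange(SHARDS)
    return table.get_item(Key={"pk": f"{item_id}#{shard}"}).get("Item")
```
Since the data set is tiny, duplicating every item across shards costs basically nothing.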
s
@Akos You make a really good point re: hitting Dynamo vs. read throughput. I asked the team the very same question. It seems like they want to avoid hitting DDB so frequently for data that rarely changes. Also, I love the write sharding technique 🙂
a
I mean, short of embedding the data into the lambda as Dax recommended, you need to load the data from somewhere. It then just becomes a question of how quickly you need the data and at what cost. Was curious about the S3 vs DynamoDB costs: S3 is cheaper than DynamoDB when simulating 100 reads per second and 1 write per hour. AFAIK DynamoDB has lower latency, but I haven't tested that yet.
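The S3 read side is about as simple as it gets (bucket and key names are hypothetical):
```python
import json

import boto3

s3 = boto3.client("s3")

def load_data():
    # One GET pulls the whole ~2MB data set in a single object.
    resp = s3.get_object(Bucket="my-config-bucket", Key="data.json")
    return json.loads(resp["Body"].read())
```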
s
I'm pushing for more data about the usage. It's not clear to me that we need to do anything at all. Perhaps we don't need to do much more than caching the response to the DDB request outside the handler, but local to the lambda. The data would be "cached" for the duration of the warmed lambda, which might be enough 🤷
I don't like implementing caching before I have evidence there is a reason to do so
You make a good point: that data needs to come from somewhere
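Something like this is what I have in mind, just a sketch with a made-up table name and an optional TTL:
```python
import time

import boto3

table = boto3.resource("dynamodb").Table("my-config-table")

_cache = None
_loaded_at = 0.0
TTL_SECONDS = 300  # optional: re-fetch at most every 5 min per container

def _scan_all():
    # Scan pages are capped at 1MB, so ~2MB of items needs pagination.
    items, resp = [], table.scan()
    items.extend(resp["Items"])
    while "LastEvaluatedKey" in resp:
        resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])
        items.extend(resp["Items"])
    return items

def get_items():
    # Module scope survives across invocations of a warm Lambda, so this
    # hits DynamoDB once per container (or once per TTL window).
    global _cache, _loaded_at
    if _cache is None or time.time() - _loaded_at > TTL_SECONDS:
        _cache = _scan_all()
        _loaded_at = time.time()
    return _cache

def handler(event, context):
    return {"count": len(get_items())}
```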