# help
p
Has anybody written a way for developers to easily create sample data? And to easily destroy and recreate it? How do you do it?
g
Could you explain your usage scenario in more detail? Meanwhile, you could take a look at sst.Script.
p
Sure. I'm writing an application from scratch, and to be able to write the API and the frontend I need sample data. My app is a directory of questions (https://www.conversation.guru), so as a developer, when I create a dev stage, I need the database to have a few hundred questions for the app to behave realistically.
But pretty much every time I've written an application, I've written a script that creates what I call "sample data", so that any developer can run it, get a variety of data covering lots of different conditions, and be up and running quickly.
At some point, for this particular app, I might want to take snapshots from prod and use that, maybe.
Right now prod doesn't exist.
I am not sure how Script would help me exactly.
Right now I have the code written as a Lambda function that I can call from an API, but DynamoDB doesn't offer a good way of clearing the database.
So it looks like my current process for refreshing my sample data, something I'm doing frequently right now, would be: 1. Stop SST. 2. Delete the DynamoDB table. 3. Wait... wait... wait... 4. Delete the Storage stack. 5. Wait... wait... wait... 6. Run SST. 7. Call the API to generate sample data. Pre-SST, working with a local PostgreSQL, I would just truncate it and it would be very fast.
g
Through sst.Script you could load data at deploy time. Honestly I didn't quite understand what you are trying to build, whether it's a setup for integration testing or a sandbox environment for your application. Anyway, taking a snapshot of your production data is something I wouldn't suggest in any scenario; stages, for me, must be isolated from each other. If you need to load/clean data fast in DynamoDB, you could save some data in JSON files (stored in your codebase, or in S3 if the data is heavy) and use BatchWrite to write and to delete. BatchWrite is fast; you can write/delete up to 25 items in one request.
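A minimal sketch of what I mean (assuming AWS SDK v3; the questions.json file, the TABLE_NAME env var, and the "id" key are made up):

```ts
// Minimal sketch: seed and clear a DynamoDB table with BatchWrite.
// Assumes AWS SDK v3; questions.json, TABLE_NAME, and the "id" key
// are hypothetical. Retrying UnprocessedItems is omitted for brevity.
import { readFileSync } from "fs";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchWriteCommand } from "@aws-sdk/lib-dynamodb";

const db = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = process.env.TABLE_NAME!;

const questions: { id: string }[] = JSON.parse(
  readFileSync("./questions.json", "utf8")
);

// BatchWriteItem accepts at most 25 put/delete requests per call.
const chunks = <T>(arr: T[], size = 25): T[][] =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
    arr.slice(i * size, (i + 1) * size)
  );

export async function seed() {
  for (const batch of chunks(questions)) {
    await db.send(new BatchWriteCommand({
      RequestItems: { [TABLE]: batch.map((Item) => ({ PutRequest: { Item } })) },
    }));
  }
}

export async function clear() {
  for (const batch of chunks(questions)) {
    await db.send(new BatchWriteCommand({
      RequestItems: {
        [TABLE]: batch.map(({ id }) => ({ DeleteRequest: { Key: { id } } })),
      },
    }));
  }
}
```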
p
I had around 300 PutItems and the 10-second timeout on Dynamo wasn't enough. That's somewhat worrying. Well... I think that was the problem; I'm not 100% sure, since I got no error message.
If the sample data is loaded at deploy time, how do you trigger a deployment of a stack that hasn't changed?
g
sst.Script might not be a solution for your specific problem. If you want to load/clean the dataset on demand, you can declare a dedicated sst.Function for it. Making 300 individual put requests is too time-expensive; you could load or delete in 12 requests using batch operations.
I use sst.Script to load a dataset used for integration testing, or to load assets to an S3 bucket.
p
Yeah, right now they are not batch operations, so that they can use the same "create" function the app itself uses, for consistency. I don't mind it taking long, or running it from my local machine. I just can't find the right building blocks here.
j
@Pablo Fernandez so is the problem the DynamoDB timeouts?
p
@Jay I didn't want to point to a particular technical problem, but the general problem of sample data generation for development.
Generally, for example, I have a function that creates a user. The tests for users use it, and the sample data generation also uses it to create sample users, so the code is shared. Sometimes it's the same function the system calls when a user is created. That's normally how I design things, but I can be flexible.
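For instance, a minimal sketch of that pattern (the createQuestion name and shape are hypothetical, not from my actual codebase):

```ts
// Hypothetical sketch of the shared-"create" pattern: the API handler,
// the tests, and the sample-data generator all call the same function.
import { randomUUID } from "crypto";

export interface Question {
  id: string;
  title: string;
  body: string;
}

// The one place question creation happens (persistence omitted here).
export async function createQuestion(
  input: Omit<Question, "id">
): Promise<Question> {
  const question: Question = { id: randomUUID(), ...input };
  // e.g. a DynamoDB PutItem would go here
  return question;
}

// The sample-data script is just a loop over fixtures.
export async function seedQuestions(fixtures: Omit<Question, "id">[]) {
  for (const fixture of fixtures) {
    await createQuestion(fixture);
  }
}
```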
The timeouts I was referring to were Lambda timeouts, not DynamoDB timeouts. And I'm just guessing, since I didn't see any timeout errors. I just noticed the Lambda function didn't finish: it ran for 10.4s, and when I changed the timeout to 20s it ran for 20.4s, and when I changed it to 30s it ran for 30.4s. It's a bit worrying that DynamoDB seems to be so slow, but one thing at a time.
j
@Frank any thoughts on this?
g
What is the average size of a single PutItem?
f
@Pablo Fernandez Yeah a couple of thoughts: 1. Are you using the `Script` construct? The `onCreate` function has a 900s timeout. I wonder if you still see the timeout. 2. If you have a lot of data to seed the DynamoDB table, you can have the `onCreate` function spawn multiple Lambda functions, each seeding a chunk. 3. If you see the X-Ray id in your Lambda log, you can take a look at the X-Ray trace and see which `aws-sdk` call is taking long.
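A rough sketch of the fan-out in 2 (assuming AWS SDK v3; SEEDER_FUNCTION_NAME and the chunk size are made up):

```ts
// Sketch: a parent function fans the dataset out to worker Lambdas,
// each invoked asynchronously with one chunk.
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

export async function fanOutSeed(items: unknown[], chunkSize = 100) {
  for (let i = 0; i < items.length; i += chunkSize) {
    await lambda.send(new InvokeCommand({
      FunctionName: process.env.SEEDER_FUNCTION_NAME!,
      InvocationType: "Event", // async: don't wait for the worker
      Payload: Buffer.from(JSON.stringify({ items: items.slice(i, i + chunkSize) })),
    }));
  }
}
```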
p
@gio each PutItem is tiny, less than 1k.
@Frank 1) no, I'm not using Script; it's a REST API that I can call to regenerate the data. 2) The onCreate function? Can you give me a bit more context? onCreate where? 3) I'll look into X-Ray.
f
Ah I see. Here's what I'd do: 1. First look into what's causing the timeout, and fix any issues there. 2. Then don't place the Lambda behind a REST API, because the API has a maximum timeout of 30s. You can just leave it as a standalone function, ie. `new sst.Function(…)`, set the timeout to 300s, and trigger it through the SST Console. 3. If you want to automatically run this function on deploying to a new stage, you can use the `sst.Script` construct and hook this function up to `onCreate`.
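Roughly like this, a sketch assuming the @serverless-stack/resources stack API; the handler path is hypothetical:

```ts
// Stack sketch: run the seed function once when the stage is first
// deployed, via sst.Script's onCreate hook.
import * as sst from "@serverless-stack/resources";

export class SampleDataStack extends sst.Stack {
  constructor(scope: sst.App, id: string, props?: sst.StackProps) {
    super(scope, id, props);

    new sst.Script(this, "SeedScript", {
      onCreate: {
        handler: "src/seed.handler",
        timeout: 300, // seconds; well under Script's 900s ceiling
      },
    });
  }
}
```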
p
1. I'm not sure what you mean by causing the timeout. Do you mean what's causing this to take so long? That I don't know. The timeout is just the standard sst.Api timeout. 2. Ok. 3. Ah, I see.
g
1. It's not clear to me whether you're talking about latency or a timeout. If it's a timeout, you're already inside the Lambda, so what I'd like to understand is which task (the DynamoDB puts?) is taking so much time.
2. If you don't care about this operation being synchronous, you can create an sst.Function as Frank suggested and trigger it from the API in "Event" mode; that way the API returns fast while a Lambda (with a 300s timeout) processes in the background.
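For example, a sketch of that "Event" invocation from inside an API handler (assuming AWS SDK v3; SEED_FUNCTION is a made-up env var):

```ts
// API handler sketch: kick off the long-running seed function
// asynchronously and return 202 right away, so the ~30s API limit
// doesn't apply.
import { APIGatewayProxyHandlerV2 } from "aws-lambda";
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

export const handler: APIGatewayProxyHandlerV2 = async () => {
  await lambda.send(new InvokeCommand({
    FunctionName: process.env.SEED_FUNCTION!,
    InvocationType: "Event", // fire-and-forget
  }));
  return { statusCode: 202, body: "Sample data generation started" };
};
```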
p
@gio the HTTP request ends because of the sst.Api timeout.