# help
o
So what are folks doing for data migrations (e.g. schema changes) - esp if you’re running multiple DBs? Avoid them by writing backward compatible code? Write migration scripts? I ended up using the same tooling as the rest of the app (Lambda, SQS, common libs etc) to build a stack that executes the migration. The process got kicked off via the AWS Lambda console, and once it finished I removed the associated resources from AWS.
t
I did a prototype with Prisma + Postgres where I created a custom resource that executed migrations
// Custom resource handler: runs the Prisma CLI (shipped in a Lambda layer) to
// apply pending migrations on stack create/update; deletes are a no-op.
import { execSync } from "child_process"
import type { CloudFormationCustomResourceHandler } from "aws-lambda"
import { Config } from "./config" // app-specific config helper (path assumed)

export const trigger: CloudFormationCustomResourceHandler = async (
  event,
  _ctx
) => {
  if (event.RequestType === "Delete") return
  const result = execSync(
    `${process.execPath} /opt/nodejs/node_modules/prisma/build/index.js migrate deploy --preview-feature`,
    {
      env: {
        DATABASE_URL: Config.postgres.url(),
      },
    }
  )
  console.log(result.toString())
}
o
Ah interesting, don’t think I’ve used an ORM/framework with built-in migrations since my CakePHP days in high school lol. Seems like this is an area where tooling is sorely lacking for serverless apps and serverless-friendly DBs.
Don’t think I’ve seen much tooling for my use case: I needed to change the format of an indexed attribute in dynamo, and the value came from somewhere else in dynamo. So essentially I needed to do a
foreach tenant
  foreach sub-object
    foreach sub-sub-object
      query for value
      save value
where each foreach is a paginated dynamo query that could be pretty large
So I wrote a few lambdas stitched together with SQS so that the result of each foreach would go into the queue and fan out each step
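Roughly, each stage looked something like this (just a sketch, the table/queue names and key shapes here are made up rather than my actual code): a Lambda consumes a tenant message from SQS, pages through that tenant’s sub-objects in Dynamo, and enqueues one message per sub-object onto the next stage’s queue.
import type { SQSHandler } from "aws-lambda"
import { DynamoDBClient } from "@aws-sdk/client-dynamodb"
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb"
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs"

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}))
const sqs = new SQSClient({})

export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    const { tenantId } = JSON.parse(record.body)
    let cursor: Record<string, any> | undefined
    do {
      // One page of the tenant's sub-objects (paginated query)
      const page = await ddb.send(
        new QueryCommand({
          TableName: process.env.TABLE_NAME,
          KeyConditionExpression: "pk = :pk",
          ExpressionAttributeValues: { ":pk": `TENANT#${tenantId}` },
          ExclusiveStartKey: cursor,
        })
      )
      // Fan out: one message per sub-object for the next stage to process
      for (const item of page.Items ?? []) {
        await sqs.send(
          new SendMessageCommand({
            QueueUrl: process.env.NEXT_QUEUE_URL,
            MessageBody: JSON.stringify({ tenantId, subObjectId: item.sk }),
          })
        )
      }
      cursor = page.LastEvaluatedKey
    } while (cursor)
  }
}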
t
That makes sense, I've been thinking about something similar for dynamo
Definitely room for some tooling when dealing with full scan migrations
f
Curious.. are you guys thinking about rollbacks as well?
t
I liked the model of doing it via a custom resource because none of my new code gets deployed until the migration has executed. That way my function doesn't need to be backwards compatible, only the schema change does
No, I don't mind not having rollbacks because data migrations should be backwards compatible for the most part
a
I am using MongoDB and currently running the migrations manually using mongo-migrate-ts. I definitely want to plug it into my deployments via SST and Seed but currently I’ve no idea how.
o
Haven’t cared about rollbacks. I make sure the migration is idempotent, so that if it hits an issue I can re-run it after handling that issue
f
@Omi Chowdhury @thdxr I’m trying to put something together to help solve data migrations. Curious why you chose to run the migration inside a Lambda. Why not directly from your local machine?
t
When deploying to production it seemed more appropriate for everything to run without depending on anyone's local machine. If you're using Postgres and it's in a VPC, your CI environment might not even have access to talk to it
f
The reason I’m asking is I’m trying to decide if SST should create a custom resource to run the migration vs running it directly before or after sst deploy (kind of like adding a before-deploy/after-deploy hook).
t
I also wanted to make it so no new code is deployed if migrations fail. That way I can still write code assuming the data is in its new form and not have to do two deployments, migration first and then code
f
hmm.. VPC is a really good point!
especially when the DB is in an isolated subnet..
o
Yeah, on my last project I had a migration script that ran locally (technically in a VM so that my prod AWS keys were separated from dev). That worked fine for that project because the amount of data was quite low, so migrations didn’t take very long
New project has a couple of orders of magnitude more data, so being able to run things in parallel was key
Agree there’s not much difference between running locally and in a single lambda - I wanted to fan out multiple times and parallelise things
f
I see. Thanks for the details @Omi Chowdhury. If SST had something like:
new sst.Run(this, "DBMigration", {
  script: "src/migrations/script.main",
});
It basically creates a Lambda function with the script code and a CloudFormation custom resource that runs the Lambda function on CloudFormation deployment.
Does it help you run your data migrations? Or are you looking for ways to simplify the fanout?
o
It would run it, but tbh I’m more comfortable running it from the Lambda console, so that I can run it multiple times. I usually run it twice and expect the second run to report that the migration found no data that needed to be migrated
f
I guess you can code that into the script, i.e. it does a scan and updates each item, and then does another scan to verify 0 results are found.
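Something roughly like this (a sketch only; the table, key, and attribute names are all made up): pass 1 scans for items still in the old format and rewrites them, pass 2 repeats the scan and expects zero matches, which also makes re-runs safe.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb"
import {
  DynamoDBDocumentClient,
  ScanCommand,
  UpdateCommand,
} from "@aws-sdk/lib-dynamodb"

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}))
const TABLE = process.env.TABLE_NAME

// Placeholder for the actual format change of the indexed attribute
function reformat(value: any): string {
  return String(value)
}

// Paginated scan for items that haven't been migrated yet
// ("formatVersion" marking the new format is an invented convention)
async function* oldFormatItems() {
  let cursor: Record<string, any> | undefined
  do {
    const page = await ddb.send(
      new ScanCommand({
        TableName: TABLE,
        FilterExpression: "attribute_not_exists(formatVersion)",
        ExclusiveStartKey: cursor,
      })
    )
    yield* page.Items ?? []
    cursor = page.LastEvaluatedKey
  } while (cursor)
}

export async function main() {
  // Pass 1: migrate every item still in the old format. Re-running is
  // idempotent because already-migrated items no longer match the filter.
  for await (const item of oldFormatItems()) {
    await ddb.send(
      new UpdateCommand({
        TableName: TABLE,
        Key: { pk: item.pk, sk: item.sk },
        UpdateExpression: "SET indexedAttr = :v, formatVersion = :ver",
        ExpressionAttributeValues: { ":v": reformat(item.indexedAttr), ":ver": 2 },
      })
    )
  }

  // Pass 2: verify nothing is left in the old format
  let remaining = 0
  for await (const _ of oldFormatItems()) remaining++
  if (remaining > 0) throw new Error(`${remaining} items still in old format`)
}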
o
The problems I’d be looking to have solved are: tracking which migrations have been run and when, what the results were, being able to fan out work, but also tracking how those tasks are doing. A typical example would be: I want to run this migration on every tenant. Oh wait, 3% of tenants failed to finish the migration, because they’re super old and have a different format to their data than expected. I write an updated migration, deploy it and run the migration again. It only runs on the 3% that failed, identifying that the migration isn’t needed on the other tenants. Would be amazing if the migration script could run in read-only mode and identify and store where there would be failures in a dry run (probably through failed assertions, a read-only flag and some way to return data and store that)
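Roughly the kind of records I’d want such tooling to keep (a hypothetical shape, not something I’ve built):
// One record per migration run
interface MigrationRun {
  migrationId: string // e.g. "reformat-indexed-attr"
  startedAt: string
  finishedAt?: string
  dryRun: boolean // read-only pass that only records what would fail
  status: "running" | "succeeded" | "partially-failed"
}

// One record per tenant per migration, so a re-run can target only failures
interface TenantMigrationStatus {
  migrationId: string
  tenantId: string
  status: "pending" | "not-needed" | "succeeded" | "failed"
  error?: string // e.g. the failed assertion explaining why this tenant needs a fixed migration
}
On a re-run, only tenants whose last status is "failed" (or "pending") would get work queued; everyone else is already recorded as migrated or as not needing it.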
Right a lot of it is in the script, with a migration lib that helps implement some of the common stuff
Whether that’s a job for SST 🤷🏾‍♂️
f
tbh I have the exact same issue as what you are describing 🤣
What you described isn’t “best practice”, as you shouldn’t trigger one-off Lambdas in your production environment like that. But I’m sure a ton of ppl are doing that.
Let me open an issue with what you described. Something we definitely want to solve at some point.
o
Yeah, manually triggering Lambdas is defo a bit suspect. I guess the best option would be a migrations service that manages them, with its own backing storage and external API
On the flip side, doing it that way means I can bring all the tooling I use for writing good code in the rest of my app to migrations