# help
o
So what are folks doing for data migrations (e.g. schema changes) - esp if you’re running multiple DBs? Avoid them by writing backward compatible code? Write migration scripts? I ended up using the same tooling as the rest of the app (Lambda, SQS, common libs etc) to build a stack that executes the migration. The process got kicked off via the AWS Lambda console, and once it finished I removed the associated resources from AWS.
t
I did a prototype with Prisma + Postgres where I created a custom resource that executed migrations
// Custom resource handler: runs the Prisma CLI (shipped in a Lambda layer) to
// apply pending migrations on stack create/update; deletes are a no-op.
import { execSync } from "child_process"
import type { CloudFormationCustomResourceHandler } from "aws-lambda"
import { Config } from "./config" // app-specific config helper (path assumed)

export const trigger: CloudFormationCustomResourceHandler = async (
  event,
  _ctx
) => {
  if (event.RequestType === "Delete") return
  const result = execSync(
    `${process.execPath} /opt/nodejs/node_modules/prisma/build/index.js migrate deploy --preview-feature`,
    {
      env: {
        DATABASE_URL: Config.postgres.url(),
      },
    }
  )
  console.log(result.toString())
}
o
Ah interesting, don’t think I’ve used an ORM/framework with built-in migrations since my CakePHP days in high school lol. Seems like this is an area where tooling is sorely lacking for serverless apps and serverless-friendly DBs.
Don’t think I’ve seen much tooling for my use case: I needed to change the format of an indexed attribute in dynamo, and the value came from somewhere else in dynamo. So essentially I needed to do a
foreach tenant
  foreach sub-object
    foreach sub-sub-object
      query for value
      save value
where each foreach is a paginated dynamo query that could be pretty large
So I wrote a few lambdas stitched together with SQS so that the result of each foreach would go into the queue and fan out each step
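Roughly, each stage looked something like this (just a sketch, the table/queue names and key shapes here are made up rather than my actual code): a Lambda consumes a tenant message from SQS, pages through that tenant’s sub-objects in Dynamo, and enqueues one message per sub-object onto the next stage’s queue.
import type { SQSHandler } from "aws-lambda"
import { DynamoDBClient } from "@aws-sdk/client-dynamodb"
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb"
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs"

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}))
const sqs = new SQSClient({})

export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    const { tenantId } = JSON.parse(record.body)
    let cursor: Record<string, any> | undefined
    do {
      // One page of the tenant's sub-objects (paginated query)
      const page = await ddb.send(
        new QueryCommand({
          TableName: process.env.TABLE_NAME,
          KeyConditionExpression: "pk = :pk",
          ExpressionAttributeValues: { ":pk": `TENANT#${tenantId}` },
          ExclusiveStartKey: cursor,
        })
      )
      // Fan out: one message per sub-object for the next stage to process
      for (const item of page.Items ?? []) {
        await sqs.send(
          new SendMessageCommand({
            QueueUrl: process.env.NEXT_QUEUE_URL,
            MessageBody: JSON.stringify({ tenantId, subObjectId: item.sk }),
          })
        )
      }
      cursor = page.LastEvaluatedKey
    } while (cursor)
  }
}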
t
That makes sense, I've been thinking about something similar for dynamo
Definitely room for some tooling when dealing with full scan migrations
f
Curious.. are you guys thinking about rollbacks as well?
t
I liked the model of doing it via a custom resource because none of my new code gets deployed until the migration has executed. That way my function doesn't need to be backwards compatible, only the schema change does
No, I don't mind not having rollbacks because data migrations should be backwards compatible for the most part
a
I am using MongoDB and currently running the migrations manually using mongo-migrate-ts. I definitely want to plug it into my deployments via SST and Seed but currently I’ve no idea how.
o
Haven’t cared about rollbacks. I make sure the migration is idempotent, so that if it hits an issue I can re-run it after handling that issue
f
@Omi Chowdhury @thdxr I’m trying to put something together to help solve data migrations. Curious why you chose to run the migration inside a Lambda. Why not directly from your local machine?
t
When deploying to production it seemed more appropriate for everything to run without depending on anyone's local machine. If you're using Postgres and it's in a VPC, your CI environment might not even have access to talk to it
f
The reason I’m asking is I’m trying to decide if SST should create a custom resource to run the migration vs running it directly before or after sst deploy (kind of like adding a before-deploy/after-deploy hook).
t
I also wanted to make it so no new code is deployed if migrations fail. That way I can still write code assuming the data is in its new form and not have to do two deployments, migration first and then code
f
hmm.. VPC is a really good point!
especially when the DB is in an isolated subnet..
o
Yeah, on my last project I had a migration script that ran locally (technically in a VM so that my prod AWS keys were separated from dev). That worked fine for that project because the amount of data was quite low, so migrations didn’t take very long
New project has a couple of orders of magnitude more data, so being able to run things in parallel was key
Agree there’s not much difference between running locally and in a single lambda - I wanted to fan out multiple times and parallelise things
f
I see. Thanks for the details @Omi Chowdhury. If SST had something like:
new sst.Run(this, "DBMigration", {
  script: "src/migrations/script.main",
});
It basically creates a Lambda function with the script code and a CloudFormation custom resource that runs the Lambda function on CloudFormation deployment.
Does it help you run your data migrations? Or are you looking for ways to simplify the fanout?
o
It would run it, but tbh I’m more comfortable running it from the Lambda console, so that I can run it multiple times. I usually run it twice and expect the second run to report that the migration found no data that needed to be migrated
f
I guess you can code that into the script, i.e. it does a scan and updates each item, and then does another scan to verify 0 results are found.
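Something roughly like this (a sketch only; the table, key, and attribute names are all made up): pass 1 scans for items still in the old format and rewrites them, pass 2 repeats the scan and expects zero matches, which also makes re-runs safe.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb"
import {
  DynamoDBDocumentClient,
  ScanCommand,
  UpdateCommand,
} from "@aws-sdk/lib-dynamodb"

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}))
const TABLE = process.env.TABLE_NAME

// Placeholder for the actual format change of the indexed attribute
function reformat(value: any): string {
  return String(value)
}

// Paginated scan for items that haven't been migrated yet
// ("formatVersion" marking the new format is an invented convention)
async function* oldFormatItems() {
  let cursor: Record<string, any> | undefined
  do {
    const page = await ddb.send(
      new ScanCommand({
        TableName: TABLE,
        FilterExpression: "attribute_not_exists(formatVersion)",
        ExclusiveStartKey: cursor,
      })
    )
    yield* page.Items ?? []
    cursor = page.LastEvaluatedKey
  } while (cursor)
}

export async function main() {
  // Pass 1: migrate every item still in the old format. Re-running is
  // idempotent because already-migrated items no longer match the filter.
  for await (const item of oldFormatItems()) {
    await ddb.send(
      new UpdateCommand({
        TableName: TABLE,
        Key: { pk: item.pk, sk: item.sk },
        UpdateExpression: "SET indexedAttr = :v, formatVersion = :ver",
        ExpressionAttributeValues: { ":v": reformat(item.indexedAttr), ":ver": 2 },
      })
    )
  }

  // Pass 2: verify nothing is left in the old format
  let remaining = 0
  for await (const _ of oldFormatItems()) remaining++
  if (remaining > 0) throw new Error(`${remaining} items still in old format`)
}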
o
The problems I’d be looking to have solved are: tracking which migrations have been run and when, what the results were, being able to fan out work, but also tracking how those tasks are doing. A typical example would be: I want to run this migration on every tenant. Oh wait, 3% of tenants failed to finish the migration, because they’re super old and have a different format to their data than expected. I write an updated migration, deploy it and run the migration again. It only runs on the 3% that failed, identifying that the migration isn’t needed on the other tenants. Would be amazing if the migration script could run in read-only mode and identify and store where there would be failures in a dry run (probably through failed assertions, a read-only flag and some way to return data and store that)
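Roughly the kind of records I’d want such tooling to keep (a hypothetical shape, not something I’ve built):
// One record per migration run
interface MigrationRun {
  migrationId: string // e.g. "reformat-indexed-attr"
  startedAt: string
  finishedAt?: string
  dryRun: boolean // read-only pass that only records what would fail
  status: "running" | "succeeded" | "partially-failed"
}

// One record per tenant per migration, so a re-run can target only failures
interface TenantMigrationStatus {
  migrationId: string
  tenantId: string
  status: "pending" | "not-needed" | "succeeded" | "failed"
  error?: string // e.g. the failed assertion explaining why this tenant needs a fixed migration
}
On a re-run, only tenants whose last status is "failed" (or "pending") would get work queued; everyone else is already recorded as migrated or as not needing it.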
Right a lot of it is in the script, with a migration lib that helps implement some of the common stuff
Whether that’s a job for SST 🤷🏾‍♂️
f
tbh I have the exact same issue as what you are describing 🤣
What you described isn’t “best practice”, as you shouldn’t trigger one-off Lambdas in your production environment like that. But I’m sure a ton of ppl are doing that.
Let me open an issue with what you described. Something we definitely want to solve at some point.
o
Yeah, manually triggering Lambdas is defo a bit suspect. I guess the best option would be a migrations service that manages them, with its own backing storage and external API
On the flip side, doing it that way means I can bring all the tooling I use for writing good code in the rest of my app to migrations