# cfml-general
n
I'd be interested to hear what people do for production DB migrations in a containerised environment, i.e. you have an AWS CodePipeline which does a build -> blue/green deployment, and there are migrations that need to be run. As I see it, you could:
• Trigger the migration from the container itself when the application starts: however, that can create a race condition if you have multiple containers.
• As above, but set a database flag that a migration has started, so the other containers don't try to run it (see the sketch after this message).
• As above, but scale the Fargate service back to 1 container before the deployment and then scale back up after the migration has completed.
• Trigger the migration earlier in the build process, e.g. after the build but before deployment: however, that means you need a database connection in your build environment, and you'll have containers running in production against a database that has been migrated before the new application has been deployed.
• Add a new stage before deployment solely to spin up a container, run the DB migrations and exit. There's still a window where the application is running against a migrated database but the new application hasn't been deployed yet.
• Do a blue/green deployment on the database as well, so you can run the migrations on the new database before the application is deployed. I have no idea how you'd do this with RDS, or what would happen to the existing in-use data.
• Just take stuff offline and deal with the downtime.
OR, is the answer simply "ensure your migration is always backwards compatible with the current application code so it doesn't matter when the migration happens"? Is there an avenue I'm missing?
👀 1
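[Editor's note: a minimal sketch of the "database flag" option above, using a Postgres advisory lock so that exactly one container runs migrations at startup. This assumes psycopg2 and Postgres; the lock key and the `run_pending_migrations()` helper are placeholders, not anything from the thread.]

```python
# Sketch: guard startup migrations with a Postgres advisory lock so only one
# container runs them. Assumes psycopg2; run_pending_migrations() is a
# hypothetical helper (a rough version appears later in this thread).
import psycopg2

MIGRATION_LOCK_KEY = 723441  # arbitrary application-wide lock id (assumption)

def migrate_if_leader(dsn):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    try:
        with conn.cursor() as cur:
            # Session-level advisory lock: returns True for exactly one caller.
            cur.execute("SELECT pg_try_advisory_lock(%s)", (MIGRATION_LOCK_KEY,))
            got_lock = cur.fetchone()[0]
            if not got_lock:
                # Another container is already migrating; start the app anyway.
                # Only safe if migrations are backwards compatible with the
                # currently deployed code.
                return False
            try:
                run_pending_migrations(conn)  # hypothetical helper
                return True
            finally:
                cur.execute("SELECT pg_advisory_unlock(%s)", (MIGRATION_LOCK_KEY,))
    finally:
        conn.close()
```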
s
We "ensure [the] migration is always backwards compatible with the current application code" and then "trigger the migration earlier in the build process" effectively. We have an automated deployment for database migrations and we run it to completion before we deploy any application upgrades that depend on the DB changes.
We track migrations in a table, and the migration machinery figures out which migrations are missing and runs them. In dev/test we can start with an empty DB and it'll run all the migrations to bring the DB current. We've been doing that for years (originally in CF, back when a lot of our deployment was somewhat manual, and now all in Clojure with fully-automated deployments). Our QA DB is up to migration 973 right now and production is at 972.
This also ensures that if the DB migration is run more than once, the second and subsequent runs will do nothing until a new migration file is added.
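[Editor's note: a rough sketch of the tracked-migrations approach described above, and of the `run_pending_migrations()` helper hand-waved in the earlier sketch: a table records which numbered migration files have been applied, anything missing is run in order, and re-running is a no-op. Table, column, and directory names are assumptions, not the poster's actual schema.]

```python
# Sketch: apply any migration files not yet recorded in a tracking table,
# in numeric order, so repeated runs do nothing until a new file appears.
import os
import psycopg2

def run_pending_migrations(conn, directory="migrations"):
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS schema_migrations (
                version    integer PRIMARY KEY,
                applied_at timestamptz NOT NULL DEFAULT now()
            )
        """)
        cur.execute("SELECT version FROM schema_migrations")
        applied = {row[0] for row in cur.fetchall()}

        # Files named like 0973_add_index.sql; sort by the numeric prefix.
        pending = sorted(
            (int(name.split("_", 1)[0]), name)
            for name in os.listdir(directory)
            if name.endswith(".sql")
        )
        for version, name in pending:
            if version in applied:
                continue
            with open(os.path.join(directory, name)) as f:
                cur.execute(f.read())
            cur.execute(
                "INSERT INTO schema_migrations (version) VALUES (%s)", (version,)
            )
    conn.commit()
```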
d
It's always "it depends", but if you can, I think the best practice is to do the DB too. It's more complicated with blue/green than just dev/qa/prod, but that's the price you pay for choosing blue/green I guess.
s
For us, the complexity of replication and the sheer volume of data makes blue/green pretty much impossible for our DB -- we have plenty of tables with over 100M rows and some have over 250M rows.
🤘 1
d
RDS is fantastic for this though! You can clone a massive DB super fast now, and there are lots of tools for moving data around. If you're doing rolling stuff you probably have a datalake or whatnot, so you're already dealing with consolidating data slurped in from all over.
RDS is incredibly fast at cloning massive amounts of data and it just keeps getting faster
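[Editor's note: a hedged sketch of the fast-clone idea, assuming an Aurora cluster, since copy-on-write cloning is an Aurora feature; for a plain RDS instance you'd restore from a snapshot instead. Cluster identifiers, engine, and instance class are placeholders.]

```python
# Sketch: copy-on-write clone of a production Aurora cluster via boto3.
# Data isn't physically copied up front, which is why it's fast even for
# very large databases.
import boto3

rds = boto3.client("rds")

rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="prod-clone-for-migration-test",
    SourceDBClusterIdentifier="prod-cluster",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# The clone still needs at least one instance before it can accept connections.
rds.create_db_instance(
    DBInstanceIdentifier="prod-clone-instance-1",
    DBClusterIdentifier="prod-clone-for-migration-test",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",
)

rds.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="prod-clone-instance-1"
)
```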
You have trade-offs for everything, and "messy" data is one you have to eat if you're mixing potentially bad deployments with good, I don't think there's a way around that. If you're at that scale you're going to be doing some data massaging anyways— from special database instances just for reports, to picking what kind of storage for what kind of data. The nice thing about dev/qa/prod (or as many as you need, but separate, including the DB) is you don't have to worry about messy data as much. You run the env, do the tests, have clients do some tests, and then afterwards all that data is ignored as you do the prod deploy. This also lets your clients go nuts in testing stuff, etc., and makes handling tickets far easier. So much is relative to what you have, and what you want to have, it's hard to say what's best, but FWIW where RDS shines is in cloning massive DBs and whatnot— though there is lots of vendor lock-in at that point, you've probably already bought into AWS so… "in for a penny, in for 10 billion dollars unexpectedly on your next bill", as they say 😜
Reporting requirements alone can vastly change what approaches are optimal, or indeed required, and you always have to worry about handling failover so you don't lose data. And while I was joking about it, the cost thing is nothing to sneeze at either, especially with tons of data— how you decide to do stuff is going to vary in cost perhaps in the order of hundreds of thousands to million(s), and easily tens of thousands, and probably for sure thousands, of dollars. You gotta factor all that stuff in when laying things out too.
n
Thanks both - all very interesting. I think the takeaway thus far is to "just write the migrations better" (i.e. more defensively) and make sure the codebase reflects that, and also definitely split the migration layer out as its own stage in the deployment process. We're already doing tracked migrations in the DB etc., so it's just a matter of getting the order right, and being stricter about possible side effects when a migration comes along.
d
More often than not schema syncing is tied to application initialization (with locking and whatnot to prevent race conditions) for good reason, but so much depends on your deployment pipeline, what all is using what, and who does QA on what. We had great luck with [insert the name of the sql migration tool that won a contest for being most useful cfml lib here], which basically does what Sean said above: a table to know what migrations have been run, and a directory full of numbered migrations to run if needed (plus locking to prevent race conditions). This is basically how every tool I've worked with does it, be it Liquibase (the 800lb gorilla) or that CFC I'm drawing a blank on. The "caching" problem is one of "the 2 hard things in programming" because dependency management is difficult. I don't know how you're doing your tests, but testing the migration should be one of them (on a copy of the prod DB) prior to deployment. This is super easy to do on AWS RDS, even with hundreds of gigs of data (it takes like 10 minutes to clone the prod DB and run the tests, even with oodles of records) and thousands of migration files (at one place over half our dev time was spent just doing data migration stuff, but everything was clockwork). Do you also have a dev/QA pipeline, or is it just always rolling deployments?
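[Editor's note: a sketch of the pre-deployment check described above: point the migration runner at the cloned DB and fail the pipeline stage if anything errors. It reuses the `run_pending_migrations()` sketch from earlier; the DSN is a placeholder for wherever the clone's endpoint ends up.]

```python
# Sketch: run the pending migrations against a clone of prod as a CI step;
# a non-zero exit fails the pipeline before anything touches production.
import sys
import psycopg2

CLONE_DSN = "postgresql://ci:secret@prod-clone-endpoint:5432/app"  # placeholder

def test_migrations_against_clone():
    conn = psycopg2.connect(CLONE_DSN)
    try:
        run_pending_migrations(conn)  # from the earlier sketch
        with conn.cursor() as cur:
            cur.execute("SELECT max(version) FROM schema_migrations")
            print("clone migrated up to", cur.fetchone()[0])
    except Exception as exc:
        print("migration failed on clone:", exc, file=sys.stderr)
        sys.exit(1)
    finally:
        conn.close()

if __name__ == "__main__":
    test_migrations_against_clone()
```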
n
We have separate dev -> beta -> prod environments/pipelines, so all the migrations go through at least two iterations before hitting production. However, the dev/beta environments just have single application containers. All the pipelines are triggered by pushing to the relevant branches. I think we either need to a) put additional locking in when a migration is running on a container, or b) just move the migration to a step before deployment, which would ensure only one process is acting on it at any given time.
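[Editor's note: a hedged sketch of option (b) above: run the migrations as a one-off Fargate task in a pipeline stage before the deploy, so exactly one process touches the database. The cluster, task definition, subnets, container name, and command are all placeholders.]

```python
# Sketch: launch a one-off Fargate task that runs migrations, then block the
# pipeline until it stops and fail on a non-zero exit code.
import boto3

ecs = boto3.client("ecs")

resp = ecs.run_task(
    cluster="app-cluster",
    taskDefinition="app-task:42",          # same image as the application
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaa", "subnet-bbb"],
            "securityGroups": ["sg-ccc"],
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {"name": "app", "command": ["run-migrations"]}  # placeholder command
        ]
    },
)
task_arn = resp["tasks"][0]["taskArn"]

ecs.get_waiter("tasks_stopped").wait(cluster="app-cluster", tasks=[task_arn])
task = ecs.describe_tasks(cluster="app-cluster", tasks=[task_arn])["tasks"][0]
exit_code = task["containers"][0].get("exitCode", 1)
if exit_code != 0:
    raise SystemExit(f"migration task failed with exit code {exit_code}")
```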
d
Ah, right on. You could probably also do something with load-balancing, depending on how you're managing that, to control what instance is getting hit when.