# sst
a
If we figure out how to achieve blue/green or canary deployments, we’ll break the serverless and cloud-native world. 😅
f
I’m curious which setup you have in mind. Common patterns I’ve come across:
1. Using Lambda versions
2. Using API Gateway stages (rare)
3. Using Route 53
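For reference, options 1 and 3 both come down to a weighted split between a stable and a canary target. A Lambda alias can shift a percentage of invocations to an additional version, and Route 53 weighted records answer DNS queries in proportion to each record's weight. Below is a minimal, dependency-free model of that selection. `Target`, `fnv1a`, and `pickTarget` are hypothetical names, not SST, CDK, or AWS SDK APIs. One deliberate difference: it hashes a user ID so the same user always lands on the same side. That is closer to what a feature-specific canary or A/B test needs than a purely per-request split.

```typescript
// Hypothetical type for illustration; not part of SST, CDK, or any AWS SDK.
interface Target {
  name: string;   // e.g. "stable" or "canary"
  weight: number; // relative weight, like a Route 53 weighted record
}

// 32-bit FNV-1a hash: deterministic, so a given user ID always maps to the same value.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Pick a target with probability weight / totalWeight, sticky per userId.
function pickTarget(targets: Target[], userId: string): Target {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  if (targets.length === 0 || total <= 0) {
    throw new Error("need at least one target with a positive weight");
  }
  // Scale the hash from [0, 2^32) onto [0, total).
  const point = (fnv1a(userId) / 2 ** 32) * total;
  let cumulative = 0;
  for (const t of targets) {
    cumulative += t.weight;
    if (point < cumulative) return t;
  }
  // Fallback for floating-point rounding at the upper edge.
  return targets[targets.length - 1];
}

// Usage: a 90/10 canary split.
const split: Target[] = [
  { name: "stable", weight: 90 },
  { name: "canary", weight: 10 },
];
console.log(`user-42 -> ${pickTarget(split, "user-42").name}`);
```

Each user keeps a fixed position in `[0, 1)` and lands on stable only while that position is below stable's share of the total weight. With stable listed first, raising the canary's share (for example 90/10, then 50/50, then 0/100) only moves users from stable to canary, never back.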
a
I’d try my best to avoid options 1 and 2. My preferred option would be 3. Here’s my reasoning for avoiding options 1 and 2:
• It bloats the stack definition with additional routes, roles, etc. (opt 1 and opt 2)
• Route auth and other config customisability would be limited (opt 1)
• It’s easier to hit individual API Gateway limits (opt 1 and opt 2)
• It’s still a single point of failure if the region goes down (opt 1 and opt 2)
• You can’t rate-limit both versions separately, because limits are shared per resource (opt 1 and opt 2)
• It would increase deploy time, since versions deploy sequentially rather than in parallel (opt 1 and opt 2)

I understand some of the stack-level limits could be avoided by splitting the stack, but it would still be bogged down by individual resource-level limits such as the number of routes per API, the rate limit per account, etc.

My ideal canary deployment would look as follows:
• Configurable AWS account and region per canary, to avoid account- and resource-level limits
• A check for resources that cannot be created in that AWS region (this would mostly happen at runtime, but a compile-time check would break the www)
• Support for parameterised replication of existing infra, using the account and region for that canary (useful for cross-region and cross-account deployments; this might need to be handled at the SEED level for infra without any replication support)
• Data replication / migration with ETL support. Some services have native support, many do not, so we should consider whether to rely on native support or just build everything from scratch. For ex: DynamoDB global tables replicate data across regions, but if all writes go to a single home region, what happens when that region is unavailable?
• Cross-region load balancing via Route 53, based on health checks, proximity, round-robin, or any other metric
• The ability to import existing constructs and modify them individually for a canary (for ex: modify a route’s timeout or authorizers, remove or add a route, configure queue / bucket consumers, etc.)

My knowledge and experience are too limited to define something as revolutionary as this, so we should rather start a discussion on GitHub, call all our experts to arms, and keep track of it there. I have big dreams lol!

For a narrow, short-term goal, let’s focus on my use cases:
• Serve users from closer to their locations, i.e. use API Gateway in a region close to them, preferably without Edge Functions or CloudFront caching, since those restrict you further. This also keeps me under account-level limits in each region, since I’d use different accounts for different regions.
• Bypass account / region level limits, get access to services that might not be available in a certain region, and communicate with those services across regions via events, etc. For ex: AWS CloudSearch isn’t available in ap-south-1 but is available in ap-southeast-1. This is still doable by creating a separate stack for that region, and that’s what I’m going to do immediately.
• Create a feature-specific canary that routes some % of traffic to it. I have no idea how this can be done, but it would be great for steadily releasing new features, A/B testing, etc.
• Logs and metrics aggregation across all regions / canaries, or for a specific region / canary, etc.
• Rerouting partial or complete traffic away from an AWS region based on specific criteria such as health checks, latency, errors, etc.

I know I can do this by writing a lot of code, conditional logic, parameterised stacks, etc., but much of this is generic, and most major platforms build it from scratch. I do understand that this also depends on how someone decides to architect their application, and it would differ greatly from person to person. I’m targeting the event-sourcing approach, if that helps. A globally available API that is low-latency, highly customisable, and fault-tolerant is the dream here.
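The rerouting by health checks, latency, and errors can be sketched as a small region selector. The same selector also covers the CloudSearch case, where a service exists only in some regions. It roughly mirrors what Route 53 latency records with attached health checks do at DNS-resolution time: unhealthy regions are skipped in favour of the next-lowest-latency healthy one. Because it runs in ordinary code, the criteria can be extended. `RegionEndpoint`, `chooseRegion`, the 5% error threshold, and the example latencies and service lists are all hypothetical. Only the ap-south-1 / ap-southeast-1 CloudSearch split comes from the message above.

```typescript
// Hypothetical shape for illustration only; not an SST, CDK, or AWS SDK type.
interface RegionEndpoint {
  region: string;     // e.g. "ap-south-1"
  healthy: boolean;   // latest health-check result
  latencyMs: number;  // measured latency from the caller's location
  errorRate: number;  // fraction of failed requests in the last window, 0..1
  services: string[]; // services deployed (or available) in this region
}

// Pick the lowest-latency region that is healthy, under the error threshold,
// and offers every required service. Returns undefined if none qualifies.
function chooseRegion(
  endpoints: RegionEndpoint[],
  requiredServices: string[] = [],
  maxErrorRate = 0.05,
): RegionEndpoint | undefined {
  let best: RegionEndpoint | undefined = undefined;
  for (const e of endpoints) {
    const eligible =
      e.healthy &&
      e.errorRate <= maxErrorRate &&
      requiredServices.every((s) => e.services.includes(s));
    if (eligible && (best === undefined || e.latencyMs < best.latencyMs)) {
      best = e;
    }
  }
  return best;
}

// Example data, made up for illustration.
const endpoints: RegionEndpoint[] = [
  { region: "ap-south-1", healthy: true, latencyMs: 20, errorRate: 0.01, services: ["apigateway", "lambda", "dynamodb"] },
  { region: "ap-southeast-1", healthy: true, latencyMs: 60, errorRate: 0.0, services: ["apigateway", "lambda", "dynamodb", "cloudsearch"] },
];

console.log(chooseRegion(endpoints)?.region);                  // "ap-south-1"
console.log(chooseRegion(endpoints, ["cloudsearch"])?.region); // "ap-southeast-1"
```

Keeping the selection pure (data in, region out) makes it easy to unit-test and to feed from whichever health-check and metrics source is available. Where it runs (client, edge, or a routing function) is left open here.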
This sounds crazy lol! 🤣🤯