# help
d
ohhhhh…. just saw this now… https://docs.serverless-stack.com/constructs/NextjsSite Did you guys find a workaround to the SSR lambda@edge that causes stack delete to fail? It’s our main issue with using NextJS for our sites, since we want to be able to fire up and tear down quickly.
Doesn’t look like it. Would love to see something that could be done to `retain` and then use async cleanup to delete orphan functions 2hrs later. Is this something one could use EventBridge or CloudTrail triggered Functions for?
t
I do believe you can receive cloudformation events through the default EventBridge bus
Great use case
f
Hey @Dan Van Brunt, currently no. One thought I had would be to mark the Lambdas to `retain`, and then have custom resources periodically retry until they are successfully removed. The downside being the stack could take hours to remove, but will eventually remove successfully. This is similar to what you suggested. What do you think?
d
I think there is a 30min max to CFN or CRs so I think it’s off the table.
I was thinking of just retaining the lambdas so the stack (minus the replicas + main function) can be deleted normally (fast)…. then just have some kinda orphan function cleanup, completely outside of that stack.
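A minimal sketch of the decision an out-of-stack orphan cleanup job would make after each delete attempt on a retained function. `DeleteError` and `classifyDelete` are hypothetical names for illustration. It is also an assumption that Lambda rejects the delete with `InvalidParameterValueException` and a "replicated function" message while edge replicas still exist; check the error you actually get before relying on this.

```typescript
// Outcome of one delete attempt against a retained Lambda@Edge function.
type Outcome = "done" | "retry" | "fail";

// Hypothetical shape: only the fields this sketch reads from an SDK error.
interface DeleteError {
  name: string;
  message: string;
}

// Decide what the cleanup job should do after a delete attempt.
// `err` is undefined when the delete call succeeded.
function classifyDelete(err?: DeleteError): Outcome {
  if (!err) return "done";
  // The function is already gone, so there is nothing left to clean up.
  if (err.name === "ResourceNotFoundException") return "done";
  // Assumption: replicas still exist, so Lambda refuses the delete for now.
  if (
    err.name === "InvalidParameterValueException" &&
    err.message.includes("replicated function")
  ) {
    return "retry";
  }
  // Anything else (permissions, throttling, ...) needs a human or other handling.
  return "fail";
}
```

The job would call the real delete API, pass any caught error to `classifyDelete`, and reschedule itself only for functions that come back as `"retry"`.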
f
I think you can have async CRs, ie. have a step function to periodically retry, and hits the CFN api to signal the CR is completed.
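For reference, "signal the CR is completed" means sending a JSON body via HTTP PUT to the pre-signed `ResponseURL` that CloudFormation includes in the custom resource request. A rough sketch of building that body, so a Step Function (or anything else) can send it later. `buildCfnResponse` is a hypothetical helper, and falling back to `LogicalResourceId` as the physical ID is a choice made for this sketch only.

```typescript
// Fields of a CloudFormation custom resource request that the reply must echo.
interface CfnRequest {
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  PhysicalResourceId?: string; // present on Update and Delete requests
  ResponseURL: string; // pre-signed URL the reply is PUT to
}

// Body CloudFormation expects back on the ResponseURL.
interface CfnResponse {
  Status: "SUCCESS" | "FAILED";
  Reason?: string; // required when Status is FAILED
  PhysicalResourceId: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
}

// Build the reply from the stored request once the long-running work has finished.
function buildCfnResponse(req: CfnRequest, ok: boolean, reason?: string): CfnResponse {
  const res: CfnResponse = {
    Status: ok ? "SUCCESS" : "FAILED",
    // Sketch-only fallback when the request carried no physical ID.
    PhysicalResourceId: req.PhysicalResourceId ?? req.LogicalResourceId,
    StackId: req.StackId,
    RequestId: req.RequestId,
    LogicalResourceId: req.LogicalResourceId,
  };
  if (!ok) res.Reason = reason ?? "unspecified failure";
  return res;
}
```

The original handler would persist the request (including `ResponseURL`) when it kicks off the retry loop, and whichever step finally succeeds or gives up would PUT `JSON.stringify(buildCfnResponse(...))` to that URL.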
d
Ya, I think you can do that… but I think a single CFN deploy…. has a 30min max
I remember this being the case for either CFN or CRs about a year ago. It wasn’t easy to find in the docs either.
t
^ I think I ran into this as well. I had a broken CR and after 30min aws killed it
f
I see.. I was reading the CDK docs on the different ways to implement CRs https://docs.aws.amazon.com/cdk/api/latest/docs/core-readme.html#custom-resource-providers
And they claim the `custom-resource.Provider` (they create a step function internally) has Unlimited timeout.
Maybe they mean the CR has unlimited timeout, but the entire Stack has 30min timeout?
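With the CDK provider framework, the polling side is an "is complete" handler that returns `{ IsComplete: boolean }`; the framework calls it again later while it returns `false`, and fails the resource if it throws. A sketch of the pure part of such a handler for the retry idea above. `Attempt` and `summarizeAttempts` are hypothetical names, and the actual delete calls are left out.

```typescript
// Result of trying to delete one retained function during a polling pass.
// Hypothetical shape for this sketch.
interface Attempt {
  functionName: string;
  deleted: boolean;
  error?: string; // set only for failures that should not be retried
}

// Shape the provider framework expects back from the is-complete handler.
interface IsCompleteResponse {
  IsComplete: boolean;
}

// Turn one pass of delete attempts into the handler's return value.
function summarizeAttempts(attempts: Attempt[]): IsCompleteResponse {
  const failed = attempts.filter((a) => a.error !== undefined);
  if (failed.length > 0) {
    // Throwing makes the framework report the resource as failed.
    const names = failed.map((a) => a.functionName).join(", ");
    throw new Error(`Could not delete: ${names}`);
  }
  // Complete only when every function is gone; otherwise poll again.
  return { IsComplete: attempts.every((a) => a.deleted) };
}
```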
d
no… I think you might be right… I found `AWS::CloudFormation::WaitCondition` has a max time of 12 hours…. so CFN must not have this limitation.
Not sure where I got that idea then.
t
What happens if the CR returns immediately but kicks off some job, CF wouldn't be aware of that right
The 30min comes from the CR function not returning in 30min and CF marks the stack as failed
d
ah…. so that isn’t a thing now though @thdxr?
f
Ah I see. In this case, the CR would return something right away, like a token.
d
or are you agreeing that the limit is still 30mins on CRs?
t
I think you can do this but just not sure what would own the stepfunction and how it would clean up itself 😵 . Unless we had another stack along with it
d
could you not just trigger something to happen in 2hrs? EventBus, StepFunctions etc?
I was thinking another stack.
t
It might make sense to have a standard SST stack just like how there's a standard CDK stack
d
ya something like that would allow for various cleanup things
I wasn’t yet thinking about SST…. but just that “another stack” could be used to handle the cleanup.
All the better though if there was an optional SST stack that constructs could opt into. Like the NextJS construct would do what it does now without it… and WITH it would retain and handle the deletion.
If this is of interest… I wouldn’t mind lending a hand to set up this cleanup function / function caller.
t
yeah will let frank give his pov - I'm just talking without really knowing anything about the underlying problem 😄
f
Give me 10min to wrap up something and I will share some thoughts.
Documented the 2 solutions above in this issue, with some pros/cons off the top of my head.
Personally, I lean towards Solution 1, the custom resource approach. Mainly b/c the removal logic is self-contained in the stack.
Ideally we want to minimize the amount of “SST resources” deployed into a user’s account. So I think let’s introduce an `SST Toolkit` stack when we absolutely have to 🤔
Open for thoughts (cc’ing @Jay in case he wants to chime in)
d
@Frank isn’t that kinda what those debug stacks are? would it not be another thing that could move to a toolkit stack and reuse a single stack to debug all deployed stacks?
f
Yeah, I think we’ll definitely move towards the toolkit stack architecture (we had some thoughts on deploying resources to help with monitoring/debugging). But as for the Next.js construct, I’m curious why you prefer solution 2 over 1? Is it b/c the CR solution will make the removal process take hours?
d
@Frank oh sorry, I didn’t see the link to what ticket had your thoughts in it?
Just noticed ur comment. I will follow up in the issue