Has anybody seen issues with components of a stack...
# help
p
Has anybody seen issues with components of a stack just randomly getting updated or even deleted without making changes to the stack? I just had my ec2.BastionHostLinux instance terminated randomly with zero changes to the stack. I made a small update to a Lambda function and before I knew it my Bastion Host was gone. I’m not even sure how to get back into a good state as I didn’t change the stack code. Any advice for debugging? For context, this is probably the 20th deploy I’ve done since creating that bastion host.
t
Hm that's bizarre
it's no longer in your cfn template?
p
Yes, it was removed
the security group is still there…
I’m in the process of updating sst to see if that fixes it…
t
I'm not sure if this is from SST as we just rely on CDK and CFN for things like this
Is there any conditional code in your stacks ?
p
this is in the root of the stack:
const bastionHostLinux = new ec2.BastionHostLinux(this, 'BastionHostLinux', {
  vpc: vpc,
  securityGroup: bastionSecurityGroup,
  subnetSelection: {
    subnetType: ec2.SubnetType.PUBLIC,
  },
});
no conditionals
I even have outputs that rely on it:
this.addOutputs({
  BastionHostinstance: bastionHostLinux.instancePublicDnsName,
  ...
});
The outputs actually returned too, but the instance was terminated
Is there any way to get an easy-to-parse changeset from CDK?
they are so verbose
f
Hey @Patrick Gold, just following up on this. Did you manage to find the culprit?
p
Nope. Never found the root cause, but updating cdk and SST brought back the bastion host without changing the stack
f
hmm.. if this ever happens again, here’s how i’d debug:
• Open up `.build/cdk.out` and see if the EC2 instance is still in the template. If it is not, the issue is likely within SST/CDK.
• Then, go to CloudFormation console and find the stack, open up the Resources tab and see if the instance is still in there, and what state it is in. If it’s in `CREATE_COMPLETE` or `UPDATE_COMPLETE` it means CFN didn’t remove it.
• Then, go to CloudTrail and look through recent logs and you should see the IAM user/role that turned off the instance.
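The first step above (checking whether the instance is still in the synthesized template) could be scripted roughly like this. It is a minimal sketch, assuming the template has already been parsed from a JSON file under `.build/cdk.out`; the helper name `findResourcesOfType` and the security group logical ID are made up for illustration.

```typescript
// Minimal shape of a CloudFormation template: only the part this sketch reads.
interface CfnTemplate {
  Resources?: Record<string, { Type: string }>;
}

// Return the logical IDs of every resource of the given CloudFormation type.
// (Hypothetical helper, not part of SST or CDK.)
function findResourcesOfType(template: CfnTemplate, type: string): string[] {
  return Object.entries(template.Resources ?? {})
    .filter(([, resource]) => resource.Type === type)
    .map(([logicalId]) => logicalId);
}

// Inline example; in practice the object would come from JSON.parse of a
// template file found under .build/cdk.out.
const exampleTemplate: CfnTemplate = {
  Resources: {
    BastionHostLinuxD318DEDE: { Type: "AWS::EC2::Instance" },
    ExampleSecurityGroup: { Type: "AWS::EC2::SecurityGroup" }, // hypothetical logical ID
  },
};

console.log(findResourcesOfType(exampleTemplate, "AWS::EC2::Instance"));
```

An empty result would suggest the instance was dropped before CloudFormation ever saw the template, which points back at SST/CDK.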
p
Thanks Frank. Some extra info is that I had my Production bastion host termination protection enabled and when it deployed, it failed to delete. After I updated cdk/sst, it created a brand new instance, but my old bastion host is still running and seemingly not connected to the stack
I have a todo to go and manually remove that and update connection strings etc.
f
I see. Appreciate the details!
p
@Frank of course this happens when I don’t necessarily have time to debug, but a similar thing happened today where most of my resources seemingly required an update but I’m not sure why. (No changes to stack). This also triggered the deletion and recreation of the bastion host again.
UPDATE_IN_PROGRESS | AWS::EC2::Instance | BastionHostLinuxD318DEDE | Requested update requires the creation of a new physical resource; hence creating one.
UPDATE_IN_PROGRESS | AWS::EC2::Instance | BastionHostLinuxD318DEDE | Resource creation Initiated
UPDATE_COMPLETE | AWS::EC2::Instance | BastionHostLinuxD318DEDE
DELETE_IN_PROGRESS | AWS::EC2::Instance | BastionHostLinuxD318DEDE
Not really sure how to debug why this randomly happens
Could this be triggered by someone making changes to the EC2 bastion host in the AWS console and not through CDK?
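Since this happens in CI before anyone can look, one option is to scan the deploy output for the replacement message shown in the events above. A rough sketch, assuming the output keeps the same `STATUS | TYPE | LOGICAL_ID | REASON` layout as the pasted lines; `findReplacements` is a made-up helper name.

```typescript
// Text CloudFormation printed in the events above when it replaced the instance.
const REPLACEMENT_MARKER = "requires the creation of a new physical resource";

interface ReplacedResource {
  resourceType: string;
  logicalId: string;
}

// Scan "STATUS | TYPE | LOGICAL_ID | REASON" lines and collect every resource
// whose reason column mentions a replacement. (Hypothetical helper.)
function findReplacements(log: string): ReplacedResource[] {
  const found = new Map<string, ReplacedResource>();
  for (const line of log.split("\n")) {
    const parts = line.split("|").map((part) => part.trim());
    if (parts.length < 4) continue; // no reason column on this line
    const [, resourceType = "", logicalId = ""] = parts;
    const reason = parts.slice(3).join(" | ");
    if (reason.includes(REPLACEMENT_MARKER)) {
      found.set(logicalId, { resourceType, logicalId });
    }
  }
  return Array.from(found.values());
}

// The event lines pasted above, used as sample input.
const sampleLog = [
  "UPDATE_IN_PROGRESS | AWS::EC2::Instance | BastionHostLinuxD318DEDE | Requested update requires the creation of a new physical resource; hence creating one.",
  "UPDATE_IN_PROGRESS | AWS::EC2::Instance | BastionHostLinuxD318DEDE | Resource creation Initiated",
  "UPDATE_COMPLETE | AWS::EC2::Instance | BastionHostLinuxD318DEDE",
  "DELETE_IN_PROGRESS | AWS::EC2::Instance | BastionHostLinuxD318DEDE",
].join("\n");

console.log(findReplacements(sampleLog));
```

This only flags that a replacement happened; it does not say which property change caused it.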
f
@Patrick Gold sorry for the late followup. I’m curious what needed to update in the `AWS::EC2::Instance` that required replacing the resource.
Can you always run `sst diff` prior to `sst deploy` until we track down the issue?
by running `sst diff` that will show us what has changed since last deploy.
Oh btw.. r u deploying through a CI or locally?
p
Yeah that's the issue. this happens on CI before I even can debug. I can add sst diff to run every deployment
f
yeah.. let’s do that. always run sst diff before deploy.
p
Is that an expensive operation? Just thinking about CI timing
f
should be fairly quick
p
I have termination protection enabled in prod to prevent deletion. I think the stack has created 3-4 extra EC2 instances so far that I've had to clear out.
f
I see
Oh also, if u run `git status` after `sst deploy` in CI, do u see a `cdk.context.json` getting updated/created?
p
Wouldn't be such a big deal if I had some of the configuration scripts as part of the bastion host setup
I will check next time. What exactly is that used for? What would it mean if it wasn't updated?
f
So when CDK creates EC2 instances, it will look up the latest AMI image to use. It will then cache the image that’s used in `cdk.context.json`, so when u deploy again in the future, it doesn’t use the newer AMI image.
You need to commit the `cdk.context.json`.
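If `cdk.context.json` does turn out to change between deploys, comparing the committed copy with the one CI produced would show which cached lookup moved. A minimal sketch, assuming both files have already been parsed into plain objects; `changedContextKeys` and the example keys are made up for illustration, not real CDK context keys.

```typescript
type CdkContext = Record<string, unknown>;

// List keys whose cached value was added, removed, or changed between two
// versions of cdk.context.json. Values are compared by JSON serialization,
// so reordered nested keys would also count as a change. (Hypothetical helper.)
function changedContextKeys(before: CdkContext, after: CdkContext): string[] {
  const allKeys = Array.from(
    new Set(Object.keys(before).concat(Object.keys(after)))
  );
  return allKeys
    .filter((key) => JSON.stringify(before[key]) !== JSON.stringify(after[key]))
    .sort();
}

// Hypothetical context contents: one cached image lookup that changed and
// one unrelated entry that stayed the same.
const committedContext: CdkContext = {
  "example-image-lookup": "ami-11111111",
  "example-other-lookup": ["a", "b"],
};
const contextAfterDeploy: CdkContext = {
  "example-image-lookup": "ami-22222222",
  "example-other-lookup": ["a", "b"],
};

console.log(changedContextKeys(committedContext, contextAfterDeploy));
```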
p
Ah, perhaps that's it
f
I have a feeling this might be causing it.. running a `git status` after deploy in CI would answer that.
p
I'm not at my terminal at the moment but can check later
I will add sst diff and git status to the deployment
f
Sounds good. If it was updated, you can commit the file in CI. Or print out the file, change it on ur local and commit it.
sounds good!
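The `git status` check after deploy could be automated along these lines. This is a sketch under the assumption that CI captures the output of `git status --porcelain` (the short "XY path" format) as a string; `isFileDirty` is a made-up helper, and renamed or quoted paths are not handled.

```typescript
// Return true when `git status --porcelain` output lists the given file as
// modified, added, or untracked. (Hypothetical helper.)
function isFileDirty(porcelainOutput: string, fileName: string): boolean {
  return porcelainOutput
    .split("\n")
    .filter((line) => line.length > 3)
    // Each line is two status characters, a space, then the path.
    .map((line) => line.slice(3).trim())
    .some((path) => path === fileName || path.endsWith("/" + fileName));
}

// Sample output: an untracked cdk.context.json plus an unrelated modified
// file (the second path is hypothetical).
const sampleStatus = "?? cdk.context.json\n M src/lambda.ts\n";

if (isFileDirty(sampleStatus, "cdk.context.json")) {
  console.log("cdk.context.json was created or updated by the deploy; commit it.");
}
```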
p
Appreciate the follow up on this!
I've noticed functions getting needlessly updated too and I was thinking that ENV VARS were causing the issue. is that possible?
f
ah yeah I’ve noticed that too. Most likely some metadata CDK added.
We are looking into it.
p
it seems fairly harmless but makes me nervous. 😅
t
@Patrick Gold do you know if it's related to updating sst? We do have an issue right now where when you update sst all your functions redeploy
p
I actually had to update sst to even get my bastion host back last time.
I don't think that's my problem as I hadn't, at the time, updated in a couple months