# atlantis-community
p
did you upgrade the git client version?
j
No.
p
look at the issue
j
This is all in Kubernetes, no change in image, or helm chart
p
this has been documented
j
I have seen the issue
What was posted there doesn't appear to match up
p
what about the permissions of the Atlantis data dir?
j
They wouldn't randomly change
p
could that have changed ?
j
statefulSet:
  securityContext:
    fsGroup: 1000
    runAsUser: 100
    fsGroupChangePolicy: "OnRootMismatch"
no change.
p
well, something changed for sure, and the Atlantis code does not do that, so if for whatever reason the filesystem permissions changed you could possibly get this error
j
I cannot imagine a situation where, out of 100+ PVCs, this one randomly changed
p
can you run something like a chown on the pod?
j
I can't launch the pod
because it fails with the /nonexistent error
❯ kubectl logs pod/atlantis-0 -n atlantis
{"level":"info","ts":"2023-04-17T15:32:41.901Z","caller":"vcs/gitlab_client.go:110","msg":"determined GitLab is running version 15.10.0","json":{}}
Error: initializing server: writing generated .git-credentials file with user, token and hostname to /nonexistent/.git-credentials: open /nonexistent/.git-credentials: no such file or directory
just spun up a test ENV
same error
def. tied just to atlantis
did a re-deploy in the test ENV of everything using EFS
only atlantis breaks
p
EFS is NFS basically
j
Yes.
p
could you try an EBS volume for Atlantis?
get it out of EFS?
j
one sec, I'm spinning up a 3rd test ENV
p
in your test env
j
with EFS and no pod security templates
since a fresh install with the defaults fails
One thing that's interesting is that the GitHub issue and the helm chart both assume the atlantis user is 100:1000, but it's 1000:1000 on the debian image
probably a good reason to specify the UID here
# Add atlantis user to Debian as well
RUN useradd --create-home --user-group --shell /bin/bash atlantis && \
    adduser atlantis root && \
    chown atlantis:root /home/atlantis/ && \
    chmod g=u /home/atlantis/ && \
    chmod g=u /etc/passwd
ok, it's back to the original error
dubious permissions
p
in a non-EFS volume?
j
still EFS
I have to set up an EBS provisioner, since we don't use it anywhere
but I'm at least back to atlantis being broken.
drwxrwxr-x   5 1001 1001 6.0K Apr 17 15:55  atlantis-data
that's interesting, since that uid/gid does not exist
p
ohhhhh
j
I have no idea where that comes from TBH
and latest debian image is completely broken
Digest: sha256:5389ae79b49230b8e4c6a305230f69e43f65c0cd4312f3822684880e39fdb47c
Status: Downloaded newer image for ghcr.io/runatlantis/atlantis:v0.23.4-debian
error: failed switching to "atlantis": unable to find user atlantis: no matching entries in passwd file
p
I think that @Dylan Page is fixing or fixed that
we had a few issues with the Debian image
j
I don't understand why this is mounting the volume as 1001
I would expect it to mount it as root, honestly
d
It’s fixed in the latest dev image
j
I'm still on the 23.3 image
which worked up until.. today
p
the thing is that the permissions are set at the Dockerfile level, and after that atlantis runs
the code just uses the filesystem, so there is no way for atlantis to change that
j
we deploy via atlantis
I have no MR showing a change
we didn't change anything
we did a fresh deploy, same error
and the permissions are not done at the dockerfile level.
p
did you try the non EFS install?
j
we only use EFS
in 20 clusters
and only atlantis is having this error
Getting EBS provisioners in place will take a while
to troubleshoot a broken app
p
any information you have, add it to the issue you commented on, since this chat will disappear in a month
j
I'll try to get it in place
currently I have to back atlantis out and move things back into flux for the time being
so we can operate
p
I do not run atlantis in K8s so I can’t help you much
j
so I found another instance of atlantis we have
deployed with the same code, same base image, only diff is that in this image we add the vault cli
it's working
I can tell you that EFS creates a random UID/GID for the mount point, which is where the 10001 or 10002 etc. stuff comes from
I'm not sure why that would randomly be a problem now.
so I created a custom EFS storage class, setting the uid/gid instead of using the range like it defaults to
drwxrwxr-x   4 atlantis atlantis 6.0K Apr 17 17:07  atlantis-data
so that looks more correct
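A quick way to spot this kind of mismatch from inside a pod is to compare the numeric owner of the data directory with the uid/gid the container runs as. The sketch below is only illustrative: the function name `ownership_mismatch` is made up here, and the `1000:1000` values in the usage note are the ones mentioned above for the Debian image, not something verified against the chart.

```python
import os


def ownership_mismatch(path, expected_uid, expected_gid):
    """Return a list of human-readable problems; empty if ownership matches.

    Compares numeric ids only, so it also works when the uid/gid has no
    entry in /etc/passwd (as with the 1001:1001 directory seen above).
    """
    st = os.stat(path)
    problems = []
    if st.st_uid != expected_uid:
        problems.append(f"{path}: uid is {st.st_uid}, expected {expected_uid}")
    if st.st_gid != expected_gid:
        problems.append(f"{path}: gid is {st.st_gid}, expected {expected_gid}")
    return problems
```

For example, `ownership_mismatch("/atlantis-data", 1000, 1000)` would report both ids for a directory owned by `1001:1001`, and return an empty list once the EFS storage class and the chart agree.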
p
can you force the uid on the chart? I believe it is possible
j
we have been doing that
now I've forced EFS to match
testing now
That fixed it
I'm not sure why the OTHER env is working, because we are not specifying a UID/GID on EFS there, just on the chart.
p
I worked with NFS for like 10 years, I always had to force the uid and user
that is why I wanted you to test EBS, just to make sure
j
this is the only chart where we force it
all using EFS
I think we have 20 clusters at this point.
The other atlantis deployment is working, without it being specified as well, which has me more confused
now it looks like atlantis has decided to use its own TF version
{
  "level": "info",
  "ts": "2023-04-17T17:10:47.192Z",
  "caller": "terraform/terraform_client.go:361",
  "msg": "Detected module requires version: 1.4.5",
  "json": {
    "repo": "sphinx/terraform-aws-infra",
    "pull": "75"
  }
}
ATLANTIS_DEFAULT_TF_VERSION=v1.3.7
DEFAULT_TERRAFORM_VERSION=1.4.2
not even sure where that second ENV var comes from now..
p
in your workflow do you have required_version? that version will be used
j
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
  required_version = "~> 1.3"
}
that's what is defined everywhere
environment:
  ATLANTIS_DEFAULT_TF_VERSION: v1.3.7
Ya, I don't actually see DEFAULT_TERRAFORM_VERSION even defined in the chart
j
I'm not sure what I'm supposed to be seeing?
required_version = "~> 1.3"
should not resolve to 1.4.5
ya, this doesn't make sense
NOTE: The Atlantis latest docker image tends to have recent versions of Terraform, but there may be a delay as new versions are released. The highest version of Terraform allowed in your code is the version specified by DEFAULT_TERRAFORM_VERSION in the image your server is running.
DEFAULT_TERRAFORM_VERSION=1.4.2
and yet it pulls 1.4.5
when we say
required_version = "~> 1.3"
p
atlantis can download TF for you no matter what the default TF version is
j
the docs imply it wouldn't use 1.4.5
since the image states
DEFAULT_TERRAFORM_VERSION=1.4.2
either way though, we specify 1.3.
{
  "level": "info",
  "ts": "2023-04-17T17:13:23.792Z",
  "caller": "models/shell_command_runner.go:156",
  "msg": "successfully ran \"/atlantis-data/bin/terraform1.4.5 plan -input=false -refresh -out \\\"/atlantis-data/repos/sphinx/terraform-aws-infra/75/default/us-gov-west-1/qa/network/default.tfplan\\\"\" in \"/atlantis-data/repos/sphinx/terraform-aws-infra/75/default/us-gov-west-1/qa/network\"",
  "json": {
    "repo": "sphinx/terraform-aws-infra",
    "pull": "75"
  }
}
❯ cat us-gov-west-1/qa/network/provider.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
  required_version = "~> 1.3"
}
The patch is relevant in ~> 1.3.0 to ensure it matches on only 1.3.x, whereas ~> 1.3 may match to 1.4
why would it even match that
now I'm curious if it's been using 1.4.5 this entire time
p
you could check the state file
j
doing so now
def. feels wrong that 1.3 could match 1.4
"terraform_version": "1.3.9",
OK, I'm good now
TY
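For anyone wanting to repeat the state-file check, here is a minimal Python sketch. It assumes the state is available as a local JSON file with a top-level `terraform_version` field, as in the snippet above; the function name and the file path in the usage note are hypothetical.

```python
import json


def state_terraform_version(state_path):
    """Return the Terraform version recorded in a state file, or None.

    The value is the version that last wrote the state, so it shows which
    binary actually ran the most recent apply.
    """
    with open(state_path, encoding="utf-8") as f:
        state = json.load(f)
    return state.get("terraform_version")
```

Calling `state_terraform_version("terraform.tfstate")` on the state discussed here would return `"1.3.9"`. With a remote backend, the state would first have to be saved to a file, for example from the output of `terraform state pull`.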
p
you fixed it yourself!!!!
thanks to you
were you using 1.4.5?
d
~> (or ~= in some package managers) is essentially a wildcard on the last version component you specify.
It's common in most package managers
With ~> 1.3 you're basically saying “any version no less than 1.3, and less than 2.0”
If you do ~> 1.3.0, it changes to “any version no less than 1.3.0, and less than 1.4.0”
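The rule can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not Terraform's actual implementation: the function names are made up, and it assumes plain numeric versions with at least two components (no pre-release tags).

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.3.9' or 'v1.3.9' into (1, 3, 9)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def pessimistic_bounds(constraint_version: str):
    """Bounds for '~> X': lower is X itself, upper bumps the second-to-last part."""
    parts = parse(constraint_version)
    if len(parts) < 2:
        # Simplification for this sketch: single-component constraints are not handled.
        raise ValueError("need at least MAJOR.MINOR")
    upper = parts[:-2] + (parts[-2] + 1,)
    return parts, upper


def satisfies(version: str, constraint_version: str) -> bool:
    """True if `version` satisfies '~> constraint_version'."""
    lower, upper = pessimistic_bounds(constraint_version)
    v = parse(version)
    width = max(len(v), len(lower), len(upper))

    def pad(t):
        # Pad with zeros so (1, 4) compares as (1, 4, 0).
        return t + (0,) * (width - len(t))

    return pad(lower) <= pad(v) < pad(upper)
```

With these definitions, `satisfies("1.4.5", "1.3")` is true, which is why 1.4.5 was selected for `~> 1.3`, while `satisfies("1.4.5", "1.3.0")` is false and `satisfies("1.3.9", "1.3.0")` is true.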