https://www.runatlantis.io/ logo
Title
j

jwr

05/23/2023, 1:47 PM
I'm seeing this error when running checking conftest policies after having upgraded to v0.24 today. previously this worked fine on v0.23.5 and lower. any thoughts?
d

Dylan Page

05/23/2023, 2:15 PM
There were some new features merged for policy checking
j

jwr

05/23/2023, 2:16 PM
are there user-facing breaking changes where i perhaps need to adjust my workflow or other config?
r

ross strickland

05/23/2023, 2:16 PM
can you provide any specific configuration information or further details in logs?
j

jwr

05/23/2023, 2:17 PM
yeah, i was in the middle of typing up a github issue. i'll have a link in a moment. thanks.
🙏 1
d

Dylan Page

05/23/2023, 2:18 PM
Ross is a contributor to the new policy-checking feature, he'll know the best. I can also help troubleshoot as well
j

jwr

05/23/2023, 2:22 PM
@ross strickland https://github.com/runatlantis/atlantis/issues/3438 thanks for looking. and i'm happy to paste any other relevant config or whatever if i left anything out.
👀 1
d

Dylan Page

05/23/2023, 2:23 PM
events.RunAndEmitStats
seems to be the last call in the stack
unless I'm reading the stacktrace backwards
j

jwr

05/23/2023, 2:25 PM
i attempted to paste it in chronological order, oldest at the top and newest at the bottom.
d

Dylan Page

05/23/2023, 2:27 PM
no its okay, I think the instrumented_client is burying the rest of the stack
r

ross strickland

05/23/2023, 2:30 PM
i'll take a look at the flow. but the recent changes pass information from conftest step runner and command runner as a json blob marshaled into a custom struct. i'm not sure how this might affect running conftest in a custom run command.
i suspect there is some breakdown there though.
d

Dylan Page

05/23/2023, 2:34 PM
I think it might have to do with the config given the error of
after top-level value
actually you're right, its failing on unmarshalling the json blob into the custom struct
r

ross strickland

05/23/2023, 2:35 PM
so, running conftest as a custom run step likely breaks a fair bit more with recent changes, as policy owners and approval count requirements wouldn't be transferred properly.
j

jwr

05/23/2023, 2:45 PM
is a custom command intended to be supported? or am i just doing it wrong the whole time but it somehow never broke until now? IIRC the reason i landed on a custom
conftest
command was that i couldn't find any other way to specify the path with
-p
, since the path is dynamic and changes on each run (it has the PR number or something as part of the path).
r

ross strickland

05/23/2023, 2:48 PM
previously custom command would work. whether or not it makes sense to preserve the functionality for policy_check specifically, warrants a discussion, i think.
with regard to your specific case, i think we should be able to find a path forward.
how is
-p
dynamic? are policies different for different PRs?
or would you be able to provide some example configuration which demonstrates how that parameter is dynamic? what i see in the issue appears static
j

jwr

05/23/2023, 2:55 PM
@ross strickland the pwd when we run the
conftest
command is:
/home/atlantis/.atlantis/repos/SOME_COMPANY/devops/4478/default/terraform/SOME_COMPANY/environments/THIS_ENVIRONMENT
so when we run:
conftest test -p ../../modules/aws/application/policies/
that policy path resolves to:
/home/atlantis/.atlantis/repos/SOME_COMPANY/devops/4478/default/terraform/SOME_COMPANY/modules/aws/application/policies/
The PR number in the middle of the path there is the dynamic part. The example seen on line 21 here seems to imply that we need a full path, and I wouldn't know how to make the PR number part of that full path: https://www.runatlantis.io/docs/policy-checking.html#step-2-define-the-policy-configuration does that make sense?
r

ross strickland

05/23/2023, 2:57 PM
interesting, are the policies materially different between PRs, or is it that they exist in the same repo as the TF code?
j

jwr

05/23/2023, 2:58 PM
same exact policies for each PR. they exist in the same repo as the TF code, and the policies which run against any given PR should be the same policies found in that PR. (or in other words, if i need to add or change a policy, I want atlantis to run the policies that i changed within the same PR where i am changing them.)
r

ross strickland

05/23/2023, 3:11 PM
got it. is there any concern that allowing policies to be modified inline with the code that they're checking could present a security risk? i.e. someone puts up a change, is blocked, and simply modifies the policy code to not block?
in any case, let me take a look. if you want to preserve this approach, there may be a way.
if you're open to other approaches, i'm happy to provide some examples of versioning with policy code.
j

jwr

05/23/2023, 3:14 PM
that security concern is valid, but this is a private repository where 100% of contributors are employees. so technically someone could do that, but if they did, they'd have an angry boss to answer to. and the benefit we get in return for that trade off is that we can more easily test our policies in PR's as we change them, and also we don't have to bake policies into an atlantis image or whatever. but i am happy to try other examples if you can think of a workflow which works for my use case. i really only stumbled into this iteration of the workflow just because i couldn't get prior iterations to work.
r

ross strickland

05/23/2023, 3:22 PM
did using the relpath in the policy configuration not work? i.e.
policies:
  policy_sets:
    - name: "yourpolicies"
      path: ../../modules/aws/application/policies/
      ....
looking at at the code, it seems that it should work, as the policy path is joined with the working dir. the
--all-namespaces
flag can be thrown on as an extra arg in the
policy_check
step.
j

jwr

05/23/2023, 3:26 PM
i don't recall if i had tried that or not, but the docs specifically call out that
path
should be
*absolute dir* path to conftest policy/policies
. so i may not have attempted it for that reason. i can give it a try now though... https://www.runatlantis.io/docs/policy-checking.html#step-2-define-the-policy-configuration
r

ross strickland

05/23/2023, 3:31 PM
got it. what i'm seeing in code indicates it should work. but we'll see.
as far as other approaches - here's what we do for our setup: • policies are hosting in their own repo • repo has tagged releases • SSM parameter in AWS holds desired version • an update script (using flock to prevent update contention) is run prior to the policy_check run step in the policy_check workflow. this queries SSM and runs conftest_update if the downloaded version and desired version do not match.
conftest can pull directly from github; this avoids having to package the policies in the image, and using an update script inline with the policy_check workflow allows for policies to be changed on the fly
j

jwr

05/23/2023, 3:44 PM
so i tried the
policy_sets
suggestion by setting this variable in atlantis and recycling its container:
{
          name = "ATLANTIS_REPO_CONFIG_JSON",
          value = jsonencode({
            policies = {
              owners = {
                users = [...],
              },
              policy_sets = [
                {
                  name   = "devops"
                  path   = "../../modules/aws/application/policies/"
                  source = "local"
                }
              ]
            },
            ...
and then i set my
atlantis.yml
workflow like this:
policy_check:
      steps:
        - policy_check:
            extra_args: ["--all-namespaces"]
but policy checking fails with this error commented back to a PR:
1 error occurred:
	* policy_set: devops: conftest: Error: running test: parse files: get file info: stat /home/atlantis/.atlantis/repos/springbuk/devops/4478/default/terraform/SOME_COMPANY/environments/production/production-default.json: no such file or directory
r

ross strickland

05/23/2023, 4:21 PM
can you add
- show
before
- policy_check
in the policy workflow section in
atlantis.yaml
?
so
policy_check:
      steps:
        - show
        - policy_check:
            extra_args: ["--all-namespaces"]
j

jwr

05/23/2023, 4:29 PM
huh, cool, seems like that worked. thanks.
is there a way that i can also run policy checks during the apply stage? currently i do it like this:
apply:
      steps:
        - init
        - show # this is the equivalent of `terraform show -json $PLANFILE > $SHOWFILE`
        - run: >
            conftest test -p ../../modules/aws/application/policies/ --all-namespaces $SHOWFILE
        - apply:
            extra_args: ["-lock-timeout=300s"]
r

ross strickland

05/23/2023, 4:32 PM
you can probably keep that logic as is and it would continue to work. but, what is the purpose of running it in the apply stage?
j

jwr

05/23/2023, 4:34 PM
one of our policies enforces a sort of maintenance window, where a certain resource should not be deployed during certain hours. for example, if we disallow deployments during a 12:00-13:00 maintenance window, a person might run
atlantis plan
at 11:59 and their policy check would pass. and then if they subsequently run
atlantis apply
at 12:01, they should be blocked. but i guess your suggestion is that since the
apply
stage doesn't usually report policy failures back to github anyway, then we can just leave it as-is?
r

ross strickland

05/23/2023, 4:36 PM
yeah it should continue to work since it won't be hitting that unmarshal block.
though, are there any cases where other policies might get approved and their triggers not resolved? this might cause applies to fail without maintenance window being hit? i'd almost recommend using a custom script which checks the date/time and exits out 0 or 1, with an error message, to break the workflow rather than using conftest policies.
j

jwr

05/23/2023, 4:44 PM
are you describing a scenario where some unrelated conftest policy fails (due to a bucket policy or IAM permission or something, unrelated to the maintenance window), and then someone `atlantis approve`'s it, and then we go ahead with the deploy and accidentally violate our own maintenance window?
r

ross strickland

05/23/2023, 4:46 PM
sort of, but the approval will not prevent conftest from blocking during the apply phase where you're out of maintenance window, but something else is failing policy. if that makes sense?
j

jwr

05/23/2023, 4:48 PM
oh i get it, where we do
atlantis approve
, and then the apply still fails even when the maintenance window is not active, because
conftest -p ...
doesn't know anything about approving anything. cool, thank you.
r

ross strickland

05/23/2023, 4:49 PM
np! hope this helped. do you feel your particular use case has been resolved?
j

jwr

05/23/2023, 4:53 PM
we have one other repo which also uses atlantis which i'll need to verify this stuff also works in that other repo. testing it now, i'll know in a few minutes...
r

ross strickland

05/23/2023, 4:54 PM
👍
j

jwr

05/23/2023, 5:05 PM
hmm, yeah, the multi-repo setup i have doesn't seem to play nicely with the changes we've discussed here. when i add a second repo to the atlantis config, like this:
policy_sets = [
                {
                  name   = "devops"
                  path   = "../../modules/aws/application/policies/"
                  source = "local"
                },
                {
                  name   = "sftp-user-management"
                  path   = "../../modules/aws/sftp-transfer/policies/"
                  source = "local"
                }
              ]
then a PR in the devops repo will get an error saying that the sftp-user-management policies aren't found. and a PR in the sftp-user-management repo will get an error saying that the devops policies are not found. is there a way that i can specify both policy sets in the atlantis config, and then choose which one i want in
atlantis.yml
in each repo?
i'm also happy to stick with the same config/workflow i had previously if the issue reported on github will eventually be made to work, and then i can just wait for 0.24.1 or whatever.
r

ross strickland

05/23/2023, 5:11 PM
how was the multirepo setup being handled previously? were they configured using separate workflows?
j

jwr

05/23/2023, 5:13 PM
i never had any
policy_sets
list in my atlantis config. the
atlantis.yml
for each repo was the only source of truth for which policies should be run in which repos, by way of
conftest -p ...
pointing to the appropriate directory in each repo.
r

ross strickland

05/23/2023, 5:18 PM
so, you can point path in the repo config to a dummy dir, and specify
-p
for the respective policy dir in each repo within the
extra_args
specified in the workflow for that repo
that should work
j

jwr

05/23/2023, 5:32 PM
hmm, i think that's what i'm already doing? this variable in the atlantis container:
name = "ATLANTIS_REPO_CONFIG_JSON",
          value = jsonencode({
            policies = {
              owners = {
                users = [...],
              },
              policy_sets = [
                {
                  name   = "devops"
                  path   = "../../modules/aws/application/policies/"
                  source = "local"
                },
                {
                  name   = "sftp-user-management"
                  path   = "../../modules/aws/sftp-transfer/policies/"
                  source = "local"
                }
              ]
            },
this workflow in
atlantis.yml
in one of the individual repos:
policy_check:
      steps:
        - show # this is the equivalent of `terraform show -json $PLANFILE > $SHOWFILE`
        - policy_check:
            extra_args:
              - "-p ../../modules/aws/application/policies/"
              - "--all-namespaces"
results in this error in a PR:
1 error occurred:
	* policy_set: sftp-user-management: conftest: Error: running test: load: loading policies: load: 1 error occurred during loading: stat ../../modules/aws/sftp-transfer/policies/: no such file or directory
that sftp-transfer directory doesn't exist because that's meant for the other repository.
r

ross strickland

05/23/2023, 5:41 PM
what you could do is combine both policy set configs into a single one which points to an empty directory on the atlantis image, and then add the correct
-p
arg for each repo extra_args on the policy_check workflow section.
j

jwr

05/23/2023, 5:56 PM
@ross strickland that works. this bit is unintuitive, but it does work:
policy_sets = [
                {
                  name   = "policies"
                  path   = "/tmp" # This is a dummy path, the actual policies are specified in atlantis.yml in each repository
                  source = "local"
                }
              ]
and then i can use
-p
in
extra_args
in the
atlantis.yml
of each repo to specify the real path to the real policies.
r

ross strickland

05/23/2023, 5:58 PM
👍 -- i think we should update the issue with this information and then use it to determine if any of the internals need to be modified, or if we just need to update documentation.
j

jwr

05/23/2023, 5:59 PM
sure, i'll leave a comment shortly. thanks for your extensive troubleshooting help with this.
r

ross strickland

05/23/2023, 6:02 PM
of course!
p

PePe Amengual

05/23/2023, 6:14 PM
https://github.com/runatlantis/atlantis/issues/3439 @ross strickland this one just came in
👀 1
a

Alberto Rojas

05/24/2023, 8:39 AM
I just tested my change, the panic is not longer there, but I also got the:
invalid character 'W' after top-level value
The issue I introduced is the one that generated the panic. There is something else that must not be related to my metric changes since I'm not touching the policies or the json file from the plan.
r

RB

05/24/2023, 11:48 AM
Ok then we need to revert both prs for now
@Alberto Rojas can you put in a pr to revert both commits?
a

Alberto Rojas

05/24/2023, 12:58 PM
Which commits? mine and which one? I think we should follow the conversation here: https://github.com/runatlantis/atlantis/issues/3438
r

RB

05/24/2023, 1:04 PM
Oh sorry saw this message very early in the morning
😅 1
a

Alberto Rojas

05/24/2023, 1:04 PM
@ross strickland @jwr can you confirm that the error is not from
RunAndEmitStats
, or am I wrong?
r

RB

05/24/2023, 1:07 PM
There were quite a few policy changes that were merged in, i believe by pseudomorph, dunno if hes in the slack
Oh that's ross lol
@ross strickland any chance you may know of how to get around this issue other than the workaround? Is there a programmatic way to fix this?
j

jwr

05/24/2023, 1:09 PM
@Alberto Rojas i pasted all the logs that i found in my issue (3438) and one of those log entries does mention
RunAndEmitStats
.
r

RB

05/24/2023, 1:09 PM
Now that we no longer see the server panic, i think it would be good to release a patch version to get the latest image release working
a

Alberto Rojas

05/24/2023, 1:11 PM
Let me try to find the commit that break the policy check, and determine if it was the metrics or not
👍 1
r

RB

05/24/2023, 1:14 PM
Thank you alberto, that would be very helpful if you can reproduce it and see which commit caused the issue
a

Alberto Rojas

05/24/2023, 1:15 PM
I'll paste the outcome in the issue
👍 1
r

RB

05/24/2023, 1:19 PM
Policy changes • feat(policies): Add granular policy_sets by @pseudomorph in #3086 (04/20, commit 684f2fa, previous commit d88857b) • feat: add clear policy approval functionality by @pseudomorph in #3351 (05/02, commit 434bab5, previous commit 1843376)
👍 1
:this: 1
We can always release another 0.24.2 patch after 0.24.1 patch
a

Alberto Rojas

05/24/2023, 1:20 PM
testing: d88857bb01a9515223d82d7d084b79581103271b
j

jwr

05/25/2023, 1:35 PM
@ross strickland i have another follow up question from this if you don't mind... i added
policy_sets
to my atlantis container config in order to get policy checking to work on 0.24:
{
          name = "ATLANTIS_REPO_CONFIG_JSON",
          value = jsonencode({
            policies = {
              owners = {
                users = [...],
              },
              policy_sets = [
                {
                  name   = "policies"
                  path   = "/tmp" # This is a dummy path, the actual policies are specified in atlantis.yml in each repository
                  source = "local"
                }
              ]
            },
my
atlantis.yml
file looks like this: https://pastebin.com/StxCajkA when we create a PR which involves
terraform/environments/production
, autoplan kicks in and works as desired, including conftest policy checking. but sometimes we have a PR which involves
terraform/environments/staging
, and that directory isn't mentioned in our
atlantis.yml
at all, so no autoplan happens, which is expected. in some cases we might comment on those PR's with
atlantis plan -d terraform/environments/staging
and atlantis will kick off a plan. but when we do that now, after having added the
policy_sets
block above, these PR's will get an error saying ``Error: running test: load: loading policies: no policies found in [/tmp]` . is there a way to instruct atlantis to only do policy checking for directories which are defined in
atlantis.yml
and do not do policy checking for directories which are not defined in that file?
r

ross strickland

05/25/2023, 3:58 PM
so the staging code does not contain policies in line in the repo?
j

jwr

05/25/2023, 4:01 PM
the staging code does not contain any entry in the atlantis.yml file, so when we manually run a plan for staging, it doesn't have any
extra_args
override for
-p
, which seems to cause atlantis to look for policies in
/tmp
, and there are no policies in
/tmp
because that was just a dummy directory.
r

ross strickland

05/25/2023, 4:01 PM
got it
are you using the default workflow for anything? do you have any other custom workflows than the one you provided an example of?
j

jwr

05/25/2023, 4:06 PM
we haven't defined a workflow called
default
, and the workflow seen in my pastebin is the only workflow in the entire file.
r

ross strickland

05/25/2023, 4:06 PM
unfortunately, enabling/disableing policy checks is global. if you're not using the default workflow for anything defined in atlantis.yaml, you could update policy check to run an empty command.
workflows:
  default:
    policy_check:
      steps:
        run: exit 0 (or something that exits cleanly..)
i'd think something like that should work
j

jwr

05/25/2023, 4:10 PM
cool. thanks. i'll give that a try. do i also need to explicitly define the usual init/plan/apply steps if i'm defining the
default
workflow like that? or would the usual init/plan/apply steps not be overridden if i'm only explicitly overriding the
policy_check
step?
r

ross strickland

05/25/2023, 4:11 PM
you'd only be overwriting the steps that you define, so the defaults should hold.
j

jwr

05/25/2023, 4:11 PM
awesome, thank you. i'll give that a shot later today.
r

ross strickland

05/25/2023, 4:11 PM
👍
j

jwr

05/26/2023, 6:58 PM
@ross strickland sorry for bugging you, but now that i try the suggestion and i add a default workflow which overrides policy_check, i'm still getting the same error. what i added to `atlantis.yml`:
workflows:
  default:
    policy_check:
      steps:
        - run: /bin/true
and then when I push a commit which has no explicitly defined workflow, and then i manually run a
atlantis plan -d ...
, I'm still getting the same error where it's apparently expecting policies to exist in the dummy path:
1 error occurred:
	* policy_set: policies: conftest: Error: running test: load: loading policies: no policies found in [/tmp]
any thoughts on that?
a

Alberto Rojas

05/30/2023, 1:30 PM
Any updates on this? is failing for us :(
invalid character 'T' after top-level value
j

Justin S

05/30/2023, 2:03 PM
@rowleyaj just wondering if you have used 0.24.1 image? I see they added an additional comimt for terraform config
a

Alberto Rojas

05/31/2023, 12:17 PM
we are getting now:
Error running policy_check operation: unexpected end of JSON input
https://github.com/runatlantis/atlantis/issues/3438#issuecomment-1570110273