# prisma1-community
c
Hi, since yesterday, Sept 2nd, 4pm UTC, my self-hosted prisma1 has been spamming my logs with this error:
com.prisma.akkautil.http.FailedResponseCodeError: Server responded with 503.
(every 30s on average). I think this is related to the app.prisma.io shutdown. How can we prevent these errors in our logs?
@Vladi Stevanovic Hi! I know this is a community-only channel, but I think this is closely related to the Prisma Cloud sunset. Can you help us with this transition?
v
Hello Julien, thank you for raising this! I'll check with our team and get back to you asap!
c
Thank you very much 🙏
v
I got confirmation from the team that we actually haven't changed anything on our server infrastructure yet (we're a bit behind in shutting them down 😅). Therefore, it sounds like this issue was not triggered by the sunset.
c
oh... I will try to go deeper. This is hard to believe haha 😅
πŸ‘ 1
v
Ok, let us know if you find any other possible causes or additional information about the issue!
πŸ‘ 1
c
Oh! I found a better error (before the 503):
Push to Prometheus Gateway failed with:
It is triggered by this function: https://github.com/amblerhq/prisma1/blob/1a58e2f993b398cce590ee7d13c842c3a18020ca/[…]ain/scala/com/prisma/metrics/micrometer/CustomPushGateway.scala
Something related to metrics-eu1.prisma.io.
It would be great if this URL returned a 200 (for every route).
I can set a PROMETHEUS_PUSHGATEWAY_ENDPOINT env var and create a "black hole", but I don't think I'm the only one with this need.
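For example, something like this next to PRISMA_CONFIG in our container env (just a sketch; my-metrics-sink.internal is a made-up placeholder for whatever "black hole" endpoint you actually point it at):
env:
  PRISMA_CONFIG: |
    # ...existing config from the Ansible template...
  PROMETHEUS_PUSHGATEWAY_ENDPOINT: "http://my-metrics-sink.internal"  # hypothetical always-200 sink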
Gentle bump @Vladi Stevanovic 🙂
v
👋 Hello Julien, thank you for your patience. I've spoken with the engineering team and they confirmed that:
• We also did not make any changes to any metrics servers yet
• We did not receive any reports of similar issues from other users
The team will provide some more troubleshooting steps.
j
Do you maybe have any custom configuration in your setup that could trigger additional or non-standard communication with our metrics server? Are you sure about the timestamp when this started? (We really have not touched any of the P1 Cloud related servers yet and will not for a few more weeks.)
Can you double-check that your deployment is fully set up as a custom server, btw? https://v1.prisma.io/docs/1.34/faq/migrate-to-self-hosted-fq15/#custom-servers
c
I will! Thank you for the heads-up.
I don't think we have any custom configuration, and I can confirm that my deployment is fully set up as a custom server (we have been running the prisma server in a docker container in our infrastructure since 2018). Here is our Ansible template, unchanged for years:
env:
      PRISMA_CONFIG: |
        managementApiSecret: "{{ prisma_management_api_secret }}"
        port: 4466
        databases:
          default:
            rawAccess: true
            connector: mysql
            migrations: true
            host: "{{ prisma_db_host }}"
            port: 3306
            user: "{{ prisma_db_user }}"
            password: "{{ prisma_db_pass }}"
I understand that you didn't change anything, but maybe this is an outage? You can try it yourself: any route on http://metrics-eu1.prisma.io/ returns a
503 Service Unavailable: Back-end server is at capacity
Not a 4xx, so I suspect this is unintended. I understand that I am the only one reporting this to you, but I may also be the only one monitoring the error logs of the prisma1 server. Since the server still works as expected, this error might exist for others who just haven't noticed it 😅
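A quick way to reproduce it from code, if that helps (sketch, assuming Node 18+ where fetch is available globally):
// Probe the metrics host on an arbitrary path.
(async () => {
  const res = await fetch("http://metrics-eu1.prisma.io/any/random/path");
  console.log(res.status); // was 503 for every path at the time of this thread
})();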
j
Agreed. I'll poke around (or make someone poke around) a bit.
How sure are you that this hasn't been failing in the same way for much longer? Was the logging in place the whole time, and are you sure it just worked before?
a
👋 @CHaBou in your terminal, if you run npx prisma1 account, what is the response?
c
Hi @andrewicarlson, do you mean the terminal inside the docker container of this prisma1 server?
@janpio I am not sure the metrics ever "worked", but I am 100% sure that no error was printed before. We have had some "legit" errors every day for years.
j
Ok cool. From our side the metrics have been broken for a lot longer, hence we were wondering if you might have changed your logging in some way and only now noticed something that has been this way for a while. But that does not seem to be the case 👍
c
Maybe broken but not returning 503? (or broken in some way that led the prisma1 server not to complain about it)
@andrewicarlson I don't have a node environment in my container (no npx then)
a
@CHaBou that likely means you execute 'npx prisma1 deploy' from your local environment or through CI/CD, right?
c
yes, and I am not logged in 🙂
(I was confused because the error is not related to schema deployment, but I understand this is a way to validate that we don't use prisma cloud 🙂)
Thank you @Mike B. for this confirmation 🙏
j
Thanks indeed. We still have not figured out what is going on. Old systems...
a
Thanks @Mike B. and @CHaBou – it does look like you'll need to make a configuration change to stop sending logs to the Prisma 1 Cloud logging service. I'm researching which config needs to be removed now.
πŸ™ 2
@Mike B. is your logConfiguration that's using awsfirelens pointing to your own New Relic instance?
m
Yes
FWIW, inside our "environment" options, we previously had:
{
  "name": "LOG_LEVEL",
  "value": "INFO"
}
I removed that section today, redeployed, and same result.
c
I think the only way is to patch the code or build a fake server that always returns 200 for any subroute https://prisma.slack.com/archives/C0152UA4DH9/p1662386398306359?thread_ts=1662229983.098279&cid=C0152UA4DH9
a
@Mike B. I found an old version of Prisma1 that used an ENABLE_METRICS environment variable. I don't have confirmation yet, but I'm wondering if that defaults to true with the image version you're using. If you're open to it, you might try setting that environment variable to false and redeploying.
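In the ECS task definition that could look something like this (untested sketch; I'm not even sure the variable is read by your image version):
{
  "name": "ENABLE_METRICS",
  "value": "false"
}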
Thanks for giving that a go – still digging in!
πŸ™ 1
@Mike B. another attempt – based on line 13 here, you might try adding a blank PROMETHEUS_PUSHGATEWAY_ENDPOINT environment variable (like "" or localhost, etc.). I think it might throw a Metrics initialization error (hitting line 36).
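As a task-definition entry, that would be something like (again an untested sketch):
{
  "name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
  "value": ""
}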
@Mike B. Awesome thanks for trying that. We're moving in the right direction! I'm trying to avoid setting up an actual black hole to capture those requests.
🤞 1
πŸ‘ 1
@Mike B. I wonder if pushing a non-string value would make it hit that line 36. For example:
PROMETHEUS_PUSHGATEWAY_ENDPOINT: false
If you or @CHaBou can share a minimal reproducible example, I can test some of these theories before sending them out to you!
c
I don't have an easy env to test this. I will let @Mike B. tell us 😊
m
@andrewicarlson I'm defining everything in a task definition file for AWS Elastic Container Service, and it won't let me put in a non-string value.
Invalid type for parameter containerDefinitions[0].environment[0].value, value: 0, type: <class 'int'>, valid types: <class 'str'>
I tried both of these and they threw that same error.
{
  "name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
  "value": false
},

...

{
  "name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
  "value": 0
}
a
Yeah sorry – I guess the task definition file requires strings. I haven't found docs confirming this, but it may be coercing types later... Did you try "false" or "0"?
m
@andrewicarlson Sorry for the delay. I had actually originally tried "false" and "0", which produced the same result.
j
any update on this? we are having the same issue
a
@Jeff Gardner have you tried updating the environment variables like above?
"name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
"value": "localhost"
j
i have not, although i got the impression from the messages above that it didn’t fix the issue for those who tried it, was i incorrect?
a
@Jeff Gardner it's as close as we have right now – not perfect but it should reduce the noise in your logs considerably as there won't be a full stacktrace
j
ok, i can give it a shot
c
I finally made a server deployed on Vercel that returns 200 for any path and method (GET/POST, etc.). You have to set PROMETHEUS_PUSHGATEWAY_ENDPOINT to http200.vercel.app. No more errors 🙂
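For anyone who prefers to self-host the sink, here is a minimal standalone sketch of the same idea (this is not the actual http200.vercel.app code, just an always-200 HTTP server in TypeScript for Node):
// Always-200 sink: answers any method on any path with 200, so the prisma1
// Prometheus push stops logging FailedResponseCodeError.
import { createServer } from "http";

const port = Number(process.env.PORT ?? 8080);

createServer((_req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("ok");
}).listen(port, () => {
  console.log(`always-200 sink listening on port ${port}`);
});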
πŸ‘ 2
πŸŽ‰ 1