# prisma1-community
c
Hi, since yesterday, Sept 2nd, 4pm UTC, my self-hosted prisma1 has been spamming my logs with this error:
com.prisma.akkautil.http.FailedResponseCodeError: Server responded with 503.
(every 30s on average). I think this is related to the app.prisma.io shutdown. How can we prevent these errors in our logs?
@Vladi Stevanovic Hi! I know this is a community-only channel, but I think this is closely related to the Prisma Cloud sunset. Can you help us with this transition?
v
Hello Julien, thank you for raising this! I'll check with our team and get back to you asap!
c
Thank you very much 🙏
v
I got confirmation from the team that we actually haven't changed anything on our server infrastructure yet (we're a bit behind in shutting them down 😅). Therefore, it sounds like this issue was not triggered by the sunset.
c
oh... I will try to go deeper. This is hard to believe haha 😅
πŸ‘ 1
v
Ok, let us know if you find any other possible causes or additional information about the issue!
πŸ‘ 1
c
Oh! I found a better error (before the 503):
Push to Prometheus Gateway failed with:
It is triggered by this function: https://github.com/amblerhq/prisma1/blob/1a58e2f993b398cce590ee7d13c842c3a18020ca/[…]ain/scala/com/prisma/metrics/micrometer/CustomPushGateway.scala
Something related to metrics-eu1.prisma.io.
It would be great if this URL returned a 200 (for every route).
I can set a PROMETHEUS_PUSHGATEWAY_ENDPOINT env var and create a "black hole", but I don't think I'm the only one with this need.
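For example, something like this next to PRISMA_CONFIG in our container env (just a sketch; my-metrics-sink.internal is a made-up placeholder for whatever "black hole" endpoint you actually point it at):
env:
  PRISMA_CONFIG: |
    # ...existing config from the Ansible template...
  PROMETHEUS_PUSHGATEWAY_ENDPOINT: "http://my-metrics-sink.internal"  # hypothetical always-200 sink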
Gentle bump @Vladi Stevanovic 🙂
v
👋 Hello Julien, thank you for your patience. I've spoken with the engineering team and they confirmed that:
• We also did not make any changes to any metrics servers yet
• We did not receive any reports of similar issues from other users
The team will provide some more troubleshooting steps.
j
Do you maybe have any custom configuration in your setup that could trigger additional or non-standard communication with our metrics server? Are you sure about the timestamp when this started? (We really have not touched any of the P1 Cloud related servers yet and will not for a few more weeks.)
Can you double-check that your deployment is fully set up as a custom server, btw? https://v1.prisma.io/docs/1.34/faq/migrate-to-self-hosted-fq15/#custom-servers
c
I will! Thank you for the heads-up.
I don't think we have any custom configuration, and I can confirm that my deployment is fully set up as a custom server (we have been running the prisma server in a docker container in our infrastructure since 2018). Here is our Ansible template, unchanged for years:
env:
      PRISMA_CONFIG: |
        managementApiSecret: "{{ prisma_management_api_secret }}"
        port: 4466
        databases:
          default:
            rawAccess: true
            connector: mysql
            migrations: true
            host: "{{ prisma_db_host }}"
            port: 3306
            user: "{{ prisma_db_user }}"
            password: "{{ prisma_db_pass }}"
I understand that you didn't change anything, but maybe this is an outage? You can try it yourself: any route on http://metrics-eu1.prisma.io/ returns a
503 Service Unavailable: Back-end server is at capacity
Not a 4xx, so I suspect this is unintended. I understand that I am the only one reporting this to you, but I may also be the only one monitoring the error logs of the prisma1 server. Since the server still works as expected, this error might exist for others who just haven't noticed it 😅
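A quick way to reproduce it from code, if that helps (sketch, assuming Node 18+ where fetch is available globally):
// Probe the metrics host on an arbitrary path.
(async () => {
  const res = await fetch("http://metrics-eu1.prisma.io/any/random/path");
  console.log(res.status); // was 503 for every path at the time of this thread
})();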
j
Agreed. I'll poke around (or make someone poke around) a bit.
How sure are you that this hasn't been failing in the same way for much longer? Was the logging in place the whole time, and are you sure it just worked before?
a
👋 @CHaBou in your terminal, if you run npx prisma1 account, what is the response?
c
Hi @andrewicarlson, do you mean the terminal inside the docker container of this prisma1 server?
@janpio I am not sure the metrics ever "worked", but I am 100% sure that no error was printed before. We have had some "legit" errors every day for years.
j
Ok cool. From our side the metrics have been broken for a lot longer, hence we were wondering if you might have changed your logging in some way and only now noticed something that has been this way for a while. But that does not seem to be the case 👍
c
Maybe broken but not returning 503? (or broken in some way that led the prisma1 server not to complain about it)
@andrewicarlson I don't have a node environment in my container (no npx then)
a
@CHaBou that likely means you execute 'npx prisma1 deploy' from your local environment or through CI/CD, right?
c
yes, and I am not logged in 🙂
(I was confused because the error is not related to schema deployment, but I understand this is a way to validate that we don't use prisma cloud 🙂)
Thank you @Mike B. for this confirmation 🙏
j
Thanks indeed. We still have not figured out what is going on. Old systems...
a
Thanks @Mike B. and @CHaBou – it does look like you'll need to make a configuration change to stop sending logs to the Prisma 1 Cloud logging service. I'm researching which config needs to be removed now.
πŸ™ 2
@Mike B. is your logConfiguration that's using awsfirelens pointing to your own New Relic instance?
m
Yes
FWIW, inside our "environment" options, we previously had:
{
  "name": "LOG_LEVEL",
  "value": "INFO"
}
I removed that section today, redeployed, and same result.
c
I think the only way is to patch the code or build a fake server that always returns 200 for any subroute https://prisma.slack.com/archives/C0152UA4DH9/p1662386398306359?thread_ts=1662229983.098279&cid=C0152UA4DH9
a
@Mike B. I found an old version of Prisma1 that used an ENABLE_METRICS environment variable. I don't have confirmation yet, but I'm wondering if that defaults to true with the image version you're using. If you're open to it, you might try setting that environment variable to false and redeploying.
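In the ECS task definition that could look something like this (untested sketch; I'm not even sure the variable is read by your image version):
{
  "name": "ENABLE_METRICS",
  "value": "false"
}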
Thanks for giving that a go – still digging in!
πŸ™ 1
@Mike B. another attempt – based on line 13 here, you might try adding a blank PROMETHEUS_PUSHGATEWAY_ENDPOINT environment variable (like "" or localhost, etc.). I think it might throw a Metrics initialization error (hitting line 36).
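As a task-definition entry, that would be something like (again an untested sketch):
{
  "name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
  "value": ""
}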
@Mike B. Awesome thanks for trying that. We're moving in the right direction! I'm trying to avoid setting up an actual black hole to capture those requests.
🤞 1
πŸ‘ 1
@Mike B. I wonder if pushing a non-string value would make it hit that line 36. For example:
PROMETHEUS_PUSHGATEWAY_ENDPOINT: false
If you or @CHaBou can share a minimal reproducible example, I can test some of these theories before sending them out to you!
c
I don't have an easy env to test this. I will let @Mike B. tell us 😊
m
@andrewicarlson I'm defining everything in a task definition file for AWS Elastic Container Service, and it won't let me put in a non-string value.
Invalid type for parameter containerDefinitions[0].environment[0].value, value: 0, type: <class 'int'>, valid types: <class 'str'>
I tried both of these and they threw that same error.
{
  "name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
  "value": false
},

...

{
  "name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
  "value": 0
}
a
Yeah sorry – I guess the task definition file requires strings. I haven't found docs confirming this, but it may be coercing types later... Did you try "false" or "0"?
m
@andrewicarlson Sorry for the delay. I had actually originally tried "false" and "0", which produced the same result.
j
any update on this? we are having the same issue
a
@Jeff Gardner have you tried updating the environment variables like above?
"name": "PROMETHEUS_PUSHGATEWAY_ENDPOINT",
"value": "localhost"
j
i have not, although i got the impression from the messages above that it didn’t fix the issue for those who tried it, was i incorrect?
a
@Jeff Gardner it's as close as we have right now – not perfect but it should reduce the noise in your logs considerably as there won't be a full stacktrace
j
ok, i can give it a shot
c
I finally made a server deployed on Vercel that returns 200 for any path and method (GET/POST, etc.). You have to set PROMETHEUS_PUSHGATEWAY_ENDPOINT to http200.vercel.app. No more errors 🙂
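For anyone who prefers to self-host the sink, here is a minimal standalone sketch of the same idea (this is not the actual http200.vercel.app code, just an always-200 HTTP server in TypeScript for Node):
// Always-200 sink: answers any method on any path with 200, so the prisma1
// Prometheus push stops logging FailedResponseCodeError.
import { createServer } from "http";

const port = Number(process.env.PORT ?? 8080);

createServer((_req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("ok");
}).listen(port, () => {
  console.log(`always-200 sink listening on port ${port}`);
});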
πŸ‘ 2
πŸŽ‰ 1