Puppet Community #puppet-enterprise

# puppet-enterprise

Slackbot

06/07/2023, 1:44 PM

This message was deleted.

Marty Ewings

06/07/2023, 2:38 PM

Hey! I remember a discussion about pe_status_checks where the root cause was that the fact timed out because the server API was slow to respond, but we couldn't raise the timeout in the fact, as Facter execution would then balloon in Puppet runs

Marty Ewings

06/07/2023, 2:38 PM

Do you have a slow server response to the API that goes away after a restart?

bastelfreak

06/07/2023, 2:38 PM

yes, we raised a ticket about the slow pe_status_checks some months ago

Marty Ewings

06/07/2023, 2:39 PM

is this the same issue as with the ops dashboard?

bastelfreak

06/07/2023, 2:39 PM

yes that's correct. the API gets slower over time and after 1 to 5 days it doesn't respond within a minute, which is the timeout in telegraf
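A minimal sketch of how the slowdown described above could be tracked between restarts: time a single request against the API with the same one-minute limit and record how long it took. The function name `probe_latency` and its `opener` parameter are hypothetical and only for illustration. The thread does not say which endpoint Telegraf queries, so the URL is left as an argument.

```python
import time
import urllib.request


def probe_latency(url, timeout_s=60.0, opener=urllib.request.urlopen):
    """Time one GET against url; return (elapsed_seconds, ok)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            resp.read()
        ok = True
    except OSError:
        # URLError, connection errors and socket timeouts all derive from OSError.
        ok = False
    return time.monotonic() - start, ok
```

Logging the elapsed value every few minutes would show whether latency creeps up steadily with uptime or jumps suddenly. Against a real Puppet server the opener would also need to trust the Puppet CA, otherwise every probe reports `ok` as false because certificate verification fails.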

bastelfreak

06/07/2023, 2:39 PM

yes same issue

bastelfreak

06/07/2023, 2:39 PM

we saw it on PE 2019.something and now on PE 2021.7.3 as well.

bastelfreak

06/07/2023, 2:40 PM

and we really don't have many agents. in one setup it's ~500 agents load-balanced across two compilers

Marty Ewings

06/07/2023, 2:40 PM

does it correspond with load on the server or is it just duration of puppetserver being online?

Marty Ewings

06/07/2023, 2:40 PM

oops you answered that

Marty Ewings

06/07/2023, 2:41 PM

the performance of the API shouldn't degrade under constant load with nominal database size and usage

bastelfreak

06/07/2023, 2:41 PM

yeah, really just duration. When we get metrics, they are low. The systems aren't under high load, and memory and CPU are never fully utilized

bastelfreak

06/07/2023, 2:41 PM

so I assume some garbage collection isn't working correctly in puppetserver

bastelfreak

06/07/2023, 2:42 PM

but that's more of a guess

Marty Ewings

06/07/2023, 2:42 PM

well we have the metrics for that, so there should be some indication of that before it goes silent
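One way to look for that indication, as a rough sketch: read heap usage from the debug-level status output and check whether the low points of heap usage keep rising over time, which would be consistent with memory that garbage collection never reclaims. The JSON path below is an assumption about the layout of the `status-service` debug output and should be checked against a real response. The helper names and the 0.1 threshold are hypothetical.

```python
def heap_fraction(status):
    """Return used/max heap from a parsed debug-level status document."""
    # Assumed layout: status-service -> status -> experimental -> jvm-metrics.
    heap = status["status-service"]["status"]["experimental"]["jvm-metrics"]["heap-memory"]
    return heap["used"] / heap["max"]


def floor_is_rising(fractions, min_delta=0.1):
    """Compare the lowest heap fraction in the older and newer half of the samples."""
    half = len(fractions) // 2
    if half == 0:
        return False  # not enough samples to compare
    return min(fractions[half:]) - min(fractions[:half]) >= min_delta
```

A rising floor alone does not prove a leak, but together with growing API latency it would be useful evidence to attach to a support ticket.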

bastelfreak

06/07/2023, 2:46 PM

I can raise a new ticket, if you'd like to have a new support archive

Marty Ewings

06/07/2023, 2:46 PM

do you know the case number from last time?

bastelfreak

06/07/2023, 2:53 PM

mhm I think 49999. @simonhoenscheid raised it

simonhoenscheid

06/07/2023, 2:56 PM

Yes it is

Marty Ewings

06/07/2023, 2:56 PM

49999 was the status checks one right?

Marty Ewings

06/07/2023, 2:56 PM

ok, there is an engineering case on the slow API response, I'll follow that up

bastelfreak

06/07/2023, 2:57 PM

do you have a link to that?

Marty Ewings

06/07/2023, 2:57 PM

engineering cases are internal for PE, reference is PE-35341

bastelfreak

06/07/2023, 2:58 PM

mhm I thought SDP consultants are supposed to view the PE board, but I cannot access it

Marty Ewings

06/07/2023, 3:00 PM

is this the Jira instance you are using? https://tickets.puppetlabs.com/browse/PE-35341

bastelfreak

06/07/2023, 3:00 PM

yeah

bastelfreak

06/07/2023, 3:00 PM

I got the SDP certification recently, maybe something is still missing with my permissions (or we're not supposed to view it?)

Marty Ewings

06/07/2023, 3:01 PM

I'm not sure, if I'm honest

bastelfreak

06/07/2023, 3:01 PM

ditto 😄

Marty Ewings

06/07/2023, 3:01 PM

not my area

bastelfreak

06/07/2023, 3:03 PM

yeah, I will talk to the SDP manager

Marty Ewings

06/07/2023, 3:04 PM

probably best. Looking at the support case, you indicated the slowness was due to heavy use of the JVM heap. I assume the restarts are how you are coping with this at the moment

Marty Ewings

06/07/2023, 3:06 PM

Probably best to open a new ticket. The old scope was more about the debug API sometimes taking a long time to come back as it pertained to the status checks timeout, which is only a few seconds. If you are restarting every few days due to 5min timeouts that are related only to duration of uptime, that's totally different.

simonhoenscheid

06/07/2023, 3:06 PM

We are currently thinking of establishing a systemd timer. Not the best solution

Marty Ewings

06/07/2023, 3:11 PM

Get a new ticket in with that precise description as a bug report and we will get it raised. The scope of the old engineering case didn't really capture this problem, which is worse

simonhoenscheid

06/07/2023, 3:12 PM

We will, but it will probably be at the beginning of next week
