# puppet-enterprise
m
Hey! I remember a discussion about pe_status_checks where the root cause was that the fact timed out because the server API was slow to respond, but we couldn't raise the timeout in the fact, as facter execution time would then balloon in puppet runs
Do you have a slow API response from the server that goes away after a restart?
b
yes, we raised a ticket about the slow pe_status_checks some months ago
m
is this the same issue with ops dashboard?
b
yes that's correct. the API gets slower over time and after 1 to 5 days it doesn't respond within a minute, which is the timeout in telegraf
yes same issue
we saw it on PE 2019.something and now on PE 2021.7.3 as well.
and we really don't have many agents. in one setup it's ~500 agents load-balanced across two compilers
m
does it correspond with load on the server or is it just duration of puppetserver being online?
oops you answered that
the performance of the API shouldn't degrade with constant load, nominal database size, and normal usage
b
yeah, really just duration. When we get metrics, they are low. the systems aren't under high load, and memory/CPU is never fully utilized
so I assume some garbage collection isn't working correctly in puppetserver
but that's more of a guess
m
well we have the metrics for that, so there should be some indication of that before it goes silent
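A minimal sketch of how such an indication could be checked, assuming you already have samples of API latency (or JVM heap used) paired with puppetserver uptime. The function names and the `(uptime_hours, value)` sample shape are hypothetical, not part of any PE tooling; this is plain least-squares trend fitting, nothing Puppet-specific.

```python
from statistics import mean


def slope_per_hour(samples):
    """Least-squares slope of value against uptime in hours.

    samples: list of (uptime_hours, value) pairs, e.g. API latency in
    seconds or JVM heap used in MB (hypothetical input shape).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("all samples share the same uptime")
    return sum((x - x_bar) * (y - y_bar) for x, y in samples) / denom


def hours_until(samples, limit):
    """Estimate hours until the value reaches `limit`.

    Extrapolates from the most recent sample using the fitted slope.
    Returns None when the trend is flat or falling.
    """
    k = slope_per_hour(samples)
    if k <= 0:
        return None
    _, last_value = samples[-1]
    return max(0.0, (limit - last_value) / k)
```

With latency samples and `limit=60` (the one-minute telegraf timeout mentioned above), a steadily positive slope under constant load would point at uptime-related degradation rather than load.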
b
I can raise a new ticket, if you'd like to have a new support archive
m
do you know the case number from last time?
b
mhm I think 49999. @simonhoenscheid raised it
s
Yes it is
m
49999 was the status checks one, right?
ok, there is an engineering case on the slow API response, I'll follow that up
b
do you have a link to that?
m
engineering cases are internal for PE, reference is PE-35341
b
mhm I thought SDP consultants are supposed to view the PE board, but I cannot access it
m
is this the Jira instance you are using? https://tickets.puppetlabs.com/browse/PE-35341
b
yeah
I got the SDP certification recently, maybe something is still missing with my permissions (or we're not supposed to view it?)
m
I'm not sure, if I'm honest
b
ditto 😄
m
not my area
b
yeah, I will talk to the SDP manager
m
probably best. Looking at the support case, you indicated the slowness was due to heavy use of the JVM heap. I assume the restarts are how you are coping with this at the moment
A new ticket is probably best. The old scope was more about the debug API sometimes taking a long time to come back, as it pertained to the status checks timeout, which is only a few seconds. If you are restarting every few days due to 5min timeouts that are related only to duration of uptime, that's totally different.
s
We are currently thinking of establishing a systemd timer. Not the best solution
m
Get a new ticket in with that precise description as a bug report and we will get it raised. The scope of the old engineering case didn't really capture this problem, which is worse
s
We will, but it will probably be at the beginning of next week