# puppet-enterprise
  • b

    bastelfreak

    08/16/2022, 8:02 PM
    an F5 and a firewall and maybe something nobody knows about
  • b

    bastelfreak

    08/16/2022, 8:02 PM
    (yay, "enterprise" environment)
  • b

    bastelfreak

    08/16/2022, 8:02 PM
    I was told the firewall accepts the connections for pxp-agent and has an idle timeout of 60 minutes. In general, connections work; it just looks like some are aborted from time to time
  • b

    bastelfreak

    08/16/2022, 8:03 PM
    which is an issue, because during plans/tasks the orchestrator loses the connection and thinks the pxp-agent is dead. In fact, the pxp-agent receives the job, executes it, and also sends a success message back
  • b

    bastelfreak

    08/16/2022, 8:04 PM
    I raised a support ticket about it, but support is running out of ideas
  • n

    npwalker

    08/16/2022, 8:04 PM
    I’ve heard of a task/plan run not running on nodes because they were disconnected at the start but I don’t think I’ve heard of the pxp-agent receiving it and getting disconnected after
  • n

    npwalker

    08/16/2022, 8:06 PM
    seems like it’s definitely going to require some network spelunking
  • b

    bastelfreak

    08/16/2022, 8:06 PM
    yeah
  • b

    bastelfreak

    08/16/2022, 8:07 PM
    I was hoping the broker might just be low on resources, but how would I verify this?
  • b

    bastelfreak

    08/16/2022, 8:08 PM
    the only Jolokia metric that looked helpful was pcp-connect or pcp-connected, and that's around 150-200 on each compiler. That should be the number of connected pxp-agents, so the F5 load balancing is working and balancing evenly
  • b

    bastelfreak

    08/16/2022, 8:08 PM
    But also, the broker shouldn't require many resources: it's just a few TCP connections with low traffic and not even many TLS handshakes
  • n

    nlew

    08/16/2022, 8:08 PM
    850 agents across 4 brokers is a light load, so I’m guessing that’s probably not the problem.
  • n

    nlew

    08/16/2022, 8:09 PM
    There is most likely some piece of network gear that’s either intentionally or unintentionally dropping or delaying packets.
  • n

    nlew

    08/16/2022, 8:10 PM
    Is there any pattern to the agents that have this problem, specifically with respect to where they’re located on the network?
  • b

    bastelfreak

    08/16/2022, 8:12 PM
    so I have 34 agents that are connected to one compiler without an F5 in between, and they have 0 errors in the log
  • v

    vchepkov

    08/16/2022, 8:12 PM
    I would create a custom TCP profile and set the keep-alive interval to, say, 60 seconds
  • v

    vchepkov

    08/16/2022, 8:12 PM
    that will keep the connection from dying
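vchepkov's keep-alive interval is configured on the F5 itself. The same idea also exists at the operating-system level. With kernel TCP keepalive enabled, the kernel sends probe segments on an otherwise idle connection, so stateful devices in the path typically keep seeing traffic and do not expire the entry.

Below is a minimal Python sketch, assuming Linux-style socket options. The function name and default values are illustrative. The thread does not establish whether pxp-agent or the broker expose settings like these.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 60,
                         interval_s: int = 60, probes: int = 5) -> None:
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform-specific (all three exist
    # on Linux), so set only the ones this platform exposes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of silence before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between probes that go unanswered.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the kernel gives up on the peer.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


# Usage: enable keepalive on a client socket before connecting it.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(client, idle_s=60)
client.close()
```

A 60-second idle value sits well under both the firewall's 60-minute idle timeout and the 5-minute F5 default mentioned in the thread.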
  • b

    bastelfreak

    08/16/2022, 8:12 PM
    but whether (and where) there is a firewall or any router in between, I don't know
  • b

    bastelfreak

    08/16/2022, 8:12 PM
    documentation is... rare
  • n

    nlew

    08/16/2022, 8:12 PM
    You could try decreasing the agents’ ping interval; that’s the `puppet_enterprise::pxp_agent::ping_interval` parameter
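A shorter ping interval helps only if pings arrive before the idle timer on every device in the path expires. The thread names two idle timeouts: the firewall's 60 minutes, and the F5 TCP profile default of 5 minutes that vchepkov mentions. Whether this deployment's F5 profile keeps that default is not confirmed.

Below is a minimal Python sketch of that check. The function name and the 2x safety factor are assumptions for illustration, not Puppet recommendations. The ping intervals in the usage lines are also illustrative; they are not the PE default.

```python
from typing import Dict


def fits_under_idle_timeouts(ping_interval_s: float,
                             idle_timeouts_s: Dict[str, float],
                             safety_factor: float = 2.0) -> Dict[str, bool]:
    """Report, per device on the path, whether pings come often enough.

    A ping only keeps a device's state entry alive if it arrives before the
    idle timer expires. Requiring ping_interval * safety_factor <= timeout
    (an assumed rule of thumb) leaves room for one delayed or lost ping.
    """
    return {
        device: ping_interval_s * safety_factor <= timeout
        for device, timeout in idle_timeouts_s.items()
    }


# Idle timeouts named in the thread, in seconds.
path = {"firewall": 60 * 60, "f5_default": 5 * 60}
print(fits_under_idle_timeouts(120, path))  # {'firewall': True, 'f5_default': True}
print(fits_under_idle_timeouts(200, path))  # {'firewall': True, 'f5_default': False}
```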
  • b

    bastelfreak

    08/16/2022, 8:14 PM
    I'm afraid that this has side effects and might kill the F5 or something 😄
  • b

    bastelfreak

    08/16/2022, 8:14 PM
    but, well
  • b

    bastelfreak

    08/16/2022, 8:14 PM
    something has to die in this setup
  • n

    nlew

    08/16/2022, 8:14 PM
    It could also be that there’s a maximum connection timeout (on the network device) that they’re exceeding, regardless of the idle timeout
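nlew's point is that pings only reset an idle timer. A hard cap on connection age would cut the connection no matter how much traffic it carries.

Below is a minimal Python model of that difference. It assumes the device measures silence from the last agent ping and ignores all other traffic. `first_drop` and the 86400-second lifetime in the usage are hypothetical. The thread only raises a maximum lifetime as a possibility.

```python
from typing import Optional, Tuple


def first_drop(ping_interval_s: float, idle_timeout_s: float,
               max_lifetime_s: Optional[float] = None
               ) -> Optional[Tuple[float, str]]:
    """Return (time, reason) of the first drop, or None if never dropped."""
    candidates = []
    # Pings are the only traffic, so the longest silence is one ping
    # interval. A ping arriving exactly at expiry is treated as too late.
    if ping_interval_s >= idle_timeout_s:
        candidates.append((idle_timeout_s, "idle timeout"))
    # A lifetime cap applies regardless of how often pings are sent.
    if max_lifetime_s is not None:
        candidates.append((max_lifetime_s, "max lifetime"))
    return min(candidates) if candidates else None


print(first_drop(120, 3600))                        # None
print(first_drop(30, 3600, max_lifetime_s=86400))   # (86400, 'max lifetime')
```

Lowering the ping interval from 120 to 30 seconds changes nothing once a lifetime cap is in play. That is why this case would need a fix on the network device rather than on the agents.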
  • v

    vchepkov

    08/16/2022, 8:15 PM
    ```
    # list ltm profile tcp tcp-custom
    ltm profile tcp tcp-custom {
        app-service none
        defaults-from /Common/tcp-lan-optimized
        keep-alive-interval 60
    }
    ```
  • n

    nlew

    08/16/2022, 8:15 PM
    If you’re noticing failures, that means the agent isn’t managing to reconnect as soon as it’s disconnected, as it’s meant to. To me, that implies the connection is being silently dropped rather than being actively closed.
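The distinction nlew draws can be shown with two local sockets. A reader sees an actively closed connection at once, as end-of-file. A silently dropped connection looks exactly like a quiet peer, so only an unanswered ping reveals it.

Below is a minimal Python sketch of the two cases. `socket.socketpair()` stands in for the agent-to-broker connection. A middlebox silently dropping state can't be reproduced in-process, so a peer that simply sends nothing plays that role. The helper `observe` is hypothetical.

```python
import socket


def observe(peer_closes: bool, wait_s: float = 0.2) -> str:
    """Return what the reading side sees after waiting up to wait_s seconds."""
    reader, peer = socket.socketpair()
    try:
        if peer_closes:
            peer.close()  # active close: the reader is told the stream ended
        reader.settimeout(wait_s)
        try:
            data = reader.recv(1024)
        except socket.timeout:
            # No EOF and no data: a quiet peer and a silently dropped
            # connection are indistinguishable from here.
            return "nothing yet (timeout)"
        return "closed (EOF)" if data == b"" else "data"
    finally:
        reader.close()
        peer.close()  # closing an already-closed socket is a no-op


print(observe(peer_closes=True))   # closed (EOF)
print(observe(peer_closes=False))  # nothing yet (timeout)
```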
  • v

    vchepkov

    08/16/2022, 8:16 PM
    I am using it for some 'idle' services with no negative effect
  • b

    bastelfreak

    08/16/2022, 8:16 PM
    @vchepkov what's your idle timeout for those connections?
  • v

    vchepkov

    08/16/2022, 8:16 PM
    default is 5 minutes