# puppet-enterprise
  • b

    bastelfreak

    08/16/2022, 8:40 PM
    while I really hate the appliances, I think they are common for enterprise customers? so if it's a general problem I assume more people would open support tickets?
  • n

    nlew

    08/16/2022, 8:40 PM
    This sort of issue comes up relatively frequently.
  • b

    bastelfreak

    08/16/2022, 8:40 PM
    oho!
  • b

    bastelfreak

    08/16/2022, 8:41 PM
    tell me more 😄
  • n

    nlew

    08/16/2022, 8:41 PM
    Well just in terms of 👋 “some network device seems to be causing trouble” 👋
  • b

    bastelfreak

    08/16/2022, 8:41 PM
    meh 😞
  • b

    bastelfreak

    08/16/2022, 8:41 PM
    yeah
  • b

    bastelfreak

    08/16/2022, 8:42 PM
    okay, will do some tcpdumping tomorrow I guess
  • n

    nlew

    08/16/2022, 8:42 PM
    We’ve been discussing how we can make it easier to manage/diagnose, and whether there are changes to the software to make it more resilient
  • n

    nlew

    08/16/2022, 8:42 PM
    But unidirectional packet loss is the worst
    ➕ 1
  • b

    bastelfreak

    08/16/2022, 8:42 PM
    do it like consul: support providing a list of brokers to the pxp-agent. let them pick a random broker and manage failover
  • b

    bastelfreak

    08/16/2022, 8:43 PM
    so we can eliminate loadbalancers (unless required for network separation)
  • n

    nlew

    08/16/2022, 8:45 PM
    It does support multiple brokers now, but I think it tries them in order, which is not so helpful for load balancing.
  • b

    bastelfreak

    08/16/2022, 8:45 PM
    yep. they are tried in order. I could randomize them. but support told me that's not recommended and loadbalancers are preferred
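The "pick a random broker and manage failover" idea from this exchange could be sketched roughly as follows. This is only an illustration, not how pxp-agent behaves: `pick_broker`, the `connect` callable, and the broker URIs used with it are hypothetical stand-ins.

```python
import random
from typing import Callable, Optional, Sequence


def pick_broker(
    brokers: Sequence[str],
    connect: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try the brokers in a random order; return the first that connects.

    `connect` is a hypothetical callable that returns True when a
    connection to the given broker URI succeeds. Returns None if every
    broker fails.
    """
    rng = rng or random.Random()
    candidates = list(brokers)  # copy so the caller's list is left untouched
    rng.shuffle(candidates)     # spread agents across brokers
    for uri in candidates:
        if connect(uri):
            return uri          # first reachable broker wins
    return None                 # caller decides when to retry
```

Shuffling per agent spreads connections across brokers without a load balancer in between, while the loop still falls through to the remaining brokers when one is unreachable.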
  • n

    nlew

    08/16/2022, 8:46 PM
    Even without a load balancer in the middle, there’s still likely to be other picky network devices along the route. 😕
  • b

    bastelfreak

    08/16/2022, 8:46 PM
    yes. but it would eliminate one potential error source
  • n

    nlew

    08/16/2022, 8:47 PM
    Yeah for sure.
  • n

    npwalker

    08/16/2022, 8:47 PM
    I thought you connected some of them directly without the LB and there were fewer or no errors?
  • b

    bastelfreak

    08/16/2022, 8:48 PM
    ah well. yes. I've two locations. one is small and has their own compiler. the pxp-agents/puppet-agents in that location connect directly to the local compiler
  • b

    bastelfreak

    08/16/2022, 8:48 PM
    the other location has the primary + 4 compilers. puppet-agent/pxp-agent there connect to the f5 and then to the compilers
  • c

    csharpsteen

    08/16/2022, 9:06 PM
    In general, "random % of nodes don't respond to Orchestrator" is caused by network devices dropping connections that are supposed to be persistent. Same was true of MCollective.
  • b

    bastelfreak

    08/16/2022, 9:11 PM
    I will see if I can do some debugging tomorrow
  • s

    Slackbot

    08/17/2022, 11:41 AM
    This message was deleted.
  • b

    bastelfreak

    08/17/2022, 1:18 PM
    coming back to the PXP debugging from last night: since 2020 the 15min idle timeout in the pcp-broker isn't 15min and it's also not hardcoded anymore. It got changed to 6min and is configurable: https://github.com/puppetlabs/pcp-broker/pull/227
  • s

    Slackbot

    08/18/2022, 3:54 AM
    This message was deleted.
  • m

    masterjc

    08/18/2022, 3:58 AM
    the status of the service itself is "Active: active (exited)" and not "Active: active (running)". Is there a way to ensure it starts if it failed, but still with the oneshot type? (per my understanding, oneshot causes it to be active but exited)
  • b

    bastelfreak

    08/19/2022, 2:19 PM
    again, it's me! debugging pxp agent. A question about the following log:
    2022-08-18 14:27:41.658329 DEBUG puppetlabs.cpp_pcp_client.connector:335 - Sending heartbeat ping
    2022-08-18 14:27:46.658799 WARN  puppetlabs.cpp_pcp_client.connection:670 - WebSocket onPongTimeout event
    2022-08-18 14:27:48.247769 DEBUG puppetlabs.cpp_pcp_client.connection:655 - WebSocket onPong event
    2022-08-18 14:29:41.658706 DEBUG puppetlabs.cpp_pcp_client.connector:335 - Sending heartbeat ping
    My understanding:
    • pxp-agent sends a keepalive every 120s
    • pcp-broker responds
    • pxp-agent expects the response within 5s
    • a "WebSocket onPong event" between onPongTimeout and the next heartbeat means that the agent receives the response, but too late? (I've a lot of random onPongTimeout log entries)
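One way to check the "response arrives, but too late" reading is to measure the gap between each "Sending heartbeat ping" line and the next "WebSocket onPong event" line. The sketch below assumes the log format is exactly as in the excerpt above (timestamp, level, logger, ` - `, message). `pong_delays` and `SAMPLE` are names made up for this example. For the excerpt, the pong arrives about 6.59s after the ping, which is past the 5s timeout.

```python
import re
from datetime import datetime

# Assumed line layout, taken only from the excerpt in this thread:
# "<date> <time.micros> <LEVEL> <logger:line> - <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \w+\s+\S+ - (.*)$"
)

SAMPLE = """\
2022-08-18 14:27:41.658329 DEBUG puppetlabs.cpp_pcp_client.connector:335 - Sending heartbeat ping
2022-08-18 14:27:46.658799 WARN  puppetlabs.cpp_pcp_client.connection:670 - WebSocket onPongTimeout event
2022-08-18 14:27:48.247769 DEBUG puppetlabs.cpp_pcp_client.connection:655 - WebSocket onPong event
2022-08-18 14:29:41.658706 DEBUG puppetlabs.cpp_pcp_client.connector:335 - Sending heartbeat ping
"""


def pong_delays(lines):
    """Return the seconds between each heartbeat ping and the next onPong."""
    delays = []
    last_ping = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        message = match.group(2)
        if message == "Sending heartbeat ping":
            last_ping = stamp
        elif message == "WebSocket onPong event" and last_ping is not None:
            delays.append((stamp - last_ping).total_seconds())
            last_ping = None  # pair each ping with at most one pong
    return delays


if __name__ == "__main__":
    for delay in pong_delays(SAMPLE.splitlines()):
        print(f"pong after {delay:.3f}s", "(late)" if delay > 5 else "")
```

Running this over a full pxp-agent log would show whether the late pongs cluster just past 5s or are spread much wider.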
  • b

    bastelfreak

    08/19/2022, 2:19 PM
    as a workaround I wanted to increase the 5s timeout to 10s. but that's hardcoded 😞
  • n

    nlew

    08/19/2022, 4:59 PM
    Yep that’s right. The timeout is hardcoded but even if you increased it, I saw in the logs at least one pong that came in 59 seconds after the ping.