# puppet-enterprise
  • b

    bastelfreak

    08/19/2022, 5:03 PM
    yep. most of them have a delay of a few seconds, just a few are around 59
  • b

    bastelfreak

    08/19/2022, 5:03 PM
    okay, so the question is, where is the delay coming from 🤔
  • n

    nlew

    08/19/2022, 5:12 PM
    Can you correlate a “late” pong with the pcp-broker logs to determine whether the broker a) received the ping in a timely manner, and/or b) sent the ping in a timely manner?
  • n

    nlew

    08/19/2022, 5:13 PM
    It looks like when things are functioning properly, pongs come back in a couple of milliseconds. My assumption is that the packets are being delivered late in one direction or the other, so either the broker receives the ping ~59 seconds after it was supposed to, sends back a response immediately and that response is delivered immediately (very late), or the broker receives the ping immediately, sends back the response immediately, but the response is delivered ~59 seconds later.
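The two cases described above can be told apart once timestamps from both ends are lined up. A minimal sketch, assuming agent and broker clocks are synchronized and that the three timestamps have already been pulled out of the agent and pcp-broker logs. The names `ping_sent`, `broker_seen`, `pong_received` and the 1-second threshold are hypothetical, not pxp-agent or pcp-broker settings:

```python
from datetime import datetime, timedelta


def classify_late_pong(ping_sent, broker_seen, pong_received,
                       threshold=timedelta(seconds=1)):
    """Report which leg of a ping/pong round trip was slow.

    All arguments are datetimes. The return leg also includes the
    broker's own processing time, which is assumed to be negligible.
    """
    to_broker = broker_seen - ping_sent
    to_agent = pong_received - broker_seen
    if to_broker > threshold and to_agent > threshold:
        return "both directions delayed"
    if to_broker > threshold:
        return "ping delayed on the way to the broker"
    if to_agent > threshold:
        return "pong delayed on the way back to the agent"
    return "on time"


# Illustrative timestamps only: the ping takes 59 s to reach the broker,
# and the pong comes back 3 ms later.
t0 = datetime(2022, 8, 19, 17, 0, 0)
print(classify_late_pong(
    t0,
    t0 + timedelta(seconds=59),
    t0 + timedelta(seconds=59, milliseconds=3),
))
```

If the clocks are not synchronized, only the total round-trip time measured on the agent is reliable, and the split between the two legs is not.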
  • n

    nlew

    08/19/2022, 5:19 PM
    I’m unclear on what the buffering behavior of the F5 is, but I also wonder if some of this traffic is being buffered somewhere.
  • v

    vchepkov

    08/19/2022, 5:22 PM
    I can tell for sure, no 🙂
  • b

    bastelfreak

    08/19/2022, 5:22 PM
    my problem is that the broker doesn't log where it received a ping from / where it sends it to
  • b

    bastelfreak

    08/19/2022, 5:22 PM
    and I have around 200 agents connected to each broker and I cannot isolate a broker. Can the logback config be adjusted so it logs the remote hostname/certname?
  • n

    nlew

    08/19/2022, 5:26 PM
    Ah yeah it looks like all the ping behavior is handled by jetty directly, hmm
  • n

    nlew

    08/19/2022, 5:26 PM
    Turning on debug logging for `org.eclipse.jetty.websocket.common` might show it. I need to test that locally though
  • n

    nlew

    08/19/2022, 5:40 PM
    Well, that does show it, but it also logs way more than just that
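Because debug logging on that package is noisy, the output could be narrowed after the fact. A rough sketch, assuming the relevant debug lines mention the frame type as the whole word `PING` or `PONG`. That is an assumption about the log format, not something confirmed in this thread. The sample lines and the name `agent-a` are placeholders, not real Jetty output:

```python
import re

# Assumption: ping/pong debug lines contain the frame type as a whole word.
PING_PONG = re.compile(r"\bP[IO]NG\b", re.IGNORECASE)


def keep_ping_pong(lines):
    """Return only the lines that mention a ping or pong frame."""
    return [line for line in lines if PING_PONG.search(line)]


# Placeholder lines for illustration; real log lines will look different.
sample = [
    "placeholder: PING frame from agent-a",
    "placeholder: unrelated debug noise",
    "placeholder: PONG frame to agent-a",
]
for line in keep_ping_pong(sample):
    print(line)
```

The same function can be pointed at the lines of a real log file once the actual format has been checked.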
  • n

    nlew

    08/19/2022, 5:56 PM
    Is there any correlation between these slow pongs and when you are actually running task jobs? That is, does this only happen when the broker is actually under some amount of load?
  • b

    bastelfreak

    08/19/2022, 6:04 PM
    no. I see the timeouts all the time. I haven't found a pattern yet. it varies from 5 to 100 timeouts per day per node
  • b

    bastelfreak

    08/19/2022, 6:05 PM
    and primary/compiler VMs are kinda beefy, and we rarely run more than two tasks at the same time (with 1 to 150 targets maybe)
  • n

    nlew

    08/19/2022, 6:06 PM
    The only way I could imagine your environment creating substantial load is if you’re running tasks that return really large results (MBs)
  • b

    bastelfreak

    08/19/2022, 6:07 PM
    mhm
  • b

    bastelfreak

    08/19/2022, 6:07 PM
    we have some envs where every class does a notify to print the class name
  • b

    bastelfreak

    08/19/2022, 6:08 PM
    and when a task runs puppet there is a lot of output. maybe a few hundred lines. but should be several MB
  • b

    bastelfreak

    08/19/2022, 6:08 PM
    does the broker or orchestrator have any metrics I should take a look at?
  • n

    nlew

    08/19/2022, 6:08 PM
    I would expect that to only cause a problem when that task is actually running though 😕
  • b

    bastelfreak

    08/19/2022, 6:08 PM
    and we didn't tune any of the default pcp message size options
  • b

    bastelfreak

    08/19/2022, 6:09 PM
    mhm yeah
  • n

    nlew

    08/19/2022, 6:10 PM
    There are KB articles like this https://support.f5.com/csp/article/K87173503 but I assume that would be a problem for all TCP traffic through the load balancer, not just PXP
  • n

    nlew

    08/19/2022, 6:10 PM
    And if the problem is simply network congestion/latency, why don’t other applications experience problems?
  • n

    nlew

    08/19/2022, 6:15 PM
    It’s hard to imagine, though not impossible, network conditions where the same request/response might take any of: 3 milliseconds, 7 seconds, and 59 seconds.
  • b

    bastelfreak

    08/19/2022, 6:52 PM
    maybe other applications are not that sensitive to latency