# puppet-enterprise
  • b

    bastelfreak

    08/26/2025, 9:24 PM
I recommend using the puppet_operational_dashboards module and tuning based on the statistics it delivers
  • j

    jms1

    08/26/2025, 9:24 PM
    not familiar with that, is that part of PE or is it a third-party thing?
  • b

    bastelfreak

    08/26/2025, 9:25 PM
    https://forge.puppet.com/modules/puppetlabs/puppet_operational_dashboards/readme
  • j

    jms1

    08/26/2025, 9:28 PM
    okay, so ... that looks interesting (and i've written it down) but it's not something i'm able to dig into right now ... back to my original question, does the output of
    puppet infrastructure tune
    change based on how many managed nodes are listed in the database and/or how often those nodes check in?
  • b

    bastelfreak

    08/26/2025, 9:31 PM
    I assume so, but no idea. I haven't used that in ages
  • b

    bastelfreak

    08/26/2025, 9:31 PM
    set reserved code cache to 2G, start 1 jruby instance per CPU core, add 1G HEAP per jruby instance and see how it goes
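That rule of thumb can be written out as a quick back-of-the-envelope calculation. This is only a sketch of the starting point suggested above; the core count is a hypothetical input, and real sizing should be validated against metrics:

```python
def rule_of_thumb(cpu_cores):
    """Starting point from the advice above: 1 JRuby per CPU core,
    1 GB of heap per JRuby, and a fixed 2 GB reserved code cache."""
    jrubies = cpu_cores
    return {
        "jruby_instances": jrubies,
        "heap_gb": jrubies * 1,       # 1 GB HEAP per JRuby instance
        "reserved_code_cache_gb": 2,  # fixed 2 GB code cache
    }

print(rule_of_thumb(8))
```

For an 8-core compiler this suggests 8 JRubies, 8 GB of heap, and a 2 GB code cache; it deliberately ignores everything else running on the box.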
  • j

    jms1

    08/26/2025, 9:32 PM
    the "see how it goes" is what i'm hoping to avoid.
  • b

    bastelfreak

    08/26/2025, 9:33 PM
it makes a difference whether your nodes have 10 or 5000 managed resources
  • b

    bastelfreak

    08/26/2025, 9:33 PM
    and compiling a catalog with 50 PQL queries is way more expensive than 0 queries
  • b

    bastelfreak

    08/26/2025, 9:33 PM
so estimating required resources is like rolling dice
  • j

    jms1

    08/26/2025, 9:34 PM
    i've been doing that with PE 2016 for almost ten years now, and if this command can just give somebody a set of values that comes close, then i won't have to spend two hours figuring out the right values every time one of the processes crashes, or every time the kernel oom-kills one of them because somebody guessed too big.
  • b

    bastelfreak

    08/26/2025, 9:34 PM
    you can roughly calculate how long a catalog compilation takes and interpolate that for amount of nodes / runtime
  • b

    bastelfreak

    08/26/2025, 9:35 PM
    well throwing too many resources at it for such a small environment is easy
  • c

    csharpsteen

    08/26/2025, 9:37 PM
    puppet infra tune --estimate
will attempt an estimation based on average compile time. But, for best results:
• Set up metrics monitoring
• Ensure your agent runs are evenly distributed by eliminating thundering herds
• Add JRubies to compilers until you get to the CPU count, or a max of about 12 per compiler (Java code cache limits how far JRuby can scale vertically)
• Add compilers until metrics show no requests waiting for JRuby services
• Then, add one more compiler to give enough spare capacity to allow for reboots or other maintenance
  • b

    bastelfreak

    08/26/2025, 9:38 PM
    without the proper metrics, it's so so hard to figure out if you are exhausting the code cache / HEAP. and those are important values
  • b

    bastelfreak

    08/26/2025, 9:39 PM
if that's exhausted, adding more jrubies will just make it worse
  • c

    csharpsteen

    08/26/2025, 9:39 PM
    The key bit is that node performance is site-specific. JRuby time consumed depends on what mix of module code is being assigned to your population. So, you need metrics in order to see how the load from your nodes is affecting the compiler pool.
  • b

    bastelfreak

    08/26/2025, 9:40 PM
    and you usually need to reevaluate it from time to time, because your puppet code base grows / you update modules
  • j

    jms1

    08/26/2025, 9:50 PM
    so it sounds like ... (1) the theoretical ideal values will depend on the size of the code base, and on how many nodes are asking for catalogs at the same time ... (2) it is useful to re-evaluate the values from time, based on code growth and node growth (which is the question i had in my brain, even if it isn't what came tumbling out of my fingers) ... (3) seeing metrics from the PE server is an important part of doing that evaluation ... (4) the
    puppet infrastructure tune
    command is a "quick and dirty" way to get some starting values for a newly built PE server ... unless i got something wrong, i think that's enough for what i need right now, AND it gives me a direction to move toward in the future. thank you both
  • b

    bastelfreak

    08/26/2025, 9:51 PM
    2G reserved code cache fixes 50% of the issues for larger installations 👀
  • k

    kenyon

    08/26/2025, 9:53 PM
    I rerun
    puppet infra tune
    after each PE upgrade, but we really need the operational dashboards, just haven't been able to prioritize setting that up
  • c

    csharpsteen

    08/26/2025, 9:54 PM
    puppet infrastructure tune --estimate
    will give an estimated count of jrubies needed based on the node count and average compile time sourced by PuppetDB. This is done via Little's Law, which is very idealistic (it assumes no variance between nodes, i.e. no "expensive" outliers). That number is then padded out to target 50% capacity instead of 100% as the effect of outliers on queue latency tends to go exponential around 80% saturation.
    --estimate
    can provide a starting point. But, the math it is able to use is very simple, therefore monitoring actual performance remains important.
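The Little's Law estimate described above can be reproduced in a few lines. This is a sketch under the same idealistic assumptions (no expensive outliers); the node count, run interval, and compile time are example inputs, and the 50% utilization target reflects the padding described above:

```python
import math

def estimate_jrubies(nodes, run_interval_s, avg_compile_s, target_util=0.5):
    # Little's Law: average concurrent requests = arrival rate * service time
    arrival_rate = nodes / run_interval_s       # catalog requests per second
    concurrent = arrival_rate * avg_compile_s   # JRubies busy on average
    # Pad so the pool runs at ~50% capacity, since queue latency tends to
    # grow sharply as saturation approaches ~80%.
    return math.ceil(concurrent / target_util)

# 100 nodes on a 30-minute interval, 30 s average compile time
print(estimate_jrubies(100, 1800, 30))  # -> 4
```

Dropping the padding (`target_util=1.0`) gives the raw concurrency figure of 2 JRubies, which is why the padded number is roughly double.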
  • j

    jms1

    08/26/2025, 9:55 PM
    i think our busiest PE2016 server is managing just over 100 machines ... as for the code base ... catalogs (again, this is PE2016) seem to take about 25-30 seconds to compile and 45-50 seconds to run if no changes are made ... and i don't honestly know if our catalogs are "bigger than usual" compared to other sites.
  • j

    jms1

    08/26/2025, 9:56 PM
    (root@jc1) # cd $( puppet config print client_datadir )/catalog/
    (root@jc1) # jq -r -M '.resources|length' *.json
    4184
  • c

    csharpsteen

    08/26/2025, 9:56 PM
    Also,
    --estimate
will tell you that you need +infinity JRubies if used while performance is already underwater. So, it's a tool for planning only. If you have a performance issue the three levers are:
• Add JRuby instances
• Increase the Puppet run interval (eliminating thundering herds also helps a bit here by cutting the "peak" request rate)
• Review module code and cut or re-factor expensive classes
  • c

    csharpsteen

    08/26/2025, 9:59 PM
All of the above also assumes catalog compilation is the largest expense. This is usually the case. But, for some installations the following are also significant:
• Time spent serving files. Especially large files. Offload these to dedicated file servers.
• Time spent processing reports. Usually pops up if custom report processors are in use, as the default PuppetDB processor offloads the data to PuppetDB as quickly as it can.
• Time spent getting node classification. Usually pops up in Open Source Puppet or Puppet Core installations that use custom ENC plugins.
  • c

    csharpsteen

    08/26/2025, 10:09 PM
For 100 nodes running every 30 minutes, with evenly distributed start times, and averaging 30 second compile times, the napkin math for catalog compilation would be:
100 nodes * (1 catalog request / 1800 seconds) * 30 seconds ≈ 1.7 node-catalog requests to handle at all times
So, 2 JRubies to soak up that load and prevent a wait list from growing. Double that to 4 to leave extra capacity and room for outliers. More may be needed if file serving, report processing, or node classification are not a rounding error on top of that 30 second compile time, or there is a lot of variance in run start times or catalog compilation times.
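The napkin math above, written out step by step with the same example numbers:

```python
import math

nodes = 100
run_interval_s = 30 * 60   # each node runs every 30 minutes
avg_compile_s = 30

# average number of catalog requests in flight at any moment
concurrent = nodes * (1 / run_interval_s) * avg_compile_s
print(round(concurrent, 2))   # -> 1.67

soak = math.ceil(concurrent)  # JRubies needed to keep the wait list from growing
with_headroom = soak * 2      # double for outliers and spare capacity
print(soak, with_headroom)    # -> 2 4
```

The doubling step is crude on purpose: it is cheaper to over-provision slightly than to discover the 80% saturation knee in production.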
  • a

    Adrian Parreiras Horta

    08/26/2025, 10:13 PM
    Another anecdotal but still useful recommendation: no more than 12 JRubies per compiling server and no more than 2GB of ram per JRuby. Anything more than that is a bandaid over a problem that needs investigating
  • j

    Jason St-Cyr

    08/28/2025, 8:13 PM
⚠️ NOTICE: Upcoming Maintenance Window for the Puppet Forge
WHEN: Monday, September 1st, 2025, between 06:30 and 07:30 AM GMT
IMPACT: There may be brief downtime on forge.puppet.com during a switchover to an upgraded backend service.
WHAT: We are currently planning an upgrade window for some backend infrastructure that the Puppet Forge application relies upon. The forge.puppet.com application might have a brief downtime of a few minutes during the maintenance window. This outage may also impact automation that pulls modules from the Forge domain, so we recommend ensuring that retries have been set up in your automation so that it will reconnect once the system is back online.
  • k

    kenyon

    08/28/2025, 9:50 PM
    FYI there is a duplicate resource between this https://github.com/puppetlabs/puppetlabs-puppet_agent/blob/306809afba21d80e60c11c5a5714a4e04e93e492/manifests/osfamily/debian.pp#L51-L54 and line 111 of
    puppet_enterprise::repo::config
    in PE 2025.5.0