Adrian Parreiras Horta
12/04/2025, 9:25 PM
jms1
12/04/2025, 10:20 PM r10k-alike script, some agents will occasionally fail with an error message like "unable to find module stdlib", because they happen to request a catalog while their environment is in the middle of being rebuilt. for PE2023 the "real" PE servers (being used to configure normal machines) will be using code manager so i don't expect this to ever happen, but on the "dev" PE servers (only used to prototype and test new puppet code before it gets committed and pushed to a repo) the environments are still being built by hand, because having to commit, push, and then wait for code manager to rebuild the environment, takes long enough that i lose track of what i was doing ... instead i use a script to rsync my changes directly to the environment, and i can go from "save changes" to "run puppet agent -t on a scratch machine" in about five seconds.
jms1
12/04/2025, 10:23 PM r10k wrapper now known as "code manager", had some kind of secret API that it used to (1) wait until the compiler wasn't using a given environment, (2) "lock" that environment so the compiler wouldn't use it, (3) rebuild the files in the environment, and (4) "unlock" it so the compiler could build catalogs again ... but they also said "i think that's an internal thing that puppetlabs (at the time) doesn't want to share the details of"
jms1
12/04/2025, 10:25 PM
jms1
12/04/2025, 10:26 PM
CVQuesty
12/05/2025, 1:26 PM
csharpsteen
12/05/2025, 7:04 PM pe-puppetserver.
The server side runs on the PE Primary and exposes an HTTP API that receives deployment requests, which it handles by:
• Authorizing the request using a PE RBAC token.
• Running r10k deploy environment for each control repo branch listed in the request. This step includes logic to safely run r10k deploy concurrently against multiple branches along with logic to prune the r10k caches to prevent them from growing too large.
• When r10k finishes running, post-run scripts are executed. By default, this includes running puppet generate types on the environment, if needed.
• The result of r10k deploy environment + post-run scripts is committed to an internal Git repository known as "File Sync Storage".
• Optionally, the API call may wait before returning an HTTP response until all clients ACK that the deployment is live or a timeout has elapsed.
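(Illustrative sketch, not part of the original message.) A deployment request of the kind described above could be assembled roughly as follows. The port 8170, the /code-manager/v1/deploys path, the X-Authentication header, and the "environments" and "wait" body keys are given here as recalled from the PE documentation and should be checked against your PE version; the hostname and token are hypothetical placeholders. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_deploy_request(primary, token, environments, wait=True):
    """Build (but do not send) a Code Manager deploy request.

    Assumptions to verify against your PE version: port 8170, the
    /code-manager/v1/deploys path, and the X-Authentication header
    carrying a PE RBAC token.
    """
    url = f"https://{primary}:8170/code-manager/v1/deploys"
    body = json.dumps(
        {
            # One r10k deploy runs per control repo branch listed here.
            "environments": list(environments),
            # wait=True asks the API to hold the response until the deploy
            # has been acknowledged or a timeout has elapsed.
            "wait": wait,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Authentication": token,
        },
    )
```

Passing the result to urllib.request.urlopen with an SSL context that trusts the PE CA would actually submit it; that step is left out so the sketch has no side effects.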
The client side runs in pe-puppetserver and pe-orchestration-services and:
• Polls the Primary for new deployments, every 5 seconds. This polling request also serves as the deployment ACK by notifying the Primary of the latest environment versions the client has deployed.
• If there are new deployments, runs a git fetch operation to pull commits from File Sync Storage to local client copies of the repository.
• Deploys updated environments brought in by git fetch. These updates are made atomic either by a very heavy JRuby read+write lock (legacy deployment) or by creating a new versioned copy of the environment and updating a symlink (lockless deployment, modern default). Lockless deployment gets a significant speed boost from modern GNU coreutils that default to cp --reflink=auto and a filesystem that supports reflinks (XFS, BTRFS, ZFS, notably NOT ext4). Basically, for best performance run PE infrastructure on RHEL 9 or newer or Ubuntu 24.04 or newer (but don't use Ubuntu's filesystem default of ext4).
• Performs cleanup of superseded environment content and git history.
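(Illustrative sketch, not part of the original message and not Code Manager's actual code.) The "versioned copy plus symlink update" idea behind lockless deployment can be shown with a generic POSIX pattern: copy the new content to its own directory, create a symlink under a temporary name, then rename that symlink over the live one. The function name, directory layout, and version strings below are hypothetical, and the plain shutil.copytree stands in for the reflink-aware copy mentioned above.

```python
import os
import shutil


def deploy_lockless(source_dir, versions_dir, live_link, version):
    """Copy source_dir to versions_dir/<version>, then repoint live_link at it."""
    target = os.path.join(versions_dir, version)
    # Ordinary recursive copy; no reflink optimisation is attempted here.
    shutil.copytree(source_dir, target)

    # Build the new symlink under a temporary name, then rename it over the
    # live one. rename(2) replaces the destination atomically on POSIX, so a
    # reader resolves either the old version or the new one.
    tmp_link = f"{live_link}.tmp-{version}"
    os.symlink(target, tmp_link)
    os.replace(tmp_link, live_link)
    return target
```

Older version directories stay on disk after the swap, which is why a separate cleanup step for superseded content is needed.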
Code Manager also serves a dual purpose when DR is enabled in that it syncs deployed code, CA state, and PE configuration from the Primary to the Replica.
csharpsteen
12/05/2025, 7:13 PM file resources that use content deployed through Code Manager.
This provides two benefits:
• An entire source of JRuby contention is eliminated as agents no longer have to make file_metadata requests to determine expected checksums, it's all just there in the catalog. file_content requests are also cheaper as they stay in the Java layer and hit the JGit service instead of going down to JRuby.
• The agent gets file content from the same deployment that its catalog was compiled from. Not a different version that may have come down in a subsequent deployment.
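(Illustrative sketch, not part of the original message and not the agent's real code.) With the expected checksum carried in the catalog, deciding whether a file needs new content becomes a purely local comparison. The "{algorithm}hexdigest" checksum format and the function name below are assumptions made for the example.

```python
import hashlib


def needs_content_fetch(local_path, inlined_checksum):
    """Return True when the local file differs from the checksum in the catalog.

    inlined_checksum is assumed to look like "{sha256}<hexdigest>".
    """
    algo, _, expected = inlined_checksum.lstrip("{").partition("}")
    digest = hashlib.new(algo)
    try:
        with open(local_path, "rb") as fh:
            # Hash in chunks so large files are not read into memory at once.
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
    except FileNotFoundError:
        # A missing file always needs its content fetched.
        return True
    return digest.hexdigest() != expected
```

No metadata round trip to the server is involved; only a mismatch would lead to a content request.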
IIRC, the above combined into something like a 20% cut to the JRuby load in Puppet Labs' internal infrastructure when it was benchmarked years ago. Mileage will vary; custom file mounts serving large blobs are still something to shift over to a dedicated file or artifact server.
jms1
12/05/2025, 8:29 PM $DAYJOB has a "use it or lose it" policy for PTO)
kenyon
12/09/2025, 6:08 PM service { 'puppet': ...})? I see in the puppet_agent module that PE is supposed to manage it somehow, but I don't see anything in the puppet_enterprise module nor in the PE docs about how to manage it. https://github.com/puppetlabs/puppetlabs-puppet_agent/blob/b2ec88bf5a8fa331a485d5770dff6d51a0d07dd4/manifests/init.pp#L245-L253
kenyon
12/09/2025, 6:09 PM
kenyon
12/09/2025, 6:44 PM
kenyon
12/09/2025, 6:47 PM
csharpsteen
12/09/2025, 7:52 PM puppet manage puppet tends to go very weird (making a process kill itself is just signing up for odd occurrences).
Using the service task to stop puppet would be my recommendation. Do not stop the service for an extended period of time on Infrastructure nodes; executing puppet agent --disable is a better approach there so that they stay active in PuppetDB.
kenyon
12/09/2025, 8:36 PM service task from a puppet run?
I thought about it being weird managing itself, but I just tested this and the systemd unit has KillMode=process, so restarting the service doesn't kill the puppet process that is applying the catalog, so it seems to achieve the desired effect of instantiating a new environment for the agent
bastelfreak
12/09/2025, 8:41 PM
csharpsteen
12/09/2025, 9:16 PM service task outside of a Puppet run, from Orchestrator.
Puppet is great at managing services, but having it manage its own puppet service is a fundamentally different animal. I can say from a decade+ of experience that that is a briar patch full of sharp edges. Been there, got the scars, find a different way to do it if you can.
csharpsteen
12/09/2025, 9:23 PM
csharpsteen
12/09/2025, 9:28 PM reboot resource gets that all in one operation.
kenyon
12/09/2025, 9:35 PM reboot resource is an idea. We do that for physical machines because they have to be unplugged and moved after provisioning, but of course that doesn't happen for VMs. Have to be very sure that my domain join exec doesn't cause unnecessary reboots though
kenyon
12/09/2025, 9:43 PM
hashim vayalar
12/10/2025, 6:34 AM
vchepkov
12/10/2025, 6:53 AM
vchepkov
12/10/2025, 6:54 AM
hashim vayalar
12/10/2025, 10:12 AM
jms1
12/10/2025, 2:32 PM
bastelfreak
12/10/2025, 2:33 PM
jms1
12/10/2025, 2:37 PM
bastelfreak
12/10/2025, 2:41 PM
csharpsteen
12/10/2025, 3:00 PM as a PE trial user you have access to the agent on the operating system you’ve installed the Puppet server on. https://help.puppet.com/pe/2023.8/topics/purchasing_and_installing_a_license_key.htm