Is anyone having an issue with the Ortus docker im...
# box-products
p
Is anyone having an issue with the Ortus docker images just... not doing anything when launched? I have a docker-compose with the following service:
services:
  intranet_cfml:
    image: ortussolutions/commandbox:latest
    container_name: intranet_cfml
    hostname: intranet_cfml
    environment:
      - BOX_INSTALL=true
      - BOX_SERVER_APP_CFENGINE=adobe@2021
      - cfconfig_adminPassword=password
      - TZ="US/Pacific"
    ports:
      - 80:8080
    volumes:
      - ./app:/app
    networks:
      - intranet_network
But the log in the Docker dashboard does nothing. Running on an Apple M2 MacBook Air.
b
What does the console output show?
p
Absolutely nothing. Pruning the system and re-downloading the images is doing nothing either... it's... running, but it's not doing anything?
b
What command are you running to start it?
p
`docker compose up -d` in the same directory with the docker-compose.yml file. Would you like the entire .yml file contents?
b
Take off the `-d` and what do you see?
You can also bring up just that `intranet_cfml` service
p
Nada!
b
Wow
What if you try `docker run` by itself (to avoid attaching to the container). I assume there's some sort of low level error but I can't imagine what
p
Let's try. 🙂
b
Also check for any obvious issues like out of disk space, etc
p
1.6 TB free. I'm clear there.
Still nada.
Other containers are working. ACF docker works when it works. (Due to emulation). SQL Server works. It's only commandbox that's sitting and spinning.
But this is clearly running the container.
Just... no output and nothing works. 😐
p
Using your yml, I tossed the network and hostname and it spun up for me but I am on a Mac M1 and Java 11
even did a dump of cgi just to verify working
under docker instance you have there, what does the inspect tab show you have running for Java?
p
Standby... relaunching the container. I purged everything and reset Docker as part of my troubleshooting.
jdk-11.0.16.1+1
p
Have you run any CF/commandbox stuff on your mac yet?
p
Indeed I have. The change was Ventura.
But weird that it would affect CommandBox and not other containers.
p
Ahh I have not made that upgrade yet. But it could likely be a security issue, typically check Security/Privacy to see if something is blocked.
Could be from it not being an officially signed app
p
Including the ACF container. (Which, again, only works 35% of the time anyway due to AMD64 emulation.)
p
Never had any issues running CommandBox w/2021 on my M1/Monterey like that... strange
p
Nothing in Privacy & Security that could be blocking. I'm going to hail-mary reinstall Docker.
Complete Docker uninstall/reinstall didn't help. 🤨
p
My instance hung for 30-60 sec or less to prob grab the image then it was off to the install.
Sounding like Ventura is gonna be a pain down the road 😕
p
It's so strange because Docker is acting like the container is running without issue. I can start it up and shut it down like it's operating under normal circumstances. Otherwise it's just the container itself that isn't doing anything.
p
workaround sounds like network settings of all random things heh
p
And like I said, SQL Server? No problem. ACF? No problem; when it works. MySQL? No problem. I'm going to try the Lucee container to see if it's dying. I know the code will need some changes to work on Lucee, but it may help troubleshoot.
p
But were those setups you had already b4 upgrading the OS?
In Docker that is
p
No, I'm running them presently.
After purging everything. CommandBox is the only one giving me grief.
p
well maybe that suggestion on the link about network is the problem; dunno. It is what I tossed and it spun up
p
Removing the network from the docker compose didn't help. Adding the network configuration in the settings was strange... the value was already there. 😕
b
@pegarm Sorry, had to run an errand. Normally when box hangs on Windows, it's antivirus going crazy, but I wouldn't expect that to be an issue here
Can you override the `CMD` to be something like `bash` and then poke around inside the container
• try running `box` directly
• try running `run.sh` directly
There should be output from the bootstrap shell scripts before even reaching the parts that call `box`
@jclausen Have you ever seen our docker containers just hang with no output?
p
Please. You’re helping me. I have all the patience in the world.
j
I have seen that happen in two cases, and it usually has nothing to do with the image or code:
1. There is a port binding conflict and docker networking can’t establish the external port
2. There is a core issue where some thread in the Docker engine of the machine is hung
In both cases, restarting Docker desktop usually resolves it.
I would check for the first, since you are binding to port 80. If something else grabbed it, that could be the reason.
p
I did check that and tested multiple ports. Nothing else is grabbing port 80, and binding to any of the other ports I tried didn’t have any effect.
j
Did you restart Docker desktop?
p
Restarted Docker Desktop. Restarted the entire machine. Reinstalled Docker Desktop.
j
Did you run `docker-compose up -d --build` after you changed the compose config? It won’t pick up changes to the compose file unless you rebuild.
You can also try this one-off command without networking:
docker run ortussolutions/commandbox /bin/sh -c "box version"
If that succeeds, then I would suspect a networking issue.
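Since the symptom in this thread is a command that never returns, it can help to bound each probe with a time limit so that a hang is reported instead of blocking the terminal. A minimal sketch, assuming GNU `timeout` is available (it ships with coreutils on Linux; on macOS it would need to be installed separately). `run_with_timeout` is a hypothetical helper name, not part of Docker or CommandBox:

```shell
# Run a command, but give up after a number of seconds.
# GNU timeout exits with status 124 when the time limit is hit.
run_with_timeout() {
  secs="$1"
  shift
  status=0
  timeout "$secs" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "HUNG: no exit after ${secs}s: $*" >&2
  fi
  return "$status"
}

# Example (assumes Docker is installed and the image can be pulled):
# run_with_timeout 120 docker run --rm ortussolutions/commandbox /bin/sh -c "box version"
```

A status of 124 only says the command did not finish in time; it does not say why.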
p
Is `--build` necessary if I did a `docker system prune -a`, `docker volume prune` and `docker network prune` in between?
...and let it completely rebuild from the pull?
j
Yes. The build flag is always necessary. The compose definitions are stored in a different space
p
On it.
j
I also see you have `intranet_network` declared as a network on the definition. I would try the whole compose file without network declarations. You don’t need them in a single stack.
p
I did. With no effect. Although I didn't `--build` in between.
j
The only time you would use that attribute is if you have a network declared on the docker host that the containers in the stack need access to.
p
Although, the larger docker compose has two containers... one for cfml and one for sql
j
You also don’t need this: `BOX_SERVER_APP_CFENGINE=adobe@2021`
You can just use the image `ortussolutions/commandbox:adobe2021` instead.
It will save you the download time of the engine.
Even so, they will know how to talk to each other if they are in the same compose file.
You just use the service name as the host name.
p
My updated (and full) docker-compose.yml:
version: "3.8"

services:
  intranet_cfml:
    image: ortussolutions/commandbox:adobe2021
    container_name: intranet_cfml
    hostname: intranet_cfml
    environment:
      - BOX_INSTALL=true
      - cfconfig_adminPassword=redacted
      - TZ="US/Pacific"
    ports:
      - 80:8080
      - 8500:8500
    volumes:
      - ./app:/app
      - ../common/app:/app/common

  intranet_sql:
    image: mcr.microsoft.com/azure-sql-edge:latest
    container_name: intranet_sql
    hostname: intranet_sql
    environment:
      - ACCEPT_EULA=Y
      - MSSQL_SA_PASSWORD=P@ssword
    ports:
      - 1433:1433
    volumes:
      - mssqldata:/var/opt/mssql
      - ./data:/data
    networks:
      - intranet_network

volumes:
  mssqldata:
j
You also don’t need `hostname` declared, as those will be the same names you already have present.
p
Quit pokin' holes in my stupidity. 🤣
j
Lol! I’m not poking! Just trying to remove any redundancies for the purposes of debugging. 🙂
Try this one-off startup to see if the container starts outside of compose:
docker run --rm -p 80:8080 ortussolutions/commandbox:adobe2021
Oh, I see you did the one-off startup above.
Did you, by any chance, recently restore from an X86 mac to an M1?
p
Nope. Clean install. Format the SSD on the M2 Mac, download and fresh install of the OS
Just ran the --build against that latest docker-compose.yml (minus the hostname) and still just dies.
j
Weird. What kind of memory/CPU settings do you have set for Docker on your machine?
p
message has been deleted
j
I suggest giving it all of the CPUs. Does your `server.json` file have any heap size settings? I would also suggest turning on the VirtioFS experimental feature. On the M1, it makes file operations on volume mounts about 20x or more faster.
Here’s what I currently run:
Unless you already have data in the database container, can you try running a `docker-compose down` in the root directory of that compose file?
That will clean up every reference to the stack before rebuilding
p
I've done many `docker compose down`'s as well as purges.
New settings. Still no dice.
j
I’m wondering if it’s having trouble with directory permissions, possibly?
p
message has been deleted
It "dies" here.
j
Are you able to see it in Docker Desktop again?
p
"Dies" because the container is running.
j
What is the output of `docker inspect relaxed_thompson`?
p
And I can terminal into the machine and do stuff on it.
j
Not seeing anything there that would jump out at me, at all.
p
I love presenting challenges. 😐
j
If you ssh in to the container:
docker exec -it relaxed_thompson bash
and run `box server list`
Do you see a server running?
If so, run `box server log` and see if there’s any output there?
p
`box server list` gives the same result... it just hangs.
Same with `box server log`
j
Hmm…. So Commandbox, itself, seems to be hanging starting up.
That’s nuts.
Did you change any of your default Docker architectures?
Because the JVM in that container is built for arm64
p
I have not changed any default Docker architectures.
(I'd have to look up how to do it if you asked me to.)
At least not intentionally.
j
OK. I want to try something.
1. Shut everything down
2. Run `docker system prune` to remove any old images
3. Run `export DOCKER_DEFAULT_PLATFORM=linux/arm64`
4. Try to run the `docker run` command above again.
(You may need to re-pull the image)
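To confirm that the image architecture matches the host before and after setting `DOCKER_DEFAULT_PLATFORM`, the two values can be compared directly. A rough sketch: `normalize_arch` is a hypothetical helper that maps `uname -m` output onto Docker's architecture names, and the commented usage assumes Docker is installed and the image has already been pulled.

```shell
# Map `uname -m` style names onto the names Docker uses for architectures.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

host_arch=$(normalize_arch "$(uname -m)")
echo "host architecture: $host_arch"

# Example (assumes the image is present locally):
# image_arch=$(docker image inspect --format '{{.Architecture}}' ortussolutions/commandbox:adobe2021)
# [ "$image_arch" = "$host_arch" ] || echo "mismatch: image is $image_arch"
```

A mismatch would mean the container is running under emulation, which is worth ruling out before looking elsewhere.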
p
On it.
I have no containers running... I have no images locally...
Image is pulling now...
Hung.
j
😕
Can you try this command:
docker run --rm ortussolutions/commandbox:adobe2021 /bin/sh -c "java --version"
I want to see if java itself is hanging.
p
message has been deleted
Does not appear to be.
j
OK. So it’s not Java. I might have to defer to @bdw429s to see if there’s any reason he can think of that CommandBox would hang on start. Especially if anything `box` just hangs inside the container.
p
Sorry for causing a headache. 😞
j
Something in your environment is different. Luis and two other of our team members use those images on M1's daily for development
Not a headache. 🙂 Just another problem to be investigated and solved. 🐿️
p
But...
• Clean OS install
• Clean Docker install
• No crazy customizations other than CPU/Memory settings
• All other containers work
• Java works
• Box fails on anything
• M2 Apple MacBook Air 8 Core/10 Core, 24 GB / 2 TB
I wouldn't think this is an M1/M2 issue. I use it on an M1 MacBook Pro / 10C/32C, 64 GB, 2 TB and it works just fine.
j
The CommandBox image is running on your M1, but not on the M2?
p
Correct.
j
So, it could be something specific in the M2 processor arch that CommandBox is failing on.
I thought they were both identical as `arm64` architectures, but maybe there’s some difference
p
Could be, but I thought the same.
Apple's gonna Apple though.
j
I will pick this up in the AM, and dig around a bit more. I can’t see any major differences in architecture that could cause this, but something is. I may have to run to the Apple store and hack around on an M2. 🙂
[ Hugs his last of the Intel chip Macbook Pros, pets it, and tells it to never leave me ]
p
message has been deleted
b
You can run box with
box -clidebug
to get some more information about when it hangs
p
b
That all looks fine
(the errors are expected as part of Jline)
You may try using `jstack` to grab a stack trace of the JVM to see what it's doing!
j
@bdw429s When I run it on an x86 (`docker run --rm ortussolutions/commandbox:adobe2021 /bin/sh -c "box -clidebug version"`), I see this:
Nov 15, 2022 11:24:26 PM org.jline.utils.Log logr
FINE: Using terminal DumbTerminal
CommandBox 5.6.1+00618
I notice in the output above, it is using a different terminal after the error:
Nov 15, 2022 11:14:37 PM org.jline.utils.Log logr
FINE: Registering shutdown-hook: Thread[JLine Shutdown Hook,5,main]
Nov 15, 2022 11:14:37 PM org.jline.utils.Log logr
FINE: Adding shutdown-hook task: org.jline.terminal.impl.PosixSysTerminal$$Lambda$166/0x0000000840302c40@11bac6d7
Nov 15, 2022 11:14:37 PM org.jline.utils.Log logr
FINE: Using terminal PosixSysTerminal
Nov 15, 2022 11:14:37 PM org.jline.utils.Log logr
FINE: Using pty ExecPty
How is the terminal detected?
b
The terminal detection is done by JLine, and it's detecting a dumb terminal because you didn't use the `-it` flag to make it interactive 🙂
j
Ah. Which version of JLine are we using? I noticed some commits in v3 of JLine that are specific to ARM architecture support. There is only one version of MacBooks currently sold with the M2, so I am thinking it’s something specific to that processor arch and opening a terminal that hangs.
b
We're on JLine 3.21.0
If there's a new version out that may improve this, it's easy to test by just dropping in the jar over the old jar in CommandBox's lib folder
I check for new versions of these libs periodically
j
It looks like 3.21 is the latest.
I may have to make a trip to the Apple store to play on an M2.
b
Ok, good. I would still recommend grabbing a stack trace of the JVM to see where it's hung
It's pretty easy with the jstack binary which comes in a JDK
jstack -l <pid>
If it's hung in Jline, we can file a bug with them-- I've had pretty good luck in the past getting fixes in there
j
That’s true. @pegarm if you could try the one-off command to get the container actively running. Then `docker exec` in to it and run the stack trace command above, that might help.
p
Let me see what I can do. Meetings all morning, so it's going to be a couple of hours.
@bdw429s I've never used jstack. Do I literally execute `jstack -l <pid>`, or does pid represent a value?
b
Yes
j
In the container run “top” and you will see the process ID associated to CommandBox. It's a numeric process ID
b
The only trick is you likely don't have jstack in the container since it comes with a JDK, but not with a JRE
oh yes, sorry
you replace that with the actual PID, lol
j
The Adobe containers use the JDK now
b
cool
It's in the `bin` folder then
p
b
Well there you go, Lucee is to blame 🙂
I'm guessing you have no external internet connection inside the container?
java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(java.base@11.0.16.1/Native Method)
        at java.net.SocketInputStream.socketRead(java.base@11.0.16.1/SocketInputStream.java:115)
        at java.net.SocketInputStream.read(java.base@11.0.16.1/SocketInputStream.java:168)
        at java.net.SocketInputStream.read(java.base@11.0.16.1/SocketInputStream.java:140)
        at java.io.BufferedInputStream.fill(java.base@11.0.16.1/BufferedInputStream.java:252)
        at java.io.BufferedInputStream.read1(java.base@11.0.16.1/BufferedInputStream.java:292)
        at java.io.BufferedInputStream.read(java.base@11.0.16.1/BufferedInputStream.java:351)
        - locked <0x000000008f55cdd0> (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.16.1/HttpClient.java:788)
        at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.16.1/HttpClient.java:723)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.16.1/HttpURLConnection.java:1615)
        - locked <0x000000008f54b5d0> (a sun.net.www.protocol.http.HttpURLConnection)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.16.1/HttpURLConnection.java:1520)
        - locked <0x000000008f54b5d0> (a sun.net.www.protocol.http.HttpURLConnection)
        at java.net.HttpURLConnection.getResponseCode(java.base@11.0.16.1/HttpURLConnection.java:527)
        at lucee.loader.engine.CFMLEngineFactory.downloadBundle(CFMLEngineFactory.java:709)
        at lucee.runtime.osgi.OSGiUtil._loadBundle(OSGiUtil.java:566)
While loading OSGI bundles, Lucee decided to reach out and download the bundle from its update server
I don't even know if there is a timeout for those requests
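A thread dump can be long, so it may help to filter it down to the question being asked here: which application code is sitting in a socket read? The sketch below assumes the dump was saved to a file (for example with `jstack -l <pid> > dump.txt`) and treats any frame outside the `java.`, `sun.`, and `jdk.` packages as "application" code, which is a simplifying assumption. `socket_read_callers` is a hypothetical helper name.

```shell
# Given a saved jstack dump, print the first non-JDK frame of each thread
# that is currently inside SocketInputStream.socketRead0.
socket_read_callers() {
  awk '
    /^"/ || /^$/ { in_read = 0 }          # new thread header or blank line: reset
    /socketRead0/ { in_read = 1; next }   # thread is blocked in a native socket read
    in_read && /^[ \t]*at / && $2 !~ /^(java|sun|jdk)\./ {
      print $2                            # first frame that is not JDK code
      in_read = 0
    }
  ' "$1"
}

# Example:
# jstack -l <pid> > dump.txt
# socket_read_callers dump.txt
```

For the trace quoted above, this kind of filter would point at the `lucee.loader.engine.CFMLEngineFactory.downloadBundle` frame.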
j
If the local machine is connected to WiFi, then the container should have a path out to the internet.
p
I would think there's not since the container was running for hours yesterday in its hung state.
The local machine is connected to WiFi.
j
You can test it in the container
curl https://forgebox.io
b
HTTP calls can hang forever if they want ¯\_(ツ)_/¯
p
Yay!
b
Perhaps it was downloading a bunch of things
perhaps the remote server was the one hung. Who knows
p
...for hours on a gigabit fiber connection?
b
Um, yes, lol
That's like saying a Ferrari with an empty tank of gas should still be fast
If an HTTP client doesn't receive a response from a server, the speed of the connection is irrelevant
If you want to know what is getting downloaded, turn off this Lucee feature https://docs.lucee.org/guides/Various/system-properties.html#lucee_enable_bundle_downloadluceeenablebundledownload
p
I was stuck on "perhaps it was downloading." Read as: "was actively downloading."
b
The stack looks the same regardless to be honest
It's just stuck at "wait for another byte to come in"
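Because a stalled HTTP read looks the same as a slow one, a connectivity test from inside the container is more informative when it carries explicit time limits. A small sketch using curl's `--connect-timeout` and `--max-time` options; the helper names and the 5 and 20 second limits are arbitrary choices for illustration:

```shell
# Translate a few common curl exit codes into readable labels.
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "could not connect" ;;
    28) echo "timed out" ;;
    *)  echo "curl exit code $1" ;;
  esac
}

# Fetch a URL with hard limits so a stalled response cannot block forever.
probe_url() {
  rc=0
  curl --silent --output /dev/null --connect-timeout 5 --max-time 20 "$1" || rc=$?
  echo "$1: $(describe_curl_exit "$rc")"
  return "$rc"
}

# Example, run inside the container:
# probe_url https://forgebox.io
```

A "timed out" result would mean the connection opened but the response stalled, which is the same state the stack trace shows.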
p
Right
b
You can set that env var in the container before running `box`
LUCEE_ENABLE_BUNDLE_DOWNLOAD=false
It will throw an error instead of trying to download anything
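For the one-off `docker run` used earlier in the thread, the same variable can be passed with Docker's `-e` flag. A sketch only: `run_commandbox_offline` and the `DRY_RUN` switch are hypothetical conveniences, added so the command can be printed and checked before it is actually run.

```shell
# Start the CommandBox image with Lucee bundle downloads disabled.
# Set DRY_RUN=1 to print the command instead of running it.
run_commandbox_offline() {
  set -- docker run --rm -p 80:8080 \
    -e LUCEE_ENABLE_BUNDLE_DOWNLOAD=false \
    ortussolutions/commandbox:adobe2021
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example:
# DRY_RUN=1 run_commandbox_offline   # print the command
# run_commandbox_offline             # run it (assumes Docker is installed)
```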
I'd also like to point out I hate this stupid auto-downloading crap Lucee does with the passion of a 1000 burning suns and every time I've ever complained to Micha about it I was promised Lucee would always come with all necessary bundles and would never download anything. Yet, here we are and I run into users hitting this on a monthly basis for the last many years. 😠 Ok, I'm done complaining 🙂
p
Only a thousand?
b
That's a conservative estimate 😉
p
My current docker-compose is:
version: "3.8"

services:
  intranet_cfml:
    image: ortussolutions/commandbox:adobe2021
    container_name: intranet_cfml
    environment:
      - BOX_INSTALL=true
      - LUCEE_ENABLE_BUNDLE_DOWNLOAD=false
      - cfconfig_adminPassword=redacted
      - TZ="US/Pacific"
    ports:
      - 80:8080
      - 8500:8500
    volumes:
      - ./app:/app
      - ../common/app:/app/common

  intranet_sql:
    image: mcr.microsoft.com/azure-sql-edge:latest
    container_name: intranet_sql
    environment:
      - ACCEPT_EULA=Y
      - MSSQL_SA_PASSWORD=P@ssword
    ports:
      - 1433:1433
    volumes:
      - mssqldata:/var/opt/mssql
      - ./data:/data

volumes:
  mssqldata:
Going to try and spin this up.
b
On a related note, it's also possible to bump up Lucee's "deploy" log level to see information on downloaded bundles as well, but that's only possible inside the CommandBox CLI by manually editing the lucee-server.xml file on disk which is meh...
Hey, I use the exact same admin password! 🙂
p
Ready for a kick in the balls?
I disconnected from our office WiFi and connected to my phone, and ran
docker run --rm -p 80:8080 ortussolutions/commandbox:adobe2021
message has been deleted
Something on our office WiFi must be blocking what's hitting Lucee.
I mean, the underlying problem of the whole container locking up when it can't get to the server is still a problem, but...
j
Well, if network traffic is being blocked outbound, maybe there’s some policy on your firewall that is flagging the traffic coming from Docker
(like maybe it doesn’t want to download `jar` files?)
p
Checking the firewall.
b
Did you get the error message Lucee throws when bundle downloading is off?
I'm just curious what the URL/host are
So you can test with `curl`
p
I haven't tested it yet.
Found this in the Firewall:
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET POLICY Vulnerable Java Version 11.0.x Detected"; flow:established,to_server; flowbits:set,ET.http.javaclient.vulnerable; threshold: type limit, count 2, seconds 300, track by_src; http.user_agent; content:"Java/11.0."; content:!"13"; within:2; reference:url,www.oracle.com/technetwork/java/javase/11u-relnotes-5093844.html; classtype:bad-unknown; sid:2028867; rev:8; metadata:affected_product Java, attack_target Client_Endpoint, created_at 2019_10_18, deployment Perimeter, signature_severity Informational, updated_at 2021_12_22;)
That's from my machine to 205.210.189.210.
Then from that IP address to my machine:
alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"ET INFO JAVA - Java Archive Download By Vulnerable Client"; flow:from_server,established; flowbits:isset,ET.http.javaclient.vulnerable; file_data; content:"PK"; depth:2; classtype:trojan-activity; sid:2014473; rev:5; metadata:created_at 2012_04_04, updated_at 2022_05_03;)
So Synology Threat Prevention doesn't like this at all.
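Reading the first rule literally: it inspects the HTTP User-Agent, matches `Java/11.0.`, and then requires that `13` does not appear in the next two bytes. The container's JVM was reported earlier as `jdk-11.0.16.1+1`, so, assuming the JVM's built-in HTTP client sends its default `Java/<version>` User-Agent, its requests would match. A simplified shell approximation of that matching logic follows; it is not a Suricata implementation, and `ua_flagged` is a hypothetical name:

```shell
# Approximate the rule's User-Agent test:
# flagged when the string contains "Java/11.0." not immediately followed by "13".
ua_flagged() {
  case "$1" in
    *Java/11.0.13*) return 1 ;;  # the one patch level the rule exempts
    *Java/11.0.*)   return 0 ;;  # any other 11.0.x build is flagged
    *)              return 1 ;;  # not a Java 11 client
  esac
}

# Example:
# ua_flagged "Java/11.0.16.1" && echo "would be flagged"
```

Once the first rule sets the `ET.http.javaclient.vulnerable` flowbit, the second rule fires on any response body beginning with `PK`, which is how zip and jar files start.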
b
Wow, I can't tell if it doesn't like what's being downloaded, or the fact that it's coming from the Java HTTP client
And since when is "Java Version 11.0.x" vulnerable 🤔
p
Good question.
b
@zackster ☝️ Useful information to keep in the back of your head. It seems Synology Threat Prevention may block Lucee's HTTP download of bundles for... reasons.
Thanks for sticking through the debugging on this one David. It's satisfying to know the root cause. 🙂
p
Indeed it is. Thank you all for helping me track it down.
b
If you can still add that env var, I would like to see the error to know what bundle was being downloaded so I can report it to the Lucee team
The latest version of CommandBox is using the latest version of Lucee which shouldn't be downloading stuff!
This has prevented (air gapped) entire government shops from being able to use CommandBox at all in the past 😕
p
Ugh. That's horrible.
Let me see what I can do to get that env var added (by simply spinning up my docker compose) and I'll get you the error tonight.
With the `LUCEE_ENABLE_BUNDLE_DOWNLOAD=false` ENV variable, it works just fine.
shakes fist at sky
b
Did you already put an exception in your firewall?
Normally Lucee will blow up when it reaches something it needs to download, but that flag is turned off
p
I did not, specifically so I could test. I did the following:
1. Added the ENV variable
2. Deleted all containers and images.
3. Did a `docker system prune -a`
4. Spun up the `docker compose`
...and observed it working. I then repeated the process, but removed the ENV variable, and observed it not working.
Then screamed into a bag.
b
Hmm, very interesting. It's possible the error is being ignored somewhere. Did the console logs show any indication of it blowing up?
p
I can find out.
b
I would have expected Lucee to blow up, which would cause `box` to fail, which would cause the container to error
If there's nothing in the console, there is one other way-- it's just more annoying
• run the container directly by starting `bash`
• edit `path/to/.CommandBox/engine/cfml/cli/lucee-server/context/lucee-server.xml` and set `level="trace"` for the `deploy.log` logger (I usually install `nano` at this point)
• Now run `box something` and let it error (or perhaps just hang)
• Go open up `path/to/.CommandBox/engine/cfml/cli/lucee-server/context/logs/deploy.log` and see what goodness was logged there.