# community-support
p
Hey guys, I recently activated Configuration Cache in my project. Everything works fine, except when executing
./gradlew :app:dependencies
I receive the following exception:
Copy code
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
I am using
org.gradle.jvmargs=-Xmx10240m
and increasing it to 12 GB didn't help either. All the other tasks run just fine, e.g. building, testing etc. - the configuration cache has been active on develop for a few days already. Does anyone have an idea what could cause the OOM on the dependencies task and how to fix it? Deactivating the configuration cache fixes it, but that should not be an option here 😄
t
If I had to debug this, I would create a heap dump and analyze it. https://docs.gradle.org/current/userguide/config_gradle.html -> HeapDumpOnOutOfMemoryError. It's possible that the OutOfMemoryError occurs in the Gradle client rather than in the daemon, which would explain why setting jvmargs has no effect.
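For example, something like this in gradle.properties (just a sketch; the heap size is taken from your setup and the dump path is only an example):
Copy code
org.gradle.jvmargs=-Xmx10240m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp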
j
As a first step you could also use the jconsole utility to check whether your heap settings make their way to the correct JVM process, and what the difference in heap usage is between the other tasks and dependencies. If the latter eats several GiB more, there is definitely an issue to investigate.
v
As a first step, I would simply use --stacktrace to see where the OOM happens (CLI, daemon, worker, ...). And if it is not the CLI process, the gradle-profiler might also help to investigate what is happening with the heap.
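Something like this, assuming gradle-profiler is installed and on your PATH (just a sketch):
Copy code
gradle-profiler --profile jfr --project-dir . :app:dependencies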
p
In the VM Summary tab in jconsole I can see my jvmargs values as expected. Unfortunately, HeapDumpOnOutOfMemoryError seems to have no effect. The same applies to the --stacktrace option. Memory usage in jconsole is around 4 GB for the heap, so low enough I think, and testDebugUnitTest consumes even more than dependencies but doesn't throw an OOM, which is strange. I will have a look at the gradle-profiler
t
That would kind of indicate to me that you are looking at the Gradle daemon, which is affected by the jvmargs, but the error happens in the client/CLI, which is not affected by the jvmargs and needs to be addressed using JAVA_OPTS
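e.g. something like this (just a sketch, the heap size is arbitrary):
Copy code
JAVA_OPTS="-Xmx1g" ./gradlew :app:dependencies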
👀 1
v
HeapDumpOnOutOfMemoryError seems to have no effect
Probably because it's the wrong process; that's why I said, first check where the error happens
The same applies to the --stacktrace option
Of course it still fails, but it shows you the full stacktrace where you can see where it happens, as I said.
p
I meant --stacktrace does not give any additional output
v
using JAVA_OPTS
If so, better use GRADLE_OPTS, or you hit all start scripts that look at JAVA_OPTS
👍 1
I meant --stacktrace does not give any additional output
Then what is the full stacktrace?
Where exactly did you get those two lines you pasted? They surely are not the whole output.
Make sure you are not fooled by https://youtrack.jetbrains.com/issue/IDEA-241844 if this is from within IJ or AS.
p
I used the terminal in AS - default Mac terminal looks the same.
t
does adding --console plain help?
p
Nope, doesn't change anything unfortunately 😕
v
Hm, very strange. I don't know whether the gradle-profiler will help if it is indeed a CLI process problem. Does it fix it if you use GRADLE_OPTS to increase the CLI process max heap, and do you get a heap dump if you add the dump option there?
You can just do
Copy code
GRADLE_OPTS="...." ./gradlew ...
in case you didn't know
❤️ 1
p
with
Copy code
GRADLE_OPTS="-Xmx10240m -XX:+HeapDumpOnOutOfMemoryError" ./gradlew :app:dependencies --stacktrace
the task is running successfully 🤔
v
Yeah, then it is a CLI process problem
Remove the -Xmx option, then you should hopefully get the heap dump
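i.e. something like this (a sketch based on your command above):
Copy code
GRADLE_OPTS="-XX:+HeapDumpOnOutOfMemoryError" ./gradlew :app:dependencies --stacktrace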
p
Copy code
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid10549.hprof ...
yup
v
I've never seen the CLI process going OOM actually 😄
😅 2
j
Also, if you have a chance with jconsole on a process that fails, take a look at the Memory tab and check the various charts for the different memory pools, including non-heap, to find the one that gets close to the limit
v
Well, as increasing the max heap helps, I'd say non-heap is not the problem 😄
p
I don't want to rule out the possibility that it's our pretty complex project setup. I'm kinda new to the team and still in the process of understanding the dependency stuff. Maybe I need to dig more into that
v
But, well, the CLI process is basically just handling the command-line input and output. That's why the default max heap is very low (64m).
j
I'd say non-heap is not the problem
It didn't help above, and OP wrote that usage stayed around 4 GB, so it's worth looking IMO; won't take long...
Or is it the launcher that is failing?
v
That might be a different problem, but the main topic here is the OOME, which is in the CLI process, and increasing the max heap of the CLI process did help
j
I've never seen the CLI process going OOM actually 😄
same here, but weird things happen, and hopefully it will be easier to debug
v
Anyway, it would definitely be good if you could somehow reproduce this OOME and create an issue with it, or maybe with that heap dump, or right away find out yourself where the problem lies. Maybe there is some leak somewhere, or some unlucky handling in the CLI or something like that.
Or maybe you send one line that is longer than 64m and the CLI tries to read it, no idea 😄
j
You could also try different Xmx settings to find out how far off you are from the default limit
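e.g. something like this (just a sketch, the values are arbitrary):
Copy code
GRADLE_OPTS="-Xmx128m" ./gradlew :app:dependencies
GRADLE_OPTS="-Xmx256m" ./gradlew :app:dependencies
GRADLE_OPTS="-Xmx512m" ./gradlew :app:dependencies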
v
Hm, no, println(".".repeat(64 * 1024 * 1024)) works without a problem
p
Or maybe you send one line that is longer than 64m and the CLI tries to read it, no idea 😄
Wait a second... This project has pretty heavy output during the configuration phase which I currently can't get rid of. Since the problem occurred after enabling the configuration cache - might that output be the cause?
j
If you want to analyze the heap dump yourself, you could try visualvm
❤️ 1
p
I just caught the gradle-wrapper.jar :app:dependencies process in jconsole (got the daemon the other times), and that one actually doesn't have the arguments; Xmx is 64m for that one
v
Yes, that's the default as I said 🙂
👍 1
Well, the default is set in gradlew and gradlew.bat
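The relevant line in a generated gradlew looks roughly like this (gradlew.bat has the same defaults):
Copy code
DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"'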
Hm, interestingly, now println(".".repeat(64 * 1024 * 1024)) does fail with an OOME on the command line. But I get a nicer message and a proper stacktrace.
You could maybe try with a newer wrapper implementation
What is the SHA-256 of your gradle-wrapper.jar?
sha256sum gradle/wrapper/gradle-wrapper.jar
p
Got it 👍 Let me check where all the text output during the configuration phase comes from in our project, maybe that's the real cause (since it's the same for every module it doesn't really provide a benefit)
7d3a4ac4de1c32b59bc6a4eb8ecb8e612ccd0cf1ae1e99f66902da64df296172
currently using 8.14.3
v
Hm, no, that's from 8.14.3, that should be fine, I tested with 7.6.4
👍 1
p
But could you test with 8.14.3? Maybe the missing output is a bug in newer versions?
v
Let me check where all the text output during the configuration phase comes from in our project
A lot of output should usually not be a problem. I just thought maybe it is a very long line that busts the memory if the CLI reads it line by line from the daemon.
Hm, also interesting, I tried with exactly that wrapper. First run worked. Second run again shows an OOME with stacktrace, ..., wtf
😮 1
j
that kind of streaming bug can be sensitive to timing
v
Yeah, probably the 64m String is right on the edge and thus it might work or might not. It's just that now it consistently fails; it just worked the first time. 🤷‍♂️
Besides that, the OOME still seems to come from somewhere else, as I get proper output and a proper stacktrace, as opposed to what OP showed
But well, with OOMEs you never know where they happen.
Maybe some almost-too-big line filled the heap, and then some other thread pool-1-thread-1 actually hit the OOME and produced that output.
So essentially, look at the heap dump you got with some analyzer like IntelliJ or JProfiler or any other to see what filled the heap
j
That's where investigating a heap dump is interesting ... now you can have an interesting exercise submitting a non-flaky test case to the project along with a bug report 😉
😄 1
p
Quite a challenge when you've never used a heap dump 😄 will try my best 😉 might answer tomorrow since I leave soon
👋 2
Good morning guys, sorry for the late reply, I was sick on Friday. I will continue on that topic today. One new piece of information: the massive log output during the configuration phase wasn't the cause of the OOM, as Björn already assumed. I got rid of it but the OOM stays.
Currently trying to figure out what to look for in the heap dump
FWIW, this is how it looks in VisualVM
t
Can you tell us how many lines ./gradlew :app:dependencies prints? (when it's working, something like ./gradlew :app:dependencies | wc -l)
p
122.121
t
🙂 Yup, that will do it. Gradle isn't exactly optimized for huge amounts of output without increasing memory. Pretty much the same thing will happen if your tests print millions of lines
p
So it might just be a coincidence that this happened after enabling the configuration cache?
t
I would say yes. It could also be the case that something odd is going on and with the configuration cache you suddenly have more dependencies
v
Our integ tests probably do output veeeery many lines and we never had that problem. Are you sure @TheGoesen? I would not expect the CLI to keep all lines in memory, that wouldn't make too much sense. That's why I thought it might be a very long line instead.
p
I just disabled it out of curiosity and got the line count again; pretty much the same with 122.119, but it ran successfully without increasing the heap size of the CLI via GRADLE_OPTS (which I need to do with the configuration cache enabled), kinda strange.
If I'm not wrong, the longest line length can be obtained via wc -L, right? Then it would be 227, so not really long
😞 1
v
Hm, I tried with
Copy code
(1..1_000_000_000).forEach {
    println("foo: $it")
}
After ~300_000 lines the daemon went OOM, the CLI was fine at 35 MiB
t
I would say the problem that a Gradle process consumes lines too fast for its own good has been reported multiple times
v
Didn't look at the issues you linked to yet, but I now increased the daemon max heap with my
Copy code
(1..1_000_000_000).forEach {
    println("foo: $it")
}
snippet. You can hardly send output faster to the CLI process from the daemon. I'm at 4_500_000 messages right now and there is no problem on the CLI at all. The CLI process oscillates between 20 and 50 MiB. Only the daemon seems to have some leak, currently at 12 GiB RAM. But that could also be because it does not do a bigger garbage collection as enough heap is still available.
So any "CLI cannot receive output fast enough" issue seems to either be obsolete or a red herring to me.
I've now had a look at your issues, and they are irrelevant. The first one is about the daemon going OOM, which is still not the case here. The second one is about TestKit going OOM, which also is not the case here. Here it is about the CLI process going OOM. And my quick test balloon now burst after ~7_000_000 messages, but again because the 15 GiB max heap I configured for the daemon was exhausted. Of course that is a problem, but not the one discussed here.
The CLI process is still healthy and does not have any problem.
t
I am aware that these issues are not specifically about the Gradle CLI, but in general Gradle is not throttling itself when it reads too much output; that's pretty much present in every component. Calling println in a loop is actually not a very fast way to send output to the CLI, as it causes a full flush. So something like this will most likely still work:
Copy code
val foo by tasks.registering {
    doLast {
        repeat(Int.MAX_VALUE) {
            println()
            repeat(5) {
                print("$it aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
            }
        }
    }
}
But if you bump 5 to 50 or something, it's too fast
v
Talking about that speed, would you agree that
Copy code
repeat(Int.MAX_VALUE) {
    println()
    repeat(50) {
        print("$it aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
    }
}
is the same regarding how fast the output is given to the CLI as
Copy code
repeat(Int.MAX_VALUE) {
    println()
    print((1..50).joinToString("") { "$it aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" })
}
if not even faster, because the whole string is given in one call?
I'm asking because the latter works just fine, while the former reproduces exactly the behavior OP has described.
t
Good question; I would not be confident saying whether or not they have the same performance characteristics.
v
Pff, running the CLI with a profiler results in a result file without content, and running through the debugger with a breakpoint in the OOM constructor still does not cause the breakpoint to be hit. This seems to be a very stubborn OOM somewhere.
😮 1
It's also not only the speed, this also works fine:
Copy code
repeat(Int.MAX_VALUE) {
    repeat(50) {
        print("$it aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n")
    }
}
So this is some really annoying edge case I guess.
Similar to this annoying f***er: https://bugs.openjdk.org/browse/JDK-8365649 but there it was easier to investigate as you "just" needed to reproduce the edge-case timing.
Hm, strange, now even the "original"
Copy code
repeat(Int.MAX_VALUE) {
    println()
    repeat(50) {
        print("$it aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
    }
}
runs without any problems
Really strange thing
t
Looking around in the source code: GroupingProgressLogEventGenerator$OperationGroup is the culprit here. The log output goes into bufferedLogs (not sure if that's "correct"; if the "focus" is not on the operation group, where is it?). bufferedLogs is just an ArrayList which can grow until OOME. And then there is this beauty of source code, maybeFlushOutput, which only flushes the output after 2 seconds (maybe) or after 30 s (unconditionally). Not sure what happens when the OOME doesn't happen; I would guess either it gets so slow that the 2 s timeout hits, or it's supposed to always forward its output directly.
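Roughly this pattern, just to illustrate (a minimal sketch, not Gradle's actual code; the names are only borrowed from the classes mentioned above):
Copy code
// Sketch of the buffering pattern described above: log lines are appended to an
// unbounded list and only drained on a timer, so a fast producer can fill the heap
// before the next flush. Not Gradle's real implementation.
class OperationGroupSketch(private val flushIntervalMs: Long = 2_000) {
    private val bufferedLogs = ArrayList<String>()   // unbounded buffer
    private var lastFlush = System.currentTimeMillis()

    fun onLogEvent(line: String) {
        bufferedLogs.add(line)                       // grows until the heap is exhausted
        maybeFlushOutput()
    }

    private fun maybeFlushOutput() {
        val now = System.currentTimeMillis()
        if (now - lastFlush >= flushIntervalMs) {    // flushed only every couple of seconds
            bufferedLogs.forEach { println(it) }
            bufferedLogs.clear()
            lastFlush = now
        }
    }
}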
And that may or may not even be the original issue of OP 😉
v
I also got the stacktrace-less, non-debuggable OOMEs with --console=plain. I think with that, no grouping should be attempted.
Also, I tried it not within a task implementation but in configuration code, so no grouping should be done there yet anyway
So if there is a problem, it probably is a further one
j
It can be excruciatingly difficult to nail down these kinds of streaming/timing issues and craft a reliable reproducer (and then, non-flaky test cases for them). Typically it will crash or not depending on CPU load, I/O load, RAM availability and some randomness thrown in. Testing reliably usually requires introducing artificial pauses in processing threads. My hypothesis here is that the configuration cache makes the daemon faster and/or makes it group output beyond the buffering capacity of the launcher. The issues quoted above are certainly close to the root cause.
p
If I can assist any further, let me know. Sharing the repo is not possible unfortunately 😕
v
Well, ultimately you should probably open an issue if there is none. The ones posted here were not about the CLI going OOM
p
Alright, will do that tomorrow 👍
👌 1