# community-support
c
I've got an intermittent issue that's been hitting an Android app on CI, but not locally (I'm a new employee!). There are only two of us, and the person who did the initial build setup is long gone. The intermittent error is:
> The message received from the daemon indicates that the daemon has disappeared
After researching, it seems like it could be because of `org.gradle.jvmargs`. The project isn't really that big, but it does use buildSrc (😢), some code-gen + KSP, and a bunch of modules. I think this app could probably just be one single module because it really isn't that involved (but I can't change that now). jvmargs is set to
-Xmx12g -Xms4g -Dfile.encoding=UTF-8 -XX:MaxMetaspaceSize=2g -XX:+HeapDumpOnOutOfMemoryError
The rest of the gradle.properties is pretty basic
android.useAndroidX=true
kotlin.code.style=official
kotlin.daemon.jvmargs=-Xmx4g
It doesn't fail locally on my 16GB machine (though it is slow as heck), but it fails on CI, which is also a 16GB machine.
a
We have the same problem on our CI from time to time!! For context, we have 400 modules and a large Gradle heap, something like 48g, on our CI.
c
😭 So... I guess the only solution is some sort of retry mechanism? I just wish we got a little more info about what was actually going wrong.
a
That’s what we do. I’m guessing it’s some kind of memory leak.
v
Well, if the Gradle daemon has 12 GiB of max heap, plus the Kotlin daemon's 4 GiB, that's already 16 GiB. Add some non-heap memory for each of those processes, plus whatever other processes are running on the machine. It doesn't sound far-fetched at all that this can easily fail on a 16 GiB machine, with the daemon being killed by the OS's OOM killer if the OS is Linux, for example. If so, check the system logs; they usually show OOM killer actions. If that is the case, or something else killed the daemon, there is nothing Gradle can do or recognize besides "the daemon disappeared".
➕ 1
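To make that concrete, here is a rough back-of-the-envelope sum of the settings quoted earlier in the thread; the native-overhead line is an assumption for illustration, not a measured figure:

```
# Worst-case memory commitment implied by the gradle.properties above
# Gradle daemon heap (-Xmx12g)                  12 GiB
# Gradle daemon metaspace (MaxMetaspaceSize=2g)  2 GiB
# Kotlin daemon heap (-Xmx4g)                    4 GiB
# JVM native overhead, workers, OS (assumed)     1-3 GiB
# ----------------------------------------------------
# Potential peak                               ~19-21 GiB on a 16 GiB machine
```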
c
So roughly Kotlin daemon + Gradle daemon = how much RAM I'd need? Should MaxMetaspaceSize be included in the equation?
v
It depends on many more factors: how many worker processes are used, how much RAM each of them can use, and so on. The RAM actually needed can also be more than heap + metaspace, and you have to account for how much RAM the OS and other processes need as well. The easiest approach is probably to just try out which values make your build stable and performant.
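As a starting point for that trial-and-error, here is a hedged sketch of gradle.properties values sized for a ~16 GiB CI runner; the specific numbers (6g/1g/2g heaps, two workers) are assumptions to tune against your own build, not recommendations from the thread:

```
# Sketch: gradle.properties sized for a ~16 GiB CI runner (values are assumptions to tune)
org.gradle.jvmargs=-Xmx6g -XX:MaxMetaspaceSize=1g -XX:+HeapDumpOnOutOfMemoryError -Dfile.encoding=UTF-8
kotlin.daemon.jvmargs=-Xmx2g
# Cap parallel workers so several memory-hungry tasks (e.g. lint) don't run at once
org.gradle.workers.max=2
```

The idea is just to keep the sum of daemon heaps, metaspace, and per-worker overhead comfortably under the machine's physical RAM.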
c
Yeah, I was hoping there'd be a bit more of an equation to it than trial and error, but I'm fine with that. I realize there are a lot of factors, but one side of me is like "it'd be great to just run a command and see what all of the possible arguments and stuff add up to". Glad to know I'm not missing something simple. Thanks all.
Hm. Just ended up getting this issue again, running on CI on a 64GB machine. I'm going to see if we can find OOM killer logs for this Ubuntu machine.
Every time the issue occurs, there are some recent lint tasks that were being run... so I wonder if lint is just the culprit, as I know it can be intensive.
FWIW, it seems like maybe the reason it was only failing on CI and not on our local dev machines (which have less RAM) is macOS swap?
v
For sure possible
Or because macOS does not have an OOM killer the way Linux does