# community-support
a
tl;dr: Do you have a Gradle build or plugin that would benefit from a fine-grained caching tool? What would you use it for?

I've been tinkering with a Kotlin/Native project that compiles C code, but DevEx was bad because compilation was really slow, for two main reasons:
1. Any change to the buildscripts, even an unrelated one, meant a new build-cache key, which meant a long, slow recompile of everything.
2. The build cache was too coarse: the compile-C task would recompile everything even if only 1 file out of 1000 changed, or if the changes in the sources were not relevant (like only changing a comment).

So, I created a custom caching daemon! It supports caching individual operations (e.g. compiling a single file, or creating an archive). The cache key only considers the actual inputs, so it's not sensitive to buildscript changes. Another benefit: parallelism can be controlled much more tightly, no matter how many subprojects/tasks/files there are.

It's still messy, but as a POC it's working well in my hobby project. I'm considering splitting it out as a separate library, but before I do I really wanted to get some more information: if I released a per-file caching library as a Gradle plugin, how would you use it?
👀 1
Technical details

How it works: I launch a separate JVM process and use RocksDB to individually cache operations. Because the cacher runs in a separate process, the classpath is isolated and can be used as part of the cache key - no more buildscript sensitivity! It uses Unix domain sockets for communication (in theory it could use network sockets as well, for remote caching).

Alternatives

This wasn't the first approach I tried. I tried some alternatives, but they were lacking:
• Incremental tasks are nice, but they can't reload historical results, and are too sensitive to buildscript changes.
• The build cache is too coarse-grained (I wanted per-file, not per-task, caching), and is also too sensitive to unrelated buildscript changes.
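The key design point above (a cache keyed only by the operation's real inputs, so buildscript edits can't invalidate entries) can be sketched independently of RocksDB. This is a minimal in-memory stand-in with hypothetical names, not the plugin's actual API:

```kotlin
import java.security.MessageDigest

// Hypothetical sketch: cache one operation's output, keyed only by its real inputs.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

// Key = hash of (tool identity + each input's content).
// Buildscript state is deliberately absent from the key.
fun cacheKey(toolId: String, inputs: List<ByteArray>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    digest.update(toolId.toByteArray())
    inputs.forEach { digest.update(sha256Hex(it).toByteArray()) }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// In-memory stand-in for the RocksDB store: compute once per key, then reuse.
class OperationCache {
    private val store = HashMap<String, ByteArray>()
    fun getOrCompute(key: String, compute: () -> ByteArray): ByteArray =
        store.getOrPut(key, compute)
}
```

Two lookups with identical inputs hit the cache, while unrelated state (like the buildscript classpath) never enters the key in the first place.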
j
Did you consider integrating an existing caching tool like ccache?
a
yes, ccache was a source of inspiration. I didn't want to use it though, because I don't want manual install prerequisites (i.e. no "to use this plugin you must first install `ccache`"). I wanted something that just worked, i.e. a regular JVM dependency. And I find custom tools management in Gradle requires a lot of manual work for little things that come for free with a JVM dependency (determining the right file to download based on the OS/arch, checking for new versions, caching the download, unpacking, checking the install is valid, deleting old/unused versions). Also, I'm using the run_konan util distributed with Kotlin/Native. I wasn't confident I could set that up correctly to work with ccache (despite [the ccache docs](https://ccache.dev/manual/4.11.2.html#_using_ccache_with_other_compiler_wrappers)). Finally, I liked the challenge :)
m
How does the cache key computation work? Do I have to manually hash the input files, etc..?
Am I right that it looks like this?
• The buildscript classpath changes
• Gradle invalidates the task and reruns it
• The task implementation sees that the actual inputs (excluding the buildscript classpath) haven't changed and fetches the matching outputs
• Gradle sees the task as `SUCCESS` (despite no work actually executing)
This is a compelling proposition, but the friction is high. Basically, everyone would have to learn this mental model, which is probably going to come as a surprise to many. It makes debugging and reading build scans, etc. harder.
Also probably fingerprints the inputs twice? Unless you can get the Gradle values somehow?
a
How does the cache key computation work? Do I have to manually hash the input files, etc..?
In my project I use clang to pre-process each of the C source files, which is ideal for checksumming (changes in unused header files won't be reflected in the pre-processed output, and other changes, like comments, will be ignored).

So, yes, if the files are marked as inputs then they get fingerprinted twice, although slightly differently: Gradle fingerprints the 'plain' file content, while my custom cache can use more specialised fingerprinting.

My custom cacher doesn't have to be used only in tasks, though (it could be used in the configuration phase), and the files don't have to be registered as inputs (so Gradle won't fingerprint them); instead you'd add a custom
outputs.upToDateWhen { cacher.isUpToDate(files) }
check.
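Expanded into a task registration, that bypass might look like the sketch below. This is a hypothetical wiring, not a published API: `cacher`, `isUpToDate`, and `compileChanged` are placeholder names from this thread.

```kotlin
// build.gradle.kts (sketch): sources are deliberately NOT registered as task
// inputs, so Gradle never fingerprints them; the custom cacher decides
// up-to-dateness and does per-file work itself.
val compileC by tasks.registering {
    val sources = layout.projectDirectory.dir("src/c").asFileTree
    outputs.dir(layout.buildDirectory.dir("obj"))
    outputs.upToDateWhen { cacher.isUpToDate(sources.files) } // hypothetical cacher service
    doLast {
        cacher.compileChanged(sources.files) // hypothetical per-file compile/fetch
    }
}
```

The trade-off discussed above applies: Gradle reports the task as up-to-date or `SUCCESS` based on state it cannot see, which is exactly what makes build scans harder to read.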
m
files don't have to be registered as inputs (so Gradle won't fingerprint them)
Just bypass everything 😄
I like this
I would experiment with this. Not sure I'd deploy it, but if it can help advocate for the fact that invalidating all the time is ultra painful, it's worth it.
v
Two points:
1. Yes, more fine-grained cacheability is definitely needed; there is an open feature request by me: https://github.com/gradle/gradle/issues/31482. Feel free to thumbs-up it. I only speak about work-action cacheability there to increase the likelihood it gets attention, and for me it would anyway be appropriate to pack into one work action what should be cached together; that would also bring the benefit of parallel execution of the actions.
2. If your task result gets invalidated by each change to the build script, even an unrelated one, then that is only because it is not actually unrelated: your build script is an input for the task logic. As long as your buildscript is not adding to the task logic, it is not part of the input. If you for example have
```kotlin
val foo by tasks.registering(Help::class) {
    outputs.file("gradlew")
    outputs.cacheIf { true }
}
```
then this task will stay up-to-date or can also be taken from the cache, no matter what you change outside this definition in the build script (unless you add to the logic of that task there). If on the other hand you have
```kotlin
val foo by tasks.registering(Help::class) {
    outputs.file("gradlew")
    outputs.cacheIf { true }
    doLast {
        println("foo")
    }
}
```
then the build script is defining a part of the task logic and thus becomes an input for the task. Gradle cannot (or at least does not yet) differentiate which part of the build script changed, whether it is a part of the logic for that task or whether it is an unrelated change. You can also see what is part of the cache key and thus what is considered an input by using
-Dorg.gradle.caching.debug=true
. If your build script does not contribute to the logic of the task but still is considered an input for up-to-dateness and cache-key, then I'd say that is a bug you should report and that needs to be fixed.
m
I think the typical case is a convention plugin. All the tasks are just in your
build-logic:runtimeClasspath
so any change there invalidates everything
v
He did not complain that any change in the classpath invalidates things, and did not speak about convention plugins. That would be true for 3rd-party tasks (not for built-in tasks) and could be mitigated by splitting things into multiple projects if necessary. He complained that any change in the buildscript invalidates the task, and that is just not correct if the buildscript does not contribute to the build logic; otherwise it is a bug. And even if it is about a precompiled script plugin and not a buildscript, the same applies. The shown example, coming from a convention plugin, behaves just the same, as nothing in the convention plugin build contributes to the task logic. The task's logic comes only from the built-in
Help
task, so the inputs will not change even if you change the convention plugin. Even if you move that task from the build script to the convention plugin, it will stay up-to-date, as the inputs are identical. If of course you are talking about a custom task or a 3rd-party task, so that the runtime classpath of the task changes, then of course it is an input, and that is good, because a change in the runtime classpath can mean a different result, so the output must not be reused.
m
Agreed, the description of the problem could be more accurate. But that shouldn't hide the fact that it is a very widespread problem.
v
Again, I don't see where the problem is. If you change the project where a task is implemented, it must be out-of-date. If you stuff 100 tasks into one project, then of course all tasks are out-of-date if you change only one of them. If you don't want that, split the tasks into multiple projects. The complaint here was that changing the buildscript makes the task out-of-date, and that is simply wrong unless the buildscript influences that task's logic, in which case it is a necessity; and if it did not, it would be a bug. That more fine-grained cacheability would be nice is out of the question. I'd also like, if an incremental task runs non-incrementally for some reason, for the individual actions to be served from the cache, as described in my feature request. 🙂
m
Replace
buildscript
by
buildscript classpath
and consider you have a single one in your root project or settings
This
buildscript classpath
contains all your tasks implementations
Changing one of them invalidates your whole build
This is my problem and I believe OP's problem as well but can't speak to OP obviously
v
Ah, right, splitting into multiple projects will probably only help if the tasks are used in different projects. Having all plugins of the whole build in a single classpath is anyway bad practice in my opinion, as you know. 😄 Here you have another reason for it: needless out-of-date tasks just because one plugin was changed. 🙂
m
Having all plugins of the whole build in a single classpath is anyway bad-practice in my opinion as you know.
Ah, I didn't know that!
But if you don't do that, you end up either with broken conflict resolution and/or malfunctioning BuildServices
So it's like choosing between 2 evils
v
Ah, I didn't know that!
Oh, sorry, thought you were part of the last discussion about this here. My opinion is, just use what you use where you use it and only do things like
apply false
if there is a technical necessity, rather than declaring everything in the root project just for the sake of doing it.
m
I'm part of many discussions 😄 Might have missed that one
Not doing the
apply false
thing is pretty dangerous
The BuildService issue only makes it a nogo for me
And TBH I think 90% of builds have
apply false
somewhere, it's in the Google recommendations, etc..
v
Nah, I remembered right, this is the discussion I referred to: https://gradle-community.slack.com/archives/CAHSN3LDN/p1754935180286129 🙂
Not doing the
apply false
thing is pretty dangerous

As well as doing the
apply false
thing is pretty dangerous. Let's not replicate that other discussion here again; this thread is hijacked enough. 😄 Whatever you do, you do it wrong, and you will sooner or later hit problems, no matter which way you follow, as there is no patent recipe that works for all cases.
m
Exactly, you can choose your illness 😅
In French we have this saying of choosing between "la peste" and "le choléra"
v
Same in German, "Pest oder Cholera"
🇩🇪 1
🤝 1
m
I'll still say that a long-term solution of "single buildscript classpath for wiring (no dependencies)" + "isolated workers" would be really nice
no more illnesses
v
No, just different ones 🙂
m
"single buildscript classpath for wiring (no dependencies)" + "isolated workers" means you have a healthy life. The only illness is that we need to convert all existing plugins to use classloader isolation
But it's not an illness, it's more like exercise to keep you fit
a
"single buildscript classpath for wiring (no dependencies)" + "isolated workers" means you have a healthy life
Except classloader isolation causes OOM https://github.com/gradle/gradle/issues/18313. Process isolation doesn't leak, but it's often quite resource intensive. 😬
m
@Adam you can do classloader isolation without the leaking Worker API
Or the Worker API can be fixed too
a
you can do classloader isolation without the leaking Worker API
Oh, do you mean with a manual classloader? We had to do that recently in Dokka
m
Yea, I'm doing the same everywhere now
Kotlin is also doing the same, let me dig that link
TBH I don't blame them considering all the issues with BuildServices
a
yeah, makes sense. In which case: why bother using the Worker API at all? :) (Although, a word of caution: I was warned off using ClassLoadersCachingBuildService since Dokka heavily uses coroutines, and modifying the classloader apparently affects coroutines mechanisms?)
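For reference, the "manual classloader" approach mentioned above usually means a `URLClassLoader` whose parent is the platform loader, so the isolated code sees only the JDK plus its own classpath. A generic sketch of that idea (this is not Dokka's or Kotlin's actual implementation, and `isolatedLoader`/`cachedLoader` are made-up names):

```kotlin
import java.net.URL
import java.net.URLClassLoader
import java.util.concurrent.ConcurrentHashMap

// Isolated loader: its parent is the platform loader (JDK only), so nothing
// from the buildscript classpath leaks in and dependency versions can't conflict.
fun isolatedLoader(classpath: List<URL>): URLClassLoader =
    URLClassLoader(classpath.toTypedArray(), ClassLoader.getPlatformClassLoader())

// Cache keyed by classpath so repeated task executions reuse one loader; this
// reuse is exactly what makes the shutdown/ThreadLocal questions delicate.
private val loaderCache = ConcurrentHashMap<List<URL>, URLClassLoader>()
fun cachedLoader(classpath: List<URL>): URLClassLoader =
    loaderCache.computeIfAbsent(classpath, ::isolatedLoader)
```

JDK classes resolve through the parent, application-classpath classes (e.g. the Kotlin stdlib of the host) are invisible, and equal classpaths map to the same cached loader.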
m
why bother using the Worker API at all?
parallelism for people without configuration cache
Ultimately I wish the Worker API goes away
It duplicates a lot of the
Task
concepts
without the caching
modifying the classloader apparently affects coroutines mechanisms
I'd say there should be a way to make this work. I'd be willing to investigate this if you have a reproducer
v
Sorry, but that's a bit of nonsense. I don't think the worker API will go anywhere. It is not only about parallelism; it is also useful if you need to execute something with a different JDK, or otherwise need to do something in a different process that can be reused. And the parallelism is also important when using CC, because CC only runs tasks in parallel, while work items from within the same task action can run in parallel using the worker API. The missing cacheability for work items is bad, I agree, hence the issue I linked above. The worker API does not really duplicate the task concepts; it is a sub-unit of the work of a task.
Also once the memory leak is fixed, it can be used for the classloader isolation without the need for manually fiddling around with classloaders.
m
Yes, process isolation is also a use case although not one I am interested in
while work items from within the same task action can run in parallel using the worker api.
you might as well use coroutines for that or an Executor or whatever primitive you want. This problem has been solved without Workers
once the memory leak is fixed, it can be used for the classloader isolation without the need for manually fiddling around with classloaders
Yea, that'd be good. Although I will probably keep doing the manual thing because the Worker classloaders are not completely isolated. They will force the kotlin-stdlib version, for example.
v
Of course, you can also build your software without Gradle, that problem was solved long before Gradle. It just makes it easier and needs less reinventing the wheel. 🙂 And coroutines are useless, because I would never write a public plugin in Kotlin. 🙂
a
I'd say there should be a way to make this work. I'd be willing to investigate this if you have a reproducer
I'll ask. I think it was related to ThreadLocals. It was more of a theoretical concern than something we found.
👍 1
m
Oh yea, ThreadLocals + classloaders can be fun 😅
a
Yes, process isolation is also a use case although not one I am interested in
(more back to my original topic) What about a single, separate process that could run workers, and the process was shared for all subprojects? Would that be interesting for you?
m
TBH not sure. Managing one JVM heap is already some work 😅
v
What about a single, separate process that could run workers, and the process was shared for all subprojects? Would that be interesting for you?
Sounds like a shared build service which you can give work to do. 🙂
m
What about a single, separate process that could run workers, and the process was shared for all subprojects?
Sorry, was in a bit of a rush earlier. To elaborate a bit more: I have no specific interest in process isolation unless it is otherwise technically required. The two things I want:
• use dependencies that potentially conflict with other plugins
• not have my task invalidated if something else in the buildscript classpath changes
I think both of those can be fixed without spawning a separate process. If a separate process helps make this a reality then I'd consider it, but it's not a requirement per se.
👍 1
a
> I'd say there should be a way to make this work. I'd be willing to investigate this if you have a reproducer

Follow-up about classloader caching being delicate: both the Analysis API and Coroutines have 'shutdown' mechanisms.
• https://github.com/JetBrains/kotlin/blob/v2.2.20-RC/analysis/analysis-api-standalone/src/org/jetbrains/kotlin/analysis/api/standalone/StandaloneAnalysisAPISessionBuilder.kt#L251-L268
• https://github.com/Kotlin/kotlinx.coroutines/blob/1.10.2/kotlinx-coroutines-core/jvm/src/Dispatchers.kt#L81-L84
So, care is needed to avoid bad situations (e.g. KT-74931):
• A cached classloader can't be re-used if AA or Coroutines are shut down.
• AA or Coroutines can't be shut down until the classloader is unused.
• A cached classloader mustn't be dropped until the shutdown mechanisms have completed.
👀 1
m
Looks like KT-73438 is JetBrains-internal
fixed 1
thank you 1
The coroutines one is more or less ok-ish, I think? It's your usual static-state problem: if some of your libs rely on static state that can be left in a non-recoverable state, then it's bad
The good news is that if you control the code you can make sure
shutdown
is not called
If you don't well, it's more problematic. 😄
But I'd be surprised if a library pulls the rug out from under my
Dispatchers.IO
a
I think both of those can be fixed without spawning a separate process. If a separate process helps making this a reality then I'd consider it but it's not a requirement per se.
for my use-case (caching C compilations) the decision to use a separate process was mainly influenced by wanting to use an embedded DB. Using a separate process avoids headaches with merging or concurrent access.
👍 1
m
Yup, I hear you. In that case, it's definitely understandable
It's great that we have solutions like this to finally move on that problem of fine-grained caching.
I just wish ultimately we have a no-compromise solution that doesn't require spawning a separate process. I think we can save that
(thanks for updating that link, I can see it now 🙂 )
> to workaround having to call
disposeGlobalStandaloneApplicationServices
> so as not to leak a worker classloader

That sounds like the bigger issue TBH? Static state is dangerous but could be OK (or else we wouldn't have
Dispatchers.IO
), but leaking memory is not OK
a
The good news is that if you control the code you can make sure shutdown is not called
I think shutdown has to be called, otherwise the coroutines will keep running, preventing the classloader from being gc'ed
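This pinning can be illustrated without coroutines at all: any live worker thread strongly references its context classloader, so nothing that loader loaded can be collected until the pool is explicitly shut down. A generic JVM sketch (a plain executor standing in for the dispatcher internals, not the actual coroutines code):

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Stand-in for a coroutine dispatcher's thread pool: while its threads are
// alive, each one strongly references a context classloader, pinning it.
fun shutdownReleasesThreads(): Boolean {
    val pool = Executors.newFixedThreadPool(1)
    pool.execute { Thread.currentThread().contextClassLoader } // worker pins its loader
    pool.shutdown() // the analogue of the explicit coroutines/AA shutdown calls
    return pool.awaitTermination(5, TimeUnit.SECONDS) // only now can the loader be gc'ed
}
```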
m
Yea, that's the concerning issue
OkHttp has shutdown hooks as well except there's also an idle timeout
If coroutines are not doing anything for <insert configurable amount of time here> then stop all the Executors
In the very worst case, I think
disposeGlobalStandaloneApplicationServices
could probably be made so that it is recoverable?
Just store a global static flag
disposed
and restart the machinery next time someone comes in?
"Just" 😅
a
> I just wish ultimately we have a no-compromise solution that doesn't require spawning a separate process. I think we can save that

Yeah, I would like to avoid a separate process. I've been thinking about it. In theory, it's possible to merge RocksDB databases:
• task1 could open main-db in rw mode.
• Then task2 would try to open main-db in rw mode, fail (because it's open in task1), then open it as a read-only viewer.
• task2 would open a 'tmp-db'. If a value can't be read from the main-db, then it caches the result in tmp-db.
• task2 would complete, and leave an instruction for the current owner of main-db: "Hey, merge this db when you're able: /path/to/tmp-db".
• task1 watches for 'merge instructions', and merges tmp-db into main-db.
• task1 completes.
😮 1
but that ^ seems more complicated/brittle than a separate process
👍 1
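The handover protocol in those bullets can be sketched with plain files standing in for RocksDB. All names here (`SketchDb`, `CacheCoordinator`) are hypothetical, and this only models the coordination, not real RocksDB merging:

```kotlin
import java.io.File

// Plain-directory stand-in for an embedded DB: one key = one file.
class SketchDb(private val dir: File, val readOnly: Boolean) {
    fun put(key: String, value: String) {
        require(!readOnly) { "read-only viewer" }
        File(dir, key).writeText(value)
    }
    fun get(key: String): String? = File(dir, key).takeIf { it.exists() }?.readText()
}

class CacheCoordinator(private val root: File) {
    private val lock = File(root, "main-db.lock")
    private val main = File(root, "main-db").apply { mkdirs() }
    private val instructions = File(root, "merge-instructions")

    // Steps 1-2: try to own main-db read-write; if already owned, fall back to a read-only viewer.
    fun open(): SketchDb =
        if (lock.createNewFile()) SketchDb(main, readOnly = false)
        else SketchDb(main, readOnly = true)

    // Steps 3-4: a non-owner caches misses into a tmp-db and leaves a merge instruction.
    fun openTmp(name: String): SketchDb {
        val tmp = File(root, name).apply { mkdirs() }
        instructions.appendText(tmp.absolutePath + "\n")
        return SketchDb(tmp, readOnly = false)
    }

    // Steps 5-6: the owner merges pending tmp-dbs into main-db, then releases ownership.
    fun mergeAndClose() {
        if (instructions.exists()) {
            instructions.readLines().filter { it.isNotBlank() }.forEach { path ->
                File(path).listFiles()?.forEach { f ->
                    f.copyTo(File(main, f.name), overwrite = true)
                }
            }
            instructions.delete()
        }
        lock.delete()
    }
}
```

Even in this toy form, the moving parts (lock ownership, instruction files, merge timing) hint at why a single owning daemon process is the simpler design.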
m
What's the name of the axiom? Ockham's razor? Simple is better than complex
a
grug brained programmer? 😀 https://grugbrain.dev/ "apex predator of grug is complexity"
😃 1
m
Bookmarked this page ^^
> grug hear screams from young grugs at horror of many line of code and pointless variable and grug prepare defend self with club
yes!!
1
I'm curious why grug no like visitor pattern though
🤔 1
a
Yeah, I’m not 100% against the visitor pattern, despite what the grugbrain essay says. (In general, I’m not 100% for or against anything in software development.) However, I often think it’s better to encode operations directly in a tree rather than have another side thingie that performs operations over the tree. Sometimes this mixes concerns a bit, but I don’t mind that in this case.
from an interview
thank you 1
m
Grug smol brain really answer for everything!