Trystan
09/20/2023, 5:08 PM

Martijn Visser
09/20/2023, 5:46 PM

Trystan
09/20/2023, 5:49 PM
09/20/2023, 5:49 PMFlinkUserCodeClassLoader
we end up in a second registry
a. one way around it seems to be
metrics.reporter.prom.scope.variables.excludes: job_id
classloader.parent-first-patterns.additional: io.prometheus
but that doesn’t always work and feels increasingly hacky

Trystan
09/20/2023, 6:10 PM

Martijn Visser
09/20/2023, 6:27 PM

Trystan
09/20/2023, 6:37 PM

Chesnay Schepler
09/20/2023, 7:00 PM
> grouping doesn’t add labels, it appends group key/values to the metric itself, not labels
Can you elaborate on this? At least from Flink's perspective we are not attaching labels to a singular measurement but the time series itself. Afaict in the prometheus client it's not even possible to attach a label to a single measurement.

Chesnay Schepler
09/20/2023, 7:02 PM

Trystan
09/20/2023, 7:04 PMflinkCounterOne = register.addGroup("key1").addGroup("value1").counter("counter_one");
this is what i get when scraping flink’s prom reporter:
flink_taskmanager_job_task_operator_key1_value_counter_one
(i’m excluding some labels it attaches by default, like job_id, operator_name, etc)
what i want is this:
flink_taskmanager_job_task_operator_counter_one{key1="value1"}
Chesnay Schepler
09/20/2023, 7:04 PM
addGroup("key1", "value1")
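For reference, a minimal sketch of the key/value variant inside a RichMapFunction; the class, field, and metric names here are illustrative rather than taken from the thread:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

public class KeyValueGroupExample extends RichMapFunction<String, String> {
    private transient Counter counterOne;

    @Override
    public void open(Configuration parameters) {
        MetricGroup group = getRuntimeContext().getMetricGroup();
        // addGroup("key1").addGroup("value1") puts both strings into the metric name;
        // addGroup("key1", "value1") exports value1 as a label (the key still lands in the name)
        counterOne = group.addGroup("key1", "value1").counter("counter_one");
    }

    @Override
    public String map(String value) {
        counterOne.inc();
        return value;
    }
}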
Trystan
09/20/2023, 7:10 PM

Trystan
09/20/2023, 7:23 PM
flink_taskmanager_job_task_operator_key1_counter_one{job_id="b6626816beb11a6539b3b7dc06f57cdf",key1="value1"...}
so that’s still not quite it - key1 is added to the metric name itself. but this is closer! i definitely did have a typo when originally testing the key/value addGroup (which gave me something unexpected) - that’s a very subtle difference in what the two versions of that method do

Trystan
09/20/2023, 7:27 PM

Chesnay Schepler
09/20/2023, 7:30 PM
> key1 is added to the metric name itself.
We could add an option to change this behavior such that the key is not added to the metric name.
Trystan
09/20/2023, 7:31 PM

Chesnay Schepler
09/20/2023, 7:32 PM

Chesnay Schepler
09/20/2023, 7:33 PM
> it still doesn’t solve registering custom prom metrics with the collector, either, for cases like a shared lib that has its own internal metrics
This is ultimately something that you'll need to solve by extending the reporter I think. There is no good story in Flink for generic cluster lifecycle hooks (== stuff outside of jobs) that'd allow you to register things like this.
Trystan
09/20/2023, 7:37 PM
> What is preventing you from creating a new metric group on the fly at runtime with whatever label values you like?
because once the metric is registered it’s bound with key/value pairs already. what we need is essentially to give it the keys ahead of time and values at runtime. example “raw” prometheus usage declaration:
public static final Counter counterOne = Counter.build()
    .name("counter_one_total")
    .help("Count of events")
    .labelNames("schema", "source")
    .register();
and usage:
counterOne.labels(event.schema, event.source).inc();
(this is an example with two labels - schema, source - and the values being populated during runtime via event props)

Trystan
09/20/2023, 7:41 PM
> This is ultimately something that you’ll need to solve by extending the reporter I think. There is no good story in Flink for generic cluster lifecycle hooks (== stuff outside of jobs) that’d allow you to register things like this.
this is helpful, thank you! i’m still not sure precisely how to tie code loaded by the FlinkUserCodeClassLoader to something controlled outside of that scope. prom seems to expect a global, jvm-wide singleton registry. which is why i was attempting to spin up a server “inside” the job such that the user prom metrics and a job-local http server wound up in the same class loading “scheme” (i’m probably not using the correct words for this…)
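Roughly what that “server inside the job” attempt could look like, as a sketch; the port, class names, and single-subtask assumption are illustrative, not the setup from the thread:

import java.net.InetSocketAddress;
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class JobLocalPromExporter extends RichMapFunction<String, String> {
    private transient CollectorRegistry registry;   // job-local registry, not the JVM-wide default
    private transient HTTPServer server;
    private transient Counter events;

    @Override
    public void open(Configuration parameters) throws Exception {
        registry = new CollectorRegistry();
        events = Counter.build()
            .name("events_total").help("Events seen")
            .labelNames("schema", "source")
            .register(registry);                    // register against the job-local registry
        // port 9400 is an arbitrary example; it collides if several subtasks land on one TM
        server = new HTTPServer(new InetSocketAddress(9400), registry);
    }

    @Override
    public String map(String value) {
        events.labels("some-schema", "some-source").inc();
        return value;
    }

    @Override
    public void close() {
        if (server != null) {
            server.stop();
        }
    }
}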
Chesnay Schepler
09/20/2023, 7:42 PM
> because once the metric is registered it’s bound with key/value pairs already. what we need is essentially to give it the keys ahead of time and values at runtime.
The current way to achieve this is creating separate groups/counters for each label value. It's not great admittedly (in particular since you need to cache all the groups!), but it should be at least possible.
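In code, that cache-the-groups workaround might look something like this (a sketch assuming the label value arrives with each record; names are illustrative):

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class PerValueCounters extends RichMapFunction<String, String> {
    // cache, so the same group/counter is not registered twice for a repeated label value
    private transient Map<String, Counter> countersBySchema;

    @Override
    public void open(Configuration parameters) {
        countersBySchema = new HashMap<>();
    }

    @Override
    public String map(String schema) {
        // one Flink counter per distinct label value, created lazily at runtime
        Counter c = countersBySchema.computeIfAbsent(
            schema,
            s -> getRuntimeContext().getMetricGroup()
                    .addGroup("schema", s)
                    .counter("events"));
        c.inc();
        return schema;
    }
}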
Chesnay Schepler
09/20/2023, 7:52 PM
> i’m still not sure precisely how to tie code loaded by the FlinkUserCodeClassLoader to something controlled outside of that scope.
The simplest way to achieve this should be adding the metrics library (aka, the prometheus client and maybe some utils around that) into the lib/ directory of the Flink distribution; then you should be able to call that from the job code without any classloading problems, so long as you make sure that the job jar doesn't bundle the prometheus client (or you added these classes to the parent-first classloading patterns).
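A sketch of that lib/ idea from the job side; whether Flink's reporter actually exposes these metrics depends on the registry wiring discussed later in the thread, and the class/metric names are illustrative:

import io.prometheus.client.Counter;

// The "metrics library in lib/" idea: if this class and the prometheus client both sit in
// Flink's lib/, this static registration runs once per JVM against the shared default
// registry. If it were bundled in the job jar instead, it would re-register every time the
// user classloader is rebuilt - one source of duplicate-registration errors.
public final class SharedLibMetrics {
    public static final Counter EVENTS = Counter.build()
        .name("shared_lib_events_total")
        .help("Events processed by the shared lib")
        .labelNames("schema", "source")
        .register();   // registers with CollectorRegistry.defaultRegistry

    private SharedLibMetrics() {}
}

Usage stays the same as the raw example above: SharedLibMetrics.EVENTS.labels(schema, source).inc();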
Trystan
09/20/2023, 7:56 PMclassloader.parent-first-patterns.additional: io.prometheus
that fixes it most of the time - once in a blue moon we have a duplicate registration issue when the job restarts 😞 there’s also one particular job which doesn’t start at all (immediate duplicate metrics error from prom). one thing we did not do, though, was add prom directly to flink’s lib/ dir - we are copying flink-metrics-prometheus to the lib/ dir. maybe that’s wrong…

Chesnay Schepler
09/20/2023, 8:02 PM
> we are copying flink-metrics-prometheus to the lib/ dir. maybe that’s wrong…
That should be fine-ish. I would advise against it though personally because you aren't in control of the prometheus client library version; wouldn't want your code to break because we updated the client.
> once in a blue moon we have a duplicate registration issue when the job restarts
Do you ensure that you unregister all metrics when a job fails/restarts, e.g., in the close() method of user-defined functions?
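What unregister-on-close could look like, assuming registration is moved out of static initializers and into open(); names are illustrative:

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Counter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class UnregisteringMap extends RichMapFunction<String, String> {
    private transient Counter events;

    @Override
    public void open(Configuration parameters) {
        events = Counter.build()
            .name("my_events_total").help("Events")
            .labelNames("schema")
            .register();                                   // default registry
    }

    @Override
    public String map(String schema) {
        events.labels(schema).inc();
        return schema;
    }

    @Override
    public void close() {
        if (events != null) {
            // avoids "duplicate registration" when the task restarts in the same JVM
            CollectorRegistry.defaultRegistry.unregister(events);
        }
    }
}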
Trystan
09/20/2023, 8:32 PM
flink-metrics-prometheus in the plugins dir!
we don’t unregister all metrics right now. flink is an unusual case (for us) where the jvm lifecycle isn’t tied to the metric lifecycle. typically the jvm just crashes and that’s that. also, the user job jar is presumably just loaded once, and started N times so metrics shouldn’t re-register, right? the classloader has already loaded those statically-defined metrics once
i want to look into clearing metrics but the same issue arises - we don’t have a lifecycle hook to unregister. and if we did, i suspect we’d be clearing flink’s registered metrics and that might be a problem, too (unless it re-registers when incrementing if not present?). definitely an option to test, though
edit: i do see that the prom reporter itself clears the metrics registry on close, but that’s likely on flink runtime exit, right? not job exit

Chesnay Schepler
09/21/2023, 7:25 AM
> just loaded once, and started N times so metrics shouldn’t re-register, right
2 things:
• Since you said that they are statically-defined it should work, although you do leak some memory if you don't clean them up, because the user classloader cannot be cleaned up if the prom registry has a reference to an object that comes from user-code.
• There are some scenarios where job classes are unloaded from a TM and later reloaded. So long as a taskmanager has a slot for the job the classes will stay, meaning that if the job just restarts this doesn't happen. If the job is rescaled then this might no longer hold; scale-down could leave 1 TM not being involved in a job, it cleans up the job classloader, and then later the job is running on it again.
> we don’t have a lifecycle hook to unregister
see RuntimeContext#registerUserCodeClassLoaderReleaseHookIfAbsent
> i suspect we’d be clearing flink’s registered metrics
So this depends. If io.prometheus is in the parent-first classloading patterns then closing the registry does indeed affect Flink metrics, because the reporter will be backed by the same registry. If you don't add it to the parent-first pattern and adjust the packaging instead (prom client in lib/ and not in user-jar) then they should be separate registries. You can unregister individual metrics though.
> edit: i do see that the prom reporter itself clears the metrics registry on close, but that’s likely on flink runtime exit, right? not job exit (edited)
This is indeed the exit from the runtime, but it also removes individual metrics in notifyOfRemovedMetric().
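A sketch of wiring that release hook to prometheus cleanup (assumes Flink 1.13+; the hook name is arbitrary, and note the caveat above about shared vs. separate registries):

import io.prometheus.client.CollectorRegistry;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class ReleaseHookExample extends RichMapFunction<String, String> {

    @Override
    public void open(Configuration parameters) {
        // Runs once when the TM releases the job's user-code classloader.
        // Caution: if io.prometheus is parent-first and the reporter shares this registry,
        // clear() also drops Flink's own metrics - unregister individual collectors instead.
        getRuntimeContext().registerUserCodeClassLoaderReleaseHookIfAbsent(
            "prometheus-cleanup",
            CollectorRegistry.defaultRegistry::clear);
    }

    @Override
    public String map(String value) {
        return value;
    }
}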
Trystan
09/21/2023, 7:48 PM

Trystan
09/22/2023, 7:12 PM

Chesnay Schepler
09/22/2023, 7:14 PM

Trystan
09/22/2023, 7:18 PM

Chesnay Schepler
09/22/2023, 7:22 PM

Trystan
09/22/2023, 7:32 PM

Trystan
09/22/2023, 10:01 PM
defaultRegistry (which our libs all use) is no longer used by the prom reporter - so our backdoor hook into flink/prom’s registry will also be disconnected. time to find another solution! it’s probably for the best; this has always felt a bit hacky 🙂
thanks again for all your help, it’s really helped clarify some of the order of operations and interplay between flink components that i was conceptually missing!