# troubleshooting
j
Hey All, Any of you know the best way to identify hot keys in Flink? Does Flink publish any metric for hot partitions? I've tried manually publishing metrics within the operator to our observability system, but the keyspace is so big (billions of keys) that our observability system is incapable of presenting the data properly.
d
I am not aware of hot partition metrics out of the box. You can monitor the backpressure status of tasks or operators. High backpressure could indicate that a task is processing data slower than it's being produced:
`taskmanager.job.<job_id>.total.backpressure-time`, `operator.<subtask_index>.backpressured-time-ns`
You can also monitor buffer usage, especially if you’re using network shuffling. Uneven distribution of data across keys can cause some tasks to have a lot more buffered data waiting to be processed than others.
`taskmanager.network.<task_id>.outbound.<channel_index>.buffers-in-use`, `taskmanager.network.total.num-network-buffers`
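Not a Flink API, just a minimal sketch of what "compare across subtasks" could look like once you've scraped the per-subtask values of a metric like buffers-in-use (the sample numbers here are made up):

```python
# A back-of-the-envelope skew check over per-subtask samples of a metric
# such as buffers-in-use. The values below are hypothetical; in practice
# you would pull them from Flink's REST API or your metrics backend.

def skew_ratio(samples):
    """max / mean of the samples; values near 1.0 mean an even spread."""
    mean = sum(samples) / len(samples)
    return max(samples) / mean if mean else float("inf")

buffers_in_use = [12, 14, 11, 95, 13, 12]  # hypothetical: subtask 3 looks hot
mean = sum(buffers_in_use) / len(buffers_in_use)
hot = [i for i, v in enumerate(buffers_in_use) if v > 2 * mean]
print(round(skew_ratio(buffers_in_use), 2), hot)
```

A ratio well above 1 with one or two outlier subtasks is a decent first hint that a handful of keys dominate the traffic.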
Large differences between event time and processing time can hint at inefficiencies, possibly caused by skewed data processing.
`operator.<subtask_index>.latency`
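As a toy illustration of that signal (pure Python with hypothetical epoch-millisecond timestamps, not a Flink API): a subtask whose records carry event timestamps far behind the wall clock is likely working through a backlog, often caused by a hot key.

```python
# Sketch: event-time lag per subtask as a skew signal.
# Timestamps below are hypothetical epoch milliseconds.

def lag_ms(processing_time_ms, event_time_ms):
    return processing_time_ms - event_time_ms

now_ms = 1_700_000_060_000
per_subtask_event_time = [1_700_000_059_800, 1_700_000_059_750, 1_700_000_020_000]
lags = [lag_ms(now_ms, t) for t in per_subtask_event_time]
print(lags)  # subtask 2 is ~40 s behind the others
```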
It's not a direct measure of hot keys, but monitoring the size of keyed state can give you a rough idea of which keys are retaining more state, which might correlate with processing load.
`state.backend.*`
So these are some of the related metrics you can sample and aggregate
Given your issue with the keyspace size, one approach could be to periodically sample the keys and their associated processing metrics rather than tracking every single key. You can implement custom logic within your Flink application to sample and aggregate this data before emitting it to your observability system. I would probably opt for a moving window of recent keys and their processing times, periodically emitting the top N keys with the highest processing times or backpressure. You could also use Flink's Metrics API to create custom metrics with aggregated info about keys without identifying each one individually, e.g. bucket keys into ranges and track processing statistics per bucket.
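A minimal, Flink-free sketch of that idea — sampled counting with a periodic top-N flush. `HotKeyTracker` and its parameters are hypothetical names; in a real job the `record` call would live in your operator and `flush` would feed a custom metric or side output:

```python
import heapq
from collections import Counter

class HotKeyTracker:
    """Count a 1-in-N sample of keys and periodically emit the hottest ones."""

    def __init__(self, sample_every=100, top_n=5):
        self.sample_every = sample_every  # only count every Nth record
        self.top_n = top_n
        self.seen = 0
        self.counts = Counter()

    def record(self, key):
        self.seen += 1
        if self.seen % self.sample_every == 0:
            self.counts[key] += 1

    def flush(self):
        """Return the hottest sampled keys and reset the window."""
        top = heapq.nlargest(self.top_n, self.counts.items(), key=lambda kv: kv[1])
        self.counts.clear()
        return top

tracker = HotKeyTracker(sample_every=7, top_n=3)
for i in range(100_000):
    # Hypothetical skew: key 42 receives half of all traffic.
    tracker.record(42 if i % 2 == 0 else i)
print(tracker.flush())  # key 42 dominates the sampled counts
```

For the bucketed variant, pass something like `hash(key) % num_buckets` instead of the raw key into `record`, so the tracked cardinality stays bounded no matter how large the keyspace is.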
j
Thanks for your response. I recently turned off `block-cache` on RocksDB due to a memory leak. After doing that, I noticed some performance degradation, which led me to think that we have a hot key issue (the block-cache feature was probably hiding the symptoms). I like your idea of aggregating those metrics in Flink; I'll look more deeply into that. I've also noticed that Flink publishes `state.backend.rocksdb.metrics.block-cache-hit`, which could actually be a good proxy for hot keys in my case. I might revert the block-cache disabling change and see how this metric looks for a while.
d
I see. Yes, that metric, `state.backend.rocksdb.metrics.block-cache-hit`, could definitely provide some additional insight.
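If I remember right there's a matching block-cache-miss option among the native RocksDB metrics, in which case the ratio is more useful than the raw counter. A tiny sketch (the counter values below are hypothetical; the counters are cumulative, so you diff successive scrapes):

```python
# Sketch: derive a block-cache hit ratio from two scrapes of the
# cumulative hit/miss counters. The sample values are hypothetical.

def hit_ratio(hits_delta, misses_delta):
    total = hits_delta + misses_delta
    return hits_delta / total if total else 0.0

# Two hypothetical scrapes of the cumulative counters, one minute apart.
prev = {"hit": 1_000_000, "miss": 50_000}
curr = {"hit": 1_900_000, "miss": 150_000}
ratio = hit_ratio(curr["hit"] - prev["hit"], curr["miss"] - prev["miss"])
print(round(ratio, 2))
```

A persistently high hit ratio under skewed load would support your theory that a small set of hot keys keeps their blocks resident in the cache.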