hey, dose any one now if flink autoscaler work wit...
# troubleshooting
r
hey, dose any one now if flink autoscaler work with the Partitioner interface ? i got a case when there is scaling down event , and the numPartitions i am getting is equal the the default and not the number on the scale down parllllisim , so i am getting out of bound index down the road
Copy code
@Public
@FunctionalInterface
public interface Partitioner<K> extends java.io.Serializable, Function {

    /**
     * Computes the partition for the given key.
     *
     * @param key The key.
     * @param numPartitions The number of partitions to partition into.
     * @return The partition index.
     */
    int partition(K key, int numPartitions);
}
đŸ‘€ 1
d
The autoscaler does not directly modify or interact with the Partitioner interface.
The autoscaler does not guarantee an immediate or atomic update of all parallelism parameters across all running tasks simultaneously. There can be a brief period where different parts of the system see different views of the parallelism, leading to potential inconsistencies or errors if your partitioner logic assumes a static parallelism.
There are some things you can do to reduce improve graceful scaling. like draining tasks from TaskManagers before removing them.
r
the problem i got some custom logic and i need to know the max partitions…
d
You may also want to adjust restart strategies and delay times to allow the system to stabilize before a scaling events before resuming operations.
Keep in mind that the actual numPartitions in your partitioner should align with the parallelism of the operator where the partitioning is happening.
Are you looking for the parallelism of a specific operator or the whole job?
r
but where can i get this info when i am implemeting the interdface
Screenshot 2024-08-11 at 12.58.07 PM.jpg
its a custom parititin , but i need to know what is the max in order to to the right job
d
in order to what?
r
to tell the operaotr how to split the data
d
Ok, well this type of thing would typically be configured when setting up the job graph using StreamedExecutionEnvironment and then application logic designed accordingly. That’s the usual approach.
but if you need something more dynamic you could look at other coordination mechanism instead of partitioner interface such as Broadcast Streams to communicate the changes in parallelism.
It’s kind of looking like you need broadcast streams to me assuming that you actually need to manipulate the partitioning strategy. It’s a complex undertaking and if you can avoid it you should. But if you need to do it I think broadcast streams gives you the most flexibility.
Now I dont think you can have the partitioner interface itself listen to the broadcasts, but what you can do instead is design your data flow so that a control operator, which listens to the broadcast stream, updates a shared state (e.g., a BroadcastState). Other operators, which perform the actual partitioning, can then query this shared state to dynamically adjust their behavior.
The control operator that updates the shared state can be either a custom operator or a ProcessFunction. Up to you.
The reason the partitioner interface is not suitable is because it’s designed to be a pure function that decides how to assign records to partitions based on the provided key & numPartitions. It operates within the confines of a single task instance and is not designed to factor in a larger context. That’s its intended design.