What is the equivalent of `MapState` in <increment...
# random
k
What is the equivalent of
MapState
in incremental window aggregation? I tried using
HashMap
as accumulator but the performance seems really bad.
k
Flink always stores the accumulator in
ValueState
. So what you’re running into is (probably) the slowness of Kryo being used to serialize/deserialize the map, especially if the map gets big. But note that an aggregator is supposed to aggregate (reduce) things, which is why using a map isn’t common. Perhaps you’d get better performance by not using an aggregator, in which case Flink will store elements in ListState, which would be more performant as the list size increases.
k
Thanks so much for the reply Ken, that's very useful information! My use case is to count the number of unique values in the window. So the list of events is something like
[a, a, b, a, c, c, a]
, in which case the correct result would be 3 (
[a, b, c]
). I tried to use HashMap (and Scala Set) to keep the set of seen values in the state so that they're not counted twice, but as mentioned, the performance was not great. The window size is 20 minutes and there are a lot of events, so using
ListState
seems a little dangerous. Might work well though, I haven't tested it!
k
You could create a custom serializer for a set, which would help with performance.
👍 1