Kimmo Sääskilahti
11/02/2023, 7:03 AMMapState
in incremental window aggregation? I tried using HashMap
as accumulator but the performance seems really bad.Ken Krugler
11/09/2023, 10:01 PMValueState
. So what you’re running into is (probably) the slowness of Kryo being used to serialize/deserialize the map, especially if the map gets big.
But note that an aggregator is supposed to aggregate (reduce) things, which is why using a map isn’t common. Perhaps you’d get better performance by not using an aggregator, in which case Flink will store elements in ListState, which would be more performant as the list size increases.Kimmo Sääskilahti
11/10/2023, 6:38 AM[a, a, b, a, c, c, a]
, in which case the correct result would be 3 ([a, b, c]
). I tried to use HashMap (and Scala Set) to keep the set of seen values in the state so that they're not counted twice, but as mentioned, the performance was not great.
The window size is 20 minutes and there are a lot of events, so using ListState
seems a little dangerous. Might work well though, I haven't tested it!Ken Krugler
11/10/2023, 4:32 PM