Apache Flink

I'm building a stream processing application that needs to store a lot of state. What is the better way to store it? Is it better to have one huge map state holding all the data for a key, or many smaller map states (multiple states per key)?

We can't really tell without knowing what kind of operations you perform on that data, how often you checkpoint, what kind of checkpoints you use, where the checkpoints are stored, and so on. In general, keyBy shards incoming data, splitting it by key across the parallel task slots, so keying the stream is a reasonable default way to scale when nothing else is known about the workload.
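
To make the sharding point concrete, here is a minimal plain-Java sketch of how keyed records get spread across parallel subtasks: a key is hashed into a "key group", and key groups are divided evenly among the subtasks. This is a simplified illustration, not Flink's actual code. The class and method names are hypothetical, the parallelism values are made up, and as far as I know Flink applies an extra hashing step to the key's hashCode that is left out here.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for keyBy routing: key -> key group -> parallel subtask.
// All names here are hypothetical; this does not use or reproduce the Flink API.
class KeyShardingSketch {

    // Map a key to one of maxParallelism key groups (plain hashCode, no extra hashing).
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Assign contiguous ranges of key groups to subtasks 0..parallelism-1.
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Convenience: route a key straight to its subtask index.
    static int subtaskForKey(Object key, int parallelism, int maxParallelism) {
        return subtaskFor(keyGroupFor(key, maxParallelism), parallelism, maxParallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;      // assumed number of parallel subtasks
        int maxParallelism = 128; // assumed number of key groups

        // Count how many of 10,000 synthetic keys land on each subtask.
        Map<Integer, Integer> keysPerSubtask = new TreeMap<>();
        for (int i = 0; i < 10_000; i++) {
            String key = "user-" + i;
            keysPerSubtask.merge(subtaskForKey(key, parallelism, maxParallelism), 1, Integer::sum);
        }
        System.out.println(keysPerSubtask);
    }
}
```

The takeaway for the question: all state for a given key lives on the one subtask that key is routed to, so state scales with the number of distinct keys and the parallelism, whichever way the per-key state is split into map states.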