Apache Flink #troubleshooting

Guruguha Marur Sreenivasa

04/11/2023, 4:55 PM

Hi all, I wanted to understand a few things about Flink's session clusters.

1. We observed that the JVM Metaspace keeps increasing as we deploy / cancel jobs on the cluster. Is this expected? Is there any documentation that explains this behavior? We deploy new jobs almost every day, and they are long-running streaming pipelines. We want to capacity plan and make sure that enough memory is allocated to the JobManager so it doesn't crash (this has happened before due to a Metaspace issue).

2. When we restart a particular pipeline, the checkpointed state is lost and we send duplicate data again, because the `MapState` we use is empty on restart and starts building up again. We are saving checkpoints on S3. Below is my checkpointing config:


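```java
// Take a checkpoint every 30 s with exactly-once guarantees
this.env.enableCheckpointing(30000, CheckpointingMode.EXACTLY_ONCE);
// Working state lives as objects on the JVM heap; snapshots go to the checkpoint storage below
this.env.setStateBackend(new HashMapStateBackend());
final CheckpointConfig config = env.getCheckpointConfig();
// Do not delete externalized checkpoints when the job is cancelled
config.setExternalizedCheckpointCleanup(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// Checkpoint storage location (an S3 URI in our case)
config.setCheckpointStorage(checkPointBucket);
config.setTolerableCheckpointFailureNumber(applicationConfiguration.getCheckpointFailureTolerance());
```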
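If "restart" here means cancelling the job and submitting it again, note that `RETAIN_ON_CANCELLATION` only keeps the checkpoint files; a newly submitted job does not pick them up on its own and starts with empty state unless it is given an explicit restore path, e.g. `flink run -s <checkpoint-path> ...`. Retained checkpoints are written under `<checkpoint-storage>/<job-id>/chk-<n>/`, and a resubmitted job usually gets a new job ID, so the old job's directory has to be located by hand. The sketch below assumes you have already listed the entry names under the old job's directory (for S3, with whatever client you use; that step is not shown) and picks the highest-numbered `chk-<n>` entry. `LatestCheckpoint` and `latest` are hypothetical names.

```java
import java.util.Collection;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: picks the newest retained checkpoint directory name. */
public final class LatestCheckpoint {
    // Completed checkpoints live in directories named chk-<checkpointId>.
    private static final Pattern CHK_DIR = Pattern.compile("chk-(\\d{1,18})");

    private LatestCheckpoint() {}

    /**
     * Returns the chk-N entry with the largest N, ignoring other entries such as
     * "shared" or "taskowned". Empty if no checkpoint directory is present.
     */
    public static Optional<String> latest(Collection<String> dirNames) {
        String best = null;
        long bestId = -1;
        for (String name : dirNames) {
            Matcher m = CHK_DIR.matcher(name);
            if (m.matches()) {
                // Compare numerically: "chk-12" must win over "chk-3".
                long id = Long.parseLong(m.group(1));
                if (id > bestId) {
                    bestId = id;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

For example, `LatestCheckpoint.latest(List.of("chk-3", "chk-12", "shared", "taskowned"))` yields `chk-12`, and the job would then be resubmitted with `-s` pointing at `<checkpoint-storage>/<old-job-id>/chk-12`. With default settings Flink retains only the most recent completed checkpoint (`state.checkpoints.num-retained` defaults to 1), so there is often just one `chk-<n>` entry to choose from.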

3. Are there ways to optimize the pipelines so that less data is transferred between TaskManagers as records move through the later stages of the pipeline?
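For the capacity-planning part of question 1, one way to tell normal growth from a leak is to sample Metaspace usage alongside the JVM's class-loading counters across a few deploy/cancel cycles: if the number of currently loaded classes keeps rising with every cycle and never drops back after jobs are cancelled (and a GC has run), classes from old jobs are not being unloaded. The sketch below uses only the standard `java.lang.management` API and must run inside the JVM being inspected; the same MXBeans are also reachable over JMX if it is enabled for the JobManager/TaskManager JVMs. It assumes a HotSpot JVM, where the memory pool is named `Metaspace` (other JVMs may name it differently, hence the `Optional`). `MetaspaceProbe` is a hypothetical name.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

/** Hypothetical probe for Metaspace and class-loading figures of the current JVM. */
public final class MetaspaceProbe {
    private MetaspaceProbe() {}

    /** Used Metaspace in bytes, or empty if no valid pool named "Metaspace" exists. */
    public static Optional<Long> metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage(); // null only if the pool is no longer valid
                return usage == null ? Optional.empty() : Optional.of(usage.getUsed());
            }
        }
        return Optional.empty();
    }

    /** One-line summary suitable for periodic logging. */
    public static String summary() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        String used = metaspaceUsedBytes()
                .map(b -> (b / (1024 * 1024)) + " MiB")
                .orElse("n/a");
        return "metaspaceUsed=" + used
                + " loadedClasses=" + cl.getLoadedClassCount()      // currently loaded
                + " unloadedClasses=" + cl.getUnloadedClassCount(); // unloaded since JVM start
    }
}
```

Once the steady-state growth per deployment is known, the headroom itself is set through `jobmanager.memory.jvm-metaspace.size` and `taskmanager.memory.jvm-metaspace.size` in the Flink configuration.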

Joao Boto

04/12/2023, 8:48 AM

For 1: If you are using JDBC drivers, this is more or less expected; see this to solve it
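Some background on why JDBC drivers cause this: `java.sql.DriverManager` is loaded by the JDK rather than by a job's user-code classloader, and it holds strong references to every driver registered with it. When the driver jar is bundled in the job jar, each deployment registers a driver whose class comes from that job's classloader, so after the job is cancelled the classloader, and every class it loaded into Metaspace, cannot be collected. Flink's classloading documentation recommends putting driver jars in Flink's `lib/` folder (or making the driver classes parent-first) so they are loaded only once. If the driver must stay in the job jar, a minimal sketch of the alternative is to deregister the drivers loaded by the job's classloader when that classloader is released. `JdbcDriverCleanup` is a hypothetical helper using only JDK APIs; it has to be packaged in the job jar itself, because `DriverManager` only lets callers see and deregister drivers visible from the caller's classloader.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: deregisters JDBC drivers whose classes came from a given classloader. */
public final class JdbcDriverCleanup {
    private JdbcDriverCleanup() {}

    /**
     * Deregisters every driver registered with DriverManager whose class was loaded
     * by {@code loader}. Returns the number of drivers removed.
     */
    public static int deregisterDriversLoadedBy(ClassLoader loader) {
        // Snapshot the registry first so we don't modify it while iterating.
        List<Driver> drivers = Collections.list(DriverManager.getDrivers());
        int removed = 0;
        for (Driver driver : drivers) {
            if (driver.getClass().getClassLoader() != loader) {
                continue; // belongs to another job, or to a jar in Flink's lib/ folder
            }
            try {
                DriverManager.deregisterDriver(driver);
                removed++;
            } catch (SQLException | SecurityException e) {
                // Best effort: a driver that cannot be removed is left in place.
            }
        }
        return removed;
    }
}
```

Deregistering from a function's `close()` can be risky, since after a failover Flink may reuse the same classloader and the driver's static registration will not run again. Recent Flink versions offer `RuntimeContext#registerUserCodeClassLoaderReleaseHookIfAbsent(String, Runnable)` for running cleanup when the classloader is actually released; check that your version has it. From `open()`, a call could look like `getRuntimeContext().registerUserCodeClassLoaderReleaseHookIfAbsent("jdbc-driver-cleanup", () -> JdbcDriverCleanup.deregisterDriversLoadedBy(JdbcDriverCleanup.class.getClassLoader()))`.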
