Nishant Goenka
09/13/2023, 12:17 PM
2023-09-13 10:59:00,320 INFO org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Job 3ca752ff5e242a8ff553d49e2579a146 is submitted.
2023-09-13 10:59:00,320 INFO org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Submitting Job with JobId=3ca752ff5e242a8ff553d49e2579a146.
2023-09-13 10:59:02,115 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Recovered 0 pods from previous attempts, current attempt id is 1.
2023-09-13 10:59:02,115 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Recovered 0 workers from previous attempt.
2023-09-13 10:59:02,118 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 'marketing-campaign-external-flink-awsqa' (3ca752ff5e242a8ff553d49e2579a146).
2023-09-13 10:59:02,118 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job 'marketing-campaign-external-flink-awsqa' (3ca752ff5e242a8ff553d49e2579a146).
2023-09-13 10:59:02,127 INFO org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner [] - JobMasterServiceLeadershipRunner for job 3ca752ff5e242a8ff553d49e2579a146 was granted leadership with leader id 00000000-0000-0000-0000-000000000000. Creating new JobMasterServiceProcess.
2023-09-13 10:59:02,134 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2023-09-13 10:59:02,140 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job 'marketing-campaign-external-flink-awsqa' (3ca752ff5e242a8ff553d49e2579a146).
2023-09-13 10:59:02,222 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647, backoffTimeMS=1000) for marketing-campaign-external-flink-awsqa (3ca752ff5e242a8ff553d49e2579a146).
2023-09-13 10:59:02,245 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Created execution graph 9226eef4fce52095502263706ce44307 for job 3ca752ff5e242a8ff553d49e2579a146.
2023-09-13 10:59:02,253 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job marketing-campaign-external-flink-awsqa (3ca752ff5e242a8ff553d49e2579a146).
2023-09-13 10:59:02,254 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2023-09-13 10:59:02,433 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 new pipelined regions in 0 ms, total 1 pipelined regions currently.
2023-09-13 10:59:02,436 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using job/cluster config to configure application-defined state backend: EmbeddedRocksDBStateBackend{, localRocksDbDirectories=[/data/flink/state], enableIncrementalCheckpointing=UNDEFINED, numberOfTransferThreads=-1, writeBatchSize=-1}
2023-09-13 10:59:02,438 INFO org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend [] - Using predefined options: DEFAULT.
2023-09-13 10:59:02,439 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using application-defined state backend: EmbeddedRocksDBStateBackend{, localRocksDbDirectories=[/data/flink/state], enableIncrementalCheckpointing=FALSE, numberOfTransferThreads=4, writeBatchSize=2097152}
2023-09-13 10:59:02,439 INFO org.apache.flink.runtime.state.StateBackendLoader [] - State backend loader loads the state backend as EmbeddedRocksDBStateBackend
2023-09-13 10:59:02,440 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using job/cluster config to configure application-defined checkpoint storage: org.apache.flink.runtime.state.storage.FileSystemCheckpointStorage@192e05a9
2023-09-13 10:59:02,920 WARN org.apache.hadoop.metrics2.impl.MetricsConfig [] - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2023-09-13 10:59:02,947 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled Metric snapshot period at 10 second(s).
2023-09-13 10:59:02,947 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - s3a-file-system metrics system started
2023-09-13 10:59:03,028 WARN org.apache.hadoop.util.NativeCodeLoader [] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-09-13 10:59:06,151 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No checkpoint found during restore.
2023-09-13 10:59:06,152 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Starting job 3ca752ff5e242a8ff553d49e2579a146 from savepoint s3://mkt-offline-store-qa/flink-savepointing/marketing-campaign-external-flink/savepoint-3ca752-d7b49e220c9e ()
2023-09-13 10:59:06,221 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 3ca752ff5e242a8ff553d49e2579a146 reached terminal state FAILED.
org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
    at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: java.io.FileNotFoundException: Cannot find checkpoint or savepoint file/directory 's3://mkt-offline-store-qa/flink-savepointing/marketing-campaign-external-flink/savepoint-3ca752-d7b49e220c9e' on file system 's3'.
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
    ... 4 more
Gyula Fóra
09/13/2023, 12:21 PM
Cannot find checkpoint or savepoint file/directory 's3://mkt-offline-store-qa/flink-savepointing/marketing-campaign-external-flink/savepoint-3ca752-d7b49e220c9e' on file system 's3'.
Gyula Fóra
09/13/2023, 12:22 PM
Nishant Goenka
09/13/2023, 12:26 PM
Gyula Fóra
09/13/2023, 12:27 PM
Gyula Fóra
09/13/2023, 12:28 PM
Gyula Fóra
09/13/2023, 12:28 PM
Nishant Goenka
09/13/2023, 12:29 PM
Nishant Goenka
09/13/2023, 12:35 PM
2023-09-13 09:47:53,125 INFO org.apache.flink.fs.s3.common.writer.S3Committer [] - Committing flink-savepointing/marketing-campaign-external-flink/savepoint-3ca752-d7b49e220c9e/_metadata with MPU ID jtSeD5KVKTXhRVbIf1OhMDkCA8bnumK5trQL_1FB2dkDThDEYaiyBCSAEZVz1W36CuUgNuYlbZuT6urPdyQRKyiukShxFCtGGg5APgZzU5Zjx23937BqXT.ho39WtUzvHulLutyQk.xmBdETNU5z9e3Tt683l_NKrvJaMLWDzVg-
Gyula Fóra
09/13/2023, 1:22 PM
Gyula Fóra
09/13/2023, 1:22 PM
Nishant Goenka
09/13/2023, 1:23 PM
Nishant Goenka
09/13/2023, 4:02 PM
Caused by: java.io.FileNotFoundException: Cannot find checkpoint or savepoint file/directory 's3://mkt-offline-store-qa/flink-savepointing/marketing-campaign-external-flink/savepoint-7d3aa2-40c908beb3c4' on file system 's3'.
Nishant Goenka
09/13/2023, 4:07 PM
spec:
  restartNonce: 123
  flinkVersion: v1_16
  flinkConfiguration:
    kubernetes.operator.periodic.savepoint.interval: 10m
    kubernetes.operator.savepoint.history.max.count: "3"
  job:
    jarURI: local:///opt/flink/usrlib/flink-job.jar
    parallelism: 10
    upgradeMode: savepoint
    savepointTriggerNonce: 123
Abhishek Joshi
09/14/2023, 4:58 AM
Gyula Fóra
09/14/2023, 7:08 AM