# troubleshooting
Hi team, any suggestions for recovering from this error? We recently removed a class that was previously used to define our state. Now, during an upgrade, the restore fails because Flink still tries to look up the old class referenced in the previous checkpoint. We tried setting
execution.savepoint.ignore-unclaimed-state: true
but no luck. We are on Flink 1.15. Thanks!
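
For anyone reading along, the failure described above (the restore looks up a class by name and the class is no longer on the classpath) can be sketched in plain Java. This is only an illustration and not Flink's actual code: `ResolveClassSketch` and this `resolveClassByName` are hypothetical stand-ins named after the `InstantiationUtil.resolveClassByName` frame in the trace, and the class name is the redacted `xxxCollection` from the log.

```java
import java.io.IOException;

public class ResolveClassSketch {

    // Hypothetical stand-in for the lookup seen in the trace:
    // resolve a class by its recorded name, and wrap a failed
    // lookup in an IOException that carries the original cause.
    static Class<?> resolveClassByName(String className, ClassLoader cl) throws IOException {
        try {
            // 'false' means: load the class but do not initialize it.
            return Class.forName(className, false, cl);
        } catch (ClassNotFoundException e) {
            throw new IOException("Could not find class '" + className + "' in classpath.", e);
        }
    }

    public static void main(String[] args) {
        ClassLoader cl = ResolveClassSketch.class.getClassLoader();
        try {
            // The class name was written into the old snapshot metadata,
            // but the class itself has since been removed from the jar.
            resolveClassByName("xxxCollection", cl);
        } catch (IOException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause().getClass().getName());
        }
    }
}
```

If this picture is right, it may also explain why `execution.savepoint.ignore-unclaimed-state` has no effect here: as far as I understand, that option only skips state that cannot be mapped to any operator in the new job, whereas in this case the operator still exists and its state metadata, including the old class name, is still read during restore.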
2023-04-12 21:04:57,615 WARN  org.apache.flink.streaming.api.operators.BackendRestorerProcedure [] - Exception while restoring operator state backend for CoBroadcastWithNonKeyedOperator_fda18ba392f9dc50769c0bd716347531_(1/8) from alternative (1/1), will retry while more alternatives are available.
org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore operator state backend
        at org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:83) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createOperatorStateBackend(HashMapStateBackend.java:160) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$operatorStateBackend$0(StreamTaskStateInitializerImpl.java:277) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.operatorStateBackend(StreamTaskStateInitializerImpl.java:286) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:174) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:265) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:703) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:679) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:646) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) ~[flink-dist-1.15.3.jar:1.15.3]
        at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.io.IOException: Could not find class 'xxxCollection' in classpath.
        at org.apache.flink.util.InstantiationUtil.resolveClassByName(InstantiationUtil.java:775) ~[xxx.jar:?]
        at org.apache.flink.util.InstantiationUtil.resolveClassByName(InstantiationUtil.java:750) ~[xxx.jar:?]
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializerSnapshotData.readSnapshotData(PojoSerializerSnapshotData.java:214) ~[xxx.jar:?]
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializerSnapshotData.createFrom(PojoSerializerSnapshotData.java:135) ~[xxx.jar:?]
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializerSnapshot.readSnapshot(PojoSerializerSnapshot.java:129) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshot.readVersionedSnapshot(TypeSerializerSnapshot.java:175) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil$TypeSerializerSnapshotSerializationProxy.deserializeV2(TypeSerializerSnapshotSerializationUtil.java:174) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil$TypeSerializerSnapshotSerializationProxy.read(TypeSerializerSnapshotSerializationUtil.java:145) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil.readSerializerSnapshot(TypeSerializerSnapshotSerializationUtil.java:77) ~[xxx.jar:?]
        at org.apache.flink.runtime.state.metainfo.StateMetaInfoSnapshotReadersWriters$CurrentReaderImpl.readStateMetaInfoSnapshot(StateMetaInfoSnapshotReadersWriters.java:237) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.state.OperatorBackendSerializationProxy.read(OperatorBackendSerializationProxy.java:134) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.state.OperatorStateRestoreOperation.restore(OperatorStateRestoreOperation.java:82) ~[flink-dist-1.15.3.jar:1.15.3]
        at org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:80) ~[flink-dist-1.15.3.jar:1.15.3]
        ... 17 more
Caused by: java.lang.ClassNotFoundException: xxxCollection
        at java.net.URLClassLoader.findClass(Unknown Source) ~[?:?]
        at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
        at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:68) ~[xxx.jar:?]
        at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:52) ~[xxx.jar:?]
        at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
        at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:172) ~[flink-dist-1.15.3.jar:1.15.3]
        at java.lang.Class.forName0(Native Method) ~[?:?]
        at java.lang.Class.forName(Unknown Source) ~[?:?]
        at org.apache.flink.util.InstantiationUtil.resolveClassByName(InstantiationUtil.java:773) ~[xxx.jar:?]
        at org.apache.flink.util.InstantiationUtil.resolveClassByName(InstantiationUtil.java:750) ~[xxx.jar:?]
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializerSnapshotData.readSnapshotData(PojoSerializerSnapshotData.java:214) ~[xxx.jar:?]
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializerSnapshotData.createFrom(PojoSerializerSnapshotData.java:135) ~[xxx.jar:?]
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializerSnapshot.readSnapshot(PojoSerializerSnapshot.java:129) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshot.readVersionedSnapshot(TypeSerializerSnapshot.java:175) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil$TypeSerializerSnapshotSerializationProxy.deserializeV2(TypeSerializerSnapshotSerializationUtil.java:174) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil$TypeSerializerSnapshotSerializationProxy.read(TypeSerializerSnapshotSerializationUtil.java:145) ~[xxx.jar:?]
        at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil.readSerializerSnapshot(TypeSerializerSnapshotSerializationUtil.java:77) ~[xxx.jar:?]