# troubleshooting
g
Hi team, we have a Flink session cluster running 1.18 with Java 17. We're using an ARM-based image to run on AWS Graviton instances. When I try to submit a job to the cluster, I get the following error:
org.apache.flink.util.FlinkException: Failed to execute job 'flink-poc'.
        at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2253)
        at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:189)
        at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2219)
        ...
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
        ...
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [org.apache.flink.runtime.rest.handler.RestHandlerException: Could not upload job files.
        at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$uploadJobGraphFiles$4(JobSubmitHandler.java:201)
        at java.base/java.util.concurrent.CompletableFuture.biApply(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$BiApply.tryFire(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flink.util.FlinkException: Could not upload job files.
        at org.apache.flink.runtime.client.ClientUtils.uploadJobGraphFiles(ClientUtils.java:86)
        at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$uploadJobGraphFiles$4(JobSubmitHandler.java:195)
        ... 10 more
Caused by: java.io.IOException: PUT operation failed: Could not transfer error message
        at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:357)
        at org.apache.flink.runtime.blob.BlobClient.uploadFile(BlobClient.java:406)
        at org.apache.flink.runtime.client.ClientUtils.uploadUserJars(ClientUtils.java:113)
        at org.apache.flink.runtime.client.ClientUtils.uploadAndSetUserJars(ClientUtils.java:105)
        at org.apache.flink.runtime.client.ClientUtils.uploadJobGraphFiles(ClientUtils.java:83)
        ... 11 more
Caused by: java.io.IOException: Could not transfer error message
        at org.apache.flink.runtime.blob.BlobUtils.readExceptionFromStream(BlobUtils.java:348)
        at org.apache.flink.runtime.blob.BlobOutputStream.receiveAndCheckPutResponse(BlobOutputStream.java:161)
        at org.apache.flink.runtime.blob.BlobOutputStream.finish(BlobOutputStream.java:107)
        at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:354)
        ... 15 more
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.s3.model.AmazonS3Exception
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
        at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Unknown Source)
        at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
        ...
]
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:646)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$6(RestClient.java:626)
        at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
        ... 4 more
Can someone help with this issue? We have another cluster running Flink 1.18 with Java 11, and the same job submits fine on that cluster.
d
Check that the version of the AWS SDK you are using is compatible with Java 17.
Some older versions might not be compatible. Check which version of the AWS SDK you have and its release notes.
Check the classpath configuration to make sure the SDK isn't missing in the Java 17 environment. You may need to explicitly include the AWS SDK JARs in your Flink application's dependencies or in the Flink lib directory.
Since you're using an ARM-based image, make sure the AWS SDK you're using is compiled for the ARM architecture. Some libraries only provide x86/x64 binaries, which can lead to runtime issues. Or you might need a separate library for ARM that may or may not be available.
Update dependencies
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>COMPATIBLE_VERSION</version>
</dependency>
it should be a version known to support Java 17 & ARM (compiled for ARM)
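Before comparing library versions, it can help to confirm what the JVM itself reports on each cluster. The snippet below is a minimal sketch, not something from this thread: the class name `RuntimeInfo` is made up for illustration, and it only prints standard JVM system properties. It would need to run inside the same image as the Flink processes for the output to mean anything.

```java
// Sketch: print the Java version, CPU architecture, and classpath the JVM reports.
// RuntimeInfo is a hypothetical helper name, not part of Flink or the AWS SDK.
final class RuntimeInfo {

    /** Builds a one-line summary from standard JVM system properties. */
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", os.arch=" + System.getProperty("os.arch")
                + ", java.class.path=" + System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this on both the Java 11 cluster and the Java 17 cluster and diffing the output is one way to see whether the two environments really differ only in the Java version.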
g
Thanks for the pointers! I don't have an s3 dependency in my pom though.
d
It's just an example.
Replace it with the actual library and artifact ID in question.
g
Right - the exception points to ``com.amazonaws.services.s3.model.AmazonS3Exception`` - which I assume is from the ``com.amazonaws.services.s3`` artifact. In fact, I don't have any Amazon services related libraries in my pom.xml at all.
d
You can also try putting the relevant library in the lib directory and restarting.
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.s3.model.AmazonS3Exception
Possibilities: 1) the library JAR with this class is not being loaded at runtime. 2) the library itself is not compatible with your environment or Java 17.
So one thing to quickly check is whether it's actually on the classpath, right? That it's actually being loaded.
That's kind of the starting point, or reading the release notes to determine whether it's compatible or not.
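One way to do the "is it actually on the classpath" check is to ask a classloader for the class directly. The following is a minimal sketch under assumptions: `ClassPresenceCheck` and `isLoadable` are illustrative names, and the result only reflects the classloader of the JVM you run it in, so it would need to be launched with the same classpath as the process that threw the `ClassNotFoundException`.

```java
// Sketch: check whether a class is visible to a given classloader.
// ClassPresenceCheck is a hypothetical helper, not a Flink utility.
final class ClassPresenceCheck {

    /** Returns true if the named class can be located by the given classloader. */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but failed to link (e.g. a missing dependency)
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above
        String name = args.length > 0
                ? args[0]
                : "com.amazonaws.services.s3.model.AmazonS3Exception";
        ClassLoader loader = ClassPresenceCheck.class.getClassLoader();
        System.out.println(name + " loadable: " + isLoadable(name, loader));
    }
}
```

A `false` result here would point toward possibility 1 (the JAR is not being loaded) rather than a compatibility problem.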
g
Makes sense, trying these out.
The only library I see is ``flink-s3-fs-presto-1.18.0.jar``, which is available on the cluster in the plugins directory - which I think is already part of the classpath when we submit jobs.
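To find out which JAR on the cluster, if any, actually contains the class from the stack trace, the lib and plugins directories can be scanned. This is a rough sketch, not a Flink tool: `JarClassFinder` is a made-up name, and the directory to scan is passed as an argument because the install path is not given in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: list the JARs under a directory that contain a given class.
// JarClassFinder is a hypothetical helper, not part of Flink.
final class JarClassFinder {

    /** Returns every .jar under root (searched recursively) with an entry for className. */
    static List<Path> findJarsContaining(Path root, String className) throws IOException {
        // com.foo.Bar is stored in a JAR as com/foo/Bar.class
        String entryName = className.replace('.', '/') + ".class";

        List<Path> jars;
        try (Stream<Path> paths = Files.walk(root)) {
            jars = paths
                    .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<Path> hits = new ArrayList<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                if (jarFile.getEntry(entryName) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory to scan (assumed to be the Flink lib or plugins directory)
        // args[1]: fully qualified class name; defaults to the one from the stack trace
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String className = args.length > 1
                ? args[1]
                : "com.amazonaws.services.s3.model.AmazonS3Exception";
        for (Path jar : findJarsContaining(root, className)) {
            System.out.println(jar);
        }
    }
}
```

Running it once against the lib directory and once against the plugins directory would show whether the class is present only inside the plugin JAR or somewhere else as well.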