# troubleshooting
r
Hello! 👋 I’m trying to deploy my Flink application in a Kubernetes cluster via the Flink Kubernetes operator. The problem I’m facing now is getting the JAR file from an S3 bucket. I’ve extended the Kubernetes operator image to add the S3 plugins:
```dockerfile
FROM apache/flink-kubernetes-operator:1.4.0

ENV FLINK_PLUGINS_DIR=/opt/flink/plugins

COPY flink-s3-fs-hadoop-1.15.4.jar $FLINK_PLUGINS_DIR/s3-fs-hadoop/
COPY flink-s3-fs-presto-1.15.4.jar $FLINK_PLUGINS_DIR/s3-fs-presto/
```
And I can confirm that the pods of the operator contain these plugins. The problem is that when I try to reference the JAR file that is in the S3 bucket:
```yaml
job:
  jarURI: s3://mybucket/flink-application-15.jar
```
In a standalone deployment I get the following error in the JM:
```
Caused by: java.net.MalformedURLException: unknown protocol: s3
```
In a session job deployment I get the following error in the deployment:
```
The LocalStreamEnvironment cannot be used when submitting a program through a client, or running in a TestEnvironment context.
```
Has anyone faced a similar issue or knows why I’m getting this error? 🙏
m
r
Yes! Actually, that’s a question I posted initially when I started working with the operator. Back then I chose to bypass this problem by downloading the JAR file and putting it in a volume. But the workflow has changed since then, and I’m now more interested in having the JAR file served directly from the bucket.
g
The pod-template example shows how to download jars in an init container for Application deployments: https://github.com/apache/flink-kubernetes-operator/blob/72926b8222e8b0b61c72f93afb869a8639a224e7/examples/pod-template.yaml#L75
For session jobs we might have to change the logic to use Flink’s FileSystem abstraction to access the jars; maybe that’s your problem.
You could try making this improvement in the operator :)
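For reference, here is a minimal sketch of that init-container approach in a FlinkDeployment pod template. The container name, image tag, bucket, and mount paths are all assumptions adapted from the linked example, not a verified configuration:

```yaml
spec:
  podTemplate:
    spec:
      initContainers:
        # Hypothetical init container that fetches the job jar before Flink starts
        - name: artifact-fetcher
          image: amazon/aws-cli:2.13.0   # assumed image/tag
          command: ["aws", "s3", "cp", "s3://mybucket/flink-application-15.jar", "/flink-artifact/"]
          volumeMounts:
            - mountPath: /flink-artifact
              name: flink-artifact
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /flink-artifact
              name: flink-artifact
      volumes:
        - name: flink-artifact
          emptyDir: {}
  job:
    # The jar is then referenced from the local filesystem, not from S3
    jarURI: local:///flink-artifact/flink-application-15.jar
```

The shared `emptyDir` volume is what lets the main Flink container see the jar the init container downloaded.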
r
I’ve been able to do it by downloading the JAR file to the filesystem and running it that way. But now what I’m trying to achieve is having an S3 link in the jarURI. Is that possible?
g
It would probably require some Flink/operator change
r
Hum… strange 🤔 Since, with the same plugins enabled in the JM, it’s possible to store checkpoints, savepoints, and high-availability data in S3, I assumed the process would be the same for JAR files.
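For context, these are the kinds of settings that do resolve `s3://` through the filesystem plugins (the bucket paths here are placeholders, not from the original thread):

```yaml
flinkConfiguration:
  # These paths are handled by Flink's pluggable FileSystem, so the
  # s3-fs-hadoop / s3-fs-presto plugins make them work.
  state.checkpoints.dir: s3://mybucket/checkpoints
  state.savepoints.dir: s3://mybucket/savepoints
  high-availability.storageDir: s3://mybucket/ha
```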
g
Any logic that uses Flink’s FileSystem abstraction is covered by the plugins. But I’m afraid that accessing the jars currently uses plain Java file tooling.
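A quick illustration of the difference, as a standalone sketch rather than actual Flink code (the class and method names are made up): plain `java.net.URL` only knows built-in protocol handlers, so an `s3://` URI fails exactly the way the JM error above shows, while Flink’s `FileSystem` abstraction would resolve the scheme through the loaded plugins.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class S3UrlDemo {
    // Returns the exception message java.net.URL produces for a given URL
    // string, or "no error" if the scheme has a built-in handler.
    static String messageFor(String spec) {
        try {
            new URL(spec);
            return "no error";
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // s3 has no built-in java.net handler
        System.out.println(messageFor("s3://mybucket/flink-application-15.jar"));
        // file is a built-in protocol, so this parses fine
        System.out.println(messageFor("file:///tmp/app.jar"));
    }
}
```

Running this prints `unknown protocol: s3` for the first call, matching the `MalformedURLException` seen in the JobManager.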
r
Hum… okay. That’s unfortunate. Thank you for your help 🙇