# troubleshooting
**p:**
Hi, I am struggling to use S3 as a FileSink locally (in Python). I have added the jar list in my code as follows:
```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# pipeline.jars expects a semicolon-separated list of URLs,
# so join the entries without stray whitespace or newlines.
jar_list = ";".join([
    "file:///home/ubuntu/environment/flink/lib/flink-sql-connector-kafka-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/flink-sql-parquet-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/flink-connector-files-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/flink-s3-fs-hadoop-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/hadoop-mapreduce-client-core-3.3.5.jar",
])

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.get_config().set("parallelism.default", "1")
t_env.get_config().set("pipeline.jars", jar_list)
t_env.get_config().set("pipeline.classpaths", jar_list)
t_env.get_config().set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")
t_env.get_config().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
```
I have also read somewhere that the flink-s3-fs-hadoop jar needs to be placed in its own subfolder under the plugins dir, not in the lib dir. So I also tried setting up a plugins dir, but it didn't work:
```python
import os

os.environ["FLINK_PLUGINS_DIR"] = "/home/ubuntu/environment/flink/plugins"
```
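For reference, Flink only discovers a plugin when its jar sits in its own subfolder of the plugins directory. A minimal sketch of that layout, assuming the paths above (the subfolder name `s3-fs-hadoop` is arbitrary and just a placeholder):

```python
import os
import shutil

# Flink loads each plugin from its own subfolder of the plugins dir;
# the subfolder name itself can be anything.
plugins_dir = "/home/ubuntu/environment/flink/plugins"
plugin_subdir = os.path.join(plugins_dir, "s3-fs-hadoop")
os.makedirs(plugin_subdir, exist_ok=True)
shutil.copy(
    "/home/ubuntu/environment/flink/lib/flink-s3-fs-hadoop-1.17.0.jar",
    plugin_subdir,
)
os.environ["FLINK_PLUGINS_DIR"] = plugins_dir
```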
Here is the error I get:
```
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3a'. The scheme is directly supported by Flink through the following plugin(s): flink-s3-fs-hadoop. Please ensure that each plugin resides within its own subfolder within the plugins directory. See https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/plugins/ for more information. If you want to use a Hadoop file system for that scheme, please add the scheme to the configuration fs.allowed-fallback-filesystems. For a full list of supported file systems, please see https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
```
**m:**
You should have the flink-s3-fs-hadoop plugin under `/libexec/opt`; copy it to `/libexec/plugins` under a dedicated folder and restart the cluster.
It worked for me without any configuration (using s3 instead of s3a, though).
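To illustrate that last point, here is a minimal sketch of a sink that writes through the plugin using the `s3://` scheme, assuming the `t_env` set up above and a hypothetical bucket name `my-bucket`:

```python
# Hypothetical example: a filesystem sink on an s3:// path.
# "my-bucket" is a placeholder; swap in a real bucket.
t_env.execute_sql("""
    CREATE TABLE s3_sink (
        id BIGINT,
        payload STRING
    ) WITH (
        'connector' = 'filesystem',
        'path' = 's3://my-bucket/output/',
        'format' = 'parquet'
    )
""")
```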