# troubleshooting
**p:**
Hi, I am struggling to use S3 as a FileSink locally (in Python). I have added the jar list in my code as follows:
```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# pipeline.jars expects a semicolon-separated list of URLs,
# so join the entries without stray whitespace or newlines.
jar_list = ";".join([
    "file:///home/ubuntu/environment/flink/lib/flink-sql-connector-kafka-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/flink-sql-parquet-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/flink-connector-files-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/flink-s3-fs-hadoop-1.17.0.jar",
    "file:///home/ubuntu/environment/flink/lib/hadoop-mapreduce-client-core-3.3.5.jar",
])

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.get_config().set("parallelism.default", "1")
t_env.get_config().set("pipeline.jars", jar_list)
t_env.get_config().set("pipeline.classpaths", jar_list)
t_env.get_config().set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")
t_env.get_config().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
```
I have also read somewhere that the flink-s3-fs-hadoop jar needs to be placed in its own subfolder under the plugins dir, not in the lib dir. So I also tried setting up a plugins dir, but it didn't work:
```python
import os

os.environ["FLINK_PLUGINS_DIR"] = "/home/ubuntu/environment/flink/plugins"
```
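For reference, Flink only discovers a plugin when its jar sits in its own subfolder of the plugins directory. A minimal sketch of that layout, assuming the paths above (the subfolder name `s3-fs-hadoop` is arbitrary and just a placeholder):

```python
import os
import shutil

# Flink loads each plugin from its own subfolder of the plugins dir;
# the subfolder name itself can be anything.
plugins_dir = "/home/ubuntu/environment/flink/plugins"
plugin_subdir = os.path.join(plugins_dir, "s3-fs-hadoop")
os.makedirs(plugin_subdir, exist_ok=True)
shutil.copy(
    "/home/ubuntu/environment/flink/lib/flink-s3-fs-hadoop-1.17.0.jar",
    plugin_subdir,
)
os.environ["FLINK_PLUGINS_DIR"] = plugins_dir
```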
Here is the error I get:
```
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3a'. The scheme is directly supported by Flink through the following plugin(s): flink-s3-fs-hadoop. Please ensure that each plugin resides within its own subfolder within the plugins directory. See https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/plugins/ for more information. If you want to use a Hadoop file system for that scheme, please add the scheme to the configuration fs.allowed-fallback-filesystems. For a full list of supported file systems, please see https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
```
**m:**
You should have the flink-s3-fs-hadoop plugin under `/libexec/opt`; copy it to `/libexec/plugins` under a dedicated folder and restart the cluster.
It worked for me without any configuration (using s3 instead of s3a, though).
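To illustrate that last point, here is a minimal sketch of a sink that writes through the plugin using the `s3://` scheme, assuming the `t_env` set up above and a hypothetical bucket name `my-bucket`:

```python
# Hypothetical example: a filesystem sink on an s3:// path.
# "my-bucket" is a placeholder; swap in a real bucket.
t_env.execute_sql("""
    CREATE TABLE s3_sink (
        id BIGINT,
        payload STRING
    ) WITH (
        'connector' = 'filesystem',
        'path' = 's3://my-bucket/output/',
        'format' = 'parquet'
    )
""")
```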