How do I use the hdfs with flink docker? (using fl...
# troubleshooting
m
How do I use HDFS with the Flink Docker image (using the Flink session deployment mode on k8s)? The documentation doesn't help much. I tried putting the Hadoop jars in the /opt/flink/lib folder, which didn't help, and I also tried placing them under /opt/flink/plugins/hdfs, with no success. Any ideas?
2023-05-31 10:49:46,504 ERROR org.apache.flink.runtime.security.SecurityUtils              [] - Error occur when instantiate security context with: org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.UserGroupInformation
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory.createContext(HadoopSecurityContextFactory.java:60) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.security.SecurityUtils.installContext(SecurityUtils.java:91) [flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:58) [flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerProcessSecurely(TaskManagerRunner.java:526) [flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerProcessSecurely(TaskManagerRunner.java:510) [flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:468) [flink-dist-1.17.0.jar:1.17.0]
2023-05-31 10:49:46,784 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Terminating TaskManagerRunner with exit code 1.
org.apache.flink.util.FlinkException: Failed to start the TaskManagerRunner.
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:488) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerProcessSecurely$5(TaskManagerRunner.java:530) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerProcessSecurely(TaskManagerRunner.java:530) [flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerProcessSecurely(TaskManagerRunner.java:510) [flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:468) [flink-dist-1.17.0.jar:1.17.0]
Caused by: java.io.IOException: Could not create FileSystem for highly available storage path (hdfs://hadoop-hadoop-hdfs-nn:9870/flink-checkpoints/k8-cluster-id)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:102) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createZooKeeperHaServices(HighAvailabilityServicesUtils.java:89) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:137) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManagerRunnerServices(TaskManagerRunner.java:195) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.start(TaskManagerRunner.java:293) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:486) ~[flink-dist-1.17.0.jar:1.17.0]
    ... 5 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be load
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:543) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createZooKeeperHaServices(HighAvailabilityServicesUtils.java:89) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:137) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManagerRunnerServices(TaskManagerRunner.java:195) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.start(TaskManagerRunner.java:293) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:486) ~[flink-dist-1.17.0.jar:1.17.0]
    ... 5 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 'hdfs' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.
    at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:189) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:526) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createZooKeeperHaServices(HighAvailabilityServicesUtils.java:89) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:137) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManagerRunnerServices(TaskManagerRunner.java:195) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.start(TaskManagerRunner.java:293) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:486) ~[flink-dist-1.17.0.jar:1.17.0]
    ... 5 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.util.HadoopUtils
    at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:84) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:526) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createZooKeeperHaServices(HighAvailabilityServicesUtils.java:89) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:137) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManagerRunnerServices(TaskManagerRunner.java:195) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.start(TaskManagerRunner.java:293) ~[flink-dist-1.17.0.jar:1.17.0]
    at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:486) ~[flink-dist-1.17.0.jar:1.17.0]
    ... 5 more
g
You can create a Flink image with Hadoop installed.
m
@Gaurav Miglani do you mean installing Hadoop in the Dockerfile and setting the HADOOP_CLASSPATH env variable?
g
yes
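For illustration, here is a rough sketch of how a HADOOP_CLASSPATH value could be assembled once Hadoop is unpacked inside the image. It assumes the standard Hadoop binary layout (jars under share/hadoop/...), that only the common and HDFS modules are needed for hdfs:// access, and a hypothetical install path of /opt/hadoop; none of this was confirmed in the thread. If the `hadoop` binary is on the PATH, `hadoop classpath` prints the real value and is the safer choice.

```python
import os
from pathlib import Path

# Jar directories in a standard Hadoop binary distribution.
# Assumption: common + hdfs are enough for plain hdfs:// access.
JAR_DIRS = (
    "share/hadoop/common",
    "share/hadoop/common/lib",
    "share/hadoop/hdfs",
    "share/hadoop/hdfs/lib",
)


def build_hadoop_classpath(hadoop_home: str) -> str:
    """Return a classpath string with one JVM wildcard entry per jar directory."""
    home = Path(hadoop_home)
    entries = []
    conf_dir = home / "etc" / "hadoop"
    if conf_dir.is_dir():
        # Config directory (core-site.xml, hdfs-site.xml) goes first.
        entries.append(str(conf_dir))
    for sub in JAR_DIRS:
        directory = home / sub
        # Only add directories that exist and actually contain jars.
        if directory.is_dir() and any(directory.glob("*.jar")):
            entries.append(f"{directory}/*")
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # /opt/hadoop is a hypothetical default, not a path from this thread.
    print(build_hadoop_classpath(os.environ.get("HADOOP_HOME", "/opt/hadoop")))
```

The printed string is what would be exported as HADOOP_CLASSPATH in the container before the Flink processes start.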
m
@Gaurav Miglani I believe pointing that same env variable at the jars should do the job. Do you know which jars I need?
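One way to narrow down which jars are missing is to check which jars in a directory actually contain a given class. The sketch below is only a diagnostic helper, not something from the thread: the function name is made up, and the second class name in the example is an assumption about what HDFS access needs. Note that a "Could not initialize class" NoClassDefFoundError typically means the class itself was found but its static initializer failed, often because one of its own dependencies is missing, so it can be worth checking for dependency classes as well.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir: str, class_name: str) -> list[str]:
    """Return the names of jars under lib_dir that contain the given class."""
    # A class a.b.C is stored in a jar as the zip entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip corrupt files that are not valid zip archives
    return hits


if __name__ == "__main__":
    # Classes to look for: the first comes from the stack trace above,
    # the second is an assumption about what hdfs:// support requires.
    for cls in (
        "org.apache.hadoop.security.UserGroupInformation",
        "org.apache.hadoop.hdfs.DistributedFileSystem",
    ):
        print(cls, "->", jars_containing("/opt/flink/lib", cls) or "not found")
```

Running this inside the container against /opt/flink/lib (or whichever directory HADOOP_CLASSPATH points to) shows whether the Hadoop classes are present at all before digging into their dependencies.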