# troubleshooting
j
Hey folks, currently seeing this kind of error:
Copy code
Sink: Committer (1/1)#0 (0c5abffd2f065e1edef3036b20ec42a5) switched from RUNNING to FAILED with failure cause: java.nio.file.AccessDeniedException: <path>/part-8ad796e1-3c85-4b54-9eed-eaaf06621185-0.snappy.parquet: initiate MultiPartUpload on <path>/part-8ad796e1-3c85-4b54-9eed-eaaf06621185-0.snappy.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: KC4SFHVXBGJC6XFW; S3 Extended Request ID: VQXppEZpXzKyDoc7u32AHkMAr608cMLxLOh4QIWwZVWIg0JBYgkMTxAT4M6+xmyVrzmMmnyMcUucHlBZmFBzSs/GrYJ+LmmlADwr4wjTjDs=; Proxy: null), S3 Extended Request ID: VQXppEZpXzKyDoc7u32AHkMAr608cMLxLOh4QIWwZVWIg0JBYgkMTxAT4M6+xmyVrzmMmnyMcUucHlBZmFBzSs/GrYJ+LmmlADwr4wjTjDs=:AccessDenied
My sink is configured like so:
Copy code
return DeltaSink.forRowData(
                        new Path("s3a://<bucket>/<path>"),
                        new Configuration() {
                            {
                                set(
                                        "spark.hadoop.fs.s3a.aws.credentials.provider",
                                        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");
                            }
                        },
                        FULL_SCHEMA_ROW_TYPE)
                .withMergeSchema(true)
                .build();
Writes to my local filesystem work, but writes to S3 are failing. The credentials on the machine are admin for the account. Any thoughts?
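Side note on the credentials key: a minimal variant of the snippet above, on the assumption that Hadoop's S3AFileSystem reads the unprefixed fs.s3a.aws.credentials.provider property (the spark.hadoop. prefix is a Spark convention that Flink and Hadoop don't strip), so the prefixed setting may be getting ignored and s3a would fall back to its default providers:
Copy code
// Sketch only: same sink as above, but with the unprefixed Hadoop S3A key.
// Hadoop's S3AFileSystem looks up "fs.s3a.aws.credentials.provider" directly;
// the "spark.hadoop." prefix is only stripped by Spark, so under Flink the
// prefixed key is most likely ignored and s3a uses its default provider list
// (env vars, instance profile, ...).
return DeltaSink.forRowData(
                new Path("s3a://<bucket>/<path>"),
                new Configuration() {
                    {
                        set(
                                "fs.s3a.aws.credentials.provider",
                                "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");
                    }
                },
                FULL_SCHEMA_ROW_TYPE)
        .withMergeSchema(true)
        .build();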
j
is this s3 bucket in a different region than your app’s default credential region by any chance?
j
It should not be, let me check.
Copy code
$ $AWS_REGION
us-east-1: command not found
And the bucket is in us-east-1
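For what it's worth, a small sketch (class name and bucket placeholder are illustrative) that prints the region the v1 SDK resolves and where the bucket actually lives, using the SDK that hadoop-aws already pulls in:
Copy code
import com.amazonaws.regions.DefaultAwsRegionProviderChain;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Sketch only: compare the region the SDK resolves (env var, profile,
// instance metadata) with the region the bucket actually lives in.
public class RegionCheck {
    public static void main(String[] args) {
        System.out.println("SDK default region: "
                + new DefaultAwsRegionProviderChain().getRegion());

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Note: getBucketLocation returns "US" for us-east-1 (legacy naming).
        System.out.println("Bucket location: " + s3.getBucketLocation("<my_bucket>"));
    }
}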
j
and you’re just running flink off your local machine, or is this on a hosted environment?
j
This is running out of an EC2 instance of mine
I am able to do things like upload my jar file to that same bucket from the instance.
j
in a VPC by any chance?
j
The EC2 instance is in a VPC, but it is able to access S3 (it seems)
(as well as my data sources and our confluent sink).
j
i wonder if it's just the permissions--i see it specifically calls out MultiPartUpload--do you have permissions to do so?
j
The credentials configured in ~/.aws are for that account. The bucket owner (the account) has List and Write ACLs. The role I am authing as has AdministratorAccess:
Copy code
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
So my thought is I should have the permissions.
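One way to test the MultiPartUpload piece in isolation is to reproduce the failing call with a plain S3 client and the same credential chain; a minimal sketch, with the class name made up and bucket/key as placeholders:
Copy code
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;

// Sketch only: reproduce the exact call the stack trace fails on
// ("initiate MultiPartUpload") outside Flink, with the same credential chain.
public class MultipartProbe {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        InitiateMultipartUploadResult result = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest("<my_bucket>", "<path>/mpu-probe"));
        System.out.println("Initiated multipart upload: " + result.getUploadId());

        // Abort so no incomplete upload is left lying around.
        s3.abortMultipartUpload(new AbortMultipartUploadRequest(
                "<my_bucket>", "<path>/mpu-probe", result.getUploadId()));
    }
}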
Flink, running locally, will ultimately lean on the config in my ~/.aws folder, right?
It seems to for the AWS clients I spin up for things like Secrets Manager.
j
it depends on the connector--i know for FileSink you need to specify AWS credentials in the flink config somewhere
might be worthwhile also going through this checklist: https://repost.aws/knowledge-center/emr-s3-403-access-denied
j
So I checked the VPC endpoint permissions and that all seems sorted; when I run AWS CLI commands that seems sorted too. I wonder if the issue is that the account the EC2 instance is in is different from the one the S3 bucket is in. That shouldn't be an issue, since it is supposed to use the DefaultAWSCredentialsProviderChain, which should ultimately use the SAML creds I have on the host, but maybe something funky is going on.
j
Yeah, if you can test with a bucket in the same account, that might be a quick way to prove it.
j
Not as much of an option, unfortunately. What I did do was apply this policy to the bucket:
Copy code
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Dev Desk Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<other_aws_account>:root"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::<my_bucket>/*"
    }
  ]
}
But am still seeing the same issue.
k
@Jalil Alchy You are using environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY - right?
j
No, I have tried to set spark.hadoop.fs.s3a.aws.credentials.provider instead, so that it can pick up my IAM creds using the DefaultAWSCredentialsProviderChain.
If I spin up an AmazonS3 client and write a file directly, that works.
k
isn't DefaultAWSCredentialsProviderChain the AWS credentials provider chain that looks for credentials in this order: environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by the Java SDK)?
j
It also looks in your ~/.aws/ config and can use the env value AWS_PROFILE to pick a profile from that config.
👍 1
At least that is how all the AWS SDK clients I am using are configured (and working).
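A small sketch for checking that from inside the JVM (class name is made up); it prints which access key the default chain resolves and which profile is set:
Copy code
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;

// Sketch only: print which credentials the default chain resolves inside
// this JVM, to compare against the working AmazonS3 / Secrets Manager clients.
public class CredentialsCheck {
    public static void main(String[] args) {
        AWSCredentials creds = new DefaultAWSCredentialsProviderChain().getCredentials();
        String keyId = creds.getAWSAccessKeyId();
        // Log only a prefix of the key id, never the secret.
        System.out.println("Resolved access key id: "
                + keyId.substring(0, Math.min(4, keyId.length())) + "...");
        System.out.println("AWS_PROFILE = " + System.getenv("AWS_PROFILE"));
    }
}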
k
i have a demo app on my local machine that writes data to S3 using DeltaSink; I was using ENV variables though, and had no issues there. I have not tried AWS_PROFILE.
Can try tomorrow maybe, since I'm running tests for SQL support for the Delta connector anyways: https://github.com/delta-io/connectors/issues/238
j
I can try using ENV variables, but the end state needs to be able to write from a deployed application using IAM.
👍 1
Even using the env variables, same issue.
Something seems very wrong. Mind sharing with me the code you used to create the delta sink? Just want to make sure there isn't a major discrepancy.
Could this be caused by anything related to shading?
k
@Jalil Alchy I will try to find some time to take a look. Could you tell me a few things:
1. what delta-connector version are you using?
2. what flink version are you using?
3. do you run Flink from docker/k8s on EC2, or as a "standalone" app?
I run it using connector 0.6.0 on Flink 1.15.3 and 1.16.1. Regarding my setup, I was running this on local docker, from docker-compose -> writes to S3. The docker-compose setup, for both Task and Job managers:
Copy code
environment:
    - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
    - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
The DeltaSink setup is very simple:
Copy code
public static DeltaSink<RowData> createDeltaSink(
        String deltaTablePath,
        RowType rowType) {

    Configuration conf = new Configuration();
    conf.set("delta.checkpointInterval", "1000");
    return DeltaSink
        .forRowData(
            new Path(deltaTablePath),
            conf,
            rowType)
        .withPartitionColumns("age")
        .build();
}
deltaTablePath is a path to S3. I'm using a "fat jar" to submit my demo app. My dependencies in pom.xml look like this (I'm using the aws profile when building the fat jar):
Copy code
<dependencies>
		<dependency>
			<groupId>io.delta</groupId>
			<artifactId>delta-flink</artifactId>
			<version>${delta.version}</version>
		</dependency>
		<dependency>
			<groupId>io.delta</groupId>
			<artifactId>delta-standalone_${scala.main.version}</artifactId>
			<version>${delta.version}</version>
		</dependency>

		<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files -->
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-connector-files</artifactId>
			<version>${flink.version}</version>
			<scope>provided</scope>
		</dependency>

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-clients</artifactId>
			<version>${flink.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-parquet</artifactId>
			<version>${flink.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>${hadoop-version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-hadoop-fs</artifactId>
			<version>${flink.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-slf4j-impl</artifactId>
			<version>${log4j.version}</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-api</artifactId>
			<version>${log4j.version}</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-core</artifactId>
			<version>${log4j.version}</version>
			<scope>provided</scope>
		</dependency>

		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>1.2.17</version>
			<scope>provided</scope>
		</dependency>
	</dependencies>

	<profiles>
		<profile>
			<id>aws</id>
			<properties>
				<flink.scope>compile</flink.scope>
			</properties>
			<dependencies>
				<dependency>
					<groupId>org.apache.hadoop</groupId>
					<artifactId>hadoop-aws</artifactId>
					<version>3.1.0</version>
				</dependency>
				<dependency>
					<groupId>org.apache.hadoop</groupId>
					<artifactId>hadoop-client</artifactId>
					<version>${hadoop-version}</version>
				</dependency>
				<dependency>
					<groupId>org.apache.flink</groupId>
					<artifactId>flink-file-sink-common</artifactId>
					<version>${flink.version}</version>
				</dependency>
			</dependencies>
		</profile>
	</profiles>
Copy code
<hadoop-version>3.1.0</hadoop-version>
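For reference, a sketch of how a sink built by createDeltaSink above would typically be attached to a stream; the method name, stream, table path, and job name here are illustrative, not from the demo app:
Copy code
// Sketch only: wiring a sink built by createDeltaSink(...) above into a job.
// "rows" stands for whatever DataStream<RowData> the app already builds;
// the table path and job name are placeholders.
public static void attachDeltaSink(
        StreamExecutionEnvironment env,
        DataStream<RowData> rows,
        RowType rowType) throws Exception {

    rows.sinkTo(createDeltaSink("s3a://<bucket>/<table_path>", rowType));
    env.execute("delta-sink-demo");
}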
Could this be caused by anything related to shading?
I don't think so; if there were a problem with shading/versions you would have ClassNotFound, NoClassDefFound, or MethodNotFound exceptions, I think.
j
1. what is the delta-connector version you are using? a. 0.6.0
2. what is the flink version you are using? a. 1.15.2, maybe this is the issue
3. do you run Flink from docker/k8s on EC2 or "standalone" app? a. standalone
Also using a thick jar; most of our dependencies are similar, although I am building with Bazel.
k
so if this is a standalone process in your case, i wonder if the ENV variables are visible to the flink process 🤔
j
They should be, but I can log them.
By standalone I mean I am running start-cluster.sh, then flink run <path_to_jar>.
👍 1
k
yeah, logging them would be a good thing to check
log them from the Flink process, from your main method for example. Also, are all Flink nodes on the same EC2?
For the Delta Sink/Source, both TM and JM will have interactions with S3.
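Something like this, pasted into the job class and called from main() (the method name and println output are illustrative), would show what the Flink JVM actually sees:
Copy code
// Sketch only: call this from the job's main() (JobManager side) and, if
// needed, from an operator's open() to cover the TaskManager JVMs too.
static void logAwsEnv() {
    System.out.println("AWS_PROFILE = " + System.getenv("AWS_PROFILE"));
    System.out.println("AWS_REGION  = " + System.getenv("AWS_REGION"));
    System.out.println("AWS_ACCESS_KEY_ID set? "
            + (System.getenv("AWS_ACCESS_KEY_ID") != null));
    System.out.println("AWS_SECRET_ACCESS_KEY set? "
            + (System.getenv("AWS_SECRET_ACCESS_KEY") != null));
}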
j
Yes
👍 1
It does seem the env variable is not available.
k
how did you set them on that EC2?
j
I believe they were set via exports
in this terminal
k
maybe this will help: https://docs.aws.amazon.com/cloud9/latest/user-guide/env-vars.html - I bet you need something like ~/.bashrc or ~/.bash_profile
"set via exports" <- this will work only for the current "bash" session I think. Are you running flink from that same session? Maybe try to set them "globally"
j
Yes, but I have set them globally and am still seeing the same issue (primarily for AWS_PROFILE and AWS_REGION). I wonder if it is more or less running against a different .aws config.
Which is strange because the aws clients I manually create work.
I think at the very least it is becoming more clear that Hadoop isn't using my ~/.aws/config
You would think that running this on Kinesis Data Analytics using IAM would pretty much insulate me from all the credential issues with S3 I was seeing, but even then I am getting the same access denied 😅 Going to try attaching a secret directly in the env.
k
but even then I am getting the same access denied 😂
@Jalil Alchy any luck/progress with that one?
j
Not exactly yet, but funny you mention it, it did seem to be a permission issue when running in KDA. I think the reason my EC2 instance wasn't working (this just came up) was that the S3 bucket used S3-managed KMS encryption, so it didn't necessarily have access to the KMS keys (although it should have). Still tweaking to get this working.
So I did get it working from KDA, but not from EC2 yet.
Although I think it may be KMS-related permissions. It isn't the first time I have seen S3 effectively hide KMS permission issues, but it wasn't on my mind until today.
👍 1
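A quick way to poke at the KMS theory from the EC2 instance: a sketch (class name, bucket, and key are placeholders) that writes one object with the same credential chain. If the bucket default-encrypts with SSE-KMS, this write needs kms:GenerateDataKey on the bucket key, and the multipart path additionally needs kms:Decrypt, so a 403 here points at KMS permissions rather than s3:*.
Copy code
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Sketch only: one plain put from the instance's credentials. With SSE-KMS
// default bucket encryption this exercises kms:GenerateDataKey, so an
// AccessDenied here would point at KMS rather than the S3 permissions.
public class KmsWriteProbe {
    public static void main(String[] args) {
        byte[] body = "kms probe".getBytes(StandardCharsets.UTF_8);
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(body.length);

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        s3.putObject(new PutObjectRequest(
                "<my_bucket>", "<path>/kms-probe.txt",
                new ByteArrayInputStream(body), metadata));
        System.out.println("Write succeeded; check the object's encryption settings.");
    }
}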