# troubleshooting
l
Hey, on my Flink application (hosted on AWS managed Flink) I get these warnings:
AccessDenied: ...1722425503684/: Access Denied

Bulk delete operation failed to delete all objects; failure count = 1

org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport.translateDeleteException(MultiObjectDeleteSupport.java:92)
I thought this was to do with a dependency I wasn’t using, flink-s3-fs-hadoop, so I removed that and still see these thousands of warning logs. I can’t see where in my code I would be calling this operation or trying to perform a delete in this way.
a
Could you provide more context about the trace? Flink uses the S3 filesystem to manage checkpoints, savepoints, and HA metadata, so it doesn't specifically need to be part of your code.
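For example, you can see the service-managed checkpointing by running aws kinesisanalyticsv2 describe-application against your app. The relevant part of the response should look roughly like this (values shown are illustrative defaults, not taken from your app):
{
  "FlinkApplicationConfigurationDescription": {
    "CheckpointConfigurationDescription": {
      "ConfigurationType": "DEFAULT",
      "CheckpointingEnabled": true,
      "CheckpointInterval": 60000,
      "MinPauseBetweenCheckpoints": 5000
    }
  }
}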
l
yes!
{
  "content": {
    "host": "/aws/kinesis-analytics/flink",
    "message": "AccessDenied: 7b3d0a3681dfd0b918d30888438b019f-946374002659-1722425503684/: Access Denied",
    "attributes": {
      "messageType": "WARN",
      "service": "stream-deactivation-us-ash",
      "locationInformation": "org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport.translateDeleteException(MultiObjectDeleteSupport.java:107)",
      "logger": "org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport",
      "messageSchemaVersion": "1",
      "host": "/aws/kinesis-analytics/flink",
      "id": "38442495305608014481312845103217598507977785145102172165",
      "applicationVersionId": "4",
      "threadName": "s3a-transfer-3db4bd0e0168751d35dc925c1fa9414b79d097b8-unbounded-pool2-t86",
      "timestamp": 1723821108370
    }
  }
}
{
  "content": {
    "message": "Bulk delete operation failed to delete all objects; failure count = 1",
    "attributes": {
      "messageType": "WARN",
      "locationInformation": "org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport.translateDeleteException(MultiObjectDeleteSupport.java:92)",
      "logger": "org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport",
      "messageSchemaVersion": "1",
      "applicationVersionId": "4",
      "threadName": "s3a-transfer-3db4bd0e0168751d35dc925c1fa9414b79d097b8-unbounded-pool2-t98",
      "timestamp": 1723821288239
    }
  }
}
I believe I have to give s3:Delete* IAM permissions to my roles, but I’m not sure I understand which S3 buckets the Flink job is trying to write to 🤔
Confused why I am getting Bulk delete operation failed to delete all objects; failure count = 1 errors from the org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport logger. As a test I gave my Flink job on AWS full delete permissions, but I am still getting the warning log every minute or so:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Allow",
      "Action": [
        "s3:Delete*"
      ],
      "Resource": "*"
    }
  ]
}
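For reference, if it were one of my own buckets I'd scope that down to something like this (bucket name hypothetical; as I understand it, the bulk delete API authorizes each key against s3:DeleteObject):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ScopedDelete",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::my-flink-state-bucket/*"
    }
  ]
}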
Just wondering where this comes from, and which S3 bucket it is trying to write to…
a
Ideally we would have gotten a better stack trace to understand what exactly is failing to delete, but again this might be an operation syncing checkpoints or HA metadata. In that case the service probably doesn't use the service execution role you provide, because it wouldn't be targeting your buckets anyway. It is hard to confirm or deny with such a shallow stack trace.
l
That makes sense, I can try lowering the log level to INFO and seeing if there is anything interesting present there
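On Managed Flink I believe the log level is controlled by the MonitoringConfiguration, so something like this payload passed to aws kinesisanalyticsv2 update-application as --application-configuration-update should do it (a sketch only, I haven't verified the exact field names):
{
  "FlinkApplicationConfigurationUpdate": {
    "MonitoringConfigurationUpdate": {
      "ConfigurationTypeUpdate": "CUSTOM",
      "LogLevelUpdate": "INFO"
    }
  }
}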
It appears to be related to this log:
{
  "id": "AgAAAZF6LZOSU5DtngAAAAAAAAAYAAAAAEFaRjZMWm5CQUFCVFhEaUl6Z0VwTEFBQQAAACQAAAAAMDE5MTdhMmQtYWUwYS00YmNlLThlYTEtODg3ZTUzODI1Y2Jh",
  "content": {
    "timestamp": "2024-08-22T13:01:32.946Z",
    "message": "Committing 7b3d0a3681dfd0b918d30888438b019f-946374002659-1722425503684/checkpoints/7b3d0a3681dfd0b918d30888438b019f/chk-17/_metadata with MPU ID afQOxbeG84HLeNBL5Mi_nxzS046xQ.s7W9t2c9E7Hf5CaHpThEHvy_.fzikHtaCuKIILFFB.80XEYOK59WcEgboxanYI7ari20JWcN7.aR1jRqv3uRac56ZBzbnkCvsRpHb5KcRh6mhylUELdXBkNf._tDYq93PHF4C3qQCNej4-",
    "attributes": {
      "messageType": "INFO",
      "locationInformation": "org.apache.flink.fs.s3.common.writer.S3Committer.commit(S3Committer.java:67)",
      "logger": "org.apache.flink.fs.s3.common.writer.S3Committer",
      "messageSchemaVersion": "1",
      "id": "38453881722139690275353199166317417160391698146809085952",
      "applicationVersionId": "5",
      "threadName": "jobmanager-io-thread-1",
      "timestamp": 1724331692946
    }
  }
}
a
Yeah, that indeed looks like a checkpoint file
l
alright
do you know how I can fix the warning, is there some sort of internal permission I need to change?
a
You would probably need to contact the service team, since they manage the bucket and the configuration. Not sure if it could be fixed from your side or not. Also, I can't tell whether it affects the job in any way, but a WARN message every minute is pretty noisy.
l
Okay thanks
yeah, it is a lot of logs
I don’t have AWS support unfortunately, so I guess I will have to ask on re:Post