# ingestion
b
I have a problem running ingestion from S3 buckets. I followed the documentation to create an IAM role and policy granting access to metadata in S3 (https://datahubproject.io/docs/deploy/aws -> IAM policies for UI-based ingestion); however, the ingestion request fails with the following message:
'[2022-09-05 08:32:57,596] ERROR    {datahub.entrypoints:188} - Command failed with An error occurred (AccessDenied) when calling the '
           'ListObjects operation: Access Denied. Run with --debug to get full trace\n'
I have created an iamserviceaccount associated with the Kubernetes cluster, called acryl-datahub-actions, with the following policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:*"],
          "Resource": [
            "arn:aws:s3:::cdca-dev-us-east-1-product-metrics",
            "arn:aws:s3:::cdca-dev-us-east-1-product-metrics/*"
          ]
        }
      ]
    }
The recipe that I am trying is the following:
    sink:
      type: datahub-rest
      config:
        server: 'http://datahub-datahub-gms:8080'
    source:
      type: s3
      config:
        profiling:
          enabled: false
        path_spec:
          include: 's3://my-bucket/table/sh_date=2021-06-23/test.parquet'
        env: DEV
        aws_config:
          aws_region: us-east-1
P.S. In the policy I have granted all S3 permissions for now; I will narrow them down eventually.
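As a quick sanity check on the policy above, here is a minimal Python sketch that tests whether a policy document allows a given action on the bucket named in a recipe path. It is a deliberately simplified matcher, not the real IAM evaluation logic: it ignores Deny statements, Condition blocks, NotAction and NotResource, and the helper names `allows` and `bucket_arn` are hypothetical. The one AWS fact it relies on is that the ListObjects API call is authorized by the `s3:ListBucket` action on the bucket ARN itself, not on the `/*` object ARN.

```python
from fnmatch import fnmatchcase

# The policy from the message above, as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:*"],
            "Resource": [
                "arn:aws:s3:::cdca-dev-us-east-1-product-metrics",
                "arn:aws:s3:::cdca-dev-us-east-1-product-metrics/*",
            ],
        }
    ],
}


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def allows(policy, action, resource):
    """Return True if some Allow statement matches both action and resource.

    Simplified sketch: ignores Deny, Condition, NotAction and NotResource.
    """
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        # Action names are case-insensitive; resource ARNs are not.
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
        resource_ok = any(fnmatchcase(resource, r) for r in resources)
        if action_ok and resource_ok:
            return True
    return False


def bucket_arn(s3_uri):
    """Turn 's3://bucket/key...' into the bucket-level ARN."""
    bucket = s3_uri[len("s3://"):].split("/", 1)[0]
    return f"arn:aws:s3:::{bucket}"


# Hypothetical object path inside the bucket that the policy names.
in_policy = allows(
    POLICY, "s3:ListBucket",
    bucket_arn("s3://cdca-dev-us-east-1-product-metrics/table/test.parquet"),
)
# The path exactly as written in the recipe.
in_recipe = allows(
    POLICY, "s3:ListBucket",
    bucket_arn("s3://my-bucket/table/sh_date=2021-06-23/test.parquet"),
)
print(in_policy, in_recipe)
```

If the bucket name in the recipe was only anonymized for posting, the second check is moot; if the recipe really points at a different bucket than the policy, that alone would produce an AccessDenied on ListObjects.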
d
Can you somehow validate whether the service account and IAM role are set up correctly? The permissions above look fine, and I think you should not get
ListObjects operation: Access Denied
if they are set up properly.
b
I have created a service account with the following command:
    eksctl create iamserviceaccount \
      --name acryl-datahub-actions \
      --cluster datahub \
      --attach-policy-arn arn:aws:iam::<<account id>>:policy/policy1 \
      --approve \
      --override-existing-serviceaccounts
and then added the following lines to values.yaml:
    acryl-datahub-actions:
      enabled: true
      serviceAccount:
        name: acryl-datahub-actions
However, when I run kubectl get pods/<datahub-acryl-datahub-actions> -o yaml, I see the value default for both serviceAccount and serviceAccountName,
so apparently the acryl-datahub-actions service account that I created did not get associated with the datahub-acryl-datahub-actions pod.
Any ideas on how to fix it?
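The manual check described above (reading the pod spec to see which service account it runs under) can be sketched in Python. This is only an illustration under stated assumptions: the function name `check_pod` and the sample pod document are hypothetical, and the input is assumed to be the JSON form of the pod (`kubectl get pod ... -o json`) rather than YAML. It also looks for the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables, which EKS injects into containers when a pod uses a service account bound to an IAM role.

```python
import json

# Environment variables that EKS injects for IAM roles for service accounts.
IRSA_ENV_VARS = {"AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"}


def check_pod(pod, expected_sa):
    """Summarize whether a pod spec uses the expected service account.

    `pod` is the parsed JSON of a single pod object.
    """
    spec = pod.get("spec", {})
    # Pods without an explicit service account run as "default".
    service_account = spec.get("serviceAccountName", "default")

    injected = set()
    for container in spec.get("containers", []):
        for env in container.get("env", []):
            if env.get("name") in IRSA_ENV_VARS:
                injected.add(env["name"])

    return {
        "service_account": service_account,
        "matches_expected": service_account == expected_sa,
        "irsa_env_present": injected == IRSA_ENV_VARS,
    }


# Hypothetical pod document resembling the situation described above.
SAMPLE_POD = json.loads("""
{
  "spec": {
    "serviceAccountName": "default",
    "containers": [{"name": "acryl-datahub-actions", "env": []}]
  }
}
""")

result = check_pod(SAMPLE_POD, "acryl-datahub-actions")
print(result)
```

To run it against a live pod, one option is to replace `SAMPLE_POD` with `json.load(sys.stdin)` and pipe in the output of `kubectl get pod <pod name> -o json`. If `matches_expected` is false, the role's policy is never consulted, which would be consistent with the AccessDenied error regardless of how permissive the policy is.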