breezy-controller-54597
02/24/2022, 8:34 AMbreezy-controller-54597
02/24/2022, 8:36 AM
Py4JJavaError: An error occurred while calling o42.parquet.
: java.nio.file.AccessDeniedException: s3a://xxxxx/part-00000-6a55c272-ff55-45b8-b60c-18bd27d799f2-c000.snappy.parquet: getFileStatus on s3a://xxxxx/part-00000-6a55c272-ff55-45b8-b60c-18bd27d799f2-c000.snappy.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;
breezy-controller-54597
02/24/2022, 8:37 AM
source:
  type: data-lake
  config:
    env: "PROD"
    platform: "S3"
    base_path: "s3://xxxxx/"
    profiling:
      enabled: false
    aws_config:
      aws_region: "xxx"
sink:
  type: "datahub-rest"
  config:
    server: http://localhost:8080
numerous-camera-74294
02/24/2022, 9:09 AMnumerous-camera-74294
02/24/2022, 9:09 AMbreezy-controller-54597
02/25/2022, 12:38 AMloud-island-88694
chilly-holiday-80781
02/25/2022, 6:28 PMbreezy-controller-54597
02/28/2022, 1:02 AM
---- (full traceback above) ----
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/entrypoints.py", line 105, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/telemetry/telemetry.py", line 196, in wrapper
    raise e
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/telemetry/telemetry.py", line 190, in wrapper
    res = func(*args, **kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/cli/ingest_cli.py", line 87, in run
    pipeline.run()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 182, in run
    self.source.get_workunits(), 10 if self.preview_mode else None
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 538, in get_workunits
    yield from self.get_workunits_s3()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 507, in get_workunits_s3
    yield from self.ingest_table(aws_file, relative_path)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 386, in ingest_table
    table = self.read_file(full_path)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 252, in read_file
    df = self.spark.read.parquet(file)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyspark/sql/readwriter.py", line 353, in parquet
    return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
  File "/home/ec2-user/.local/lib/python3.7/site-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyspark/sql/utils.py", line 128, in deco
    return f(*a, **kw)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o42.parquet.
: java.nio.file.AccessDeniedException: s3a://xxxxx/part-00000-6a55c272-ff55-45b8-b60c-18bd27d799f2-c000.snappy.parquet: getFileStatus on s3a://xxxxx/part-00000-6a55c272-ff55-45b8-b60c-18bd27d799f2-c000.snappy.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xxxxx; S3 Extended Request ID: xxxxx), S3 Extended Request ID: xxxxx
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:174)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:117)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1887)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1854)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1794)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1700)
at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:2572)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:376)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:758)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xxxxx; S3 Extended Request ID: xxxxx), S3 Extended Request ID: xxxxx
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1264)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1045)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1872)
... 22 more
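The trace above fails in `S3AFileSystem.getFileStatus`, which calls `AmazonS3Client.getObjectMetadata`, i.e. a HEAD Object request. One way to narrow this down might be to issue the same kind of request outside Spark with the credentials the ingestion uses. The sketch below is only an illustration: `split_s3_uri` and `probe_object` are hypothetical helper names, and `client` is assumed to be any object exposing a boto3-style `head_object(Bucket=..., Key=...)` method.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split s3://bucket/key or s3a://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def probe_object(client, uri):
    """Issue a HEAD Object request; return "ok" or the error code as a string."""
    bucket, key = split_s3_uri(uri)
    try:
        client.head_object(Bucket=bucket, Key=key)
        return "ok"
    except Exception as exc:
        # botocore's ClientError carries the service error in .response;
        # any other exception falls back to its class name.
        error = getattr(exc, "response", {}).get("Error", {})
        return str(error.get("Code", type(exc).__name__))
```

With boto3 installed, something like `probe_object(boto3.client("s3"), "s3a://<bucket>/<key>")` could be run on the same host. A `"403"` result would suggest the denial comes from the credentials or policy rather than from Spark or DataHub.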
chilly-holiday-80781
02/28/2022, 2:37 PMbreezy-controller-54597
03/01/2022, 12:12 AMbreezy-controller-54597
03/01/2022, 12:27 AM
If profiling, make sure that permissions for **s3a://** access are set because Spark and Hadoop use the s3a:// protocol to interface with AWS (schema inference outside of profiling requires s3:// access).
chilly-holiday-80781
03/01/2022, 1:51 AM
s3:ListBucket, s3:GetObject, and s3:ListBucketMultipartUploads?
chilly-holiday-80781
03/01/2022, 1:51 AMbreezy-controller-54597
03/01/2022, 3:20 AM
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*",
                "s3-object-lambda:*"
            ],
            "Resource": "*"
        }
    ]
}
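The policy above allows every S3 action on every resource. If a narrower grant is wanted, a read-only policy scoped to one bucket could be generated along these lines. This is a sketch under assumptions: `read_only_policy` and `example-bucket` are hypothetical names, and whether `s3:ListBucket` plus `s3:GetObject` is enough for the data-lake source is not confirmed anywhere in this thread.

```python
import json


def read_only_policy(bucket):
    """Build an IAM policy document allowing list and read on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is granted on the bucket ARN itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # s3:GetObject is granted on the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy("example-bucket"), indent=2))
```

The two statements are separate because bucket-level and object-level actions apply to different resource ARNs. A HEAD Object request is authorized by `s3:GetObject`, and without `s3:ListBucket` a missing key is reported as 403 rather than 404, which can make a wrong path look like a permissions problem.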
chilly-holiday-80781
03/01/2022, 2:35 PMchilly-holiday-80781
03/01/2022, 2:36 PMbreezy-controller-54597
03/09/2022, 1:49 AMchilly-holiday-80781
03/09/2022, 1:50 AM