# ingestion
c
<!here> A couple people have been asking for tips on ingesting metadata from AWS S3, so we've put together a guide on using AWS Glue to crawl S3 buckets, which can then be ingested into DataHub. This setup prevents us from having to crawl large S3 buckets directly and also leverages Glue's powerful built-in classifiers. We now also support ingesting jobs and pipelines from Glue by default, so you'll be able to view the complete flow of information in DataHub. Feel free to message me with any questions!
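For reference, a recipe that pulls in the Glue catalog looks roughly like the sketch below. The region, database pattern, and sink address are placeholders for your own setup (and worth double-checking against the current docs), but the overall shape should match what the guide describes:
```yaml
# Rough sketch of a Glue ingestion recipe - adjust region, patterns, and sink to your setup
source:
  type: glue
  config:
    aws_region: "us-east-1"          # region your Glue catalog lives in (placeholder)
    database_pattern:
      allow:
        - "flight-database"          # placeholder database name
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"  # your DataHub REST endpoint (placeholder)
```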
🎉 3
🙌 3
b
@chilly-holiday-80781 I followed your guide to ingest metadata from AWS S3 into DataHub (I used the public bucket you mentioned in the guide). When I try to ingest, I get "Pipeline finished successfully", but no data is actually ingested. Could this be a problem with missing permissions? I run DataHub with the Docker setup, and the IAM role has full access to Glue and S3. Do you have any documentation on the permissions needed to follow your guide?
c
That’s odd – your IAM role should be fine here. Could you try creating a bucket of your own and ingesting?
b
Thanks for your answer. In the meantime I figured out that this only happens when I define allowed table patterns in the recipe. If I remove the table pattern section, it works.
c
Interesting – can you send me some examples of allowed tables and the patterns you're using?
b
I tried
```yaml
database_pattern:
  allow:
    - "flight-database"
table_pattern:
  allow:
    - "avro"
```
and
```yaml
database_pattern:
  allow:
    - "datahub-database"
table_pattern:
  allow:
    - "csv"
```
I used s3://crawler-public-us-east-1/flight/avro/ as the source for flight-database and s3://crawler-public-us-east-1/flight/2016/csv/ for datahub-database.
c
Alright, I'll take a look
thank you 1