# ingestion
c
<!here> A couple people have been asking for tips on ingesting metadata from AWS S3, so we've put together a guide on using AWS Glue to crawl S3 buckets, which can then be ingested into DataHub. This setup prevents us from having to crawl large S3 buckets directly and also leverages Glue's powerful built-in classifiers. We now also support ingesting jobs and pipelines from Glue by default, so you'll be able to view the complete flow of information in DataHub. Feel free to message me with any questions!
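For reference, a recipe that pulls in the Glue catalog looks roughly like the sketch below. The region, database pattern, and sink address are placeholders for your own setup (and worth double-checking against the current docs), but the overall shape should match what the guide describes:
```yaml
# Rough sketch of a Glue ingestion recipe - adjust region, patterns, and sink to your setup
source:
  type: glue
  config:
    aws_region: "us-east-1"          # region your Glue catalog lives in (placeholder)
    database_pattern:
      allow:
        - "flight-database"          # placeholder database name
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"  # your DataHub REST endpoint (placeholder)
```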
🎉 3
🙌 3
b
@chilly-holiday-80781 I followed your guide to ingest metadata from AWS S3 into DataHub (I used the public bucket you mentioned in the guide). When I try to ingest, I get "Pipeline finished successfully", but no data is actually ingested. Could this be a problem with missing permissions? I run DataHub with the Docker setup, and the IAM role has full access to Glue and S3. Do you have any documentation on the permissions needed to follow your guide?
c
That’s odd – your IAM role should be fine here. Could you try creating a bucket of your own and ingesting?
b
Thanks for your answer. In the meantime I figured out that this only happens when I define allowed table patterns in the recipe. If I remove the table pattern section, it works.
c
Interesting – can you send me some examples of allowed tables and the patterns you're using?
b
I tried
```yaml
database_pattern:
  allow:
    - "flight-database"
table_pattern:
  allow:
    - "avro"
```
and
```yaml
database_pattern:
  allow:
    - "datahub-database"
table_pattern:
  allow:
    - "csv"
```
I used s3://crawler-public-us-east-1/flight/avro/ as the source for flight-database and s3://crawler-public-us-east-1/flight/2016/csv/ for datahub-database.
c
Alright, I'll take a look
thank you 1