Hi Folks, am trying to ingest tables metadata from...
# troubleshoot
a
Hi folks, I am trying to ingest table metadata from Athena. It loaded the first DB's tables, but the tables from the DB I specified in the YAML file did not get loaded into DataHub. It's showing me the error below while trying to load.
l
Can you please paste the full log?
a
sure
@loud-island-88694 Please find the log attached as well as the YAML
source:
  type: athena
  config:
    database: "default"
    aws_region: "us-west-2"
    s3_staging_dir: "s3://aws-athena-query-results-xxxxxxxxx-us-west-2/" # "s3://<bucket-name>/prefix/"
    # The s3_staging_dir parameter is needed because Athena always writes query results to S3.
    # See <https://docs.aws.amazon.com/athena/latest/ug/querying.html>
    # However, the athena driver will transparently fetch these results as you would expect from any other sql client.
    work_group: "primary"
    # table_pattern/schema_pattern is same as above

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
thanks for helping me out!! Appreciate this
l
Can you try disabling views altogether for now using `view_pattern.deny`?
Set the deny pattern to
.*
Views are not supported by this connector. Ideally it should have skipped them, but there seems to be a bug.
b
You had no config about include_views?
Can you try with the following config?
source:
  type: athena
  config:
    database: "default"
    aws_region: "us-west-2"
    s3_staging_dir: "s3://aws-athena-query-results-xxxxxxxxx-us-west-2/" # "s3://<bucket-name>/prefix/"
    # The s3_staging_dir parameter is needed because Athena always writes query results to S3.
    # See <https://docs.aws.amazon.com/athena/latest/ug/querying.html>
    # However, the athena driver will transparently fetch these results as you would expect from any other sql client.
    work_group: "primary"
    # table_pattern/schema_pattern is same as above
    include_views: False
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
a
@big-carpet-38439 I tried the YAML below but I'm getting a syntax error
source:
  type: athena
  config:
    database: "default"
    aws_region: "us-west-2"
    s3_staging_dir: "s3://aws-athena-query-results-552020479886-us-west-2/" # "s3://<bucket-name>/prefix/"
    # work_group: "primary"
  client.
    work_group: "primary"
    include_views: False

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
The error is:
ScannerError: while scanning a simple key
  in "<file>", line 11, column 3
could not find expected ':'
  in "<file>", line 12, column 15
@loud-island-88694 May I know where to put the config you mentioned, please?
l
In the ingestion recipe under
source:
a
"view_pattern.deny": "*" --> like this, correct?
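A dotted name like `view_pattern.deny` usually refers to a nested key, not a literal key containing a dot. Below is a minimal sketch of how it would sit in the recipe. It assumes the connector reads `view_pattern` under `source.config` like the other pattern options, and it uses `.*` as the catch-all because the patterns are regular expressions and a bare `*` is not a valid one. The bucket path is a placeholder.

```yaml
source:
  type: athena
  config:
    database: "default"
    aws_region: "us-west-2"
    s3_staging_dir: "s3://<bucket-name>/prefix/"  # placeholder bucket path
    work_group: "primary"
    view_pattern:
      deny:
        - ".*"  # assumption: regex that matches every view name
```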
b
Interesting, somehow the YAML is malformed.
Let me verify why.
This YAML should work:
sink:
  config:
    server: "http://localhost:8080"
  type: datahub-rest
source:
  config:
    aws_region: us-west-2
    database: default
    include_views: false
    s3_staging_dir: "s3://aws-athena-query-results-552020479886-us-west-2/"
    work_group: primary
  type: athena
a
Sure @big-carpet-38439, will try now.
@big-carpet-38439 First thing: it works, thanks a lot! Second thing: it's ingesting all the DBs in Athena. I just want the tables from the default DB, which is why I set that in the config. Can you suggest something, please?
@loud-island-88694 Thanks a lot for helping me out. I really appreciate this community.
b
yeah let me take another look at the configs!
a
cool thanks @big-carpet-38439
b
Okay so you should be able to use "table_pattern" to extract specific tables
let me show an example
sink:
  config:
    server: "http://localhost:8080"
  type: datahub-rest
source:
  config:
    aws_region: us-west-2
    database: default
    table_pattern:
      allow:
        - "<your-regex-here>"
    include_views: false
    s3_staging_dir: "s3://aws-athena-query-results-552020479886-us-west-2/"
    work_group: primary
  type: athena
where you replace <your-regex-here> with a valid regex pattern
if it matches, it will extract table information
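As a concrete illustration of the `<your-regex-here>` placeholder, here is a hedged sketch. The `sales_` prefix is hypothetical, and the sketch assumes the pattern is matched against names of the form `database.table`. If the connector matches on the bare table name instead, drop the `default\.` part.

```yaml
source:
  type: athena
  config:
    database: default
    aws_region: us-west-2
    s3_staging_dir: "s3://<bucket-name>/prefix/"  # placeholder bucket path
    work_group: primary
    include_views: false
    table_pattern:
      allow:
        # single quotes keep the backslash literal in YAML
        - 'default\.sales_.*'  # hypothetical: tables in "default" whose names start with sales_
```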
a
@big-carpet-38439 Cool, will try this. Is there anything that does this for databases as well, so that I can pull all tables for a specific DB pattern alone?
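The recipe comment earlier in the thread mentions `schema_pattern` alongside `table_pattern`, which looks like the database-level counterpart. Below is a minimal sketch of such a filter. It assumes the Athena connector treats each Athena database as a schema and that `schema_pattern` accepts the same `allow`/`deny` regex lists. The `analytics_` prefix is hypothetical.

```yaml
source:
  type: athena
  config:
    aws_region: us-west-2
    s3_staging_dir: "s3://<bucket-name>/prefix/"  # placeholder bucket path
    work_group: primary
    include_views: false
    schema_pattern:
      allow:
        - '^default$'        # only the database named exactly "default"
        - '^analytics_.*'    # hypothetical: any database starting with analytics_
```

Anchoring with `^` and `$` is a cautious choice here: without the trailing `$`, a pattern such as `default` could also match a database named `default_backup`.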