# ingestion
  • few-sunset-37169

    08/24/2022, 2:49 PM
    Hello all. I have been following the guidelines at https://datahubproject.io/docs/generated/ingestion/sources/dbt/#dbt-query_tag-automated-mappings. In particular, I have included the following meta_mapping in my recipe (see attached image), and I have also tried with a .* regex. The resulting Glossary Term in DataHub shows up as a literal "{{ $match }}" Glossary Term.
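    For reference, a minimal sketch of the shape the linked docs describe (the meta key and values here are hypothetical). "{{ $match }}" is a template that should be substituted with the matched meta value, so seeing it verbatim suggests the template is not being resolved:
    source:
      type: dbt
      config:
        # ... manifest_path, catalog_path, etc. ...
        meta_mapping:
          data_tier:                      # hypothetical dbt meta key
            match: "Bronze|Silver|Gold"
            operation: "add_term"
            config:
              term: "{{ $match }}"        # should resolve to the matched value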
  • lemon-engine-23512

    08/24/2022, 7:40 PM
    Hello all. I want to know the difference between the metadata ingestion methods below: 1. adding a custom source, 2. Python / REST emitter code, 3. creating MCP wrappers.
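    For context, methods 2 and 3 usually go together: the emitter sends metadata, and the MCP wrapper is the unit it sends. A minimal sketch (server URL and dataset name are placeholders):
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    # Placeholder GMS endpoint
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # An MCP wraps a single aspect for a single entity
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn(platform="hive", name="db.example_table", env="PROD"),
        aspectName="datasetProperties",
        aspect=DatasetPropertiesClass(description="Example description"),
    )
    emitter.emit(mcp)
    A custom source, by contrast, plugs into the ingestion framework itself (recipes, reporting, stateful ingestion) rather than being a standalone script.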
  • colossal-sandwich-50049

    08/24/2022, 9:45 PM
    Hello, are there any issues/"gotchas" with having Kafka emitters from multiple regions and/or AWS accounts emitting data to DataHub? E.g. if I have two AWS accounts, X and Y, where account Y has 3 regions (A, B, C), with DataHub being hosted in region C of this account, would any of the following cause issues: • using the Kafka emitter from AWS account X to emit data to DataHub • using the Kafka emitter from regions A and B in account Y to emit data to DataHub. I assume the answer is that it will not cause issues (aside from needing fancy devops work), but wanted to confirm with the community. Thanks! cc: @great-toddler-2251
  • silly-finland-62382

    08/25/2022, 3:37 AM
    Hey @big-carpet-38439
  • silly-finland-62382

    08/25/2022, 3:37 AM
    @little-megabyte-1074
  • silly-finland-62382

    08/25/2022, 3:38 AM
    @witty-plumber-82249 Hope you are doing well.
  • silly-finland-62382

    08/25/2022, 3:38 AM
    I am facing an issue with Spark lineage: I cannot see the schema of datasets written to DataHub via Spark lineage.
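    For reference, a minimal sketch of the listener setup from the Spark lineage docs (package version and server URL are placeholders). Note that the listener emits pipeline and lineage metadata; dataset schemas are normally ingested separately from the underlying source:
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("example-job")
        # Placeholder version; match your DataHub release
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.43")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        .config("spark.datahub.rest.server", "http://localhost:8080")
        .getOrCreate()
    )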
  • silly-finland-62382

    08/25/2022, 3:40 AM
    #ingestion
  • miniature-policeman-55414

    08/25/2022, 4:23 AM
    Hi folks, is there a workaround for stateful ingestion of Looker LookML and dashboards in the current version? It seems that the current version doesn't support this.
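    For comparison, a sketch of the stateful-ingestion block on sources that do support it (names are hypothetical; a stable pipeline_name is required for state to be tracked):
    pipeline_name: lookml_stateful_pipeline
    source:
      type: lookml
      config:
        # ... connection details ...
        stateful_ingestion:
          enabled: true
          remove_stale_metadata: true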
  • few-carpenter-93837

    08/25/2022, 7:28 AM
    Hey guys, can anyone point me in the right direction: if I want to add ca_certificate_path to the sink config, do I basically just export the certificate from the site and then specify the path?
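    A sketch of what that usually looks like (server and path are placeholders): export the CA certificate as a PEM file and point ca_certificate_path at it:
    sink:
      type: datahub-rest
      config:
        server: "https://datahub-gms.example.com:8080"
        ca_certificate_path: "/etc/ssl/certs/datahub-ca.pem"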
  • alert-fall-82501

    08/25/2022, 8:45 AM
    Hi Team - I am working on getting metadata from an S3 delta lake into DataHub. In the config file I am giving hard-coded AWS credentials, and I don't want to put those credentials in every config file. Can anybody suggest how else I can provide AWS credentials?
  • alert-fall-82501

    08/25/2022, 11:04 AM
    source:
      type: s3
      config:
        path_specs:
          - include: "s3://xx.lakehouse.xxxx.dev/eventsData/us-west-1/partner={table}/year={partition[0]}/month={partition[1]}/day={partition[2]}/*.parquet"
        aws_config:
          aws_access_key_id: ~/.aws/credentials
          aws_secret_access_key: ~/.aws/credentials
          aws_region: us-west-1
        env: "dev"
        profiling:
          enabled: false

    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
  • alert-fall-82501

    08/25/2022, 11:07 AM
    In the above file I don't want to hard-code AWS credentials. I have saved the credentials to the $HOME/.aws/credentials file, but it is not picked up when I run this. Can anybody suggest on this?
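    A sketch of the usual fix: aws_access_key_id / aws_secret_access_key expect the actual key values, not a path to the credentials file. Omitting them entirely lets boto3 fall back to its default credential chain (environment variables, ~/.aws/credentials, instance profile):
    source:
      type: s3
      config:
        path_specs:
          - include: "s3://bucket/path/partner={table}/*.parquet"   # placeholder
        aws_config:
          # no keys here -> boto3 default credential chain is used
          aws_region: us-west-1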
  • silly-finland-62382

    08/25/2022, 12:23 PM
    Hey Team,
  • silly-finland-62382

    08/25/2022, 12:23 PM
    We are using Spark lineage to ingest data into DataHub. Spark is able to ingest the Spark config, but we are not able to see the schema of the data ingested via Spark. Can you please let me know about this bug I found: Spark lineage is not able to ingest the schema of the data into DataHub.
  • careful-insurance-60247

    08/25/2022, 2:49 PM
    How do I update the DataHub Python module on the Docker image when it's already running?
  • gentle-camera-33498

    08/25/2022, 3:16 PM
    Hello everyone, what is the reason the 'SnapshotClasses' don't accept a ContainerClass aspect? Because of that, I have to emit a separate MetadataWorkUnit just to attach a resource to a container.
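    For reference, a sketch of that aspect-level workaround (urns are hypothetical): the container aspect is emitted as its own MCP alongside the snapshot:
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import ChangeTypeClass, ContainerClass

    container_mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn(platform="hive", name="db.example_table", env="PROD"),
        aspectName="container",
        aspect=ContainerClass(container="urn:li:container:example-guid"),  # hypothetical urn
    )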
  • careful-insurance-60247

    08/25/2022, 3:42 PM
    Running into an issue ingesting from an MSSQL source.
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 185, in __init__
        self.config.source.dict().get("config", {}), self.ctx
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/mssql.py", line 177, in create
        return cls(config, ctx)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/mssql.py", line 123, in __init__
        for inspector in self.get_inspectors():
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/mssql.py", line 215, in get_inspectors
        engine = create_engine(url, **self.config.options)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/__init__.py", line 525, in create_engine
        return strategy.create(*args, **kwargs)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 54, in create
        u = url.make_url(name_or_url)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 229, in make_url
        return _parse_rfc1738_args(name_or_url)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 288, in _parse_rfc1738_args
        return URL(name, **components)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 71, in __init__
        self.port = int(port)
    
    ValueError: invalid literal for int() with base 10: '1433?TrustServerCertificate=True&isolation_level=READ+UNCOMMITTED&driver=ODBC+Driver+17+for+SQL+Server&ssl=True&Trusted_Connection=True'
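    Reading the traceback, everything after 1433 was appended to host_port, so SQLAlchemy tries to parse the whole string as the port number. A sketch of moving those parameters out of host_port (host and database names are placeholders; use_odbc/uri_args per the mssql source options):
    source:
      type: mssql
      config:
        host_port: "my-server:1433"   # port only, no query string
        database: my_database
        use_odbc: true
        uri_args:
          driver: "ODBC Driver 17 for SQL Server"
          TrustServerCertificate: "True"
          Trusted_Connection: "True"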
  • silly-finland-62382

    08/25/2022, 6:56 PM
    Hey Team, we are using Spark lineage to ingest data into DataHub. Spark is able to ingest the Spark config, but we are not able to see the schema of the data ingested via Spark. Can you please let me know about this bug I found: Spark lineage is not able to ingest the schema of the data into DataHub.
  • cuddly-arm-8412

    08/26/2022, 1:37 AM
    Hi team, is there an interface to delete metadata, including clearing the related Elasticsearch data?
  • great-account-95406

    08/26/2022, 5:02 AM
    Hi team! Is there a way to collect metrics about UI ingestion success, for use in notification systems?
  • silly-finland-62382

    08/26/2022, 5:30 AM
    Hey Team, is there any plan to develop Spark lineage support for Databricks?
  • alert-fall-82501

    08/26/2022, 5:33 AM
    Hi Team - I am working on ingesting metadata from the Databricks Hive metastore. Does anybody have a sample config file for this?
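    Not authoritative, but a sketch along the lines of the hive source's Databricks example (workspace host, token, and http_path are placeholders):
    source:
      type: hive
      config:
        host_port: "my-workspace.cloud.databricks.com:443"
        username: token
        password: "my-databricks-personal-access-token"
        scheme: "databricks+pyhive"
        options:
          connect_args:
            http_path: "sql/protocolv1/o/0000000000000000/0000-000000-abcdef00"
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"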
  • square-yak-42039

    08/26/2022, 8:58 AM
    Hi. I am trying to ingest metadata into a sink and write it out as a file. My DataHub instance runs in Docker containers. Can you tell me where the default path for this file is, and in which container?
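    For reference, the file sink writes wherever its filename points, inside whichever container runs the ingestion (for UI-based ingestion that is typically the datahub-actions container). A sketch with a placeholder path:
    sink:
      type: file
      config:
        filename: "/tmp/datahub_output.json"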
  • square-solstice-69079

    08/26/2022, 1:33 PM
    The new GUI ingestion in the town hall demo looks really good! Looking forward to getting all CLI ingestions showing and all the extra details!
  • modern-monitor-68945

    08/26/2022, 2:17 PM
    Hi everyone! Regarding the Airflow integration via acryl-datahub-airflow-plugin: should the plugin version match the DataHub version (0.8.43), or will an older version (0.8.35.6) work too? Recent versions have an accumulation-tree dependency which cannot be built on Bitnami Airflow images due to the lack of gcc.
  • polite-jordan-17005

    08/26/2022, 5:38 PM
    Hi, I am looking to use the same format of path_specs.include from the S3 ingestion in the delta-lake recipe, to support ingestion of multiple tables. Is this supported yet? I have tried providing the info using both base_path and path_spec: include, but neither seems to be working. Thank you for the help in advance!
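    For context, a sketch of the basic delta-lake recipe shape (bucket and region are placeholders); whether {table}-style templating from the S3 path_specs carries over is exactly the open question here:
    source:
      type: delta-lake
      config:
        base_path: "s3://my-bucket/deltalake/tables"
        s3:
          aws_config:
            aws_region: us-west-1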
  • silly-finland-62382

    08/26/2022, 6:32 PM
    Hey, as part of the Databricks integration with DataHub using Spark lineage: the documentation shared by @careful-pilot-86309 on the channel doesn't help, because I am not able to see any pipeline created after setting the config as shown in that file.
  • nutritious-printer-9873

    08/27/2022, 6:06 AM
    Hi, I just followed the document about using simple_add_dataset_terms to add glossary terms.
    transformers:
      - type: simple_add_dataset_terms
        config:
          term_urns:
            - urn:li:glossaryTerm:PII
      - type: pattern_add_dataset_schema_terms
        config:
          term_pattern:
            rules:
              email: [urn:li:glossaryTerm:PII]
    It works. I'm able to see the term via https://my-datahub.com/glossaryTerm/urn:li:glossaryTerm:PII and in the dataset properties, but it's not listed in the UI under Govern > Glossary. I also realized that terms created manually have a different urn format:
    urn:li:glossaryTerm:30c3a9e3-6561-4d45-b5db-a12cf999d31f
    Any thoughts?
  • lemon-engine-23512

    08/27/2022, 8:27 AM
    Hello team, I came across https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_schema.py. I believe we can use this to ingest any schema files we have, but is there a way to make this easier? In case we have hundreds of columns, wouldn't defining each one as a SchemaFieldClass be tedious?
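    One way to reduce the tedium, sketched under the assumption that the columns live in some machine-readable form (the dict below is a stand-in, as are the dataset names): build the SchemaFieldClass list in a loop and emit a single schemaMetadata aspect:
    from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeTypeClass,
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    # Stand-in for columns loaded from a schema file (CSV, JSON, ...)
    columns = {"user_id": "varchar(100)", "email": "varchar(255)"}

    fields = [
        SchemaFieldClass(
            fieldPath=name,
            type=SchemaFieldDataTypeClass(type=StringTypeClass()),  # map real types here
            nativeDataType=native_type,
        )
        for name, native_type in columns.items()
    ]

    schema = SchemaMetadataClass(
        schemaName="customer",  # hypothetical
        platform=make_data_platform_urn("hive"),
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
        lastModified=AuditStampClass(time=1640692800000, actor="urn:li:corpuser:ingestion"),
        fields=fields,
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=make_dataset_urn(platform="hive", name="db.customer", env="PROD"),
            aspectName="schemaMetadata",
            aspect=schema,
        )
    )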