# ingestion
  • few-sunset-37169

    08/24/2022, 2:49 PM
    Hello all. I have been following the guidelines at https://datahubproject.io/docs/generated/ingestion/sources/dbt/#dbt-query_tag-automated-mappings. In particular, I have included the following meta_mapping in my recipe (see attached image), and I have also tried with a .* regex. The resulting Glossary Term in DataHub shows up as a literal "{{ $match }}" Glossary Term.
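    For reference, a minimal sketch of the shape the linked docs describe (the meta key and values here are hypothetical). "{{ $match }}" is a template that should be substituted with the matched meta value, so seeing it verbatim suggests the template is not being resolved:
    source:
      type: dbt
      config:
        # ... manifest_path, catalog_path, etc. ...
        meta_mapping:
          data_tier:                      # hypothetical dbt meta key
            match: "Bronze|Silver|Gold"
            operation: "add_term"
            config:
              term: "{{ $match }}"        # should resolve to the matched value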
  • lemon-engine-23512

    08/24/2022, 7:40 PM
    Hello all. I want to know the difference between the metadata ingestion methods below: 1. adding a custom source, 2. Python / REST emitter code, 3. creating MCP wrappers.
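    For context, methods 2 and 3 usually go together: the emitter sends metadata, and the MCP wrapper is the unit it sends. A minimal sketch (server URL and dataset name are placeholders):
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    # Placeholder GMS endpoint
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # An MCP wraps a single aspect for a single entity
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn(platform="hive", name="db.example_table", env="PROD"),
        aspectName="datasetProperties",
        aspect=DatasetPropertiesClass(description="Example description"),
    )
    emitter.emit(mcp)
    A custom source, by contrast, plugs into the ingestion framework itself (recipes, reporting, stateful ingestion) rather than being a standalone script.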
  • colossal-sandwich-50049

    08/24/2022, 9:45 PM
    Hello, are there any issues/"gotchas" with having Kafka emitters from multiple regions and/or AWS accounts emitting data to DataHub? E.g. if I have two AWS accounts, X and Y, where account Y has 3 regions (A, B, C), with DataHub being hosted in region C of this account, would any of the following cause issues: • using the Kafka emitter from AWS account X to emit data to DataHub • using the Kafka emitter from regions A and B in account Y to emit data to DataHub. I assume the answer is that it will not cause issues (aside from needing fancy devops work), but wanted to confirm with the community. Thanks! cc: @great-toddler-2251
  • silly-finland-62382

    08/25/2022, 3:37 AM
    Hey @big-carpet-38439
  • silly-finland-62382

    08/25/2022, 3:37 AM
    @little-megabyte-1074
  • silly-finland-62382

    08/25/2022, 3:38 AM
    @witty-plumber-82249 Hope you are doing well.
  • silly-finland-62382

    08/25/2022, 3:38 AM
    I am facing an issue with Spark lineage: I cannot see the schema of datasets written to DataHub via Spark lineage.
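    For reference, a minimal sketch of the listener setup from the Spark lineage docs (package version and server URL are placeholders). Note that the listener emits pipeline and lineage metadata; dataset schemas are normally ingested separately from the underlying source:
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("example-job")
        # Placeholder version; match your DataHub release
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.43")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        .config("spark.datahub.rest.server", "http://localhost:8080")
        .getOrCreate()
    )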
  • silly-finland-62382

    08/25/2022, 3:40 AM
    #ingestion
  • miniature-policeman-55414

    08/25/2022, 4:23 AM
    Hi folks, is there a workaround for stateful ingestion of Looker LookML and dashboards in the current version? It seems that the current version doesn't support this.
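    For comparison, a sketch of the stateful-ingestion block on sources that do support it (names are hypothetical; a stable pipeline_name is required for state to be tracked):
    pipeline_name: lookml_stateful_pipeline
    source:
      type: lookml
      config:
        # ... connection details ...
        stateful_ingestion:
          enabled: true
          remove_stale_metadata: true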
  • few-carpenter-93837

    08/25/2022, 7:28 AM
    Hey guys, can anyone point me in the right direction: if I want to add ca_certificate_path to the sink config, do I basically just export the certificate from the site and then specify the path?
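    A sketch of what that usually looks like (server and path are placeholders): export the CA certificate as a PEM file and point ca_certificate_path at it:
    sink:
      type: datahub-rest
      config:
        server: "https://datahub-gms.example.com:8080"
        ca_certificate_path: "/etc/ssl/certs/datahub-ca.pem"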
  • alert-fall-82501

    08/25/2022, 8:45 AM
    Hi Team - I am working on getting metadata from an S3 delta lake into DataHub. In the config file I am giving hard-coded AWS credentials, and I don't want to put those credentials in every config file. Can anybody suggest how else I can provide AWS credentials?
  • alert-fall-82501

    08/25/2022, 11:04 AM
    source:
      type: s3
      config:
        path_specs:
          - include: "s3://xx.lakehouse.xxxx.dev/eventsData/us-west-1/partner={table}/year={partition[0]}/month={partition[1]}/day={partition[2]}/*.parquet"
        aws_config:
          aws_access_key_id: ~/.aws/credentials
          aws_secret_access_key: ~/.aws/credentials
          aws_region: us-west-1
        env: "dev"
        profiling:
          enabled: false

    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
  • alert-fall-82501

    08/25/2022, 11:07 AM
    In the above file I don't want to hard-code AWS credentials. I have saved the credentials to the $HOME/.aws/credentials file, but it is not picked up when I run this. Can anybody suggest on this?
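    A sketch of the usual fix: aws_access_key_id / aws_secret_access_key expect the actual key values, not a path to the credentials file. Omitting them entirely lets boto3 fall back to its default credential chain (environment variables, ~/.aws/credentials, instance profile):
    source:
      type: s3
      config:
        path_specs:
          - include: "s3://bucket/path/partner={table}/*.parquet"   # placeholder
        aws_config:
          # no keys here -> boto3 default credential chain is used
          aws_region: us-west-1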
  • silly-finland-62382

    08/25/2022, 12:23 PM
    Hey Team,
  • silly-finland-62382

    08/25/2022, 12:23 PM
    We are using Spark lineage to ingest data into DataHub. Spark is able to ingest the Spark config, but we are not able to see the schema of the data ingested via Spark. Can you please let me know about this bug I found: Spark lineage is not able to ingest the schema of the data into DataHub.
  • careful-insurance-60247

    08/25/2022, 2:49 PM
    How do I update the DataHub Python module on the Docker image when it's already running?
  • gentle-camera-33498

    08/25/2022, 3:16 PM
    Hello everyone, what is the reason the 'SnapshotClasses' don't accept a ContainerClass aspect? Because of that, I have to emit a separate MetadataWorkUnit just to attach a resource to a container.
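    For reference, a sketch of that aspect-level workaround (urns are hypothetical): the container aspect is emitted as its own MCP alongside the snapshot:
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import ChangeTypeClass, ContainerClass

    container_mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn(platform="hive", name="db.example_table", env="PROD"),
        aspectName="container",
        aspect=ContainerClass(container="urn:li:container:example-guid"),  # hypothetical urn
    )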
  • careful-insurance-60247

    08/25/2022, 3:42 PM
    Running into an issue ingesting from an MSSQL source.
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 185, in __init__
        self.config.source.dict().get("config", {}), self.ctx
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/mssql.py", line 177, in create
        return cls(config, ctx)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/mssql.py", line 123, in __init__
        for inspector in self.get_inspectors():
    File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/sql/mssql.py", line 215, in get_inspectors
        engine = create_engine(url, **self.config.options)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/__init__.py", line 525, in create_engine
        return strategy.create(*args, **kwargs)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 54, in create
        u = url.make_url(name_or_url)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 229, in make_url
        return _parse_rfc1738_args(name_or_url)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 288, in _parse_rfc1738_args
        return URL(name, **components)
    File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 71, in __init__
        self.port = int(port)
    
    ValueError: invalid literal for int() with base 10: '1433?TrustServerCertificate=True&isolation_level=READ+UNCOMMITTED&driver=ODBC+Driver+17+for+SQL+Server&ssl=True&Trusted_Connection=True'
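    Reading the traceback, everything after 1433 was appended to host_port, so SQLAlchemy tries to parse the whole string as the port number. A sketch of moving those parameters out of host_port (host and database names are placeholders; use_odbc/uri_args per the mssql source options):
    source:
      type: mssql
      config:
        host_port: "my-server:1433"   # port only, no query string
        database: my_database
        use_odbc: true
        uri_args:
          driver: "ODBC Driver 17 for SQL Server"
          TrustServerCertificate: "True"
          Trusted_Connection: "True"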
  • silly-finland-62382

    08/25/2022, 6:56 PM
    Hey Team, we are using Spark lineage to ingest data into DataHub. Spark is able to ingest the Spark config, but we are not able to see the schema of the data ingested via Spark. Can you please let me know about this bug I found: Spark lineage is not able to ingest the schema of the data into DataHub.
  • cuddly-arm-8412

    08/26/2022, 1:37 AM
    Hi team, is there an interface to delete metadata, including clearing the related Elasticsearch data?
  • great-account-95406

    08/26/2022, 5:02 AM
    Hi team! Is there a way to collect metrics about UI ingestion success, for use in notification systems?
  • silly-finland-62382

    08/26/2022, 5:30 AM
    Hey Team, is there any plan to develop Spark lineage support for Databricks?
  • alert-fall-82501

    08/26/2022, 5:33 AM
    Hi Team - I am working on ingesting metadata from the Databricks Hive metastore. Does anybody have a sample config file for this?
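    Not authoritative, but a sketch along the lines of the hive source's Databricks example (workspace host, token, and http_path are placeholders):
    source:
      type: hive
      config:
        host_port: "my-workspace.cloud.databricks.com:443"
        username: token
        password: "my-databricks-personal-access-token"
        scheme: "databricks+pyhive"
        options:
          connect_args:
            http_path: "sql/protocolv1/o/0000000000000000/0000-000000-abcdef00"
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"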
  • square-yak-42039

    08/26/2022, 8:58 AM
    Hi. I am trying to ingest metadata into a sink and write it out as a file. My DataHub instance runs in Docker containers. Can you tell me where the default path for this file is, and in which container?
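    For reference, the file sink writes wherever its filename points, inside whichever container runs the ingestion (for UI-based ingestion that is typically the datahub-actions container). A sketch with a placeholder path:
    sink:
      type: file
      config:
        filename: "/tmp/datahub_output.json"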
  • square-solstice-69079

    08/26/2022, 1:33 PM
    The new GUI ingestion in the town hall demo looks really good! Looking forward to getting all CLI ingestions showing and all the extra details!
  • modern-monitor-68945

    08/26/2022, 2:17 PM
    Hi everyone! Regarding the Airflow integration via acryl-datahub-airflow-plugin: should the plugin version match the DataHub version (0.8.43), or will an older version (0.8.35.6) work too? Recent versions have an accumulation-tree dependency which cannot be built on Bitnami Airflow images due to the lack of gcc.
  • polite-jordan-17005

    08/26/2022, 5:38 PM
    Hi, I am looking to use the same format of path_specs.include from the S3 ingestion in the delta-lake recipe, to support ingestion of multiple tables. Is this supported yet? I have tried providing the info using both base_path and path_spec: include, but neither seems to be working. Thank you for the help in advance!
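    For context, a sketch of the basic delta-lake recipe shape (bucket and region are placeholders); whether {table}-style templating from the S3 path_specs carries over is exactly the open question here:
    source:
      type: delta-lake
      config:
        base_path: "s3://my-bucket/deltalake/tables"
        s3:
          aws_config:
            aws_region: us-west-1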
  • silly-finland-62382

    08/26/2022, 6:32 PM
    Hey, as part of the Databricks integration with DataHub using Spark lineage: the documentation shared by @careful-pilot-86309 on the channel doesn't help, because I am not able to see any pipeline created after setting the config as shown in that file.
  • nutritious-printer-9873

    08/27/2022, 6:06 AM
    Hi, I just followed the document about using simple_add_dataset_terms to add glossary terms.
    transformers:
      - type: simple_add_dataset_terms
        config:
          term_urns:
            - urn:li:glossaryTerm:PII
      - type: pattern_add_dataset_schema_terms
        config:
          term_pattern:
            rules:
              email: [urn:li:glossaryTerm:PII]
    It works. I'm able to see the term via https://my-datahub.com/glossaryTerm/urn:li:glossaryTerm:PII and in the dataset properties, but it's not listed in the UI under Govern > Glossary. I also realized that terms created manually have a different urn format:
    urn:li:glossaryTerm:30c3a9e3-6561-4d45-b5db-a12cf999d31f
    Any thoughts?
  • lemon-engine-23512

    08/27/2022, 8:27 AM
    Hello team, I came across https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_schema.py. I believe we can use this to ingest any schema files we have, but is there a way to make this easier? In case we have hundreds of columns, wouldn't defining each one as a SchemaFieldClass be tedious?
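    One way to reduce the tedium, sketched under the assumption that the columns live in some machine-readable form (the dict below is a stand-in, as are the dataset names): build the SchemaFieldClass list in a loop and emit a single schemaMetadata aspect:
    from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeTypeClass,
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    # Stand-in for columns loaded from a schema file (CSV, JSON, ...)
    columns = {"user_id": "varchar(100)", "email": "varchar(255)"}

    fields = [
        SchemaFieldClass(
            fieldPath=name,
            type=SchemaFieldDataTypeClass(type=StringTypeClass()),  # map real types here
            nativeDataType=native_type,
        )
        for name, native_type in columns.items()
    ]

    schema = SchemaMetadataClass(
        schemaName="customer",  # hypothetical
        platform=make_data_platform_urn("hive"),
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
        lastModified=AuditStampClass(time=1640692800000, actor="urn:li:corpuser:ingestion"),
        fields=fields,
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=make_dataset_urn(platform="hive", name="db.customer", env="PROD"),
            aspectName="schemaMetadata",
            aspect=schema,
        )
    )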