# ingestion
  • a

    alert-fall-82501

    08/02/2022, 12:03 PM
    Hi Team - please advise on the issue above.
  • a

    alert-football-80212

    08/02/2022, 1:50 PM
    Hi all, is there any way in a Kafka ingestion recipe to add a domain? I can't find it under transformers. Thank you!
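    A minimal sketch of a recipe fragment that could do this, assuming the simple_add_dataset_domain transformer; the domain URN is hypothetical:
    transformers:
      - type: "simple_add_dataset_domain"
        config:
          domains:
            - "urn:li:domain:engineering"   # hypothetical domain URN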
  • b

    big-zoo-81740

    08/02/2022, 1:56 PM
    Hey all, not sure if this is the right place to ask, but I'm setting up LookML ingestion and keep getting an error saying that the directory is not found. I'm pointing the base_folder config option at my LookML repo, /home/ubuntu/github/myreponame, but I keep getting an error saying it can't find the directory or it doesn't exist. The folder has r/w permissions, so DataHub should be able to read from it. Is there something I am obviously doing wrong? Does the repo need to be located in a specific folder for the base_folder config to be able to read from it?
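    For reference, a minimal recipe sketch for a local checkout; the project name, connection name, and platform mapping are all hypothetical, and note that base_folder must be a path visible to the process actually running the ingestion (e.g. inside the datahub-actions container if the run is triggered from the UI):
    source:
      type: lookml
      config:
        base_folder: "/home/ubuntu/github/myreponame"   # path as seen by the ingestion process
        project_name: "myproject"                       # hypothetical
        connection_to_platform_map:
          my_connection:                                # hypothetical Looker connection name
            platform: snowflake
            default_db: MYDB
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"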
  • g

    green-lion-58215

    08/02/2022, 8:33 PM
    Hello all, I am trying to ingest Delta Lake table metadata into DataHub using the new delta-lake ingestion source. Can someone let me know how I can use an AWS role to configure S3 permissions? I don't want to use an AWS access key and secret key. I am using a Lambda function to call the DataHub REST endpoint, so ideally I want to use the Lambda's role to access S3. An example showing how aws_role is to be provided would be helpful.
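    A sketch of what that might look like, assuming the aws_role field on aws_config; the bucket path and role ARN are hypothetical. If all credential fields are omitted, boto3 should fall back to the ambient credentials (e.g. the Lambda's execution role):
    source:
      type: delta-lake
      config:
        base_path: "s3://my-bucket/delta/my_table"   # hypothetical
        s3:
          aws_config:
            aws_region: us-east-1
            aws_role: "arn:aws:iam::123456789012:role/datahub-ingestion"   # hypothetical role ARN
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"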
  • p

    proud-accountant-49377

    08/02/2022, 3:16 PM
    Hi everyone! I have a problem: ingested glossary terms do not appear in the front end; a glossary term only appears there if it is created via the API. The data does appear correctly in the Glossary itself, though. How can I fix it? Thanks 😊
  • g

    gifted-knife-16120

    08/03/2022, 10:19 AM
    hi team, I can't run this command:
    datahub delete --urn "urn:li:assertion:35b8d904a367f5d02b07b20bac478408" --soft
    error msg:
    OperationalError: ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: Unknown aspect status for entity assertion\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)\n\tat com.linkedin.metadata.resources.entity.AspectResource.ingestProposal(AspectResource.java:133)
  • r

    rich-salesmen-55640

    08/03/2022, 11:55 AM
    Hi there, a question regarding custom SQLAlchemy DBs: I am wondering what the process of adding a custom SQLAlchemy driver to the Docker quickstart looks like. The advice I got was to put the driver file into the datahub-actions image. However, unixODBC does not seem to be present in that image. I saw that datahub-gms references MySQL's JDBC driver. Would that be the appropriate image in which to put a custom driver file? Do both JDBC and ODBC work, or only the former?
  • e

    echoing-alligator-70530

    08/03/2022, 1:44 PM
    Hi there, is there a way to delete by source? For example, I did a file-based lineage ingestion and it touched a lot of entities; is there a quick way to delete that lineage?
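    One possible quick path, assuming the lineage was ingested as its own run and the CLI's rollback feature is available; the run id is whatever list-runs reports:
    # list recent ingestion runs, then roll back the file-lineage run
    datahub ingest list-runs
    datahub ingest rollback --run-id <run-id>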
  • a

    alert-football-80212

    08/03/2022, 1:58 PM
    Hi all, when I use the DataHub CLI to ingest a recipe (datahub ingest -c recipe.yml), how can I control which env (PROD, DEV, STG) it lands in?
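    A minimal sketch: most sources accept an env field in their config, and recipes expand environment variables, so the environment can be chosen at run time (the variable name is hypothetical):
    source:
      type: kafka
      config:
        env: ${DATAHUB_ENV}   # e.g. DATAHUB_ENV=DEV datahub ingest -c recipe.yml; defaults to PROD
        connection:
          bootstrap: "localhost:9092"
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"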
  • a

    alert-football-80212

    08/03/2022, 3:09 PM
    Hi all, I ingested a Kafka topic with its schema, but when I open the topic in the UI I see "no data" under Schema. Does anyone know why? Recipe below (with a note after it):
    source:
      type: "kafka"
      config:
        # Coordinates
        env: $ENV
        connection:
          bootstrap: $KAFKA_BOOTSTRAP_SERVER
          consumer_config:
            security.protocol: "SASL_SSL"
            sasl.mechanism: "PLAIN"
            sasl.username: "user_name"
            sasl.password: $KAFKA_PASSWORD
          schema_registry_url: $SCHEMA_REGISTRY_URL
        topic_patterns:
          allow:
            - $TOPIC_NAME
        topic_subject_map:
          topicName-value: $SCHEMA_NAME
    transformers:
      - type: "simple_add_dataset_ownership"
        config:
          owner_urns:
            - some_owner_name
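    One thing worth checking (an assumption, not a confirmed diagnosis): topic_subject_map keys must be the actual topic name suffixed with -key/-value, so a literal topicName-value will not match a topic supplied via $TOPIC_NAME. Something like:
    topic_subject_map:
      # key must be "<actual topic name>-value" (or "-key"); env vars are expanded in recipes
      ${TOPIC_NAME}-value: $SCHEMA_NAME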
  • m

    mysterious-pager-59554

    08/03/2022, 3:27 PM
    Hi Team, can the "data lake files" source be used to ingest data from Azure Data Lake, or is it limited to AWS? (edited)
  • f

    full-toddler-4661

    08/03/2022, 5:03 PM
    Hello! I am trying to run ingestion of business-glossary metadata into a locally hosted instance of DataHub and am running into some issues that I haven't seen before. I am using a recipe file and an actual business glossary file. This ran fine last month but is having issues now. Is anybody available to help?
  • c

    colossal-sandwich-50049

    08/03/2022, 9:24 PM
    Hello, I am looking to understand the codebase in the metadata-models repo and have the following question: I notice that upstreamLineage is a dataset aspect, but it is not listed in the aspects list for dataset in `entity-registry.yml`; can someone advise how this is the case? • https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/dataset/UpstreamLineage.pdl • https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/resources/entity-registry.yml Edit: in general, I see that the aspects under metadata-models/src/main/pegasus/com/linkedin/dataset correspond to the aspects listed below, but entity-registry.yml doesn't seem to correspond with this: • https://datahubproject.io/docs/generated/metamodel/entities/dataset#aspects
  • a

    alert-fall-82501

    08/04/2022, 6:24 AM
    Please advise on this.
  • a

    alert-fall-82501

    08/04/2022, 6:26 AM
    source:
      type: "delta-lake"
      config:
        base_path: "s3a://dt.lakehouse.uevents/eventsData/us-west-1"
        s3:
          aws_config:
            aws_access_key_id:
            aws_secret_access_key:
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
  • a

    alert-fall-82501

    08/04/2022, 6:26 AM
    config file
  • a

    alert-fall-82501

    08/04/2022, 7:25 AM
    /usr/lib/python3/dist-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
      "class": algorithms.Blowfish,
    [2022-08-03 14:24:08,362] INFO     {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.41.2
    [2022-08-03 14:24:08,410] INFO     {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
    [2022-08-03 14:24:08,749] ERROR    {logger:26} - Please set env variable SPARK_VERSION
    [2022-08-03 14:24:08,875] ERROR    {datahub.ingestion.run.pipeline:127} - 's3'
    [2022-08-03 14:24:08,876] INFO     {datahub.cli.ingest_cli:119} - Starting metadata ingestion
    [2022-08-03 14:24:08,876] INFO     {datahub.cli.ingest_cli:137} - Finished metadata ingestion
    
    Failed to configure source (delta-lake) due to 's3'
  • c

    calm-dinner-63735

    08/04/2022, 11:33 AM
    Trying to ingest Glue data using Python code. I did pip install 'acryl-datahub[glue]' but am still getting the error.
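    For comparison, a minimal programmatic-ingestion sketch, assuming the Pipeline API; the region and server URL are placeholders:
    # build and run an ingestion pipeline from Python instead of a YAML recipe
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {"type": "glue", "config": {"aws_region": "us-east-1"}},
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()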
  • c

    calm-dinner-63735

    08/04/2022, 12:20 PM
    Hi, I am getting the below error while ingesting data from Glue using Python code; the error is inside a thread.
  • f

    full-toddler-4661

    08/03/2022, 7:46 PM
    Also having issues with the quickstart command via the CLI
  • b

    brave-pencil-21289

    08/04/2022, 12:31 PM
    While re-ingesting to MSSQL Server we are getting the error "failed to validate record with class".
  • g

    gifted-knife-16120

    08/04/2022, 3:17 PM
    hi all. Assuming I am logged in as the root user, how can I see the token key and value? thanks
  • l

    little-spring-72943

    08/04/2022, 4:16 PM
    Can I override the "Platform" value using a recipe? I am ingesting data as Hive from Azure, but we actually have the data in Delta Lake format. I understand Delta Lake is currently supported with AWS only. For future compatibility/migration (when Azure is supported), I would like to override "hive" with "delta-lake" while still ingesting using the Hive connector for now.
  • g

    green-lion-58215

    08/04/2022, 7:57 PM
    Hi all, is it possible to ingest all tables under a given S3 path using the delta-lake source? For me, the ingestion only works when I give each table's folder path as "base_path". I assumed that, using a base path, we could ingest multiple tables nested under a root folder?
  • d

    dazzling-insurance-83303

    08/04/2022, 8:14 PM
    Hello: Just to confirm, ingestion happens for one database at a time, as shown below, right?
    source:
        type: postgres
        config:
          # Coordinates
          host_port: db_server-001:5432
          database: db001
    I cannot cycle through DBs in a postgres cluster/instance by providing the following syntax, right? (A workaround is sketched after the example.)
    source:
        type: postgres
        config:
          # Coordinates
          host_port: db_server-001:5432
          database:
            - db001
            - db002
            - db003
    Thanks!
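    A sketch of the common workaround, assuming one database per run plus env-var expansion in recipes; the variable name is hypothetical:
    source:
        type: postgres
        config:
          host_port: db_server-001:5432
          database: ${POSTGRES_DB}   # run once per DB, e.g. POSTGRES_DB=db001 datahub ingest -c recipe.yml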
  • r

    rapid-fall-7147

    08/04/2022, 8:56 PM
    Hi guys, is there an example YAML available that shows how to enable field-level lineage?
  • c

    cool-actor-73767

    08/04/2022, 10:58 PM
    Hi All! Is there any way to delete all dashboards and charts (Metabase/Redash) from the metadata? I saw that it is possible using the datahub CLI and a curl command, per https://datahubproject.io/docs/how/delete-metadata. Running datahub delete --entity_type dashboard --platform redash I received this error: Failed to execute search query - HTTP ERROR 401 Unauthorized to perform this action. I think this error occurs because my ingestion process uses the UI and a token to connect to GMS, but the datahub delete command doesn't let me pass a token. I then tried curl, passing the token: curl "http://localhost:8080/entities?action=delete" -X POST --header 'Authorization: Bearer {token}' --data '{"urn": "urn:li:dashboard:(redash,10)"}'. That worked, but it deletes one dashboard at a time, and this way I don't know how to delete all dashboards at once. How can I pass a URN to delete everything using curl? Is there another way? Remember that I need a token to connect to GMS. Any help is appreciated, thanks!
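    A sketch of one way around the 401, assuming the CLI honors these environment variables (datahub init can also store the token in ~/.datahubenv); the token value is a placeholder:
    export DATAHUB_GMS_URL=http://localhost:8080
    export DATAHUB_GMS_TOKEN=<your-token>
    datahub delete --entity_type dashboard --platform redash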
  • m

    microscopic-mechanic-13766

    08/05/2022, 10:33 AM
    Good morning everyone, one quick question: do recipes allow ingesting from a list of databases, or indicating the databases you don't want to ingest? For example, let's say my source has 3 DBs: DB1, DB2 and DB3. Would it be possible to do either of the following? (A possible alternative is sketched after the examples.)
    source:
      type: <source>
      config:
        host_port: <host>:<port>
        database: DB1, DB2
    or
    source:
      type: <source>
      config:
        host_port: <host>:<port>
        database: !DB3
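    Neither form above is valid for the single-valued database field, but several SQL sources expose a database_pattern with allow/deny regex lists; a sketch, assuming the source supports it:
    source:
      type: <source>
      config:
        host_port: <host>:<port>
        database_pattern:
          deny:
            - "DB3"      # exclude DB3
          # or allow-list instead:
          # allow:
          #   - "DB1"
          #   - "DB2"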
  • f

    few-grass-66826

    08/05/2022, 12:19 PM
    Hi guys, is it possible to load stats only for some tables in Snowflake? If not, how can I disable stats for all of them?
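    A sketch, assuming the snowflake source's profiling and profile_pattern options; the table name is hypothetical:
    source:
      type: snowflake
      config:
        # ...connection settings elided...
        profiling:
          enabled: true   # set to false to disable stats entirely
        profile_pattern:
          allow:
            - "mydb.myschema.important_table"   # profile only matching tables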
  • f

    future-student-30987

    08/05/2022, 1:19 PM
    Hi guys, how can I delete all dashboards and charts from Metabase and Redash? I tried: datahub delete --entity_type dashboard --platform redash and datahub delete --entity_type dashboard --platform metabase, however I get this error: Failed to execute search query - Error 401 Unauthorized to perform this action. Is there any way to fix this? Thanks!