# troubleshoot
  • k

    kind-dawn-17532

    05/25/2022, 7:34 PM
    What is the correct way to publish “createdby” information to datahub? Be it about an entity or lineage aspect?
  • c

    cool-painting-92220

    05/25/2022, 10:32 PM
    Hi all! I have a MySQL database and Snowflake database with identical tables - the MySQL database has information entered in the column descriptions, and some of this data is pulled into Snowflake as our central location, where our usage is highest. I'm looking to pull the column descriptions from the MySQL tables and apply that metadata to the respective Snowflake tables that have already been loaded into DataHub. What would be the best approach for this?
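    One possible approach (a rough sketch only - the GMS address, table, and column names below are placeholders, not from this thread) is to read the column descriptions from MySQL and emit them onto the corresponding Snowflake datasets as the editableSchemaMetadata aspect with the DataHub Python emitter:
    Copy code
    # Sketch: copy column descriptions onto a Snowflake dataset already in DataHub.
    # Assumes acryl-datahub is installed and GMS is reachable at localhost:8080.
    import time

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeTypeClass,
        EditableSchemaFieldInfoClass,
        EditableSchemaMetadataClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address

    # Column descriptions pulled from MySQL (hardcoded here for illustration; in
    # practice they could be read from information_schema.columns).
    mysql_descriptions = {
        "customer_id": "Primary key of the customer",
        "email": "Customer email address",
    }

    # Placeholder Snowflake table that already exists in DataHub.
    snowflake_urn = make_dataset_urn(
        platform="snowflake", name="mydb.myschema.customers", env="PROD"
    )

    now = AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:ingestion")
    aspect = EditableSchemaMetadataClass(
        editableSchemaFieldInfo=[
            EditableSchemaFieldInfoClass(fieldPath=column, description=description)
            for column, description in mysql_descriptions.items()
        ],
        created=now,
        lastModified=now,
    )

    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=snowflake_urn,
            aspectName="editableSchemaMetadata",
            aspect=aspect,
        )
    )
    The editable aspect is the same one the UI writes when a description is edited by hand, so it shows up alongside whatever the Snowflake ingestion itself produces.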
  • r

    rich-policeman-92383

    05/26/2022, 10:08 AM
    Hello. While browsing a big dataset's lineage on DataHub, we are getting the below error:
    Copy code
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836	java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type UsageType
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836	Caused by: java.lang.RuntimeException: Failed to retrieve entities of type UsageType
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836	Caused by: java.lang.RuntimeException: Failed to batch load Usage Stats
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836	Caused by: java.lang.RuntimeException: Failed to load Usage Stats for resource urn:li:dataset:(urn:li:dataPlatform:hive,edw_base.clickstream_base_dly,PROD)
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836	Caused by: com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/usageStats>
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836		at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836	Caused by: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/usageStats>
    May 26, 2022 @ 15:18:59.000	/datahub_datahub-gms.3.uh37zjgealu56t32nr2ct4836	Caused by: java.util.concurrent.TimeoutException: Exceeded request timeout of 10000ms
  • g

    great-cpu-72376

    05/31/2022, 1:41 PM
    Hi, I am trying the DataHub integration with Airflow, in particular inlets and outlets. I defined an operator this way:
    Copy code
    test_pull_task = PythonOperator(
        task_id="test_pull_task",
        python_callable=test,
        op_kwargs=test_operator_task.output,
        inlets={"dataset": [Dataset(platform="file",name="/test/inlet/input.txt")]},
        outlets={"dataset": [Dataset(platform="file",name="/test/outlet/output.txt")]}
    )
    in the task log I see this:
    Copy code
    [2022-05-31, 13:34:40 UTC] {_lineage_core.py:80} INFO - Emitted from Lineage: DataJob(id='test_pull_task', urn=<datahub.utilities.urns.data_job_urn.DataJobUrn object at 0x7efdb7ac1e50>, flow_urn=<datahub.utilities.urns.data_flow_urn.DataFlowUrn object at 0x7efdbb025790>, name=None, description=None, properties={'task_id': "'test_pull_task'", '_outlets': '[]', 'label': "'test_pull_task'", '_downstream_task_ids': '[]', '_inlets': '[]', 'email': "['***.it.sgn@u-blox.com']", '_task_type': "'PythonOperator'", '_task_module': "'***.operators.python'", 'execution_timeout': 'None', 'depends_on_past': 'False', 'wait_for_downstream': 'False', 'sla': 'None', 'trigger_rule': "'all_success'"}, url='<http://localhost:8080/taskinstance/list/?flt1_dag_id_equals=test_lineage_drop_partition&_flt_3_task_id=test_pull_task>', tags={'drop', 'maya', 'postgresql', 'dba', 'partition'}, owners={'it-app-svc'}, inlets=[], outlets=[], upstream_urns=[<datahub.utilities.urns.data_job_urn.DataJobUrn object at 0x7efdb7ad5b50>])
    The inlets and outlets are empty - why? I copied these lines from the example https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub_provider/example_dags/lineage_backend_demo.py Another question: if I want to pass the dataset arrays dynamically, what should I do?
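    For comparison, a minimal sketch of the plain-list inlets/outlets form used by the linked lineage_backend_demo.py (the DAG id, paths, and callable are placeholders; whether this resolves the empty inlets/outlets depends on the plugin version in use):
    Copy code
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from datahub_provider.entities import Dataset


    def test(**kwargs):
        print("running test task")


    with DAG(
        dag_id="test_lineage_demo",  # placeholder
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # Inlets/outlets passed as plain lists, as in the example DAG.
        test_pull_task = PythonOperator(
            task_id="test_pull_task",
            python_callable=test,
            inlets=[Dataset(platform="file", name="/test/inlet/input.txt")],
            outlets=[Dataset(platform="file", name="/test/outlet/output.txt")],
        )

        # To pass the dataset arrays dynamically, build the lists in plain Python
        # before constructing the operator:
        paths = ["/test/inlet/a.txt", "/test/inlet/b.txt"]
        dynamic_task = PythonOperator(
            task_id="dynamic_task",
            python_callable=test,
            inlets=[Dataset(platform="file", name=p) for p in paths],
        )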
  • c

    calm-dinner-63735

    06/03/2022, 12:59 PM
    Seems like this UI is broken, or I am missing something.
  • h

    high-hospital-85984

    06/06/2022, 5:16 PM
    Could someone help me understand if we're hitting scaling issues or if there is some other problem?
  • a

    abundant-painter-6

    06/07/2022, 2:33 PM
    image.png
  • n

    nutritious-bird-77396

    06/10/2022, 7:59 PM
    I see that both clientId and telemetryClientId have been added to the database. I am assuming the name of the aspect was changed from one release to the other, but I'm not sure - I don't see any evidence in Git for this assumption. @early-lamp-41924 Would it be safer to update the clientId aspect to telemetryClientId?
  • p

    plain-napkin-77279

    06/13/2022, 6:13 AM
    Hello team, I am using version 0.8.36 of DataHub, and I am having a problem with analytics. I had this error after a fresh installation, and even after ingesting my metadata into DataHub. It has been happening from the beginning. These are my GMS logs - any suggestions, please?
  • h

    hallowed-machine-2603

    06/15/2022, 1:37 AM
    Hi Team, I installed DataHub using datahub docker quickstart, but I can't find some containers (mce-consumer, mae-consumer). So I pulled the images and ran linkedin/mae-consumer and mce-consumer, but I get a "connection refused" error. I think I exposed the wrong port on the mce/mae-consumer. I have two questions: 1. How can I run the mae/mce-consumer containers when using quickstart? 2. If I use the Docker images directly from Docker Hub, what is the proper command for running the mae/mce-consumer containers? thx 🙂 ps. The containers installed through quickstart are listed below.
  • m

    millions-notebook-72121

    06/23/2022, 9:40 AM
    Hi - I am actually trying to ingest the exact same sample glossary file, but I can't even get it to ingest successfully; I am getting the errors below. The recipe file is also identical (except that the sink is our DataHub instance). I may be missing something trivial - not sure if you've run into this?
  • q

    quiet-arm-91745

    06/28/2022, 4:12 PM
    hi, I'm trying to send Airflow lineage to DataHub, but nothing shows up even though the DAGs succeed. I'm using Cloud Composer. I have installed the Airflow plugin, added datahub_conn_id, and added the configuration. When trying lineage_backend_demo I got this log:
    Copy code
    [2022-06-28, 16:05:30 UTC] {_lineage_core.py:67} INFO - Emitted from Lineage: DataFlow(urn=<datahub.utilities.urns.data_flow_urn.DataFlowUrn object at 0x7f2a4526f550>, id='datahub_lineage_backend_demo', orchestrator='airflow', cluster='prod', name=None, description="An example DAG demonstrating the usage of DataHub's Airflow lineage backend.\n\n", properties={'timezone': "'UTC'", 'start_date': '1656201600.0', 'fileloc': "'/home/airflow/gcs/dags/lineage_backend_demo.py'", 'tags': "['example_tag']", 'catchup': 'False', 'is_paused_upon_creation': 'None', '_default_view': "'tree'", '_access_control': 'None'}, url='<https://xxxxxx-dot-asia-southeast1.composer.googleusercontent.com/tree?dag_id=datahub_lineage_backend_demo>', tags={'example_tag'}, owners={'airflow'})
    [2022-06-28, 16:05:30 UTC] {_lineage_core.py:80} INFO - Emitted from Lineage: DataJob(id='run_data_task', urn=<datahub.utilities.urns.data_job_urn.DataJobUrn object at 0x7f2a45210e80>, flow_urn=<datahub.utilities.urns.data_flow_urn.DataFlowUrn object at 0x7f2a4526ff70>, name=None, description=None, properties={'_downstream_task_ids': '[]', 'label': "'run_data_task'", '_inlets': '["Dataset(platform=\'snowflake\', name=\'mydb.schema.tableA\', env=\'PROD\')", "Dataset(platform=\'snowflake\', name=\'mydb.schema.tableB\', env=\'PROD\')"]', 'task_id': "'run_data_task'", 'execution_timeout': '300.0', 'email': "['<mailto:jdoe@example.com|jdoe@example.com>']", '_outlets': '["Dataset(platform=\'snowflake\', name=\'mydb.schema.tableC\', env=\'PROD\')"]', '_task_type': "'BashOperator'", '_task_module': "'airflow.operators.bash'", 'depends_on_past': 'False', 'wait_for_downstream': 'False', 'trigger_rule': "'all_success'", 'sla': 'None'}, url='<https://xxxxxxxx-dot-asia-southeast1.composer.googleusercontent.com/taskinstance/list/?flt1_dag_id_equals=datahub_lineage_backend_demo&_flt_3_task_id=run_data_task>', tags={'example_tag'}, owners={'airflow'}, inlets=[<datahub.utilities.urns.dataset_urn.DatasetUrn object at 0x7f2a4522d5b0>, <datahub.utilities.urns.dataset_urn.DatasetUrn object at 0x7f2a4522d580>], outlets=[<datahub.utilities.urns.dataset_urn.DatasetUrn object at 0x7f2a4522d550>], upstream_urns=[])
    Is there any step that I missed? Thanks in advance.
  • n

    nutritious-bird-77396

    06/30/2022, 9:57 PM
    @helpful-optician-78938 Any updates on this? I ran with one of the latest versions of the client as well, 0.8.39.1rc8, and still get the same error.
  • s

    steep-midnight-37232

    07/05/2022, 2:15 PM
    image.png
  • s

    steep-soccer-91284

    07/20/2022, 9:22 AM
    I've faced a problem with quickstarting DataHub.
  • w

    witty-butcher-82399

    07/22/2022, 12:50 PM
    Has anyone experienced this issue?
  • f

    faint-translator-23365

    08/01/2022, 7:51 PM
    I was trying to set up LDAP for datahub-frontend. I was able to configure it using com.sun.security.auth.module.LdapLoginModule and also org.eclipse.jetty.server.server.plus.jaas.spi.LdapLoginModule, but these modules don't have an option to fetch email, first name, and other user attributes, so I cannot get the user list inside datahub-frontend and am not able to create groups. Can anyone please tell me how to get these user attributes, or is there a Java module that can do this? Please share a sample configuration if possible, thanks!
  • a

    ancient-apartment-23316

    08/02/2022, 7:01 PM
    Hello @incalculable-ocean-74010, yes, I used it for deployment. When I deployed for the first time everything worked without ingress; I just used the external IP address of the k8s service and it was accessible from the Internet. Currently it is not working for some reason. Also, I tried to use ingress as described in the documentation you provided. But I don't have to use HTTPS - HTTP is fine for my purposes, so I removed port 443:
    Copy code
    ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: instance
          alb.ingress.kubernetes.io/subnets: subnet-1, subnet-2
    #      alb.ingress.kubernetes.io/certificate-arn: <<certificate-arn>>
          alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    #      alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
        hosts:
          - host: "dev-datahub.qwerty.com"
    #        redirectPaths:
    #          - path: /*
    #            name: ssl-redirect
    #            port: use-annotation
            paths:
              - /*
    and DataHub is still inaccessible to me from the internet. I think maybe this option could help me? https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/subcharts/datahub-frontend/values.yaml#L48
  • q

    quick-pizza-8906

    08/03/2022, 3:24 PM
    Hello, I am trying to query DataHub using the query below:
    Copy code
    query count($urn: String!) {
      corpGroup(urn: $urn) {
          relationships(input: {
            types: ["OwnedBy"]
            direction: INCOMING
          }) {
            total
          }
        }
      searchAcrossEntities(input: {
        types: [],
        query: "*",
        filters: [
          {
            field: "owners",
            value: $urn
          }
        ]
      }) {
        total
      }
    }
    Basically I want to count the datasets owned by the group. Unfortunately the counts from corpGroup and searchAcrossEntities do not match - it seems corpGroup also counts soft-deleted datasets. Is that intended? Can it somehow be avoided? It seems searchAcrossEntities is giving the correct count, but I need corpGroup to give the correct count as well, since I noticed the problem by running this query:
    Copy code
    query getGroupsCount {
      search(
        input: {type: CORP_GROUP, query: "*", filters: [<some filters here>]}
      ) {
        searchResults {
          entity {
            urn
            ... on CorpGroup {
              name
              properties {
                displayName
              }
              relationships(input: {
                types: ["OwnedBy"]
                direction: INCOMING
              }) {
                total
              }
            }
          }
        }
      }
    }
    And it is impossible to run such an aggregation with searchAcrossEntities, as I understand it. On top of that, I ran the first query against the demo instance and got an even more absurd result (see below). Any hints/tips on what I am doing wrong here?
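    For reference, a rough sketch (the endpoint, port, and token handling are assumptions) of running both counts programmatically against the GraphQL API to compare them:
    Copy code
    # Sketch: POST the same GraphQL document to DataHub and compare the two totals.
    import requests

    DATAHUB_GRAPHQL = "http://localhost:9002/api/graphql"  # assumed frontend endpoint
    TOKEN = "<personal-access-token>"

    QUERY = """
    query count($urn: String!) {
      corpGroup(urn: $urn) {
        relationships(input: { types: ["OwnedBy"], direction: INCOMING }) {
          total
        }
      }
      searchAcrossEntities(
        input: { types: [], query: "*", filters: [{ field: "owners", value: $urn }] }
      ) {
        total
      }
    }
    """


    def owned_counts(group_urn: str):
        """Return (relationship total, search total) for one corp group."""
        resp = requests.post(
            DATAHUB_GRAPHQL,
            json={"query": QUERY, "variables": {"urn": group_urn}},
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
        resp.raise_for_status()
        data = resp.json()["data"]
        return (
            data["corpGroup"]["relationships"]["total"],
            data["searchAcrossEntities"]["total"],
        )


    rel_total, search_total = owned_counts("urn:li:corpGroup:some-group")
    print(f"relationships: {rel_total}, searchAcrossEntities: {search_total}")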
  • f

    faint-translator-23365

    08/04/2022, 4:58 PM
    Hi, when I'm trying to ingest LDAP as a source, I'm getting this error: "server is not configured to pass through control 1.2.840.113556.1.4.319". Can anyone please help with this? Thanks!
  • f

    famous-florist-7218

    08/05/2022, 2:37 AM
    Hi @big-carpet-38439, do you know why, when I delete some ingestion jobs in the UI, none of them are affected 🤔
  • a

    adamant-van-21355

    08/11/2022, 1:28 PM
    Hi again 👋🏼 The current state of the latest release is a bit confusing. While a v0.8.43 release has been announced, the helm chart version (and repository) still points to v0.8.42. 1. Why is that? Is the helm chart planned to be updated to v0.8.43 soon? 2. If not, is the v0.8.42 helm release safe (bug-free) to upgrade to? Thank you!
  • a

    aloof-leather-92383

    08/11/2022, 6:55 PM
    Hi, I'm attempting to do a kafka ingestion with a recipe .yml but I am receiving this error message. I installed the google package directly using pip to see if that would get rid of the error message but it is still there. I set up datahub using the quickstart and docker. Thank you!
  • a

    ambitious-cartoon-15344

    08/12/2022, 9:42 AM
    Hi team, does anyone know how to set datahub.cluster = "dev" for MWAA (Airflow)? We're having problems, and I haven't been able to find the information.
  • b

    busy-petabyte-37287

    08/12/2022, 2:12 PM
    Captura de Pantalla 2022-08-12 a la(s) 9.08.30.png
  • b

    busy-petabyte-37287

    08/12/2022, 2:13 PM
    image.png
  • g

    great-motherboard-71467

    08/16/2022, 1:00 PM
    Updating and answering myself, for future reference. I was able to authenticate to LDAP after reading the whole documentation at https://docs.oracle.com/javase/8/docs/jre/api/security/jaas/spec/com/sun/security/auth/module/LdapLoginModule.html It allowed me to log in with the following configuration:
    Copy code
    WHZ-Authentication {
      com.sun.security.auth.module.LdapLoginModule sufficient
      userProvider="ldaps://ldaps.some.server.eu:636/cn=users,cn=accounts,dc=some,dc=domain,dc=com"
      authzIdentity="{USERNAME}"
      userFilter="(&(objectClass=person)(uid={USERNAME}))"
      java.naming.security.authentication="simple"
      debug="true"
      useSSL="true";
    };
    As you can see, there is a change that works in my case: I replaced authIdentity with authzIdentity
    Copy code
    authzIdentity="{USERNAME}"
    According to the documentation:
    authzIdentity=authz_id
    This option specifies an authorization identity for the user. authz_id is any string name. If it comprises a single special token with curly braces then that token is treated as an attribute name and will be replaced with a single value of that attribute from the user's LDAP entry. If the attribute cannot be found then the option is ignored. When this option is supplied and the user has been successfully authenticated then an additional UserPrincipal is created using the authorization identity and it is associated with the current Subject.
  • n

    nutritious-bird-77396

    08/17/2022, 4:07 PM
    Seeing timeout issues with linkedin.jfrog.io Anyone else facing this when building datahub-frontend?
  • b

    bland-barista-59197

    09/02/2022, 6:01 PM
    Hi Team, any help is appreciated. Here is my infrastructure setup: • DataHub on a GKE (Google Kubernetes Engine) private cluster • Anthos Service Mesh enabled • Internal load balancer. Everything works fine up to this point, but after enforcing mTLS namespace-wide I'm getting a 503 with the response:
    upstream connect error or disconnect/reset before headers. reset reason: connection termination
  • b

    breezy-shoe-41523

    09/06/2022, 9:55 AM
    image.png