# troubleshoot
  • l

    lively-jackal-83760

    09/08/2022, 10:37 AM
Hi guys. I'm working on a Spring Boot app that uses the io.acryl:datahub-client dependency. While trying to update to the new version 0.8.44, I got this:
    The calling method's class, org.springframework.beans.factory.xml.XmlBeanDefinitionReader, is available from the following locations:
    
        jar:file:/Users/dseredenko/.m2/repository/org/springframework/spring-beans/5.3.21/spring-beans-5.3.21.jar!/org/springframework/beans/factory/xml/XmlBeanDefinitionReader.class
        jar:file:/Users/dseredenko/.m2/repository/io/acryl/datahub-client/0.8.44/datahub-client-0.8.44.jar!/org/springframework/beans/factory/xml/XmlBeanDefinitionReader.class
It seems like datahub-client bundles its own Spring Framework classes, which conflict with mine. I don't see any such dependency in the pom file of datahub-client, so I don't understand how to exclude Spring from it. Can someone help?
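(Editor's note, a hedged sketch for anyone hitting this: a Maven exclusion only helps when Spring arrives transitively. The jar paths above show the Spring classes bundled inside datahub-client-0.8.44.jar itself, so an exclusion cannot remove them and a fixed upstream artifact would be needed. For the transitive case, the exclusion form would look like this:)

```xml
<!-- Hypothetical sketch: only effective if Spring comes in transitively,
     not when the classes are shaded into the datahub-client jar itself -->
<dependency>
  <groupId>io.acryl</groupId>
  <artifactId>datahub-client</artifactId>
  <version>0.8.44</version>
  <exclusions>
    <exclusion>
      <groupId>org.springframework</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```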
  • b

    bland-orange-13353

    09/08/2022, 11:41 AM
    This message was deleted.
  • a

    agreeable-belgium-70840

    09/08/2022, 12:33 PM
So, I am having this issue: there was a glossary term that lost its name and is now named just by its URN id. I hard-deleted it with the DataHub CLI but it is still visible in the DataHub UI. I ran the restore-indices job successfully, but I can still see it. I also searched the database for this particular URN; it is not there. How can I get rid of it in the UI? Thanks
  • a

    astonishing-byte-5433

    09/08/2022, 5:55 PM
Hey, it seems like this volume mount in docker-compose-without-neo4j.quickstart.yml:
    volumes:
        - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
throws the following error when run for the first time on a new machine, causing the mysql container to fail:
    Can't initialize batch_readline - may be the input source is a directory or a block device.
After running the compose file a second time it doesn't throw this error again and everything works fine. I also removed the volume from the script, and it then runs fine on a fresh machine.
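(Editor's note, a hedged explanation consistent with the error above: with a bind mount, Docker creates a missing host path as an empty directory, and mysql's batch_readline then chokes on a directory where it expects a file. A sketch of the mount, making the file's existence explicit:)

```yaml
# Sketch: bind-mount the init script read-only; make sure ../mysql/init.sql
# exists as a *file* before `docker compose up` -- if the host path is
# missing, Docker creates it as an empty directory, which produces the
# "may be the input source is a directory" error on first run.
services:
  mysql:
    volumes:
      - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
```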
  • c

    cuddly-butcher-39945

    09/08/2022, 2:42 PM
Hey gang! I'm facing an issue with developing custom actions and getting DataHub to use them. I've attached my Python scripts, as well as some notable errors I see when running datahub --debug actions -c custom_action.yaml. BTW, I did not see any relevant posts on Slack about this; hopefully it's not just PEBKAC :-)
    CustomActions.txt
  • k

    kind-lunch-14188

    09/09/2022, 2:05 PM
Hi all, I'm generating a JSON file with the schema:
    datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:bigquery,project_id.dataset.table,DEV)" --aspect schemaMetadata  | jq >schema.json
Then I tried to upload this schema to a new, “empty” DataHub instance. To do that I ran
datahub ingest -c ingest_schema.dhub.yaml
, where
ingest_schema.dhub.yaml
is as follows:
    # see <https://datahubproject.io/docs/generated/ingestion/sources/file> for complete documentation
    source:
      type: "file"
      config:
        filename: schema.json
    
    # see <https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub> for complete documentation
    sink:
      type: "datahub-rest"
      config:
        server: "<https://gms.topic-dev.name.dev>"
Anyway, I'm still receiving the same error:
    Command failed with com.linkedin.pegasus2avro.usage.UsageAggregation is missing required field: bucket.
In fact there is no
UsageAggregation
in the generated JSON file. My best guess is that I'm generating the JSON file the wrong way. Could you give me a hint on how to do it correctly?
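(Editor's note, one hedged way to bridge the formats: `datahub get` prints a plain map of aspect name to aspect value, while the file source expects a list of change-proposal objects. The sketch below wraps the aspect into an MCP-style envelope; the exact field layout is an assumption based on the file-source docs, so verify it against your CLI version.)

```python
import json

def wrap_aspect_as_mcp(entity_urn: str, aspect_name: str, aspect_obj: dict) -> dict:
    """Wrap one aspect (as printed by `datahub get`) into an MCP-style
    envelope for the file source. The field layout here is an assumption;
    check the file-source docs for your CLI version."""
    return {
        "entityType": "dataset",
        "entityUrn": entity_urn,
        "changeType": "UPSERT",
        "aspectName": aspect_name,
        "aspect": {
            "value": json.dumps(aspect_obj),
            "contentType": "application/json",
        },
    }

# `datahub get` output looks like {"schemaMetadata": {...}}
raw = {"schemaMetadata": {"schemaName": "project_id.dataset.table", "fields": []}}
urn = "urn:li:dataset:(urn:li:dataPlatform:bigquery,project_id.dataset.table,DEV)"
mcps = [wrap_aspect_as_mcp(urn, name, obj) for name, obj in raw.items()]
# Write the list out and point the file source's `filename` at it
print(json.dumps(mcps, indent=2))
```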
  • b

    brainy-tent-14503

    09/09/2022, 2:41 PM
    👋 Hello friends! - I am building and testing datahub locally and recently the
    spark-lineage
smoke tests have been failing because of a difference in the browse path. I am working off the main branch with only a few slight fixes to get the tests to run (see diff). I am wondering whether the golden JSON files might be out of date with what is expected, or whether something else is off. For example, the tests expected this
    'com.linkedin.common.BrowsePaths': {
    					'paths': ['/spark/spark_spark-master_7077/pythonhdfsin2hivecreatetable']
    				}
however the actual output looks like this, with the
    /pythonhdfsin2hivecreatetable
    part missing.
    'com.linkedin.common.BrowsePaths': {
    					'paths': ['/spark/spark_spark-master_7077']
    				}
The part that is missing appears in the properties as
appName
, so maybe I should update the expected golden JSON?
  • b

    billowy-truck-48700

    09/09/2022, 7:16 PM
Hi all, I have a question regarding SQLAlchemy UI ingestion: I need to add the teradatasqlalchemy dialect. I created my own image based on the acryldata/datahub-actions image, where I installed acryl-datahub[sqlalchemy] plus teradatasqlalchemy, and CLI ingestion works. But UI ingestion creates a virtual env for each ingestion type under /tmp/datahub/ingest. Is there a way to tell datahub-actions what other Python libraries to install as part of virtual env creation?
  • b

    bitter-lizard-32293

    09/09/2022, 8:00 PM
👋 Hey folks. We've been trying to build some functionality on top of lineage search (the
searchAcrossLineage
graphQL query) and we've been seeing super high latencies (> 10s) while executing that query. We spent some time digging in, and it looks like we're spending the bulk of our time in the getLineage call in the ESGraphQueryDao class (we use ES as our graph store too). I did find one minor bug, in that the search lineage results were meant to be cached but that is actually not happening - https://github.com/datahub-project/datahub/pull/5892. This helps us with repeated calls for the same URN, but first-time calls still take a while. Does anyone have recommendations on how we could tune / speed things up here? Ballpark-wise, our
graph_service_v1
index has around 36M docs (4.8GB on disk) and is currently running 1 shard and 1 replica (wondering if this is too low).
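(Editor's note: one primary shard for a ~36M-doc index is on the low side, and Elasticsearch fixes the shard count at index creation, so the usual route is to create a new index with more primaries and `_reindex` into it. A hedged sketch of the new index's settings body; the shard count is illustrative, not a recommendation:)

```json
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}
```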
  • g

    great-branch-515

    09/11/2022, 7:06 PM
@here We are running DataHub on EKS (Kubernetes) and want to set up New Relic monitoring for it. Any pointers?
  • l

    late-rocket-94535

    09/12/2022, 7:45 AM
Hi, team. I use a transformer for ownership in YAML, like
    transformers:
      - type: "simple_add_dataset_ownership"
        config:
          owner_urns:
            - "urn:li:corpGroup:edwh"
          ownership_type: "TECHNICAL_OWNER"
and after the latest DataHub upgrade (v0.8.44) I see errors like
[2022-09-12, 07:08:52 UTC] {process_utils.py:173} INFO - [2022-09-12, 07:08:52 UTC] {pipeline.py:57} ERROR - failed to write record with workunit txform-urn:li:dataPlatform:postgres-edwh.raw_tn_marketing.t_customer-TEST-ownership with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /lastModified/actor :: "Provided urn " is invalid\n\n\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:149)', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /lastModified/actor :: "Provided urn " is invalid', 'status': 422}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /lastModified/actor :: "Provided urn " is invalid\n\n\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:149)', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /lastModified/actor :: "Provided urn " is invalid', 'status': 422}
Is this a bug or not?
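(Editor's note: the 422 above complains about an empty /lastModified/actor urn. As a hedged workaround sketch, you can build the Ownership aspect yourself with an explicit actor instead of relying on the transformer. The dict below mirrors the Ownership schema fields; the default actor urn is a placeholder assumption you should replace with a real user.)

```python
import time

def ownership_aspect(owner_urns, actor="urn:li:corpuser:datahub"):
    """Build an Ownership aspect as a plain dict with an explicit
    lastModified.actor, since GMS rejects an empty actor urn.
    The default actor urn here is a placeholder assumption."""
    now_ms = int(time.time() * 1000)  # audit stamps use epoch millis
    return {
        "owners": [{"owner": u, "type": "TECHNICAL_OWNER"} for u in owner_urns],
        "lastModified": {"time": now_ms, "actor": actor},
    }

aspect = ownership_aspect(["urn:li:corpGroup:edwh"])
print(aspect["lastModified"]["actor"])
```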
  • s

    sparse-quill-63288

    09/12/2022, 12:27 PM
Hi everyone, I have connected my Snowflake account to DataHub, but I am getting the errors below, and I am also not able to see the lineage part. Could you please help me with it? I tried to locate the source of the error but unfortunately could not. Errors: Validation error of type FieldUndefined: Field 'latestVersion' in type 'GetSchemaBlameResult' is undefined @ 'getSchemaBlame/latestVersion' (code undefined); The variables input contains a field name 'categories' that is not defined for input object type 'GetSchemaBlameInput' (code undefined)
  • w

    witty-journalist-16013

    09/12/2022, 3:50 PM
On 0.8.43.6, the dbt config with an S3 connection doesn't seem to work. It throws a validation error saying
aws_connection
is an extra field.
  • w

    witty-journalist-16013

    09/12/2022, 3:52 PM
    '1 validation error for DBTConfig\n'
               'aws_connection\n'
               '  extra fields not permitted (type=value_error.extra)\n',
  • w

    witty-journalist-16013

    09/12/2022, 3:57 PM
I'm using acryldata/datahub-actions 0.0.6, and it looks like that has 0.8.33 of the DataHub CLI tools in it?
  • e

    early-airplane-84388

    09/12/2022, 6:24 PM
Hey team, sharing the error from the #ingestion channel thread, as @steep-midnight-37232 also reported it for 0.8.44. I'm getting a similar error for multiple integrations.
    ~~~~ Execution Summary ~~~~
    
    RUN_INGEST - {'errors': [],
     'exec_id': '8cc88fff-6200-465e-9d09-703160d417fc',
     'infos': ['2022-09-12 13:20:11.799891 [exec_id=8cc88fff-6200-465e-9d09-703160d417fc] INFO: Starting execution for task with name=RUN_INGEST',
               '2022-09-12 13:20:11.801144 [exec_id=8cc88fff-6200-465e-9d09-703160d417fc] INFO: Caught exception EXECUTING '
               'task_id=8cc88fff-6200-465e-9d09-703160d417fc, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
               '    self.event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
               '    return f.result()\n'
               '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
               '    raise self._exception\n'
               '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
               '    result = coro.send(None)\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 71, in execute\n'
               '    validated_args = SubProcessIngestionTaskArgs.parse_obj(args)\n'
               '  File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj\n'
               '  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__\n'
               'pydantic.error_wrappers.ValidationError: 1 validation error for SubProcessIngestionTaskArgs\n'
               'debug_mode\n'
               '  extra fields not permitted (type=value_error.extra)\n']}
    Execution finished with errors.
    I'm running DataHub in GCP Kubernetes Engine and have also tried uninstalling and reinstalling with helm.
  • t

    thousands-solstice-2498

    09/13/2022, 5:51 AM
Hi team, please advise. I get an error while executing the config command with args '--command-config /tmp/connection.properties --bootstrap-server kafka-886515205-1-1200869063.scus.kafka-sams-edf-stg.ms-df-messaging.stg-az-southcentralus-8.prod.us.walmart.net:9092,kafka-886515205-2-1200869066.scus.kafka-sams-edf-stg.ms-df-messaging.stg-az-southcentralus-8.prod.us.walmart.net:9092,kafka-886515205-3-1200869069.scus.kafka-sams-edf-stg.ms-df-messaging.stg-az-southcentralus-8.prod.us.walmart.net:9092,kafka-886515205-4-1200869072.scus.kafka-sams-edf-stg.ms-df-messaging.stg-az-southcentralus-8.prod.us.walmart.net:9092,kafka-886515205-5-1200869075.scus.kafka-sams-edf-stg.ms-df-messaging.stg-az-southcentralus-8.prod.us.walmart.net:9092,kafka-886515205-6-1200869078.scus.kafka-sams-edf-stg.ms-df-messaging.stg-az-southcentralus-8.prod.us.walmart.net:9092 --entity-type topics --entity-name _schemas --alter --add-config cleanup.policy=compact'
  • g

    great-branch-515

    09/13/2022, 6:22 AM
@here I am trying to build this Grafana dashboard: https://github.com/datahub-project/datahub/tree/master/docker/monitoring/grafana/dashboards, but I am not finding some of the metrics (for example metrics_com_linkedin_metadata_resources_entity_EntityResource_search_Mean). Where can I find the list of DataHub metrics emitted by each service?
  • f

    fresh-cricket-75926

    09/13/2022, 8:31 AM
Hi all, while adding a member to a group I am getting the error below in datahub-gms, and the member cannot be added. Can anyone explain what the issue might be?

ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to migrate group membership for group urn:li:corpGroup:c9a62167-dab3-4abd-a401-148feb10a8c4 when adding group members at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606) at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Caused by: java.lang.RuntimeException: Failed to migrate group membership for group urn:li:corpGroup:c9a62167-dab3-4abd-a401-148feb10a8c4 when adding group members at com.linkedin.datahub.graphql.resolvers.group.AddGroupMembersResolver.lambda$get$1(AddGroupMembersResolver.java:63) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ... 5 common frames omitted

ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "mutation addGroupMembers($groupUrn: String!, $userUrns: [String!]!)
{\n addGroupMembers(input: {groupUrn: $groupUrn, userUrns: $userUrns})\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[addGroupMembers], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data={addGroupMembers=null}, extensions={tracing={version=1, startTime=2022-09-13T08:19:08.065Z, endTime=2022-09-13T08:19:08.083Z, duration=17505949, parsing={startOffset=221585, duration=201921}, validation={startOffset=384735, duration=150709}, execution={resolvers=[{path=[addGroupMembers], parentType=Mutation, returnType=Boolean, fieldName=addGroupMembers, startOffset=446211, duration=16737643}]}}}}, errors: [DataHubGraphQLError{path=[addGroupMembers], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]

08:19:10.157 [ForkJoinPool.commonPool-worker-7] WARN org.elasticsearch.client.RestClient:65 - request [POST http://elasticsearch-master:9200/graph_service_v1/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] returned 2 warnings: [299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
  • l

    lively-jackal-83760

    09/13/2022, 11:43 AM
Hi guys. Does anyone know how to shade or exclude Spring Framework from the io.acryl:datahub-client package on the latest version? I don't even see how or why the client library needs Spring, but it conflicts with my own.
  • h

    hallowed-dog-79615

    09/13/2022, 2:57 PM
Hi channel! We just updated our DataHub instance to v0.8.44 and found this behavior: we cannot see new users or policies anymore. These are the steps we are following:
1. We generate an invite link as in previous months.
2. New users use it to register.
3. They then notify me so I can assign them a group. Groups have associated policies.
4. I cannot find the new users in the user list. Old users are there, but none of the new ones appear.
5. They can log into the platform, but receive an "unauthorized" page, as by default they have no access rights.
6. If I activate an "all users" policy, already existing but inactive, they can access everything.
7. I still cannot see them in the user list.
8. I tried to create a new policy a bit more restrictive than the default one, but after successfully creating it, it never appears in the list.
Has anyone experienced something similar? Were you able to fix it? Thanks!
  • g

    great-branch-515

    09/02/2022, 10:12 AM
@here The Roles page does not show any roles. Any idea why? There are no errors in the gms service logs, and linkedin/datahub-gms:v0.8.44 is deployed.
  • k

    kind-whale-32412

    09/13/2022, 11:42 PM
    Hi, I discovered a bug in DataHub where I get an exception (
    Result window is too large
) for paths that have more than 10,000 assets. I created an issue explaining it in detail: https://github.com/datahub-project/datahub/issues/5928
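(Editor's note, a stopgap until the issue is fixed: the "Result window is too large" message comes from Elasticsearch's index.max_result_window limit, which defaults to 10,000. Raising it on the affected index trades memory for paging depth, so treat this as a hedged workaround rather than the proper fix, which is deeper pagination such as search_after. The settings body would be PUT to the affected index's _settings endpoint; the 50,000 value is illustrative:)

```json
{
  "index": {
    "max_result_window": 50000
  }
}
```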
  • f

    famous-florist-7218

    09/14/2022, 8:11 AM
Hi guys, is there a way to enable Spark lineage debug logging? This setting doesn't work for me 😞
    log4j.logger.datahub.spark=DEBUG
    log4j.logger.datahub.client.rest=DEBUG
I've deployed a Spark k8s operator with the latest datahub-spark-lineage jar (0.8.44-2). It fails to emit metadata to GMS; the error is
Application end event received, but start event missing for appId spark-xxx
. That's why I need to enable log4j debug mode.
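(Editor's note, a hedged guess at why the two logger lines above do nothing: category levels only emit output if the root logger has an appender configured, and newer Spark images switched to log4j2, which ignores a log4j.properties file entirely. A minimal log4j 1.x sketch with the appender wired up, assuming your Spark version still reads log4j.properties:)

```properties
# Root logger needs an appender, otherwise DEBUG categories print nothing
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.logger.datahub.spark=DEBUG
log4j.logger.datahub.client.rest=DEBUG
```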
  • p

    purple-balloon-66501

    09/14/2022, 8:53 AM
Good afternoon. Please advise which direction to look in: I do not see fresh runs under Ingestion. In the datahub-actions logs everything looks fine, and manual runs execute, but the frontend only displays data from yesterday.
  • l

    limited-forest-73733

    09/14/2022, 10:46 AM
Hey team! I just want to know whether there is any update on https://github.com/datahub-project/datahub/issues/4809 - there is a compatibility issue with Airflow 2.3.1.
  • w

    witty-lamp-55264

    09/14/2022, 12:46 PM
Hello everyone, I am trying to set up DataHub on our private cluster (using kubeadm), but every time I try to install the Helm chart I get:
    Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
After searching, I found that recent Kubernetes versions dropped some older API versions, such as
PodDisruptionBudget
in policy/v1beta1. I want to know if there will be a newer chart version that works with the latest Kubernetes versions, or if there is another solution.
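(Editor's note, for context: PodDisruptionBudget graduated to policy/v1 in Kubernetes 1.21, and policy/v1beta1 was removed in 1.25, so the chart's templates need the newer apiVersion. A hedged sketch of an updated manifest; the name and selector labels are placeholders:)

```yaml
apiVersion: policy/v1   # policy/v1beta1 was removed in Kubernetes 1.25
kind: PodDisruptionBudget
metadata:
  name: example-pdb      # placeholder name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: example   # placeholder label
```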
  • w

    witty-lamp-55264

    09/14/2022, 12:46 PM
Any help would be appreciated; I am still stuck on this issue.
  • s

    salmon-angle-92685

    09/14/2022, 12:51 PM
Hello, schema information is not being ingested for my Snowflake tables. I see the tables in the UI, but not the column metadata. I also get the following errors (attached). Any idea how to fix this? Here is my ingestion YAML:
    source:
      type: snowflake
      config:
        account_id: ${DH_SNOWFLAKE_ACCOUNT_ID}
        warehouse: ${DH_SNOWFLAKE_WAREHOUSE}
        username: ${DH_SNOWFLAKE_USER}
        password: ${DH_SNOWFLAKE_PASSWORD}
        role: ${DH_SNOWFLAKE_ROLE}
        include_tables: True
        include_views: True
    
        ignore_start_time_lineage: true
    
        stateful_ingestion:
          enabled: True 
          remove_stale_metadata: True 
    
        profiling:
          enabled: true
        profile_pattern:
          allow:
            - 'DATABASE_NAME.SCHEMA_NAME.*'
        database_pattern:
          allow:
            - "OIP"
        schema_pattern:
          allow:
            - "SCHEMA_NAME"
        table_pattern:
          deny:
           - '.*\._AIRBYTE_.*'
    
    pipeline_name: "snowflake_ingestion"
    
    sink:
      type: datahub-rest
      config:
        server: ${DATAHUB_SERVER}
Thank you guys in advance!
  • b

    bright-diamond-60933

    09/14/2022, 4:02 PM
When we enable standalone consumers, we see duplicate Elasticsearch indices, with timestamps appended to the index names, causing the number of indices to explode. We observed this with both version 0.8.41 and the latest 0.8.44. Is there a workaround? Is this expected behavior or a bug? Please see the screenshot below. You will notice that the index
mlfeatureindex_v2
is duplicated with different timestamps appended to the name. Sometimes we also noticed that the pod doesn't even start up when the standalone-consumers flag is enabled.