# troubleshoot
  • breezy-shoe-41523

    10/07/2022, 10:08 AM
    Hello team, I have a question about GraphQL response times. I found that GraphQL responses are too slow; my dataset size is attached below. I have a resource limit in my company's cluster, so my cluster settings for GMS are:
    datahub-gms:
      enabled: true
      replicaCount: 3
      resources:
       limits:
         cpu: 4
         memory: 8Gi
    I found that GMS gets faster when I increase the limit, but it never reaches that limit (it only uses about ~400m). Do you know why GMS doesn't use the full resource limit, and why it gets faster as the limit grows even though it doesn't use all of it? Any guidance would help. Thanks.
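    For reference, a minimal timing sketch for comparing GraphQL response times across different resource settings; the GMS address, the /api/graphql endpoint, and the search query shape are assumptions taken from the other examples in this thread:
    import time
    import requests

    GMS = "http://localhost:8080"  # assumption: GMS address
    QUERY = """
    query timing($input: SearchInput!) {
      search(input: $input) { total }
    }
    """
    variables = {"input": {"type": "DATASET", "query": "*", "start": 0, "count": 10}}

    for attempt in range(5):
        started = time.perf_counter()
        resp = requests.post(f"{GMS}/api/graphql", json={"query": QUERY, "variables": variables})
        elapsed = time.perf_counter() - started
        print(f"attempt {attempt}: HTTP {resp.status_code} in {elapsed:.2f}s")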
  • fresh-cricket-75926

    10/07/2022, 12:43 PM
    Hi, we are trying to load a root certificate for LDAP in datahub-frontend, but it didn't work. Is there any way we can create a truststore from the LDAP certificates and configure datahub-frontend to use that truststore?
  • wonderful-author-3020

    10/07/2022, 4:19 PM
    Hello, I'm trying to create an access token for the user I've created, but I encountered some problems. I'm following https://datahubproject.io/docs/api/graphql/token-management/ which says that I should be able to create tokens for others, but every token I create ends up in my "Manage Access Token" panel. I've tried listing all the tokens - the
    actorUrn
    property is some other account, but the
    ownerUrn
    is always me.
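    A hedged sketch of creating a token on behalf of another user through the GraphQL API, following the token-management docs linked above; it assumes the caller holds the "Manage All Access Tokens" privilege, and the GMS address, target user URN, and token name are placeholders:
    import requests

    GMS = "http://localhost:8080"         # assumed GMS address
    ADMIN_TOKEN = "<admin-access-token>"  # placeholder: token of the privileged caller

    mutation = """
    mutation createToken($input: CreateAccessTokenInput!) {
      createAccessToken(input: $input) {
        accessToken
        metadata { id name }
      }
    }
    """
    variables = {
        "input": {
            "type": "PERSONAL",
            "actorUrn": "urn:li:corpuser:some-other-user",  # placeholder target user
            "duration": "ONE_MONTH",
            "name": "token-for-some-other-user",
        }
    }
    resp = requests.post(
        f"{GMS}/api/graphql",
        json={"query": mutation, "variables": variables},
        headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
    )
    print(resp.json())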
  • alert-traffic-45034

    10/07/2022, 4:51 PM
    Hi everyone, has anyone run into this while using the Athena source before?
    ImportError: cannot import name 'AthenaTableMetadata' from 'pyathena.model'
  • witty-wall-84488

    10/07/2022, 6:21 PM
    Hi everyone! Which GraphQL method should I use to search for all entities under a specific path? These entities can be datasets, folders, and others. For example, I'd like to list all entities from the Datasets folder located at Datasets/dev/tableau/some_project_name. The method below seems to work only with a limited set of object types from EntityType and doesn't include folders:
    query search_across_entities($input: SearchInput!) {
      search(input: $input) {
        count
        total
        searchResults {
          entity {
            urn
            type
            ... on Dataset {
              name
            }
          }
        }
      }
    }
    variables =
    {
      "input": {
        "type": "DATASET",
        "query": "",
        "start": 0,
        "count": 1000,
        "filters": [{"field": "browsePaths", "value": "dev/tableau/some_project_name"}]
      }
    }
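    A hedged variant of the same request using the searchAcrossEntities query, which takes a list of entity types instead of a single one; the exact field names, the CONTAINER entity type for folder-like entities, and the availability of DataHubGraph.execute_graphql in your acryl-datahub version are assumptions:
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # assumed GMS address

    query = """
    query searchAcrossEntities($input: SearchAcrossEntitiesInput!) {
      searchAcrossEntities(input: $input) {
        count
        total
        searchResults { entity { urn type } }
      }
    }
    """
    variables = {
        "input": {
            # assumption: CONTAINER picks up folder-like entities alongside datasets
            "types": ["DATASET", "CONTAINER"],
            "query": "*",
            "start": 0,
            "count": 1000,
            "filters": [{"field": "browsePaths", "value": "dev/tableau/some_project_name"}],
        }
    }
    print(graph.execute_graphql(query, variables=variables))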
  • microscopic-room-90690

    10/08/2022, 8:06 AM
    Hi everyone, I followed this link https://datahubproject.io/docs/quickstart/ and ran the command datahub docker quickstart, and got this error on my M1 Pro Mac: "no matching manifest for linux/arm64/v8 in the manifest list entries ............ Unable to run quickstart - the following issues were detected: - quickstart.sh or dev.sh is not running". Some information that might be useful: datahub --version: acryl-datahub, version 0.8.45.2; Darwin HW0015358 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000 x86_64. I tried to solve this problem by referring to earlier threads, but it didn't seem to work. Is there any solution?
  • future-hair-23690

    10/10/2022, 7:12 AM
    Hi guys, I'm experiencing an issue where profiling does not start. Does anybody have an idea what might be wrong? There is no error or debug message; nothing related to profiling happens at all. I am using MSSQL (pyodbc) on CLI version 0.8.45.2. My config:
    source:
      type: mssql
      config:
        password: ---------
        database: sandbox_validation
        host_port: 'az-uk-mssql-accept-01.logex.cloud:1433'
        username: ------
        use_odbc: 'true'
        uri_args:
            driver: 'ODBC Driver 17 for SQL Server'
            Encrypt: 'Yes'
            TrustServerCertificate: 'Yes'
            ssl: 'True'
        env: STG
        profiling:
          enabled: true
          limit: 10000
          report_dropped_profiles: false
          profile_table_level_only: false
    
          include_field_null_count: true
          include_field_min_value: true
          include_field_max_value: true
          include_field_mean_value: true
          include_field_median_value: true
          include_field_stddev_value: true
          include_field_quantiles: true
          include_field_distinct_value_frequencies: true
          include_field_sample_values: true
          turn_off_expensive_profiling_metrics: false
          include_field_histogram: true
          catch_exceptions: false
          max_workers: 4
          query_combiner_enabled: true
          max_number_of_fields_to_profile: 100
          profile_if_updated_since_days: null
          partition_profiling_enabled: false
        schema_pattern:
          deny:
            - DS\\oleksii
            - ds*
            - Logex*
          allow:
            - dbo.*
            - dbo
    cheers!
  • microscopic-mechanic-13766

    10/10/2022, 8:41 AM
    Good morning everyone, I am trying to update the DataHub version to
    linkedin/datahub-frontend-react:v0.8.45
    , but I keep getting the error shown here. Note that the previous deployment (which was on version
    0.8.44
    ) worked perfectly, so it is not that the certificate is in a bad format. Is this a known error? Note: the certificate that is failing is the one needed for authentication via OIDC (which in my case is Keycloak).
    Front_container_error
  • gray-telephone-67568

    10/10/2022, 12:29 PM
    Hi, I would like some help. I added disable_ssl_verification: true in the config section of the recipe for ingesting metadata, since GMS is on HTTPS right now, but it still could not bypass SSL verification and I got this error [caused by SSLError(SSLCertVerificationError]. Any help would be greatly appreciated. Thank you.
  • red-analyst-79902

    10/10/2022, 2:11 PM
    Hello everyone! I am trying to ingest metadata from our Tableau Server, which requires trusted CA certificates to be deployed. I did deploy them on the Linux machine, but it may also require having them in the keystore of the running DataHub containers, and I am not sure how to do that.
    'failures': {'tableau-login': ["Unable to LoginReason: HTTPSConnectionPool(host='172.22.5.19', port=443): Max retries exceeded with url: /api/2.4/serverInfo (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)')))"]},
    Any experience with this?
  • thankful-morning-85093

    10/10/2022, 10:11 PM
    Hi Team, I'm getting "An unknown error occurred. (code 500)" when I log on to DataHub, and the entire Hive platform is giving the error below. This happened after I tried ingesting another data source, which might have failed. I upgraded DataHub to the latest version to try to fix things. I am also running the re-indexing job to check whether the indexes were corrupted. Any pointers on what might be wrong?
  • clever-garden-23538

    10/10/2022, 10:16 PM
    I'm getting the following error log in GMS when I access the "Analytics" page in the UI. I had just recreated my DataHub instance and infra (ES and DB). This has to do with someone interacting with GMS before the ES instances have been set up, right?
    22:12:21.273 [Thread-1167] ERROR c.l.d.g.a.service.AnalyticsService:264 - Search query failed: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
    22:12:21.273 [Thread-1167] ERROR c.l.d.g.a.r.GetHighlightsResolver:35 - Failed to retrieve analytics highlights!
    java.lang.RuntimeException: Search query failed:
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:236)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.getHighlights(GetHighlightsResolver.java:58)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:33)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:24)
        at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
        at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
        at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
        at graphql.execution.Execution.executeOperation(Execution.java:165)
        at graphql.execution.Execution.execute(Execution.java:104)
        at graphql.GraphQL.execute(GraphQL.java:557)
        at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
        at graphql.GraphQL.executeAsync(GraphQL.java:446)
        at graphql.GraphQL.execute(GraphQL.java:377)
        at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)
        at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
        ... 17 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://compass-elasticsearch.us-west-2.prd.fa.tesla.services:80>], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
    {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"QkcIA9AKTCGOho3ag0da_Q","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
            at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
            at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
            ... 21 common frames omitted
    Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
        ... 24 common frames omitted
    Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        ... 28 common frames omitted
  • kind-scientist-44426

    10/11/2022, 5:50 AM
    Hi everyone, I am trying to configure lineage via Airflow in DataHub. After following all the steps provided in https://datahubproject.io/docs/lineage/airflow, I'm getting this error in our Airflow:
    Broken DAG: [/app/airflow/airflow/dags/mis/dags/dag_generator/datahub_sample_lineage.py] Traceback (most recent call last):
      File "pydantic/__init__.py", line 2, in init pydantic.__init__
      File "pydantic/dataclasses.py", line 52, in init pydantic.dataclasses
    ImportError: cannot import name dataclass_transform
    Can someone help me with this error?
  • witty-rain-85574

    10/11/2022, 9:16 AM
    Hi everyone, I would like to delete a time series aspect for all dataset entities in a platform, and I used this command to do so from the docs:
    datahub delete -p "snowflake" --entity_type dataset -a "datasetProfile"
    . However, this ended up soft deleting all the entities themselves instead of just the aspect. Can someone please help explain why this behaviour was observed, and how I can go about deleting just the aspect values? Thanks! 🙂
  • bumpy-pharmacist-66525

    10/11/2022, 12:05 PM
    Hi everyone! I am trying to figure out which policy/permission I need to give a user in order for them to have access to the Swagger (OpenAPI UI) page, but I can't seem to find any particular policy which does this. Even after searching through the documentation, I can't find anything on it. My best guess is that the permission to access this page is covered by another permission somewhere, but again, I can't seem to find which one it is under. Would you be able to help me figure out which policy/permission access to the Swagger page is under? Thanks! 🙂
  • white-hydrogen-24531

    10/11/2022, 2:23 PM
    Has anyone used the Python SDK to add a new domain or attach a domain to a dataset? I can't seem to get it working with the code below:
    from datahub.metadata.schema_classes import (
      DomainsClass,
      ChangeTypeClass
    )
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.ingestion import graph
    graph = DataHubGraph(DatahubClientConfig(server = "<http://datahub-gms>"))
    
    dataset_urn= builder.make_dataset_urn(platform="hive", name="test.test", env="PROD")
    
    #new_domain = DomainsClass(domains=["TEST_123"])
    new_domain = DomainsClass(["TEST_123"])
    
    current_domain = graph.get_domain(entity_urn=dataset_urn)
    print(current_domain)
    
    event = MetadataChangeProposalWrapper(
      entityType="dataset",
      changeType = ChangeTypeClass.UPSERT,
      entityUrn = dataset_urn,
      aspectName="domains",
      aspect=new_domain
    )
    graph.emit(event)
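    For comparison, a hedged sketch of the same flow that passes a full domain URN (urn:li:domain:...) in the domains aspect rather than a bare name; the GMS address and the domain id TEST_123 are placeholders, and it assumes the domain entity already exists:
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import ChangeTypeClass, DomainsClass

    graph = DataHubGraph(DatahubClientConfig(server="http://datahub-gms:8080"))  # assumed GMS address

    dataset_urn = builder.make_dataset_urn(platform="hive", name="test.test", env="PROD")
    domain_urn = builder.make_domain_urn("TEST_123")  # -> urn:li:domain:TEST_123

    event = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="domains",
        aspect=DomainsClass(domains=[domain_urn]),  # full URNs, not display names
    )
    graph.emit(event)
    print(graph.get_domain(entity_urn=dataset_urn))  # verify the aspect was written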
  • ripe-apple-36185

    10/11/2022, 4:21 PM
    Hi Team, I am trying to add Great Expectations assertions to a Snowflake dataset. The Snowflake dataset has its URN in upper case, since that is how it is defined in Snowflake (I am using
    convert_urns_to_lowercase: false
    in the recipe). Great Expectations is converting the URN components to lower case. Is there a way to have DataHubValidationAction set the URNs to uppercase?
  • ripe-tailor-61058

    10/11/2022, 7:26 PM
    Is there a way to delete metadata from DataHub with datahub delete when access tokens are enabled?
  • ripe-tailor-61058

    10/11/2022, 7:27 PM
    I was able to delete via curl -X POST 'http://localhost:8080/entities?action=delete', passing the token in the header, but I only know how to delete single datasets that way, and I'm looking for how to delete everything for an env or platform.
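    A hedged sketch of that same single-URN delete call from Python, with the token sent as a bearer header; the payload shape and the example URN are assumptions, and deleting everything for a platform would mean looping this over the full list of dataset URNs:
    import requests

    GMS = "http://localhost:8080"
    TOKEN = "<access-token>"  # placeholder

    dataset_urns = [
        # placeholder list; fill with the URNs to delete
        "urn:li:dataset:(urn:li:dataPlatform:hive,test.test,PROD)",
    ]

    for urn in dataset_urns:
        resp = requests.post(
            f"{GMS}/entities?action=delete",
            json={"urn": urn},  # assumed payload shape for the delete action
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
        print(urn, resp.status_code)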
  • limited-forest-73733

    10/12/2022, 6:02 AM
    Hey team! Airflow 2.3.1 is not compatible with the SQLAlchemy version that we are getting from the DataHub plugin. Any update on this issue?
  • glamorous-wire-83850

    10/12/2022, 8:09 AM
    Hello team, I am trying to add LDAP auth to the Helm DataHub deployment but am stuck. I made the changes below but it doesn't work. Am I missing something? Thanks. 1. Add a mount path for the new jaas.conf and the related config in the frontend's values.yaml:
    extraEnvs:
      - name: AUTH_JAAS_ENABLED
        value: "true"
      - name: JAVA_OPTS
        value: |-
          -Djava.security.auth.login.config=/datahub-frontend/conf/custom/jaas.conf
    
    
    extraVolumes:
      - name: jaas-conf-volume
        configMap:
          name: jaas-conf
    
    extraVolumeMounts:
      - name: jaas-conf-volume
        mountPath: datahub-frontend/conf/custom/jaas.conf
        subPath: jaas.conf
        readOnly: true
    2. The jaas.conf file:
    WHZ-Authentication {
      com.sun.security.auth.module.LdapLoginModule sufficient
      userProvider="<ldap://server.com.tr:389/CN=test,OU=test2,OU=SERVICE> USERS,DC=infoshop,DC=com,DC=tr"
      authIdentity="{USERNAME}"
      java.naming.security.authentication="simple"
      debug="true"
      useSSL="true";
    };
  • shy-parrot-64120

    10/12/2022, 2:19 PM
    Hi all, we've encountered a neo4j failure due to disk corruption and therefore recreated the DB. Is it possible to restore the data (like restoring the Elasticsearch indexes via a job), or do we need to reingest everything?
  • fast-ice-59096

    10/12/2022, 3:05 PM
    Hi everyone, I am trying to use DataHub on an Azure VM. When I try to launch it, the following error appears:
  • fast-ice-59096

    10/12/2022, 3:05 PM
    [2022-10-12 15:01:41,124] ERROR {datahub.entrypoints:189} - Command failed with Unknown color 'bright_red'. Run with --debug to get full trace
    [2022-10-12 15:01:41,124] INFO {datahub.entrypoints:192} - DataHub CLI version: 0.8.43 at /home/azureuser/.local/lib/python3.6/site-packages/datahub/__init__.py
  • fast-ice-59096

    10/12/2022, 3:05 PM
    Can anyone help?
  • bland-orange-13353

    10/12/2022, 3:12 PM
    This message was deleted.
  • ancient-library-85500

    10/12/2022, 8:23 PM
    Hi! We are testing a custom entity we have created by ingesting some data in the form of JSONs. Our setup is through Docker, so we run these commands to put and get, respectively.
    datahub put --urn "urn:li:process:(PRC-1,Test_Process_1_Description)" --aspect testProcessProperties --aspect-data prc1.json
    datahub get --urn "urn:li:process:(PRC-1,Test_Process_1_Description)"
    The put command completes without any errors, but running the get command produces the following error:
    19:23:22.102 [qtp522764626-22] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2/urn%3Ali%3Aprocess%3A%28PRC-1%2CTest_Process_1_Description%29 - get - 500 - 1ms
    19:23:22.105 [qtp522764626-22] ERROR c.l.m.filter.RestliLoggingFilter:38 - <http://Rest.li|Rest.li> error: 
    com.linkedin.restli.server.RestLiServiceException: java.lang.RuntimeException: Failed to get entity with urn: urn:li:process:(PRC-1,Test_Process_1_Description), aspects: null
    
    Caused by: java.lang.RuntimeException: Failed to get entity with urn: urn:li:process:(PRC-1,Test_Process_1_Description), aspects: null
    	... 88 common frames omitted
    Caused by: java.lang.NullPointerException: null
    	... 89 common frames omitted
    Any help or insight would be greatly appreciated! @kind-dawn-17532 @bland-balloon-48379 @nice-oil-28310
  • clever-garden-23538

    10/12/2022, 9:52 PM
    It seems like the Elasticsearch setup is split between the setup job and the GMS startup sequence (let me know if I'm mistaken). Is there a reason why all ES index creation isn't done in the elasticsearch-setup job?
  • clever-garden-23538

    10/13/2022, 12:44 AM
    Along the same lines, wouldn't it be simpler to run the setup jobs as init containers of the GMS deployment?
  • brave-secretary-27487

    10/13/2022, 7:53 AM
    Hey all, I'm trying to get lineage between views with the new
    bigquery-beta
    plugin, but I get an error that the config options
    lineage_parse_view_ddl
    and
    lineage_use_sql_parser
    don't exist. Are there any other options to visualize lineage between views in BigQuery?