# troubleshoot
  • c

    creamy-pizza-80433

    06/22/2023, 8:56 AM
    Hello everyone, we recently upgraded our DataHub version from 0.10.2 to 0.10.4 and ran into a new problem with permissions and policies for users: permissions suddenly stopped working for Domain, Dataset, and Container assets, but still work for Data Product assets. Does anyone know how I can solve this problem? Thanks!
    ✅ 1
  • d

    dazzling-rainbow-96194

    06/22/2023, 4:48 PM
    Hi, we are trying to deploy datahub using Kubernetes. We successfully deployed datahub but while trying to ingest data, we see the following error:
    Copy code
    2023-06-22 16:40:46,537 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:56 - Error feeding bulk request. No retries left. Request: Failed to perform bulk request: index [dh_containerindex_v2], optype: [UPDATE], type [_doc], id [urn%3Ali%3Acontainer%3A47eb8e03b73baa9876828b6d3649509c];Failed to perform bulk request: index [dh_containerindex_v2], optype: [UPDATE], type [_doc], id [urn%3Ali%3Acontainer%3A47eb8e03b73baa9876828b6d3649509c];Failed to perform bulk request: index [dh_containerindex_v2], optype: [UPDATE], type [_doc], id [urn%3Ali%3Acontainer%3A47eb8e03b73baa9876828b6d3649509c];Failed to perform bulk request: index [dh_containerindex_v2], optype: [UPDATE], type [_doc], id [urn%3Ali%3Acontainer%3A47eb8e03b73baa9876828b6d3649509c];
    Has anyone seen this before? Any tips on how to resolve it?
    i
    a
    +5
    • 8
    • 27
  • n

    numerous-address-22061

    06/22/2023, 9:08 PM
    https://datahubproject.io/docs/python-sdk/models#datahub.metadata.schema_classes.OwnershipTypeCla[…]tahub.metadata.schema_classes.OwnershipTypeClass This part of the documentation seems to be kind of jumbled
    ✅ 1
    g
    • 2
    • 1
  • r

    red-sundown-5665

    06/23/2023, 9:40 AM
    Hello, I have deployed the DataHub Helm chart, and I am encountering problems with GMS. The error message I'm seeing is: "Readiness probe failed: HTTP probe failed with status code: 404." Upon investigation, it seems that the readiness probe is attempting to locate the following path: "/health/check/ready." Could someone assist me with this issue? Thank you and best regards.
    g
    • 2
    • 1
  • g

    glamorous-spring-97970

    06/23/2023, 9:48 AM
    Hi all, I tried upgrading DataHub to v0.10.3 via the quickstart compose file; however, when I check the status of the containers, the Datahub Upgrade v0.10.3, Kafka setup, Elasticsearch setup, and MySQL setup containers are always 'Exited'. Below is a screenshot for reference:
    ✅ 1
    g
    c
    a
    • 4
    • 5
  • g

    glamorous-spring-97970

    06/23/2023, 9:49 AM
    image.png
  • g

    glamorous-spring-97970

    06/23/2023, 9:49 AM
    Could anyone suggest what the issue is here? Thanks
  • a

    acceptable-computer-51491

    06/23/2023, 10:14 AM
    Hi all, I am deploying DataHub to AWS using Helm charts, with AWS OpenSearch as the search backend. When I deploy the Helm chart with updated values, the datahub-elasticsearch-setup-job pod gets a 403 response when it tries to connect to OpenSearch. Any idea why this is happening?
    ✅ 1
    g
    • 2
    • 2
  • q

    quiet-businessperson-49384

    06/23/2023, 12:35 PM
    Hello all, I'm trying to get lineage between a SQL Server dataset and Tableau. Ingestion from SQL Server and Tableau works fine; however, I'm not able to link the table used in a dashboard with the table in SQL Server, so the same table is created twice. To explain: in SQL Server we have a container hierarchy of instance/env/database/schema, but in Tableau all tables are created at the root level of SQL Server. For example: • in SQL Server we have DataWarehouse/PROD/Marketing/dbo/myTable • in Tableau the same table is created with the name DataWarehouse.PROD.Marketing.dbo.myTable. I'm using v0.10.3. We would like Tableau to reuse the table already loaded from SQL Server. Any help? Regards, Nabil.
    ✅ 1
    g
    • 2
    • 4
  • s

    salmon-area-51650

    06/23/2023, 12:37 PM
    Hello team 👋, I’m trying to reduce the number of logs generated by datahub-gms but I’m stuck. I currently have the following configuration in my yaml file:
    Copy code
    datahub-gms:
      enabled: true
      image:
        repository: linkedin/datahub-gms
        tag: "v0.10.3"
      service:
        type: ClusterIP
      env:
        - name: JAVA_OPTS
          value: "-Dlog4j.rootLogger=ERROR,stdout"
    But I’m still seeing INFO and WARN logs. Any advice? Thanks!
  • p

    powerful-cat-68806

    06/24/2023, 4:48 PM
    Hi team, I'm facing the same issue as this when executing helm on my namespace, but I'm not sure if the upgrade is relevant here. GMS Helm chart:
    Copy code
    apiVersion: v2
    appVersion: v0.9.3
    description: A Helm chart for LinkedIn DataHub's datahub-gms component
    name: datahub-gms
    type: application
    version: 0.2.165
    Also - how can I find out, from the namespace, what the GMS version is?
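    For now I am trying to read it off the running deployment with something like this (a sketch; the deployment name depends on our release name, so mine is a guess):
    Copy code
    # print the image (and therefore the tag) of the GMS container
    # "datahub-datahub-gms" is a guess based on the default release name
    kubectl -n <namespace> get deployment datahub-datahub-gms \
      -o jsonpath='{.spec.template.spec.containers[0].image}'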
    g
    a
    • 3
    • 4
  • h

    high-twilight-23787

    06/24/2023, 7:11 PM
    Hi, I tried to edit lineage through the "Lineage graph view". At first it seems to work: I save the changes and the new connections are displayed on the screen. But once I refresh the tab, all of the changes disappear. I am using DataHub version 0.10.4 (with quickstart). Error from GMS:
    Copy code
    [0]: index [system_metadata_service_v1], type [_doc], id [gPRT5nUbjZTmpbFKO3+1Mw==], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [system_metadata_service_v1] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [2]: index [graph_service_v1], type [_doc], id [sDI2wkN1U9JE9C/8/heWsw==], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [graph_service_v1] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
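    From the error it looks like the Elasticsearch disk flood-stage watermark was hit; I am planning to free up disk space and then clear the read-only block with something like this (a sketch, not sure it is the right fix):
    Copy code
    # after freeing disk space, remove the read-only-allow-delete block from all indices
    curl -X PUT "http://localhost:9200/_all/_settings" \
      -H 'Content-Type: application/json' \
      -d '{"index.blocks.read_only_allow_delete": null}'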
    ✅ 1
    b
    • 2
    • 1
  • f

    fierce-night-40574

    06/25/2023, 2:35 AM
    Hello everyone, I am using DataHub version 0.9.2. In this version, I found that the Details module cannot filter dependencies with a degree of 3+, but the Visualize Lineage module can show multi-level upstream and downstream information. What is the reason for this?
    ✅ 1
    g
    a
    • 3
    • 2
  • f

    future-holiday-32084

    06/25/2023, 7:56 AM
    Hi team. I saw in Paul Logan's blog post (on Medium) that column lineage would support Spark in Q1 2023. I'm not sure if you have released it yet, or when you will support automatic column-level lineage for Spark and HDFS. If it has been released, please point me to documentation or information about it. cc: @astonishing-answer-96712
    g
    • 2
    • 1
  • r

    rich-policeman-92383

    06/25/2023, 8:32 AM
    Hello. Is there a way to print OIDC requests and responses in the frontend logs? We are facing an issue with DataHub where the /userinfo endpoint fails with 401 Unauthorized.
    g
    a
    • 3
    • 3
  • c

    clever-magician-79463

    06/25/2023, 8:36 AM
    Hi all, I have deployed DataHub using Docker on an EC2 m5.xlarge instance with 16 GB of memory. I have set a hard memory limit on each Docker container based on how much memory it was using; the total hard limit across all 8 containers comes to 8.5 GB. But according to our Grafana dashboard, which tracks the memory usage of the EC2 instance, memory usage is continuously over 75%, which should not be the case since we only run the DataHub containers on this instance. The yellow line indicates the memory usage of the DataHub EC2 instance. We still don't have clarity on what is causing the spike; by my estimate memory usage should be around 60%, but it is continuously over 75%. Can anyone tell me what the reason could be? I suspect the containers are leaking memory.
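    To see where the memory is actually going per container, I have been running something like this (a sketch):
    Copy code
    # one-off snapshot of per-container memory usage
    docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"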
    g
    i
    a
    • 4
    • 8
  • s

    salmon-area-51650

    06/25/2023, 10:16 AM
    Hi team 👋, I have an issue with dbt as I cannot see the dbt test executions. Evaluations are always empty. For example, this is the content of run_results.json:
    Copy code
    "unique_id": "test.snowflake_db_transformations.equality_caregiver_check_in_and_check_outs_source_ref_caregiver_check_in_and_check_outs_tests_target____tests.ffe5aef1cc"}, {"status": "success", "timing": [{"name": "compile", "started_at": "2023-06-24T12:41:54.699195Z", "completed_at": "2023-06-24T12:41:54.720918Z"}, {"name": "execute", "started_at": "2023-06-24T12:41:54.725765Z", "completed_at": "2023-06-24T12:41:54.725780Z"}], "thread_id": "Thread-4", "execution_time": 0.04546833038330078, "adapter_response": {}, "message": null, "failures": null, "unique_id":
    Attached the output of the metadata ingestion job.
    output_dbt_ingestion.txt
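    For reference, my recipe looks roughly like this (a sketch; I am assuming test_results_path is what should feed the test executions, and the paths and server are placeholders):
    Copy code
    source:
      type: dbt
      config:
        manifest_path: "/dbt/target/manifest.json"
        catalog_path: "/dbt/target/catalog.json"
        # assumption: run_results.json is supplied here so dbt test runs show up as evaluations
        test_results_path: "/dbt/target/run_results.json"
        target_platform: snowflake
    sink:
      type: datahub-rest
      config:
        server: "http://datahub-gms:8080"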
    g
    e
    • 3
    • 6
  • a

    aloof-energy-17918

    06/26/2023, 2:44 AM
    Hi all, I'm trying to figure out a problem I'm having with GMS. I re-deployed DataHub on K8s, and it currently seems to be stuck on the request below with nothing else happening. However, when I curl the endpoint, the pod does seem to have a connection to Elasticsearch: request [POST http://elasticsearch-master:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true]
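    This is roughly the check I ran from the GMS pod (a sketch):
    Copy code
    # confirm the pod can reach the Elasticsearch service and the cluster is healthy
    curl -s 'http://elasticsearch-master:9200/_cluster/health?pretty'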
    g
    b
    a
    • 4
    • 23
  • b

    blue-rainbow-97669

    06/26/2023, 7:06 AM
    Hi folks, I encountered an issue while executing the following GraphQL query. According to the documentation, if the URN already exists it should be overwritten. However, when I execute the query it returns an error with status code 500 and the message "DataFetchingException". Could you please help me investigate and resolve this issue? GraphQL:
    Copy code
    mutation createGlossaryNode($name: String!, $id: String!, $parentNode: String!) {
      createGlossaryNode(input: { name: $name, id: $id, parentNode: $parentNode })
    }
    {
      "name": "TestingMankamalL1-new3",
      "id": "TestingMankamalL1-new3",
      "parentNode": "urn:li:glossaryNode:TestingMankamalL1-new2"
    }
    Error:
    Copy code
    {
      "errors": [
        {
          "message": "An unknown error occurred.",
          "locations": [
            {
              "line": 34,
              "column": 3
            }
          ],
          "path": [
            "createGlossaryNode"
          ],
          "extensions": {
            "code": 500,
            "type": "SERVER_ERROR",
            "classification": "DataFetchingException"
          }
        }
      ],
      "data": {
        "createGlossaryNode": null
      },
      "extensions": {}
    }
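    For completeness, this is roughly how I am sending it (a sketch; the host and access token are placeholders):
    Copy code
    curl -s -X POST 'http://localhost:9002/api/graphql' \
      -H 'Authorization: Bearer <access-token>' \
      -H 'Content-Type: application/json' \
      -d '{"query": "mutation createGlossaryNode($name: String!, $id: String!, $parentNode: String!) { createGlossaryNode(input: { name: $name, id: $id, parentNode: $parentNode }) }", "variables": {"name": "TestingMankamalL1-new3", "id": "TestingMankamalL1-new3", "parentNode": "urn:li:glossaryNode:TestingMankamalL1-new2"}}'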
    g
    • 2
    • 3
  • o

    orange-gpu-90973

    06/26/2023, 7:55 AM
    Hi, I am facing this issue: [*3270101 a client request body is buffered to a temporary file /tmp/client-body/0000000243, client: ip server : ip/dataset/urnlidataset(urllidataPlatform:...) These are the logs from the ingress. I have deployed DataHub using Helm charts on v0.10.1. I got this while trying to check the top viewed datasets from Analytics, and it also does not show the dataset page but instead redirects somewhere else. Any solution for this?
    ✅ 1
    g
    • 2
    • 1
  • e

    elegant-salesmen-99143

    06/26/2023, 2:00 PM
    Hello, folks! We are trying to implement inlets and outlets parameter injection in our custom Airflow operators according to the following example in the DataHub documentation: https://datahubproject.io/docs/lineage/airflow/#emitting-lineage-via-a-custom-operator-to-the-airflow-plugin . But the inlets and outlets parameters in the "emitting datahub..." logs are not filled in, and neither are the "inputs" and "outputs" values in the DataHub UI. Is the example in the docs working? Or should we check something else; maybe our integration setup is incorrect? We use the recommended integration between Airflow and DataHub via the Airflow plugin, with Airflow 2.1.2 and datahub 10.1.
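    Here is a minimal sketch of what our custom operator and DAG look like, assuming the datahub_provider entities module from the docs example (operator, DAG, and dataset names are made up):
    Copy code
    from datetime import datetime

    from airflow import DAG
    from airflow.models import BaseOperator

    # assumption: this is the entities module used by our airflow/datahub versions
    from datahub_provider.entities import Dataset


    class MyCustomOperator(BaseOperator):
        def execute(self, context):
            # our actual business logic goes here
            pass


    with DAG(
        dag_id="custom_operator_lineage",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,
    ) as dag:
        transform = MyCustomOperator(
            task_id="transform",
            # the DataHub Airflow plugin should pick these up and emit them as lineage
            inlets=[Dataset("postgres", "mydb.schema.source_table")],
            outlets=[Dataset("postgres", "mydb.schema.target_table")],
        )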
    g
    a
    d
    • 4
    • 8
  • w

    wide-florist-83539

    06/26/2023, 10:30 PM
    Hey Everyone, I am following the Airflow Emittance Lineage Backend Demo and I am getting the following error when running Line 38
    Copy code
    ERROR - Error sending metadata to datahub: ('Unable to emit metadata to DataHub GMS: Invalid format for aspect: {inputDatajobs=[], inputDatasets=[urn:li:dataset:(urn:li:dataPlatform:snowflake,DEV.ACCOUNTS,dev)
    Am I not allowed to set the environment per dataset in the emit event? I see that this is still a property in the Dataset entity class. Here is my code, by the way:
    Copy code
    Dataset(
        "snowflake",
        str(snowflake_get_database(env=ENV) + "." + SCHEMA + ".ACCOUNTS").upper(),
        "DEV",
    )
    I assume it's probably easier to just modify the datahub.cluster default value to dev so that all metadata events emitted from Airflow are already labeled dev.
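    By that I mean something like the following in airflow.cfg, based on my reading of the lineage backend docs (a sketch; the other keys are just the documented options):
    Copy code
    [lineage]
    backend = datahub_provider.lineage.datahub.DatahubLineageBackend
    datahub_kwargs = {
        "datahub_conn_id": "datahub_rest_default",
        "cluster": "dev",
        "capture_ownership_info": true,
        "capture_tags_info": true,
        "graceful_exceptions": true }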
    ✅ 1
    g
    • 2
    • 9
  • m

    mysterious-wolf-37802

    06/27/2023, 3:26 AM
    👋 Hello, team! I'm following the quickstart instructions but I ran into an error like this. I have tried several approaches but cannot solve it.
    g
    • 2
    • 10
  • r

    rich-restaurant-61261

    06/27/2023, 4:51 AM
    Hi team, does anyone know how to find my DataHub GMS server? I am trying to connect to DataHub through the DataHub rest emitter; my DataHub is deployed in Kubernetes. Do I need to add any connection password in the extra_headers? I found the following potential DataHub GMS address, http://datahub-datahub-gms:8080, but it doesn't work and throws the following error:
    [Errno 8] nodename nor servname provided, or not known
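    For reference, this is roughly how I am building the emitter (a sketch; the in-cluster service URL is my guess):
    Copy code
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # assumption: the GMS service name only resolves inside the cluster;
    # running this from outside the cluster would explain the "nodename nor servname" error
    emitter = DatahubRestEmitter(
        gms_server="http://datahub-datahub-gms:8080",
        # token="<personal-access-token>",  # only needed if metadata service authentication is enabled
    )
    emitter.test_connection()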
    ✅ 1
    a
    g
    • 3
    • 6
  • d

    delightful-autumn-14108

    06/27/2023, 11:55 AM
    Hey everyone, I am completely new to DataHub. I have just started working on the Airflow and DataHub integration for my project (I followed the steps in the document https://datahubproject.io/docs/lineage/airflow/) and I am stuck at one point. I see DataHub listed as a plugin in Airflow, but my DAG does not show any related log messages that display "Emitting Datahub ...". Can someone please guide me on which steps I might have missed and how to fix this?
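    For context, this is the connection I registered, following my reading of the docs (the host is a placeholder):
    Copy code
    airflow connections add 'datahub_rest_default' \
        --conn-type 'datahub_rest' \
        --conn-host 'http://datahub-gms:8080'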
    ✅ 1
    👀 1
    g
    • 2
    • 6
  • c

    cuddly-butcher-39945

    06/27/2023, 2:44 PM
    Hi team, I am still having some issues with a DH UI modification. Hoping we can take a look at today's #office-hours.
    Environment: EKS on AWS (running local versions of MySQL, ElasticSearch and Kafka) on Fargate. DH version: 10.1, pods deployed via Helm charts (version 2.161).
    GMS error on initial load:
    2023-06-27 141827,482 [Thread-890] WARN notprivacysafe.graphql.GraphQL:594 - Query did not validate : 'query getQuickFilters($input: GetQuickFiltersInput!) { getQuickFilters(input: $input) { quickFilters { field value entity { urn type ... on DataPlatform { ...platformFields __typename } __typename } __typename } __typename } } fragment platformFields on DataPlatform { urn type lastIngested name properties { type displayName datasetNameDelimiter logoUrl __typename } displayName info { type displayName datasetNameDelimiter logoUrl __typename } __typename } '
    More details on the error:
    2023-06-27 141827,485 [Thread-890] ERROR c.datahub.graphql.GraphQLController:101 - Errors while executing graphQL query: "query getQuickFilters($input: GetQuickFiltersInput!) {\n getQuickFilters(input: $input) {\n quickFilters {\n field\n value\n entity {\n urn\n type\n ... on DataPlatform {\n ...platformFields\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n}\n\nfragment platformFields on DataPlatform {\n urn\n type\n lastIngested\n name\n properties {\n type\n displayName\n datasetNameDelimiter\n logoUrl\n __typename\n }\n displayName\n info {\n type\n displayName\n datasetNameDelimiter\n logoUrl\n __typename\n }\n __typename\n}\n", result: {errors=[{message=Validation error (UnknownType) : Unknown type 'GetQuickFiltersInput', locations=[{line=1, column=31}], extensions={classification=ValidationError}}, {message=Validation error (FieldUndefined@[getQuickFilters]) : Field 'getQuickFilters' in type 'Query' is undefined, locations=[{line=2, column=3}], extensions={classification=ValidationError}}], data=null, extensions={tracing={version=1, startTime=2023-06-27T141827.481071Z, endTime=2023-06-27T141827.483811Z, duration=2758348, parsing={startOffset=911000, duration=877708}, validation={startOffset=1429910, duration=489708}, execution={resolvers=[]}}}}, errors: [ValidationError{validationErrorType=UnknownType, queryPath=null, message=Validation error (UnknownType) : Unknown type 'GetQuickFiltersInput', locations=[SourceLocation{line=1, column=31}], description='Validation error (UnknownType) : Unknown type 'GetQuickFiltersInput''}, ValidationError{validationErrorType=FieldUndefined, queryPath=[getQuickFilters], message=Validation error (FieldUndefined@[getQuickFilters]) : Field 'getQuickFilters' in type 'Query' is undefined, locations=[SourceLocation{line=2, column=3}], description='Validation error (FieldUndefined@[getQuickFilters]) : Field 'getQuickFilters' in type 'Query' is undefined'}]
    We have built the following images and pushed them to our private AWS ECR: AWS/ECR/datahub-gms: v0.10.1, AWS/ECR/datahub-frontend: v0.10.1, acryldata/datahub-upgrade:v0.10.1
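    To double-check which image tags actually ended up on the pods, I am running something like this (a sketch; the namespace is a placeholder):
    Copy code
    # list each pod with the image(s) it is running
    kubectl -n <namespace> get pods \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'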
    ✅ 1
    d
    • 2
    • 1
  • f

    fierce-restaurant-41034

    06/27/2023, 4:14 PM
    Hi all, does anyone use the DataHub Actions framework with Slack? https://datahubproject.io/docs/actions I managed to get it working on my local machine, but it seems odd that I need to start it from my terminal and stop it with Ctrl+C. • How can I use it on Kubernetes? • Is there a way to run it as part of the datahub-actions pod? • Is there a way to run it as a background process? Thanks for your help.
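    Locally, the closest I have gotten to a background process is just detaching the CLI, something like this (a sketch; the config file name is mine):
    Copy code
    # run the actions framework detached from the terminal and log its output
    nohup datahub actions -c my_slack_action.yaml > datahub-actions.log 2>&1 &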
    g
    • 2
    • 2
  • d

    dazzling-rainbow-96194

    06/27/2023, 8:12 PM
    Hi all, I am trying to run an ingestion from Snowflake using a role that has access to only one schema, and I am filtering to just 5 tables in the initial ingestion to test the setup. But I see that DataHub is somehow scanning all schemas available in Snowflake, even the ones that are not accessible to the user and role. It does say "Skipping operations for table <table name>, as table schema is not accessible", but since it is scanning everything it times out, because we have a lot of objects in Snowflake. Is there a way to restrict this so that the ingestion stays limited to the schema and tables of interest?
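    For reference, my recipe filters look roughly like this (a sketch; database, schema, and table names are placeholders):
    Copy code
    source:
      type: snowflake
      config:
        account_id: "<account>"
        username: "<user>"
        password: "<password>"
        role: "<restricted_role>"
        warehouse: "<warehouse>"
        # assumption: these allow patterns are what should keep the scan scoped
        database_pattern:
          allow:
            - "^MY_DB$"
        schema_pattern:
          allow:
            - "^MY_SCHEMA$"
        table_pattern:
          allow:
            - "^MY_DB\\.MY_SCHEMA\\.(TABLE_1|TABLE_2|TABLE_3|TABLE_4|TABLE_5)$"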
    ✅ 1
    g
    • 2
    • 2
  • r

    rich-restaurant-61261

    06/27/2023, 9:16 PM
    Still facing the error. I saw that when we deploy DataHub, it sets the GMS service type to LoadBalancer by default; would changing it to NodePort help with the rest emitter connection?
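    Concretely, I am thinking of something like this in values.yaml (a sketch):
    Copy code
    datahub-gms:
      service:
        type: NodePort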
    ✅ 1
    g
    • 2
    • 1
  • r

    rich-restaurant-61261

    06/28/2023, 12:50 AM
    Hi team, for column-level lineage, can we modify it in the UI for data ingested from Trino?
    ✅ 1
    g
    a
    b
    • 4
    • 4