# troubleshoot
  • breezy-shoe-41523

    10/07/2022, 10:08 AM
    Hello team, I have a question about GraphQL response times. I found that GraphQL responses are too slow; my dataset size is attached below. I have a resource limit in my company's cluster, so my cluster settings for GMS are:
    datahub-gms:
      enabled: true
      replicaCount: 3
      resources:
       limits:
         cpu: 4
         memory: 8Gi
    I found that GMS gets faster when I increase the limit, but it never reaches that limit (it only uses about ~400m). Do you know why GMS doesn't use the full resource limit, and why it gets faster as the limit grows even though it doesn't use all of it? Any guidance would help. Thanks.
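    For reference, a minimal timing sketch for comparing GraphQL response times across different resource settings; the GMS address, the /api/graphql endpoint, and the search query shape are assumptions taken from the other examples in this thread:
    import time
    import requests

    GMS = "http://localhost:8080"  # assumption: GMS address
    QUERY = """
    query timing($input: SearchInput!) {
      search(input: $input) { total }
    }
    """
    variables = {"input": {"type": "DATASET", "query": "*", "start": 0, "count": 10}}

    for attempt in range(5):
        started = time.perf_counter()
        resp = requests.post(f"{GMS}/api/graphql", json={"query": QUERY, "variables": variables})
        elapsed = time.perf_counter() - started
        print(f"attempt {attempt}: HTTP {resp.status_code} in {elapsed:.2f}s")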
  • fresh-cricket-75926

    10/07/2022, 12:43 PM
    Hi, we are trying to load a root certificate for LDAP in datahub-frontend, but it didn't work. Is there any way we can create a truststore from the LDAP certificates and configure datahub-frontend to use that truststore?
  • wonderful-author-3020

    10/07/2022, 4:19 PM
    Hello, I'm trying to create an access token for the user I've created, but I encountered some problems. I'm following https://datahubproject.io/docs/api/graphql/token-management/ which says that I should be able to create tokens for others, but every token I create ends up in my "Manage Access Token" panel. I've tried listing all the tokens - the
    actorUrn
    property is some other account, but the
    ownerUrn
    is always me.
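    A hedged sketch of creating a token on behalf of another user through the GraphQL API, following the token-management docs linked above; it assumes the caller holds the "Manage All Access Tokens" privilege, and the GMS address, target user URN, and token name are placeholders:
    import requests

    GMS = "http://localhost:8080"         # assumed GMS address
    ADMIN_TOKEN = "<admin-access-token>"  # placeholder: token of the privileged caller

    mutation = """
    mutation createToken($input: CreateAccessTokenInput!) {
      createAccessToken(input: $input) {
        accessToken
        metadata { id name }
      }
    }
    """
    variables = {
        "input": {
            "type": "PERSONAL",
            "actorUrn": "urn:li:corpuser:some-other-user",  # placeholder target user
            "duration": "ONE_MONTH",
            "name": "token-for-some-other-user",
        }
    }
    resp = requests.post(
        f"{GMS}/api/graphql",
        json={"query": mutation, "variables": variables},
        headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
    )
    print(resp.json())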
  • alert-traffic-45034

    10/07/2022, 4:51 PM
    Hi everyone, has anyone run into this while using the Athena source before?
    ImportError: cannot import name 'AthenaTableMetadata' from 'pyathena.model'
  • witty-wall-84488

    10/07/2022, 6:21 PM
    Hi everyone! Which GraphQL method should I use to search for all entities under a specific path? These entities can be datasets, folders, and others. For example, I'd like to list all entities from the Datasets folder located at Datasets/dev/tableau/some_project_name. The method below seems to work only with a limited set of object types from EntityType and doesn't include folders:
    query search_across_entities($input: SearchInput!) {
      search(input: $input) {
        count
        total
        searchResults {
          entity {
            urn
            type
            ... on Dataset {
              name
            }
          }
        }
      }
    }
    variables =
    {
      "input": {
        "type": "DATASET",
        "query": "",
        "start": 0,
        "count": 1000,
        "filters": [{"field": "browsePaths", "value": "dev/tableau/some_project_name"}]
      }
    }
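    A hedged variant of the same request using the searchAcrossEntities query, which takes a list of entity types instead of a single one; the exact field names, the CONTAINER entity type for folder-like entities, and the availability of DataHubGraph.execute_graphql in your acryl-datahub version are assumptions:
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # assumed GMS address

    query = """
    query searchAcrossEntities($input: SearchAcrossEntitiesInput!) {
      searchAcrossEntities(input: $input) {
        count
        total
        searchResults { entity { urn type } }
      }
    }
    """
    variables = {
        "input": {
            # assumption: CONTAINER picks up folder-like entities alongside datasets
            "types": ["DATASET", "CONTAINER"],
            "query": "*",
            "start": 0,
            "count": 1000,
            "filters": [{"field": "browsePaths", "value": "dev/tableau/some_project_name"}],
        }
    }
    print(graph.execute_graphql(query, variables=variables))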
  • microscopic-room-90690

    10/08/2022, 8:06 AM
    Hi everyone, I followed this link https://datahubproject.io/docs/quickstart/ and ran the command datahub docker quickstart, and got this error on my M1 Pro Mac: "no matching manifest for linux/arm64/v8 in the manifest list entries ............ Unable to run quickstart - the following issues were detected: - quickstart.sh or dev.sh is not running". Some information that might be useful: datahub --version: acryl-datahub, version 0.8.45.2; Darwin HW0015358 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000 x86_64. I tried to solve this problem by referring to earlier threads, but it didn't seem to work. Is there any solution?
  • future-hair-23690

    10/10/2022, 7:12 AM
    Hi guys, I'm experiencing an issue where profiling does not start. Does anybody have an idea what might be wrong? There is no error or debug message; nothing related to profiling happens at all. I am using MSSQL (pyodbc) on CLI version 0.8.45.2. My config:
    source:
      type: mssql
      config:
        password: ---------
        database: sandbox_validation
        host_port: 'az-uk-mssql-accept-01.logex.cloud:1433'
        username: ------
        use_odbc: 'true'
        uri_args:
            driver: 'ODBC Driver 17 for SQL Server'
            Encrypt: 'Yes'
            TrustServerCertificate: 'Yes'
            ssl: 'True'
        env: STG
        profiling:
          enabled: true
          limit: 10000
          report_dropped_profiles: false
          profile_table_level_only: false
    
          include_field_null_count: true
          include_field_min_value: true
          include_field_max_value: true
          include_field_mean_value: true
          include_field_median_value: true
          include_field_stddev_value: true
          include_field_quantiles: true
          include_field_distinct_value_frequencies: true
          include_field_sample_values: true
          turn_off_expensive_profiling_metrics: false
          include_field_histogram: true
          catch_exceptions: false
          max_workers: 4
          query_combiner_enabled: true
          max_number_of_fields_to_profile: 100
          profile_if_updated_since_days: null
          partition_profiling_enabled: false
        schema_pattern:
          deny:
            - DS\\oleksii
            - ds*
            - Logex*
          allow:
            - dbo.*
            - dbo
    cheers!
  • microscopic-mechanic-13766

    10/10/2022, 8:41 AM
    Good morning everyone, I am trying to update the DataHub version to
    linkedin/datahub-frontend-react:v0.8.45
    , but I keep getting the error shown here. Note that the previous deployment (which was on version
    0.8.44
    ) worked perfectly, so it is not that the certificate is in a bad format. Is this a known error? Note: the certificate that is failing is the one needed for authentication via OIDC (which in my case is Keycloak).
    Front_container_error
  • gray-telephone-67568

    10/10/2022, 12:29 PM
    Hi, I would like some help. I added disable_ssl_verification: true in the config section of the recipe for ingesting metadata, since GMS is on HTTPS right now, but it still could not bypass SSL verification and I got this error [caused by SSLError(SSLCertVerificationError]. Any help would be greatly appreciated. Thank you.
  • red-analyst-79902

    10/10/2022, 2:11 PM
    Hello everyone! I am trying to ingest metadata from our Tableau Server, which requires trusted CA certificates to be deployed. I did deploy them on the Linux machine, but it may also require having them in the keystore of the running DataHub containers, and I am not sure how to do that.
    'failures': {'tableau-login': ["Unable to LoginReason: HTTPSConnectionPool(host='172.22.5.19', port=443): Max retries exceeded with url: /api/2.4/serverInfo (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)')))"]},
    Any experience with this?
  • thankful-morning-85093

    10/10/2022, 10:11 PM
    Hi Team, I'm getting "An unknown error occurred. (code 500)" when I log on to DataHub, and the entire Hive platform is giving the error below. This happened after I tried ingesting another data source, which might have failed. I upgraded DataHub to the latest version to try to fix things. I am also running the re-indexing job to check whether the indexes were corrupted. Any pointers on what might be wrong?
  • clever-garden-23538

    10/10/2022, 10:16 PM
    I'm getting the following error log in GMS when I access the "Analytics" page in the UI. I had just recreated my DataHub instance and infra (ES and DB). This has to do with someone interacting with GMS before the ES instances have been set up, right?
    22:12:21.273 [Thread-1167] ERROR c.l.d.g.a.service.AnalyticsService:264 - Search query failed: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
    22:12:21.273 [Thread-1167] ERROR c.l.d.g.a.r.GetHighlightsResolver:35 - Failed to retrieve analytics highlights!
    java.lang.RuntimeException: Search query failed:
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:236)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.getHighlights(GetHighlightsResolver.java:58)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:33)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:24)
        at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
        at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
        at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
        at graphql.execution.Execution.executeOperation(Execution.java:165)
        at graphql.execution.Execution.execute(Execution.java:104)
        at graphql.GraphQL.execute(GraphQL.java:557)
        at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
        at graphql.GraphQL.executeAsync(GraphQL.java:446)
        at graphql.GraphQL.execute(GraphQL.java:377)
        at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)
        at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
        ... 17 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://compass-elasticsearch.us-west-2.prd.fa.tesla.services:80>], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
    {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"QkcIA9AKTCGOho3ag0da_Q","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
            at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
            at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
            ... 21 common frames omitted
    Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
        ... 24 common frames omitted
    Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        ... 28 common frames omitted
  • kind-scientist-44426

    10/11/2022, 5:50 AM
    Hi everyone, I am trying to configure lineage via Airflow in DataHub. After following all the steps provided in https://datahubproject.io/docs/lineage/airflow, I'm getting this error in our Airflow:
    Broken DAG: [/app/airflow/airflow/dags/mis/dags/dag_generator/datahub_sample_lineage.py] Traceback (most recent call last):
      File "pydantic/__init__.py", line 2, in init pydantic.__init__
      File "pydantic/dataclasses.py", line 52, in init pydantic.dataclasses
    ImportError: cannot import name dataclass_transform
    Can someone help me with this error?
  • witty-rain-85574

    10/11/2022, 9:16 AM
    Hi everyone, I would like to delete a time series aspect for all dataset entities in a platform, and I used this command to do so from the docs:
    datahub delete -p "snowflake" --entity_type dataset -a "datasetProfile"
    . However, this ended up soft deleting all the entities themselves instead of just the aspect. Can someone please help explain why this behaviour was observed, and how I can go about deleting just the aspect values? Thanks! 🙂
  • bumpy-pharmacist-66525

    10/11/2022, 12:05 PM
    Hi everyone! I am trying to figure out which policy/permission I need to give a user in order for them to have access to the Swagger (OpenAPI UI) page, but I can't seem to find any particular policy which does this. Even after searching through the documentation, I can't find anything on it. My best guess is that the permission to access this page is covered by another permission somewhere, but again, I can't seem to find which one it is under. Would you be able to help me figure out which policy/permission access to the Swagger page is under? Thanks! 🙂
  • white-hydrogen-24531

    10/11/2022, 2:23 PM
    Has anyone used the Python SDK to add a new domain or attach a domain to a dataset? I can't seem to get it working with the code below:
    from datahub.metadata.schema_classes import (
      DomainsClass,
      ChangeTypeClass
    )
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.ingestion import graph
    graph = DataHubGraph(DatahubClientConfig(server = "<http://datahub-gms>"))
    
    dataset_urn= builder.make_dataset_urn(platform="hive", name="test.test", env="PROD")
    
    #new_domain = DomainsClass(domains=["TEST_123"])
    new_domain = DomainsClass(["TEST_123"])
    
    current_domain = graph.get_domain(entity_urn=dataset_urn)
    print(current_domain)
    
    event = MetadataChangeProposalWrapper(
      entityType="dataset",
      changeType = ChangeTypeClass.UPSERT,
      entityUrn = dataset_urn,
      aspectName="domains",
      aspect=new_domain
    )
    graph.emit(event)
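    For comparison, a hedged sketch of the same flow that passes a full domain URN (urn:li:domain:...) in the domains aspect rather than a bare name; the GMS address and the domain id TEST_123 are placeholders, and it assumes the domain entity already exists:
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import ChangeTypeClass, DomainsClass

    graph = DataHubGraph(DatahubClientConfig(server="http://datahub-gms:8080"))  # assumed GMS address

    dataset_urn = builder.make_dataset_urn(platform="hive", name="test.test", env="PROD")
    domain_urn = builder.make_domain_urn("TEST_123")  # -> urn:li:domain:TEST_123

    event = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="domains",
        aspect=DomainsClass(domains=[domain_urn]),  # full URNs, not display names
    )
    graph.emit(event)
    print(graph.get_domain(entity_urn=dataset_urn))  # verify the aspect was written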
  • ripe-apple-36185

    10/11/2022, 4:21 PM
    Hi Team, I am trying to add Great Expectations assertions to a Snowflake dataset. The Snowflake dataset has its URN in upper case, since that is how it is defined in Snowflake (I am using
    convert_urns_to_lowercase: false
    in the recipe). Great Expectations is converting the URN components to lower case. Is there a way to have DataHubValidationAction set the URNs to uppercase?
  • ripe-tailor-61058

    10/11/2022, 7:26 PM
    Is there a way to delete metadata from DataHub with datahub delete when access tokens are enabled?
  • ripe-tailor-61058

    10/11/2022, 7:27 PM
    I was able to delete via curl -X POST 'http://localhost:8080/entities?action=delete', passing the token in the header, but I only know how to delete single datasets that way, and I'm looking for how to delete everything for an env or platform.
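    A hedged sketch of that same single-URN delete call from Python, with the token sent as a bearer header; the payload shape and the example URN are assumptions, and deleting everything for a platform would mean looping this over the full list of dataset URNs:
    import requests

    GMS = "http://localhost:8080"
    TOKEN = "<access-token>"  # placeholder

    dataset_urns = [
        # placeholder list; fill with the URNs to delete
        "urn:li:dataset:(urn:li:dataPlatform:hive,test.test,PROD)",
    ]

    for urn in dataset_urns:
        resp = requests.post(
            f"{GMS}/entities?action=delete",
            json={"urn": urn},  # assumed payload shape for the delete action
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
        print(urn, resp.status_code)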
  • limited-forest-73733

    10/12/2022, 6:02 AM
    Hey team! Airflow 2.3.1 is not compatible with the SQLAlchemy version that we are getting from the DataHub plugin. Any update on this issue?
  • glamorous-wire-83850

    10/12/2022, 8:09 AM
    Hello team, I am trying to add LDAP auth to the Helm DataHub deployment but am stuck. I made the changes below but it doesn't work. Am I missing something? Thanks. 1. Add a mount path for the new jaas.conf and the related config in the frontend's values.yaml:
    extraEnvs:
      - name: AUTH_JAAS_ENABLED
        value: "true"
      - name: JAVA_OPTS
        value: |-
          -Djava.security.auth.login.config=/datahub-frontend/conf/custom/jaas.conf
    
    
    extraVolumes:
      - name: jaas-conf-volume
        configMap:
          name: jaas-conf
    
    extraVolumeMounts:
      - name: jaas-conf-volume
        mountPath: datahub-frontend/conf/custom/jaas.conf
        subPath: jaas.conf
        readOnly: true
    2. The jaas.conf file:
    WHZ-Authentication {
      com.sun.security.auth.module.LdapLoginModule sufficient
      userProvider="<ldap://server.com.tr:389/CN=test,OU=test2,OU=SERVICE> USERS,DC=infoshop,DC=com,DC=tr"
      authIdentity="{USERNAME}"
      java.naming.security.authentication="simple"
      debug="true"
      useSSL="true";
    };
  • shy-parrot-64120

    10/12/2022, 2:19 PM
    Hi all, we've encountered a neo4j failure due to disk corruption and therefore recreated the DB. Is it possible to restore the data (like restoring the Elasticsearch indexes via a job), or do we need to reingest everything?
  • fast-ice-59096

    10/12/2022, 3:05 PM
    Hi everyone, I am trying to use DataHub on an Azure VM. When I try to launch it, the following error appears:
  • fast-ice-59096

    10/12/2022, 3:05 PM
    [2022-10-12 15:01:41,124] ERROR {datahub.entrypoints:189} - Command failed with Unknown color 'bright_red'. Run with --debug to get full trace
    [2022-10-12 15:01:41,124] INFO {datahub.entrypoints:192} - DataHub CLI version: 0.8.43 at /home/azureuser/.local/lib/python3.6/site-packages/datahub/__init__.py
  • fast-ice-59096

    10/12/2022, 3:05 PM
    Can anyone help?
  • bland-orange-13353

    10/12/2022, 3:12 PM
    This message was deleted.
  • ancient-library-85500

    10/12/2022, 8:23 PM
    Hi! We are testing a custom entity we have created by ingesting some data in the form of JSONs. Our setup is through Docker, so we run these commands to put and get, respectively.
    datahub put --urn "urn:li:process:(PRC-1,Test_Process_1_Description)" --aspect testProcessProperties --aspect-data prc1.json
    datahub get --urn "urn:li:process:(PRC-1,Test_Process_1_Description)"
    The put command completes without any errors, but running the get command produces the following error:
    19:23:22.102 [qtp522764626-22] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2/urn%3Ali%3Aprocess%3A%28PRC-1%2CTest_Process_1_Description%29 - get - 500 - 1ms
    19:23:22.105 [qtp522764626-22] ERROR c.l.m.filter.RestliLoggingFilter:38 - <http://Rest.li|Rest.li> error: 
    com.linkedin.restli.server.RestLiServiceException: java.lang.RuntimeException: Failed to get entity with urn: urn:li:process:(PRC-1,Test_Process_1_Description), aspects: null
    
    Caused by: java.lang.RuntimeException: Failed to get entity with urn: urn:li:process:(PRC-1,Test_Process_1_Description), aspects: null
    	... 88 common frames omitted
    Caused by: java.lang.NullPointerException: null
    	... 89 common frames omitted
    Any help or insight would be greatly appreciated! @kind-dawn-17532 @bland-balloon-48379 @nice-oil-28310
  • clever-garden-23538

    10/12/2022, 9:52 PM
    It seems like the Elasticsearch setup is split between the setup job and the GMS startup sequence (let me know if I'm mistaken). Is there a reason why all ES index creation isn't done in the elasticsearch-setup job?
  • clever-garden-23538

    10/13/2022, 12:44 AM
    Along the same lines, wouldn't it be simpler to run the setup jobs as init containers of the GMS deployment?
  • brave-secretary-27487

    10/13/2022, 7:53 AM
    Hey all, I'm trying to get lineage between views with the new
    bigquery-beta
    plugin, but I get an error that the config options
    lineage_parse_view_ddl
    and
    lineage_use_sql_parser
    don't exist. Are there any other options to visualize lineage between views in BigQuery?