creamy-machine-95935
03/23/2023, 4:46 PM
white-shampoo-69122
03/23/2023, 5:16 PM
fieldPaths as version 2:
fields:
  0:
    description: null
    fieldPath: "[version=2.0].[type=string].a_column_x"
    globalTags: null
    glossaryTerms: null
    isPartOfKey: false
    jsonPath: null
    label: null
    nativeDataType: "string"
    nullable: true
    recursive: false
    type: "STRING"
    __typename: "SchemaField"
While the dbt sibling entity has fieldPaths version 1(?):
siblings:
  isPrimary: false
  siblings:
    0:
      ...
      schemaMetadata:
        fields:
          ...
          60:
            description: "A description for column x"
            fieldPath: "a_column_x"
            globalTags: null
            glossaryTerms: null
            isPartOfKey: false
            jsonPath: null
            label: null
            nativeDataType: "varchar"
            nullable: false
            recursive: false
            type: "STRING"
            __typename: "SchemaField"
Not really sure how it was before, but I found that interesting.
Also might be worth mentioning that we have enabled stateful ingestion for both dbt and Glue, and that it worked well before.
Any ideas what might be going wrong?
flaky-portugal-377
03/23/2023, 6:32 PM
acceptable-football-40437
03/23/2023, 6:39 PM
urn:li:container:<alphanumeric-string>
, but from the docs it seems more human-readable (and programmatically guessable) versions of URNs exist. How does one put together the latter type of URN for, say, a BigQuery dataset?
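A minimal sketch of building such a human-readable dataset URN with the DataHub Python SDK's make_dataset_urn helper; the project, dataset, and table names below are hypothetical placeholders:
# Minimal sketch, assuming the acryl-datahub package is installed.
from datahub.emitter.mce_builder import make_dataset_urn

# For BigQuery the dataset name is typically "project.dataset.table" (assumption).
urn = make_dataset_urn(platform="bigquery", name="my-project.my_dataset.my_table", env="PROD")
print(urn)
# urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.my_dataset.my_table,PROD)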
If this belongs in a different channel, I'll happily cross-post!
strong-hospital-52301
03/23/2023, 6:45 PM
bumpy-activity-74405
03/24/2023, 7:01 AM
HTTP header value exceeds the configured limit of 8192 characters
in the frontend. I was able to work around it with the env variables introduced in this PR, and it worked on version v0.8.44.
After upgrading to v0.9.6.1, the issue is back. I suspect it has to do with renaming the configuration option in this commit. Not sure why it was done, since the Akka documentation states it should be max-header-value-length.
swift-dream-78272
03/24/2023, 1:53 PM
from datahub.ingestion.run.pipeline import Pipeline
# The pipeline configuration is similar to the recipe YAML files provided to the CLI tool.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "mysql",
            "config": {
                "username": "user",
                "password": "pass",
                "database": "db_name",
                "host_port": "localhost:3306",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "${DATAHUB_GMS_URL}"},
        },
    }
)
# Run the pipeline and report the results.
pipeline.run()
pipeline.pretty_print_summary()
brief-bear-90340
03/24/2023, 4:07 PM
[12:05 PM] DefaultCredentialsError: ('Failed to load service account credentials from /tmp/tmp53gmekv8', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>]))
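A quick local sanity check, as a sketch under assumptions: it uses the google-auth library rather than anything DataHub-specific, and "service_account.json" is a hypothetical path to the same key file fed to the ingestion recipe:
# Verify that the service-account key file itself can be parsed by google-auth.
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file("service_account.json")
print(creds.service_account_email)  # if this parses, the key format itself is fine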
Any help with this would be appreciated.
rapid-zoo-88437
03/25/2023, 3:28 AM
import org.apache.spark.sql.SaveMode

val ds1 = spark.read
.format("jdbc")
.option("driver","com.mysql.cj.jdbc.Driver")
.option("url", "jdbc:mysql://{myhost}:3306/xxx")
.option("dbtable", "Persons")
.option("user", "xxx")
.option("password", "xxx")
.load()
ds1.write.mode(SaveMode.Append)
.format("jdbc")
.option("driver","com.mysql.cj.jdbc.Driver")
.option("url", "jdbc:mysql://{myhost}:3306/xxx")
.option("dbtable", "Persons1")
.option("user", "xxx")
.option("password", "xxx")
.save()
ds1.write.mode(SaveMode.Append)
.format("jdbc")
.option("driver","com.mysql.cj.jdbc.Driver")
.option("url", "jdbc:mysql://{myhost}:3306/xxx")
.option("dbtable", "Persons2")
.option("user", "xxx")
.option("password", "xxx")
.save()
fresh-cricket-75926
03/27/2023, 11:35 AM
wonderful-quill-11255
03/27/2023, 12:57 PM
vX.Y.Z
that the regular datahub code does.
Up until now that hasn't mattered a lot since that value was mainly used in the UI to show the version running.
But it seems that recently this value has become more important, controlling a step in the bootstrap process.
I'm wondering if anyone else has encountered this and how you chose to deal with it.
Best regards
limited-refrigerator-50812
03/27/2023, 3:59 PM
./gradlew quickstartDebug --stacktrace -x yarnTest -x yarnLint
I get an error that I don't know how to deal with. Including the error message(s) below. Any idea how I can find out what I did wrong?
> Task :datahub-web-react:yarnGenerate
yarn run v1.22.0
$ graphql-codegen --config codegen.yml
(node:4132) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
[15:13:03] Parse configuration [started]
[15:13:03] Parse configuration [completed]
[15:13:03] Generate outputs [started]
[15:13:03] Generate src/types.generated.ts [started]
[15:13:03] Generate to src/ (using EXPERIMENTAL preset "near-operation-file") [started]
[15:13:03] Load GraphQL schemas [started]
[15:13:03] Load GraphQL schemas [started]
[15:13:03] Load GraphQL schemas [failed]
[15:13:03] → Failed to load schema
[15:13:03] Generate to src/ (using EXPERIMENTAL preset "near-operation-file") [failed]
[15:13:03] → Failed to load schema
[15:13:03] Load GraphQL schemas [failed]
[15:13:03] → Failed to load schema
[15:13:03] Generate src/types.generated.ts [failed]
[15:13:03] → Failed to load schema
[15:13:03] Generate outputs [failed]
Something went wrong
error Command failed with exit code 1.
info Visit <https://yarnpkg.com/en/docs/cli/run> for documentation about this command.
> :datahub-web-react:yarnGenerate
> Task :datahub-web-react:yarnGenerate FAILED
> :datahub-web-react:yarnGenerate
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':datahub-web-react:yarnGenerate'.
> Process 'command '/mnt/c/Users/dries528/Documents/Code/datahub_fresh/datahub/datahub-web-react/.gradle/yarn/yarn-v1.22.0/bin/yarn'' finished with non-zero exit value 1
* Try:
Run with --info or --debug option to get more log output. Run with --scan to get full insights.
bulky-grass-52762
03/27/2023, 6:59 PM
0.9.3
to 0.10.1
, we discovered that certain nodes in the lineage UI have disappeared. These nodes were not entities themselves, but rather were connected to other entities as upstream/downstream dependencies.
For example, in our use case as attached in the screenshot, we used the s3 lineage aspect to complete the flow of hive -> s3 -> redshift, but that flow seems to be broken because in 0.10.1
the lineage aspects seem to be missing in the lineage UI. I believe this is because of the implementation of showing an error message if the entity is not found. IMHO, this shouldn’t have impacted the nodes in the lineage UI, since the original redshift ingestion is still offloading the related s3 upstream lineage aspect without the entity itself.
TIA for your future efforts looking at this, thank you!
cuddly-butcher-39945
03/27/2023, 10:26 PM
numerous-account-62719
03/28/2023, 6:28 AM
microscopic-room-90690
03/28/2023, 8:01 AM
bumpy-activity-74405
03/28/2023, 10:04 AM
v0.9.6.1
. Having issues with the download csv feature when trying to download ~5k datasets. It's my understanding that it tries to batch these queries in chunks of 1000, but each chunk takes longer and longer until I get a timeout error in gms:
09:50:18.820 [qtp71399214-1199] WARN o.s.w.s.m.s.DefaultHandlerExceptionResolver:208 - Resolved [org.springframework.web.context.request.async.AsyncRequestTimeoutException]
I am getting similar results when trying to run a GraphQL query - if I set the count
to 10000 (everything in one chunk) it times out. If I try to batch my queries using offsets (start/count)
I can observe that with increasing offsets I also get increasing query run times, which eventually time out when reaching 30s. Is there something that I could do about this - increase the timeout somehow, or should I somehow scale Elasticsearch?
bright-morning-76046
03/28/2023, 11:47 AM
Unable to run quickstart - the following issues were detected:
- datahub-gms is running but not yet healthy
- datahub-upgrade is still running
If you think something went wrong, please file an issue at <https://github.com/datahub-project/datahub/issues>
or send a message in our Slack <https://slack.datahubproject.io/>
Be sure to attach the logs from /var/folders/c2/3gbwy5wj5dbfvgjzz3kctd000000gp/T/tmpsdxu45ta.log
My version is DataHub CLI version: 0.10.1
Thank you so much!
mysterious-advantage-78411
03/28/2023, 12:36 PM
best-wire-59738
03/29/2023, 1:50 AM
fierce-monkey-46092
03/29/2023, 6:40 AM
busy-mechanic-8014
03/29/2023, 9:00 AM
metadata_service_authentication:
  enabled: true
  systemClientId: "__datahub_system"
  systemClientSecret:
    secretRef: "datahub-auth-secrets"
    secretKey: "token_service_signing_key"
  tokenService:
    signingKey:
      secretRef: "datahub-auth-secrets"
      secretKey: "token_service_signing_key"
    salt:
      secretRef: "datahub-auth-secrets"
      secretKey: "token_service_salt"
  # Set to false if you'd like to provide your own auth secrets
  provisionSecrets:
    enabled: true
    autoGenerate: true
  # Only specify if autoGenerate set to false
  # secretValues:
  #   secret: <secret value>
  #   signingKey: <signing key value>
  #   salt: <salt value>
=> I now have a secret with token_service_signing_key: f2E0BZoNKlr7CEu71kjZjAduRNCsePKS
Create the access token programmatically
• Decode an access token created on the UI and get the payload
{
"actorType": "USER",
"actorId": "datahub",
"type": "PERSONAL",
"version": "2",
"jti": "6ec82917-d39a-4c52-9a5e-5d4caacf6b7d",
"sub": "datahub",
"exp": 1680015431,
"iss": "datahub-metadata-service"
}
• I validated the service key by recreating the token by my own means (just used https://jwt.io/ with payload, header and token signing key)
• Create a new token in Python
import jwt

# I noticed that you have to encode the service key in ASCII to get the same verified signature
# as the token created on the UI (anyway I tested with or without for the same result)
secret_signing_key = "f2E0BZoNKlr7CEu71kjZjAduRNCsePKS".encode('ascii')
payload = {
    "actorType": "USER",
    "actorId": "datahub",
    "type": "PERSONAL",
    "version": "2",
    "jti": "6ec82917-d39a-4c52-9a5e-5d4caacf6b7d",
    "sub": "datahub",
    "exp": 1680015431,
    "iss": "datahub-metadata-service"
}
header = { "alg": "HS256" }
token = jwt.encode(payload, secret_signing_key, headers=header)
print(token)
eyJhbGciOiJIUzI1NiJ9…
• Decode my new access token to check if it is well built => all looks good
cURL (curl proposed when creating a token on the UI)
curl -X POST "<http://datahub-front-url/api/graphql>" --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9… ' --header 'Content-Type: application/json' --data-raw '{"query": "{\n me {\n corpUser {\n username\n }\n }\n}","variables":{}}'
=> HTTP ERROR 401 Unauthorized to perform this action
Datahub API
datahub ingest -c /tmp/ch_recipe.yml
ch_recipe.yml:
source:
  type: clickhouse
  config:
    host_port: "clickhouse-install.clickhouse.svc.cluster.local:8123"
    username: ****
    password: ****
    platform_instance: DatabaseNameToBeIngested
    include_views: true
    include_tables: true
sink:
  type: "datahub-rest"
  config:
    server: "<http://datahub-gms.datahub.svc.cluster.local:8080>"
    token: "eyJhbGciOiJIUzI1NiJ9…."
=> 401 Client Error: Unauthorized for url
All works fine if I put a token created on the UI.
Questions
Has anyone managed to create a token programmatically and used it for queries? Is it really possible to do that now?
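For comparison, a hedged sketch of requesting a token from the metadata service itself via the createAccessToken GraphQL mutation, rather than signing one locally; the GMS URL, the existing token, and the token name are placeholders, and the mutation fields follow the DataHub token-management docs as I understand them:
import requests

# Hypothetical GMS endpoint and an existing valid token (e.g. one created in the UI).
GMS_GRAPHQL = "http://datahub-gms.datahub.svc.cluster.local:8080/api/graphql"
EXISTING_TOKEN = "eyJhbGciOiJIUzI1NiJ9..."

mutation = """
mutation {
  createAccessToken(input: {
    type: PERSONAL,
    actorUrn: "urn:li:corpuser:datahub",
    duration: ONE_MONTH,
    name: "programmatic-token"
  }) {
    accessToken
  }
}
"""

resp = requests.post(
    GMS_GRAPHQL,
    json={"query": mutation},
    headers={"Authorization": f"Bearer {EXISTING_TOKEN}"},
)
print(resp.json())  # the new token is under data.createAccessToken.accessToken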
I also noticed (if I understood correctly) that if I create a token via the UI, retrieve it, but delete it immediately afterwards, that effectively simulates creating the token programmatically, and I get this same result. If we can really create our own token with the token signing key, we should be able to use this token (present or not in the UI) to query DataHub. On my side it doesn't work.
I remain available if you need more information! 🙂
Thanks for your time and I hope someone can help me out!
astonishing-dusk-99990
03/29/2023, 9:21 AM
datahub-frontend:
  enabled: true
  image:
    repository: linkedin/datahub-frontend-react
    tag: "v0.10.0" # defaults to .global.datahub.version
  resources:
    limits:
      memory: 1400Mi
    requests:
      cpu: 100m
      memory: 512Mi
  # Set up ingress to expose react front-end
  ingress:
    enabled: false
  oidcAuthentication: # OIDC auth based on <https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react>
    enabled: false
  extraEnvs:
    - name: AUTH_JAAS_ENABLED
      value: "true"
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_ID
      value: "your_oidc_client_id"
    - name: AUTH_OIDC_CLIENT_SECRET
      value: your_client_secret
    - name: AUTH_OIDC_DISCOVERY_URI
      value: "<https://accounts.google.com/.well-known/openid-configuration>"
    - name: AUTH_OIDC_BASE_URL
      value: "<http://localhost:9002>"
    - name: AUTH_OIDC_USER_NAME_CLAIM
      value: "email"
    - name: AUTH_OIDC_USER_NAME_CLAIM_REGEX
      value: "([^@]+)"
  extraVolumes:
    - name: datahub-users
      secret:
        defaultMode: 0444
        secretName: datahub-users-secret
  extraVolumeMounts:
    - name: datahub-users
      mountPath: /datahub-frontend/conf/user.props
      #mountPath: /etc/datahub/plugins/frontend/auth/user.props
      subPath: user.props
And then I followed this article to set up Google, and I already set up my Authorized JavaScript Origins and Authorized Redirect URLs as in the attachment below.
However, when I tested it, Google sign-in showed both my personal Gmail and my work Gmail. First I tried with my personal Gmail and the result was as expected (access blocked), but when I use my work Gmail the connection is always refused, as in the attachment below.
My question: what's the problem here? Can anyone help me?
Notes:
• I already allow port 9002 in the firewall rule
• My image version is 0.10.0
• Deployed using the Helm chart on a Kubernetes cluster
powerful-cat-68806
03/29/2023, 10:26 AM
helm upgrade
from my local, but I’m not seeing the latest updates from the announcement
This is my chart
apiVersion: v2
name: jfrog-datahub
description: A Helm chart for Acryl DataHub
type: application
version: 0.0.1
appVersion: latest #0.3.1
dependencies:
  - name: datahub
    version: 0.2.148
    repository: <https://helm.datahubproject.io>
icy-flag-80360
03/29/2023, 12:10 PM
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://elasticsearch-master:9200>], URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request] {"error":{"root_cause":[{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"GZJPC-CBTtekUqWRmtZGfA","index":"datahubpolicyindex_v2"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahubpolicyindex_v2","node":"MhPwkRJ4T8WYHY0QwONrOg","reason":{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"GZJPC-CBTtekUqWRmtZGfA","index":"datahubpolicyindex_v2"}}]},"status":400}
But if I check with curl from the GMS pod, all is OK - Elasticsearch returns data with the existing policies, but without any index_uuid. Example:
curl -XGET 'http://elasticsearch-master:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true'
Is there any way to repair it? I've tried many ways to recover, including fully erasing the Elasticsearch data.
adventurous-waiter-4058
03/29/2023, 12:16 PM
microscopic-leather-94537
03/29/2023, 12:17 PM
fast-midnight-10167
03/29/2023, 1:31 PM
MetadataChangeProposalWrapper, emit changes to add new custom properties. But the problem is, when I give the entityUrn the name (via make_dataset_urn), it treats the filepath both as the filepath and the name of the object. So instead of ending up with <env>/<folderpath>/<obj_name> in DataHub, I end up with that path, but the object name itself includes the folder path as if it were part of the object name.
wide-optician-47025
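A minimal sketch of the pattern fast-midnight-10167 describes, assuming the DataHub Python SDK; the platform, path, and GMS URL are hypothetical placeholders. Note that whatever string is passed as name (here the full folder path) becomes the dataset name:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# The whole path is used as the dataset name, which is why the UI shows the
# folder path as part of the object name as well.
urn = make_dataset_urn(platform="s3", name="folderpath/obj_name", env="PROD")

mcp = MetadataChangeProposalWrapper(
    entityUrn=urn,
    aspect=DatasetPropertiesClass(customProperties={"my_property": "my_value"}),
)
DatahubRestEmitter(gms_server="http://localhost:8080").emit(mcp)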
03/29/2023, 5:29 PM
glamorous-microphone-33484
03/30/2023, 12:51 AM