# troubleshoot
  • creamy-machine-95935

    03/23/2023, 4:46 PM
    Hi! We are using SSO with Google to authenticate users. How can we assign the Reader role to all of these authenticated users? Thanks!
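    One way to do this in bulk is to assign the built-in Reader role through the GraphQL API. The sketch below is a minimal example, assuming the batchAssignRole mutation, the built-in urn:li:dataHubRole:Reader urn, and a GMS endpoint/token for your deployment (all worth double-checking against your DataHub version):
    Copy code
    import requests

    GMS_GRAPHQL = "http://localhost:8080/api/graphql"  # assumption: adjust to your GMS address
    TOKEN = "<personal-access-token>"

    mutation = """
    mutation assignReader($input: BatchAssignRoleInput!) {
      batchAssignRole(input: $input)
    }
    """
    variables = {
        "input": {
            "roleUrn": "urn:li:dataHubRole:Reader",
            # placeholder corpuser urns for the SSO-authenticated users
            "actors": ["urn:li:corpuser:alice@example.com", "urn:li:corpuser:bob@example.com"],
        }
    }
    resp = requests.post(
        GMS_GRAPHQL,
        json={"query": mutation, "variables": variables},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    print(resp.json())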
  • white-shampoo-69122

    03/23/2023, 5:16 PM
    Hi, we recently upgraded to DataHub 0.10.0. We are ingesting metadata for Glue + dbt and noticed that the list of columns is now duplicated: one set (coming from Glue) has no description, while the second, repeated set has the description coming from dbt. After a bit of digging we noticed that in the GraphQL response the Glue datasets have
    fieldPaths
    as version 2:
    Copy code
    fields: 
      0:
        description: null
        fieldPath: "[version=2.0].[type=string].a_column_x"
        globalTags: null
        glossaryTerms: null
        isPartOfKey: false
        jsonPath: null
        label: null
        nativeDataType: "string"
        nullable: true
        recursive: false
        type: "STRING"
        __typename: "SchemaField"
    While the dbt sibling entity has fieldPaths version 1(?):
    Copy code
    siblings: 
      isPrimary: false
      siblings: 
        0:
          ... 
          schemaMetadata: 
            fields: 
              ...
              60: 
                description: "A description for column x"
                fieldPath: "a_column_x"
                globalTags: null
                glossaryTerms: null
                isPartOfKey: false
                jsonPath: null
                label: null
                nativeDataType: "varchar"
                nullable: false
                recursive: false
                type: "STRING"
                __typename: "SchemaField"
    Not really sure how it was before, but found that interesting. Also might be worth mentioning that we have enabled stateful ingestion for both dbt and Glue and that it worked well before. Any ideas what might be going wrong?
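    To compare the two siblings directly, one option is to pull both schemas over GraphQL and diff the fieldPath values. A minimal sketch with requests, where the GMS endpoint, token and dataset urn are placeholders/assumptions:
    Copy code
    import requests

    GMS_GRAPHQL = "http://localhost:8080/api/graphql"  # assumption
    TOKEN = "<personal-access-token>"

    # Fetch the Glue dataset's own schema plus the schemas of its siblings (e.g. the dbt entity).
    query = """
    query compareSiblings($urn: String!) {
      dataset(urn: $urn) {
        schemaMetadata { fields { fieldPath description } }
        siblings {
          isPrimary
          siblings {
            ... on Dataset {
              urn
              schemaMetadata { fields { fieldPath description } }
            }
          }
        }
      }
    }
    """
    variables = {"urn": "urn:li:dataset:(urn:li:dataPlatform:glue,my_db.a_table,PROD)"}  # placeholder
    resp = requests.post(
        GMS_GRAPHQL,
        json={"query": query, "variables": variables},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    print(resp.json())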
  • flaky-portugal-377

    03/23/2023, 6:32 PM
    Hello - I'm very new to DataHub. We are trying to ingest metadata from our Snowflake instance. When we test the connection it runs successfully, but when we try to run the actual ingestion we get these types of errors:
    [2023-03-22 19:58:14,603] INFO {datahub.cli.ingest_cli:120} - Starting metadata ingestion
    [2023-03-22 19:58:15,429] INFO {datahub.ingestion.source.snowflake.snowflake_v2:1388} - Checking current version
    [2023-03-22 19:58:15,551] ERROR {datahub.ingestion.source.snowflake.snowflake_v2:224} - version => Error: 'CURRENT_VERSION()'
    [2023-03-22 19:58:15,551] INFO {datahub.ingestion.source.snowflake.snowflake_v2:1394} - Checking current role
    [2023-03-22 19:58:15,645] ERROR {datahub.ingestion.source.snowflake.snowflake_v2:224} - version => Error: 'CURRENT_ROLE()'
    [2023-03-22 19:58:15,645] INFO {datahub.ingestion.source.snowflake.snowflake_v2:1400} - Checking current warehouse
    [2023-03-22 19:58:15,744] ERROR {datahub.ingestion.source.snowflake.snowflake_v2:224} - current_warehouse => Error: 'CURRENT_WAREHOUSE()'
    [2023-03-22 19:58:15,744] INFO {datahub.ingestion.source.snowflake.snowflake_v2:1407} - Checking current edition
    [2023-03-22 19:58:16,368] WARNING {datahub.ingestion.source.snowflake.snowflake_v2:185} - snowsight url => unable to get snowsight base url due to an error -> 'CURRENT_ACCOUNT()'
    [2023-03-22 19:58:16,368] ERROR {datahub.ingestion.source.snowflake.snowflake_v2:224} - permission-error => Current role does not have permissions to use warehouse WH_NMHS_XS. Please update permissions.
    We are on a Tenant of Snowflake, not sure if that has anything to do with this or not. Any help would be much appreciated. Thanks!!
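    Since the last error is about the role not being allowed to use the warehouse, it may help to pin the warehouse and role explicitly in the recipe and confirm that role has USAGE on that warehouse. A minimal programmatic sketch, where the config keys follow the Snowflake source docs and all values are placeholders:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "snowflake",
                "config": {
                    "account_id": "my_account",
                    "warehouse": "WH_NMHS_XS",  # warehouse the role is actually granted
                    "role": "DATAHUB_ROLE",     # placeholder role with USAGE on the warehouse
                    "username": "user",
                    "password": "pass",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.pretty_print_summary()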
  • acceptable-football-40437

    03/23/2023, 6:39 PM
    Hello! I'm trying to get more familiar with the DataHub API, and want to be able to update metadata for a given entity--the issue is, I'm having trouble figuring out the pattern for the URN. The UI gives me something like
    urn:li:container:<alphanumeric-string>
    , but from the docs it seems more human-readable (and programmatically guessable) versions of URNs exist. How does one put together the latter type of URN for, say, a BigQuery dataset? If this belongs in a different channel, I'll happily cross-post!
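    For reference, dataset URNs follow the pattern urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>), and the Python SDK can assemble them. A small sketch; the BigQuery name shown (project.dataset.table) is a placeholder:
    Copy code
    from datahub.emitter.mce_builder import make_dataset_urn

    # BigQuery datasets are keyed as <project>.<dataset>.<table> (placeholder values below)
    urn = make_dataset_urn(platform="bigquery", name="my-project.my_dataset.my_table", env="PROD")
    print(urn)
    # urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.my_dataset.my_table,PROD)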
  • strong-hospital-52301

    03/23/2023, 6:45 PM
    Hello! I'm trying to synchronize the data that I have on a local MySQL instance with the DataHub docker containers, but it stays frozen in this state. The MySQL instance runs on localhost:3306 and it's MySQL version 8.0.32 (I attach a picture). My ingestion file is the following one:
  • bumpy-activity-74405

    03/24/2023, 7:01 AM
    Hi, some time ago I had an issue with
    HTTP header value exceeds the configured limit of 8192 characters
    in frontend. I was able to work around it with env variables introduced in this PR and it worked on version
    v0.8.44
    . After upgrading to
    v0.9.6.1
    the issue is back. I suspect it has to do with renaming the configuration option in this commit. Not sure why it was done since akka documentation states it should be
    max-header-value-length
    .
  • swift-dream-78272

    03/24/2023, 1:53 PM
    Hey Team, I've been trying to run ingestion using a Python script like this one - https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/programatic_pipeline.py Does it work when the config dict has env variables instead of explicitly inserted values? Something like this?
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline
    
    # The pipeline configuration is similar to the recipe YAML files provided to the CLI tool.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mysql",
                "config": {
                    "username": "user",
                    "password": "pass",
                    "database": "db_name",
                    "host_port": "localhost:3306",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "${DATAHUB_GMS_URL}",},
            },
        }
    )
    
    # Run the pipeline and report the results.
    pipeline.run()
    pipeline.pretty_print_summary()
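    The ${VAR} substitution is handled when the CLI loads a recipe YAML; when building the config dict directly in Python it is probably simplest to resolve the environment variables yourself before calling Pipeline.create. A small sketch:
    Copy code
    import os

    from datahub.ingestion.run.pipeline import Pipeline

    # Resolve env variables explicitly instead of relying on "${DATAHUB_GMS_URL}" being expanded.
    gms_url = os.environ.get("DATAHUB_GMS_URL", "http://localhost:8080")

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mysql",
                "config": {
                    "username": os.environ.get("MYSQL_USER", "user"),
                    "password": os.environ.get("MYSQL_PASSWORD", "pass"),
                    "database": "db_name",
                    "host_port": "localhost:3306",
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
        }
    )
    pipeline.run()
    pipeline.pretty_print_summary()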
  • brief-bear-90340

    03/24/2023, 4:07 PM
    Hello team, I am trying to set up DataHub for BigQuery and I'm running into an issue trying to set up the connection:
    Copy code
    [12:05 PM] DefaultCredentialsError: ('Failed to load service account credentials from /tmp/tmp53gmekv8', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>]))
    Any help with this would be appreciated.
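    That error is raised by the Google auth library while parsing the private key inside the service-account JSON (commonly a key pasted with mangled newlines, or a non-RSA/encrypted key). One way to check the file outside of DataHub is to load it directly with google-auth; a sketch, assuming the google-auth package and a placeholder path:
    Copy code
    from google.oauth2 import service_account

    # If this raises the same "Could not deserialize key data" error, the JSON/private key
    # itself is malformed (e.g. escaped "\n" in private_key) rather than the DataHub recipe.
    creds = service_account.Credentials.from_service_account_file("/path/to/service-account.json")
    print(creds.service_account_email)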
  • rapid-zoo-88437

    03/25/2023, 3:28 AM
    Hi team, I'm new to DataHub. I tried to use the Spark lineage feature, but I ran into some problems. The listener could find the downstream but couldn't find the upstream! I had already set "spark.datahub.coalesce_jobs" to true. I would be grateful if someone could give me some clue, thank you! Below is my sample code: it finds the downstream Persons1 and Persons2, but can't find the upstream Persons on the DataHub UI lineage.
    Copy code
    val ds1 = spark.read
            .format("jdbc")
            .option("driver","com.mysql.cj.jdbc.Driver")
            .option("url", "jdbc:mysql://{myhost}:3306/xxx")
            .option("dbtable", "Persons")
            .option("user", "xxx")
            .option("password", "xxx")
            .load()
    
    
    ds1.write.mode(SaveMode.Append)
    .format("jdbc")
    .option("driver","com.mysql.cj.jdbc.Driver")
    .option("url", "jdbc:mysql://{myhost}:3306/xxx")
    .option("dbtable", "Persons1")
    .option("user", "xxx")
    .option("password", "xxx")
    .save()
    
    ds1.write.mode(SaveMode.Append)
    .format("jdbc")
    .option("driver","com.mysql.cj.jdbc.Driver")
    .option("url", "jdbc:mysql://{myhost}:3306/xxx")
    .option("dbtable", "Persons2")
    .option("user", "xxx")
    .option("password", "xxx")
    .save()
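    For comparison, this is roughly how the lineage listener is usually attached when the session is created (shown in PySpark to keep the examples in one language); the package coordinates, listener class and version below are assumptions to verify against the Spark lineage docs for your DataHub version:
    Copy code
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("datahub-lineage-example")
        # assumed artifact/version; check the DataHub Spark lineage documentation
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.10.0")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        .config("spark.datahub.rest.server", "http://localhost:8080")
        .config("spark.datahub.coalesce_jobs", "true")
        .getOrCreate()
    )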
  • fresh-cricket-75926

    03/27/2023, 11:35 AM
    Hi Community, when I run the datahub delete command, i.e. datahub delete --env DEV --urn "urn:li:dataset:(urn:li:dataPlatform:redshift,lt_phoenix.business.f_order_line,DEV)", I am facing a "*JSONDecodeError: Expecting value: line 1 column 1 (char 0)*". Any idea what the issue might be here? Attached is the stack trace:
    datahub delete logs.txt
  • wonderful-quill-11255

    03/27/2023, 12:57 PM
    Hello Community. We are deploying a forked version of datahub (only minor changes) and after a long period of inactivity we are doing an update from 0.8.35 to 0.10.0. When we develop our fork, we tag our own releases with our own git tags that don't follow the exact format
    vX.Y.Z
    that the regular datahub code does. Up until now that hasn't mattered much, since that value was mainly used in the UI to show the running version. But it seems that recently this value has become more important, controlling a step in the bootstrap process. I'm wondering if anyone else has encountered this and how you chose to deal with it. Best Regards
  • limited-refrigerator-50812

    03/27/2023, 3:59 PM
    Hi Team, I added a new entity to my local fork of datahub and despite trying to make it as simple as possible I'm having a hard time. I managed to follow all the steps in the guide for extending the metadata model here: https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model. After that I followed the GraphQL guide: https://datahubproject.io/docs/datahub-graphql-core. However, now, when I try to rebuild datahub with the command
    ./gradlew quickstartDebug --stacktrace -x yarnTest -x yarnLint
    I get an error that I don't know how to deal with. I'm including the error message(s) below. Any idea how I can find out what I did wrong?
    Copy code
    > Task :datahub-web-react:yarnGenerate
    yarn run v1.22.0
    $ graphql-codegen --config codegen.yml
    (node:4132) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time
    (Use `node --trace-warnings ...` to show where the warning was created)
    [15:13:03] Parse configuration [started]
    [15:13:03] Parse configuration [completed]
    [15:13:03] Generate outputs [started]
    [15:13:03] Generate src/types.generated.ts [started]
    [15:13:03] Generate to src/ (using EXPERIMENTAL preset "near-operation-file") [started]
    [15:13:03] Load GraphQL schemas [started]
    [15:13:03] Load GraphQL schemas [started]
    [15:13:03] Load GraphQL schemas [failed]
    [15:13:03] → Failed to load schema
    [15:13:03] Generate to src/ (using EXPERIMENTAL preset "near-operation-file") [failed]
    [15:13:03] → Failed to load schema
    [15:13:03] Load GraphQL schemas [failed]
    [15:13:03] → Failed to load schema
    [15:13:03] Generate src/types.generated.ts [failed]
    [15:13:03] → Failed to load schema
    [15:13:03] Generate outputs [failed]
    Something went wrong
    error Command failed with exit code 1.
    info Visit <https://yarnpkg.com/en/docs/cli/run> for documentation about this command.
    > :datahub-web-react:yarnGenerate
    > Task :datahub-web-react:yarnGenerate FAILED
    > :datahub-web-react:yarnGenerate
    FAILURE: Build failed with an exception.
    
    * What went wrong:
    Execution failed for task ':datahub-web-react:yarnGenerate'.
    > Process 'command '/mnt/c/Users/dries528/Documents/Code/datahub_fresh/datahub/datahub-web-react/.gradle/yarn/yarn-v1.22.0/bin/yarn'' finished with non-zero exit value 1
    
    * Try:
    Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    partial-errorlog.txt
  • bulky-grass-52762

    03/27/2023, 6:59 PM
    Hey lovely DataHub team 👋 I wanted to bring to your attention an important breaking change that appears to be missing from the documentation ❗ Upon attempting to upgrade DataHub from version
    0.9.3
    to
    0.10.1
    , we discovered that certain nodes in the lineage UI have disappeared. These nodes were not entities themselves, but rather were connected to other entities as upstream/downstream dependencies. For example, in our use case as attached in the screenshot, we used the s3 lineage aspect to complete the flow of hive -> s3 -> redshift, but that flow seems to be broken because in
    0.10.1
    , the lineage aspects seem to be missing from the lineage UI. I believe this is because of the implementation of showing an error message if the entity is not found. IMHO, this shouldn't have impacted the nodes in the lineage UI, since the original redshift ingestion is still offloading the related s3 upstream lineage aspect without the entity itself. TIA for your future efforts looking at this, thank you!
  • cuddly-butcher-39945

    03/27/2023, 10:26 PM
    Hey everyone, I'm having issues with the quickstart. Following this guide: https://datahubproject.io/docs/docker/development Here are my steps:
    1. Synced fork
    2. datahub docker nuke
    3. datahub docker quickstart --version=v0.10.0
    4. I see many connectivity errors during the mysql-setup and gms. Here are a few of the lines where I see a bunch of connectivity errors:
    datahub-datahub-actions-1 | 2023/03/27 21:32:32 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.20.0.7:8080: connect: connection refused. Sleeping 1s
    mysql-setup | 2023/03/27 21:39:36 Problem with dial: dial tcp 172.20.0.6:3306: connect: connection refused. Sleeping 1s
    datahub-gms | 2023/03/27 21:32:35 Problem with dial: dial tcp 172.20.0.9:29092: connect: connection refused. Sleeping 1s
    I also checked for any of these ports already listening from other pids (netstat -lntup), but did not see any.
    Update to this - further troubleshooting steps:
    1. Cleared all docker images, etc.
    2. docker system prune --all --volumes --force
    3. Rebooted box
    4. datahub docker quickstart --version=v0.10.0
    Still unable to complete the quickstart:
    Unable to run quickstart - the following issues were detected:
    - datahub-gms is still starting
    - mysql-setup is still running
    - datahub-upgrade exited with an error
    If you think something went wrong, please file an issue at https://github.com/datahub-project/datahub/issues or send a message in our Slack https://slack.datahubproject.io/ Be sure to attach the logs from /tmp/tmpo132cuyl.log
    Any help would be appreciated!
  • numerous-account-62719

    03/28/2023, 6:28 AM
    Hi Team, I have ingested data from Kafka but I am not able to see any schema. Everything is null. I have around 70 datasets and all are blank. Can someone help me?
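    Schemas for Kafka topics come from the schema registry, so it is worth checking that the recipe points at one the connector can reach. A minimal programmatic sketch, where the connection values are placeholders and the config keys follow the Kafka source docs:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # Without a reachable schema registry, topics are typically ingested with empty schemas.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "kafka",
                "config": {
                    "connection": {
                        "bootstrap": "broker:9092",
                        "schema_registry_url": "http://schema-registry:8081",
                    },
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.pretty_print_summary()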
  • microscopic-room-90690

    03/28/2023, 8:01 AM
    Hi team, I’m experiencing an issue. I tried to remove a link and it appeared to work, but the link is still there after refreshing the page. My UI version is 0.9.6.1. Would anyone be able to help me with this?
  • bumpy-activity-74405

    03/28/2023, 10:04 AM
    Hi I am running
    v0.9.6.1
    . Having issues with the download csv feature when trying to download ~5k datasets. It's my understanding that it tries to batch these queries in chunks of 1000, but each chunk takes longer and longer until I get a timeout error in gms:
    Copy code
    09:50:18.820 [qtp71399214-1199] WARN  o.s.w.s.m.s.DefaultHandlerExceptionResolver:208 - Resolved [org.springframework.web.context.request.async.AsyncRequestTimeoutException]
    I am getting similar results when trying to run a graphql query - if I set the
    count
    to 10000 (everything in one chunk) it times out. If I try to batch my queries using offsets (
    start/count
    ) I can observe that with increasing offsets I also get increasing query run times, which eventually time out when reaching 30s. Is there something I could do about this - increase the timeout somehow, or should I scale Elasticsearch somehow?
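    As a possible workaround for the CSV export itself, the paging can also be driven from a script so each request stays small (deep offsets may still slow down, as you observed). A rough sketch against the GraphQL endpoint, where the searchAcrossEntities query shape, endpoint and token are assumptions to verify:
    Copy code
    import requests

    GMS_GRAPHQL = "http://localhost:8080/api/graphql"  # assumption
    TOKEN = "<personal-access-token>"

    query = """
    query page($start: Int!, $count: Int!) {
      searchAcrossEntities(input: {types: [DATASET], query: "*", start: $start, count: $count}) {
        total
        searchResults { entity { urn } }
      }
    }
    """

    start, count, urns = 0, 500, []
    while True:
        resp = requests.post(
            GMS_GRAPHQL,
            json={"query": query, "variables": {"start": start, "count": count}},
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=60,
        )
        page = resp.json()["data"]["searchAcrossEntities"]
        urns.extend(r["entity"]["urn"] for r in page["searchResults"])
        start += count
        if start >= page["total"]:
            break
    print(len(urns))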
  • bright-morning-76046

    03/28/2023, 11:47 AM
    Hi! I'm following the quickstart guide, and when I do 'datahub docker quickstart' I get the following error message:
    Copy code
    Unable to run quickstart - the following issues were detected:
    - datahub-gms is running but not yet healthy
    - datahub-upgrade is still running
    
    If you think something went wrong, please file an issue at <https://github.com/datahub-project/datahub/issues>
    or send a message in our Slack <https://slack.datahubproject.io/>
    Be sure to attach the logs from /var/folders/c2/3gbwy5wj5dbfvgjzz3kctd000000gp/T/tmpsdxu45ta.log
    My version is DataHub CLI version 0.10.1. Thank you so much!
  • mysterious-advantage-78411

    03/28/2023, 12:36 PM
    Hi, has anybody had the same error? (0.10.1)
  • best-wire-59738

    03/29/2023, 1:50 AM
    Hi Team, we are facing a policy syncing issue in DataHub v0.10.0.6. We recently upgraded from v0.9.2 to v0.10.0.6; the old policies are working and in sync, but newly created policies are not getting synced. We have given all users the privileges to View Dataset Usage and View Dataset Profile, but users are still unable to view dataset details; they are getting a code 500 error in the UI. Any kind of help is appreciated.
  • fierce-monkey-46092

    03/29/2023, 6:40 AM
    Hi, I'm doing file-based ingestion with data profiling enabled (stats tab) from an Oracle source. The question is: during the ingestion I'm getting some exceptions and warnings, but the ingestion keeps going. After around 20 mins the process is completely stuck. Has anyone faced this issue before?
  • busy-mechanic-8014

    03/29/2023, 9:00 AM
    Hello everyone, I'm trying to create the first Personal Access Token programmatically but got "401 Client Error: Unauthorized for url". This issue has already been mentioned in these threads, but even following the steps described there it does not work:
    • https://app.slack.com/client/TUMKD5EGJ/search/search-eyJkIjoicHlqd3QiLCJxIjoiVTA0UjdRMUdNRUciLCJyIjoicHlqd3QifQ==/thread/CV2UVAPPG-1678295093.481849
    • https://app.slack.com/client/TUMKD5EGJ/search/search-eyJkIjoicHlqd3QiLCJxIjoiVTA0Uj[…]NRUciLCJyIjoicHlqd3QifQ==/thread/C029A3M079U-1668589539.838859
    Here are my steps:
    Configuration (Helm = app version: v0.10.0 – chart version: v0.2.151)
    • Set the METADATA_SERVICE_AUTH_ENABLED var to true in the helm values for datahub-gms & datahub-frontend
    • Enable metadata_service_authentication with no changes
    Copy code
    metadata_service_authentication:
          enabled: true
          systemClientId: "__datahub_system"
          systemClientSecret:
            secretRef: "datahub-auth-secrets"
            secretKey: "token_service_signing_key"
          tokenService:
            signingKey:
              secretRef: "datahub-auth-secrets"
              secretKey: "token_service_signing_key"
            salt:
              secretRef: "datahub-auth-secrets"
              secretKey: "token_service_salt"
          # Set to false if you'd like to provide your own auth secrets
          provisionSecrets:
            enabled: true
            autoGenerate: true
          # Only specify if autoGenerate set to false
          #  secretValues:
          #    secret: <secret value>
          #    signingKey: <signing key value>
          #    salt: <salt value>
    => I now have a secret with token_service_signing_key: f2E0BZoNKlr7CEu71kjZjAduRNCsePKS
    Create the access token programmatically
    • Decode an access token created on the UI and get the payload
    Copy code
    {
      "actorType": "USER",
      "actorId": "datahub",
      "type": "PERSONAL",
      "version": "2",
      "jti": "6ec82917-d39a-4c52-9a5e-5d4caacf6b7d",
      "sub": "datahub",
      "exp": 1680015431,
      "iss": "datahub-metadata-service"
    }
    • I validated the service key by recreating the token by my own means (just used https://jwt.io/ with payload, header and token signing key)
    • Create a new token in Python
    Copy code
    import jwt
    import time
    
    # I noticed that you have to encode the service key in ASCII to get the same verified signature as the token created on the UI (anyway I tested with or without for the same result)
    secret_signing_key = "f2E0BZoNKlr7CEu71kjZjAduRNCsePKS".encode('ascii')  
    payload = {
      "actorType": "USER",
      "actorId": "datahub",
      "type": "PERSONAL",
      "version": "2",
      "jti": "6ec82917-d39a-4c52-9a5e-5d4caacf6b7d",
      "sub": "datahub",
      "exp": 1680015431,
      "iss": "datahub-metadata-service"
    }
    header = { "alg": "HS256" }
    token = jwt.encode(payload, secret_signing_key, headers=header)
    print(token)
    eyJhbGciOiJIUzI1NiJ9…
    • Decode my new access token to check that it is well built => all looks good
    cURL (the curl proposed when creating a token on the UI)
    Copy code
    curl -X POST "<http://datahub-front-url/api/graphql>" --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9… ' --header 'Content-Type: application/json' --data-raw '{"query": "{\n me {\n corpUser {\n username\n }\n }\n}","variables":{}}'
    => HTTP ERROR 401 Unauthorized to perform this action
    DataHub API:
    datahub ingest -c /tmp/ch_recipe.yml
    ch_recipe.yml:
    Copy code
    source:
        type: clickhouse
        config:
            host_port: "clickhouse-install.clickhouse.svc.cluster.local:8123"
            username: ****
            password: ****
            platform_instance: DatabaseNameToBeIngested
            include_views: true
            include_tables: true
    sink:
        type: "datahub-rest"
        config:
                server: "<http://datahub-gms.datahub.svc.cluster.local:8080>"
                token: "eyJhbGciOiJIUzI1NiJ9…."
    => 401 Client Error: Unauthorized for url
    All works fine if I put in a token created on the UI.
    Questions: Has anyone managed to create a token programmatically and use it for queries? Is it really possible to do that now? I also noticed (if I understood correctly) that if I create a token via the UI, retrieve it but delete it immediately afterwards, it's as if I had simulated creating the token programmatically, and I get this same result. If we can really create our own token with the token signing key, we should be able to use this token (present or not in the UI) to query datahub. On my side it doesn't work. I remain available if you need more information! 🙂 Thanks for your time and I hope someone can help me out!
  • astonishing-dusk-99990

    03/29/2023, 9:21 AM
    Hi Community, I have a problem with the settings for OIDC using Google. Currently I have already set up OIDC with Google, and here's my yaml for the datahub-frontend pod:
    Copy code
    datahub-frontend:
      enabled: true
      image:
        repository: linkedin/datahub-frontend-react
        tag: "v0.10.0" # # defaults to .global.datahub.version
      resources:
        limits:
          memory: 1400Mi
        requests:
          cpu: 100m
          memory: 512Mi
      # Set up ingress to expose react front-end
      ingress:
        enabled: false
      oidcAuthentication: # OIDC auth based on <https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react>
        enabled: false
      extraEnvs:
        - name: AUTH_JAAS_ENABLED
          value: "true"
        - name: AUTH_OIDC_ENABLED
          value: "true"
        - name: AUTH_OIDC_CLIENT_ID
          value: "your_oidc_client_id"
        - name: AUTH_OIDC_CLIENT_SECRET
          value: your_client_secret
        - name: AUTH_OIDC_DISCOVERY_URI
          value: "<https://accounts.google.com/.well-known/openid-configuration>"
        - name: AUTH_OIDC_BASE_URL
          value: "<http://localhost:9002>"
        - name: AUTH_OIDC_USER_NAME_CLAIM
          value: "email"
        - name: AUTH_OIDC_USER_NAME_CLAIM_REGEX
          value: "([^@]+)"
      extraVolumes:
        - name: datahub-users
          secret:
            defaultMode: 0444
            secretName: datahub-users-secret
      extraVolumeMounts:
        - name: datahub-users
          mountPath: /datahub-frontend/conf/user.props
          #mountPath: /etc/datahub/plugins/frontend/auth/user.props
          subPath: user.props
    And then I followed this article to set up Google, and I already set up my Authorized JavaScript Origins and Authorized Redirect URLs as in the attachment below. However, when I tested it, the Google sign-in showed both my personal Gmail and my work Gmail. First I tried to test with my personal Gmail and the result was as expected, which is access blocked, but when I use my work Gmail it always refuses to connect, like in the attachment below. My question: what's the problem here, can anyone help me? Notes:
    • I already allow port 9002 in the firewall rule
    • My image version is 0.10.0
    • Deployed using the helm chart on a kubernetes cluster
  • powerful-cat-68806

    03/29/2023, 10:26 AM
    Hi DH team 🙂 I’ve executed
    helm upgrade
    from my local machine, but I'm not seeing the latest updates from the announcement. This is my chart:
    Copy code
    apiVersion: v2
    name: jfrog-datahub
    description: A Helm chart for Acryl DataHubd
    type: application
    version: 0.0.1
    appVersion: latest #0.3.1
    dependencies:
      - name: datahub
        version: 0.2.148
        repository: <https://helm.datahubproject.io>
  • icy-flag-80360

    03/29/2023, 12:10 PM
    Hello! I am facing an error after updating datahub on k8s from version 0.10.0 to 0.10.1, with this exception:
    Copy code
    uppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://elasticsearch-master:9200>], URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request] {"error":{"root_cause":[{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"GZJPC-CBTtekUqWRmtZGfA","index":"datahubpolicyindex_v2"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahubpolicyindex_v2","node":"MhPwkRJ4T8WYHY0QwONrOg","reason":{"type":"query_shard_exception","reason":"[simple_query_string] analyzer [query_word_delimited] not found","index_uuid":"GZJPC-CBTtekUqWRmtZGfA","index":"datahubpolicyindex_v2"}}]},"status":400}
    But if I check with curl from the GMS pod, everything is ok - elastic returns data with the existing policies, but without any index_uuid. Example: curl -XGET 'http://elasticsearch-master:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true' Is there any way to repair it? I have already tried many ways to recover, including fully erasing the elastic data.
  • adventurous-waiter-4058

    03/29/2023, 12:16 PM
    Hi All, I am trying to execute the datahub ingest command using the AWS CLI. DataHub is hosted on AWS EKS and it's up and running, but I'm facing the below error while ingesting metadata from AWS Glue as the source and the DataHub API as the sink. I tried using the datahub ingest -c command for the same. Error:
    $ python3 -m datahub ingest -c abc.dhub.yaml --dry-run
    [2023-03-29 12:11:08,768] INFO {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.1
    [2023-03-29 12:11:09,169] ERROR {datahub.entrypoints:192} - Command failed: Failed to set up framework context: Failed to instantiate a valid DataHub Graph instance
  • microscopic-leather-94537

    03/29/2023, 12:17 PM
    Hi, I am unable to add more terms to my glossary, please help. 😵‍💫😵‍💫
  • fast-midnight-10167

    03/29/2023, 1:31 PM
    Maybe a dumb question - how can I properly manage the path for a dataset when emitting a change to the metadata to datahub? (Using the python approach) For example, I have some data in s3 buckets. In the script, I loop over the objects in said bucket, and, using a
    MetadataChangeProposalWrapper
    , emit changes to add new custom properties. But the problem is, when I give the entityUrn the name (via
    make_dataset_urn
    ), it treats the filepath both as the filepath and name of the object. So instead of ending up with a
    <env>/<folderpath>/<obj_name>
    in datahub, I end up with that path, but the object name itself includes the folder path as if it was part of the object name itself.
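    One pattern that may help, sketched below: keep the full bucket path as the dataset name inside the urn (that string is the dataset's identity), and attach a DatasetProperties aspect whose name field carries just the object name for display. The class and emitter come from the Python SDK, but the exact display behaviour is worth verifying on your version; all values are placeholders:
    Copy code
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    # Placeholder bucket/key; the full path stays in the urn, the display name is just the object.
    key = "my-folder/sub-folder/my_object.parquet"
    urn = make_dataset_urn(platform="s3", name=f"my-bucket/{key}", env="PROD")

    props = DatasetPropertiesClass(
        name=key.split("/")[-1],  # display name: my_object.parquet
        customProperties={"source_prefix": "my-folder/sub-folder"},
    )

    emitter = DatahubRestEmitter("http://localhost:8080")  # assumption: local GMS
    emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=urn, aspect=props))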
  • wide-optician-47025

    03/29/2023, 5:29 PM
    Hello, I need to control which datasets users are able to view, yet when I created a policy for a user and restricted read access to just a few datasets, the user can still see all datasets.
  • glamorous-microphone-33484

    03/30/2023, 12:51 AM
    Hi Datahub Team, sorry for the duplicate post. There has been no reply to the following questions in the "getting-started" channel.
    1. Will the Datahub ingest CLI (Python), Datahub UI ingestion and the GMS REST API be governed by access policies (https://datahubproject.io/docs/authorization/policies/) or the Apache Ranger plugin?
    2. On a related question about the Apache Ranger plugin, are you able to provide screenshots/examples of how to configure metadata privileges via the Ranger UI? I was able to verify that platform privileges could be offloaded to Ranger and that Datahub was able to sync the policies from Ranger correctly. However, that was not the case for metadata privileges, and I was not able to get Datahub to apply the metadata-related policies from Ranger. Can the team verify that the Datahub integration with Ranger works with metadata privileges so we can offload them to Ranger?
    3. What is the use of the env variable "REST_API_AUTHORIZATION_ENABLED"? It is not clearly documented in the project.