# troubleshoot
  • shy-ability-95880
    06/16/2022, 9:12 AM
    Hi team, I'm trying to connect my Superset to DataHub, but I ran into this error and I'm not sure what it means. Can I get some assistance? Sorry, I'm still very unfamiliar with this. The recipe I ran:

    source:
      type: superset
      config:
        username: *******
        password: *******
        provider: db
        connect_uri: 'https://superset.*************'
    sink:
      type: datahub-rest
      config:
        server: 'http://datahub-gms:8080'
    Superset error.txt
  • chilly-elephant-51826
    06/16/2022, 9:46 AM
    #troubleshoot I am getting an error while ingesting metadata from Glue. The actions module is not giving proper debugging info; also, when one dataset fails, it should be skipped so the others can still be ingested, which is not happening. @big-carpet-38439, I need some help here.
  • salmon-area-51650
    06/16/2022, 9:48 AM
    👋 Hi team! I have a problem with dbt ingestion. I ran the ingestion and the cron job seems OK, but I cannot see the dbt platform in the UI. Attaching the output log of the ingestion job. This is the configuration:
    source:
      type: "dbt"
      config:
        # Coordinates
    manifest_path: "s3://bucket_name/manifest.json"
    catalog_path: "s3://bucket_name/catalog.json"
    sources_path: "s3://bucket_name/sources.json"
    
        aws_connection:
          aws_region: "eu-west-2"
    
        # Options
        target_platform: "snowflake"
        load_schemas: True
        env: STG
        node_name_pattern:
          allow:
            - ".*branch_activities.*"
          deny:
            - ".*test.*"
    
    sink:
      type: "datahub-rest"
      config:
        server: "<http://datahub-datahub-gms:8080>"
    Any clue? Thanks!
    output.txt
  • damp-minister-31834
    06/16/2022, 10:58 AM
    Hi all! I've found some trouble with the /relationships API. The official demo is:

    curl --location --request GET --header 'X-RestLi-Protocol-Version: 2.0.0' 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn%3Ali%3Achart%3Acustomers&types=List(OwnedBy)'

    But if the value of urn contains "(", it throws the error `org.neo4j.driver.exceptions.ClientException: Invalid input '('`. And the urns of datasets and dataJobs both contain '(' and ')'. Is this an issue?
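    A hedged workaround sketch: Rest.li treats parentheses as reserved characters, so percent-encoding the entire urn before it goes into the query string may avoid the parser error. A minimal sketch in Python, assuming a locally reachable GMS; the dataset urn below is illustrative:

    import urllib.parse

    import requests

    # Example dataset urn; dataset urns always contain '(' and ')'.
    urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"

    # Percent-encode everything ('(' -> %28, ')' -> %29, ':' -> %3A) so the
    # Rest.li URI parser never sees raw parentheses.
    encoded_urn = urllib.parse.quote(urn, safe="")

    url = (
        "http://localhost:8080/relationships"
        f"?direction=OUTGOING&urn={encoded_urn}&types=List(OwnedBy)"
    )
    resp = requests.get(url, headers={"X-RestLi-Protocol-Version": "2.0.0"})
    print(resp.status_code, resp.text)

    Whether the Neo4j-backed endpoint accepts every urn this way is worth verifying; if it still fails, it may genuinely be a bug.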
  • ripe-electrician-13049
    06/17/2022, 3:07 AM
    Hi everyone! I want to analyze some Spark SQL commands in Spark 3, so I upgraded the Spark dependencies in build.gradle as follows: 'sparkSql': 'org.apache.spark:spark-sql_2.12:3.2.1', 'sparkHive': 'org.apache.spark:spark-hive_2.12:3.2.1'. But when I integrate the jar file with my project, I get an error. How could I fix this bug?
  • rich-policeman-92383
    06/17/2022, 6:02 AM
    Hello. In DataHub v0.8.35, one of our Active Directory users is facing an issue during login. Issue: !@someid - Internal Server Error for GET [/callback/oidc?code=AAAA............................................ Auth mechanism: OIDC. @magnificent-notebook-88304
  • curved-truck-53235
    06/17/2022, 6:39 AM
    Hello! Our DataHub was updated to 0.8.38 (via datahub docker quickstart --keep-data) and we have an issue with glossary terms: we can't create or delete terms, but we can edit them.
  • high-hospital-85984
    06/17/2022, 9:33 AM
    Thanks to a feature in an AWS database migration tool, we ended up with corrupted data in our GMS database. No biggie. Other than losing a lot of historical data, would there be an issue with doing the following in order to start almost from scratch: 1. Stop all ingestion. 2. Delete entries in the GMS database (excluding a set of urn types, for example dataHubPolicy). 3. Run the index repopulation job. 4. Start the ingestion again. Does this sound like a reasonable plan?
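    For anyone weighing step 2: in a default MySQL-backed deployment the aspects live in the metadata_aspect_v2 table, so the deletion could look roughly like the sketch below. The table name, connection details, and keep-list are assumptions to double-check against your own deployment, and the database should be backed up first:

    import pymysql

    # Urn prefixes to keep (step 2's exclusions) -- adjust to your needs.
    KEEP_PREFIXES = ("urn:li:dataHubPolicy:", "urn:li:corpuser:")

    conn = pymysql.connect(host="localhost", user="datahub",
                           password="datahub", database="datahub")
    try:
        with conn.cursor() as cur:
            # Delete every aspect row whose urn does not match a kept prefix.
            sql = "DELETE FROM metadata_aspect_v2 WHERE " + " AND ".join(
                "urn NOT LIKE %s" for _ in KEEP_PREFIXES
            )
            cur.execute(sql, tuple(p + "%" for p in KEEP_PREFIXES))
        conn.commit()
    finally:
        conn.close()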
  • breezy-portugal-43538
    06/17/2022, 11:07 AM
    Good morning DataHub people, I have a question regarding the integration of Great Expectations with DataHub. As far as I understand, the official way to upload Great Expectations results to a DataHub urn is to run the great_expectations command. But is it possible to update an already existing urn when I already have the results stored in a file from Great Expectations? I'm thinking of some nice curl command I could run, passing the JSON with the contents of the Great Expectations results. Would that be possible? As always, thank you a lot for the help :)
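    Not an official answer, but one possible direction: instead of raw curl, DataHub's Python REST emitter can push an aspect onto an existing urn. A minimal sketch assuming the acryl-datahub package and a reachable GMS; the urn and the aspect here are placeholders, and a real Great Expectations upload would emit the assertion-specific aspects rather than this generic one:

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetPropertiesClass,
    )

    emitter = DatahubRestEmitter(gms_server="http://datahub-gms:8080")

    # Placeholder urn -- point this at the dataset the GE results describe.
    urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.myschema.mytable,PROD)"

    # Generic aspect used purely to illustrate the push; a real GE upload
    # would emit assertion aspects instead.
    aspect = DatasetPropertiesClass(description="Updated from stored GE results")

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=urn,
        aspectName="datasetProperties",
        aspect=aspect,
    )
    emitter.emit_mcp(mcp)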
  • brave-tomato-16287
    06/17/2022, 11:13 AM
    Hello! I've run into an error when I try to run the redshift-usage ingestion:
    '[2022-06-17 03:11:32,443] INFO     {datahub.cli.ingest_cli:97} - DataHub CLI version: 0.8.36\n'
               '[2022-06-17 03:11:36,611] INFO     {datahub.cli.ingest_cli:113} - Starting metadata ingestion\n'
               '/usr/local/bin/run_ingest.sh: line 26:   320 Killed                  ( python3 -m datahub ingest -c "$4/$1.yml" )\n',
               "2022-06-17 03:14:45.501901 [exec_id=8ccd646c-fc06-4bfd-a7af-cc622cc5be81] INFO: Failed to execute 'datahub ingest'",
               '2022-06-17 03:14:45.505219 [exec_id=8ccd646c-fc06-4bfd-a7af-cc622cc5be81] INFO: Caught exception EXECUTING '
               'task_id=8ccd646c-fc06-4bfd-a7af-cc622cc5be81, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
               '    self.event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
               '    return f.result()\n'
               '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
               '    raise self._exception\n'
               '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
               '    result = coro.send(None)\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
    Duration was about 300s. Can anybody suggest what we should do?
  • lemon-nail-49127
    06/17/2022, 6:32 PM
    Hello! Is there a way to disable SSL within a recipe file, or to disable it entirely? I'm trying to connect to Trino, which is not using SSL, and I'm getting SSL errors.
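    A hedged idea, not a confirmed knob: DataHub's SQLAlchemy-based sources pass an options block through to create_engine, so forcing plain HTTP via the Trino client's http_scheme connect arg may work. A rough sketch run programmatically; the host, catalog, and the http_scheme behaviour are assumptions to verify:

    from datahub.ingestion.run.pipeline import Pipeline

    # Recipe as a dict; the equivalent YAML would go in the recipe file.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "trino",
                "config": {
                    "host_port": "trino.example.com:8080",  # placeholder
                    "database": "hive",                     # placeholder catalog
                    "username": "datahub",
                    # Passed through to SQLAlchemy's create_engine; the
                    # http_scheme connect arg asks the Trino client for
                    # plain HTTP instead of HTTPS (assumption to verify).
                    "options": {"connect_args": {"http_scheme": "http"}},
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://datahub-gms:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()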
  • brave-tomato-16287
    06/20/2022, 9:27 AM
    Hello! I've run into an error in the Tableau ingestion.
    {'tableau-metadata': ["Connection: customSQLTablesConnection Error: [{'message': 'Showing partial results. The request exceeded the 20000 node limit. Use pagination, additional filtering, or both in the query to adjust results.', 'extensions': {'severity': 'WARNING', 'code': 'NODE_LIMIT_EXCEEDED', 'properties': {'nodeLimit': 20000}}}]"]}
    Do we have a solution for this?
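    One lever that may help, as a hedged suggestion: the Tableau source exposes a page_size option, and lowering it makes the Metadata API paginate more aggressively, which is the usual response to NODE_LIMIT_EXCEEDED. A sketch of the relevant recipe fragment as a Python dict; connection values are placeholders, and the option name should be verified against your CLI version:

    # Fragment of a tableau recipe, expressed as the dict the CLI would parse.
    source_config = {
        "type": "tableau",
        "config": {
            "connect_uri": "https://tableau.example.com",  # placeholder
            "username": "datahub",
            "password": "***",
            "page_size": 5,  # fewer metadata objects per query -> fewer nodes
        },
    }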
  • swift-breakfast-25077
    06/20/2022, 9:43 AM
    Hi all, I have the error "An unknown error occurred. (code 500)" when I access DataHub, and the top tags / glossary terms section does not display. Can someone help me?
  • better-orange-49102
    06/20/2022, 10:54 AM
    Has anyone forked the project and reused the GitHub Actions wholesale? I keep encountering failures at this GitHub Action in docker-unified:
    - name: Upload image locally for testing (if not publishing)
      uses: ishworkh/docker-image-artifact-upload@v1
      if: ${{ needs.setup.outputs.publish != 'true' }}
      with:
        image: ${{ steps.docker_meta.outputs.tags }}
    Whenever I merge to the master branch, it mysteriously fails for all the containers with the cryptic message:
    Run ishworkh/docker-image-artifact-upload@v1
    Error: RangeError [ERR_CHILD_PROCESS_STDIO_MAXBUFFER]: stdout maxBuffer length exceeded
  • swift-breakfast-25077
    06/20/2022, 12:30 PM
    Profiling was successfully executed, but it is not showing stats. The problem started when I upgraded to 0.8.38.
  • swift-breakfast-25077
    06/20/2022, 12:51 PM
    Hi, I am running DataHub 0.8.38 with the quickstart, and this version contains many problems. How can I downgrade to 0.8.32, please?
  • curved-crayon-1929
    06/20/2022, 4:27 PM
    Hi all, I tried to run "datahub docker quickstart", however it is failing with the below issue. Any suggestions?
  • gentle-camera-33498
    06/20/2022, 5:12 PM
    Hello! I'm having problems with ingestion on a K8s deployment.
    '[2022-06-20 16:55:51,002] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.38\n'
               '[2022-06-20 16:55:54,017] INFO     {datahub.cli.ingest_cli:115} - Starting metadata ingestion\n'
               '[2022-06-20 16:55:54,017] INFO     {datahub.ingestion.source.sql.bigquery:367} - Populating lineage info via GCP audit logs\n'
               "[2022-06-20 16:55:54,021] ERROR    {datahub.ingestion.source.sql.bigquery:505} - lineage-gcp-logs => Error was ('Failed to load service "
               "account credentials from /tmp/tmp01jy7wi8', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may "
               "be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', "
               "[_OpenSSLErrorWithText(code=503841036, lib=60, reason=524556, reason_text=b'error:1E08010C:DECODER routines::unsupported')]))\n"
    Is this related to the creation of the encryption key?
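    A quick way to narrow this down, as a hedged suggestion: the error comes from deserializing the service-account key, so checking that the key file parses locally with google-auth (the same library the ingestion uses) separates a mangled key from an ingestion bug. The path below is a placeholder:

    from google.oauth2 import service_account

    # If this raises the same ValueError, the key material itself is bad
    # (e.g. corrupted PEM block or unsupported key type), not the ingestion.
    creds = service_account.Credentials.from_service_account_file(
        "/path/to/key.json"
    )
    print(creds.service_account_email)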
  • quick-megabyte-61846
    06/20/2022, 6:18 PM
    Hello, feedback from acryl-datahub[dbt]==0.8.38.1rc1. While trying to ingest a test that isn't in test_name_to_assertion_map (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/dbt.py#L742), in the UI I'm getting this (screens below). While digging in the code base I found this:
    elif kw_args.get("column_name"):  # <- in this case
        logic=node.compiled_sql if node.compiled_sql else node.raw_sql,
    From my observation, logic kind of breaks the UI in the assertion view. PS: huge thanks @mammoth-bear-12532 for this integration.
  • bitter-dusk-52400
    06/21/2022, 5:31 AM
    @big-carpet-38439 @better-orange-49102 @helpful-optician-78938 and DataHub team, could you please help me find which part of the code handles the policy.json file? I am planning to add a custom policy; can anyone share details on this? We are using Kubernetes Engine. Also, after changing the custom policy.json to a different mount point, do we need to build the war file again from scratch? By mount point I mean the Kubernetes volume.
  • few-air-56117
    06/21/2022, 9:46 AM
    Hi folks, I think I found a problem. I have a user who is an owner of a dataset that sits inside another dataset (dataset1.dataset2). If there is a table in dataset2 and the user is the owner of the dataset, they will not see the table's info; they only see it if they are the owner of the table itself. I think that because dataset2 is inside dataset1, the rights are not propagated correctly.
  • millions-notebook-72121
    06/21/2022, 12:39 PM
    Hi, posting this here as well, as I'm not sure which channel is best for this. Thanks for the help! https://datahubspace.slack.com/archives/C02FWNS2F08/p1655815134591969
  • delightful-sugar-63810
    06/21/2022, 2:50 PM
    Hey hey 👋🏻 TL;DR: We are having an issue with an environment variable that can be passed to the GMS, ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES. I cannot see any change in behaviour even if I change its value. Could it be because of the space here? This environment variable lets you adjust how long GMS will wait for the validation of the reindexing tasks it triggers on ES during DataHub version upgrades (see flow: 1 -> 2 -> 3). We are passing this variable as we pass others on Helm, from extra env vars. I can verify that it is passed to the container by describing the deployment on Kubernetes. Still, we cannot observe its effect on the pod. I have tried to investigate the issue and see if I'm doing something silly, but it seems like I'm not 😬. There is one possible cause, which is an extra space in the YAML file where the env var is read into the application.yml file here, but I couldn't reproduce the behaviour of falling back to the default value with this extra space added locally in a different project. There is nothing online about the behaviour of Spring's variable substitution with an additional space. Can you also take a look? I know it is a very loose question, but we are kind of stuck now 😄 Should I just open a PR? Am I doing something totally wrong? Could there be a Spring bean initialization issue causing @Value to not work? (I'm far away from the Spring ecosystem.)
  • helpful-painting-48754
    06/22/2022, 8:27 AM
    Hi everyone, I tried to ingest data from my database and I got this error. May I know what could be the cause of it?

    'Profiling - Unable to get column cardinality'

    I tried to ignore the columns with this error, but other columns then pop up with the same error.
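    While waiting for a proper diagnosis, one knob worth trying as a hedged sketch: the profiling config has a switch for skipping the expensive metrics (distinct counts are what feed cardinality), so turning those off may let the rest of the profile through. Verify the key name against your CLI version:

    # Fragment of the source config, as the dict the recipe YAML maps to.
    profiling_config = {
        "profiling": {
            "enabled": True,
            # Skip distinct-count/cardinality-style metrics on problem tables.
            "turn_off_expensive_profiling_metrics": True,
        }
    }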
  • quick-megabyte-61846
    06/22/2022, 8:48 AM
    Hello, while trying to ingest a business glossary, source_ref and source_url are not reflected in the UI. Source yaml: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml Recipe yaml:
    source:
      type: datahub-business-glossary
      config:
        file: ./business_glossary/business_glossary.yml
    
    sink:
      type: datahub-rest
      config:
    server: http://localhost:8080
    Version: 0.8.38. PS: I added another screenshot with the desired effect.
  • abundant-receptionist-6114
    06/22/2022, 11:59 AM
    Hi, do you have any plans for updating SQLAlchemy in the Python package? It's critical for us: https://github.com/datahub-project/datahub/issues/4809
  • calm-dinner-63735
    06/22/2022, 12:18 PM
    Hi, I am getting this error while deploying DataHub in EKS.
  • calm-dinner-63735
    06/22/2022, 12:18 PM
    Hi, I am getting this error from the datahub-gms pod:

    2022/06/22 10:08:00 Connected to tcp://********.c6.kafka.eu-central-1.amazonaws.com:9092
    2022/06/22 10:08:00 Connected to tcp://********.eu-central-1.rds.amazonaws.com:3306
    2022/06/22 10:08:00 Connected to tcp://********.c6.kafka.eu-central-1.amazonaws.com:9092
    2022/06/22 10:08:00 Connected to tcp://********.********.c6.kafka.eu-central-1.amazonaws.com:9092
    2022/06/22 10:08:00 Received 200 from https://********.eu-central-1.es.amazonaws.com:443
    2022/06/22 10:08:01 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:02 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:03 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:04 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:05 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
  • witty-butcher-82399
    06/22/2022, 12:28 PM
    Hi! I would like to share a possible issue we have found when combining stateful ingestion and transformers. This is how you can reproduce it, and the evidence we have found along the way (a sketch of the repro recipe follows below):
    • Set up an ingestion recipe with stateful ingestion enabled for the source, plus any transform (e.g. the add_dataset_ownership transform).
    • Run the recipe adding some deny pattern (just to force the soft deletion of some dataset). At this point we have checked the logs, and the generated events look totally correct, with two events for the affected dataset: one from the soft-removal with Status(removed=True), and a second event for the Ownership aspect (nothing about the status in the second upsert).
    • The dataset is wrongly shown in the UI as a valid dataset (not soft-deleted). We have also checked the backend, and the dataset has Status(removed=False). So if the issue is not during the ingestion, it must be the backend deciding to enable the dataset again for some reason. Looking for something to support our assumption, we found this in the source code: https://github.com/datahub-project/datahub/blob/8c8f1b987a0c9fc29f4005aa8d132ad2550f3f05/metadata-io/src/main/java/com/linkedin/metadata/entity/EntityService.java#L1097 I could be wrong, but it looks like in some cases the backend decides to set the removal flag to false; it is as if it re-enables the dataset because other aspects are being updated. If that's true, and while it could make sense in some cases, it causes our simple use case to misbehave. WDYT? Could that be the root cause of the issue?
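    To make the repro concrete, here is a sketch of the kind of recipe described above, as a Python dict; the source type, patterns, and urns are placeholders, and stateful ingestion also needs a pipeline_name:

    recipe = {
        "pipeline_name": "stateful-repro",  # required for stateful ingestion
        "source": {
            "type": "postgres",  # placeholder source
            "config": {
                "host_port": "db.example.com:5432",
                "database": "analytics",
                "username": "datahub",
                "password": "***",
                # Second run: deny pattern forces soft-deletion of a dataset.
                "table_pattern": {"deny": [".*some_table.*"]},
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": True,
                },
            },
        },
        "transformers": [
            {
                "type": "simple_add_dataset_ownership",
                "config": {"owner_urns": ["urn:li:corpuser:datahub"]},
            }
        ],
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }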
  • brave-tomato-16287
    06/22/2022, 12:39 PM
    Hello! We've upgraded to version 0.8.38 but still receive this error.
    [2022-06-22 12:28:32,603] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.38\n'
               '1 validation error for DBTConfig\n'
               'test_results_path\n'
               '  extra fields not permitted (type=value_error.extra)\n',
    Should we wait for the next update?