# troubleshoot

    hallowed-shampoo-52722

    12/20/2022, 4:40 PM
    Hi Team. I am new to DataHub (literally 1 day old). I really like to explore new stuff, so I am starting with a self-managed DataHub instance. I followed the steps in the documentation to pull the DataHub Docker images, but the process has been stuck pulling the images since yesterday! Can someone help me move forward from this issue?

    bland-orange-13353

    12/20/2022, 4:40 PM
    Acryl Data delivers an easy to consume DataHub platform for the enterprise - sign up here: https://www.acryldata.io/sign-up

    nutritious-bird-77396

    12/20/2022, 9:53 PM
    @brainy-tent-14503 As discussed this morning in the office hours: I am using `v0.9.3` of datahub-gms, `v0.9.3` of datahub-frontend, and `0.0.8` of datahub-actions. I see the below message when I hit the datahub-frontend endpoint:
    `Failed to perform post authentication steps. Error message: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://datahub-gms-service>.<team>.svc.cluster.local:8080/aspects`
    Hitting the GMS health endpoint returns 200, and the datahub-frontend /admin endpoint returns `GOOD`.

    average-dinner-25106

    12/21/2022, 2:32 AM
    Hello. Our company is about to deploy DataHub for managing the metadata of our databases. Since we are pretty strict about security, we want to prevent staff who are just readers from ingesting databases on their own. In short, those who have no database-management authority should not be able to "create new source" under "Manage Ingestion". How can I do this?

    full-kite-21373

    12/21/2022, 9:54 AM
    Hi @square-activity-64562 .. While accessing the DataHub portal we are getting this error; any help here will be really appreciated: "Validation error (FieldUndefined@[me/CorpUser/settings/views]): Field 'views' in type 'CorpUserSettings' is undefined (code undefined)"

    full-kite-21373

    12/21/2022, 9:55 AM
    @square-activity-64562

    square-solstice-69079

    12/21/2022, 11:39 AM
    Hello, trying to upgrade to 0.9.4. I usually just upgrade with the commands below, running on a single EC2 instance. The last command sets the variables for AD login (OIDC). This time I'm getting the error in the attached picture.
    ```
    datahub docker quickstart
    docker-compose -p datahub -f docker-compose.yml -f docker-compose.yml up -d datahub-frontend-react
    ```

    square-solstice-69079

    12/21/2022, 11:41 AM
    image.png

    gentle-portugal-21014

    12/21/2022, 11:54 AM
    Hello, I noticed that when creating a user with an accented character in their name ("Š" in this particular case, but it's probably a general issue), the accented character isn't stored / displayed correctly. Does anybody have an idea what might be wrong / why this is happening? Accented characters seem to work correctly in other places (e.g. in descriptions).
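A common cause of this symptom is a charset mismatch between the application and the database, e.g. UTF-8 bytes written or read through a latin1 connection. A minimal Python sketch of the corruption mechanism (purely illustrative; the actual fix would be ensuring a utf8mb4 charset end to end in the MySQL setup, which is an assumption about this deployment):

```python
# "Š" encoded as UTF-8 but decoded as latin-1 (the classic charset mismatch)
# turns into two garbage characters -- the kind of corruption described above.
original = "Š"
mojibake = original.encode("utf-8").decode("latin-1")
print(repr(mojibake))  # 'Å\xa0' -- no longer the original character
assert mojibake != original
```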

    damp-greece-27806

    12/21/2022, 5:46 PM
    Hello again, we bumped from 0.9.3 to 0.9.4 as there were some bug fixes. Previously we weren't getting any logs from datahub-frontend, but that's resolved in 0.9.4. Now that we can see the log output, we see that the OIDC process wants a `code_verifier` parameter that we're not passing, and we're not seeing how we can pass it. One thing to note/check: datahub is using pac4j 4.5.7 (https://github.com/datahub-project/datahub/blob/master/build.gradle#L156), and pac4j recently released 5.1.1 with notes about fixing PKCE OIDC flow issues (https://www.pac4j.org/docs/release-notes.html), in case this problem comes from an underlying source. Any help you all can provide would be appreciated; our datahub cluster has been down for 2 weeks now because of this.
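For context on what the IdP is asking for: in the PKCE flow (RFC 7636) the client generates a random `code_verifier`, sends its hashed `code_challenge` with the authorization request, and must present the verifier again on the token exchange. A hedged Python sketch of generating the pair (illustrative only, not DataHub/pac4j code):

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate an RFC 7636 code_verifier and its S256 code_challenge."""
    # 32 random bytes, base64url-encoded without padding -> 43-char verifier
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # S256 method: challenge = BASE64URL(SHA256(verifier))
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode("ascii")).digest()
    ).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
```

The point of the flow is that an IdP enforcing PKCE will reject a token exchange that omits the verifier, which matches the error described above.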

    nutritious-bird-77396

    12/21/2022, 6:26 PM
    @brainy-tent-14503 Apologies for getting back on yesterday's issue again. Last evening I was able to log in to DataHub at `/login` using admin creds. But now it's throwing `Caused by: java.lang.RuntimeException: Failed to generate session token for user`. Error stack in 🧵

    bland-balloon-48379

    12/21/2022, 7:49 PM
    Hey everyone! Somewhat recently my team upgraded to datahub v0.9.3, but we've started seeing the following issues:
    • Attempting to view the lineage tab on a dataset results in a 503 error. However, when clicking "Visualize Lineage" the lineage visualization loads just fine. Maybe because these pages use two different GraphQL queries to pull data?
    • There are virtually no logs from the GMS containers. I can see the initial startup logs when the container comes online, but then there are no logs until the lineage tab is loaded a few times and `java.lang.OutOfMemoryError: Java heap space` occurs. When this happens the entire application freezes until those containers are manually restarted. I've tried tripling the RAM for these containers and saw no change.
    • When I create a new policy it says it was successfully created, but the new policy does not appear in the list, nor do its permissions take effect. When I check the MySQL database, though, I can see the rows for the newly created policy.
    I'm unsure if there are other issues; this is just what we've observed over the past few days. I downgraded one of our instances to v0.9.2 and all of these issues went away. Are there any thoughts on what may have changed between v0.9.2 and v0.9.3 that could have caused these issues? Also, I saw that v0.9.4 just recently came out. Is there reason to believe some of these issues may be resolved by upgrading to that version?

    bitter-waitress-17567

    12/22/2022, 5:44 AM
    Hi everyone. I am working on an Amplitude integration with DataHub and building datahub-web-react locally. During the build I am getting the below error.

    bitter-waitress-17567

    12/22/2022, 5:44 AM
    ```
    > Task :datahub-web-react:yarnBuild FAILED

    FAILURE: Build failed with an exception.

    * What went wrong:
    Execution failed for task ':datahub-web-react:yarnBuild'.
    > Process 'command 'yarn'' finished with non-zero exit value 1
    ```

    bitter-waitress-17567

    12/22/2022, 5:44 AM
    Does anyone know the reason for, and a fix to, the above problem?

    microscopic-mechanic-13766

    12/22/2022, 8:48 AM
    Morning, I am trying out the new version 0.9.4 for both GMS and frontend, but I keep getting errors. (Note that I am using the same configuration I used for previous versions, and it worked fine.) The GMS errors appear while trying to start the service: it can't connect to Kafka, so it keeps failing. The frontend error appears while trying to log in with an OIDC user.
    Front error.txt, GMS logs.txt

    thankful-fireman-70616

    12/22/2022, 11:12 AM
    Hi, I'm doing a fresh install for the first time on my local machine and I'm getting the below two errors while running `datahub docker quickstart`: • kafka-setup is still running • datahub-gms is running but not healthy. How shall I debug this? Can any expert help here?

    steep-midnight-37232

    12/22/2022, 1:57 PM
    Hi there, I'm facing an error with user roles. I'm on version v0.9.3 and we have SSO authentication. The problem is that if I change the role of a new user (by default they get "No Role") to "Editor" or "Reader" and then refresh the page, the new role is gone; the user appears with "No Role" again. Any suggestion? Thanks

    brainy-piano-85560

    12/22/2022, 3:09 PM
    Hi guys, using DH 0.9.3.2. Just ran a Postgres ingestion. In the middle of the ingestion run, it fails with a weird error:
    ```
    '/usr/local/bin/run_ingest.sh: line 40:   467 Killed                  ( datahub ${debug_option} ingest run -c "${recipe_file}" '
               '${report_option} )\n',
               "2022-12-22 14:43:24.766921 [exec_id=b0e5bc27-9d53-46e6-b866-f388b73cac96] INFO: Failed to execute 'datahub ingest'",
               '2022-12-22 14:43:24.771317 [exec_id=b0e5bc27-9d53-46e6-b866-f388b73cac96] INFO: Caught exception EXECUTING '
               'task_id=b0e5bc27-9d53-46e6-b866-f388b73cac96, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    ```
    Execution finished with errors.
    The metadata ingestion worked well: all tables and columns were ingested. Some of the profiling works as well (the tables where profiling finished show row counts etc.). Any thoughts on how I can understand what the problem was? Thank you.

    rough-flag-51828

    12/23/2022, 9:07 AM
    While creating a data source I got this error; maybe someone could help? Version: 0.9.0
    ```
    RUN_INGEST - {'errors': [],
     'exec_id': '5aa6b080-4493-450a-ad72-dc7a7ab9e9ee',
     'infos': ['2022-12-23 09:06:34.833273 [exec_id=5aa6b080-4493-450a-ad72-dc7a7ab9e9ee] INFO: Starting execution for task with name=RUN_INGEST',
               '2022-12-23 09:06:34.834367 [exec_id=5aa6b080-4493-450a-ad72-dc7a7ab9e9ee] INFO: Caught exception EXECUTING '
               'task_id=5aa6b080-4493-450a-ad72-dc7a7ab9e9ee, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 73, in execute\n'
               '    SubProcessTaskUtil._write_recipe_to_file(exec_out_dir, file_name, recipe)\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_task_common.py", line 105, in '
               '_write_recipe_to_file\n'
               '    os.makedirs(dir_path, mode = 0o777, exist_ok = True)\n'
               '  File "/usr/local/lib/python3.10/os.py", line 215, in makedirs\n'
               '    makedirs(head, exist_ok=exist_ok)\n'
               '  File "/usr/local/lib/python3.10/os.py", line 225, in makedirs\n'
               '    mkdir(name, mode)\n'
               "PermissionError: [Errno 13] Permission denied: '/tmp/datahub/ingest'\n"]}
    ```
    Execution finished with errors.

    witty-butcher-82399

    12/23/2022, 10:29 AM
    Hi! I’m checking schema resolution for Kafka topics. According to the code, the `TopicName` and `TopicRecordName` strategies are covered (the second in a sort of best-effort mode). However, when testing, we found that we miss many schemas that follow the `TopicRecordName` strategy. We had a look at the code here https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/confluent_schema_registry.py#L72-L78
    ```
    # Subject name format when the schema registry subject name strategy is
    #  (a) TopicNameStrategy (default strategy): <topic name>-<key/value>
    #  (b) TopicRecordNameStrategy: <topic name>-<fully-qualified record name>-<key/value>
    for subject in self.known_schema_registry_subjects:
        if subject.startswith(topic) and subject.endswith(subject_key_suffix):
            return subject
    return None
    ```
    and we think there is a mistake. The code assumes the `-(key|value)` suffix also exists for the `TopicRecordName` strategy, but this is not true; we have plenty of topics with the format `<topic name>-<fully-qualified record name>` for the `TopicRecordName` strategy. Is there any serializer supporting the `<topic name>-<fully-qualified record name>-<key/value>` format? For example, the Confluent Avro serializer doesn’t: https://docs.confluent.io/platform/6.1.2/clients/confluent-kafka-python/html/_modules/confluent_kafka/schema_registry/avro.html As a mitigation, schema resolution could first try an exact match for `TopicNameStrategy` and, failing that, do a search with only `startswith` (no `endswith`). But I think that may bring many false positives, with no way to differentiate key and value schemas. So from my understanding, fetching schemas only works for the `TopicName` strategy. I’m raising the concern here in case I’m missing something, or in case someone else has an idea to fix the `TopicRecordName` strategy. Thanks! 🧵
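The mitigation described above (exact `TopicNameStrategy` match first, prefix search as a fallback) can be sketched as follows. This is a standalone illustration, not the actual `confluent_schema_registry.py` code, and the caveat from the message applies: a bare prefix search cannot distinguish key from value schemas, so this sketch only accepts an unambiguous single hit.

```python
from typing import Optional

def resolve_subject(topic: str, known_subjects: list[str], key: bool = False) -> Optional[str]:
    # 1) Exact TopicNameStrategy match: <topic>-<key/value>
    exact = f"{topic}-{'key' if key else 'value'}"
    if exact in known_subjects:
        return exact
    # 2) Fallback for TopicRecordNameStrategy: <topic>-<fully-qualified record name>,
    #    which carries no -key/-value suffix. Only accept an unambiguous hit,
    #    since multiple matches could mix key and value schemas.
    candidates = [s for s in known_subjects if s.startswith(topic + "-")]
    return candidates[0] if len(candidates) == 1 else None

subjects = ["orders-value", "orders-key", "payments-com.example.Payment"]
print(resolve_subject("orders", subjects))    # orders-value
print(resolve_subject("payments", subjects))  # payments-com.example.Payment
```

Note the trade-off: the fallback trades the false positives mentioned above for missed matches whenever more than one subject shares the topic prefix.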

    fresh-rocket-98009

    12/23/2022, 11:18 AM
    Hi all! Is there any possibility to get newly added entities, so that I could check them and ping owners to document them?

    faint-actor-78390

    12/23/2022, 1:03 PM
    Hi all, trying to deploy on GKE with the Helm chart. At the prerequisites step, all the elasticsearch pods stay in Pending (45' wait). Is this an issue?
    elasticsearch-master-0 0/1 Pending 0 63m
    elasticsearch-master-1 0/1 Pending 0 63m
    elasticsearch-master-2 0/1 Pending 0 63m
    prerequisites-cp-schema-registry-7cc6786995-jg8mg 2/2 Running 0 63m
    Best Regards

    handsome-football-66174

    12/23/2022, 4:02 PM
    Hi everyone, trying to install the S3 source with `pip install 'acryl-datahub[s3]==0.9.3' --constraint "${constraint_url}"` (the constraint URL is for Airflow: https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt). But I'm unable to, due to a pyspark dependency conflict, so I raised a bug: https://github.com/datahub-project/datahub/issues/6852 Error:
    ```
    ERROR: Cannot install acryl-datahub[s3]==0.9.4 because these package versions have conflicting dependencies.

    The conflict is caused by:
        acryl-datahub[s3] 0.9.4 depends on pyspark==3.0.3; extra == "s3"
        The user requested (constraint) pyspark==3.2.1

    To fix this you could try to:
    1. loosen the range of package versions you've specified
    2. remove package versions to allow pip attempt to solve the dependency conflict
    ```
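The error above is a straight pin conflict: the `s3` extra requires `pyspark==3.0.3` while the Airflow constraints file pins `pyspark==3.2.1`, and no single version satisfies both. A toy sketch of the check pip is performing (a simplification of its resolver, not pip internals; installing the DataHub CLI into its own virtualenv, outside the Airflow constraints, is one common way around such clashes):

```python
def find_conflicts(pins: list[str]) -> dict[str, set[str]]:
    """Group '==' pins by package and report packages pinned to more than one version."""
    by_pkg: dict[str, set[str]] = {}
    for pin in pins:
        name, _, version = pin.partition("==")
        by_pkg.setdefault(name, set()).add(version)
    return {name: vs for name, vs in by_pkg.items() if len(vs) > 1}

# The s3 extra pins pyspark==3.0.3; the Airflow constraints file pins pyspark==3.2.1:
# find_conflicts reports pyspark pinned to two incompatible versions.
print(find_conflicts(["pyspark==3.0.3", "pyspark==3.2.1"]))
```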

    damp-greece-27806

    12/23/2022, 6:03 PM
    hi, I’ve gotten past the SSO issues (thanks!), but on to the next problem. Basically datahub-frontend receives 503s when it tries to talk to GMS, and GMS is unavailable because of this cryptic error:
    ```
    18:01:33 [pool-7-thread-1] ERROR c.d.authorization.DataHubAuthorizer - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
    com.datahub.util.exception.ESQueryException: Search query failed:
    	at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:73)
    	at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:100)
    	at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:97)
    	at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:300)
    	at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:50)
    	at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:42)
    	at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:223)
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
    	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:857)
    	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:259)
    	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
    	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
    	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
    	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
    	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
    	at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:60)
    	... 12 common frames omitted
    Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
    	at org.apache.http.util.Asserts.check(Asserts.java:46)
    	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
    	at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
    	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:255)
    	... 18 common frames omitted
    ```
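The `I/O reactor status: STOPPED` error means the Elasticsearch REST client inside GMS has shut down, typically after the cluster became unreachable, so probing Elasticsearch directly is a reasonable first step. A hedged Python sketch (the host/port are placeholders; point them at whatever GMS's Elasticsearch connection settings reference):

```python
import json
import urllib.request

def es_health_url(host: str, port: int = 9200) -> str:
    """Build the Elasticsearch cluster-health endpoint URL."""
    return f"http://{host}:{port}/_cluster/health"

def check_es(host: str, port: int = 9200) -> str:
    """Return the cluster status ('green', 'yellow' or 'red'), or raise if unreachable."""
    with urllib.request.urlopen(es_health_url(host, port), timeout=5) as resp:
        return json.load(resp)["status"]
```

For example, run `check_es("localhost")` from a host that can reach the cluster; if this raises or reports `red`, GMS's search-backed features (including the policy cache in the log above) will keep failing until Elasticsearch recovers and GMS is restarted.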

    bitter-waitress-17567

    12/26/2022, 7:16 AM
    hi everyone. We are integrating Amplitude with DataHub. Locally we have replaced the frontend web-react container and are getting the below error while accessing the UI:

    bitter-waitress-17567

    12/26/2022, 7:16 AM
    ```
    Validation error (FieldUndefined@[appConfig/viewsConfig]) : Field 'viewsConfig' in type 'AppConfig' is undefined (code undefined)
    ```

    bitter-waitress-17567

    12/26/2022, 7:17 AM
    Below is the analytics.ts file:

    bitter-waitress-17567

    12/26/2022, 7:17 AM
    ```
    const config: any = {
        amplitude: {
            apiKey: 'xxxxxxxxxxxxxxxxxxxx',
        },
    };

    export default config;
    ```

    bitter-waitress-17567

    12/26/2022, 7:17 AM
    Can someone help me debug this issue?