# troubleshoot

    great-breakfast-81245

    06/09/2022, 3:17 PM
Hi - I'm trying to get DataHub running on my M1 Mac - I have it running fine on an x86 Mac - but quickstart fails. Looking through the posts and GitHub, it looks like there has been some good work to enable the M1, so I'm wondering if there are some additional steps that I'm missing?

    wonderful-dream-38059

    06/09/2022, 3:21 PM
    Afternoon team - could someone give me some pointers on debugging something? I'm getting an error on the web UI of
    Failed to perform post authentication steps. Error message: Failed to provision user with urn urn:li:corpuser:alan.
    We get the same error message for all users (clearly referencing their user rather than mine) - and it's always on the callback url (
    /callback/oidc
    ). This deployment worked fine last week, and no configuration has changed to my knowledge. We're still testing data loading so not many users are using the deployment and we're loading all kinds of stuff and that likely has changed. As context, we use Google OAuth to do auth (and that also used to work fine). I've tried killing pods and forcing them to restart. Short of wiping all our loaded data, any other ideas on how to understand what's going on and how to get datahub out of this state?

    echoing-alligator-70530

    06/09/2022, 4:02 PM
Hello everyone, I am trying to ingest dbt into DataHub, and it succeeds. But if a dbt model points to a schema in Snowflake that no longer exists, the dbt source seems to create that table anyway. Is there a way to prevent this?

    nutritious-bird-77396

    06/09/2022, 6:58 PM
Error when trying to run
datahub-upgrade
After setting all the envs locally as in https://github.com/datahub-project/datahub/blob/master/docker/datahub-upgrade/env/docker-without-neo4j.env and successfully building with
./gradlew :datahub-upgrade:build
executing
java -jar datahub-upgrade/build/libs/datahub-upgrade.jar -u RestoreIndices
throws errors. Details of the error in đŸ§” Any help on this would be great!

    hallowed-machine-2603

    06/10/2022, 2:22 AM
Hi guys! I have a problem with my DataHub connection: I can't access the DataHub login page, and I saw the error message below. I use Ubuntu 20.04 in a VM (on cloud), Python version 3.8.10, DataHub CLI version 0.8.36.
Copy code
2022/06/09 09:19:20 Problem with request: Get "http://datahub-gms:8080/health": dial tcp xxx.xx.x.x:8080: connect: connection refused. Sleeping 1s
2022/06/09 09:19:21 Problem with request: Get "http://datahub-gms:8080/health": dial tcp xxx.xx.x.x:8080: connect: connection refused. Sleeping 1s
2022/06/09 09:19:22 Received 200 from http://datahub-gms:8080/health
[2022-06-09 09:19:25,970] INFO {datahub_actions.cli.actions:68} - DataHub Actions version: unavailable (installed editable via git)
[2022-06-09 09:19:26,432] INFO {datahub_actions.cli.actions:98} - Action Pipeline with name 'ingestion_executor' is now running.

    ancient-pillow-45716

    06/10/2022, 3:37 AM
Hi Team! After updating DataHub to version 0.8.36 I have a problem: DataHub is unavailable. I can log in to DataHub, but other functions don't work. Can you help me resolve this problem?

    modern-laptop-12942

    06/12/2022, 10:36 PM
Hi team! I'm trying to get top_n_queries with source type snowflake-usage, but this query "select min(query_start_time) as min_time, max(query_start_time) as max_time from snowflake.account_usage.access_history" spends too much time.
[2022-06-12 18:31:07,518] INFO {datahub.ingestion.source.usage.snowflake_usage:438} - Checking usage date ranges
Here is my recipe:
Copy code
top_n_queries: 5
start_time: '2022-06-9T00:00Z'
end_time: '2022-06-12T00:00Z'
database_pattern:
  allow:
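For reference, these options normally nest under source.config in a full snowflake-usage recipe. A sketch of what that looks like — field names follow the DataHub snowflake-usage source docs of that era, and the account, credentials, and patterns here are placeholders:

```yaml
source:
  type: snowflake-usage
  config:
    host_port: my_account        # placeholder Snowflake account
    warehouse: COMPUTE_WH        # placeholder
    username: datahub_user       # placeholder
    password: "${SNOWFLAKE_PASS}"
    top_n_queries: 5
    start_time: '2022-06-09T00:00Z'
    end_time: '2022-06-12T00:00Z'
    database_pattern:
      allow:
        - '^MY_DB$'              # placeholder pattern
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'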

    wonderful-egg-79350

    06/13/2022, 2:28 AM
Hi team! I am curious about what the 'queries' function is in charge of.

    numerous-account-62719

    06/13/2022, 4:40 AM
Hi Team, I have used Kubernetes to deploy DataHub. Can anyone please tell me exactly which pod stores the data that is ingested? For example: Elasticsearch, Kafka, datahub-acryl-actions, datahub-frontend, etc. Can someone please give me the location of the dataset inside the pod?

    polite-orange-57255

    06/13/2022, 6:23 AM
Hi team, we are not able to generate a personal access token in the new version of DataHub, even via the root user.

    quick-megabyte-61846

    06/13/2022, 8:01 AM
Hello, while trying to ingest data with a Great Expectations checkpoint I'm getting an error like the one below:
Copy code
Unable to emit metadata to DataHub GMS
My GMS version is 0.2.2, deployed with the Helm chart on GKE.

    acceptable-judge-21659

    06/13/2022, 9:41 AM
Hi everyone, I can't build DataHub anymore since 0.8.36. It seems the pip dependency resolver can't find a solution for the dependencies in metadata-ingestion's installDev. I tried adding version constraints in setup.py to reduce resolution time, but I had other problems.
    Copy code
    bigquery_common = {
        # Google cloud logging library
        "google-cloud-logging>=3.0.0",
        "google-cloud-bigquery>=3.0.0",
        "more-itertools>=8.12.0",
    }
Does anyone have an idea about it?

    millions-waiter-49836

    06/13/2022, 5:08 PM
Hi everyone, I'm trying to upgrade the DataHub frontend to v0.8.38 using a Dockerfile. When Docker runs this line, it throws an error (see thread):
    Copy code
    RUN cd datahub-src && git fetch --all && \
        git checkout tags/v${DATAHUB_VERSION} -b v${DATAHUB_VERSION} && \
        ./gradlew :datahub-web-react:build -x test -x yarnTest -x yarnLint && \
        ./gradlew :datahub-frontend:dist -PenableEmber=${ENABLE_EMBER} -PuseSystemNode=${USE_SYSTEM_NODE} -x test -x yarnTest -x yarnLint

    shy-parrot-64120

    06/13/2022, 6:28 PM
Hi all, thanks for the outstanding 0.8.38 release! After migrating 0.8.36 -> 0.8.38, we are faced with anonymous login when
AUTH_JAAS_ENABLED = false
(on the frontend container). The application UI redirects to the login page, though it shouldn't. Can someone point me to what went wrong?

    nutritious-bird-77396

    06/13/2022, 7:29 PM
    @big-carpet-38439 Facing issues in Actions container due to the schema change as part of this PR - https://github.com/datahub-project/datahub/commit/1a31f7888adaef954a066d62c3aa7b21ac7be7ed Error logs in thread đŸ§”

    echoing-pillow-41000

    06/13/2022, 10:09 PM
I just set up DataHub in our OpenShift cluster and tried to create 2 ingestion sources (MongoDB and PostgreSQL), and both of them error out here:
Copy code
ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d5efda220>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.38 at /tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/lib/python3.9/site-packages/datahub/__init__.py
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-0cd8b528-5f0d-4489-a7b5-c91393ca674a/bin/python3 on Linux-4.18.0-305.25.1.el8_4.x86_64-x86_64-with-glibc2.31
[2022-06-13 22:02:47,811] INFO {datahub.entrypoints:182} - GMS config {}
2022-06-13 22:02:48.526584 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Failed to execute 'datahub ingest'
2022-06-13 22:02:48.527164 [exec_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a] INFO: Caught exception EXECUTING task_id=0cd8b528-5f0d-4489-a7b5-c91393ca674a, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
I don't see localhost:8080 anywhere in my original values.yaml Helm chart config, and I configured both ingestion sources from the UI. Any thoughts on what is going on? BTW, I am using a proxy, so I had to set PIP_PROXY on the acryl-datahub-actions pod.

    flaky-lawyer-33693

    06/13/2022, 11:46 PM
Hi, I'm starting with DataHub (local virtualenv with the
datahub docker quickstart
command). Everything looks OK, but trying to ingest from a PostgreSQL instance I always get this message:
    Copy code
    '[2022-06-13 23:33:03,897] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.38\n'
               '[2022-06-13 23:33:04,025] ERROR    {datahub.entrypoints:165} - Unable to connect to <http://datahub-gms:8080/api/gms/config> with '
               'status_code: 404. Please check your configuration and make sure you are talking to the DataHub GMS (usually <datahub-gms-host>:8080) or '
               'Frontend GMS API (usually <frontend>:9002/api/gms).\n'
               '[2022-06-13 23:33:04,026] INFO     {datahub.entrypoints:176} - DataHub CLI version: 0.8.38 at '
               '/tmp/datahub/ingest/venv-9d7eb776-4a0a-4901-a766-a01e0d7d6737/lib/python3.9/site-packages/datahub/__init__.py\n'
               '[2022-06-13 23:33:04,026] INFO     {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) \n'
It seems that datahub-gms is up but the /api context is not deployed. However, looking at the container, Jetty is up with the datahub-gms war deployed; following is the
top
entry:
    Copy code
    19     1 datahub  S    2731m  17%   3   0% java -Xms1g -Xmx1g -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
    Can you please point me to any direction to get this working?

    helpful-painting-48754

    06/14/2022, 7:28 AM
    Hi all, I tried to enable data profiling while ingesting from TiDB. I got an error saying that
    Copy code
    ['Profiling exception 5:00:00 is of type timedelta which cannot serialized.'],\n"
    Is this due to the data type not being compatible?
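The error hints at the underlying mechanics: Python's json module cannot serialize datetime.timedelta values (a TIME/interval value profiled as 5:00:00 comes back as a timedelta), so such values must be converted before being emitted. A generic illustration of the failure and one workaround — this is not DataHub's actual profiler code:

```python
import json
from datetime import timedelta

value = timedelta(hours=5)  # e.g. a TIME column value rendered as 5:00:00

# json.dumps raises TypeError on timedelta out of the box
try:
    json.dumps({"profile": value})
except TypeError as exc:
    print(f"not serializable: {exc}")

def to_jsonable(v):
    """Convert timedelta values to total seconds; pass other values through."""
    return v.total_seconds() if isinstance(v, timedelta) else v

print(json.dumps({"profile": to_jsonable(value)}))  # {"profile": 18000.0}
```

If the profiler itself can't be changed, the usual workaround on the recipe side is to restrict profiling so the offending column or type is skipped (the exact options depend on the source version).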

    plain-napkin-77279

    06/14/2022, 10:16 AM
Hello team, I am using version 0.8.36 of DataHub, and I am having a problem with Analytics. I got this error after a fresh installation, and even after ingesting my metadata into DataHub; it has happened from the beginning. These are my GMS logs; any suggestions, please?

    swift-breakfast-25077

    06/14/2022, 11:37 AM
Hi everyone, I have installed DataHub with the quickstart guide and I want to configure Google OIDC authentication. For this I added the configuration to docker-compose-without-neo4j.quickstart.yml. When I go to localhost:9200 I see the list of emails; however, when I choose an email to authenticate, it leads me to the error below:

    gentle-camera-33498

    06/14/2022, 8:45 PM
Hello, I'm having problems with the GMS service. Can someone help me?

    shy-ability-95880

    06/15/2022, 7:58 AM
Hi, sorry, I'm new to DataHub. I'm trying to connect Superset, but I am getting this error and I'm not sure how to proceed from here.
    Copy code
    "SSLError: HTTPSConnectionPool(host='superset.dev.baseline.aliyun.sdecloud.tech', port=443): Max retries exceeded with url: "
               "/api/v1/security/login (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self "
               "signed certificate (_ssl.c:1129)')))\n"

    quaint-potato-16935

    06/15/2022, 8:07 AM
Hello, I am starting DataHub with the command below: python3 -m datahub docker quickstart. But it is failing with the error below. Can anybody help me?

    bumpy-activity-74405

    06/15/2022, 10:00 AM
Cautionary tale about case sensitivity of urns in MySQL: GMS allows you to ingest multiple datajob urns that are the same apart from case. For example,
urn:li:dataJob:(urn:li:dataFlow:(aaa,bbb,PROD),SomeName)
and
urn:li:dataJob:(urn:li:dataFlow:(aaa,bbb,PROD),SOMENAME)
will be stored as separate rows in MySQL. But when you look at a table in the UI that has this datajob in its downstream lineage, you're going to get an error (see thread).
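For anyone wanting to catch this before it reaches GMS: MySQL's default collations for VARCHAR are case-insensitive, so urns differing only in case collide at the SQL layer while being distinct entities elsewhere. A small pre-ingestion check (a hypothetical helper, not part of the DataHub CLI) could flag such pairs:

```python
from collections import defaultdict

def find_case_collisions(urns):
    """Group urns that are identical apart from letter case."""
    groups = defaultdict(list)
    for urn in urns:
        groups[urn.lower()].append(urn)
    return [group for group in groups.values() if len(group) > 1]

urns = [
    "urn:li:dataJob:(urn:li:dataFlow:(aaa,bbb,PROD),SomeName)",
    "urn:li:dataJob:(urn:li:dataFlow:(aaa,bbb,PROD),SOMENAME)",
    "urn:li:dataJob:(urn:li:dataFlow:(aaa,bbb,PROD),Other)",
]
# The first two urns collide; the third stands alone
print(find_case_collisions(urns))
```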

    busy-dusk-4970

    06/15/2022, 10:13 AM
I'm having issues trying to build locally on an M1 Mac. Any ideas on how to resolve this? 🙏

    quick-pizza-8906

    06/15/2022, 1:49 PM
Hello everybody, I'm struggling to retrieve particular information from the DataHub GraphQL API: I want to get the list of platform instances for a given platform. Is this possible, or am I missing something? Also, is it possible to filter the list of datasets by their platform instance? If both are impossible, would this be a feature that would be merged if implemented, even if it required changes to the ES/Neo4j indexes?
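For reference, the closest existing hook is the search API's facet filters, e.g. filtering dataset search by platform urn. Whether a platform-instance facet is populated depends on the DataHub version. A sketch only — the field and input names here are assumptions against the 2022-era GraphQL schema, not verified:

```graphql
query filterByPlatform {
  search(input: {
    type: DATASET
    query: "*"
    start: 0
    count: 10
    filters: [{ field: "platform", value: "urn:li:dataPlatform:snowflake" }]
  }) {
    total
    searchResults {
      entity { urn }
    }
  }
}
```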

    chilly-knife-8692

    06/15/2022, 4:17 PM
Hi there, I have a question about the documentation. Specifically, I am trying to follow the instructions for removing the Neo4j dependency (because apparently there is no image available for the new M1 MacBooks). The documentation for deploying DataHub on Kubernetes states: "The dependencies must be deployed before deploying Datahub. We created a separate chart for deploying the dependencies with example configuration. They could also be deployed separately on-prem or leveraged as managed services. To remove your dependency on Neo4j, set enabled to false in the values.yaml for prerequisites. Then, override the
graph_service_impl
field in the values.yaml of datahub instead of
neo4j
." I was wondering what exactly I have to do to "override `graph_service_impl`" in the values.yaml file of datahub. Could you help me out here? Thank you so much 😇
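As a concrete sketch of what that docs passage means: the datahub chart exposes a global.graph_service_impl key, and switching it from neo4j to elasticsearch (plus disabling Neo4j in the prerequisites chart) removes the dependency. Key names below are from the datahub-helm charts of that era — double-check them against your chart version:

```yaml
# values.yaml of the datahub chart
global:
  graph_service_impl: elasticsearch   # instead of the default neo4j

# values.yaml of the prerequisites chart
neo4j-community:
  enabled: false
```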

    gentle-camera-33498

    06/15/2022, 5:47 PM
Hello guys, I'm having problems with the MAE consumer. Can someone help me? Short details:
    Copy code
    ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Consumer exception
    java.lang.IllegalStateException: This error handler cannot process 'SerializationException's directly; please consider configuring an 'ErrorHandlingDeserializer' in the value and/or key deserializer
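The exception message itself names the standard Spring Kafka remedy: wrap the real deserializer in an ErrorHandlingDeserializer so that records which fail (e.g. Avro) deserialization go to the container's error handler instead of crashing the listener loop. In Spring Boot properties form it looks roughly like this — the delegate classes shown assume a Confluent Avro consumer, and whether you can inject these into the MAE consumer depends on your deployment:

```properties
spring.kafka.consumer.key-deserializer=org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
spring.kafka.consumer.properties.spring.deserializer.key.delegate.class=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.properties.spring.deserializer.value.delegate.class=io.confluent.kafka.serializers.KafkaAvroDeserializer
```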

    echoing-alligator-70530

    06/15/2022, 6:17 PM
Hi there, I was wondering how the platform instance config works with dbt, as it doesn't behave the same way Snowflake and MySQL do. I ingested dbt data with the config into DataHub, but I do not see the change.

    wide-whale-11635

    06/16/2022, 4:56 AM
Hello, I have integrated my DataHub instance with Looker. All the dashboards and charts from Looker are now available on DataHub as expected. Checking the lineage of charts in the DataHub demo project, I can see the main source icon for the charts; for instance, this link shows the source as Snowflake, along with its icon. But in my implementation I can only see the source dataset name, not the actual source name. Any help on how to troubleshoot this? On further debugging, I see that in the demo project the source dataset is registered as a table, while in my implementation the dataset is registered as a view. Can this be the reason? Can anyone help me achieve the same as the demo?