# troubleshoot
  • p

    powerful-shampoo-81990

    05/16/2023, 2:49 PM
    Hello DataHub team, can we sync account-level users and groups from Azure AD using SCIM?
  • c

    cuddly-butcher-39945

    05/16/2023, 2:59 PM
    Hey everyone! I am having issues ingesting dbt Cloud sources. dbt Cloud environment: single-tenant, connecting from DH 10.1. Error in DataHub:
    Copy code
    ~~~~ Execution Summary - RUN_INGEST ~~~~
    Execution finished with errors.
    {'exec_id': '39e4de59-eb3e-4739-b024-c51ea5c76fbe',
     'infos': ['2023-05-16 01:00:00.066911 INFO: Starting execution for task with name=RUN_INGEST',
               '2023-05-16 01:00:00.067371 INFO: Caught exception EXECUTING task_id=39e4de59-eb3e-4739-b024-c51ea5c76fbe, name=RUN_INGEST, '
               'stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 112, in execute_task\n'
               '    task_event_loop = asyncio.new_event_loop()\n'
               '  File "/usr/local/lib/python3.10/asyncio/events.py", line 783, in new_event_loop\n'
               '    return get_event_loop_policy().new_event_loop()\n'
               '  File "/usr/local/lib/python3.10/asyncio/events.py", line 673, in new_event_loop\n'
               '    return self._loop_factory()\n'
               '  File "/usr/local/lib/python3.10/asyncio/unix_events.py", line 64, in __init__\n'
               '    super().__init__(selector)\n'
               '  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 53, in __init__\n'
               '    selector = selectors.DefaultSelector()\n'
               '  File "/usr/local/lib/python3.10/selectors.py", line 350, in __init__\n'
               'OSError: [Errno 24] Too many open files\n'],
     'errors': []}
    
    ~~~~ Ingestion Logs ~~~~
    My ingestion configuration...
    Copy code
    source:
        type: dbt-cloud
        config:
            max_threads: 1
            metadata_endpoint: 'https://my-metadata-cloud-.com/graphql'
            project_id: '3'
            job_id: '82'
            target_platform: snowflake
            stateful_ingestion:
                enabled: true
            account_id: '9999'
            token: MYDBTToken
    Failed to configure the source (dbt-cloud): 1 validation error for DBTCloudConfig
    max_threads
      extra fields not permitted (type=value_error.extra)
    Whenever I don't specify max_threads the ingestion works, but I keep hitting the file handle leak shown above. I added max_threads: 1, but that config does not work either. @gray-shoe-75895 @big-carpet-38439, we looked at this last week for an on-prem dbt fix (essentially telling DataHub to run the dbt ingestion single-threaded), but it is not working for dbt Cloud. Any help would be appreciated! Thanks
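    Side note: since the traceback bottoms out in OSError: [Errno 24] Too many open files, it may be worth checking the file-descriptor limit of the environment that runs the ingestion executor. Below is a minimal sketch using only the Python standard library; whether raising the limit is appropriate for your container is an assumption, and the underlying handle leak may still need a real fix.
    Copy code
    import resource

    # Inspect the current file-descriptor limits of this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft}, hard={hard}")

    # Raise the soft limit up to the hard limit for this process only;
    # the container-level limit (e.g. docker --ulimit nofile=...) may also need raising.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"after raise: soft={soft}, hard={hard}")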
  • g

    glamorous-easter-30119

    05/16/2023, 3:28 PM
    Hello folks, morning to you all! DataHub Quickstart is failing with: "dependency failed to start: container broker is unhealthy". 11 out of 12 containers start just fine; the broker stays unhealthy. Any help or quick fix is appreciated while I go and troubleshoot this. TIA!
  • g

    gentle-camera-33498

    05/16/2023, 4:55 PM
    Hello everyone, do you know what can cause GMS to process MAEs so slowly? I cleaned my Elasticsearch and ran the reindex job. There are 64994 MAE rows. More than 1 hour later I still can't see all metadata in the front end. In previous versions GMS could process this volume very fast. Deployment details: environment: Kubernetes; DataHub version: 0.10.2; GMS replicas: 1; standalone consumers: false
  • s

    steep-alligator-93593

    05/16/2023, 10:31 PM
    Hi everyone, my datahub-gms pod is constantly filling up its ephemeral storage, and I get an error on my kube pod saying
    Copy code
    The node was low on resource: ephemeral-storage. Container datahub-gms was using 8220940Ki, which exceeds its request of 0.
    Pod The node had condition: [DiskPressure].
    Any ideas on where GMS stores its data, and why this could be happening?
  • b

    better-spoon-77762

    05/16/2023, 10:50 PM
    Hello, did anyone try building using the datahub-frontend Dockerfile (https://github.com/datahub-project/datahub/blob/master/docker/datahub-frontend/Dockerfile)? It's unclear where it gets the datahub-frontend.zip file.
  • i

    important-night-50346

    05/17/2023, 3:15 AM
    Hi. Is anyone else having difficulties with
    searchAcrossEntities
    queries for entity counts < 10000? The issue below has existed for a long time already; it was present in 0.9.5 and is still there in 0.10.2. I ingested exactly 5718 entities into DataHub running in quickstart mode and ran a very simple query, which emulates what the UI does:
    Copy code
    query getAllEntities {
      searchAcrossEntities(
        input: {
          start: 4000
          count: 10
          query: ""
        }
      ) {
        start
        count
        total
        searchResults {
          entity {
            urn
          }
        }
      }
    }
    It fails with
    Copy code
    {
      "servlet": "apiServlet",
      "message": "Service Unavailable",
      "url": "/api/graphql",
      "status": "503"
    }
    It does not look like an issue with the 10k-entity limit in Elasticsearch; more likely it is caused by some sort of timeout - it always runs for 30s and then fails. I also tried to run a query with start=0 and count=5718, and it also fails after 30s, which makes me think it is a timeout issue. The very same
    searchAcrossEntities
    query is used in the UI and causes an error there as well. Any thoughts or suggestions on how to fix it?
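    To double-check the timeout theory, a small sketch like the one below (the quickstart URL and the access token are assumptions) sends the same document to /api/graphql and prints the elapsed time and status; if it fails right around the 30s mark regardless of start/count, a server-side timeout rather than the 10k window is the likely culprit.
    Copy code
    import time
    import requests

    URL = "http://localhost:9002/api/graphql"  # quickstart frontend; adjust for your deployment
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

    DOCUMENT = """
    query getAllEntities {
      searchAcrossEntities(input: { start: 4000, count: 10, query: "" }) {
        start
        count
        total
        searchResults { entity { urn } }
      }
    }
    """

    started = time.monotonic()
    resp = requests.post(URL, json={"query": DOCUMENT}, headers=HEADERS, timeout=120)
    elapsed = time.monotonic() - started
    print(f"status={resp.status_code} elapsed={elapsed:.1f}s")
    print(resp.text[:500])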
  • p

    purple-balloon-66501

    05/17/2023, 10:01 AM
    Hi, I need some help. When I add a glossary term it is not applied immediately even though the frontend says "OK"; it only appears in the frontend after some time (1-2 hours). I don't see any errors in any component. How can I troubleshoot this?
  • s

    stale-architect-93411

    05/17/2023, 12:58 PM
    Hi everyone, I'm not sure I understand how the Spark integration works with Databricks. I followed the documentation, but the workflow fails with a
    NullPointerException
    . On #integration-databricks-datahub I found a specific jar for Databricks (https://datahubspace.slack.com/archives/C033H1QJ28Y/p1646937756282179), but it's more than a year old. With this jar my workflow succeeds and I find data in DataHub, however the lineage is empty: there are no references to input/output data (more info in thread). Has anyone managed to get lineage with Spark on Databricks? 😕
  • f

    full-shoe-73099

    05/17/2023, 1:54 PM
    Hi all! I'm trying to send a GraphQL request to search for datasets and I get a 503 error 😞 It works for 3000 records, but I need more. The query:
    Copy code
    {
      search(input: { type: DATASET, query: "hasOwners:True", start: 0, count: 7000 }) {
        start
        count
        total
        searchResults {
          entity {
            urn
            type
            ... on Dataset {
              ownership {
                owners {
                  type
                  owner {
                    ... on CorpUser { username }
                  }
                }
              }
            }
          }
        }
      }
    }
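    Since the same query succeeds for 3000 records, one workaround is to page through the results in smaller batches instead of requesting 7000 rows in a single call. A rough sketch (the URL, token, and page size are assumptions; the query itself is the one above):
    Copy code
    import requests

    URL = "http://localhost:9002/api/graphql"  # adjust for your deployment
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

    QUERY = """
    query pagedSearch($start: Int!, $count: Int!) {
      search(input: { type: DATASET, query: "hasOwners:True", start: $start, count: $count }) {
        total
        searchResults {
          entity {
            urn
            type
            ... on Dataset { ownership { owners { type owner { ... on CorpUser { username } } } } }
          }
        }
      }
    }
    """

    def fetch_all(page_size: int = 1000):
        """Collect all search results by paging in batches small enough to avoid the 503."""
        start, rows = 0, []
        while True:
            resp = requests.post(
                URL,
                headers=HEADERS,
                json={"query": QUERY, "variables": {"start": start, "count": page_size}},
            )
            resp.raise_for_status()
            page = resp.json()["data"]["search"]["searchResults"]
            rows.extend(page)
            if len(page) < page_size:
                return rows
            start += page_size

    print(len(fetch_all()))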
  • q

    quiet-television-68466

    05/17/2023, 2:33 PM
    Sorry to bother you, but I wanted to check whether this is a known issue. I think it's a pretty major bug and a lot of the actions framework is broken because of it: https://github.com/datahub-project/datahub/issues/8007
  • s

    stale-traffic-76901

    05/17/2023, 7:38 PM
    Hey everyone, I'm having this problem, and the Docker service is active. Has it happened to you before? Thanks for the support.
  • s

    stale-traffic-76901

    05/17/2023, 7:38 PM
    The logs are empty.
  • p

    powerful-planet-87080

    05/17/2023, 8:12 PM
    I am running into an issue ingesting from dbt-cloud. Any pointers to the issue?
  • p

    prehistoric-farmer-31305

    05/17/2023, 10:21 PM
    Hi! I am getting
    An unknown error occurred. (code 500)
    error message in the UI (DataHub v0.10.2) after ingesting dbt (via the acryl CLI). I am able to see the datasets, but I cannot drill down to get table definitions etc. Is something corrupted in ES or the MySQL DB?
  • n

    numerous-account-62719

    05/18/2023, 5:13 AM
    Hi team, I have one doubt: I have deployed DataHub, created all the users, and onboarded the data. In short, I have used all the features. Now if I upgrade the version, will this data be lost or will it be preserved?
  • a

    astonishing-father-13229

    05/18/2023, 5:49 AM
    Hello everyone, I'm facing a datahub-frontend build issue, can someone help me? Running gradlew datahub-frontend:dist cliMajorVersion=0.10.2 fails with: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
  • n

    nutritious-photographer-79168

    05/18/2023, 8:12 AM
    Hi all, I'm new to DataHub and have been asked to deploy it for a POC on our on-prem k8s cluster. So far so good, and I can access the http://<ip>:9002 link on the public gateway, authenticate and browse around. I now need to place it behind an Istio ingress gateway with an SSL cert, but this seems to be a bit of a problem (could be me being stupid, I've never worked with istio.io before). I just wanted to ask if this is possible, and whether there is anything I need to do specifically to make it work? Any help would be appreciated. Thx
  • t

    tall-butcher-30509

    05/18/2023, 8:31 AM
    Hi, can anyone help with calling the graphql endpoint from node.js? I can successfully run this query in the GraphiQL web interface:
    Copy code
    {query : searchAcrossEntities(input: {query: "<our custom property>=<value>"}) { searchResults { entity { ... on Dataset { urn } } } } }
    But when I try to send the same query via axios I get a JSON error from DataHub:
    Copy code
    const requestData = {
          "query" : `searchAcrossEntities(input: {query: "<our custom property>=${value}"}) { searchResults { entity { ... on Dataset { urn } } } } `
        }
    Error:
    Copy code
    [{"message":"Invalid Syntax : offending token 'searchAcrossEntities' at line 1 column 1","locations":[{"line":1,"column":1}],"extensions":{"classification":"InvalidSyntax"}}]
  • b

    bland-gigabyte-28270

    05/18/2023, 9:46 AM
    Hi, I'm trying to ingest some metadata from Snowflake; however, I'm facing the following exceptions. Could someone help?
    Copy code
    datahub-gms 2023-05-18 09:42:42,216 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 6 Took time ms: -1 Message: failure in bulk execution:
    datahub-gms [1]: index [datahubstepstateindex_v2], type [_doc], id [urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-filters], message [[datahubstepstateindex_v2/VMzXqnXeSnWqzq_cS6B4XA][[datahubstepstateindex_v
    2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-filters]: document missing]]]
    datahub-gms [4]: index [datahubstepstateindex_v2], type [_doc], id [urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-advanced-search], message [[datahubstepstateindex_v2/VMzXqnXeSnWqzq_cS6B4XA][[datahubstepstat
    eindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-advanced-search]: document missing]]]
    Version: 0.10.2, Helm chart version: datahub-0.2.164. Note that I have to disable
    datahubUpgrade
    since it keeps checking health of
    datahub-gms
    and failing.
  • m

    magnificent-honey-40185

    05/18/2023, 3:30 PM
    Hey all, I deployed DataHub today and tried to ingest Redshift data, but it gave an error. I'm not sure what the error is, and the logs don't seem to be helpful. Would it be possible to help? I have attached the logs.
  • m

    magnificent-honey-40185

    05/18/2023, 3:31 PM
    I have obfuscated some numbers and Redshift settings.
    exec-urn_li_dataHubExecutionRequest_23a4f5a8-5750-43e1-aba4-358d9c84903d.log
  • b

    bland-gigabyte-28270

    05/19/2023, 1:12 AM
    Hi, I currently get a
    Permission Denied
    on one of the jobs (Snowflake), can someone help?
    Copy code
    No ~/.datahubenv file found, generating one for you...
    PermissionError: [Errno 13] Permission denied: '/.datahubenv'
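    For context, the CLI writes its config to ~/.datahubenv; the path /.datahubenv in the error suggests that the home directory resolves to / for the user running the job, so ~ expands to the filesystem root. Below is a sketch of how to confirm this and point HOME at a writable directory before running ingestion; the recipe filename and /tmp are assumptions, not part of the original report.
    Copy code
    import os
    import pathlib
    import subprocess

    # Where would "~/.datahubenv" land with the current environment?
    print("HOME =", os.environ.get("HOME"), "->", pathlib.Path("~/.datahubenv").expanduser())

    # Possible workaround: run the ingestion with HOME pointing at a writable directory.
    env = dict(os.environ, HOME="/tmp")
    subprocess.run(["datahub", "ingest", "-c", "recipe.yaml"], env=env, check=True)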
  • b

    bland-gigabyte-28270

    05/19/2023, 2:05 AM
    Also, while doing
    Test Connection
    , using the Password as plain text works, but the built-in secret storage fails. Is this feature still supported?
  • f

    fierce-electrician-85924

    05/19/2023, 5:42 AM
    Hi team, we were facing some issues where the DataHub UI was not able to render entities even though they were present in our DB. We figured out this was because the Elasticsearch indices had been corrupted for some reason. We fixed it by running the
    restoreIndices
    job under the datahub-upgrade image, but we want to make sure that this issue gets detected earlier with the help of some metrics. Does DataHub expose any metric for this kind of issue? I know DataHub exposes Prometheus metrics (I read about it here), but is there a wiki describing which metrics are useful and how they can help?
  • c

    clever-motherboard-6054

    05/19/2023, 8:25 AM
    I put this in office hours, but it probably belongs here: Hello you lovely people, we've been running DataHub for a bit, and I got into an error state today after upgrading Kubernetes (in which it's running). All I did was move the container from an old node version to a new one. It is "up" and in the Running state, but unhealthy. While I could redeploy as we usually do, I'd like to understand how to recover from this in the future instead, to allow for self-healing when the health check fails. It starts with:
    Copy code
    io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
    Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.base/java.lang.Thread.run(Thread.java:829)
    2023-05-19 08:22:28,844 [R2 Nio Event Loop-1-2] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
    io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
    It then continues to get connection refused a couple more times. I figured it was a timing problem with the prerequisites, but restarting mysql, kafka and zookeeper did nothing to fix it. I found a thread somewhere on GitHub about enabling DataHub insights, which I have done but with no results. Any ideas what is causing this issue? Thanks in advance, and have a wonderful weekend if it comes to that. 🙂 Edit: Solved by @damp-insurance-99795, as upgrading to the newest (0.24) fixed the issue for me!
  • b

    bumpy-engineer-7375

    05/19/2023, 1:39 PM
    Hi there! Our stack is Databricks (Unity Catalog), dbt and Looker. Because the resulting tables from dbt are basically the same in Databricks, we thought of having lineage only between Databricks and Looker. However, I can't see the lineage unless I use the dbt connector. So, a few questions about lineage: 1. Is there a way to have column-level lineage between Unity Catalog and Looker? 2. If there is no solution for the first question and we add dbt as the linking point between Databricks and Looker (but not at the column level), is there a way to get column-level lineage in that setup?
  • c

    chilly-boots-22585

    05/20/2023, 12:51 PM
    Hello, I am receiving this error for the DataHub Helm chart and for the pod when I run the command below. NOTE: I have AWS EKS running, AWS OpenSearch running, and AWS RDS running, and I have configured the DataHub values.yaml file accordingly.
    *command*: helm show values datahub/datahub-prerequisites > datahub-prerequisites-original.yaml
    Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition. When I check the logs for the datahub-elasticsearch-setup pod:
    kubectl logs -f datahub-elasticsearch-setup-job-cv4qq
    2023/05/20 12:45:48 Waiting for: https://datahub-es:dglk4904hzD_4@https//search-datahub-starburst-es-m2riu6mafdqbow3ca.eu-west-1.es.amazonaws.com:443
    2023/05/20 12:45:48 Problem with request: Get https://datahub-es:dglk4904hgh_4@https//search-datahub-starburst-es-m2rih5h4w4ppfdqbow3ca.eu-west-1.es.amazonaws.com:443: dial tcp: lookup https on 10.100.0.10:53: no such host. Sleeping 1s
  • c

    chilly-boots-22585

    05/20/2023, 12:52 PM
    Please help me resolve this issue. Also, if I am using AWS MySQL and AWS Elasticsearch, do I still need to install the prerequisites?
  • r

    rough-lamp-22858

    05/21/2023, 6:06 AM
    Hi team, the DataHub installation (version 10) created a Classic Load Balancer, which is public-facing. My values.yaml has the internal ALB values only. Can anybody help me understand why this is happening? Thank you