# troubleshoot
  • wonderful-bear-5842 (03/10/2024, 9:14 AM)
    Hi DataHub team: I wonder what kind of environment you use to build the software, especially GMS and the frontend? I tried to use an Amazon Linux2 2023 Docker image to establish my standard build environment. So far I was able to build GMS (war.war), but I ran into various challenges building the frontend, so I wonder if you can share some relevant info? Thanks!
  • stocky-plumber-3084 (03/11/2024, 2:16 AM)
    Hi, I tried to install DataHub using quickstart with version tag v0.13.0, but it uses the head tag instead: sudo datahub docker quickstart --version=v0.13.0. Any idea why? It works fine with every other version.
  • billions-yacht-53533 (03/11/2024, 7:18 AM)
    At the moment I am using acryl-datahub with DataHub versions 0.12.0 and 0.13.0, with Python 3.10. I've been able to create DataPlatform, DataFlow, and DataJob entities without a problem, create the relations between them, and add lineage between the DataJobs. What I would like to do now is add containers and associate the DataJobs with containers, but I could not find a way to do it. Can you help me?
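    A hedged sketch of one way this might be done with the Python emitter: create the container entity, then write a container aspect on the DataJob pointing at it. The urns, names, and server below are placeholders, and whether GMS accepts the container aspect on dataJob entities may depend on the version, so treat this as something to verify rather than a confirmed recipe.

    from datahub.emitter.mce_builder import make_data_job_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ContainerClass, ContainerPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder GMS address

    # 1. Create (or update) the container itself.
    container_urn = "urn:li:container:my-pipeline-container"  # placeholder container key
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=container_urn,
            aspect=ContainerPropertiesClass(name="My pipeline container"),
        )
    )

    # 2. Point the DataJob at the container via the container aspect.
    datajob_urn = make_data_job_urn(orchestrator="airflow", flow_id="my_flow", job_id="my_job")
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=datajob_urn,
            aspect=ContainerClass(container=container_urn),
        )
    )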
  • stocky-plumber-3084 (03/11/2024, 7:33 AM)
    I installed DataHub using quickstart ("sudo datahub docker quickstart") and got errors with the mysql container; attached is the mysql error log. Any idea why it crashes? I have Python 3.10.12 and acryl-datahub version 0.13.0.
    mysql_error.txt
  • wonderful-rain-49084 (03/11/2024, 8:12 AM)
    Hi guys! We are running Datahub 0.10.4 and see the following errors in GMS logs:
    Caused by: java.sql.BatchUpdateException: Batch entry 1 update metadata_aspect_v2 set metadata='{"paths":["/prod/trino/output/sk_test_v1_abc"]}', createdOn='2024-03-08 17:17:44.062+00', createdBy='urn:li:corpuser:datahub', createdFor=NULL, systemmetadata='{"registryVersion":"0.0.0.0-dev","lastRunId":"no-run-id-provided","runId":"trino-2024_03_08-17_17_31","registryName":"unknownRegistry","lastObserved":1709918264042}' where urn='urn:li:dataset:(urn:li:dataPlatform:trino,output.sk_test_v1_abc.fitnesse_route_delta,PROD)' and aspect='browsePaths' and version=0 was aborted: ERROR: could not serialize access due to concurrent update  Call getNextException to see other errors in the batch.
    ...
    Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update
    	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
    	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
    Any clue what could be wrong with our setup and how to fix it?
  • handsome-fireman-90345 (03/11/2024, 1:24 PM)
    Hello everyone, how can I delete the default user of DataHub?
  • handsome-fireman-90345 (03/11/2024, 1:24 PM)
    I've deleted it in the IAM, but it still appears in the console and I can still log in with it.
  • rapid-queen-98305 (03/11/2024, 2:21 PM)
    Hi Team, we are looking to extract metadata ingestion metrics once a day. I found that we can use listIngestionSources in the GraphQL API; however, it gives me all the ingestion runs for all ingestion sources. Is there any way we can filter the output based on the startTimeMs field? We are using DataHub version 0.12.0. This is the query:

    query {
      listIngestionSources(input: {start: 0, count: 100}) {
        start
        count
        total
        ingestionSources {
          urn
          type
          name
          schedule { timezone interval }
          platform { urn type lastIngested name }
          config { recipe executorId version }
          executions {
            start
            count
            total
            executionRequests {
              urn
              id
              result {
                status
                startTimeMs
                durationMs
                report
                structuredReport { type serializedValue contentType }
              }
            }
          }
        }
      }
    }
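    If listIngestionSources doesn't accept a time filter in this version, one hedged workaround is to pull the execution results and filter on startTimeMs client-side, for example with the Python DataHubGraph client; the server address and the 24-hour cutoff below are placeholders.

    import time
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    # Trimmed version of the query above, keeping only what is needed for filtering.
    QUERY = """
    query {
      listIngestionSources(input: {start: 0, count: 100}) {
        ingestionSources {
          urn
          name
          executions {
            executionRequests { urn result { status startTimeMs durationMs } }
          }
        }
      }
    }
    """

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder server
    cutoff_ms = int((time.time() - 24 * 3600) * 1000)  # keep only runs from the last day

    result = graph.execute_graphql(QUERY)
    for source in result["listIngestionSources"]["ingestionSources"]:
        recent = [
            req
            for req in source["executions"]["executionRequests"]
            if (req.get("result") or {}).get("startTimeMs", 0) >= cutoff_ms
        ]
        if recent:
            print(source["name"], recent)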
  • lemon-airplane-7413 (03/11/2024, 11:07 PM)
    Hello everyone, can you help me connect Great Expectations with DataHub, specifically the checkpoint file?
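    Not an official recipe, but a minimal sketch of the usual wiring, assuming the acryl-datahub[great-expectations] plugin is installed: the DataHub action goes into the checkpoint's action_list. It is shown here as a Python checkpoint config (the YAML checkpoint file takes the same keys); the checkpoint name, suite name, and GMS URL are placeholders.

    # Placeholder checkpoint config; the action_list entry is the DataHub-specific part.
    checkpoint_config = {
        "name": "my_checkpoint",               # placeholder checkpoint name
        "config_version": 1.0,
        "class_name": "Checkpoint",
        "expectation_suite_name": "my_suite",  # placeholder suite name
        "action_list": [
            {
                "name": "datahub_action",
                "action": {
                    "module_name": "datahub.integrations.great_expectations.action",
                    "class_name": "DataHubValidationAction",
                    "server_url": "http://localhost:8080",  # your GMS endpoint
                },
            },
        ],
    }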
  • adventurous-dawn-19232 (03/12/2024, 6:35 AM)
    Hello everyone, I want some information regarding entity documentation: how can I enable it in a local DataHub via the CLI? If anybody can share the command or the way to do it, that would be useful.
  • aloof-oil-31167 (03/12/2024, 10:59 AM)
    Hey, I'm using the get_urns_by_filter function in order to get all dataset urns of a specific platform. The function's default batch_size arg value is 10k; whenever I want to take some more (e.g. 15k), it fails with this error:

    {'code': 500, 'type': 'SERVER_ERROR', 'classification': 'DataFetchingException'}

    Is there any way of paging over those values? Ideally I want to query over almost 20k results. This is the code:

    datahub_graph = DataHubGraph(
        DatahubClientConfig(server=DATAHUB_HOST, token=os.getenv("DATAHUB_TOKEN"))
    )
    datasets_urns = datahub_graph.get_urns_by_filter(platform="snowflake", batch_size=15000)
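    If get_urns_by_filter in this SDK version behaves like a generator that pages through results internally, then batch_size is just the page size and doesn't have to cover the full result set; a hedged sketch that keeps the default page size and simply iterates, capping at ~20k with islice. The host and token env-var names are placeholders.

    import itertools
    import os

    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(
        DatahubClientConfig(server=os.environ["DATAHUB_HOST"], token=os.getenv("DATAHUB_TOKEN"))
    )

    # get_urns_by_filter yields urns lazily, fetching one page at a time under the hood,
    # so iterating it avoids asking the server for a single 15k-row page.
    urns_iter = graph.get_urns_by_filter(entity_types=["dataset"], platform="snowflake")
    dataset_urns = list(itertools.islice(urns_iter, 20000))
    print(len(dataset_urns))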
  • mysterious-advantage-78411 (03/12/2024, 1:22 PM)
    Hi All! Is there a simple_add_dataset_ownership transformer, but only for Tableau charts? How can I add owners on charts? It seems there is no such possibility... is that true?
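    The built-in simple_add_dataset_ownership transformer targets datasets, so one hedged alternative for charts is to write an ownership aspect on the chart urns directly with the Python emitter; the chart id, owner, and GMS address below are placeholders, and note this replaces any existing owners on the chart rather than appending.

    from datahub.emitter.mce_builder import make_chart_urn, make_user_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import OwnerClass, OwnershipClass, OwnershipTypeClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder GMS address

    chart_urn = make_chart_urn("tableau", "my-tableau-sheet-id")  # placeholder chart id
    ownership = OwnershipClass(
        owners=[
            OwnerClass(
                owner=make_user_urn("jdoe"),  # placeholder owner
                type=OwnershipTypeClass.DATAOWNER,
            )
        ]
    )

    # Emitting the ownership aspect overwrites the chart's owner list with this one.
    emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=chart_urn, aspect=ownership))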
  • important-electrician-22243 (03/12/2024, 8:04 PM)
    hey folks - I'm running Datahub v0.12.1 and starting to see issues with our Okta integration. As far as we know, nothing has changed in our setup, but SSO has stopped working. We're seeing errors like:
    [application-akka.actor.default-dispatcher-52] WARN  p.api.mvc.DefaultJWTCookieDataCodec - decode: cookie has invalid signature! message = JWT signature does not match locally computed signature. JWT validity cannot be asserted and should not be trusted.
    2024-03-12 19:11:54,317 [application-akka.actor.default-dispatcher-52] INFO  p.api.mvc.DefaultJWTCookieDataCodec - The JWT signature in the cookie does not match the locally computed signature with the server. This usually indicates the browser has a leftover cookie from another Play application, so clearing cookies may resolve this error message.
    2024-03-12 19:11:54,322 [application-akka.actor.default-dispatcher-52] ERROR controllers.SsoCallbackController - Caught exception while attempting to handle SSO callback! It's likely that SSO integration is mis-configured.
  • adventurous-dawn-19232 (03/13/2024, 4:21 AM)
    Hello everyone, I want some information regarding entity documentation: how can I enable it in a local DataHub via the CLI? If anybody can share the command or the way to do it, that would be useful.
  • cuddly-wall-60618 (03/13/2024, 8:05 AM)
    Hello everyone, I want to update the properties on a DataJob (Airflow) with a specific custom property, adding the key "ingest_server": "true". How can I do that? I've tried to run this code but it did not work. Thank you.
    dataset_urn = make_data_job_urn(orchestrator='airflow', flow_id="pal_example_datamart", job_id="params_eval")
    s = DataJobInfoClass(name="hello", type="COMMAND", customProperties={"ingest_server": "true"})

    emitter = DataHubRestEmitter(gms_server="http://localhost:8082", token='')
    emitter.emit_mcp(datajob_input_output_mcp)
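    For reference, a hedged corrected sketch of the same flow: the DataJobInfoClass aspect has to be wrapped in a MetadataChangeProposalWrapper, and that wrapper is what gets emitted (the snippet above builds the aspect as s but then emits an unrelated variable). Since writing dataJobInfo replaces the whole aspect, the existing one is read first here; the server address and ids are carried over from the snippet.

    from datahub.emitter.mce_builder import make_data_job_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import DataJobInfoClass

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8082"))

    datajob_urn = make_data_job_urn(orchestrator="airflow", flow_id="pal_example_datamart", job_id="params_eval")

    # Read the current dataJobInfo aspect (if any) so existing fields/properties are preserved.
    info = graph.get_aspect(entity_urn=datajob_urn, aspect_type=DataJobInfoClass)
    if info is None:
        info = DataJobInfoClass(name="params_eval", type="COMMAND", customProperties={})
    info.customProperties = {**(info.customProperties or {}), "ingest_server": "true"}

    # Emit the updated aspect; DataHubGraph doubles as a REST emitter here.
    graph.emit_mcp(MetadataChangeProposalWrapper(entityUrn=datajob_urn, aspect=info))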
  • mammoth-apple-56011 (03/13/2024, 11:58 AM)
    Hello Datahub users! I have some problems with our installation of Datahub (v0.12.0) in OpenShift. The GMS pods are restarting about 4-10 times per week, and after some of the restarts they cannot reach a ready state - the readiness stays at "0/1". We have about 900 ingestions, so our Datahub is ingesting data all the time. The GMS Deployment consists of 4 Pods with these limits:
    limits:
      cpu: '4'
      memory: 8Gi
    Also, sometimes we get a 500 error in the Datahub UI. The logs in the datahub-frontend Pod say this:
    Caused by: java.util.concurrent.TimeoutException: Read timeout to datahub-gms-datahub-gms/10.234.71.18:8080 after 60000 ms
    But there is no datahub-gms Pod with that address; the current Pod IPs are:
    10.238.46.45
    10.239.10.201
    10.239.26.249
    10.237.17.16
    As I see it, the 500 error happens because datahub-frontend is using old IP addresses for the GMS Pods. How can I configure it to use the new IP addresses?
  • able-carpenter-97384 (03/13/2024, 3:14 PM)
    Hi - I am using the getLineageEntity query to fetch upstream and downstream lineage for a urn, but I see only one level of downstream relationships. I tried setting the fulllineage properties, but that did not work. On the UI, I do see the response has one more level of nested relationships. Thanks.
  • witty-butcher-82399 (03/13/2024, 3:50 PM)
    Hi! Browsing data products results in the error below. Checking the logs, I found:
    2024-03-13 16:32:452024-03-13 15:32:45,667 [ForkJoinPool.commonPool-worker-25] ERROR c.l.d.g.r.browse.BrowseResolver:60 - Failed to execute browse: entity type: DATA_PRODUCT, path: [], filters: null, start: 0, count: 10 null
    This suggests that DataProductType is missing BrowseableEntityType. We are running 0.12.0; however, DataProductType is still missing the interface on master.
  • fast-area-764 (03/13/2024, 5:57 PM)
    Hi, I'm on Datahub v0.12.1. What options are there for stateful ingestion of a business glossary using the business glossary ingestion source? BusinessGlossaryFileSource does not inherit from StatefulIngestionSourceBase. Without stateful ingestion, every new ingestion run only adds terms to the glossary and never removes terms.
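    Since the source doesn't do stateful ingestion, one hedged workaround is to diff the terms currently in DataHub against what the glossary file should produce after each run and remove the stale ones by hand; the sketch below only lists candidates (the glossaryTerm entity-type name, the server, and the expected urns are placeholders), and the actual removal could then go through datahub delete --urn or a soft delete.

    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder server

    # Urns the latest business glossary file should have produced (placeholders).
    expected_term_urns = {
        "urn:li:glossaryTerm:Classification.Sensitive",
        "urn:li:glossaryTerm:Classification.Confidential",
    }

    # All glossary terms currently known to DataHub.
    actual_term_urns = set(graph.get_urns_by_filter(entity_types=["glossaryTerm"]))

    stale = actual_term_urns - expected_term_urns
    for urn in sorted(stale):
        # Review this list, then remove each urn, e.g. `datahub delete --urn '<urn>'`.
        print(urn)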
  • agreeable-greece-66183 (03/13/2024, 10:04 PM)
    Hello, is anyone here using Fargate serverless and Managed Kafka (MSK) from AWS? I had it working, then tried to enable Encryption in Transit, and now I'm having trouble getting the KafkaSetup pod to run. I keep getting a really generic message; I don't think it's ports. If anyone has any thoughts, please feel free to drop them. I did add config to the DataHub helm chart for SSL, and I'm mounting the keystore via EFS directly on the pod, so everything should be there 🙂

    ERROR admin-client-network-thread exited (kafka.admin.BrokerApiVersionsCommand$AdminClient)
    java.lang.OutOfMemoryError: Java heap space
        at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
        at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
        at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:102)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)
        at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:560)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:280)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
        at kafka.admin.BrokerApiVersionsCommand$AdminClient.$anonfun$networkThread$1(BrokerApiVersionsCommand.scala:115)
        at kafka.admin.BrokerApiVersionsCommand$AdminClient$$Lambda$119/0x0000000100217840.run(Unknown Source)
        at java.base/java.lang.Thread.run(Thread.java:829)
  • some-zoo-21364 (03/14/2024, 11:11 AM)
    Hi all, anyone familiar with this error from the Redshift connector?
    extract-QUERY_SCAN => Error was {'S': 'ERROR', 'C': 'XX000', 'M': 'Result size exceeds LISTAGG limit', 'D': '\n  -----------------------------------------------\n  error:  Result size exceeds LISTAGG limit\n  code:      8001\n  context:   LISTAGG limit: 65535\n  query:     234890373[child_sequence:3]\n  location:  string_ops.cpp:138\n  process:   query1_1990_234890378 [pid=25559]\n  -----------------------------------------------\n', 'F': '../src/sys/xen_execute.cpp', 'L': '12414', 'R': 'pg_throw'}
  • blue-cartoon-10359 (03/14/2024, 1:19 PM)
    Hey, my team is able to use the DataHubGraph in Python to call:

    dataset = graph.get_urns_by_filter(
        entity_types=["dataset"],
        env="DEV",
        platform="mssql",
        extraFilters=[{'field': 'domains', 'values': [domain_urn]}],
    )

    However, when I run this I'm encountering the issue below (which is not related to any access tokens being expired etc.); does anyone know the cause?
    line 172, in _send_restli_request
        raise OperationalError(
    datahub.configuration.common.OperationalError: ('Unable to get metadata from DataHub', {'message': '401 Client Error: Unauthorized for url: http://___/api/graphql'})
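    Given the 401 on /api/graphql, one thing worth double-checking is that the DataHubGraph instance actually carries a valid personal access token (and that token auth is enabled on GMS); a hedged sketch of constructing the client explicitly, with placeholder host and env-var names:

    import os

    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(
        DatahubClientConfig(
            server=os.environ["DATAHUB_HOST"],   # e.g. https://datahub.example.com/gms (placeholder)
            token=os.environ["DATAHUB_TOKEN"],   # personal access token; a 401 often means this is missing or invalid
        )
    )
    graph.test_connection()  # fails fast if the server rejects the credentials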
  • creamy-machine-95935 (03/14/2024, 1:48 PM)
    Is there a way to delete metadata assets filtering by domain?
    datahub delete --domain financial  # does not work
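    If the installed CLI has no domain filter on delete, a hedged alternative is to resolve the urns in the domain with the Python client (using the same domains filter as in the get_urns_by_filter example above) and feed them to datahub delete --urn one by one; the domain urn and server below are placeholders.

    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder server

    domain_urn = "urn:li:domain:financial"  # placeholder domain urn
    urns = graph.get_urns_by_filter(
        extraFilters=[{"field": "domains", "values": [domain_urn]}],
    )

    with open("urns_to_delete.txt", "w") as f:
        for urn in urns:
            f.write(urn + "\n")

    # Review the file, then delete each urn, e.g.:
    #   while read -r urn; do datahub delete --urn "$urn" --hard; done < urns_to_delete.txt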
  • fresh-musician-87803 (03/14/2024, 2:04 PM)
    Hi team, I'm configuring OIDC with Azure AD for SSO. In the documentation I see: token_url ✅ string - The token URL that acquires a token from Azure AD for authorizing requests. This source will only work with the v1.0 endpoint. Is this a hard requirement? Wouldn't it work with the v2.0 endpoints?
  • rapid-night-88791 (03/14/2024, 3:14 PM)
    Hi, when running datahub docker quickstart --restore I get "Failed to run MySQL restore". How do I troubleshoot this? Any tips? v0.13.0
  • fierce-coat-26780 (03/14/2024, 3:42 PM)
    Hey team, I'm using Datahub version v0.12.1 and I can't filter the data in my GraphQL query; neither the documentation, Google, nor LLMs could really help me. I want to query all my datasets of type model on the dbt platform, and it should only show datasets that have a certain property, materialization = ephemeral. It seems that I can't access the properties of the dataset with the orFilters. Details are in this thread 😉
  • damp-solstice-31196 (03/14/2024, 7:04 PM)
    Hi all, I haven't heard back on this post in the deployment channel, so I thought I'd give it a try here. I'm deploying Datahub on ECS Fargate and have come across a strange issue where the datahub-gms container complains about not being able to bind to port 8080 even though there are no other containers occupying the port. You can find the details in the linked post. Thanks!
  • hundreds-arm-67649 (03/14/2024, 10:55 PM)
    Hi all! I found a bug and created an issue on GitHub. Could you please fix it? https://github.com/datahub-project/datahub/issues/10058