# all-things-deployment

    fast-baker-76018

    02/21/2024, 8:46 AM
    image.png

    echoing-state-56889

    02/21/2024, 9:18 AM
    Hello everyone, I'm currently working with v0.12.1 on-premises, and I'm seeking assistance in configuring a managed service for my persistence layer. Specifically, I'm interested in utilizing a managed MySQL database instead of the default setup. Could someone guide me through the process? Thank you!
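    A sketch of one common shape for this in the datahub-helm values (exact keys depend on your chart version; the endpoint and secret names here are hypothetical): disable the bundled MySQL in the prerequisites chart and point `global.sql.datasource` at the managed instance:
    ```yaml
    # values.yaml sketch (key names assume a recent datahub-helm chart)
    global:
      sql:
        datasource:
          host: "my-managed-mysql.example.com:3306"        # hypothetical endpoint
          hostForMysqlClient: "my-managed-mysql.example.com"
          port: "3306"
          url: "jdbc:mysql://my-managed-mysql.example.com:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8"
          driver: "com.mysql.cj.jdbc.Driver"
          username: "datahub"
          password:
            secretRef: mysql-secrets          # k8s Secret holding the DB password
            secretKey: mysql-root-password
    ```
    In the prerequisites values, setting `mysql.enabled: false` would then skip the in-cluster MySQL.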

    agreeable-greece-66183

    02/22/2024, 2:36 PM
    Hi everyone, I'm running DataHub v0.11 and have a working DataHub install on EKS with node groups (EC2). I'm trying to move the entire cluster to EKS Fargate (serverless). Kafka and ZooKeeper need persistent storage, and I've set up an EFS file system for those two pods. The prerequisites pods start and stay running, but Kafka crashes from time to time. When I run the DataHub Helm charts, I keep seeing this error in the logs and the Kafka setup job keeps failing. Does anyone have any troubleshooting ideas?
    ```
    [2024-02-21 22:46:41,386] WARN [LegacyAdminClient clientId=admin-1] Bootstrap broker prerequisites-kafka:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
    Exception in thread "main" java.lang.RuntimeException: Request METADATA failed on brokers List(prerequisites-kafka:9092 (id: -1 rack: null))
        at kafka.admin.BrokerApiVersionsCommand$AdminClient.sendAnyNode(BrokerApiVersionsCommand.scala:156)
        at kafka.admin.BrokerApiVersionsCommand$AdminClient.findAllBrokers(BrokerApiVersionsCommand.scala:179)
        at kafka.admin.BrokerApiVersionsCommand$AdminClient.awaitBrokers(BrokerApiVersionsCommand.scala:171)
        at kafka.admin.BrokerApiVersionsCommand$.execute(BrokerApiVersionsCommand.scala:61)
        at kafka.admin.BrokerApiVersionsCommand$.main(BrokerApiVersionsCommand.scala:55)
        at kafka.admin.BrokerApiVersionsCommand.main(BrokerApiVersionsCommand.scala)
    ```

    creamy-machine-95935

    02/22/2024, 8:08 PM
    Every time we upgrade the DataHub version, do we have to run the datahub-restore-indices-adhoc job? Please šŸš€ Thank you!

    colossal-father-31148

    02/23/2024, 2:04 AM
    Hello team, we're considering using Cloud Spanner as our main DB because of its global distribution and scale-out features. DataHub supports PostgreSQL as the main DB, but Cloud Spanner only supports a subset of the PostgreSQL interface, so we are wondering whether Cloud Spanner can work as the main DB. Could we choose Cloud Spanner as the main DB? Or, if you know of any such cases, would you share them with us?

    mysterious-doctor-34270

    02/23/2024, 3:16 AM
    Hi team, I followed this guide https://datahubproject.io/docs/deploy/kubernetes/ and managed to get it working. I need to do the same in an offline environment. Is there a guide for this?

    witty-motorcycle-52108

    02/23/2024, 11:53 PM
    Are the acryldata Docker Hub repos no longer recommended for use? Should we use the linkedin ones instead?

    lemon-receptionist-90470

    02/26/2024, 4:32 PM
    Hey šŸ‘‹ I'm currently setting up DataHub using chart version `0.3.27` and Elasticsearch chart version `8.5.1`, to be compatible with EKS `1.28.0`.
    DataHub setup - authentication:
    • Set up a user and password for Elasticsearch.
    • In the `datahub` values, I've configured the `elasticsearch` settings to use SSL, skip SSL checks (for now), and provided the auth credentials:
    ```yaml
    global:
      strict_mode: true
      graph_service_impl: elasticsearch
      datahub_analytics_enabled: true
      datahub_standalone_consumers_enabled: false

      elasticsearch:
        useSSL: "true"
        skipcheck: "true"
        insecure: "true" # This should be false in values-production.yaml
        auth:
          username: "elastic"
          password:
            secretRef: elasticsearch-master-credentials
            secretKey: password
    ```
    Issue:
    • The `elasticsearchSetupJob` runs as expected, but the `gms` service is encountering errors during startup.
    • The error logs indicate an issue with the SSL certification path and a connection-refused error:
    ```
    Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    See https://opensearch.org/docs/latest/clients/java-rest-high-level/ for troubleshooting.
        at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:947)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:332)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:320)
        at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1918)
        at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1884)
        at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1852)
        at org.opensearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1095)
        at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:186)
        ... 16 common frames omitted
    ```
    Does anyone have insights on resolving this? I also checked other similar threads here, but they did not help. Thanks!
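    A hedged sketch of one way to make GMS trust a self-signed Elasticsearch certificate instead of skipping verification (the secret and file names are hypothetical; the `ELASTICSEARCH_SSL_TRUSTSTORE_*` variables are assumed from the GMS configuration surface):
    ```yaml
    datahub-gms:
      extraVolumes:
        - name: es-truststore
          secret:
            secretName: elasticsearch-ca-truststore   # hypothetical Secret containing truststore.jks
      extraVolumeMounts:
        - name: es-truststore
          mountPath: /certs
          readOnly: true
      extraEnvs:
        - name: ELASTICSEARCH_SSL_TRUSTSTORE_FILE
          value: /certs/truststore.jks
        - name: ELASTICSEARCH_SSL_TRUSTSTORE_TYPE
          value: JKS
        - name: ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD
          value: changeit   # example only; prefer a secretRef in real values
    ```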

    brainy-butcher-66683

    02/26/2024, 4:52 PM
    ~Hi team, I have two DataHub instances. The `prod` instance has more datasets, charts, dashboards, and domains than the `dev` instance; however, the `dev` instance has more data in the metadata database (~24 GB vs. ~7 GB for prod). I believe this is due to the higher frequency of changes made in the `dev` instance (so there is more changelog information).~ Is there a way to reduce the retention time of the change log within the metadata database?

    lemon-dusk-93888

    02/26/2024, 6:27 PM
    Hi team, we are trying to get DataHub up and running with OIDC SSO on a Kubernetes cluster. We followed the documentation on Azure AuthN configuration for datahub-frontend. Additional changes we made:
    • We configured datahub-frontend to use an Ingress controller (avi-lb) with a cert-manager annotation to get a TLS certificate issued from our private ACME/PKI instance.
    • We changed the datahub-frontend.service.type value to ClusterIP, as we are using an Ingress controller.
    • We added some extraEnvs to datahub-frontend to use a proxy where we use a TLS interception mechanism.
    • We added all the AUTH_OIDC_ settings to the extraEnvs of the datahub-frontend section in the helm chart.
    Our problem is that SSO is not working and the datahub-frontend pod throws the following errors:
    ```
    2024-02-26 15:27:50,722 [application-akka.actor.default-dispatcher-7] ERROR controllers.SsoCallbackController - Caught exception while attempting to handle SSO callback! It's likely that SSO integration is mis-configured.
    java.util.concurrent.CompletionException: org.pac4j.core.exception.TechnicalException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    ```
    Do we have to provide our private CA/PKI certificates to the JVM or datahub-frontend certificate trust store? Any advice or help is very much appreciated.
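    A sketch of one way to hand the private CA to the frontend JVM, assuming the frontend image honors `JAVA_OPTS` (volume and secret names are hypothetical):
    ```yaml
    datahub-frontend:
      extraVolumes:
        - name: custom-truststore
          secret:
            secretName: private-ca-truststore   # hypothetical Secret with a JKS containing the internal root CA
      extraVolumeMounts:
        - name: custom-truststore
          mountPath: /truststore
          readOnly: true
      extraEnvs:
        - name: JAVA_OPTS
          value: "-Djavax.net.ssl.trustStore=/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit"
    ```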

    rich-barista-93413

    02/27/2024, 6:53 PM
    Hi Drew, can you provide more details? We only see a screenshot.

    rapid-ocean-29344

    02/28/2024, 11:58 PM
    Hello, I am trying to deploy in AWS, and we are using MSK (Kafka). Are there any recommended environment configs for `datahub-actions` for IAM role-based access?

    stocky-garage-83010

    02/29/2024, 10:33 AM
    Hi, is it possible to add a caCert via a Helm parameter or in a YAML file? We have to include our root CA because of SSO with Azure AAD. I hope someone can give me a hint.

    nice-piano-41179

    03/03/2024, 9:02 AM
    Hi, I'm trying to follow the "Deploying with Kubernetes" guide, using Docker Desktop's k8s on macOS (Intel chip). The "prerequisites" installation is OK; the problem is that "helm install datahub datahub/datahub" hangs and is not responding. Resources for the local kube are 2 CPUs / 12 GB. Thanks!

    agreeable-address-71270

    03/04/2024, 5:31 AM
    Hello folks, I am trying to find some answers on how the ordering of events in a Kafka topic partition impacts the DataHub application. What is the suggested partitioning strategy? Can events arrive out of order across topics? I am hosting the Kafka cluster on AWS MSK and would appreciate advice on how to configure the topic partitions. Thanks in advance!

    proud-psychiatrist-26286

    03/05/2024, 4:15 PM
    I am suddenly seeing a `CrashLoopBackOff` on an M2 MacBook for the `prerequisites-mysql` pod and the image it is pulling, when running `helm install prerequisites datahub/datahub-prerequisites`. After inspecting the images, it is pulling down the incorrect image architecture (amd64) for MySQL; all other prerequisites are pulling the correct one (arm64). Opened a question/issue here: https://github.com/acryldata/datahub-helm/issues/435. Just trying to track down why this started all of a sudden; I was having no issues deploying these Helm charts up until yesterday. Any help would be greatly appreciated.
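    A possible workaround sketch, assuming the prerequisites chart passes `mysql.image` values through to the underlying Bitnami chart: pin the image to the arm64 manifest by digest (the digest below is a placeholder, not a real one):
    ```yaml
    # prerequisites values.yaml sketch
    mysql:
      image:
        repository: bitnami/mysql
        tag: "8.0"                                  # hypothetical tag
        digest: "sha256:<arm64-manifest-digest>"    # placeholder; resolve the real arm64 digest with `docker manifest inspect`
    ```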

    astonishing-keyboard-8419

    03/05/2024, 4:49 PM
    Is it possible to use a Kafka schema registry with only a truststore and no keystore? I deploy with Helm and tried the registry_ssl env var and also the springOverride configuration, but with no luck: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

    thankful-teacher-72271

    03/05/2024, 9:36 PM
    Hello, everyone! I use DataHub on Azure AKS and currently ingest metadata using cronjobs. However, this process requires many steps: creating the recipe, changing helm chart values and changing the k8s configmap. What pattern do you usually use for metadata ingestion in k8s?

    brainy-musician-50192

    03/06/2024, 2:03 PM
    How stable is the new release, 0.13.0, and is the chart updated to support it?

    rich-pager-68736

    03/06/2024, 3:31 PM
    Hi there, is there an ENV variable for frontend to disable the incident feature in v0.13.0?

    cuddly-library-82577

    03/07/2024, 1:44 PM
    Hello, good morning. If we are not setting `graph_service_impl` anywhere, which is being used: elasticsearch or neo4j? Is neo4j used for anything else besides the graph service implementation?

    nice-magazine-21738

    03/07/2024, 4:55 PM
    šŸ‘‹ Hello, team! I'm an ops engineer trying to configure a health check for DataHub. I'm going through the documentation and cannot find a health endpoint. I just need to know that DataHub is running.
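    For what it's worth, GMS commonly serves a plain health endpoint at `/health` on port 8080 (and the frontend serves `/admin` on 9002); a probe sketch under that assumption:
    ```yaml
    # Sketch: k8s probes against GMS, assuming /health on port 8080
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 60
    ```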

    damp-greece-27806

    03/07/2024, 4:58 PM
    Hi everyone - does anyone know if there's global env support in the helm chart?

    quiet-oil-71180

    03/08/2024, 1:10 AM
    Is AWS OpenSearch supported for the latest 0.13.0 release? Since moving to 0.13.0, I get:
    {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","index":"datahubpolicyindex_v2","resource.id":"datahubpolicyindex_v2","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","index":"datahubpolicyindex_v2","resource.id":"datahubpolicyindex_v2","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}

    quiet-oil-71180

    03/08/2024, 2:56 AM
    Fixed my issue by moving my schema registry from AWS Glue to the internal Confluent-compatible one.

    limited-motherboard-51317

    03/08/2024, 8:20 AM
    Hi! Is it possible to pull logins/passwords for ingestion sources (like DBs) from a specific vault? Simply put: store credentials in an external vault.
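    One common pattern (a sketch; it assumes the ingestion CLI's environment-variable expansion in recipes, with the vault delivering secrets as env vars, e.g. via a CSI driver or an injector sidecar) keeps credentials out of the recipe entirely:
    ```yaml
    # recipe.yaml sketch -- ${MYSQL_USER}/${MYSQL_PASSWORD} are hypothetical
    # env vars injected from the external vault at runtime
    source:
      type: mysql
      config:
        host_port: "db.example.com:3306"   # hypothetical
        username: "${MYSQL_USER}"
        password: "${MYSQL_PASSWORD}"
    sink:
      type: datahub-rest
      config:
        server: "http://datahub-gms:8080"
    ```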

    limited-fountain-24833

    03/08/2024, 10:55 AM
    Hi, I am trying to deploy DataHub through Argo CD. We have Azure Event Hubs set up. Wondering what the configuration would look like in the values.yaml file for kafka if we want to use Azure Event Hubs:
    ```yaml
    kafka:
      bootstrap:
        server: "prerequisites-kafka:9092"
      zookeeper:
        server: "prerequisites-zookeeper:2181"
      # This section defines the names for the kafka topics that DataHub depends on, at a global level.
      # Do not override this config at a sub-chart level.
      topics:
        metadata_change_event_name: "MetadataChangeEvent_v4"
        failed_metadata_change_event_name: "FailedMetadataChangeEvent_v4"
        metadata_audit_event_name: "MetadataAuditEvent_v4"
        datahub_usage_event_name: "DataHubUsageEvent_v1"
        metadata_change_proposal_topic_name: "MetadataChangeProposal_v1"
        failed_metadata_change_proposal_topic_name: "FailedMetadataChangeProposal_v1"
        metadata_change_log_versioned_topic_name: "MetadataChangeLog_Versioned_v1"
        metadata_change_log_timeseries_topic_name: "MetadataChangeLog_Timeseries_v1"
        platform_event_topic_name: "PlatformEvent_v1"
        datahub_upgrade_history_topic_name: "DataHubUpgradeHistory_v1"
      maxMessageBytes: "5242880"  # 5MB
      producer:
        compressionType: none
        maxRequestSize: "5242880"  # 5MB
      consumer:
        maxPartitionFetchBytes: "5242880"  # 5MB
        stopContainerOnDeserializationError: true
      ## For AWS MSK set this to a number larger than 1
      # partitions: 3
      # replicationFactor: 3
      schemaregistry:
        # GMS Implementation - url configured based on component context
        type: INTERNAL
        # Confluent Kafka Implementation
        # type: KAFKA
        # url: "http://prerequisites-cp-schema-registry:8081"
        # Glue Implementation - url not applicable
        # type: AWS_GLUE
        # glue:
        #   region: us-east-1
        #   registry: datahub
    ```
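    A hedged sketch of pointing the chart at Azure Event Hubs' Kafka-compatible endpoint (the namespace is hypothetical; this assumes `global.springKafkaConfigurationOverrides` passes client properties through, and the connection string should come from a Secret rather than being inlined):
    ```yaml
    kafka:
      bootstrap:
        server: "my-namespace.servicebus.windows.net:9093"   # hypothetical Event Hubs namespace
    global:
      springKafkaConfigurationOverrides:
        security.protocol: SASL_SSL
        sasl.mechanism: PLAIN
        sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="<EVENT_HUBS_CONNECTION_STRING>";
    ```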

    modern-orange-37660

    03/12/2024, 4:36 PM
    `./gradlew build` is failing at `:docs-website:downloadHistoricalVersions` due to an SSL error. Could anyone point me to the file I can download manually, if that is possible? Local Docker build of version 0.13.0 on an M1 Mac. Log in the thread.

    damp-greece-27806

    03/13/2024, 1:32 PM
    Hello! we’re trying to upgrade to v0.13.0, and are stuck at the systemupdate job. It keeps failing to instantiate a kafka producer due to:
    ```
    WARN [AnnotationConfigApplicationContext] Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade': Error creating bean with name 'dataHubKafkaEventProducerFactory': Unsatisfied dependency expressed through field 'kafkaProducer': Error creating bean with name 'kafkaProducer' defined in class path resource [com/linkedin/gms/factory/kafka/DataHubKafkaProducerFactory.class]: Failed to instantiate [org.apache.kafka.clients.producer.Producer]: Factory method 'createInstance' threw exception with message: Failed to construct kafka producer
    2024-03-13 13:30:24,459 [main] INFO io.ebean.datasource:755 - DataSourcePool [gmsEbeanServiceConfig] shutdown min[2] max[50] free[2] busy[0] waiting[0] highWaterMark[1] waitCount[0] hitCount[2] psc[hit:0 miss:0 put:0 rem:0]
    ERROR [SpringApplication] Application run failed
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade': Error creating bean with name 'dataHubKafkaEventProducerFactory': Unsatisfied dependency expressed through field 'kafkaProducer': Error creating bean with name 'kafkaProducer' defined in class path resource [com/linkedin/gms/factory/kafka/DataHubKafkaProducerFactory.class]: Failed to instantiate [org.apache.kafka.clients.producer.Producer]: Factory method 'createInstance' threw exception with message: Failed to construct kafka producer
    ```
    At the bottom I see this:
    ```
    Caused by: java.lang.NoClassDefFoundError: software/amazon/awssdk/thirdparty/jackson/core/JsonFactory
    ```
    But this feels like it would be a problem outside of configuration; any pointers? Our details, FYI:
    • AWS MSK 3.6.0
    • AWS_GLUE schema registry
    • AWS OpenSearch
    • kubernetes/helm deployment on EKS

    millions-art-55322

    03/13/2024, 2:28 PM
    Good morning. Does DataHub have any MDM/RDM (Master/Reference Data Management) capabilities? Is there a recommended MDM tool that integrates well with DataHub? @millions-art-55322 @damp-refrigerator-68911 would love to know!