# all-things-deployment
  • mysterious-doctor-34270

    02/23/2024, 3:16 AM
    Hi team, I followed this guide https://datahubproject.io/docs/deploy/kubernetes/ and managed to get it working. I need to do the same in an offline environment. Is there a guide for this?
  • witty-motorcycle-52108

    02/23/2024, 11:53 PM
    Are the acryldata Docker Hub repos no longer recommended for use? Should we use the linkedin ones instead?
  • lemon-receptionist-90470

    02/26/2024, 4:32 PM
    Hey 👋 I'm currently setting up DataHub using chart version 0.3.27 and the Elasticsearch chart version 8.5.1, to be compatible with EKS 1.28.0.
    DataHub setup - authentication:
    • Set up a user and password for Elasticsearch.
    • In the datahub values, I've configured the elasticsearch settings to use SSL, skip SSL checks (for now), and provided the auth credentials:
    global:
      strict_mode: true
      graph_service_impl: elasticsearch
      datahub_analytics_enabled: true
      datahub_standalone_consumers_enabled: false
    
      elasticsearch:
        useSSL: "true"
        skipcheck: "true"
        insecure: "true" # This should be false in values-production.yaml
        auth:
          username: "elastic"
          password:
            secretRef: elasticsearch-master-credentials
            secretKey: password
    Issue:
    • The elasticsearchSetupJob runs as expected, but the gms service encounters errors during startup.
    • The error logs indicate an issue with the SSL certification path and a connection refused error:
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
See https://opensearch.org/docs/latest/clients/java-rest-high-level/ for troubleshooting.
    at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:947)
    at org.opensearch.client.RestClient.performRequest(RestClient.java:332)
    at org.opensearch.client.RestClient.performRequest(RestClient.java:320)
    at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1918)
    at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1884)
    at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1852)
    at org.opensearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1095)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:186)
    ... 16 common frames omitted
    Does anyone have insights on resolving this? I also checked other similar threads here, but they did not help. Thanks!
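    One way past the PKIX error is to give GMS a truststore containing the Elasticsearch CA instead of relying on the skip-check flags. A minimal sketch, assuming a pre-created Kubernetes secret elastic-ca-truststore holding a JKS truststore (both names hypothetical) and that GMS honors the ELASTICSEARCH_SSL_TRUSTSTORE_* variables from the deployment docs; verify against your chart version:
datahub-gms:
  extraVolumes:
    - name: es-truststore
      secret:
        secretName: elastic-ca-truststore # hypothetical secret containing truststore.jks
  extraVolumeMounts:
    - name: es-truststore
      mountPath: /mnt/certs
      readOnly: true
  extraEnvs:
    - name: ELASTICSEARCH_SSL_TRUSTSTORE_FILE
      value: /mnt/certs/truststore.jks
    - name: ELASTICSEARCH_SSL_TRUSTSTORE_TYPE
      value: JKS
    - name: ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD
      value: changeit # assumption: whatever password the truststore was created with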
  • brainy-butcher-66683

    02/26/2024, 4:52 PM
    ~Hi team, I have two DataHub instances. The prod instance has more datasets, charts, dashboards, and domains than the dev instance, yet the dev instance holds more data in the metadata database (~24 GB vs ~7 GB for prod). I believe this is because changes are made more frequently in the dev instance, so there is more change log information.~ Is there a way to reduce the retention time of the change log within the metadata database?
  • lemon-dusk-93888

    02/26/2024, 6:27 PM
    Hi team, we are trying to get DataHub up and running with OIDC SSO on a Kubernetes cluster. We followed the documentation on Azure AuthN configuration for datahub-frontend. Additional changes we made:
    • We configured datahub-frontend to use an Ingress controller (avi-lb) with a cert-manager annotation to get a TLS certificate issued from our private ACME/PKI instance.
    • We changed the datahub-frontend.service.type value to ClusterIP, as we are using an Ingress controller.
    • We added some extraEnvs to datahub-frontend to use a proxy with a TLS interception mechanism.
    • We added all the AUTH_OIDC_ settings to the extraEnvs of the datahub-frontend section in the helm chart.
    Our problem is that SSO is not working and the datahub-frontend pod throws the following errors:
    2024-02-26 15:27:50,722 [application-akka.actor.default-dispatcher-7] ERROR controllers.SsoCallbackController - Caught exception while attempting to handle SSO callback! It's likely that SSO integration is mis-configured.
    java.util.concurrent.CompletionException: org.pac4j.core.exception.TechnicalException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    Do we have to provide our private CA/PKI certificates to the JVM or datahub-frontend certificate trust store? Any advice or help is very much appreciated.
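    In cases like this, the private CA generally does have to end up in a truststore the frontend JVM trusts. A minimal sketch of one way to wire that up, assuming the chart's extraVolumes/extraVolumeMounts/extraEnvs hooks, a hypothetical secret frontend-truststore, and that the datahub-frontend image passes JAVA_OPTS through to the JVM:
datahub-frontend:
  extraVolumes:
    - name: truststore
      secret:
        secretName: frontend-truststore # hypothetical: a JKS with the private CA imported via keytool (e.g., into a copy of the JVM cacerts)
  extraVolumeMounts:
    - name: truststore
      mountPath: /mnt/truststore
      readOnly: true
  extraEnvs:
    - name: JAVA_OPTS
      value: "-Djavax.net.ssl.trustStore=/mnt/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit"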
  • rich-barista-93413

    02/27/2024, 6:53 PM
    Hi Drew, can you provide more details? We just see a screenshot.
  • rapid-ocean-29344

    02/28/2024, 11:58 PM
    Hello, I am trying to deploy in AWS, and we are using MSK (Kafka). Are there any recommended environment configs for datahub-actions for IAM role-based access?
  • stocky-garage-83010

    02/29/2024, 10:33 AM
    Hi, is it possible to add a caCert via a helm parameter or in a YAML file? We have to include our root CA because of Azure AAD SSO. Hope anyone could give me a hint.
  • nice-piano-41179

    03/03/2024, 9:02 AM
    Hi, I'm trying to follow the guide for deploying with Kubernetes. I'm using Docker Desktop k8s on macOS (Intel chip). The prerequisites installation is OK; the problem is that helm install datahub datahub/datahub hangs and is not responding. The resources of the local kube are 2 CPUs / 12 GB. Tx
  • agreeable-address-71270

    03/04/2024, 5:31 AM
    Hello folks, I am trying to find some answers on how the ordering of events in a Kafka topic partition impacts the DataHub application. What is the suggested partitioning strategy to use? Can events be out of order across topics? I am hosting the Kafka cluster on AWS MSK and would appreciate advice on how to configure the topic partitions. Thanks in advance!
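    For the configuration half of the question, the chart exposes partition settings under the kafka section of the values; they also appear, commented out, in the default values quoted later in this channel. A minimal sketch, with the hedge that "change events are keyed per entity, so each entity's updates stay ordered within one partition" is an assumption about DataHub's producer behavior, not something confirmed here:
global:
  kafka:
    ## For AWS MSK set this to a number larger than 1
    partitions: 3
    replicationFactor: 3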
  • proud-psychiatrist-26286

    03/05/2024, 4:15 PM
    I am suddenly seeing a CrashLoopBackOff on an M2 MacBook for the prerequisites-mysql pod and the image it is pulling when running helm install prerequisites datahub/datahub-prerequisites. After inspecting the images, it is pulling down the incorrect image architecture (amd64) for mysql; all other prerequisites are pulling the correct one (arm64). Opened a question/issue here: https://github.com/acryldata/datahub-helm/issues/435 just trying to track down why this started all of a sudden. I had no issues deploying these helm charts up until yesterday. Any help would be greatly appreciated.
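    Since the prerequisites chart delegates MySQL to the Bitnami subchart, one workaround while the issue is open is to pin the image in the prerequisites values. A sketch only; the tag below is hypothetical, so pick one whose manifest actually includes linux/arm64:
mysql:
  image:
    repository: bitnami/mysql
    tag: "8.0.36" # hypothetical tag; verify it ships a linux/arm64 variant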
  • astonishing-keyboard-8419

    03/05/2024, 4:49 PM
    Is it possible to use a Kafka schema registry with only a truststore and no keystore? I deploy with helm and tried the registry_ssl env var and also the springOverride configuration, but with no luck: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
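    A truststore-only setup (server verification without client certificates) should be possible in principle. A sketch using the chart's springKafkaConfigurationOverrides hook; the property names follow the Confluent schema-registry client's schema.registry.ssl.* convention and, like the mounted path and password, are assumptions to verify:
global:
  springKafkaConfigurationOverrides:
    # truststore only; with no keystore entries, no client certificate is presented
    schema.registry.ssl.truststore.location: /mnt/certs/truststore.jks # assumes the truststore is mounted here
    schema.registry.ssl.truststore.password: changeit # assumption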
  • thankful-teacher-72271

    03/05/2024, 9:36 PM
    Hello, everyone! I use DataHub on Azure AKS and currently ingest metadata using cronjobs. However, this process requires many steps: creating the recipe, changing the helm chart values, and changing the k8s ConfigMap. What pattern do you usually use for metadata ingestion in k8s?
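    One common alternative is the datahub-ingestion-cron subchart bundled with the helm chart, which renders a Kubernetes CronJob per recipe, so only the recipe ConfigMap changes between runs. A minimal sketch, assuming a ConfigMap recipe-config holding mysql_recipe.yml (both names hypothetical):
datahub-ingestion-cron:
  enabled: true
  crons:
    mysql:
      schedule: "0 * * * *" # hourly
      recipe:
        configmapName: recipe-config
        fileName: mysql_recipe.yml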
  • brainy-musician-50192

    03/06/2024, 2:03 PM
    How stable is the new release, 0.13.0, and is the chart updated to support it?
  • rich-pager-68736

    03/06/2024, 3:31 PM
    Hi there, is there an ENV variable for frontend to disable the incident feature in v0.13.0?
  • cuddly-library-82577

    03/07/2024, 1:44 PM
    Hello, good morning. If we are not setting graph_service_impl anywhere, what is being used: elasticsearch or neo4j? Is neo4j used for anything else besides the graph service implementation?
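    For reference, the setting lives under global in the values, as in the configuration quoted earlier in this channel. A minimal sketch that pins it explicitly instead of relying on the chart default:
global:
  graph_service_impl: elasticsearch # or neo4j; with elasticsearch, the neo4j prerequisite can stay disabled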
  • nice-magazine-21738

    03/07/2024, 4:55 PM
    👋 Hello, team! I'm an ops engineer trying to configure a health check for DataHub. I am going through the documentation and cannot find a health endpoint. I just need to know that DataHub is running.
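    The helm chart's own probes are a useful reference here: they check GMS over HTTP. A minimal sketch of such a check expressed as a Kubernetes probe, assuming the /health path on port 8080 that the chart's default GMS probes use:
livenessProbe:
  httpGet:
    path: /health # returns success once GMS is up
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30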
  • damp-greece-27806

    03/07/2024, 4:58 PM
    Hi everyone, does anyone know if there's global env support in the helm chart?
  • quiet-oil-71180

    03/08/2024, 1:10 AM
    Is AWS OpenSearch supported for the latest 0.13.0 release? Having moved to 0.13.0, I get:
    {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","index":"datahubpolicyindex_v2","resource.id":"datahubpolicyindex_v2","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","index":"datahubpolicyindex_v2","resource.id":"datahubpolicyindex_v2","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}
  • quiet-oil-71180

    03/08/2024, 2:56 AM
    Fixed my issue by moving my schema registry from AWS Glue to Confluent internal.
  • limited-motherboard-51317

    03/08/2024, 8:20 AM
    Hi! Is it possible to use logins/passwords for ingestion sources like DBs from some specific vault? Simply put: can credentials be stored in an external vault?
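    Ingestion recipes support environment-variable expansion, so one pattern is to sync the vault entry into a Kubernetes Secret, expose it as an env var on the ingestion pod, and reference it in the recipe. A minimal sketch; the variable name, connection details, and in-cluster GMS service name are hypothetical:
source:
  type: mysql
  config:
    host_port: mysql.example.internal:3306 # hypothetical host
    username: datahub_reader # hypothetical user
    password: "${MYSQL_PASSWORD}" # expanded from the pod's environment at run time
sink:
  type: datahub-rest
  config:
    server: http://datahub-datahub-gms:8080 # typical helm service name; adjust to your release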
  • limited-fountain-24833

    03/08/2024, 10:55 AM
    Hi, I am trying to deploy DataHub through Argo CD. We have Azure Event Hubs set up. Wondering what the configuration would look like in the values.yaml file for kafka if we want to use Azure Event Hubs:
kafka:
  bootstrap:
    server: "prerequisites-kafka:9092"
  zookeeper:
    server: "prerequisites-zookeeper:2181"
  # This section defines the names for the kafka topics that DataHub depends on, at a global level. Do not override this config
  # at a sub-chart level.
  topics:
    metadata_change_event_name: "MetadataChangeEvent_v4"
    failed_metadata_change_event_name: "FailedMetadataChangeEvent_v4"
    metadata_audit_event_name: "MetadataAuditEvent_v4"
    datahub_usage_event_name: "DataHubUsageEvent_v1"
    metadata_change_proposal_topic_name: "MetadataChangeProposal_v1"
    failed_metadata_change_proposal_topic_name: "FailedMetadataChangeProposal_v1"
    metadata_change_log_versioned_topic_name: "MetadataChangeLog_Versioned_v1"
    metadata_change_log_timeseries_topic_name: "MetadataChangeLog_Timeseries_v1"
    platform_event_topic_name: "PlatformEvent_v1"
    datahub_upgrade_history_topic_name: "DataHubUpgradeHistory_v1"
  maxMessageBytes: "5242880" # 5MB
  producer:
    compressionType: none
    maxRequestSize: "5242880" # 5MB
  consumer:
    maxPartitionFetchBytes: "5242880" # 5MB
    stopContainerOnDeserializationError: true
  ## For AWS MSK set this to a number larger than 1
  # partitions: 3
  # replicationFactor: 3
  schemaregistry:
    # GMS Implementation - url configured based on component context
    type: INTERNAL
    # Confluent Kafka Implementation
    # type: KAFKA
    # url: "http://prerequisites-cp-schema-registry:8081"
    # Glue Implementation - url not applicable
    # type: AWS_GLUE
    # glue:
    #   region: us-east-1
    #   registry: datahub
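    For the Event Hubs endpoint specifically, a minimal sketch of the pieces that change, using Event Hubs' standard Kafka-compatible SASL settings and the chart's springKafkaConfigurationOverrides hook; the namespace placeholder and the use of a connection string as the SASL password are assumptions to verify against your setup:
global:
  kafka:
    bootstrap:
      server: "<namespace>.servicebus.windows.net:9093" # hypothetical Event Hubs namespace
  springKafkaConfigurationOverrides:
    security.protocol: SASL_SSL
    sasl.mechanism: PLAIN
    sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="<event-hubs-connection-string>";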
  • modern-orange-37660

    03/12/2024, 4:36 PM
    ./gradlew build is failing at :docs-website:downloadHistoricalVersions due to an SSL error. Could anyone point me to the file I can download manually, if that is possible? Local docker build of the 0.13.0 version on an M1 Mac. Log in the thread.
  • damp-greece-27806

    03/13/2024, 1:32 PM
    Hello! We're trying to upgrade to v0.13.0 and are stuck at the system-update job. It keeps failing to instantiate a Kafka producer due to:
    WARN [AnnotationConfigApplicationContext] Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade': Error creating bean with name 'dataHubKafkaEventProducerFactory': Unsatisfied dependency expressed through field 'kafkaProducer': Error creating bean with name 'kafkaProducer' defined in class path resource [com/linkedin/gms/factory/kafka/DataHubKafkaProducerFactory.class]: Failed to instantiate [org.apache.kafka.clients.producer.Producer]: Factory method 'createInstance' threw exception with message: Failed to construct kafka producer
    2024-03-13 13:30:24,459 [main] INFO io.ebean.datasource:755 - DataSourcePool [gmsEbeanServiceConfig] shutdown min[2] max[50] free[2] busy[0] waiting[0] highWaterMark[1] waitCount[0] hitCount[2] psc[hit:0 miss:0 put:0 rem:0]
    ERROR [SpringApplication] Application run failed
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade': Error creating bean with name 'dataHubKafkaEventProducerFactory': Unsatisfied dependency expressed through field 'kafkaProducer': Error creating bean with name 'kafkaProducer' defined in class path resource [com/linkedin/gms/factory/kafka/DataHubKafkaProducerFactory.class]: Failed to instantiate [org.apache.kafka.clients.producer.Producer]: Factory method 'createInstance' threw exception with message: Failed to construct kafka producer
    At the bottom I see this:
    Caused by: java.lang.NoClassDefFoundError: software/amazon/awssdk/thirdparty/jackson/core/JsonFactory
    But this feels like it would be a problem outside of configuration. Any pointers? Our details, FYI:
    • AWS MSK 3.6.0
    • AWS_GLUE schema registry
    • AWS OpenSearch
    • kubernetes/helm deployment on EKS
  • millions-art-55322

    03/13/2024, 2:28 PM
    Good morning. Does DataHub have any MDM/RDM (Master/Reference Data Management) capabilities? Is there a recommended MDM tool that integrates well with DataHub? @millions-art-55322 @damp-refrigerator-68911 would love to know!
  • damp-solstice-31196

    03/13/2024, 7:47 PM
    Hi all. I'm working on an AWS ECS Fargate deployment and have stumbled on an issue I can't figure out. I have an RDS MySQL database and am deploying all other services as containers. The containers I've successfully brought up are the following:
    • zookeeper
    • elasticsearch
    • elasticsearch-setup
    • kafka-setup
    • broker
    • schema-registry
    • datahub-upgrade
    I'm stuck on datahub-gms with an error about the Jetty server failing to bind to 0.0.0.0/0.0.0.0:8080. There are no other containers using port 8080. I also attempted using a different port (8082) and got the same error, so it seems like an issue internal to the container. Attaching additional info on the error message and settings in the thread.
  • dazzling-flag-51738

    03/14/2024, 6:56 AM
    Hi all, I am deploying DataHub on my Kubernetes k3s server, which is a proxy-restricted server (RedHat 8.7, CLI only). I am getting an error during deployment of the kafka pod using helm: I run the helm install command in the prerequisites folder and get a connection-refused error in the logs. I know this is somehow related to the proxy. Can you suggest where I have to set the proxies in the values.yaml file?
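    For what it's worth, the prerequisites kafka comes from the Bitnami subchart, which exposes extraEnvVars, so one place to try the proxy settings is the prerequisites values. A sketch only: the proxy endpoint below is a placeholder, and it assumes the failing call is outbound traffic that honors these variables (broker-to-broker Kafka traffic itself is not HTTP):
kafka:
  extraEnvVars:
    - name: HTTP_PROXY
      value: "http://proxy.internal:3128" # hypothetical proxy endpoint
    - name: HTTPS_PROXY
      value: "http://proxy.internal:3128"
    - name: NO_PROXY
      value: ".svc,.cluster.local,localhost,127.0.0.1" # keep in-cluster traffic direct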
  • wooden-football-93372

    03/14/2024, 7:38 AM
    Facing an issue with Terraform provider and OpenSearch version compatibility. We are using Terraform provider version 5.39.0 and OpenSearch version 2.11. Also facing an issue while creating the index in OpenSearch. Could someone suggest a compatible version for this?
  • better-orange-49102

    03/14/2024, 7:51 AM
    Hi team, is it possible to create multiple public DataHub search views and assign them as default views (i.e., users don't have to select the view to use it) to certain groups of users? Based on my current understanding, one common public view is assigned to all users at any time, and it is not possible to assign view A to group 1 and view B to group 2.
  • calm-alligator-12692

    03/14/2024, 9:29 AM
    Which services/components would you need to scale to improve ingestion performance?