# all-things-deployment
  • silly-fish-85029

    02/06/2023, 9:46 AM
    Hey, we recently changed DataHub's backend MySQL to an AWS RDS instance, but DataHub still shows the old metadata even though no data has been ingested yet. How can we get rid of it? I read that it's because of Elasticsearch. I tried datahub delete commands, but they didn't clean up the old metadata.
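    One way this is typically resolved is to drop the stale search indices so they get rebuilt from the new (empty) database, rather than deleting entities one by one. A minimal sketch, assuming direct access to the Elasticsearch cluster; the host and index pattern are placeholders, so list the indices first and match the pattern to what you actually see:
    # list the DataHub indices so you know exactly what would be removed
    curl -s 'http://<your-es-host>:9200/_cat/indices?v'
    # delete the entity search indices (pattern is illustrative!), then re-run the
    # elasticsearch-setup job so empty indices are recreated
    curl -X DELETE 'http://<your-es-host>:9200/*index_v2'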
  • salmon-jordan-53958

    02/07/2023, 3:30 PM
    Hi, any thoughts on the error below: kafka.common.InconsistentClusterIdException: The Cluster ID VqMGuzTLRJKCcUfI7vq91A doesn't match stored clusterId Some(pQUnNhLwSAiX_Qx842-AtA) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
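    A common cause is a broker volume that still holds a meta.properties from a previous cluster while ZooKeeper was recreated (or the reverse). A remedy sketch, with the pod name and data path assuming the Bitnami-based prerequisites chart:
    # remove the stale cluster id recorded on the broker's persistent volume
    kubectl exec prerequisites-kafka-0 -- rm /bitnami/kafka/data/meta.properties
    # restart the broker so it re-registers under the current cluster id
    kubectl delete pod prerequisites-kafka-0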
  • witty-motorcycle-52108

    02/07/2023, 7:44 PM
    hey all, it appears that there's no acryldata/datahub-postgres-setup:v0.9.6.1 image tagged on Docker Hub. was that an intentional omission, or an unintentional one? i see v0.9.6.4, but that's not an official release on GitHub. what version should we be using for the acryldata/datahub-* images on Docker Hub that's consistent across all the images (minus actions)?
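    When in doubt about published tags, the public Docker Hub API answers this directly. A quick sketch (jq is optional):
    curl -s 'https://hub.docker.com/v2/repositories/acryldata/datahub-postgres-setup/tags?page_size=100' \
      | jq -r '.results[].name' | sort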
  • wide-laptop-97072

    02/08/2023, 2:44 AM
    Hi πŸ‘‹ ! Noob to Kubernetes & EKS here, but I am following this & this guide to try and deploy datahub onto AWS EKS. I am not able to get the prerequisites deployed and see CrashLoopBackOff when I run kubectl get pods. Upon checking the logs for the schema-registry pod (from some similar threads before), I see that Kafka has not deployed successfully, per the logs below. Any pointers to resolve this are appreciated.
    [main] INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
    [main] ERROR io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Brokers found [].
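    The schema registry is only reporting that no broker has registered, so the real failure is in Kafka or ZooKeeper. A debugging sketch, with resource names assuming the prerequisites chart defaults:
    kubectl logs prerequisites-kafka-0 --previous   # why the broker last crashed
    kubectl logs prerequisites-zookeeper-0          # confirm ZooKeeper itself is healthy
    kubectl get pvc                                 # stale volumes can carry a mismatched cluster id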
  • creamy-van-28626

    02/08/2023, 6:06 PM
    Hi team, we are not able to figure out the issue behind this: in place of the version, it comes up as null. Can you please help?
  • fierce-baker-1392

    02/09/2023, 8:25 AM
    Hi team, when I use kubernetes to deploy datahub, sometimes these errors occur, does anyone know how to solve this problem?
  • important-rainbow-77301

    02/09/2023, 9:59 AM
    Hi, dear DataHub team. We tried several DataHub images (like in the attachment) in our deployments, but almost all the tags have vulnerability issues and fail to get scanned. Could you please let us know which tags are vulnerability-free? We have tried versions from 0.8.45 to 0.10.0; none of them passes a vulnerability check.
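    For anyone triaging the same question, scanning a tag locally makes the CVE list concrete. A sketch with trivy (any scanner works; the tag is illustrative):
    trivy image acryldata/datahub-gms:v0.10.0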
  • elegant-article-21703

    02/09/2023, 1:38 PM
    Hello Datahub team! I'm trying to upgrade from v0.9.6.1 to v0.10.0 and I got the following message:
    $ helm upgrade -n datahub --atomic --debug datahub datahub ./my-folder/datahub
    upgrade.go:121: [debug] preparing upgrade for datahub
    upgrade.go:129: [debug] performing update for datahub
    upgrade.go:301: [debug] creating upgraded release for datahub
    client.go:255: [debug] Starting delete for "datahub-datahub-system-update-job" Job
    client.go:109: [debug] creating 1 resource(s)
    W0209 17:32:01.749531   11644 warnings.go:67] spec.template.spec.containers[0].env[27].name: duplicate name "DATAHUB_UPGRADE_HISTORY_TOPIC_NAME"
    W0209 17:32:01.749531   11644 warnings.go:67] spec.template.spec.containers[0].env[29].name: duplicate name "ENTITY_REGISTRY_CONFIG_PATH"
    W0209 17:32:01.750525   11644 warnings.go:67] spec.template.spec.containers[0].env[30].name: duplicate name "EBEAN_DATASOURCE_USERNAME"
    W0209 17:32:01.750525   11644 warnings.go:67] spec.template.spec.containers[0].env[31].name: duplicate name "EBEAN_DATASOURCE_PASSWORD"
    W0209 17:32:01.750525   11644 warnings.go:67] spec.template.spec.containers[0].env[32].name: duplicate name "EBEAN_DATASOURCE_HOST"
    W0209 17:32:01.750525   11644 warnings.go:67] spec.template.spec.containers[0].env[33].name: duplicate name "EBEAN_DATASOURCE_URL"
    W0209 17:32:01.751522   11644 warnings.go:67] spec.template.spec.containers[0].env[34].name: duplicate name "EBEAN_DATASOURCE_DRIVER"
    W0209 17:32:01.751522   11644 warnings.go:67] spec.template.spec.containers[0].env[35].name: duplicate name "KAFKA_BOOTSTRAP_SERVER"
    W0209 17:32:01.752524   11644 warnings.go:67] spec.template.spec.containers[0].env[36].name: duplicate name "KAFKA_SCHEMAREGISTRY_URL"
    W0209 17:32:01.752524   11644 warnings.go:67] spec.template.spec.containers[0].env[38].name: duplicate name "ELASTICSEARCH_HOST"
    W0209 17:32:01.753524   11644 warnings.go:67] spec.template.spec.containers[0].env[39].name: duplicate name "ELASTICSEARCH_PORT"
    W0209 17:32:01.753524   11644 warnings.go:67] spec.template.spec.containers[0].env[40].name: duplicate name "SKIP_ELASTICSEARCH_CHECK"
    W0209 17:32:01.760523   11644 warnings.go:67] spec.template.spec.containers[0].env[41].name: duplicate name "ELASTICSEARCH_USE_SSL"
    W0209 17:32:01.765522   11644 warnings.go:67] spec.template.spec.containers[0].env[45].name: duplicate name "GRAPH_SERVICE_IMPL"
    client.go:464: [debug] Watching for changes to Job datahub-datahub-system-update-job with timeout of 5m0s
    client.go:492: [debug] Add/Modify event for datahub-datahub-system-update-job: ADDED
    client.go:531: [debug] datahub-datahub-system-update-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
    client.go:492: [debug] Add/Modify event for datahub-datahub-system-update-job: MODIFIED
    client.go:174: [debug] checking 13 resources for changes
    client.go:437: [debug] Looks like there are no changes for Secret "datahub-auth-secrets"
    client.go:437: [debug] Looks like there are no changes for Secret "datahub-encryption-secrets"
    W0209 17:33:15.230558   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[10].name: duplicate name "EBEAN_DATASOURCE_USERNAME"
    W0209 17:33:15.236558   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[11].name: duplicate name "EBEAN_DATASOURCE_PASSWORD"
    W0209 17:33:15.241559   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[12].name: duplicate name "EBEAN_DATASOURCE_HOST"
    W0209 17:33:15.245558   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[13].name: duplicate name "EBEAN_DATASOURCE_URL"
    W0209 17:33:15.249560   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[14].name: duplicate name "EBEAN_DATASOURCE_DRIVER"
    wait.go:53: [debug] beginning wait for 13 resources with timeout of 5m0s
    wait.go:225: [debug] Deployment is not ready: datahub/datahub-acryl-datahub-actions. 0 out of 1 expected pods are ready
    wait.go:225: [debug] Deployment is not ready: datahub/datahub-datahub-gms. 0 out of 1 expected pods are ready
    client.go:255: [debug] Starting delete for "datahub-nocode-migration-job" Job
    client.go:284: [debug] jobs.batch "datahub-nocode-migration-job" not found
    client.go:109: [debug] creating 1 resource(s)
    client.go:464: [debug] Watching for changes to Job datahub-nocode-migration-job with timeout of 5m0s
    client.go:492: [debug] Add/Modify event for datahub-nocode-migration-job: ADDED
    client.go:531: [debug] datahub-nocode-migration-job: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
    client.go:492: [debug] Add/Modify event for datahub-nocode-migration-job: MODIFIED
    client.go:531: [debug] datahub-nocode-migration-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
    upgrade.go:360: [debug] warning: Upgrade "datahub" failed: post-upgrade hooks failed: timed out waiting for the condition
    upgrade.go:378: [debug] Upgrade failed and atomic is set, rolling back to last successful release
    history.go:53: [debug] getting history for release datahub
    rollback.go:64: [debug] preparing rollback of datahub
    rollback.go:112: [debug] rolling back datahub (current: v19, target: v18)
    rollback.go:71: [debug] creating rolled back release for datahub
    rollback.go:77: [debug] performing rollback of datahub
    client.go:174: [debug] checking 13 resources for changes
    client.go:437: [debug] Looks like there are no changes for Secret "datahub-auth-secrets"
    client.go:437: [debug] Looks like there are no changes for Secret "datahub-encryption-secrets"
    W0209 17:42:46.507031   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[10].name: duplicate name "EBEAN_DATASOURCE_USERNAME"
    W0209 17:42:46.507031   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[11].name: duplicate name "EBEAN_DATASOURCE_PASSWORD"
    W0209 17:42:46.507031   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[12].name: duplicate name "EBEAN_DATASOURCE_HOST"
    W0209 17:42:46.507031   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[13].name: duplicate name "EBEAN_DATASOURCE_URL"
    W0209 17:42:46.508031   11644 warnings.go:67] spec.jobTemplate.spec.template.spec.containers[0].env[14].name: duplicate name "EBEAN_DATASOURCE_DRIVER"
    wait.go:53: [debug] beginning wait for 13 resources with timeout of 5m0s
    rollback.go:223: [debug] superseding previous deployment 18
    rollback.go:83: [debug] updating status for rolled back release for datahub
    Error: UPGRADE FAILED: release datahub failed, and has been rolled back due to atomic being set: post-upgrade hooks failed: timed out waiting for the condition
    helm.go:81: [debug] post-upgrade hooks failed: timed out waiting for the condition
    release datahub failed, and has been rolled back due to atomic being set
    helm.sh/helm/v3/pkg/action.(*Upgrade).failRelease
            /home/circleci/helm.sh/helm/pkg/action/upgrade.go:410
    helm.sh/helm/v3/pkg/action.(*Upgrade).performUpgrade
            /home/circleci/helm.sh/helm/pkg/action/upgrade.go:341
    helm.sh/helm/v3/pkg/action.(*Upgrade).Run
            /home/circleci/helm.sh/helm/pkg/action/upgrade.go:130
    main.newUpgradeCmd.func2
            /home/circleci/helm.sh/helm/cmd/helm/upgrade.go:154
    github.com/spf13/cobra.(*Command).execute
            /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
    github.com/spf13/cobra.(*Command).ExecuteC
            /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
    github.com/spf13/cobra.(*Command).Execute
            /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
    main.main
            /home/circleci/helm.sh/helm/cmd/helm/helm.go:80
    runtime.main
            /usr/local/go/src/runtime/proc.go:203
    runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1373
    UPGRADE FAILED
    main.newUpgradeCmd.func2
            /home/circleci/helm.sh/helm/cmd/helm/upgrade.go:156
    github.com/spf13/cobra.(*Command).execute
            /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
    github.com/spf13/cobra.(*Command).ExecuteC
            /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
    github.com/spf13/cobra.(*Command).Execute
            /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
    main.main
            /home/circleci/helm.sh/helm/cmd/helm/helm.go:80
    runtime.main
            /usr/local/go/src/runtime/proc.go:203
    runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1373
    $ kubectl get pods -n datahub
    NAME                                                READY   STATUS      RESTARTS      AGE
    datahub-acryl-datahub-actions-79f678dc-ntcmv        1/1     Running     0             13d
    datahub-datahub-frontend-79c7949c69-ftptf           1/1     Running     0             13d
    datahub-datahub-gms-698cb7d7-4wsvj                  1/1     Running     0             13d
    datahub-datahub-system-update-job--1-zvbw4          1/1     Running     0             4m46s
    datahub-datahub-upgrade-job--1-n68k7                0/1     Completed   0             13d
    datahub-elasticsearch-setup-job--1-djdvl            0/1     Completed   0             6m37s
    datahub-kafka-setup-job--1-cjgjf                    0/1     Completed   0             6m28s
    datahub-mysql-setup-job--1-vkdr9                    0/1     Completed   0             4m52s
    elasticsearch-master-0                              0/1     Running     1 (61s ago)   15d
    elasticsearch-master-1                              1/1     Running     1 (13d ago)   15d
    elasticsearch-master-2                              1/1     Running     0             15d
    prerequisites-cp-schema-registry-7d489cfc6d-swp2d   2/2     Running     0             15d
    prerequisites-kafka-0                               1/1     Running     0             15d
    prerequisites-mysql-0                               1/1     Running     0             15d
    prerequisites-neo4j-community-0                     1/1     Running     0             169d
    prerequisites-zookeeper-0                           1/1     Running     0             15d
    Does anyone have any idea what I'm missing here? I've seen that there are duplicate environment variables such as:
    DATAHUB_UPGRADE_HISTORY_TOPIC_NAME
    ENTITY_REGISTRY_CONFIG_PATH
    EBEAN_DATASOURCE_USERNAME
    EBEAN_DATASOURCE_PASSWORD
    EBEAN_DATASOURCE_HOST
    EBEAN_DATASOURCE_PORT
    EBEAN_DATASOURCE_DBNAME
    Thank you all in advance!
  • brainy-tent-14503

    02/10/2023, 12:49 AM
    Looks like this job is still running:
    datahub-datahub-system-update-job--1-zvbw4          1/1     Running     0             4m46s
    Wait for it to complete and re-run the command. The atomic flag might be interrupting it on timeout, so either try without --atomic and let that job run, or increase the timeout; based on your data size and hardware it may take a while. Refer to this doc.
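    A sketch of that retry, per the advice above, dropping --atomic and stretching the timeout; the 30m value is illustrative and should scale with your data volume:
    helm upgrade -n datahub --debug --timeout 30m datahub ./my-folder/datahub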
  • billions-family-12217

    02/10/2023, 7:13 AM
    hi
  • billions-family-12217

    02/10/2023, 7:14 AM
    datahub-ingestion-cron:
      enabled: true
      crons:
        mysql:
          schedule: "0 * * * *" # Every hour
          recipe:
            configmapName: recipe-config
            fileName: mysql_recipe.yml
    Is this not working? Can anyone help me out?
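    Some checks that usually narrow this down; the resource names are guesses from the chart's naming conventions and should be matched against your cluster:
    kubectl get cronjobs -n datahub                      # was the CronJob rendered at all?
    kubectl get jobs -n datahub                          # has the schedule created any runs?
    kubectl logs -n datahub job/<latest-ingestion-job>   # recipe/connection errors show up here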
  • fierce-baker-1392

    02/10/2023, 11:00 AM
    Hi team, I am struggling to change the homepage logo. Can I change the logo by modifying the chart's values? If so, are there any examples of how to change it? I deploy DataHub with k8s. Thanks~
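    A sketch of one chart-values approach, shown with --set for brevity. The extraEnvs hook exists in the chart, but whether the frontend honors a REACT_APP_LOGO_URL variable depends on your version, so treat the variable name as an assumption to verify against the datahub-frontend docs:
    helm upgrade -n datahub datahub datahub/datahub \
      --set 'datahub-frontend.extraEnvs[0].name=REACT_APP_LOGO_URL' \
      --set 'datahub-frontend.extraEnvs[0].value=https://example.com/my-logo.png'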
  • powerful-memory-77948

    02/10/2023, 9:22 PM
    Hi All, I have a question related to using ES instead of Neo4J for graph service. If we don't have neo4j footprint in our ecosystem and we switch the graph service to use ES as described here, would we still get all the features (both functional and non-functional)?
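    For reference, the switch itself is a single chart value (key name assumed from the chart's values.yaml; confirm for your chart version):
    helm upgrade -n datahub datahub datahub/datahub \
      --set global.graph_service_impl=elasticsearch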
  • fierce-baker-1392

    02/12/2023, 12:44 PM
    Hi team, I have modified the default Kafka topics in values.yaml, but when I deploy the kafkaSetup job in k8s, some topics have not been replaced. Is there any problem with my values.yaml?
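    Comparing what the setup job actually created against the overrides usually pinpoints which value was ignored. A sketch, assuming the Bitnami prerequisites broker image (which ships kafka-topics.sh on the PATH):
    kubectl exec prerequisites-kafka-0 -- kafka-topics.sh \
      --bootstrap-server localhost:9092 --list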
  • fierce-baker-1392

    02/12/2023, 12:47 PM
    image.png
  • billions-twilight-48559

    02/13/2023, 11:51 AM
    set the channel topic: _schema
  • billions-twilight-48559

    02/13/2023, 11:51 AM
    cleared channel topic
  • microscopic-mechanic-13766

    02/13/2023, 1:36 PM
    Good morning, I have METADATA_SERVICE_AUTH_ENABLED enabled, but I have a "problem": the creation of tokens has to be done in DataHub. Is there any existing way to make DataHub check the validity of a token against third-party software like Apache Knox? The aim is to have a centralized place to manage application tokens. Thanks in advance!!
  • little-megabyte-1074

    02/13/2023, 5:39 PM
    set the channel topic: Channel to discuss all-things-deploying DataHub
  • witty-motorcycle-52108

    02/14/2023, 5:05 AM
    having trouble understanding what does and does not need to be configured on some of the containers, such as the upgrade container. if i look at the helm chart for the upgrade job, it defines a TON of env vars for things like elasticsearch cloning settings, kafka topic names, and more. last deploy we did of 0.9.1, we didn't have to define the kafka topic names anywhere to use the default values. do we now have to define all of these things everywhere, or are there still sane defaults present in the built containers if we skip setting some of these configs? i've read the docs for the in-place migration, but they appear specific to the 0.8.0 release, and i don't see any other docs about running upgrades and what commands need to be executed when during an upgrade. it's hard to understand all of this when doing a deployment without the helm charts.
  • witty-motorcycle-52108

    02/14/2023, 6:23 AM
    basically i'd appreciate some clarity on what exactly the steps are to go from one DH version to another. i first tried deploying all the 0.10.0 images, and had ANTLR errors. then i tried running the setup jobs, which all succeeded, but i still had ANTLR errors. now i've set up the upgrade container based on the helm chart, am running it with -u SystemUpdate as specified in helm, and i'm getting logs with
    Caused by: java.lang.IllegalArgumentException: No upgrade with id SystemUpdate could be found. Aborting...
    in them, which does not make any sense to me. i also tried running the container with no -u arg; it still threw errors. attaching screenshots of some logs. i also don't understand why it's saying
    2023-02-14 06:09:37.349 INFO 1 --- [ main] c.l.g.f.k.s.AwsGlueSchemaRegistryFactory : Creating AWS Glue registry
    when i have SCHEMA_REGISTRY_TYPE set to kafka. what services need to be running in order for an upgrade to take place? all? none? GMS is bootlooping due to
    2023-02-14 06:20:12,838 [ThreadPoolTaskExecutor-1] ERROR o.a.k.c.c.i.ConsumerCoordinator:283 - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] User provided listener org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer$ListenerConsumerRebalanceListener failed on invocation of onPartitionsAssigned for partitions [DataHubUpgradeHistory_v1-2]
    and the MAE consumer is bootlooping due to GMS not being available, but the upgrade task seems to have hostnames for both of those based on the helm chart? is there some circular dependency here that's causing issues?
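    For readers hitting the same wall: in the helm chart the system-update job runs as a pre-upgrade hook before any other component is rolled. A sketch of the standalone equivalent, with the env file and image tag as placeholders; the job must point at the same backends (DB, Kafka, ES) as GMS:
    # run the system update first...
    docker run --env-file ./upgrade.env acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
    # ...then roll GMS and the MAE/MCE consumers to the matching image tag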
  • powerful-cat-68806

    02/14/2023, 8:09 AM
    Hi team, my datahub-datahub-gms-xxxx pod is failing with the error
    org.postgresql.util.PSQLException: ERROR: relation "metadata_aspect_v2" does not exist
    I'm using my own pgSQL db and configured its values in the chart. I understand that the Postgres setup job in the deployment should create this relation, but it's not doing so. Please advise. Cc: @incalculable-ocean-74010 @astonishing-answer-96712
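    A verification sketch; the job name assumes chart defaults and the psql identifiers are placeholders for your own DB settings:
    kubectl get jobs -n datahub | grep postgres-setup        # did the setup job run at all?
    kubectl logs -n datahub job/<postgres-setup-job-name>    # permission/connection errors appear here
    psql -h <your-db-host> -U <datahub-user> -d <datahub-db> -c '\dt metadata_aspect_v2'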
  • microscopic-mechanic-13766

    02/14/2023, 8:25 AM
    Good morning team, I am looking to send requests to DataHub via API in order to create datasets and all the elements related to them (columns, description of dataset and columns, tags, ...). My initial idea is to take such info from one DataHub instance (a part I have already managed to complete) and push it into another DataHub instance. I have been taking a look at the existing metadata ingestion example (this one) and I have noticed that it is possible to indicate dataset aspects like profiling, usage stats, ..., but I haven't seen how to send the dataset columns (name, type, whether nullable, ...). Is it really not possible? Thanks in advance! Note: the lines I have been looking at are from 3384 to the end, as I don't know the exact purpose of the rest of the lines. If someone could help me understand their aim, that would be really great πŸ™‚ !!
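    Columns travel in the schemaMetadata aspect, so they can be pushed the same way as the other aspects. A sketch against the GMS rest.li endpoint; the host, URN, and field list are illustrative, and the exact aspect shape should be checked against your DataHub version:
    curl -s -X POST 'http://<gms-host>:8080/aspects?action=ingestProposal' \
      -H 'X-RestLi-Protocol-Version: 2.0.0' \
      -H 'Content-Type: application/json' \
      --data '{"proposal":{"entityType":"dataset","entityUrn":"urn:li:dataset:(urn:li:dataPlatform:mysql,mydb.mytable,PROD)","changeType":"UPSERT","aspectName":"schemaMetadata","aspect":{"contentType":"application/json","value":"{\"schemaName\":\"mydb.mytable\",\"platform\":\"urn:li:dataPlatform:mysql\",\"version\":0,\"hash\":\"\",\"platformSchema\":{\"com.linkedin.schema.OtherSchema\":{\"rawSchema\":\"\"}},\"fields\":[{\"fieldPath\":\"id\",\"nativeDataType\":\"BIGINT\",\"nullable\":false,\"type\":{\"type\":{\"com.linkedin.schema.NumberType\":{}}}}]}"}}}'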
  • shy-dog-84302

    02/14/2023, 11:05 AM
    Hi! I am trying to use an existing Kafka cluster in my organization as the Kafka backend for a DataHub deployment in k8s. I want to create/use a dedicated Kafka user with minimal privileges to operate DataHub seamlessly. I have described my assumptions about this user and the ACLs required in this 🧡. I would like comments from an expert, or someone who has used a Kafka backend in the same way, on whether my approach is correct or needs some corrections.
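    For comparison, a minimal-privilege sketch using the stock kafka-acls tool; the principal, bootstrap address, and topic/group names are assumptions based on DataHub's default topic names, so map them to your own setup:
    kafka-acls --bootstrap-server <broker:9092> --add --allow-principal User:datahub \
      --operation Read --operation Write --operation Describe \
      --topic MetadataChangeProposal_v1 --topic MetadataChangeLog_Versioned_v1
    # consumer groups get generated names, so a prefixed group ACL is convenient
    kafka-acls --bootstrap-server <broker:9092> --add --allow-principal User:datahub \
      --operation Read --group generic- --resource-pattern-type prefixed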
  • white-horse-97256

    02/14/2023, 5:53 PM
    Hi Team, can we deploy the quickstart docker compose file on Kubernetes, or do we need to provision all the other dependencies ourselves?
  • witty-motorcycle-52108

    02/14/2023, 9:58 PM
    i hate to do it to y'all, but ran into another fun error. the ES setup job is failing, and i suspect it's an opendistro vs elasticsearch thing given the error. any thoughts on how to resolve this? seems like it's going to require some manual OS cluster manipulation
  • rapid-crowd-46218

    02/15/2023, 1:36 AM
    Hi, team. I'm posting again. Do I need to install an Elasticsearch stemmer plugin (language analyzer) for searching Korean in DataHub? As far as I know, a Korean stemmer analyzer is basically provided in Elasticsearch 7.1. How can I search in Korean in the DataHub UI?
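    For reference, Korean analysis ships as the official analysis-nori plugin rather than in the default Elasticsearch distribution, so it has to be installed on every node and the node restarted; whether DataHub's index mappings then use it is a separate, version-dependent question:
    bin/elasticsearch-plugin install analysis-nori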
  • billions-family-12217

    02/15/2023, 6:28 AM
    hi team .... I'm trying to deploy a scheduled job in datahub UI. It is not running.
  • billions-family-12217

    02/15/2023, 6:28 AM
    image.png
  • billions-family-12217

    02/15/2023, 6:47 AM
    but the execution count keeps incrementing as the schedule triggers... it is always in the pending stage