# all-things-deployment
  •

    breezy-article-15996

    05/06/2022, 9:27 PM
    Hi all, I found that there may be a version mismatch when running `helm dep update`. Below is the full message:
    bqi@bqi-mac datahub-helm % helm dep update charts/datahub
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "datahub" chart repository
    ...Successfully got an update from the "elastic" chart repository
    Update Complete. ⎈Happy Helming!⎈
    Error: can't get a valid version for repositories datahub-gms, datahub-mae-consumer, datahub-mce-consumer. Try changing the version constraint in Chart.yaml
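A first step when `helm dep update` rejects subchart versions is to check what the chart repository actually serves, then align the constraints in charts/datahub/Chart.yaml; a sketch, assuming the "datahub" repo alias from the output above:

```shell
# List the subchart versions the "datahub" repo actually serves,
# then adjust the version constraints in charts/datahub/Chart.yaml to match.
helm repo update
helm search repo datahub/datahub-gms --versions
```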
    Curious whether this PR is the reason
  •

    better-orange-49102

    05/08/2022, 6:15 AM
    I noticed quite a few cases of GitHub CI failing due to Python libraries releasing new versions (for instance, great-expectations 0.15.4 causes issues with `from __future__ import annotations`), causing some methods to not work as expected. We usually don't lock or constrain versions until a version issue occurs, but I was wondering if it makes more sense to just specify the version? As a layman Python user, I'm wondering what the considerations are behind (mostly) not locking versions.
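For reference, locking out a known-bad release is a one-line constraint in the package metadata; the specifier below is illustrative, not the project's actual pin:

```text
# requirements-style specifier (illustrative)
great-expectations>=0.15.0,!=0.15.4
```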
  •

    cuddly-arm-8412

    05/09/2022, 8:03 AM
    hi, team! I run `./gradlew metadata-jobs:mce-consumer-job:bootRun --debug-jvm` to debug the mce job, but it just keeps waiting at the prompt and I don't know if it will succeed.
  •

    creamy-van-28626

    05/09/2022, 8:29 AM
    Hi team, when I am clicking on the ingestion tab I am getting this error. I have pinged the error logs from the gms pod as well.
  •

    creamy-van-28626

    05/11/2022, 4:43 PM
    Hi team, how can we create a custom source for ingesting our lineage file?
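If the lineage file can be expressed in the format the built-in lineage-file source expects, a recipe may be enough; a sketch with placeholder paths (source type name per the metadata-ingestion docs):

```yaml
# Illustrative recipe; file path and server URL are placeholders.
source:
  type: datahub-lineage-file
  config:
    file: /path/to/lineage.yml
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```

Otherwise a custom Source class can be written in metadata-ingestion and registered, per the "adding a metadata ingestion source" guide.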
  •

    agreeable-army-26750

    05/12/2022, 9:16 AM
    Hi guys! I am very new to the project and the project structure! I would like to rewrite a ConfigModel in the metadata-ingestion submodule; specifically, I would like to add a new field to the BusinessGlossarySourceConfig ingestion configuration. After writing my modification I would like to redeploy the module, but as far as I can see the datahub CLI tool is not affected by my modifications. Is it possible to redeploy the CLI locally with the modifications I made? Thank you for your help!
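The usual way to pick up local metadata-ingestion changes is an editable install of the CLI from the working tree; a sketch (the exact extras name may differ by release):

```shell
# From the datahub repo root: install the CLI from your working tree
cd metadata-ingestion
python3 -m venv venv && source venv/bin/activate
pip install -e ".[dev]"   # editable install: local edits take effect on the next run
datahub version
```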
  •

    ambitious-exabyte-53451

    05/12/2022, 12:43 PM
    Hi all, https://datahubproject.io/docs/deploy/aws/
  •

    chilly-potato-57465

    05/12/2022, 1:12 PM
    Hello everyone! Just deployed DataHub on a K8s cluster and am now looking into how to install various plugins (I'm new to both DataHub and K8s). With my local Docker deployment I used pip install, but I doubt that is the correct approach with K8s. How should I do that? Thank you in advance!
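On Kubernetes the usual pattern is to bake the plugins into the image that runs ingestion rather than pip-installing into a live pod; a hypothetical Dockerfile sketch (base tag and plugin names are placeholders):

```dockerfile
FROM acryldata/acryl-datahub-actions:head
RUN pip install 'acryl-datahub[postgres,snowflake]'
```

The deployment is then pointed at the custom image via values.yaml.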
  •

    rapid-book-98432

    05/13/2022, 5:49 AM
    Hi there 🙂 Just a simple question: what exactly do we have in ES? (SearchIndex & graphIndex) So lineage data, search history and more?
  •

    modern-zoo-97059

    05/16/2022, 12:53 AM
    Can I synchronize MySQL and Elasticsearch data? Elasticsearch has been initialized, but the index count is 0... x_x
  •

    ancient-apartment-23316

    05/16/2022, 3:12 PM
    Hi, I have DataHub deployed on AWS managed services. Where can I find instructions on how to update it?
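For a helm-based install, updating is normally a chart upgrade; a sketch assuming the default release and chart names from the deployment docs:

```shell
helm repo update
helm upgrade datahub datahub/datahub --values values.yaml
```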
  •

    proud-table-38689

    05/16/2022, 3:12 PM
    Simple question: if I'm using the Airflow lineage backend, do I also need connectors for the data sources as recipes stored somewhere?
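For context, the lineage backend of that era is switched on in airflow.cfg roughly as below (the connection id is a placeholder); as far as the docs describe, it only emits lineage from DAG runs, so source metadata still comes from recipes:

```ini
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {"datahub_conn_id": "datahub_rest_default", "capture_ownership_info": true}
```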
  •

    proud-table-38689

    05/16/2022, 9:27 PM
    is this jar - https://github.com/datahub-project/datahub/blob/master/.github/workflows/publish-datahub-jars.yml#L48 the same as Datahub Metadata service? https://datahubproject.io/docs/metadata-service
  •

    rich-policeman-92383

    05/17/2022, 11:13 AM
    In the gms, frontend and other env files, how can we specify multiple hosts for elasticsearch, kafka, etc.? https://github.com/datahub-project/datahub/blob/master/docker/datahub-gms/env/docker-without-neo4j.env#L9
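For Kafka, the bootstrap-server variable takes the usual comma-separated list; Elasticsearch in these images appears to take a single ELASTICSEARCH_HOST, so multiple ES nodes would generally sit behind a load balancer. Illustrative fragment (hostnames are placeholders):

```shell
KAFKA_BOOTSTRAP_SERVER=broker1:29092,broker2:29092,broker3:29092
ELASTICSEARCH_HOST=es-loadbalancer.internal
ELASTICSEARCH_PORT=9200
```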
  •

    gentle-lifeguard-35076

    05/17/2022, 7:18 PM
    Following the DataHub-on-Kubernetes instructions outlined here: https://datahubproject.io/docs/deploy/kubernetes/. Kubernetes is Red Hat OpenShift on AWS (ROSA), and I am using the default prerequisites helm install:
    helm install prerequisites datahub/datahub-prerequisites
    results in prerequisites-cp-schema-registry crashing. Should Kafka be installed by default? The error log from the crashing pod is in the replies to this comment.
  •

    ambitious-exabyte-53451

    05/18/2022, 11:18 AM
    Does somebody have an easy guide for using Azure AD as an identity provider for authentication to datahub?
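Azure AD can be wired up through the frontend's generic OIDC support; the variable names below are from the DataHub OIDC docs, and all values are placeholders:

```shell
AUTH_OIDC_ENABLED=true
AUTH_OIDC_CLIENT_ID=<azure-app-client-id>
AUTH_OIDC_CLIENT_SECRET=<azure-app-client-secret>
AUTH_OIDC_DISCOVERY_URI=https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration
AUTH_OIDC_BASE_URL=https://your-datahub-host
```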
  •

    creamy-van-28626

    05/18/2022, 5:21 PM
    Hi team, can we change the default password for datahub?
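In releases of this era the frontend's static logins live in a user.props file, so changing the default amounts to overriding that file (path per the datahub-frontend docs; the password below is a placeholder):

```text
# datahub-frontend/conf/user.props
datahub:my-new-strong-password
```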
  •

    melodic-market-88762

    05/18/2022, 5:36 PM
    For additional context: if I `kubectl describe ingress datahub-datahub-frontend`, it suggests there is something wrong with the certificate?
  •

    better-orange-49102

    05/19/2022, 8:25 AM
    Would like to ask the people who have already deployed Datahub and are ingesting glossary terms: is there a single party who maintains the glossary terms, or are there multiple parties? If multiple, how do you deconflict and make sure they do not overwrite each other? I'm thinking of a single glossary repo where everyone's edits are tracked before committing to Datahub. There will be some workflow logic to check for new terms and create them (but ignore existing terms, because I intend for users to edit descriptions via the UI; and if terms are removed, to delete them). This is because once I write the information into datahub via REST, I do not know who edited any given piece of information. (Referring to the "createdby" column in the RDBMS store, which currently does not record the identity of users unless it is a UI edit.)
  •

    mammoth-fountain-32989

    05/19/2022, 9:45 AM
    Hi, I want to deploy datahub as a production environment in my company.
    • Can we use the quickstart docker setup with all the containers? What are the cons of using it for a prod deployment (if any)? We can plan for a periodic mysql dump as backup to ensure minimal loss of metadata. How do we backup/restore/rebuild the search indices?
    • Alternatively, can I have a Postgres DB and Elastic store outside, with the rest of the services from docker (kafka, front-end etc.)? If so, how do we do that? The number of datasets/entities would be on the order of a few thousand.
    • How do we handle upgrades with this approach? Can the docker image upgrades be done without losing the ingested data?
    Planning to have this on different servers as prod and dev instances (to avoid port conflicts). Please suggest. Thanks
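For the backup/restore part, a sketch of the common pattern: dump the SQL store, then rebuild the search indices from it with the datahub-upgrade RestoreIndices job. Credentials, image tag and network name below are quickstart-style placeholders:

```shell
# Back up the metadata store
mysqldump -u datahub -pdatahub datahub > datahub-backup.sql

# Rebuild Elasticsearch indices from the SQL store
# (the upgrade container also needs the usual DB/ES connection env vars)
docker run --network datahub_network acryldata/datahub-upgrade:head -u RestoreIndices
```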
  •

    creamy-van-28626

    05/19/2022, 12:21 PM
    Hi team, in which use cases would we use push-based ingestion via Kafka?
  •

    cool-actor-73767

    05/19/2022, 6:44 PM
    Hello guys! Does somebody have an easy guide for using LDAP for authentication to datahub? I followed the doc https://datahubproject.io/docs/datahub-frontend and changed 'jaas.conf' with LDAP parameters, but it doesn't work.
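For reference, an LDAP flavour of jaas.conf typically swaps the default module for the JDK's LdapLoginModule; the section name is the one datahub-frontend looks up, and everything else below is a placeholder:

```text
// datahub-frontend/conf/jaas.conf (illustrative)
WHZ-Authentication {
  com.sun.security.auth.module.LdapLoginModule sufficient
    userProvider="ldap://ldap.example.com:389/ou=people,dc=example,dc=com"
    authIdentity="uid={USERNAME},ou=people,dc=example,dc=com"
    useSSL=false
    debug=true;
};
```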
  •

    brash-sundown-77702

    05/20/2022, 5:42 AM
    Can someone help me please?
  •

    rich-policeman-92383

    05/20/2022, 10:24 AM
    The Datahub frontend build is failing with the below error (version: v0.8.35). Any suggestions on this?
  •

    creamy-van-28626

    05/22/2022, 11:19 AM
    Hi team, what's the difference between the `file` module and the `datahub-lineage-file` module? And in which case would we prefer one over the other?
  •

    rich-policeman-92383

    05/23/2022, 8:59 AM
    Hello, are there any plans to allow an asset to belong to multiple domains?
  •

    creamy-van-28626

    05/24/2022, 8:39 AM
    How can we see hidden dependencies in datahub?
  •

    bright-receptionist-94235

    05/24/2022, 9:43 AM
    Hi, I want to install datahub for production readiness without using Docker. Is that possible?
  •

    great-cpu-72376

    05/24/2022, 1:32 PM
    Hi, I am trying to deploy a new datahub instance through docker-compose, but datahub-gms is not working properly. In the log I found:
    + exec dockerize -wait http://elasticsearch:9200 -wait-http-header 'Accept: */*' -wait tcp://mysql:3306 -wait tcp://broker:29092 -timeout 240s java -Xms1g -Xmx1g -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
    2022/05/24 13:28:43 Waiting for: http://elasticsearch:9200
    2022/05/24 13:28:43 Waiting for: tcp://mysql:3306
    2022/05/24 13:28:43 Waiting for: tcp://broker:29092
    2022/05/24 13:28:43 Connected to tcp://broker:29092
    2022/05/24 13:28:43 Problem with dial: dial tcp: lookup mysql on 127.0.0.11:53: server misbehaving. Sleeping 1s
    2022/05/24 13:28:43 Received 200 from http://elasticsearch:9200
    2022/05/24 13:28:44 Problem with dial: dial tcp: lookup mysql on 127.0.0.11:53: server misbehaving. Sleeping 1s
    .......
    2022/05/24 13:28:57 Problem with dial: dial tcp: lookup mysql on 127.0.0.11:53: server misbehaving. Sleeping 1s

    2022/05/24 13:29:11 Problem with dial: dial tcp 10.1.0.17:3306: connect: connection refused. Sleeping 1s
    2022/05/24 13:29:12 Problem with dial: dial tcp 10.1.0.17:3306: connect: connection refused. Sleeping 1s
    2022/05/24 13:29:13 Connected to tcp://mysql:3306
    2022-05-24 13:29:13.793:INFO::main: Logging initialized @293ms to org.eclipse.jetty.util.log.StdErrLog
    WARNING: jetty-runner is deprecated.
             See Jetty Documentation for startup options
             https://www.eclipse.org/jetty/documentation/
    2022-05-24 13:29:13.823:INFO:oejr.Runner:main: Runner
    2022-05-24 13:29:13.970:INFO:oeju.TypeUtil:main: JVM Runtime does not support Modules
    2022-05-24 13:29:14.105:INFO:oejs.Server:main: jetty-9.4.20.v20190813; built: 2019-08-13T21:28:18.144Z; git: 84700530e645e812b336747464d6fbbf370c9a20; jvm 1.8.0_302-b08
    2022-05-24 13:29:14.595:WARN:oejw.WebAppContext:main: Failed startup of context o.e.j.w.WebAppContext@50cbc42f{/,null,UNAVAILABLE}{file:///datahub/datahub-gms/bin/war.war}
    java.util.zip.ZipException: invalid entry CRC (expected 0xd9b0e036 but got 0x978dbe6a)
            at java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:394)
            at java.util.zip.ZipInputStream.read(ZipInputStream.java:196)
            at java.util.jar.JarInputStream.read(JarInputStream.java:207)
            at org.eclipse.jetty.util.IO.copy(IO.java:172)
            at org.eclipse.jetty.util.IO.copy(IO.java:122)
            at org.eclipse.jetty.util.resource.JarResource.copyTo(JarResource.java:218)
            at org.eclipse.jetty.webapp.WebInfConfiguration.unpack(WebInfConfiguration.java:636)
            at org.eclipse.jetty.webapp.WebInfConfiguration.preConfigure(WebInfConfiguration.java:140)
            at org.eclipse.jetty.webapp.WebAppContext.preConfigure(WebAppContext.java:488)
            at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:523)
            at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
            at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
            at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
            at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:106)
            at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
            at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
            at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
            at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:106)
            at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
            at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
            at org.eclipse.jetty.server.Server.start(Server.java:407)
            at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
            at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:106)
            at org.eclipse.jetty.server.Server.doStart(Server.java:371)
            at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
            at org.eclipse.jetty.runner.Runner.run(Runner.java:520)
            at org.eclipse.jetty.runner.Runner.main(Runner.java:565)
    I am using this compose file:
    networks:
      production-net:
        external: true
    services:
      broker:
        container_name: broker
        depends_on:
        - zookeeper
        environment:
        - KAFKA_BROKER_ID=1
        - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
        - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
        - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
        - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
        - KAFKA_HEAP_OPTS=-Xms256m -Xmx256m
        hostname: broker
        networks:
          production-net:
        image: confluentinc/cp-kafka:5.4.0
        ports:
        - 29092:29092
        - 9092:9092
    
      datahub-actions:
        depends_on:
        - datahub-gms
        environment:
        - GMS_HOST=datahub-gms
        - GMS_PORT=8080
        - KAFKA_BOOTSTRAP_SERVER=broker:29092
    - SCHEMA_REGISTRY_URL=http://schema-registry:8081
        - METADATA_AUDIT_EVENT_NAME=MetadataAuditEvent_v4
        - METADATA_CHANGE_LOG_VERSIONED_TOPIC_NAME=MetadataChangeLog_Versioned_v1
        - DATAHUB_SYSTEM_CLIENT_ID=__datahub_system
        - DATAHUB_SYSTEM_CLIENT_SECRET=JohnSnowKnowsNothing
        - KAFKA_PROPERTIES_SECURITY_PROTOCOL=PLAINTEXT
        hostname: datahub-actions
        networks:
          production-net:
        image: acryldata/acryl-datahub-actions:head
        restart: on-failure:5
        volumes:
        - ../hosts:/etc/hosts
    
      datahub-frontend-react:
        container_name: datahub-frontend-react
        depends_on:
        - datahub-gms
        environment:
        - DATAHUB_GMS_HOST=datahub-gms
        - DATAHUB_GMS_PORT=8080
        - DATAHUB_SECRET=YouKnowNothing
        - DATAHUB_APP_VERSION=1.0
        - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
        - JAVA_OPTS=-Xms512m -Xmx512m -Dhttp.port=9002 -Dconfig.file=datahub-frontend/conf/application.conf
          -Djava.security.auth.login.config=datahub-frontend/conf/jaas.conf -Dlogback.configurationFile=datahub-frontend/conf/logback.xml
          -Dlogback.debug=false -Dpidfile.path=/dev/null
        - KAFKA_BOOTSTRAP_SERVER=broker:29092
        - DATAHUB_TRACKING_TOPIC=DataHubUsageEvent_v1
        - ELASTIC_CLIENT_HOST=elasticsearch
        - ELASTIC_CLIENT_PORT=9200
        hostname: datahub-frontend-react
        networks:
          production-net:
        image: linkedin/datahub-frontend-react:v0.8.34
        ports:
        - 9002:9002
        volumes:
        - ${HOME}/.datahub/plugins:/etc/datahub/plugins
        - ../hosts:/etc/hosts
    
      datahub-gms:
        container_name: datahub-gms
        depends_on:
        - mysql
        environment:
        - DATASET_ENABLE_SCSI=false
        - EBEAN_DATASOURCE_USERNAME=datahub
        - EBEAN_DATASOURCE_PASSWORD=datahub
        - EBEAN_DATASOURCE_HOST=mysql:3306
    - EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
        - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
        - KAFKA_BOOTSTRAP_SERVER=broker:29092
    - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
        - ELASTICSEARCH_HOST=elasticsearch
        - ELASTICSEARCH_PORT=9200
        - GRAPH_SERVICE_IMPL=elasticsearch
        - JAVA_OPTS=-Xms1g -Xmx1g
        - ENTITY_REGISTRY_CONFIG_PATH=/datahub/datahub-gms/resources/entity-registry.yml
        - MAE_CONSUMER_ENABLED=true
        - MCE_CONSUMER_ENABLED=true
        - PE_CONSUMER_ENABLED=true
        - UI_INGESTION_ENABLED=true
        - UI_INGESTION_DEFAULT_CLI_VERSION=0.8.32.1
        hostname: datahub-gms
        networks:
          production-net:
        image: linkedin/datahub-gms:v0.8.34
        ports:
        - 9090:8080
        volumes:
        - ${HOME}/.datahub/plugins:/etc/datahub/plugins
        - ../hosts:/etc/hosts
    
      elasticsearch:
        container_name: elasticsearch
        environment:
        - discovery.type=single-node
        - xpack.security.enabled=false
        - ES_JAVA_OPTS=-Xms256m -Xmx256m -Dlog4j2.formatMsgNoLookups=true
        healthcheck:
          retries: 4
          start_period: 2m
          test:
          - CMD-SHELL
    - curl -sS --fail 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=0s'
            || exit 1
        hostname: elasticsearch
        networks:
          production-net:
        image: elasticsearch:7.9.3
        mem_limit: 1g
        ports:
        - 9200:9200
        volumes:
        - esdata:/usr/share/elasticsearch/data
        - ../hosts:/etc/hosts
    
      elasticsearch-setup:
        container_name: elasticsearch-setup
        depends_on:
        - elasticsearch
        environment:
        - ELASTICSEARCH_HOST=elasticsearch
        - ELASTICSEARCH_PORT=9200
        - ELASTICSEARCH_PROTOCOL=http
        hostname: elasticsearch-setup
        networks:
          production-net:
        image: linkedin/datahub-elasticsearch-setup:v0.8.34
    
      kafka-setup:
        container_name: kafka-setup
        depends_on:
        - broker
        - schema-registry
        environment:
        - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
        - KAFKA_BOOTSTRAP_SERVER=broker:29092
        hostname: kafka-setup
        networks:
          production-net:
        image: linkedin/datahub-kafka-setup:v0.8.34
    
      mysql:
        command: --character-set-server=utf8mb4 --collation-server=utf8mb4_bin
        container_name: mysql
        environment:
        - MYSQL_DATABASE=datahub
        - MYSQL_USER=datahub
        - MYSQL_PASSWORD=datahub
        - MYSQL_ROOT_PASSWORD=datahub
        hostname: mysql
        networks:
          production-net:
        image: mysql:5.7
        ports:
        - 3306:3306
        volumes:
        - ../mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
        - mysqldata:/var/lib/mysql
        - ../hosts:/etc/hosts
    
      mysql-setup:
        container_name: mysql-setup
        depends_on:
        - mysql
        environment:
        - MYSQL_HOST=mysql
        - MYSQL_PORT=3306
        - MYSQL_USERNAME=datahub
        - MYSQL_PASSWORD=datahub
        - DATAHUB_DB_NAME=datahub
        hostname: mysql-setup
        networks:
          production-net:
        image: acryldata/datahub-mysql-setup:v0.8.34.1
    
      schema-registry:
        container_name: schema-registry
        depends_on:
        - zookeeper
        - broker
        environment:
        - SCHEMA_REGISTRY_HOST_NAME=schemaregistry
        - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:2181
        hostname: schema-registry
        networks:
          production-net:
        image: confluentinc/cp-schema-registry:5.4.0
        ports:
        - 8081:8081
    
      zookeeper:
        container_name: zookeeper
        environment:
        - ZOOKEEPER_CLIENT_PORT=2181
        - ZOOKEEPER_TICK_TIME=2000
        hostname: zookeeper
        networks:
          production-net:
        image: confluentinc/cp-zookeeper:5.4.0
        ports:
        - 2181:2181
        volumes:
        - ./zkdata:/var/opt/zookeeper
        - ../hosts:/etc/hosts
    
    version: '2.3'
    volumes:
      esdata: null
      mysqldata: null
    Have you ever seen this error?
  •

    cool-actor-73767

    05/24/2022, 1:34 PM
    Hi team! Can someone help me run datahub with LDAP authentication using JAAS? I wrote a jaas.conf and deployed datahub-frontend locally, but it doesn't work.