# all-things-deployment
  • b

    brave-secretary-27487

    02/11/2022, 2:42 PM
    I'm trying to apply authentication on the gms service following this guide (https://datahubproject.io/docs/introducing-metadata-service-authentication/). I want to apply the changes with helm. My values.yaml looks like:
    datahub-frontend:
      ingress:
        enabled: false 
        annotations:
          kubernetes.io/ingress.class: traefik
        hosts:  
          - host : "url"
            paths: ['/']
    
      datahub:
        metadata_service_authentication:
          enabled: true
      
    datahub-gms:
      datahub:
        metadata_service_authentication:
          enabled: true
    
    global:
        metadata_service_authentication:
          enabled: true
          systemClientId: "__datahub_system"
          systemClientSecret:
            secretRef: "datahub-auth-secrets"
            secretKey: "token_service_signing_key"
          tokenService:
            signingKey:
              secretRef: "datahub-auth-secrets"
              secretKey: "token_service_signing_key"
          # Set to false if you'd like to provide your own auth secrets
          provisionSecrets: true
    Should I enable metadata_service_authentication in all three places: front-end, gms, and global? And where can I find the systemClientId? I tried the setup without the global section and it didn't work, so I assume metadata_service_authentication should also be enabled under global?
  • a

    agreeable-plastic-37919

    02/11/2022, 7:12 PM
    We integrated with Okta, but how can I now log out so that I can log in with the admin account? Every time I log out, the page redirects to Okta and automatically logs me back in.
  • b

    better-orange-49102

    02/14/2022, 10:06 AM
    I'm not using helm. For datahub-upgrade, I built the container from the Dockerfile. I tried to run datahub-upgrade as a Deployment (not a Job; I know that's not the correct form) just to debug it more easily (env is in the thread). I keep seeing "error creating bean with name configEntityRegistry defined in class path resource......... nested exception is java.io.FilenotFound: ../../metadata-models/src/main/resources/entity-registry.yml". I get the feeling that the pod expects to find entity-registry.yml somewhere. How should I define this for a non-helm k8s deployment? EDIT: adding entity-registry.yml as a volume mount works.
  • t

    tall-queen-61078

    02/14/2022, 11:13 AM
    Hey, I am trying to build the frontend on my local laptop (Mac) and I'm getting the following error:
    executor failed running [/bin/sh -c cd datahub-src && ./gradlew :datahub-frontend:dist -PenableEmber=${ENABLE_EMBER} -PuseSystemNode=${USE_SYSTEM_NODE} -x test -x yarnTest -x yarnLint     && cp datahub-frontend/build/distributions/datahub-frontend.zip ../datahub-frontend.zip     && cd .. && rm -rf datahub-src && unzip datahub-frontend.zip]: exit code: 137
    The command I am trying to run is:
    docker build -t datahub_frontend -f ./docker/datahub-frontend/Dockerfile .
  • b

    better-orange-49102

    02/15/2022, 3:12 AM
    How is RestoreBackup in datahub-upgrade meant to be used? I assume it would be something like
    java -jar datahub-upgrade.jar -u RestoreBackup
    but what would the arguments after that be? Something like -a dump_file? What format is expected? It seems like Parquet? Running the command as-is resulted in my DB being wiped 😅. I also see the ES indices being cleared, except for metadata_service_v1 and datasetprofileaspect_v1.
  • w

    wonderful-jordan-36532

    02/15/2022, 2:58 PM
    The Kubernetes deployment creates two internet-facing load balancers (front-end and gms). How can the load balancers be created as internal ones instead? Changing the front-end ingress annotation at https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml to alb.ingress.kubernetes.io/scheme: internal is not sufficient. Does it require changing the service type from LoadBalancer to NodePort in the helm charts?
    plus1 1
  • b

    billions-receptionist-60247

    02/16/2022, 7:27 PM
    Hi, I'm deploying datahub on kubernetes. The deployment is getting stuck at this. Can someone help me with this?
  • a

    alert-teacher-6920

    02/16/2022, 8:33 PM
    Are there any compose files that show how to stand up DataHub without building it, specifically just using the images on Docker Hub? Is Kafka actually required if we are not using Kafka to inject metadata? Are both Elasticsearch and MySQL required? I'm trying to set up the bare minimum to test a custom Java emitter that uses a RestEmitter, not a Kafka one, and I'd also specifically like to be able to see the entity in the UI.
  • g

    gorgeous-optician-32034

    02/17/2022, 4:26 PM
    Quick question on database init scripts. I see two relevant for us, one for MySQL and one for MariaDB. The MariaDB one seems to just not create the database or the metadata_index table. Am I right that it's just slightly out of date? Or is there some reason MariaDB actually doesn't need that?
  • a

    acceptable-architect-70237

    02/17/2022, 8:57 PM
    For whoever might be interested, I wrote two blog posts about deploying OS Datahub without K8s. Basically, you will only need to deploy the datahub-frontend-react and datahub-gms services. You can find it here: https://liangjunjiang.medium.com/deploy-open-source-datahub-fd597104512b
    plus1 2
    🙌 1
    🤩 5
  • h

    high-hospital-85984

    02/18/2022, 11:12 AM
    We've been getting java.lang.OutOfMemoryError: Java heap space when loading a particularly large entity (large schema, with descriptions and tags), and we realized we haven't touched JAVA_OPTS. As a complete Java beginner: are there any rules of thumb or recommendations for setting the heap size?
  • c

    careful-insurance-60247

    02/18/2022, 4:39 PM
    I have set up the AWS ALB for k8s, but now I want to remove it from my setup. What's the best way to do that?
  • b

    better-orange-49102

    02/21/2022, 1:28 PM
    if we're not using UI ingestion, is there any purpose for the datahub-action container? What does the container do, actually?
  • b

    bland-barista-59197

    02/22/2022, 11:09 PM
    Hello Team, I'm exploring Rest.li and looking for a way to get the schemas of all tables and their columns. Any advice is appreciated, e.g.
    curl --location --request POST 'http://localhost:8080/entities?action=list' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "entity": "dataset",
        "filter": {
            "criteria": [
                {
                    "condition": "CONTAIN",
                    "field": "name",
                    "value": "hive"
                }
            ]
        },
        "start": 0,
        "count": 10000
    }'
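The request above can also be sketched in Python using only the standard library, which makes it easier to vary the filter or page through results with `start` and `count`. The helper names `build_list_request` and `post_list_request` are hypothetical, and the base URL is assumed to be the same local GMS address as in the curl call; the shape of the response is not assumed here.

```python
import json
import urllib.request


def build_list_request(entity, field, value, start=0, count=10):
    """Build the JSON body used by the curl call above (hypothetical helper)."""
    return {
        "entity": entity,
        "filter": {
            "criteria": [
                {"condition": "CONTAIN", "field": field, "value": value}
            ]
        },
        "start": start,
        "count": count,
    }


def post_list_request(base_url, body):
    """POST the body to <base_url>/entities?action=list and return the parsed JSON.

    base_url is an assumption, e.g. "http://localhost:8080" as in the curl call.
    """
    request = urllib.request.Request(
        url=f"{base_url}/entities?action=list",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_list_request("http://localhost:8080", build_list_request("dataset", "name", "hive", start=0, count=100))` would send the same filter as the curl call, in a smaller page.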
  • b

    better-orange-49102

    02/23/2022, 6:05 AM
    I built the datahub-frontend image inside a corporate env, and now the DataHub version shows up as "null" when you mouse over the top right corner. I'm OK with not showing the version number, so where can we get rid of the "null"?
  • n

    numerous-camera-74294

    02/23/2022, 2:23 PM
    Hi folks! I have my datahub running in EKS with an ELB for the frontend, and I would love to expose the gms behind that same ELB as well. Is there any way to do so?
  • a

    able-rain-74449

    02/23/2022, 3:32 PM
    👋 Hello, team!
    teamwork 1
  • b

    better-orange-49102

    02/28/2022, 8:23 AM
    Does anyone have any issues running multiple datahub-frontend-react or gms pods (with OIDC) in k8s?
  • a

    ancient-pharmacist-31624

    03/01/2022, 1:36 PM
    Hi Peeps! I have been learning how to deploy datahub and settled on using docker on a single VM for a PoC. I reviewed docker-compose.yml and set some volume mounts (AuthN) and the var DATAHUB_VERSION=v0.8.26, as I noticed with previous releases that it sometimes does not come up, so this lets me control when to move to a new version. I normally run docker-compose up -d but today I get...
    datahub-actions_1     | 2022/03/01 13:28:16 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.27.0.8:8080: connect: connection refused. Sleeping 1s
    datahub-gms        | 2022/03/01 13:28:16 Problem with dial: dial tcp: lookup broker on 127.0.0.11:53: server misbehaving. Sleeping 1s
    datahub-actions_1     | 2022/03/01 13:28:17 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.27.0.8:8080: connect: connection refused. Sleeping 1s
    Can anyone share tips how to debug pls? Much appreciated.
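As a possible debugging aid, the retry loop visible in the log (a GET against the gms health URL, then a 1s sleep) can be reproduced by hand to see how long a service takes to come up. This is a minimal sketch; `http_ok` and `wait_until_healthy` are hypothetical helper names, and the URL is taken from the log lines above. Note that the second log line suggests datahub-gms is itself still waiting for a host named `broker` to resolve, so it may be worth probing that dependency first.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to url answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, DNS failures, timeouts and HTTP error codes.
        return False


def wait_until_healthy(url, attempts=30, delay=1.0, probe=http_ok, sleep=time.sleep):
    """Probe url up to `attempts` times, sleeping `delay` seconds after each failure.

    Returns the number of probes it took to succeed, or -1 if none succeeded.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return -1
```

Run from a container on the same docker network, `wait_until_healthy("http://datahub-gms:8080/health")` mirrors what datahub-actions is doing in the log.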
  • a

    adamant-magazine-62649

    03/01/2022, 4:06 PM
    Hi, I am trying to deploy datahub to a docker container using the docker pull linkedin/datahub-ingestion command. However, I am receiving an error stating the manifest is missing / unknown. I am using this url: https://hub.docker.com/r/linkedin/datahub-ingestion Anyone got any insights / ideas? Thanks in advance :)
  • r

    ripe-sunset-20897

    03/02/2022, 3:19 AM
    Hi Peeps! I want to deploy Datahub using helm charts and use Google to log in via OIDC. I found a similar question here. How can we give a certain user a certain role using Google Auth Consent, and how can we assign a person to a certain group?
  • m

    magnificent-hospital-52323

    03/02/2022, 10:10 AM
    Hi all! I'm trying to set up a django application to serve as the OIDC provider for DataHub (using https://django-oidc-provider.readthedocs.io/en/latest/), with both applications running through the same docker-compose YML file. I'm allowing my django containers to be on the same docker network as DataHub. However, whenever I try to log into DataHub, the redirect to django fails. I tried allowing containers to access localhost for testing (by adding extra_hosts: "host.docker.internal:host-gateway" to the containers and adding the localhost=host.docker.internal alias in my /etc/hosts) but the redirect still fails, saying "Connection Refused". Any idea what I might be doing wrong? Alternatively, is there a better way of achieving the same thing? Here's what my frontend container looks like:
    datahub-frontend-react:
        container_name: datahub-frontend-react
        depends_on:
          - datahub-gms
        environment:
          - DATAHUB_GMS_HOST=datahub-gms
          - DATAHUB_GMS_PORT=8080
          - DATAHUB_SECRET=YouKnowNothing
          - DATAHUB_APP_VERSION=1.0
          - DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
          - JAVA_OPTS=-Xms512m -Xmx512m -Dhttp.port=9002 -Dconfig.file=datahub-frontend/conf/application.conf
            -Djava.security.auth.login.config=datahub-frontend/conf/jaas.conf -Dlogback.configurationFile=datahub-frontend/conf/logback.xml
            -Dlogback.debug=false -Dpidfile.path=/dev/null
          - KAFKA_BOOTSTRAP_SERVER=broker:29092
          - DATAHUB_TRACKING_TOPIC=DataHubUsageEvent_v1
          - ELASTIC_CLIENT_HOST=elasticsearch
          - ELASTIC_CLIENT_PORT=9200
          - AUTH_OIDC_ENABLED=true
          - AUTH_OIDC_CLIENT_ID=some-client-id
          - AUTH_OIDC_CLIENT_SECRET=some-client-secret
          - AUTH_OIDC_DISCOVERY_URI=http://localhost:8000/openid/.well-known/openid-configuration/
          - AUTH_OIDC_BASE_URL=http://localhost:9002
        hostname: datahub
        image: linkedin/datahub-frontend-react:${DATAHUB_VERSION:-head}
        ports:
          - 9002:9002
        extra_hosts:
          - "host.docker.internal:host-gateway"
    And here's the error present in the docker logs:
    datahub-frontend-react    | 09:53:02 [application-akka.actor.default-dispatcher-22] ERROR application - 
    datahub-frontend-react    | 
    datahub-frontend-react    | ! @7mp05igbp - Internal server error, for (GET) [/authenticate?redirect_uri=%2F] ->
    datahub-frontend-react    |  
    datahub-frontend-react    | play.api.UnexpectedException: Unexpected exception[TechnicalException: java.net.ConnectException: Connection refused (Connection refused)]
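One thing that may be worth checking here, as an assumption about the cause rather than a confirmed diagnosis: inside the datahub-frontend-react container, `localhost` refers to that container itself, so a server-side fetch of `AUTH_OIDC_DISCOVERY_URI=http://localhost:8000/...` would look for port 8000 inside the frontend container and not on the django container or the host. A small sketch that flags loopback hosts in such URLs (the function name `is_loopback_url` is hypothetical):

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url):
    """Return True if the URL's host is "localhost" or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a docker service name), so not a loopback address.
        return False
```

With the values from the compose file above, `is_loopback_url("http://localhost:8000/openid/.well-known/openid-configuration/")` is true, while a URL using `host.docker.internal` or a compose service name as the host is not.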
  • m

    mysterious-portugal-30527

    03/03/2022, 6:41 PM
    Hello All! While performing due diligence, scanning the docker images provided by Datahub with Amazon Container Services identified 2 critical and 21 high vulnerabilities, along with 79 medium, 26 low, 175 informational and 25 undefined. Thoughts? Comments? Is this on anybody's radar / roadmap?
  • e

    elegant-traffic-96321

    03/03/2022, 9:45 PM
    Hello! My cleanup job seems to be running into an error where it's trying to clean up some neo4j stuff (we're running Elasticsearch). Here's the error we're running into:
    Failed to delete legacy data from graph: java.lang.ClassCastException: com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to com.linkedin.metadata.graph.Neo4jGraphService
    Failed to delete legacy data from graph: java.lang.ClassCastException: com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to com.linkedin.metadata.graph.Neo4jGraphService
    Failed Step 3/4: DeleteLegacyGraphRelationshipStep. Failed after 1 retries.
  • e

    elegant-traffic-96321

    03/03/2022, 10:03 PM
    Also the acryl-actions container is yelling about the schema_registry_url. It seems it requires a kafka schema registry url, but we’re using MSK and glue as supported by this doc here: https://datahubproject.io/docs/deploy/aws/
  • b

    bored-dress-52175

    03/07/2022, 1:13 PM
    I have deployed datahub in kubernetes cluster. But it is not showing stats and queries, how do I enable it?
  • q

    quiet-pilot-28237

    03/08/2022, 4:57 AM
    I got this error
    kubelet Error: secret "datahub-encryption-secrets" not found
  • s

    some-pizza-26257

    03/08/2022, 7:02 AM
    Hi all, Can anyone provide any insights into how scalable DataHub is?
  • l

    lively-jackal-83760

    03/08/2022, 10:17 AM
    Hi guys. Question - is it possible to create a policy that will affect all datasets with some tag or business term?
  • m

    most-nightfall-36645

    03/08/2022, 11:52 AM
    Hi, I would like to deploy datahub without kafka. I just want to use the restful gms-service for ingesting metadata. Can I disable kafka by simply setting kafka.enabled to false in the prerequisite chart?