# all-things-deployment

    handsome-football-66174

    11/22/2021, 10:34 PM
Quick question - I am trying to upgrade to v0.8.17 and am getting the following error -

    agreeable-hamburger-38305

    11/23/2021, 1:27 AM
Hi! I set up the ingestion cronjob on Kubernetes, but when I run a job from the cronjob this is what I get. Not sure if it's because I haven't added the equivalent of the plugin mentioned here https://datahubproject.io/docs/metadata-ingestion/source_docs/bigquery#setup, but I also don't really know how to do that:
    Copy code
    [2021-11-23 00:56:07,285] INFO     {datahub.cli.ingest_cli:57} - Starting metadata ingestion
    
    /usr/local/lib/python3.8/site-packages/google/cloud/bigquery/client.py:513: UserWarning: Cannot create BigQuery Storage client, the dependency google-cloud-bigquery-storage is not installed.
      warnings.warn(
    [2021-11-23 01:00:09,074] INFO     {datahub.cli.ingest_cli:59} - Finished metadata ingestion
    Source (bigquery) report:
    {'failures': {}, 'filtered': [], 'tables_scanned': 0, 'views_scanned': 0, 'warnings': {}, 'workunit_ids': [], 'workunits_produced': 0}
    Sink (datahub-rest) report:
    {'downstream_end_time': None,
     'downstream_start_time': None,
     'downstream_total_latency_in_seconds': None,
     'failures': [],
     'records_written': 0,
     'warnings': []}
    
    Pipeline finished successfully
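A likely cause, as a hedged sketch: the ingestion image only ships the source plugins you install, so the BigQuery extra (and the optional storage client the warning mentions) has to be baked into the cronjob image or installed before the run. Assuming the standard acryl-datahub package:
Copy code
# install the BigQuery source plugin plus the optional BigQuery Storage API client
pip install 'acryl-datahub[bigquery]' google-cloud-bigquery-storage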

    millions-notebook-72121

    11/23/2021, 3:27 PM
Hi - I'm trying to deploy Datahub on AWS and followed the steps to switch the schema registry to AWS Glue. Once I do that, I see the following in the GMS logs. Is this expected? I'm wondering if it's related to this issue: https://github.com/linkedin/datahub/issues/3373

    broad-sandwich-74544

    11/24/2021, 11:25 AM
Hi! We have been exploring datahub for quite some time now. We integrated it with Okta as the Identity Provider using this document - https://datahubproject.io/docs/how/auth/sso/configure-oidc-react-okta/. The integration seems to be working perfectly. However, there are a few concerns.
1. How does role mapping from Okta to Datahub happen? As soon as we completed the integration, the previous functionality of logging in with username/password disappeared, so we can no longer use the admin user. How do we mark users as admin/writer/reader in Datahub when authenticating through a third-party IdP?
2. How do we do token-based authentication for APIs instead of username/password-based authentication? Since we integrated with Okta, the usernames/passwords no longer exist. Is there any option to generate API tokens for ingesting lineage? Couldn't find an option in the UI.
cc: @powerful-telephone-71997

    plain-farmer-27314

    11/24/2021, 2:30 PM
Hi all - I'm trying to update our helm deployment to the latest release so everyone can check out the new UI. I ran the commands below but am still not seeing the UI updated:
    Copy code
    helm repo update
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "datahub" chart repository
    Update Complete. ⎈Happy Helming!⎈
    Copy code
    helm upgrade datahub datahub/datahub --namespace=datahub --values values.yaml
    W1124 09:17:37.881889   30156 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
    W1124 09:17:38.022348   30156 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
    W1124 09:17:38.200681   30156 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
    W1124 09:17:38.365292   30156 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
    W1124 09:17:38.505592   30156 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
    W1124 09:17:38.675228   30156 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
    Release "datahub" has been upgraded. Happy Helming!
    NAME: datahub
    LAST DEPLOYED: Wed Nov 24 09:15:46 2021
    NAMESPACE: datahub
    STATUS: deployed
    REVISION: 4
    Are there any further steps that need to be taken? Thanks!
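One thing worth checking, as a hedged sketch: if values.yaml pins image tags, helm upgrade will happily redeploy the same version. Depending on chart version the tag comes either from a global version value or from per-component image.tag overrides; key names vary, so confirm against your chart's values.yaml:
Copy code
# values.yaml (sketch; key names vary by chart version)
global:
  datahub:
    version: v0.8.17        # newer charts: one global image tag
datahub-frontend:
  image:
    tag: v0.8.17            # older charts: tags pinned per component
Then re-run the helm upgrade command above.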

    nice-planet-17111

    11/25/2021, 12:58 AM
Hello, I'm trying to edit some values in the datahub prerequisites helm chart template, but I don't see any templates in the GitHub repository. 😞 Does anyone know where I can find the pre-reqs templates?

    aloof-forest-55926

    11/25/2021, 6:40 AM
Hello everyone, I'm new to datahub. Is there any doc on how to install & set up datahub?

    some-cricket-23089

    11/25/2021, 8:40 AM
Quick question - I want to ingest Glossary Terms into datahub. Do we have any API or curl command to ingest them?
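Besides the raw GMS /entities endpoint, the ingestion framework has a file-based business glossary source that can do this. A hedged sketch of a recipe, assuming a recent CLI version (see the business_glossary source docs for the glossary file format):
Copy code
# glossary_recipe.yml (sketch)
source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary.yml   # your glossary term definitions
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080   # your GMS endpoint
Run it with datahub ingest -c glossary_recipe.yml.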

    quaint-lighter-4354

    11/30/2021, 12:36 PM
Hi guys, I was wondering if there is any way to require a password for all the GMS API requests. There used to be an env variable in the GMS deployment, DATAHUB_SECRET, but it's not there anymore. Am I missing something, or isn't this implemented? Regards

    curved-jordan-15657

    12/02/2021, 11:30 AM
Hello team! I have a problem on the argocd side. I upgraded datahub to v0.8.17 about 10 days ago. Suddenly I saw that the app is Degraded because the gms container can't pull its image. In the logs I saw the error below:
    Copy code
    container "datahub-gms" in pod "datahub-dev-datahub-gms-57594565c8-887jj" is waiting to start: trying and failing to pull image
    And also not only in gms image, kafka and mysql pods also have some errors like:
    Copy code
Failed to pull image "acryldata/datahub-mysql-setup:v0.8.17.0": rpc error: code = Unknown desc = Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
    I don’t really know how to resolve the issue.
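The toomanyrequests error is Docker Hub's anonymous pull rate limit, so authenticating the pulls usually resolves it. A sketch, assuming standard Kubernetes registry secrets (the secret name docker-hub-creds is illustrative):
Copy code
# create a registry credential secret
kubectl create secret docker-registry docker-hub-creds \
  --namespace datahub \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<your-user> --docker-password=<your-token>

# either reference it wherever your chart exposes imagePullSecrets (check values.yaml),
# or attach it to the namespace's default service account:
kubectl patch serviceaccount default --namespace datahub \
  -p '{"imagePullSecrets": [{"name": "docker-hub-creds"}]}'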

    early-lawyer-39982

    12/03/2021, 6:22 AM
Hello! I have a question. We are thinking about collecting both prod and dev datasets in datahub. Currently we only have a dev environment, and when we extend to a prod environment, we assume we will keep the same datasets in all environments. However, most companies separate their datasets by environment. Why, in general, are the datasets different for each environment?

    aloof-forest-55926

    12/06/2021, 10:04 AM
Hello, I'm trying to find out how to implement BrickSchema on datahub.

    billions-receptionist-60247

    12/07/2021, 8:39 PM
Hi, I'm trying to deploy datahub on local kubernetes. I'm getting this error in the elasticsearch setup pod:
    Copy code
    kubectl logs -f pods/datahub-elasticsearch-setup-job-x4j48
2021/12/07 20:07:48 Waiting for: http://elasticsearch-master:9200
2021/12/07 20:07:48 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.97.31.240:9200: connect: connection refused. Sleeping 1s
2021/12/07 20:07:49 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.97.31.240:9200: connect: connection refused. Sleeping 1s
2021/12/07 20:07:50 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.97.31.240:9200: connect: connection refused. Sleeping 1s
2021/12/07 20:07:51 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.97.31.240:9200: connect: connection refused. Sleeping 1s
2021/12/07 20:07:52 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.97.31.240:9200: connect: connection refused. Sleeping 1s
2021/12/07 20:07:53 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.97.31.240:9200: connect: connection refused. Sleeping 1s

    ambitious-cartoon-15344

    12/09/2021, 2:18 AM
Hello, how do I create Glossary Terms?

    calm-airplane-47634

    12/09/2021, 6:05 AM
Hi folks, trying out datahub on GCP and got thinking: what is Kafka used for, exactly? More importantly, I would like things to be easy, so I'm doing a thought experiment about replacing it with Pub/Sub, or better yet an interface so that it's swappable. Does anyone else have thoughts on this?

    nice-country-99675

    12/12/2021, 11:37 PM
👋 Hi team! I'm using DataHub deployed via the Helm chart... I would like to try out the new version, v0.8.18. Is there any guide to upgrading, or can we just use the latest charts? Thanks!

    orange-flag-48535

    12/13/2021, 6:30 AM
Is the Datahub docker app available as a Maven dependency? I've used Postgres this way via TestContainers.org and it is very convenient for integration testing.

    aloof-airline-3441

    12/13/2021, 4:42 PM
When running the mysql setup job against RDS (MySQL 8), has anyone run into this?

    best-planet-6756

    12/13/2021, 9:12 PM
Hi all, I am trying to set up nginx so that I can route port 9002 to 443 with my SSL cert. Here are my nginx config and docker block:
Copy code
events {}
http {
  server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/certificate.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    location / {
      # forward everything to the datahub frontend container
      proxy_pass http://datahub-frontend-react:9002;
      proxy_set_header   Host $host;
      proxy_set_header   X-Real-IP $remote_addr;
      proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header   X-Forwarded-Host $server_name;
      proxy_set_header   X-Forwarded-Proto https;
    }
  }
}
    Copy code
    nginx:
        image: nginx:latest
        container_name: nginx
        volumes:
          - ../nginx.conf:/etc/nginx/nginx.conf
          - ../certs:/etc/nginx/certs
        ports:
          - 443:443
        depends_on:
          - datahub-frontend-react
I'm getting this error when I go to the URL, which shows what I believe is a datahub error. Has anyone faced this when using nginx to serve the page on port 443?

    billions-receptionist-60247

    12/14/2021, 7:20 AM
Hi, I'm trying to deploy datahub on kubernetes. I have set up kafka and mysql on an external system. Can someone tell me which values I have to change to point it at my external kafka and mysql?
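A hedged sketch of the usual knobs, assuming the acryldata datahub chart (exact key names vary across chart versions, so cross-check your values.yaml), and skip installing the prerequisites chart's bundled kafka/mysql:
Copy code
# values.yaml (sketch)
global:
  kafka:
    bootstrap:
      server: "my-external-kafka:9092"       # external Kafka brokers
    schemaregistry:
      url: "http://my-schema-registry:8081"  # external schema registry
  sql:
    datasource:
      host: "my-external-mysql:3306"         # external MySQL
      url: "jdbc:mysql://my-external-mysql:3306/datahub?verifyServerCertificate=false&useSSL=true"
      username: "datahub"
      password:
        secretRef: mysql-secrets             # k8s secret holding the password
        secretKey: mysql-root-password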

    acceptable-architect-70237

    12/14/2021, 5:31 PM
Hi team, is there any way to specify a prefix for ES index names? We have cases where the company requires a certain prefix for ES index names. It seems I could change the code to do that, but I'm asking whether it's a configuration option.
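Recent versions expose an INDEX_PREFIX environment variable for exactly this (older versions may not have it, in which case it is a code change). A hedged sketch of wiring it through the chart, assuming extraEnvs is available on these components:
Copy code
# values.yaml (sketch; the prefix must match across GMS, the consumers,
# and the elasticsearch setup job)
datahub-gms:
  extraEnvs:
    - name: INDEX_PREFIX
      value: "mycompany"
elasticsearchSetupJob:
  extraEnvs:
    - name: INDEX_PREFIX
      value: "mycompany"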

    calm-airplane-47634

    12/16/2021, 4:42 PM
Hi folks, maybe this is more of a deployment-related question.

    nutritious-bird-77396

    12/16/2021, 6:34 PM
Team... I am working on deploying datahub using the AWS stack - AWS OpenSearch, AWS MSK, RDS. I am having some challenges connecting to MSK. The MSK cluster has TLS encryption enabled. Do I have to set all of these variables in the config?
    Copy code
    # KAFKA_PROPERTIES_SECURITY_PROTOCOL=SSL
    # KAFKA_PROPERTIES_SSL_KEYSTORE_LOCATION=
    # KAFKA_PROPERTIES_SSL_KEYSTORE_PASSWORD=
    # KAFKA_PROPERTIES_SSL_KEY_PASSWORD=
    # KAFKA_PROPERTIES_SSL_TRUSTSTORE_LOCATION=
    # KAFKA_PROPERTIES_SSL_TRUSTSTORE_PASSWORD=
    # KAFKA_PROPERTIES_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM=
Are there any options for these to be set through the MSK IAM jar? I have seen that quite a few people have made MSK work... trying to understand this a little better.
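One hedged data point: with MSK's TLS encryption in transit only (no client-certificate auth), the brokers present certificates from a public CA that the default JVM truststore already trusts, so often just the security protocol needs to be set; the keystore/truststore variables matter mainly for mutual TLS. For example:
Copy code
KAFKA_PROPERTIES_SECURITY_PROTOCOL=SSL
# keystore/truststore vars: only needed for mutual TLS (client certificates)
# MSK IAM auth would instead need SASL_SSL plus the aws-msk-iam-auth jar
# on the classpath, which the stock images may not bundle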

    plain-farmer-27314

    12/16/2021, 7:46 PM
Hey all! Wondering if anyone could provide an example of how to set vars like the ones below in the datahub helm chart:
    Copy code
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=your-client-id
    AUTH_OIDC_CLIENT_SECRET=your-client-secret
AUTH_OIDC_DISCOVERY_URI=https://your-okta-domain.com/.well-known/openid-configuration
    AUTH_OIDC_BASE_URL=your-datahub-url
    AUTH_OIDC_SCOPE="openid profile email groups"
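A hedged sketch, assuming the chart's datahub-frontend section supports extraEnvs; pulling the client secret from a Kubernetes Secret is a suggestion rather than a requirement, and the secret name is made up:
Copy code
# values.yaml (sketch)
datahub-frontend:
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_ID
      value: "your-client-id"
    - name: AUTH_OIDC_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: oidc-secrets        # hypothetical k8s secret
          key: client-secret
    - name: AUTH_OIDC_DISCOVERY_URI
      value: "https://your-okta-domain.com/.well-known/openid-configuration"
    - name: AUTH_OIDC_BASE_URL
      value: "https://your-datahub-url"
    - name: AUTH_OIDC_SCOPE
      value: "openid profile email groups"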

    plain-farmer-27314

    12/16/2021, 9:31 PM
Following up on the above - what is the ideal way to configure certain users as administrators?

    billions-receptionist-60247

    12/17/2021, 6:16 AM
Hi, I'm getting this error in the datahub-elasticsearch-setup-job:
    Copy code
Timeout after 2m0s waiting on dependencies to become available: [http://elasticsearch-master:9200]
but I'm able to curl elasticsearch over a port-forward:
    Copy code
    ->curl localhost:9200
    
    {
      "name" : "elasticsearch-master-0",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "s-2yYjHTTgy4c9p2I6w0Ug",
      "version" : {
        "number" : "7.9.3",
        "build_flavor" : "default",
        "build_type" : "docker",
        "build_hash" : "c4138e51121ef06a6404866cddc601906fe5c868",
        "build_date" : "2020-10-16T13:34:25.304557Z",
        "build_snapshot" : false,
        "lucene_version" : "8.6.2",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }

    bland-orange-13353

    12/23/2021, 6:39 AM
    This message was deleted.

    billions-receptionist-60247

    12/27/2021, 5:15 AM
Hi, I'm deploying datahub on kubernetes. The datahub gms and frontend pods are failing. GMS logs:
    Copy code
    2021-12-27 04:59:13.599:INFO::main: Logging initialized @4991ms to org.eclipse.jetty.util.log.StdErrLog
    WARNING: jetty-runner is deprecated.
             See Jetty Documentation for startup options
https://www.eclipse.org/jetty/documentation/
    2021-12-27 04:59:14.192:INFO:oejr.Runner:main: Runner
    2021-12-27 04:59:16.986:INFO:oejs.Server:main: jetty-9.4.20.v20190813; built: 2019-08-13T21:28:18.144Z; git: 84700530e645e812b336747464d6fbbf370c9a20; jvm 1.8.0_302-b08
    2021-12-27 05:00:03.685:INFO:oeju.TypeUtil:main: JVM Runtime does not support Modules
    2021-12-27 05:00:13.284:WARN:oeja.AnnotationParser:main: Unrecognized ASM version, assuming ASM7
    2021/12/27 05:01:14 Command exited with error: signal: killed
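"Command exited with error: signal: killed" during startup is typically the container being OOM-killed, i.e. GMS needing more memory than the pod's limit allows. A hedged sketch of raising it in values.yaml (the numbers are illustrative, not a recommendation):
Copy code
# values.yaml (sketch)
datahub-gms:
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"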

    breezy-camera-11182

    01/04/2022, 5:37 AM
Reposting here in case it was posted on the wrong channel.

    adamant-sugar-28445

    01/04/2022, 2:05 PM
Hi everyone. I want to ingest metadata from airflow into datahub. In my airflow .py file, I set inlets and outlets and configured the connection to datahub (as described in https://datahubproject.io/docs/lineage/airflow). The inlets and outlets are about HDFS and looked like outlets={"datasets": [Dataset("hdfs", "/general/project1/folder1/file1.parquet")]}. The problem is that file1's schema didn't show up in the datahub UI, and I saw the whole path rather than the file alone in the UI. Can anyone tell me what the cause is here?