# all-things-deployment
  • rapid-sundown-8805
    10/04/2021, 9:07 AM
    @some-cricket-23089 @some-glass-26087 Please use threads 🙏

  • some-cricket-23089
    10/04/2021, 11:15 PM
    Hi Team, I have made some org-specific UI changes in datahub-frontend and want to move all the UI changes into our org's private network. Because that network is private, I am not able to build the Docker image. Is there any other way to build the Docker image besides setting up HTTP_Proxy? Please suggest.

  • red-pizza-28006
    10/05/2021, 12:32 PM
    Hi everyone, I recently started looking into DataHub and deploying it via Kubernetes, and wanted to get your opinions/reviews on it. There are quite a few stateful dependencies here (MySQL/ES/Kafka etc.). Is it a good idea to run these within k8s?

  • handsome-football-66174
    10/05/2021, 5:42 PM
    Question - Deployed DataHub via Helm charts on an EKS cluster. Do we need to expose GMS via ingress as well? How do we do ingestion?

  • kind-dawn-17532
    10/05/2021, 7:00 PM
    Hi all, if you run DataHub in the cloud, would you please share the ballpark cost incurred per month or per year? I am looking to get a rough idea of what a cloud-deployed DataHub costs, to help prepare my business case. Containerized deployments are also fine. Thanks in advance!

  • handsome-football-66174
    10/08/2021, 1:52 PM
    Question - Deployed DataHub via Helm charts on an EKS cluster. Getting the following error when clicking on Analytics at the top right:
    Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<hostname>:443], URI [/demo_datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
          }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
          {
            "shard": 0,
            "index": "demo_datahub_usage_event",
            "node": "abc",
            "reason": {
              "type": "illegal_argument_exception",
              "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
            }
          }
        ],
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
          }
        }
      },
      "status": 400
    }
    at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
    ... 21 common frames omitted
    Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
    at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
    at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
    at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
    at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
    at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
    ... 24 common frames omitted
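A hedged aside on the analytics error above: the setup job's index template maps browserId as keyword, so a "text fields are not optimised" error on browserId usually means the usage-event index was created before the template was installed. A dry-run sketch of the checks involved; the host and index name are placeholders lifted from the error message, and the commands are only printed, not executed:

```shell
# Dry run: print the requests rather than issue them, since <hostname> is a
# placeholder. Substitute your real Elasticsearch/OpenSearch endpoint.
ES_HOST="https://<hostname>:443"
INDEX="demo_datahub_usage_event"

# 1. Inspect how browserId is currently mapped; "type": "text" would confirm
#    the index was created before the template existed.
echo "curl -s ${ES_HOST}/${INDEX}/_mapping?pretty"

# 2. If so, delete the index and re-run the elasticsearch setup job so the
#    index is recreated from the template, with browserId as keyword.
echo "curl -s -X DELETE ${ES_HOST}/${INDEX}"
```

Deleting the index discards collected usage events, so this is only reasonable on a fresh or demo deployment.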

  • handsome-football-66174
    10/11/2021, 3:51 PM
    General - Using Terraform to deploy the Helm charts on the EKS cluster, getting: Release "datahub-prerequisites" does not exist. Installing it now. Error: template: datahub-prerequisites/templates/serviceaccount.yaml: executing "datahub-prerequisites/templates/serviceaccount.yaml" at <.Values.serviceAccount.create>: nil pointer evaluating interface {}.create Has anyone faced this? Any direction on this is greatly appreciated.

  • victorious-ambulance-32469
    10/11/2021, 5:20 PM
    Question - How do I adjust values.yaml to use an external Elasticsearch service on GCP? Deploying on GKE with Helm.

  • fresh-fish-73471
    10/11/2021, 9:09 PM
    Query: How do I change the 8080 port for datahub-gms in a Docker-based installation? Tried approaches:
    1. Changed the port for datahub-gms in docker-compose.yml before container creation. Container creation succeeded, but the error below is encountered when trying to access the datahub-gms REST backend.
    2. Installed through quickstart.sh after making changes to the supposedly required files. Container creation succeeded, but the same error was encountered again.
    3. Tried alternate open ports, with the same issue replicated.
    ERROR: ConnectionError: HTTPConnectionPool(host='X.Y.Z.A', port=8082): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8537d00f50>: Failed to establish a new connection: [Errno 111] Connection refused'))
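One hedged reading of the connection-refused error above: GMS listens on 8080 inside its container, so changing the container-side port alone leaves nothing listening on the new port. A sketch that remaps only the host side and keeps container-to-container traffic on 8080; the service and variable names are assumptions to be checked against your docker-compose.yml:

```yaml
# Sketch (verify names against your docker-compose.yml): keep container
# port 8080, which GMS listens on internally, and remap only the host side.
services:
  datahub-gms:
    ports:
      - "8082:8080"              # host 8082 -> container 8080
  datahub-frontend-react:
    environment:
      - DATAHUB_GMS_HOST=datahub-gms
      - DATAHUB_GMS_PORT=8080    # in-network traffic stays on 8080
```

External clients (e.g. the ingestion CLI) would then target host port 8082, while services inside the compose network keep using 8080.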

  • quaint-lighter-4354
    10/13/2021, 11:58 AM
    Hello guys, I am facing the following problem deploying DataHub v0.8.15 on Kubernetes. I am running the elasticsearch-init-job and these are the logs I am getting:
    2021/10/13 11:17:40 Waiting for: https://<my-es-url>:443
    2021/10/13 11:17:45 Received 200 from https://<my-es-url>:443
    creating datahub_usage_event_policy
    {
      "policy": {
        "policy_id": "datahub_usage_event_policy",
        "description": "Datahub Usage Event Policy",
        "default_state": "Rollover",
        "schema_version": 1,
        "states": [
          {
            "name": "Rollover",
            "actions": [
              {
                "rollover": {
                  "min_index_age": "1d"
                }
              }
            ],
            "transitions": [
              {
                "state_name": "ReadOnly",
                "conditions": {
                  "min_index_age": "7d"
                }
              }
            ]
          },
          {
            "name": "ReadOnly",
            "actions": [
              {
                "read_only": {}
              }
            ],
            "transitions": [
              {
                "state_name": "Delete",
                "conditions": {
                  "min_index_age": "60d"
                }
              }
            ]
          },
          {
            "name": "Delete",
            "actions": [
              {
                "delete": {}
              }
            ],
            "transitions": []
          }
        ],
        "ism_template": {
          "index_patterns": [
            "datahub_usage_event-*"
          ],
          "priority": 100
        }
      }
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    100  1061    0     0  100  1061      0   5055 --:--:-- --:--:-- --:--:--  5052
    100  1807  100   746  100  1061    842   1198 --:--:-- --:--:-- --:--:--  2041
    }{"_id":"datahub_usage_event_policy","_version":1,"_primary_term":1,"_seq_no":0,"policy":{"policy":{"policy_id":"datahub_usage_event_policy","description":"Datahub Usage Event Policy","last_updated_time":1634123866020,"schema_version":1,"error_notification":null,"default_state":"Rollover","states":[{"name":"Rollover","actions":[{"rollover":{"min_index_age":"1d"}}],"transitions":[{"state_name":"ReadOnly","conditions":{"min_index_age":"7d"}}]},{"name":"ReadOnly","actions":[{"read_only":{}}],"transitions":[{"state_name":"Delete","conditions":{"min_index_age":"60d"}}]},{"name":"Delete","actions":[{"delete":{}}],"transitions":[]}],"ism_template":[{"index_patterns":["datahub_usage_event-*"],"priority":100,"last_updated_time":1634123866020}]}}}
    creating datahub_usage_event_index_template
    {
      "index_patterns": ["datahub_usage_event-*"],
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "type": {
            "type": "keyword"
          },
          "timestamp": {
            "type": "date"
          },
          "userAgent": {
            "type": "keyword"
          },
          "browserId": {
            "type": "keyword"
          }
        }
      },
      "settings": {
        "index.opendistro.index_state_management.rollover_alias": "datahub_usage_event"
      }
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    100   467  100    21  100   446    180   3824 --:--:-- --:--:-- --:--:--  4025
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    100    59    0     0  100    59      0    287 --:--:-- --:--:-- --:--:--   287
    100   144  100    85  100    59    283    196 --:--:-- --:--:-- --:--:--   480
    2021/10/13 11:17:47 Command finished successfully.
    }{"acknowledged":true}{"acknowledged":true,"shards_acknowledged":true,"index":"datahub_usage_event-000001"}
    So it seems fine. But when I try to access DataHub, I get the following error on GMS:
    11:47:09.155 [qtp544724190-13] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [corpuserinfodocument]]
    11:47:09.155 [qtp544724190-15] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [dataflowdocument]]
    11:47:09.155 [qtp544724190-11] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [dashboarddocument]]
    
    11:47:09.129 [qtp544724190-11] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [datajobdocument]]
    11:47:09.129 [qtp544724190-12] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [chartdocument]]
    11:47:09.129 [qtp544724190-9] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [datasetdocument]]
    Does anyone know why this is happening? I am running the elasticsearch job, but then it looks like there are still indices missing. I am using AWS OpenSearch, and I am passing these env variables into the elasticsearch init job: DATAHUB_ANALYTICS_ENABLED: true and USE_AWS_ELASTICSEARCH: true

  • millions-elephant-81123
    10/14/2021, 1:00 AM
    Hi Team, another similar issue. Trying to install DataHub on K8s and having problems with the Kafka, MySQL, and ZooKeeper pods, with the errors below. Any pointers?

  • quaint-lighter-4354
    10/14/2021, 2:46 PM
    Hi guys, I am getting the following error:
    Caused by: 
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'entityClientFactory': Unsatisfied dependency expressed through field 'gmsPort'; nested exception is org.springframework.beans.TypeMismatchException: Failed to convert value of type 'java.lang.String' to required type 'int'; nested exception is java.lang.NumberFormatException: For input string: "<tcp://172.20.248.129:8080>"
    	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:643)
    	at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:116)
    Does anyone know where it's coming from? I am kind of stuck. In my values.yaml (under global):
    datahub:
        gms:
          port: "8080"
        mae_consumer:
          port: "9091"
        appVersion: "1.0"
    Thanks, Yianni
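A hedged note on the NumberFormatException above: the offending value <tcp://172.20.248.129:8080> has the exact shape of Kubernetes' automatic service-link environment variables. When a service whose name maps to DATAHUB_GMS exists in the namespace, the kubelet injects DATAHUB_GMS_PORT=tcp://<cluster-ip>:8080 into every pod, shadowing the plain port number from values.yaml. A sketch of two workarounds; the container name is an assumption, check your rendered manifests:

```yaml
# Sketch: either disable service links at the pod level (Kubernetes >= 1.13),
# or pin the env vars explicitly so the injected variable cannot shadow them.
spec:
  enableServiceLinks: false        # pod-level switch
  containers:
    - name: datahub-frontend       # assumed container name
      env:
        - name: DATAHUB_GMS_HOST
          value: "datahub-datahub-gms"
        - name: DATAHUB_GMS_PORT
          value: "8080"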

  • handsome-football-66174
    10/14/2021, 7:03 PM
    Hi Everyone - I am using Terraform to deploy my Helm charts. Getting this error: Error: found in Chart.yaml, but missing in charts/ directory: datahub-gms, datahub-frontend, datahub-mae-consumer, datahub-mce-consumer, datahub-ingestion-cron, datahub-jmxexporter, followed by a Traceback (most recent call last): Any ideas?
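For context on the error above: "found in Chart.yaml, but missing in charts/ directory" usually means the subcharts declared as dependencies were never downloaded into charts/. A dry-run sketch; the chart path is an assumption about the repo layout, and the command is printed rather than run:

```shell
# Dry run: print the fix rather than run it, since the chart path ./datahub
# is an assumption about this repository's layout.
CHART_DIR="./datahub"
echo "helm dependency update ${CHART_DIR}"
# When deploying through Terraform, the helm_release resource can also fetch
# dependencies itself; see its dependency_update argument.
```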

  • fierce-action-87313
    10/15/2021, 7:17 AM
    Hey all. QQ: has anyone deployed DataHub using Confluent Kafka together with the AWS Glue Schema Registry for schemas? I would like to use MSK in the future, but a Confluent basic small cluster is just a ton cheaper to get started with.

  • better-orange-49102
    10/19/2021, 11:45 PM
    Did anyone set persistent volumes for GMS and the frontend in their k8s deployments? I've noticed restarted pods because ephemeral storage space has run out. I assume it's because of the DataHub logs or something. I'm not using Helm, but it doesn't seem like the charts specify PVs either.

  • agreeable-hamburger-38305
    10/22/2021, 7:40 PM
    I deployed with helm install and ran helm uninstall this morning, but the pods are still running now (2 hours later). Anyone know what might be the reason? helm ls -a shows nothing.

  • handsome-football-66174
    10/26/2021, 8:09 PM
    General Question - How do we edit policies for DataHub running on EKS? I only see this: https://datahubproject.io/docs/policies/

  • calm-morning-92759
    10/28/2021, 3:28 PM
    Hello everyone, after some local tests with Docker we have deployed DataHub on Kubernetes using the existing Helm charts. This works great. Now we want to enable the OAuth consent screen with Google, so we followed this tutorial: https://github.com/linkedin/datahub/blob/master/docs/how/auth/sso/configure-oidc-react-google.md I now need to bring these env variables to the frontend pod/deployment:
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=your-client-id
    AUTH_OIDC_CLIENT_SECRET=your-client-secret
    AUTH_OIDC_DISCOVERY_URI=<https://accounts.google.com/.well-known/openid-configuration>
    AUTH_OIDC_BASE_URL=your-datahub-url
    AUTH_OIDC_SCOPE="openid profile email"
    AUTH_OIDC_USER_NAME_CLAIM=email
    AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+)
    Anyway, changing the deployment config manually does not seem like the right way. I am wondering what the best way is to combine custom config with the Helm charts. It seems that it is not possible to add the values in the values.yaml file. Hope someone can share their experience...
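One hedged way to pass arbitrary env vars like the ones above through the chart, assuming the acryldata datahub chart exposes an extraEnvs list on the frontend component (verify against your chart version's values schema):

```yaml
# Sketch: custom env vars via the chart's values, not by editing the
# rendered deployment; names below mirror the variables in the message.
datahub-frontend:
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_ID
      value: "your-client-id"
    - name: AUTH_OIDC_DISCOVERY_URI
      value: "https://accounts.google.com/.well-known/openid-configuration"
    - name: AUTH_OIDC_CLIENT_SECRET
      valueFrom:                 # keep the secret out of values.yaml
        secretKeyRef:
          name: oidc-secrets     # assumed secret name
          key: client-secret
```

Keeping the client secret in a Kubernetes Secret referenced by valueFrom, rather than inline in values.yaml, avoids committing credentials to version control.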

  • salmon-cricket-21860
    10/29/2021, 8:22 AM
    Hi all, I want to allow only registered users to log in, so I set:
    - name: AUTH_OIDC_USER_NAME_CLAIM
          value: "email"
        - name: AUTH_OIDC_USER_NAME_CLAIM_REGEX
          value: "([^@]+)"
        - name: AUTH_OIDC_JIT_PROVISIONING_ENABLED
          value: "false"
        - name: AUTH_OIDC_PRE_PROVISIONING_REQUIRED
          value: "true"
    But this failed to restrict non-registered users from logging in. What I did:
    1. Removed the urn rows from the RDB (DELETE FROM metadata_aspect_v2 WHERE urn = 'urn:li:corpuser:test-user')
    2. Deployed DataHub with those env variables
    3. Tried to log in with test-use@my.company.com
    It is still possible to log in, but extraction of group and other information didn't work (screenshot included too). Am I missing something? I want to disable login for non-registered users, test-user in this case. (I am using Google OAuth, FYI.)
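As an aside, the claim regex in the config above resolves the login email to a corpuser name like this (a local sketch of the regex behavior, not DataHub code); with JIT provisioning off, the extracted name would have to match an existing urn:li:corpuser:<name> aspect:

```shell
# Sketch of what AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+) extracts from the
# email claim: the first capture group, i.e. everything before the @.
EMAIL="test-use@my.company.com"
USERNAME=$(printf '%s' "$EMAIL" | sed -E 's/^([^@]+)@.*$/\1/')
echo "$USERNAME"   # prints: test-use
```

Note that for the login above, the extracted name would be test-use, not test-user, which is worth double-checking against the urn that was deleted.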

  • high-hospital-85984
    10/29/2021, 4:26 PM
    Running into some issues setting up a new DataHub deployment with Postgres. The setup job runs this: https://github.com/linkedin/datahub/blob/master/docker/postgres-setup/init.sh However, it doesn't seem to work. When I break the command down and run
    psql -U $POSTGRES_USERNAME -h $POSTGRES_HOST -p $POSTGRES_PORT -tc "SELECT 1 FROM pg_database WHERE datname = '${DATAHUB_DB_NAME}'"
    where POSTGRES_USERNAME=datahub and DATAHUB_DB_NAME=<something else>, I get the error
    psql: error: FATAL:  database "datahub" does not exist
    It looks like psql tries to attach to a database named after the user, and this fails when we want to create a DB with a non-standard name. Or am I missing something?
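That diagnosis matches psql's documented default: with no database argument it connects to a database named after the user. A hedged sketch of the fix, connecting to the built-in postgres maintenance database instead; values are the placeholders from the message, and the command is only printed, not executed:

```shell
# Dry run with placeholder values; in init.sh the fix amounts to adding
# "-d postgres" so psql attaches to the maintenance DB rather than a
# database named after $POSTGRES_USERNAME.
POSTGRES_USERNAME="datahub"
POSTGRES_HOST="<postgres-host>"
POSTGRES_PORT="5432"
DATAHUB_DB_NAME="<something else>"
CMD="psql -d postgres -U $POSTGRES_USERNAME -h $POSTGRES_HOST -p $POSTGRES_PORT -tc \"SELECT 1 FROM pg_database WHERE datname = '${DATAHUB_DB_NAME}'\""
echo "$CMD"
```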

  • some-cricket-23089
    11/01/2021, 10:34 AM
    Hi Team, can we restore any dataset aspect to its previous version?

  • bulky-intern-2942
    11/01/2021, 3:16 PM
    Hey guys, I'm gonna start using DataHub on AWS. Could you please tell me the best way to deploy DataHub on AWS?

  • red-pizza-28006
    11/01/2021, 3:25 PM
    What is the easiest way to delete all data from DataHub?

  • handsome-football-66174
    11/01/2021, 8:51 PM
    General - Trying to integrate another app to manage groups and users. How do we achieve this? Currently I see we have https://datahubproject.io/docs/metadata-service/#ingesting-entities. Also, if we were to use this, how do we authenticate such requests (OAuth2, API token)?

  • victorious-dream-46349
    11/02/2021, 7:03 AM
    Is it possible to make REST API requests to GMS through the FE?

  • chilly-analyst-561
    11/02/2021, 6:14 PM
    Hi team, I'm using a deployment with OIDC authentication and it's working OK, but I have a doubt: how do I associate OIDC token groups with profiles in DataHub? For example, an OIDC group called admins_users maps to DataHub admins.

  • agreeable-hamburger-38305
    11/02/2021, 9:34 PM
    Hi all, because my organization doesn't support Helm yet, I rendered raw YAMLs from DataHub's Helm chart and used Kustomize to deploy on k8s. Does anyone know how I can get this new configurable profiling working there? https://github.com/linkedin/datahub/pull/3453

  • agreeable-hamburger-38305
    11/02/2021, 9:36 PM
    I am also wondering if the helm chart version 0.2.1 is tied to DataHub version 0.8.16. A little confused about the relationship between the helm chart version and the DataHub version.

  • curved-jordan-15657
    11/04/2021, 12:01 AM
    Hi team! I want to use Postgres as the DB for DataHub. I've tried to get datahub-postgres-setup from the acryldata Docker Hub, but since the mysql-setup job includes MYSQL_PORT, MYSQL_HOST, etc., it can't connect to the DB, because the Docker image requires POSTGRES_HOST etc. What can I do about this problem? We were using MySQL but now we need PostgreSQL. Basically, how do we do that? 😄
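For reference, the env names in the fragment below are inferred from the postgres-setup init.sh linked in an earlier message (POSTGRES_HOST, POSTGRES_PORT, POSTGRES_USERNAME, DATAHUB_DB_NAME appear there); treat it as a sketch of the setup job's env block, to be checked against the acryldata/datahub-postgres-setup image:

```yaml
# Sketch: env block for a postgres-setup job container; names inferred from
# docker/postgres-setup/init.sh, values are placeholders.
env:
  - name: POSTGRES_HOST
    value: "<your-postgres-host>"
  - name: POSTGRES_PORT
    value: "5432"
  - name: POSTGRES_USERNAME
    value: "datahub"
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-secrets   # assumed secret name
        key: postgres-password
  - name: DATAHUB_DB_NAME
    value: "datahub"
```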

  • red-pizza-28006
    11/04/2021, 6:17 PM
    Today I moved the MySQL host out of k8s onto a dedicated RDS instance and all went well. But I started noticing that the UI in general became slow. The size of the instance is 2 cores / 1 GB RAM, so the smallest instance. Could that be the reason for the slowness, and if so, how can I determine the correct size of the instance?