# all-things-deployment
• polite-application-51650
  10/10/2022, 4:24 AM
Hi Team, I deployed DataHub on my k8s cluster. When running ingestion through the UI, this is the error I'm getting:
    04:17:34.148 [ThreadPoolTaskExecutor-1] WARN  c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"HomePageViewEvent","actorUrn":"urn:li:corpuser:datahub","timestamp":1665375452822,"date":"Mon Oct 10 2022 09:47:32 GMT+0530 (India Standard Time)","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","browserId":"bbaaab7a-052b-40d6-a41f-715191432a39"}
    04:17:34.159 [pool-14-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 7ms
    04:17:34.237 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
    04:17:43.580 [pool-14-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 37ms
    04:17:43.659 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
    04:17:43.677 [Thread-62] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
    04:17:46.502 [Thread-66] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
    04:19:14.391 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.h.i.IngestionSchedulerHook:56 - Received UPSERT to Ingestion Source. Rescheduling the source (if applicable). urn: urn:li:dataHubIngestionSource:e6330868-94ec-4339-9df5-3c72f9d628ea, key: null.
    04:19:14.392 [ThreadPoolTaskExecutor-1] INFO  c.d.m.ingestion.IngestionScheduler:105 - Unscheduling ingestion source with urn urn:li:dataHubIngestionSource:e6330868-94ec-4339-9df5-3c72f9d628ea
    04:19:14.393 [ThreadPoolTaskExecutor-1] INFO  c.d.m.ingestion.IngestionScheduler:138 - Scheduling next execution of Ingestion Source with urn urn:li:dataHubIngestionSource:e6330868-94ec-4339-9df5-3c72f9d628ea. Schedule: 0 0 * * *
    04:19:14.401 [ThreadPoolTaskExecutor-1] INFO  c.d.m.ingestion.IngestionScheduler:167 - Scheduled next execution of Ingestion Source with urn urn:li:dataHubIngestionSource:e6330868-94ec-4339-9df5-3c72f9d628ea in 51045601ms.
    04:19:15.403 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:25 - Failed to feed bulk request. Number of events: 8 Took time ms: -1 Message: failure in bulk execution:
    [3]: index [datahubexecutionrequestindex_v2], type [_doc], id [urn%3Ali%3AdataHubExecutionRequest%3A7846970b-514c-4a9e-931a-980dca6c8e52], message [[datahubexecutionrequestindex_v2/wfhCmJ_jR0e48O5ItrryJA][[datahubexecutionrequestindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubExecutionRequest%3A7846970b-514c-4a9e-931a-980dca6c8e52]: document missing]]]
    04:19:16.965 [Thread-77] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
    04:19:18.367 [Thread-80] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
    04:19:18.921 [Thread-83] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
• white-beard-86056
  10/10/2022, 6:53 AM
Hi guys, need some help with a DataHub deployment. We are trying to connect to an external Kafka cluster (authentication with mTLS), and the kafka-setup container fails on startup.
    Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /mnt/datahub/certs/keystore.jks of type PKCS12
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:377)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.<init>(DefaultSslEngineFactory.java:349)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.createKeystore(DefaultSslEngineFactory.java:299)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.configure(DefaultSslEngineFactory.java:161)
    at org.apache.kafka.common.security.ssl.SslFactory.instantiateSslEngineFactory(SslFactory.java:138)
    at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:95)
    at org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:71)
    at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
    at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
    at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
    at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:508)
    ... 4 more
    Caused by: java.io.IOException: DerInputStream.getLength(): lengthTag=111, too big.
    at sun.security.util.DerInputStream.getLength(DerInputStream.java:601)
    at sun.security.util.DerValue.init(DerValue.java:384)
    at sun.security.util.DerValue.<init>(DerValue.java:325)
    at sun.security.util.DerValue.<init>(DerValue.java:338)
    at sun.security.pkcs12.PKCS12KeyStore.engineLoad(PKCS12KeyStore.java:1958)
    at java.security.KeyStore.load(KeyStore.java:1445)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:374)
    Our truststore and keystore commands look like this:
    keytool -noprompt -keystore truststore.jks -storetype pkcs12 -alias kafka-ca01-q -trustcacerts -import -file <ca> -deststorepass <password>
    openssl pkcs12 -export -in <cert> -inkey <key> -out client.p12 -name localhost -passout pass:<password>
    keytool -importkeystore -srckeystore client.p12 -srcstoretype pkcs12 -srcstorepass <password> -destkeystore ./keystore.jks -deststoretype pkcs12 -deststorepass <password> -destkeypass <password>
    keytool -noprompt -keystore keystore.jks -storetype pkcs12 -alias kafka-ca01-q -import -file <ca> -storepass <password>
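The "DerInputStream.getLength(): lengthTag=111, too big" error usually means the file Kafka reads is not actually in the configured PKCS12 format, for example because it got base64-encoded or truncated on its way into the pod. A minimal verification sketch, assuming the mount path from the stack trace (password and secret name are placeholders):

    # PKCS12 is binary DER; if this reports ASCII/base64 text, the mount is wrong
    file /mnt/datahub/certs/keystore.jks

    # List entries with the same store type Kafka is configured to use
    keytool -list -keystore /mnt/datahub/certs/keystore.jks -storetype pkcs12 -storepass <password>

    # If the file comes from a k8s Secret, create it from the raw binary file,
    # not from a base64 string pasted into the manifest by hand
    kubectl create secret generic kafka-certs --from-file=keystore.jks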
• prehistoric-room-17640
  10/10/2022, 3:30 PM
hi datahub community, we've integrated DataHub with Okta and can add users, and those users can log in. However, users who are administrators cannot add users to groups. When I log in with my Okta user and try to add a user to a group, I get a "Not Authorized" error:
• thousands-solstice-2498
  10/11/2022, 2:05 AM
    Hi Team, Please advise. Error from server (Cannot have service type of LoadBalancer): error when creating "STDIN": admission webhook "validating-webhook.openpolicyagent.org" denied the request: Cannot have service type of LoadBalancer
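If the LoadBalancer comes from the datahub-frontend service in the helm chart, a minimal sketch of a fix, assuming your OPA policy allows ClusterIP, is to override the service type in values.yaml (the same key shown in other values files in this channel) and expose the UI through an Ingress instead:

    datahub-frontend:
      service:
        type: ClusterIP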
• microscopic-mechanic-13766
  10/11/2022, 2:33 PM
Good afternoon. Last time I checked, validations weren't possible for Hive because the required executor hadn't been implemented. I would like to know if the situation has changed and such an executor has finally been implemented, or whether I would have to do something similar to what is described here. Thanks in advance!
• cuddly-arm-8412
  10/12/2022, 8:56 AM
hi team, I want to know the difference between the TEXT and TEXT_PARTIAL field types in annotations like this one:
    @Searchable = {
      "fieldType": "TEXT_PARTIAL",
      "enableAutocomplete": true,
      "boostScore": 10.0
    }
    name: optional string
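A hedged side-by-side sketch (the record and field names here are hypothetical): TEXT indexes the value for whole-token matching only, while TEXT_PARTIAL additionally builds a partial-match (ngram) sub-field, so substring queries and autocomplete work at the cost of a larger index:

    record ExampleProperties {
      // TEXT: "orders" matches the token "orders", but "ord" does not
      @Searchable = {
        "fieldType": "TEXT"
      }
      description: optional string

      // TEXT_PARTIAL: "ord" also matches "orders"; pairs naturally with autocomplete
      @Searchable = {
        "fieldType": "TEXT_PARTIAL",
        "enableAutocomplete": true,
        "boostScore": 10.0
      }
      name: optional string
    }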
• quiet-wolf-56299
  10/12/2022, 1:35 PM
Hi, I'm working on a deployment with a local on-prem static MySQL instance. I was wondering what the minimum required permissions are for the user account DataHub uses to access its MySQL database?
• quiet-wolf-56299
  10/12/2022, 1:35 PM
E.g. INSERT, SELECT, DELETE only? etc.
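A minimal sketch of a scoped grant, assuming the default schema is named datahub and that the same account runs the bootstrap/upgrade jobs (which create and alter tables), so DDL is needed at least on first run; user, host, and schema names are placeholders:

    -- day-to-day GMS operation
    GRANT SELECT, INSERT, UPDATE, DELETE ON datahub.* TO 'datahub'@'%';
    -- first run and version upgrades (table creation/migration)
    GRANT CREATE, ALTER, INDEX, DROP ON datahub.* TO 'datahub'@'%';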
• quiet-wolf-56299
  10/12/2022, 3:59 PM
    An additional question. Does Datahub have an overall “healthcheck” endpoint that I could point our ingress controllers at? (We use F5 bigIP for ingress control to the network, they require a health check of some kind to ensure the server is up before it sends traffic)
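A minimal sketch, assuming the stock helm chart ports: GMS serves a health endpoint that the chart's own liveness/readiness probes use, and the frontend serves an admin endpoint, so either can back an F5 health monitor (service hostnames are placeholders):

    # GMS health check
    curl -f http://datahub-gms:8080/health
    # Frontend health check
    curl -f http://datahub-frontend:9002/admin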
• wonderful-book-58712
  10/12/2022, 5:26 PM
Do we have any best practices for enabling the UI via HTTPS? I see there is a placeholder for sslContext; do we have an interface defined which we can use?
• better-orange-49102
  10/13/2022, 4:51 AM
    For k8s deployment, has anyone encountered situations where the bitnami MySQL helm chart is not enough to ensure reliability? Just wondering if there are other helm charts we can use or something
• lemon-cat-72045
  10/13/2022, 8:25 AM
    Hi all, I'm trying to deploy datahub while using our own Kafka cluster instead of creating a new one in Kubernetes. Is there an example helm chart I can follow along? Thanks!
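A minimal sketch of the relevant overrides, assuming the standard prerequisites and datahub charts: disable the bundled Kafka in the prerequisites release, then point the main chart at your brokers via the global block (addresses are placeholders):

    # prerequisites values.yaml
    kafka:
      enabled: false

    # datahub values.yaml
    global:
      kafka:
        bootstrap:
          server: "broker-1.example.com:9092,broker-2.example.com:9092"
        zookeeper:
          server: "zookeeper.example.com:2181"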
• calm-dinner-63735
  10/13/2022, 9:03 AM
Hi team, on helm install I am getting this error:
• calm-dinner-63735
  10/13/2022, 10:48 AM
Can I get some help here?
• fast-ice-59096
  10/13/2022, 1:08 PM
Hi everyone, I am trying to install DataHub in Azure Kubernetes Service. When I run helm install datahub datahub/datahub, I get the error Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition. All the prerequisites are running. Does anyone know what I can do?
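Since the pre-install hooks are the chart's setup jobs, a hedged first step is to find which job's pod never completed and read its logs (pod names vary per release):

    # which setup jobs and pods are stuck or failing?
    kubectl get jobs,pods
    # inspect the stuck pod and its logs
    kubectl describe pod <setup-pod-name>
    kubectl logs <setup-pod-name>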
• rhythmic-army-76816
  10/13/2022, 1:50 PM
Hi, I have a basic question about the mce and mae consumers. The documentation (datahub-helm repo) states that these components are optional. I read through the docs and Slack history, and I find that information misleading. Is it true that they are not optional at all? It appears that the only optional thing is whether they run standalone or within the GMS container. Is my understanding correct?
• ambitious-magazine-36012
  10/13/2022, 4:18 PM
ERROR: Trying to build from source on an M1 Mac, getting an error: "Unable to download toolchain. This might indicate that the combination for the requested JDK is not available. Could not read 'https://api.adoptopenjdk.net/v3/binary/latest/8/ga/mac/aarch64/jdk/hotspot/normal/adoptopenjdk' as it does not exist."
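A hedged workaround sketch: AdoptOpenJDK publishes no macOS aarch64 build of JDK 8, but Azul Zulu does, so installing that and pointing the build at it can unblock an M1 machine (the Homebrew cask name may differ depending on your tap version):

    # install a JDK 8 that exists for Apple Silicon
    brew install --cask zulu8
    # point the build at it
    export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
    ./gradlew build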
• lemon-cat-72045
  10/14/2022, 4:08 AM
    Hi everyone, I am having issues running the elasticsearch setup job during deployment. Logs can be found in the thread. Does anyone know how to fix this? thanks!
• high-hospital-85984
  10/14/2022, 6:33 AM
    We’ve been using opensearch instead of elasticsearch for a while now, without issues. Now we’re looking to update to a newer version of opensearch, and got nervous about the compatibility. Are there any integration tests we could run locally to ensure the graph backend still works as expected?
• better-orange-49102
  10/17/2022, 1:02 PM
Potentially a noob question: I am interested in retaining the logs for GMS. However, since gms and frontend are Deployments, they don't scale up with ReadWriteOnce PVs (I don't have access to ReadWriteMany PVs). My thought is to use podAffinity to colocate all the GMS pods (I want at least 2 instances) so that they can share the same RWO PV. My values.yaml for gms looks like:
    datahub-gms:
      enabled: true
      image:
        repository: linkedin/datahub-gms
        tag: "v0.8.44"
      service:
        type: ClusterIP
      replicaCount: 2
      extraVolumes:
        - name: gms-backup
          persistentVolumeClaim:
            claimName: temp2
      extraVolumeMounts:
        - name: gms-backup
          mountPath: /tmp/datahub/logs/gms      
          readOnly: false
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
          - key: app.kubernetes.io/name
                operator: In
                values:
                - datahub-gms
        topologyKey: kubernetes.io/hostname
However, when I kubectl exec into gms, I see that /tmp/datahub/logs/gms contains only lost+found instead of the logs. Any ideas why? I'm using GKE here.
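A hedged note: lost+found is what a freshly formatted ext4 volume contains, so the PV is mounted; a common reason the log files never appear is that the mount is owned by root while GMS runs as a non-root user that cannot write there. One sketch of a fix, assuming the gms subchart honors a pod-level security context, is to set fsGroup so the volume becomes group-writable, and optionally a subPath so lost+found stays out of the log directory:

    datahub-gms:
      podSecurityContext:
        fsGroup: 1000        # assumed GID of the runtime user in the image
      extraVolumeMounts:
        - name: gms-backup
          mountPath: /tmp/datahub/logs/gms
          subPath: gms-logs  # hides the volume root (and its lost+found)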
• full-chef-85630
  10/17/2022, 2:50 PM
Using v0.9.0, running ./gradlew datahub-frontend:dist @dazzling-judge-80093 env:
    shdzh-mbp-1:datahub shdzh$ java -version
    openjdk version "1.8.0_345"
    OpenJDK Runtime Environment (build 1.8.0_345-bre_2022_08_04_23_35-b00)
    OpenJDK 64-Bit Server VM (build 25.345-b00, mixed mode)
    
    shdzh-mbp-1:datahub shdzh$ gradle -v
    
    ------------------------------------------------------------
    Gradle 7.5
    ------------------------------------------------------------
    
    Build time:   2022-07-14 12:48:15 UTC
    Revision:     c7db7b958189ad2b0c1472b6fe663e6d654a5103
    
    Kotlin:       1.6.21
    Groovy:       3.0.10
    Ant:          Apache Ant(TM) version 1.10.11 compiled on July 10 2021
    JVM:          1.8.0_345 (Homebrew 25.345-b00)
    OS:           Mac OS X 12.5.1 x86_64
Error info:
    > Task :datahub-frontend:compileScala
    [Error] /Users/shdzh/work/source/0.9/datahub/datahub-frontend/app/auth/AuthModule.java:8:33:  error: cannot access Actor
    javac exited with exit code 1
    
    > Task :datahub-frontend:compileScala FAILED
    
    FAILURE: Build failed with an exception.
    
    * What went wrong:
    Execution failed for task ':datahub-frontend:compileScala'.
    > javac returned non-zero exit code
    
    * Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    
* Get more help at https://help.gradle.org
    
    BUILD FAILED in 36s
    70 actionable tasks: 17 executed, 53 up-to-date
• lemon-cat-72045
  10/18/2022, 4:33 PM
    Does datahub support using postgres12 as its local database? Thanks.
• polite-alarm-98901
  10/18/2022, 5:12 PM
    trying to set up datahub behind an internal proxy, and am running into this error
    play.api.UnexpectedException: Unexpected exception[ServerResultException: HTTP 1.0 client does not support chunked response]
I noticed that this issue mentions a similar thing but it never got a response; is there a way to do what I'm trying to do? If it helps, the proxy being used is haproxy (I think).
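A hedged guess at the culprit: haproxy health checks default to HTTP/1.0, and Play refuses to answer an HTTP/1.0 request with a chunked response. Forcing the check to HTTP/1.1 (which requires a Host header) is a common fix; backend name, path, and addresses are placeholders:

    backend datahub_frontend
        # send health checks as HTTP/1.1 instead of the HTTP/1.0 default
        option httpchk GET /admin HTTP/1.1\r\nHost:\ datahub
        server fe1 datahub-frontend:9002 check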
• famous-florist-7218
  10/19/2022, 8:50 AM
Hi everyone, last week I was about to use AWS OpenSearch but ran into this issue: the elasticsearchSetupJob couldn't communicate with the OpenSearch domain. Any thoughts? Configuration is referenced from here. The error message is:
    Received 401 from https://<<vpc-endpoint>>:443. Sleeping 1s
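A minimal sketch of values that often resolve a 401 against AWS OpenSearch, assuming the chart's elasticsearchSetupJob supports extra environment variables and the domain uses a master user: switch the setup script into its AWS-compatible mode and pass basic-auth credentials (secret names are placeholders):

    elasticsearchSetupJob:
      extraEnvs:
        - name: USE_AWS_ELASTICSEARCH
          value: "true"
    global:
      elasticsearch:
        auth:
          username: <master-user>
          password:
            secretRef: elasticsearch-secrets
            secretKey: elasticsearch-password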
• mysterious-carpet-43629
  10/19/2022, 9:43 AM
Hi, is there a good way to set all DataHub frontend and GMS server logs to JSON format? We found a way by adding jars and changing the logback config, but it is super ugly…
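A hedged sketch of the jar-plus-logback approach, for comparison: override the logback config to swap the pattern encoder for a JSON one (this assumes net.logstash.logback:logstash-logback-encoder has been added to the classpath, which is the extra jar the approach requires):

    <configuration>
      <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <!-- emits one JSON object per log event -->
        <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
      </appender>
      <root level="INFO">
        <appender-ref ref="STDOUT"/>
      </root>
    </configuration>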
• fast-ice-59096
  10/20/2022, 9:02 AM
Hi everyone, I sent an invite link to my team members to create their accounts in DataHub. All of them reported the following error:
• fast-ice-59096
  10/20/2022, 9:03 AM
    MicrosoftTeams-image.png
• fast-ice-59096
  10/20/2022, 9:04 AM
    Does anyone have any idea of the reason for this error?
• microscopic-mechanic-13766
  10/20/2022, 9:20 AM
Hello, one quick question: is it possible to make the "View in Airflow" button not visible? I have noticed that in the latest version of the demo this button does not appear, so I wanted to know whether that is the default behaviour or not. Thanks in advance!
• lemon-cat-72045
  10/20/2022, 4:44 PM
Hello, a few questions on configuring DataHub with Kafka:
• Do we need to enable the datahub-kafka-setup job when we are connecting to our own Kafka?
• What does datahub-kafka-setup do exactly? When I check the helm templates, it seems that all components have their own Kafka configurations.
• I found that some of the deployment.yaml files in the subchart templates use KAFKA_PROPERTIES_{{ $configName | replace "." "_" | upper }} while others use SPRING_KAFKA_PROPERTIES_{{ $configName | replace "." "_" | upper }}. May I know the difference between these two configuration settings? (See the sketch below.)
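A hedged illustration of how those two prefixes render for a client property such as security.protocol: both upper-case the key and replace dots with underscores, but they are read by different layers; the plain KAFKA_PROPERTIES_ form is consumed directly by DataHub components such as the frontend, while the SPRING_KAFKA_PROPERTIES_ form is bound by Spring Kafka as spring.kafka.properties.* in the Spring-based services:

    # kafka client property: security.protocol=SSL
    KAFKA_PROPERTIES_SECURITY_PROTOCOL=SSL          # read directly by the component
    SPRING_KAFKA_PROPERTIES_SECURITY_PROTOCOL=SSL   # bound via Spring relaxed binding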